Pipeline Report — May 8, 2026
Production healthy — revision coach-dan-video-00011-s7r

Coach Dan Video
Pipeline Health & Fix Report

Two production deploys today closed a 2.5-month-old reliability gap that was silently dropping ~50% of user video uploads. This report covers what was happening, what changed, and what users are actually sending to Coach Dan.

Success Rate (YTD baseline): 49% (Jan–May 2026, n=285)
Success Rate Post-Fix: 100% (since 03:51 UTC, n=2, early)
User Video Uploads: 285 (Jan 1 – May 8, 2026)
Median Video Length: 19.3s (P95 = 111s, max = 146s)
Languages Supported: 13 (~25-30% non-English)

Today's Fix

Before vs After

Two deploys today: pose-detection timeout fix at 03:51 UTC, then 120s input truncation at 06:56 UTC. Together they remove the architectural failure modes that capped success at ~50%.

Before (Feb 24 – May 8)
Success rate (YTD): 49%
April success rate: 41%
Pose timeout: 300s
Pose-failure handling: crash
Video length cap: none
Long-video success: ~0%

After (May 8 onward)
Success rate (today): 100% (n=2)
Errors since fix: 0
Pose timeout: 480s
Pose-failure handling: graceful
Video length cap: 120s
Worst-case pipeline: ~840s
Projected success rate after the fix: ~90%+
This combines three effects. The graceful pose fallback recovers the 30% of failures that crashed mid-pipeline. The 120s input truncation brings the 5% of uploads that are long videos, which previously could not succeed, from 0% to ~95% success. And the pipeline now fits Cloud Run's 900s budget for any input. Validation is due May 15.

Failure Pattern

Monthly Success Rate (Pre-Fix)

The video pipeline was deployed in late February and degraded over time as load grew. April was the worst month after launch at 41% success, and also the busiest: 112 uploads, versus 91 in February and 49 in March.

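Month | Uploads | Success | Partial Failures | Silent Failures | Success Rate
2026-01 | 11 | 0 | 2 | 9 | 0%
2026-02 | 91 | 52 | 24 | 15 | 57%
2026-03 | 49 | 25 | 18 | 6 | 51%
2026-04 | 112 | 46 | 45 | 21 | 41%
2026-05 (partial) | 22 | 18 | 3 | 1 | 82%
YTD Total | 285 | 141 | 92 | 52 | 49%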

January starts at 0% because the pipeline was not fully deployed until February. May's 82% is a small partial-month sample distorted by the pre-fix early-month failures.
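The success-rate column can be reproduced from the raw counts. The sketch below is illustrative only: it assumes each month's row reads as uploads, successes, partial failures, and silent failures (an interpretation under which every row sums correctly), and the names `MONTHLY`, `success_rate`, and `summarize` are hypothetical rather than taken from the actual analysis script.

```python
# Monthly counts transcribed from the table: (uploads, success, partial, silent).
MONTHLY = {
    "2026-01": (11, 0, 2, 9),
    "2026-02": (91, 52, 24, 15),
    "2026-03": (49, 25, 18, 6),
    "2026-04": (112, 46, 45, 21),
    "2026-05": (22, 18, 3, 1),
}


def success_rate(success: int, uploads: int) -> int:
    """Whole-percent success rate; 0 when there were no uploads."""
    return round(100 * success / uploads) if uploads else 0


def summarize(monthly: dict) -> tuple:
    """Return ({month: rate, "YTD": rate}, (uploads, success, partial, silent) totals)."""
    rates = {}
    totals = [0, 0, 0, 0]
    for month, counts in monthly.items():
        uploads, success, partial, silent = counts
        # Sanity check: the three outcomes should account for every upload.
        if success + partial + silent != uploads:
            raise ValueError(f"{month}: outcome counts do not sum to uploads")
        rates[month] = success_rate(success, uploads)
        totals = [t + c for t, c in zip(totals, counts)]
    rates["YTD"] = success_rate(totals[1], totals[0])
    return rates, tuple(totals)


rates, totals = summarize(MONTHLY)
```

Running the same function over post-fix data would be one way to do the May 15 recheck on a like-for-like basis.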

Root Cause

Where Videos Were Failing

Cloud Run pipeline architecture, with the failure point highlighted. Video Intelligence pose detection had a hardcoded 300s timeout — for any video longer than ~1 minute, it routinely exceeded that and crashed the entire pipeline, even when Step 1 (Gemini analysis) had already succeeded.

📥 Download (~10s)
✂️ Truncate <120s (NEW)
🎬 Step 1: Gemini (~30-60s)
🦴 Step 2: Pose (480s timeout)
🎙️ Step 3: TTS (~60s)
🎞️ Step 5: Compose (~120s)
📤 Upload (~30s)

The fix (4 minimal edits)

# step2_pose_detection.py: bumped timeout in 2 places
- result = operation.result(timeout=300)
+ result = operation.result(timeout=480)

# main.py: wrapped pose call so failure no longer crashes the pipeline
+ try:
+     step2_future.result()
+ except Exception as e:
+     logger.warning(f"Pose detection failed, continuing without skeleton overlay: {e}")
+     if pose_data_path.exists(): pose_data_path.unlink()

# main.py: truncate inputs >120s to fit pipeline budget
+ if original_duration_s > MAX_VIDEO_DURATION_S:
+     _truncate_video(video_path, truncated_path, MAX_VIDEO_DURATION_S)
+     video_path = truncated_path
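To show the graceful-fallback pattern in isolation, here is a minimal, self-contained sketch. It is not the production code: `detect_pose` and `run_pipeline` are hypothetical stand-ins, the pose step is forced to fail, and the reason given for deleting the partial file (so later steps do not read incomplete data) is an assumption about the intent of the `unlink()` call.

```python
import logging
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

logger = logging.getLogger("pipeline_sketch")


def detect_pose(pose_data_path: Path) -> None:
    """Hypothetical stand-in for Step 2: writes partial output, then times out."""
    pose_data_path.write_text("partial pose data")
    raise TimeoutError("pose detection exceeded its timeout")


def run_pipeline(pose_data_path: Path) -> dict:
    """Run the optional pose step and keep going if it fails."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        step2_future = pool.submit(detect_pose, pose_data_path)
        try:
            step2_future.result()
            has_overlay = True
        except Exception as e:
            logger.warning(
                "Pose detection failed, continuing without skeleton overlay: %s", e
            )
            # Assumed intent: drop partial output so later steps skip the overlay.
            if pose_data_path.exists():
                pose_data_path.unlink()
            has_overlay = False
    # The remaining steps (TTS, compose, upload) would run here in either case.
    return {"completed": True, "skeleton_overlay": has_overlay}


with tempfile.TemporaryDirectory() as tmp:
    pose_path = Path(tmp) / "pose.json"
    outcome = run_pipeline(pose_path)
    leftover = pose_path.exists()
```

The point of the pattern is that a failure in an optional enrichment step downgrades the output instead of discarding work that earlier steps already completed.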

The truncation uses ffmpeg stream-copy (-c copy -t 120), which clips without re-encoding — completes in ~1 second regardless of input length. Coach Dan still receives, analyzes, and replies; long videos are simply trimmed to their first 2 minutes server-side. No mobile app changes required.
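A stream-copy truncation helper along these lines might look like the sketch below. Only the `-c copy -t 120` flags come from the report; the function names, the `-y` overwrite flag, and the use of `subprocess.run` are assumptions, and the real `_truncate_video` may differ.

```python
import subprocess
from pathlib import Path

MAX_VIDEO_DURATION_S = 120  # cap stated in the report


def build_truncate_cmd(src: Path, dst: Path, max_seconds: int) -> list[str]:
    """Build an ffmpeg command that keeps the first max_seconds without re-encoding."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists (assumed choice)
        "-i", str(src),
        "-t", str(max_seconds),  # stop writing output after this many seconds
        "-c", "copy",            # stream copy: no decode/encode pass
        str(dst),
    ]


def truncate_video(src: Path, dst: Path, max_seconds: int = MAX_VIDEO_DURATION_S) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(
        build_truncate_cmd(src, dst, max_seconds),
        check=True,
        capture_output=True,
    )


cmd = build_truncate_cmd(Path("in.mp4"), Path("out.mp4"), MAX_VIDEO_DURATION_S)
```

Because stream copy writes whole packets rather than re-encoding frames, the output may not end at exactly 120s, which is worth keeping in mind if downstream steps assume a hard limit.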

Upload Patterns

How Long Are User Videos?

Distribution of 83 video durations from Cloud Run logs (last 90 days). The truncation threshold was sized off this data — only ~5% of uploads exceed 120s, so the cap is generous for the long tail without affecting the typical user.

0–15s: 37 (44.6%)
15–30s: 21 (25.3%)
30–60s: 13 (15.7%)
60–90s: 7 (8.4%)
90–120s: 1 (1.2%)
120–180s: 4 (4.8%)

Min: 3.0s
Median: 19.3s
Mean: 30.0s
P90: 73.6s
P95: 111.1s
Max: 145.8s
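The bucket percentages, and the share of uploads the 120s cap would truncate, follow directly from the counts. A small sketch, with hypothetical names and the bucket counts copied from the distribution above:

```python
# Duration buckets and counts as reported (83 videos in total).
BUCKETS = {
    "0-15s": 37,
    "15-30s": 21,
    "30-60s": 13,
    "60-90s": 7,
    "90-120s": 1,
    "120-180s": 4,
}
OVER_CAP_BUCKETS = ("120-180s",)  # buckets that lie entirely above the 120s cap


def shares(buckets: dict) -> dict:
    """Percentage of the sample in each bucket, rounded to one decimal place."""
    total = sum(buckets.values())
    return {label: round(100 * n / total, 1) for label, n in buckets.items()}


total = sum(BUCKETS.values())
pct = shares(BUCKETS)
over_cap_count = sum(BUCKETS[label] for label in OVER_CAP_BUCKETS)
over_cap_pct = round(100 * over_cap_count / total, 1)
```

With only 83 samples, the over-cap share rests on four videos, so it is best treated as a rough estimate.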
Content Analysis

What Users Are Sending Coach Dan

Analysis of 254 real-user uploads (excluding dev account). Multi-label categorization based on Coach Dan's caption text. Shooting form dominates by a wide margin — the product is being used as a shot doctor, not a general game-IQ coach.

Shooting form: 182 (72%)
Dribbling: 144 (57%)
Jump shot: 119 (47%)
Footwork: 51 (20%)
Layup / finish: 49 (19%)
Conditioning: 36 (14%)
Game footage: 36 (14%)
Non-basketball ⚠️: 28 (11%)
3-pointer: 25 (10%)
Free throw: 20 (8%)

11% of all uploads were non-basketball content (animations, off-topic clips, 0-frame captures) that produced nonsensical analyses. ~28 wasted pipeline runs in 4 months. Worth adding an upfront content classifier as a separate quality gate.

Coaching Themes

What Coach Dan Keeps Teaching

Frequency of each coaching theme across analyzed uploads (n=225). Four themes appear in more than 70% of all responses, which is likely a system-prompt artifact worth auditing.

Balance / base: 84%
Explosiveness: 73%
Follow-through: 71%
Consistency: 71%
Elbow alignment: 54%
Knee bend: 53%
Body posture: 50%
Arc / trajectory: 40%
Head / eyes up: 34%

Reach

Languages

Confirmed non-English coaching responses. ~25-30% of uploads come from non-English-speaking users; pipeline correctly responds in the user's language.

🇫🇷 French: ~18% of uploads
🇷🇺 Russian: ~7% of uploads
🇩🇪 German: ~6% of uploads
🇪🇸 Spanish: spot-checks
🇮🇹 Italian: spot-checks
🇹🇷 Turkish: spot-checks
🇵🇹 Portuguese: spot-checks
🇮🇩 Indonesian: spot-checks
🇭🇷 Croatian: spot-checks
🇵🇱 Polish: supported
🇨🇳 Chinese: supported
🇧🇷 Portuguese (BR): supported
🇺🇸 English: ~70-75% of uploads

Today's Deploys

Production Revision History

Cloud Run service coach-dan-video in level-up-basket, region us-central1. Two deploys today after 2.5 months of stale code.

Time (UTC) | Revision | Change | Outcome
Feb 24, 23:11 | coach-dan-video-00009-nft | Original prod (stale 2.5 months) | 49% baseline
May 8, 03:51 | coach-dan-video-00010-hgv | Pose timeout 300→480s + graceful fallback | healthy
May 8, 06:56 | coach-dan-video-00011-s7r | + 120s input truncation (current) | healthy, serving 100%

Live Health

Production Status (right now)

Revision: 00011-s7r
Traffic: 100%
Ready: True
/health: 200 OK
Errors today: 0
Warnings today: 0
Pipeline runs: 2 / 2 ✓
Avg pipeline time: 186s
CPU: 8 vCPU
Memory: 32 GB
Concurrency: 1
Cloud Run timeout: 900s

Roadmap

Done Today & What's Next

Completed today

Diagnosed 49% YTD success rate root cause
Video Intelligence pose detection 300s timeout + zero graceful fallback caused most of the 144 failures.
Found stale source path + corrected memory + deploy skill
Real deployed code lives in projects/LevelUpBasket/scripts/coach_dan_video/cloudrun/, not the prototype dir the skill referenced.
Pose timeout 300→480s + graceful fallback
Deployed staging coach-dan-video-00022-5d2 → production coach-dan-video-00010-hgv at 03:51 UTC.
120s input truncation (server-side, no app changes)
Deployed staging coach-dan-video-00023-vzd → production coach-dan-video-00011-s7r at 06:56 UTC. Verified end-to-end on staging with a 176s test clip.
Restricted unrestricted Maps API key
Locked AIzaSyC…xB5w to Places + Geocoding APIs and IP-pinned to both VMs. Was hardcoded in 3 .NET controllers.
Content analysis: 285 uploads characterized
Skill categories, coaching themes, languages, settings. Surfaced 11% non-basketball quality gap and prompt-artifact patterns.

Open items

Commit code changes + open PR
The inner LevelUpBasket repo has uncommitted changes on master. Without a PR, the next deploy from another machine will revert the fix.
7-day success rate recheck (May 15)
Re-run the Firestore success-rate analysis to validate the ~90%+ projection against real traffic.
Audit Coach Dan system prompt
"Balance" appears in 84%, "explosiveness" in 73% of analyses — likely a templated coaching pattern that overrides actual video content. Reviewable in dbo.coachdanconfigurations.
Add non-basketball content gate
11% of uploads are off-topic (animations, fully unrelated clips). A lightweight upfront classifier would save pipeline runs and improve UX.
Rotate the production Gemini API key
AIzaSyA…Oxrs is hardcoded in Web.config and currently unrestricted. Lock to Gemini API + IP-pin, then rotate.
Async pipeline architecture (the 99% solution)
Decouple video processing from the HTTP request via Cloud Tasks queue. Removes the 900s ceiling entirely. ~1-2 days of work plus mobile coordination.
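As a rough illustration of the queue-based design, the upload endpoint could enqueue a job and return immediately, with a separate worker endpoint running the pipeline. The sketch below assumes the `google-cloud-tasks` client library. The project, queue, worker URL, and payload fields are all hypothetical placeholders and nothing here reflects an existing implementation.

```python
import json


def build_task_payload(upload_id: str, video_uri: str) -> bytes:
    """Serialize the job description the worker endpoint would receive."""
    return json.dumps({"upload_id": upload_id, "video_uri": video_uri}).encode("utf-8")


def enqueue_video_job(project: str, location: str, queue: str,
                      worker_url: str, payload: bytes):
    """Create an HTTP-target task; the upload request can return right after this."""
    # Imported lazily so the payload helper works without the client library installed.
    from google.cloud import tasks_v2

    client = tasks_v2.CloudTasksClient()
    task = tasks_v2.Task(
        http_request=tasks_v2.HttpRequest(
            http_method=tasks_v2.HttpMethod.POST,
            url=worker_url,
            headers={"Content-Type": "application/json"},
            body=payload,
        )
    )
    return client.create_task(
        tasks_v2.CreateTaskRequest(
            parent=client.queue_path(project, location, queue),
            task=task,
        )
    )


# Hypothetical identifiers, for illustration only.
payload = build_task_payload("upload-123", "gs://example-bucket/clip.mp4")
```

The worker's own request timeout and the queue's dispatch deadline would still need to be checked against the worst-case pipeline time, and the mobile app would need a way to learn when the result is ready.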