coach-dan-video-00011-s7r
Coach Dan Video
Pipeline Health & Fix Report
Two production deploys today closed a 2.5-month-old reliability gap that was silently dropping ~50% of user video uploads. This report covers what was happening, what changed, and what users are actually sending to Coach Dan.
Before vs After
Two deploys today: pose-detection timeout fix at 03:51 UTC, then 120s input truncation at 06:56 UTC. Together they remove the architectural failure modes that capped success at ~50%.
Monthly Success Rate (Pre-Fix)
The video pipeline was deployed in late February and degraded over time as load grew. April was the worst month at 41% success, while handling more than double March's upload volume (112 vs 49).
| Month | Uploads | Success | Partial Failures | Silent Failures | Success Rate |
|---|---|---|---|---|---|
| 2026-01 | 11 | 0 | 2 | 9 | 0% |
| 2026-02 | 91 | 52 | 24 | 15 | 57% |
| 2026-03 | 49 | 25 | 18 | 6 | 51% |
| 2026-04 | 112 | 46 | 45 | 21 | 41% |
| 2026-05 (partial) | 22 | 18 | 3 | 1 | 82% |
| YTD Total | 285 | 141 | 92 | 52 | 49% |
January starts at 0% because the pipeline was not fully deployed until February. May's 82% comes from a small partial-month sample (22 uploads) that still includes the pre-fix failures from early in the month, so it is not yet a clean post-fix measurement.
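The success rate is Success / Uploads, and in every row the Success, Partial Failures, and Silent Failures columns add up to Uploads. A short Python check, using only the figures in the table above (the helper names are illustrative), reproduces each monthly rate and the YTD row:

```python
# Monthly counts copied from the table: (uploads, success, partial, silent)
MONTHLY = {
    "2026-01": (11, 0, 2, 9),
    "2026-02": (91, 52, 24, 15),
    "2026-03": (49, 25, 18, 6),
    "2026-04": (112, 46, 45, 21),
    "2026-05": (22, 18, 3, 1),
}


def success_rate(uploads: int, success: int) -> float:
    """Share of uploads counted as Success, as a percentage."""
    return 100.0 * success / uploads if uploads else 0.0


def ytd_totals(rows):
    """Sum each column (uploads, success, partial, silent) across months."""
    return tuple(sum(col) for col in zip(*rows))


for month, (uploads, success, partial, silent) in MONTHLY.items():
    # The three outcome columns should partition each month's uploads.
    assert success + partial + silent == uploads, month
    print(f"{month}: {success_rate(uploads, success):.0f}%")

total = ytd_totals(MONTHLY.values())
print("YTD:", total, f"{success_rate(total[0], total[1]):.0f}%")
```

The partition check matters because the headline ~50% figure counts both partial and silent failures as lost uploads.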
Where Videos Were Failing
Cloud Run pipeline architecture, with the failure point highlighted. Video Intelligence pose detection had a hardcoded 300s timeout — for any video longer than ~1 minute, it routinely exceeded that and crashed the entire pipeline, even when Step 1 (Gemini analysis) had already succeeded.
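The revision history in this report names the fix for this failure as a longer pose timeout (300→480s) plus a graceful fallback. A minimal Python sketch of that pattern follows, assuming hypothetical `analyze` and `detect_poses` callables that stand in for the Gemini and Video Intelligence calls; the service's actual code is not shown in this report.

```python
import concurrent.futures as cf
from typing import Any, Callable, Dict, Optional

# Hypothetical constant; 480 s is the value named for revision 00010-hgv.
POSE_TIMEOUT_S = 480.0


def run_pipeline(
    video_uri: str,
    analyze: Callable[[str], str],
    detect_poses: Callable[[str], Any],
    pose_timeout_s: float = POSE_TIMEOUT_S,
) -> Dict[str, Any]:
    """Run Step 1 (analysis), then pose detection under a time limit.

    A pose-detection timeout or error is recorded instead of raised, so a
    successful Step 1 result is still returned to the user.
    """
    caption = analyze(video_uri)  # Step 1: failures here still propagate
    pool = cf.ThreadPoolExecutor(max_workers=1)
    poses: Optional[Any] = None
    try:
        poses = pool.submit(detect_poses, video_uri).result(timeout=pose_timeout_s)
        status = "ok"
    except cf.TimeoutError:
        status = "timeout"  # graceful fallback: reply without pose data
    except Exception:
        status = "error"  # any other pose failure is also non-fatal
    finally:
        # Stops waiting; it does not kill a worker thread that is still running.
        pool.shutdown(wait=False)
    return {"caption": caption, "poses": poses, "pose_status": status}
```

A Python thread cannot be cancelled once started, so in a real service the deadline would more likely also be passed to the API client itself; the sketch only shows how the pose step stops being fatal to the reply.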
The fix (4 minimal edits)
The truncation uses ffmpeg stream copy (-c copy -t 120), which clips without re-encoding and therefore completes in about 1 second regardless of input length. Coach Dan still receives, analyzes, and replies; long videos are simply trimmed server-side to their first 2 minutes. No mobile app changes were required.
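A sketch of that truncation step in Python, assuming the service shells out to the `ffmpeg` binary via `subprocess`. The function names and file paths are hypothetical, and `-y` (overwrite the output file) is an addition not mentioned in the report:

```python
import shutil
import subprocess
from typing import List

MAX_SECONDS = 120  # input cap introduced in revision 00011-s7r


def truncate_cmd(src: str, dst: str, max_seconds: int = MAX_SECONDS) -> List[str]:
    """Build an ffmpeg command that keeps the first max_seconds without re-encoding."""
    return [
        "ffmpeg", "-y",           # overwrite dst if it already exists
        "-i", src,
        "-c", "copy",             # stream copy: no decode/encode, so it is fast
        "-t", str(max_seconds),   # limit output duration
        dst,
    ]


def truncate(src: str, dst: str, max_seconds: int = MAX_SECONDS) -> None:
    """Run the truncation; raises CalledProcessError if ffmpeg fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(truncate_cmd(src, dst, max_seconds), check=True, capture_output=True)
```

Because stream copy cuts on packet boundaries rather than re-encoding, the trimmed clip may not land on exactly 120.0 s; for a coaching reply that tolerance is harmless, and it is what keeps the step fast.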
How Long Are User Videos?
Distribution of 83 video durations from Cloud Run logs (last 90 days). The truncation threshold was sized off this data — only ~5% of uploads exceed 120s, so the cap is generous for the long tail without affecting the typical user.
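One way to check a cap against a duration sample like this, sketched in Python under the assumption that durations are available as a list of seconds (the report does not say exactly how the 120s threshold was derived, and the function names are hypothetical): `share_over` gives the fraction of videos above a cap, and `cap_covering` returns a nearest-rank percentile.

```python
import math
from typing import Iterable


def share_over(durations_s: Iterable[float], threshold_s: float = 120.0) -> float:
    """Fraction of videos strictly longer than threshold_s seconds."""
    d = list(durations_s)
    if not d:
        return 0.0
    return sum(1 for x in d if x > threshold_s) / len(d)


def cap_covering(durations_s: Iterable[float], coverage: float = 0.95) -> float:
    """Nearest-rank percentile: the smallest observed duration that at least
    `coverage` of the videos do not exceed."""
    d = sorted(durations_s)
    if not d:
        raise ValueError("need at least one duration")
    k = math.ceil(coverage * len(d)) - 1
    return d[max(k, 0)]
```

A cap near the 95th percentile leaves roughly 5% of uploads above it, which is the trade-off described above: the long tail is trimmed while typical uploads pass through untouched.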
What Users Are Sending Coach Dan
Analysis of 254 real-user uploads (excluding dev account). Multi-label categorization based on Coach Dan's caption text. Shooting form dominates by a wide margin — the product is being used as a shot doctor, not a general game-IQ coach.
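The report does not describe how captions were mapped to categories. A minimal keyword-based sketch of multi-label tagging, with hypothetical category names and keyword lists, shows why category counts can add up to more than the number of uploads: one caption can land in several categories.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical categories and keywords, for illustration only.
CATEGORIES: Dict[str, List[str]] = {
    "shooting_form": ["elbow", "release", "follow-through"],
    "dribbling": ["dribble", "crossover", "handle"],
    "footwork": ["footwork", "pivot", "stance"],
}


def tag_caption(caption: str, categories: Dict[str, List[str]] = CATEGORIES) -> Set[str]:
    """Return every category whose keywords appear in the caption.

    Plain substring matching is crude: it ignores word boundaries and negation.
    """
    text = caption.lower()
    return {cat for cat, words in categories.items() if any(w in text for w in words)}


def category_counts(captions: Iterable[str]) -> Counter:
    """Count uploads per category; one upload may add to several categories."""
    counts: Counter = Counter()
    for caption in captions:
        counts.update(tag_caption(caption))
    return counts
```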
What Coach Dan Keeps Teaching
Frequency of each coaching theme across analyzed uploads (n=225). Two themes appear in >70% of all responses — likely a system-prompt artifact worth auditing.
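Assuming each response has already been tagged with a set of themes, a short sketch that computes each theme's share of responses and flags those above the 70% line used in the text (function names hypothetical):

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def theme_shares(responses: Iterable[Set[str]]) -> Dict[str, float]:
    """Share of responses that mention each theme; a response counts once per theme."""
    tagged = [set(r) for r in responses]
    if not tagged:
        return {}
    counts = Counter(theme for themes in tagged for theme in themes)
    return {theme: c / len(tagged) for theme, c in counts.items()}


def overused(shares: Dict[str, float], threshold: float = 0.70) -> List[str]:
    """Themes appearing in more than `threshold` of responses, sorted by name."""
    return sorted(t for t, s in shares.items() if s > threshold)
```

Running this separately per language or per upload category would help tell a system-prompt artifact (high share everywhere) from a theme that genuinely fits what users send.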
Languages
Confirmed non-English coaching responses. Roughly 25-30% of uploads come from non-English-speaking users, and the pipeline correctly responds in each user's language.
Production Revision History
Cloud Run service coach-dan-video in level-up-basket, region us-central1. Two deploys today after 2.5 months of stale code.
| Time (UTC) | Revision | Change | Outcome |
|---|---|---|---|
| Feb 24, 23:11 | coach-dan-video-00009-nft | Original prod (stale 2.5 months) | 49% baseline |
| May 8, 03:51 | coach-dan-video-00010-hgv | Pose timeout 300→480s + graceful fallback | healthy |
| May 8, 06:56 | coach-dan-video-00011-s7r | + 120s input truncation (current) | healthy, serving 100% |
Production Status (right now)
Done Today & What's Next
Completed today
- projects/LevelUpBasket/scripts/coach_dan_video/cloudrun/, not the prototype dir the skill referenced.
- AIzaSyC…xB5w to Places + Geocoding APIs and IP-pinned to both VMs. Was hardcoded in 3 .NET controllers.
Open items
- dbo.coachdanconfigurations.
- AIzaSyA…Oxrs is hardcoded in Web.config and currently unrestricted. Lock to Gemini API + IP-pin, then rotate.