HackStudio Pro is an AI-powered video production studio that turns research into broadcast-ready YouTube documentaries, in two languages, from a single codebase.
Making one high-quality documentary takes weeks. Doing it in two languages doubles the work. Scaling to 100+ videos? Impossible by hand. HackStudio Pro collapses that timeline by treating every part of the pipeline as code.
Scripts are not timelines dragged in Premiere. Content lives in .ts files with full type safety and IDE autocomplete.
Timing is not dragged on a scrubber. Word-level timestamps from MiniMax TTS drive the entire timeline automatically.
Visuals are version-controlled, composable, and reusable. Charts, diagrams, and maps render as JSX over B-roll backgrounds.
Language is a single prop: lang="cn" or lang="en". Same pipeline, same components, same render command.
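A minimal sketch of what content-as-code with a language switch might look like. The `Lang`, `NarrationLine`, `pickLines`, and `sameStructure` names and the sample lines are illustrative assumptions, not the project's actual schema.

```typescript
// Hypothetical content shape; the real content-cn.ts / content-en.ts may differ.
type Lang = "cn" | "en";

interface NarrationLine {
  id: string;   // stable key shared by both languages
  text: string; // spoken narration for this line
}

// One record per language keeps both scripts structurally identical.
const content: Record<Lang, NarrationLine[]> = {
  cn: [{ id: "intro-1", text: "这是一个示例。" }],
  en: [{ id: "intro-1", text: "This is a sample line." }],
};

// Same lookup for either language: the caller only changes `lang`.
function pickLines(lang: Lang): NarrationLine[] {
  return content[lang];
}

// Sanity check that both languages cover the same line ids in the same order.
function sameStructure(a: NarrationLine[], b: NarrationLine[]): boolean {
  return a.length === b.length && a.every((line, i) => line.id === b[i]?.id);
}
```

Because `Lang` is a closed union, a typo such as `pickLines("jp")` fails at compile time rather than at render time.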
Nine phases. Three of them — Concept, Editor Pass, Validation — exist because we learned the hard way that render time is too expensive to waste on avoidable mistakes.
Every video starts with an editorial angle — what's the gap between how Chinese and Western audiences see this story? video-concept.md pins down spine and tone before any researcher runs.
AI agents search 17+ platforms across Chinese and English ecosystems. Every claim gets bilingual triangulation with 3+ independent sources, saved into a dossier of transcripts, facts, perspectives, and visuals.
Research becomes structured TypeScript: narration lines, section titles, chart labels, and verified data points. The script must sound spoken, not written: short sentences, no em dashes.
B-roll comes from official channels only and is analyzed frame by frame with Gemini 3.1 Flash Lite, which is fast, cheap, and strong at Chinese OCR. Each clip gets a .analysis.md saved beside the .mp4.
The video-editor skill scores the script and auto-picks a documentary director persona — Adam Curtis, Errol Morris, or Alex Gibney — emitting role-tagged B-roll with director-voiced rationale.
MiniMax T2A v2 generates voiceover with voice_modify for passionate delivery. Word-level timestamps mean every subtitle highlights in perfect sync with speech.
Each line becomes a typed sequence — video, chart, title, quote, or ending. PartRenderer dispatches to focused renderers. Calm static backgrounds for data; moving B-roll for narrative.
Five pre-render checks: counts consistency, TTS integrity, breathing time, B-roll overlap, text density. Must pass before the expensive render. Catches what humans miss.
One remotion render command per language outputs a broadcast-ready .mp4, one for the Chinese version and one for the English version. Add --gl=angle --concurrency=1 for Mapbox maps.
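The way word-level timestamps can drive both sequence length and subtitle highlighting can be sketched as follows. The `WordTiming` shape and all function names are assumptions for illustration; the actual MiniMax response format and the project's alignment manifest may look different.

```typescript
// Hypothetical shape for a word-level timestamp; the real TTS output may differ.
interface WordTiming {
  word: string;
  startMs: number;
  endMs: number;
}

// Convert milliseconds to a whole frame count at the given frame rate.
function msToFrames(ms: number, fps: number): number {
  return Math.round((ms / 1000) * fps);
}

// Duration of a narration line in frames, taken from its last word's end time.
function lineDurationInFrames(words: WordTiming[], fps: number): number {
  const last = words[words.length - 1];
  if (!last) return 0; // no words, no duration
  return msToFrames(last.endMs, fps);
}

// Index of the word being spoken at `frame`, or -1 during silence.
function activeWordIndex(words: WordTiming[], frame: number, fps: number): number {
  const ms = (frame / fps) * 1000;
  return words.findIndex((w) => ms >= w.startMs && ms < w.endMs);
}
```

A subtitle component could call `activeWordIndex` on every frame and highlight the returned word, so editing the script and regenerating audio is enough to retime the captions.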
A Part used to be "video background with chart overlays". That model broke the moment data got complex — charts fought moving B-roll for attention, and silent title cards dropped the audio. The new model treats each narration line as a typed entry with a kind that picks the right renderer and the right background.
kind: "video". Standard narration. VideoBackground plays a B-roll clip with startFrom trimming. A glassmorphism caption floats on top.
kind: "chart". Data visualizations get a StaticBackground, a tonal gradient that doesn't compete for attention. The breathing-time validator enforces a minimum of 4 seconds.
kind: "title". Part titles are tied to a narration line (typically lineIdx: 0), so audio never drops out. Minimum 2.5s breathing time.
kind: "quote". Typography-first composition on a calm background. Minimum 3.5s breathing time for the line to land.
kind: "ending". Returns to moving B-roll for emotional punctuation. Consumes the final slot in the broll-manifest.
PartRenderer is shared across every video. It routes each SequenceEntry to one of five focused renderers. Data flows in as arguments, with zero video-specific imports in src/shared/.
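The typed-entry model maps naturally onto a TypeScript discriminated union. The sketch below is an assumption about how such a dispatch could be written: only `SequenceEntry`, `kind`, `lineIdx`, and `startFrom` appear in this document; every other field name is hypothetical, and the function returns a label instead of JSX so it stays self-contained.

```typescript
// Field names other than `kind`, `lineIdx`, and `startFrom` are illustrative.
interface VideoEntry { kind: "video"; lineIdx: number; clip: string; startFrom: number }
interface ChartEntry { kind: "chart"; lineIdx: number; chartId: string }
interface TitleEntry { kind: "title"; lineIdx: number; title: string }
interface QuoteEntry { kind: "quote"; lineIdx: number; quote: string }
interface EndingEntry { kind: "ending"; lineIdx: number; clip: string }

type SequenceEntry = VideoEntry | ChartEntry | TitleEntry | QuoteEntry | EndingEntry;

// Stand-in for the renderer dispatch: narrows on `kind` and returns a label.
function dispatch(entry: SequenceEntry): string {
  switch (entry.kind) {
    case "video":
      return `video renderer: ${entry.clip} from frame ${entry.startFrom}`;
    case "chart":
      return `chart renderer: ${entry.chartId}`;
    case "title":
      return `title renderer: ${entry.title}`;
    case "quote":
      return `quote renderer: ${entry.quote}`;
    case "ending":
      return `ending renderer: ${entry.clip}`;
    default: {
      // Compile-time exhaustiveness check: adding a sixth kind fails here.
      const unreachable: never = entry;
      throw new Error(`Unhandled entry: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch is the design point: a new `kind` cannot be added to the union without the compiler demanding a matching renderer.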
Picking B-roll is an editorial decision, not a technical one. The video-editor skill reads the script, scores it against three documentary director profiles, and picks the one whose voice matches the story. The result is B-roll that feels edited, not assembled.
Systems, irony, archive juxtaposition. Picks texture clips that quietly undercut the narration. Works for stories about institutions or ideologies.
Human portraiture, close-ups, interrogative stillness. Favors faces and objects over action. Works for stories about individuals and their contradictions.
Institutional accountability, evidence, tension. Favors documents, newsreels, official footage. Works for stories about power and its consequences.
Each clip is assigned a role (anchor, texture, counterpoint, or transition) with a director-voiced rationale. A validator confirms the proposed distribution matches the chosen director's target role mix. After human review, rename the .proposed.ts file to broll-manifest.ts and move on.
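A role-mix validator of the kind described here could be as small as the sketch below. The four role names come from this document; the `RoleMix` representation, the tolerance parameter, and any target percentages passed in are assumptions, since the real director profiles are not shown.

```typescript
type Role = "anchor" | "texture" | "counterpoint" | "transition";
const ROLES: Role[] = ["anchor", "texture", "counterpoint", "transition"];

// Share of clips per role, each between 0 and 1.
type RoleMix = Record<Role, number>;

// Count each role, then convert the counts to shares of the total.
function roleMix(roles: Role[]): RoleMix {
  const mix: RoleMix = { anchor: 0, texture: 0, counterpoint: 0, transition: 0 };
  for (const role of roles) mix[role] += 1;
  if (roles.length > 0) {
    for (const role of ROLES) mix[role] /= roles.length;
  }
  return mix;
}

// Roles whose actual share deviates from the target by more than `tolerance`.
function mixViolations(actual: RoleMix, target: RoleMix, tolerance: number): Role[] {
  return ROLES.filter((role) => Math.abs(actual[role] - target[role]) > tolerance);
}
```

An empty result from `mixViolations` would mean the proposed manifest is within tolerance of the chosen director's profile.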
Remotion renders are expensive: each Part takes minutes, and every re-render costs real time. The validation harness runs five static checks on the manifests before you ever spin up the encoder. A 🔴 fatal finding blocks the render; a ⚠ informational finding is a warning you can accept.
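The fatal-versus-informational split can be sketched with one example check. The minimum durations (4s for charts, 2.5s for titles, 3.5s for quotes) are the ones quoted in this document; the `Finding` and `Check` types, the function names, and the choice to treat a breathing-time shortfall as fatal are assumptions.

```typescript
type Severity = "fatal" | "info";
interface Finding { check: string; severity: Severity; message: string }
type Check = () => Finding[];

// Minimum on-screen seconds per kind, as quoted in this document.
const MIN_BREATHING_SECONDS: Record<string, number> = { chart: 4, title: 2.5, quote: 3.5 };

interface TimedEntry { kind: string; durationSeconds: number }

// Assumption: a breathing-time shortfall is reported as fatal.
function breathingTimeCheck(entries: TimedEntry[]): Check {
  return () => {
    const findings: Finding[] = [];
    entries.forEach((entry, index) => {
      const min: number | undefined = MIN_BREATHING_SECONDS[entry.kind];
      if (min !== undefined && entry.durationSeconds < min) {
        findings.push({
          check: "breathing-time",
          severity: "fatal",
          message: `entry ${index} (${entry.kind}) lasts ${entry.durationSeconds}s, needs ${min}s`,
        });
      }
    });
    return findings;
  };
}

// Run every check; the render may proceed only when no finding is fatal.
function runChecks(checks: Check[]): { findings: Finding[]; ok: boolean } {
  const findings: Finding[] = [];
  for (const check of checks) findings.push(...check());
  return { findings, ok: findings.every((f) => f.severity !== "fatal") };
}
```

A script built this way could exit with a non-zero status when `ok` is false, which keeps the expensive render from starting.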
Shared rendering infrastructure is reused across all videos. Adding a new video = new folder + data files + one import.
src/
├── shared/                # Reusable across ALL videos
│   ├── components/        # PartRenderer, SubtitleOverlay...
│   ├── lib/               # colors, fonts, timing, audio math
│   └── schemas/           # VideoSchema (lang: cn | en)
└── videos/
    └── xiaomi-su7/        # One folder per video
        ├── index.tsx      # Composition registry
        ├── components/    # Parts + animated overlays
        └── data/          # Scripts, B-roll, audio, charts

public/<slug>/             # Assets namespaced per video
├── audio/{cn,en}/         # TTS .mp3 files
└── videos/                # B-roll .mp4 files
Sequence durations are computed from TTS output. Change the script and timing updates automatically — no manual scrubbing.
All overlays use useTimeScale() so keyframes scale proportionally to the actual sequence duration.
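The proportional scaling behind this can be shown with a plain function. This is not the signature of `useTimeScale()`, which this document does not show; `makeTimeScale` and its parameters are hypothetical and only illustrate the arithmetic.

```typescript
// Keyframes are authored against a reference duration, in frames.
function makeTimeScale(authoredDuration: number, actualDuration: number) {
  if (authoredDuration <= 0) throw new Error("authoredDuration must be positive");
  const factor = actualDuration / authoredDuration;
  // Maps an authored keyframe to the matching frame in the actual sequence.
  return (authoredFrame: number): number => Math.round(authoredFrame * factor);
}
```

For example, an overlay authored for 100 frames and played in a 150-frame sequence would move a keyframe at frame 20 to frame 30, so the animation keeps the same relative position in the line.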
Type safety, imports, and IDE autocomplete for all content, manifests, and chart data.
B-roll validation ensures no two sequences share overlapping time ranges from the same source file.
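An overlap check of this kind reduces to interval intersection per source file. The sketch below assumes a `ClipUse` record with a source path and a start and end in seconds; the project's manifest fields may be named differently.

```typescript
// Hypothetical record for one use of a B-roll source file.
interface ClipUse {
  source: string;
  startSec: number;
  endSec: number; // exclusive: a use ending at 8 does not overlap one starting at 8
}

// Returns index pairs of uses that read overlapping ranges of the same source.
function findOverlaps(uses: ClipUse[]): Array<[number, number]> {
  const overlaps: Array<[number, number]> = [];
  for (let i = 0; i < uses.length; i++) {
    for (let j = i + 1; j < uses.length; j++) {
      const a = uses[i];
      const b = uses[j];
      if (!a || !b) continue;
      // Two half-open intervals intersect when each starts before the other ends.
      if (a.source === b.source && a.startSec < b.endSec && b.startSec < a.endSec) {
        overlaps.push([i, j]);
      }
    }
  }
  return overlaps;
}
```

The pairwise loop is quadratic, which should be acceptable for a manifest with a few dozen clips per video.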
A cinematic visual language inspired by modern data journalism and Xiaomi's product design precision.
Boundaries through tonal shifts, negative space, and radial gradients. Ghost borders at 15% opacity only when required for accessibility.
Floating cards use semi-transparent surfaces with backdrop-blur: 20-40px over video backgrounds.
No standard drop shadows. Instead: 60-80px blur, 6-10% opacity, tinted with surface color — never pure black.
CTAs and data highlights use a warm gradient from #FFB595 to #FF6700 at 135 degrees.
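These rules could be captured as small style helpers. The gradient stops, angle, blur range, and opacity range are taken from the lines above; the constant and function names and the zero shadow offsets are assumptions.

```typescript
// Gradient stops and angle are quoted from the design notes above.
const ACCENT_GRADIENT = "linear-gradient(135deg, #FFB595, #FF6700)";

type Rgb = [number, number, number];

// Builds a soft, tinted box-shadow value. Zero offsets are an assumption.
function tintedShadow(surface: Rgb, blurPx: number, opacity: number): string {
  const [r, g, b] = surface;
  if (r === 0 && g === 0 && b === 0) throw new Error("shadow tint must not be pure black");
  if (blurPx < 60 || blurPx > 80) throw new Error("blur must be 60-80px");
  if (opacity < 0.06 || opacity > 0.1) throw new Error("opacity must be 6-10%");
  return `0 0 ${blurPx}px rgba(${r}, ${g}, ${b}, ${opacity})`;
}
```

The returned strings can be used directly as `background` and `boxShadow` values in an inline style object.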
Charts get a StaticBackground instead of fighting moving B-roll for attention.
Titles are a kind: "title" sequence tied to a narration line. Audio never drops out.
The video-editor skill picks Curtis / Morris / Gibney based on story shape. B-roll feels edited, not assembled.
Overlays use useTimeScale() so keyframes stay proportional.
Open this repo in Claude Code, Cursor, or any IDE with an AI agent. Tell it what video you want to make. It walks the pipeline with you: research, script, B-roll, TTS, render.
Concept: video-concept.md, the story spine, tone, and Part structure. Never start research without this.
Research: transcript.md, facts.md, perspectives.md, visuals.md, triangulated across 3+ sources per claim.
Script: content-cn.ts, content-en.ts, and chart-data.ts with verified values.
B-roll analysis: .analysis.md beside each .mp4 with OCR and entity inference.
Editor Pass: broll-manifest.proposed.ts with role tags and director-voiced rationale per clip. Override auto-pick with --director curtis|morris|gibney.
TTS and composition: runs generate-tts.ts, writes alignment-manifest.ts, and scaffolds each Part following design.md.
Validation: runs validate-video.ts with checks for counts, TTS integrity, breathing time, B-roll overlap, and text density.
Render: .mp4 files in both CN and EN.

# Install dependencies
bun install
# Preview in Remotion Studio
bun run dev
# Generate TTS for a video (MiniMax T2A v2 with word-level timestamps)
bun run scripts/generate-tts.ts --video xiaomi-su7
# Editor Pass: auto-director B-roll tagging
/video-editor --video xiaomi-su7
# Pre-render validation harness: 5 checks must pass
bun run scripts/validate-video.ts --video xiaomi-su7
# Render final video (Chinese + English)
bunx remotion render XiaomiSU7-CN --codec=h264
bunx remotion render XiaomiSU7-EN --codec=h264