Job-Shorts: rendering every chapter of the Book of Job as a 60-second AI video
The Bible is public domain. AI video gen is here. Short-form Bible content has a real audience. I wanted to know if I could turn reading Job into a 42-episode YouTube/TikTok series rendered entirely on my desktop.
I'm calling it Job-Shorts. The pipeline is local-first, commercial-safe, and routes every text-generation call through my Claude Code subscription so the per-chapter cost is the electricity it takes the 3090 to render.
The pipeline
chapter_number + KJV text
│
▼
1. SCRIPT GEN (Claude via subscription, ~10s)
→ 150-word narration (hook / setup / tension / payoff / turn / CTA)
2. BREAKDOWN (Claude, ~15s)
→ 4–6 visual beats + character lock + style lock
3. KEYFRAMES (ComfyUI / Flux Dev / Z-Image)
→ 1 preview image per beat
4. NARRATION (F5-TTS, local)
→ audio + word-level timestamps
5. VIDEO GEN (ComfyUI / LTX 2.3 or HunyuanVideo)
→ each beat at narration-matched length, 2 takes
6. EVALUATOR (Claude vision)
→ picks the best take by scoring rendered frames
7. CAPTIONS (faster-whisper)
→ burned word-by-word captions from narration
8. ASSEMBLE (FFmpeg)
→ concat + narration + music bed (sidechain-ducked) + verse overlays
9. PUBLISHING (Claude)
→ title + description + hashtags + thumbnail
│
▼
output/chapter_N/final.mp4 (1080x1920, vertical, ready to upload)
Estimated end-to-end time on the 3090: roughly 30–60 minutes per chapter, mostly unattended.
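Step 5 sizes each clip to the narration it covers. A minimal sketch of that arithmetic, assuming a flat speaking rate of 150 words per minute (150 words in 60 seconds, matching step 1). The `beat_durations` helper and the 2-second floor are hypothetical; a real run could just as well read durations off the word-level timestamps from step 4.

```python
def beat_durations(beat_texts, wpm=150.0, min_seconds=2.0):
    """Estimate a clip length in seconds for each beat from its word count.

    Hypothetical helper: the 150 WPM default and the 2-second floor are
    assumptions for illustration, not values taken from the project.
    """
    seconds_per_word = 60.0 / wpm
    durations = []
    for text in beat_texts:
        word_count = len(text.split())
        estimate = round(word_count * seconds_per_word, 2)
        # Never request a clip shorter than the floor, even for tiny beats.
        durations.append(max(min_seconds, estimate))
    return durations


if __name__ == "__main__":
    beats = [
        "There was a man in the land of Uz whose name was Job",
        "and that man was perfect and upright",
    ]
    print(beat_durations(beats))
```

Rounding to two decimals keeps the numbers readable in a state file; the video model will usually snap to a whole number of frames anyway.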
The LLM routing trick
There's a job_shorts.llm module that fronts every text call. It picks one of three backends:
| Backend | When picked | Cost |
|---|---|---|
| claude_code | default if claude is on PATH | uses your Claude Code subscription |
| claude_api | only if llm_backend=claude_api | pay-per-token |
| ollama | fallback | free, local |
claude_code works by invoking claude -p as a subprocess and reading from your existing auth. So all script generation, scene breakdown, evaluator scoring, and publishing metadata go through my Max plan — no per-token spend.
That single decision changes the economics. Without it, generating 42 chapters at GPT-4-class quality would mean per-token API charges on every script, breakdown, evaluation, and metadata call. With it, the only marginal cost is the electricity ComfyUI uses to render the video.
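A minimal sketch of how that routing could look, following the three rules in the table. The function names (`pick_backend`, `run_claude_code`) and the `configured` parameter are assumptions for illustration, not the actual contents of job_shorts.llm. The only external facts relied on are that `shutil.which` reports whether a binary is on PATH and that `claude -p` prints a single response and exits.

```python
import shutil
import subprocess


def pick_backend(configured=None, which=shutil.which):
    """Choose a backend name using the rules from the table above.

    `configured` stands in for the llm_backend setting (hypothetical
    parameter name). `which` is injectable so the logic can be tested
    without touching the real PATH.
    """
    if configured == "claude_api":
        return "claude_api"      # explicit opt-in to pay-per-token
    if which("claude") is not None:
        return "claude_code"     # default when the CLI is installed
    return "ollama"              # local fallback


def run_claude_code(prompt, timeout=120):
    """Send one prompt through `claude -p` and return its stdout.

    Relies on the CLI's existing login; raises CalledProcessError on a
    non-zero exit and TimeoutExpired if the call hangs.
    """
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(pick_backend())
```

Passing the prompt as a list element rather than through a shell string avoids quoting problems when the prompt contains scripture with apostrophes or quotation marks.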
Patterns I borrowed (and the ones I had to invent)
Borrowed from other text-to-film projects — these are well-established now:
- Character lock — full physical description injected verbatim into every prompt
- Style lock — 20–40 word visual style string locked across all prompts
- World reconstruction — every prompt fully self-contained, no inter-clip memory assumed
- Storyboard-before-video — cheap keyframe preview before expensive video gen
- Multiple takes + AI evaluator — generate 2–3 takes, LLM scores PASS/FAIL
- Duration calc from word count — narration WPM dictates clip length
- Resume-from-crash state file — JSON state after every step
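The first three patterns in the list come down to string assembly: every prompt carries the full character and style text, so no clip depends on another. A minimal sketch, assuming a hypothetical `Locks` container and `build_prompt` helper; the lock strings below are placeholders, not the project's real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locks:
    """Series-wide strings that must appear unchanged in every prompt."""
    character: str
    style: str


def build_prompt(beat_action, locks):
    """Return a fully self-contained prompt for one beat.

    The lock strings are inserted verbatim (no trimming or rewording),
    so the video model sees identical character and style text each time.
    """
    return " ".join([locks.character, beat_action.strip(), locks.style])


if __name__ == "__main__":
    # Placeholder text for illustration only.
    locks = Locks(
        character="An elderly man with a long grey beard and a plain wool robe.",
        style="Muted earth tones, soft directional light, painterly texture.",
    )
    for action in ["He sits among ashes.", "Three friends approach from afar."]:
        print(build_prompt(action, locks))
```

Freezing the dataclass is a small guard against a later step mutating the locks halfway through a batch.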
What I had to add for Bible content specifically:
- Whisper caption timing — accurate word-level burned captions for muted viewing (this is non-negotiable on Shorts/TikTok)
- Series-wide consistency — series.json keeps Job + style identical across all 42 episodes
- KJV auto-fetch — pulls public-domain Bible text from bible-api.com so I never type a verse
- Verse chyron overlay — quoted scripture appears on screen with proper formatting
- Music bed with sidechain ducking — auto-select + duck under narration
- Batch mode — process N chapters overnight unattended
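For the caption item in the list, word-by-word captions are mostly a formatting job once word timestamps exist. A minimal sketch that turns `(text, start, end)` tuples into one-word-per-cue SRT. The helper names are hypothetical, and the sketch takes plain tuples so it runs without a model; the comment shows how such tuples can be collected from faster-whisper with `word_timestamps=True`.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words):
    """Build SRT text with one cue per word.

    `words` is an iterable of (text, start_seconds, end_seconds).
    With faster-whisper, such tuples can be gathered roughly like this:

        segments, _ = WhisperModel("small").transcribe(
            "narration.wav", word_timestamps=True)
        words = [(w.word, w.start, w.end)
                 for seg in segments for w in seg.words]
    """
    blocks = []
    for index, (text, start, end) in enumerate(words, start=1):
        blocks.append(
            f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(words_to_srt([(" There", 0.0, 0.32), (" was", 0.32, 0.5)]))
```

The resulting file can be burned in at the assemble step with FFmpeg's subtitles filter; styling (font, size, position) is left out here.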
The CLI surface
# See your hardware tier and recommended models
python -m job_shorts.cli info
# Verify the LLM backend works
python -m job_shorts.cli test-llm
# Just write the script — fast, free, no rendering
python -m job_shorts.cli script 1
# Generate one chapter end-to-end (supervised)
python -m job_shorts.cli chapter 1
# Batch chapters 1 through 10
python -m job_shorts.cli batch 1-10
# Fully autonomous overnight: auto-launch services, vision-evaluator, no gates
python -m job_shorts.cli auto-batch 1-42
# Resume a crashed run
python -m job_shorts.cli resume output/chapter_03
The fully autonomous mode is the one I actually use: start it and walk away. The rig knows how to relaunch ComfyUI or Ollama if either dies, and it writes a JSON state file after every step, so a power blip doesn't cost a chapter.
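The state-file idea can be sketched in a few lines. This is an illustration under assumptions, not the project's actual module: the step names, the `completed` key, and the `run_chapter` signature are all hypothetical. The one deliberate design choice is writing to a temp file and swapping it in with `os.replace`, so a crash mid-write cannot leave a half-written state file.

```python
import json
import os
import tempfile

# Hypothetical step names mirroring the nine pipeline stages.
STEPS = ["script", "breakdown", "keyframes", "narration", "video",
         "evaluate", "captions", "assemble", "publish"]


def load_state(path):
    """Read the state file, or start fresh if it does not exist yet."""
    if not os.path.exists(path):
        return {"completed": []}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_state(path, state):
    """Write the state atomically: temp file first, then os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    os.replace(tmp_path, path)


def run_chapter(path, handlers):
    """Run each step once, skipping any step already recorded as done."""
    state = load_state(path)
    for step in STEPS:
        if step in state["completed"]:
            continue
        handlers[step]()                 # may raise; state stays at last good step
        state["completed"].append(step)
        save_state(path, state)          # checkpoint after every step
    return state
```

Calling `run_chapter` again on the same path after a failure picks up at the first step that is not in `completed`, which is all a resume command needs.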
Where it's at
Phase 0 (23 modules) is complete. Phase 1 — first end-to-end render — is the next step. The interesting open questions are around the evaluator: how do you teach a vision model to spot when LTX has slipped into uncanny-valley territory before you commit to a take? The current heuristic is naive (luminance variance + character consistency check). I think there's a smarter version that compares each frame back to the keyframe storyboard.
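To make the two ideas in that paragraph concrete, here is a rough sketch of a luminance-variance score and a frame-versus-keyframe comparison. It is one possible reading, not the current evaluator: frames are plain nested lists of RGB tuples, the function names are hypothetical, and any pass/fail threshold would have to be tuned on real renders. The luma weights are the standard Rec. 709 coefficients.

```python
from statistics import fmean, pvariance


def luma(pixel):
    """Rec. 709 luma of one (r, g, b) pixel with 0-255 channels."""
    r, g, b = pixel
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def luma_values(frame):
    """Flatten a frame (rows of RGB tuples) into a list of luma values."""
    return [luma(pixel) for row in frame for pixel in row]


def luma_variance(frame):
    """Population variance of luma; near zero means a flat, blank frame."""
    return pvariance(luma_values(frame))


def keyframe_drift(frame, keyframe):
    """Mean absolute luma difference between a frame and its keyframe.

    Both images must have the same dimensions. Larger values mean the
    rendered frame has moved further from the storyboard image.
    """
    a = luma_values(frame)
    b = luma_values(keyframe)
    if len(a) != len(b):
        raise ValueError("frame and keyframe must have the same dimensions")
    return fmean([abs(x - y) for x, y in zip(a, b)])


if __name__ == "__main__":
    grey = [[(128, 128, 128)] * 4 for _ in range(4)]
    half = [[(0, 0, 0), (255, 255, 255)] * 2 for _ in range(4)]
    print(luma_variance(grey), luma_variance(half), keyframe_drift(half, grey))
```

A per-pixel luma difference is crude, since legitimate motion also raises it. It is only meant to show where a keyframe comparison would plug in; a perceptual or embedding-based distance would likely be a better fit for catching uncanny output.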
If you want to read the actual code or watch progress, it lives on my GitHub. The whole point of this project is that anyone with a 12 GB+ GPU should be able to fork it and render their own Bible (or any other public-domain text). The pipeline doesn't care that it's Job.
