---
name: youtube-kit
description: Full YouTube launch package — thumbnails, titles, descriptions — for Every's channel. Extracts video frames, composites text via PIL (never AI-regenerated faces), pulls recent channel descriptions via YouTube API to match Every's voice. Use when publishing a YouTube video, creating thumbnails, or writing video copy.
---

# YouTube Kit

Full launch package for Every's YouTube channel: thumbnail + title + description grounded in real channel data.

## When to Use

- "I need a thumbnail for this video"
- "Help me with YouTube copy for this video"
- "We're publishing a video, need the full package"
- Any video going on the Every YouTube channel

## Quick Start

User provides: a video file (or transcript), optional reference screenshots, and direction on the hook/angle.

```
1. Pull channel style → YouTube API recent descriptions + titles
2. Read transcript → extract the core hook and structure
3. Extract candidate frames → ffmpeg, show user 6-8 options
4. User picks a frame → composite text with PIL
5. Generate title + description → matched to channel voice
6. Deliver → files to ~/Downloads, description to clipboard
```

## Setup

Requires these env vars in `.env.local`:
- `YOUTUBE_API_KEY` — YouTube Data API v3
- `GEMINI_API_KEY` — Only for background extension (never faces)

Every's channel ID: `UCjIMtrzxYc0lblGhmOgC_CA`

Fonts (must be installed locally):
- **Headline:** `TT Interphases Pro Trial Black.ttf` (or Condensed variant)
- **Serif subtext:** `Lora-VariableFont_wght.ttf`
- **Eyebrow/label:** `Switzer Variable.ttf`
- **Fallbacks:** Arial Black, Impact, Futura (system fonts)

Font directory: `~/Library/Fonts/`
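
The fallback order above can be applied at load time. A minimal sketch, assuming Pillow is installed; `load_font` is a hypothetical helper, and the fallback file names (`Arial Black.ttf`, `Impact.ttf`) are assumptions about how the system fonts are named on disk:

```python
import os
from PIL import ImageFont

FONTS = os.path.expanduser("~/Library/Fonts")

def load_font(candidates, size):
    """Return the first font in `candidates` that loads, else Pillow's built-in default."""
    for name in candidates:
        # Bare file names are looked up in the font directory; absolute paths are used as-is.
        path = name if os.path.isabs(name) else os.path.join(FONTS, name)
        try:
            return ImageFont.truetype(path, size)
        except OSError:
            continue  # missing or unreadable file: try the next candidate
    # Last resort so the pipeline still runs; the built-in font is not on-brand.
    return ImageFont.load_default()

# Fallback file names are assumptions; system fonts may be named differently.
f_headline = load_font(
    ["TT Interphases Pro Trial Black.ttf", "Arial Black.ttf", "Impact.ttf"],
    size=int(2160 * 0.18),
)
```

If the built-in default is ever returned, treat it as a signal to install the fonts rather than ship the thumbnail.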

## Step 1: Pull Channel Style

Fetch the 20 most recent uploads, then keep the 5 most recent long-form videos (> 3 min) to study description patterns:

```bash
source .env.local
# Recent videos
curl -s "https://www.googleapis.com/youtube/v3/search?part=snippet&channelId=UCjIMtrzxYc0lblGhmOgC_CA&order=date&maxResults=20&type=video&key=${YOUTUBE_API_KEY}" -o /tmp/yt_recent.json

# Collect the video IDs from /tmp/yt_recent.json, then fetch full details
# (contentDetails.duration is ISO 8601, e.g. PT4M13S). Keep only videos > 3 min (not Shorts).
curl -s "https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails,statistics&id=ID1,ID2,...&key=${YOUTUBE_API_KEY}"
```
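
The duration filter can be sketched in Python once both responses are parsed into dicts. This is an illustration, not the skill's actual code: the helper names are hypothetical, and the field paths (`items[].id.videoId`, `contentDetails.duration`, `snippet.publishedAt`) are the ones the two endpoints return.

```python
import re

# ISO 8601 durations as returned in contentDetails.duration, e.g. "PT1H2M3S".
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")

def duration_seconds(iso):
    """Convert an ISO 8601 duration such as 'PT4M13S' to whole seconds."""
    match = _DURATION.match(iso)
    if not match:
        raise ValueError(f"unrecognized duration: {iso!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds

def search_video_ids(search_response):
    """Pull video IDs out of a search.list response."""
    return [item["id"]["videoId"] for item in search_response.get("items", [])]

def recent_long_form(videos_response, min_seconds=180, limit=5):
    """Keep the newest `limit` videos longer than `min_seconds`."""
    long_form = [
        item for item in videos_response.get("items", [])
        if duration_seconds(item["contentDetails"]["duration"]) > min_seconds
    ]
    # publishedAt is an ISO 8601 UTC timestamp, so string order matches time order.
    long_form.sort(key=lambda item: item["snippet"]["publishedAt"], reverse=True)
    return long_form[:limit]
```

Join the result of `search_video_ids` with commas to build the `id=` parameter of the second request.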

**What to extract:**
- Description structure (opening hook → argument → CTA → timestamps)
- Title patterns (length, keywords, voice)
- View counts (what's working recently)

**Every's description templates (from channel analysis):**

**Dan solo monologue (< 10 min):**
- Declarative opening echoing the video's first line
- Third-person Dan framing ("Every CEO Dan Shipper...")
- Core argument as a numbered list
- "Plus:" teaser for additional value
- Single Subscribe CTA: "Every is the only subscription you need to stay at the edge of AI. Subscribe today: https://every.to/subscribe"
- Product mention if relevant (Proof, Spiral, etc.)
- Narrative chapter titles (not topic labels)

**Long interview / panel (> 20 min):**
- Multi-paragraph summary with all speakers named
- "If you found this episode interesting, please like, subscribe, comment, and share!"
- "To hear more from Dan Shipper:" block with Subscribe + X links
- Detailed timestamps with full sentences

## Step 2: Read Transcript

If the user provides a transcript file, read it. If the video is already on YouTube, fetch its transcript by video ID or URL:

```bash
# If youtube-transcript-api is available
python montaigne/fetch_transcript.py VIDEO_ID_OR_URL

# Otherwise ask user to provide transcript or paste key points
```

From the transcript, identify:
- The core hook (first 30 seconds)
- The main framework/argument
- Key quotable lines (for thumbnail text)
- Chapter markers (timestamp + topic shifts)
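
A rough sketch of pulling the hook and short quotable lines out of a timed transcript. It assumes segments shaped like `{"text": ..., "start": seconds}`; that shape and both helper names are assumptions, and the word-count cutoff is only a heuristic for thumbnail-length phrases.

```python
def hook_text(segments, window=30.0):
    """Join the transcript text spoken in the first `window` seconds."""
    spoken = [seg["text"].strip() for seg in segments if seg["start"] < window]
    return " ".join(part for part in spoken if part)

def quotable_lines(segments, max_words=5):
    """Verbatim segments short enough to fit on a thumbnail."""
    lines = [seg["text"].strip() for seg in segments]
    return [line for line in lines if line and len(line.split()) <= max_words]
```

Both helpers return text verbatim from the transcript, which keeps thumbnail copy consistent with the rule against fabricated quotes.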

## Step 3: Extract Candidate Frames

```bash
VIDEO="/path/to/video.mp4"
mkdir -p /tmp/yt-thumb-frames

# Extract 16 candidate frames from the first 6 minutes (skips the first 5s; drop any timestamp within 30s of the end)
for t in 8 12 17 25 40 60 90 120 150 180 210 240 270 300 330 360; do
  ffmpeg -y -ss $t -i "$VIDEO" -frames:v 1 -update 1 -q:v 1 \
    "/tmp/yt-thumb-frames/frame_${t}s.jpg" 2>/dev/null
done
```
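
The timestamp list above is fixed, so it covers only the first six minutes. One way to spread frames across the whole video is to derive the list from the duration. A sketch, assuming `ffprobe` is on the PATH; both function names are hypothetical:

```python
import subprocess

def probe_duration(video_path):
    """Ask ffprobe for the container duration in seconds."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", video_path],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())

def candidate_timestamps(duration, count=16, skip_start=5.0, skip_end=30.0):
    """Evenly spaced timestamps between skip_start and duration - skip_end."""
    start, end = skip_start, duration - skip_end
    if end <= start or count < 2:
        # Very short video (or a single frame requested): use the midpoint.
        return [round(duration / 2, 1)]
    step = (end - start) / (count - 1)
    return [round(start + i * step, 1) for i in range(count)]
```

Feed each value to `ffmpeg -ss` in place of the hard-coded list.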

Show the user 6-8 of the best candidates. **Pick frames where:**
- Eyes are on camera (direct eye contact > looking away)
- Mouth is natural (not mid-word with weird shapes)
- Expression matches the hook (confident, surprised, engaged)
- Hands are visible if gesturing (adds energy)
- Lighting is flattering

**CRITICAL: Skip frames that are screen recordings, slides, or B-roll.** Only show frames with the speaker's face.
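
To show the shortlisted candidates side by side, a contact sheet is one option. A sketch using Pillow; `contact_sheet` and its default sizes are assumptions, not part of the skill:

```python
from PIL import Image

def contact_sheet(frames, columns=4, thumb_width=480, gap=12):
    """Tile PIL images into one grid image, preserving each frame's aspect ratio."""
    if not frames:
        raise ValueError("need at least one frame")
    thumbs = []
    for frame in frames:
        height = round(frame.height * thumb_width / frame.width)
        thumbs.append(frame.resize((thumb_width, height), Image.LANCZOS))
    rows = (len(thumbs) + columns - 1) // columns
    cell_h = max(thumb.height for thumb in thumbs)
    sheet = Image.new(
        "RGB",
        (columns * thumb_width + (columns + 1) * gap, rows * cell_h + (rows + 1) * gap),
        (255, 255, 255),
    )
    for i, thumb in enumerate(thumbs):
        col, row = i % columns, i // columns
        sheet.paste(thumb, (gap + col * (thumb_width + gap), gap + row * (cell_h + gap)))
    return sheet
```

Open the 6-8 chosen frames with `Image.open`, pass them in as a list, and save the result next to the individual frames.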

## Step 4: Composite Thumbnail

### The Golden Rule

**NEVER let Gemini (or any AI image model) regenerate a person's face.** AI models subtly distort facial features — warped smiles, uncanny eyes, wrong proportions. The result always looks "off" even if you can't pinpoint why.

**Use PIL/Pillow for ALL text compositing.** The video frame stays pixel-identical to the source.

### When to Use Gemini

ONLY for extending/outpainting backgrounds where no face is present:
- Extending a wall or backdrop to create more text space
- Filling in a blank region with matched background texture

Never pass a region containing a face to Gemini for editing.
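
A small guard can enforce this before any crop is sent out. The sketch below assumes face boxes are supplied as `(left, top, right, bottom)` tuples, marked by hand or by a detector of your choice; both helpers are hypothetical.

```python
def boxes_overlap(a, b):
    """True if two (left, top, right, bottom) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def assert_face_free(region, face_boxes):
    """Raise before a region that touches any face box is sent for AI editing."""
    for face in face_boxes:
        if boxes_overlap(region, face):
            raise ValueError(f"region {region} overlaps face box {face}; do not send it")
    return region
```

Call `assert_face_free` on the exact crop box before handing the cropped pixels to Gemini.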

### Thumbnail Composition

```python
from PIL import Image, ImageDraw, ImageFont
import os

FONTS = os.path.expanduser("~/Library/Fonts")

# Load and upscale 2x so text is rendered at high resolution, then downsampled to 1920x1080 on save
hero = Image.open("/tmp/yt-thumb-frames/frame_17s.jpg").convert("RGB")
hero = hero.resize((hero.width * 2, hero.height * 2), Image.LANCZOS)
W, H = hero.size

# Load fonts
f_headline = ImageFont.truetype(os.path.join(FONTS, "TT Interphases Pro Trial Black.ttf"), int(H * 0.18))
f_serif = ImageFont.truetype(os.path.join(FONTS, "Lora-VariableFont_wght.ttf"), int(H * 0.04))

# Draw text
draw = ImageDraw.Draw(hero)
draw.text((int(W * 0.03), int(H * 0.15)), "HEADLINE", font=f_headline, fill=(17, 17, 17))

# Save at 1920x1080
hero.resize((1920, 1080), Image.LANCZOS).save("thumb.jpg", quality=94)
```
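
The final `resize((1920, 1080))` assumes a 16:9 source; any other aspect ratio would be stretched. A sketch of a centered 16:9 crop to apply first (`crop_box_16x9` is a hypothetical helper):

```python
def crop_box_16x9(width, height):
    """Largest centered 16:9 box inside a frame, as (left, top, right, bottom)."""
    if width * 9 > height * 16:
        # Too wide: trim the sides.
        new_w = height * 16 // 9
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Too tall, or already 16:9: trim top and bottom.
    new_h = width * 9 // 16
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)
```

Usage: `hero = hero.crop(crop_box_16x9(*hero.size))` before drawing text. A centered crop can clip an off-center speaker, so check the result by eye.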

### Layout Options

**Text-over-wall:** If the speaker is center-right and there's a light wall on the left, place text directly over the wall. Works when background is light and uniform.

**Parchment/sticker label:** Render text on a cream rounded rectangle, rotate slightly (-3° to -5°), composite over the frame. Good for busy backgrounds (B-roll, themed graphics).

**Bottom band:** Solid cream band across the bottom 20% of the frame. Speaker stays fully visible above. Safe fallback.

**Background extension:** If the speaker is too centered, use Gemini to outpaint more background to the LEFT only. Then composite text on the extended area with PIL.

### Text Styling

- **Headlines:** ALL CAPS, heavy black sans-serif, near-black fill (#111)
- **Subtext:** Sentence case, thin serif (Lora), dark charcoal (#333)
- **Eyebrow:** ALL CAPS, letter-spaced, muted red (#B9361F) or charcoal
- **NO drop shadows, NO outlines, NO gradients on text** (editorial, not MrBeast)
- Exception: white text with heavy black stroke works on very busy/dark backgrounds
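
The hex values above map to Pillow fill tuples as sketched below. Pillow has no letter-spacing option, so the eyebrow is drawn one character at a time; `draw_tracked` and its pixel spacing are assumptions for illustration.

```python
from PIL import Image, ImageDraw, ImageFont

def hex_to_rgb(code):
    """Convert '#111' or '#B9361F' to an (r, g, b) tuple."""
    digits = code.lstrip("#")
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)  # expand shorthand: 111 -> 111111
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

HEADLINE_FILL = hex_to_rgb("#111")
SUBTEXT_FILL = hex_to_rgb("#333")
EYEBROW_FILL = hex_to_rgb("#B9361F")

def draw_tracked(draw, xy, text, font, fill, tracking_px=6):
    """Draw text character by character with extra spacing; returns the end x position."""
    x, y = xy
    for ch in text:
        draw.text((x, y), ch, font=font, fill=fill)
        x += draw.textlength(ch, font=font) + tracking_px
    # Drop the spacing added after the final character.
    return x - tracking_px if text else x
```

Drawing per character gives up kerning, which is usually acceptable for spaced-out capitals. For the busy-background exception, `draw.text` also accepts `stroke_width` and `stroke_fill`.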

### Iteration

Generate 2-3 layout variants per round. Show the user. Expect 1-3 rounds to nail it. Common feedback:
- "Text too small" → bump font size 20-30%
- "Covers his face" → shift text or use background extension
- "Looks generic" → add tilt, eyebrow label, or themed element
- "Expression is weird" → swap to a different frame, don't try to fix with AI

## Step 5: Title + Description

### Titles

Generate 5 options. Principles:
- Under 60 characters (mobile display)
- Lead with the outcome or transformation, not the topic name
- Create a curiosity gap — promise something the viewer can't guess
- **Thumbnail = hook, Title = payoff** (don't repeat the thumbnail text)
- Avoid jargon only insiders know
- Every's audience: AI-curious professionals, not pure developers
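
The two mechanical rules (length and no thumbnail repeat) can be checked automatically; the rest need judgment. A sketch with a hypothetical `title_issues` helper:

```python
import re

def _words(text):
    """Lowercase words with punctuation removed, for loose comparison."""
    return re.findall(r"[a-z0-9']+", text.lower())

def title_issues(title, thumbnail_text="", max_chars=60):
    """Return a list of problems with a candidate title (empty list means it passes)."""
    issues = []
    if len(title) >= max_chars:
        issues.append(f"{len(title)} characters; keep it under {max_chars}")
    thumb = " ".join(_words(thumbnail_text))
    # Pad with spaces so only whole-word matches count as a repeat.
    if thumb and f" {thumb} " in f" {' '.join(_words(title))} ":
        issues.append("repeats the thumbnail text")
    return issues
```

Run it over all 5 options and present only the ones that come back clean.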

### Description

Match the template from Step 1 based on video format. Always include:
- Opening that echoes the video's first spoken line
- Third-person framing for Dan (or speaker)
- Subscribe CTA: `Every is the only subscription you need to stay at the edge of AI. Subscribe today: https://every.to/subscribe`
- Narrative chapter timestamps (derived from transcript)
- Product links if mentioned in video

Copy the final description to clipboard with `pbcopy`.
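
A sketch of formatting chapter timestamps and copying the result. The `0:00 Title` line format is an assumption (match whatever Step 1 finds on the channel), and the helper names are hypothetical. YouTube's chapter feature expects the first timestamp to be 0:00, which the code checks.

```python
import subprocess

def format_timestamp(seconds):
    """Format seconds as M:SS, or H:MM:SS once the video passes an hour."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"

def format_chapters(chapters):
    """Turn [(start_seconds, title), ...] into description lines, one chapter per line."""
    ordered = sorted(chapters)
    if not ordered or ordered[0][0] != 0:
        raise ValueError("the first chapter must start at 0:00")
    return "\n".join(f"{format_timestamp(start)} {title}" for start, title in ordered)

def copy_to_clipboard(text):
    """Pipe text to pbcopy (macOS only)."""
    subprocess.run(["pbcopy"], input=text.encode("utf-8"), check=True)
```

Append the output of `format_chapters` to the description body, then pass the full text to `copy_to_clipboard`.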

## Step 6: Deliver

Save all thumbnail options to `~/Downloads/{video-slug}-thumb-options/`.
Copy description to clipboard.
Present title recommendations with rationale.
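
A sketch of building the output folder from the video title. The slug rules (lowercase, hyphens, apostrophes dropped) are an assumption, and both helpers are hypothetical:

```python
import re
from pathlib import Path

def slugify(title):
    """Lowercase, hyphen-separated slug: 'I'm a Hoarder!' -> 'im-a-hoarder'."""
    cleaned = re.sub(r"['’]", "", title.lower())  # drop apostrophes
    return re.sub(r"[^a-z0-9]+", "-", cleaned).strip("-")

def thumb_options_dir(title, downloads=None):
    """Build {downloads}/{video-slug}-thumb-options/ and create it if needed."""
    base = Path(downloads) if downloads else Path.home() / "Downloads"
    out = base / f"{slugify(title)}-thumb-options"
    out.mkdir(parents=True, exist_ok=True)
    return out
```

Save each variant into the returned path, for example `hero.save(out / "option-1.jpg", quality=94)`.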

## Guidelines

- Always read the transcript before generating anything
- Always pull recent channel data — styles evolve
- Never fabricate quotes or stats not in the transcript
- Default the thumbnail headline to "I'M A/AN [CONCEPT]" for Dan's confession-style videos
- Expect multiple thumbnail rounds (Step 4 estimates 1-3); that's normal, not failure
- For themed backgrounds (pirate flag, etc.), use the parchment-label approach — it reads cleanly over busy visuals