How a paste becomes a podcast.
flexVox is structured around three tabs — Script, Production, Export. Each step builds on the last, but every tab is always accessible. You can jump back to the script while audio is generating; the app remembers where you were.
Paste your script. Confirm the lines we weren't sure about.
The editor uses a serif typeface and generous spacing — a calm, literary writing feel. Type or paste any dialogue script. The parser detects speakers across multiple formats:
- HOST: Welcome to the show.
- [Host] Welcome to the show.
- (Host) Welcome to the show.
- A standalone name on its own line, dialogue beneath.
Every detected attribution gets a confidence score. Low-confidence lines are highlighted; the rest is already done. Batch-assign unreviewed turns or merge duplicate speakers with one tap.
ALEX: You hear that?
ALEX: The whole city is on a delay.
JORDAN: [whispers] Three seconds behind the lightning.
[SFX: distant thunder (3s) @underlay volume=0.4]
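The speaker formats listed above, plus the SFX tag in the sample, can be sketched as a small line classifier. This is a minimal illustration and not flexVox's actual parser: the regular expressions, the `Turn` type, the confidence numbers, and the review threshold are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Confidence values are illustrative assumptions, not flexVox's real scores.
SPEAKER_PATTERNS = [
    (re.compile(r"^([A-Za-z][\w .'-]{0,30}):\s*(.+)$"), 0.95),  # HOST: text
    (re.compile(r"^\[([^\]:]+)\]\s*(.+)$"), 0.60),              # [Host] text
    (re.compile(r"^\(([^)]+)\)\s*(.+)$"), 0.60),                # (Host) text
]
SFX_TAG = re.compile(r"^\[SFX:\s*(.+)\]$")
NAME_ONLY = re.compile(r"^[A-Za-z][A-Za-z '-]{0,30}$")  # short, no end punctuation


@dataclass
class Turn:
    kind: str                # "speech" or "sfx"
    speaker: Optional[str]
    text: str
    confidence: float


def match_speaker(line: str) -> Optional[Turn]:
    """Try each inline attribution format in order of reliability."""
    for pattern, confidence in SPEAKER_PATTERNS:
        m = pattern.match(line)
        if m:
            return Turn("speech", m.group(1).strip(), m.group(2).strip(), confidence)
    return None


def parse_script(script: str) -> List[Turn]:
    turns: List[Turn] = []
    pending_name: Optional[str] = None  # standalone name awaiting its dialogue
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            pending_name = None
            continue
        sfx = SFX_TAG.match(line)
        if sfx:
            turns.append(Turn("sfx", None, sfx.group(1).strip(), 1.0))
            pending_name = None
            continue
        turn = match_speaker(line)
        if turn:
            pending_name = None
        elif pending_name is not None:
            # Dialogue beneath a standalone name: plausible, but worth a look.
            turn = Turn("speech", pending_name, line, 0.50)
        elif NAME_ONLY.match(line):
            pending_name = line
            continue
        else:
            turn = Turn("speech", None, line, 0.0)  # no attribution found
        turns.append(turn)
    return turns


def needs_review(turns: List[Turn], threshold: float = 0.8) -> List[Turn]:
    """Speech turns whose attribution falls below the (assumed) threshold."""
    return [t for t in turns if t.kind == "speech" and t.confidence < threshold]
```

The bracket and parenthesis formats score lower here because a line such as `[whispers] Three seconds behind the lightning.` looks exactly like `[Host] Welcome to the show.` when it has no `NAME:` prefix. That ambiguity is the kind of case a confidence score and a review pass are meant to catch.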
Give every character a distinct voice.
Open the Voice Library to search the full ElevenLabs catalog, preview voices instantly, and assign them. While a casting session is open, the library auto-advances to the next unvoiced speaker as you go, and a counter ("3 of 5 cast") sits at the top.
Or tap Auto-Cast All and let flexVox pick distinct voices for your whole cast based on each speaker's role in the script — manually-assigned voices are preserved.
Open Fine-tune Voice per speaker to dial in stability, similarity, style, and speed. The fine-tune section is collapsed by default so the main view stays clean.
Generate dialogue, SFX, and music — in one pass.
The Ready screen is one summary: turn count, speaker count, a background music toggle, and a generation mode picker. Nothing hidden behind disclosure groups. Tap Generate.
flexVox calls ElevenLabs for every line — speech with bidirectional voice context for natural continuity, SFX from your tags, music from your prompts. Background music (when enabled) generates in a second pass sized to your finished episode duration.
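One way to picture "bidirectional voice context" is that each line is sent along with the text spoken just before and just after it. The sketch below only builds those payloads and makes no network calls. The function name, the `window` parameter, and the dictionary keys are illustrative assumptions; this page does not say how flexVox actually passes context to ElevenLabs.

```python
from typing import Dict, List, Optional


def with_context(lines: List[str], window: int = 1) -> List[Dict[str, Optional[str]]]:
    """Pair every line with up to `window` neighbouring lines on each side."""
    requests = []
    for i, text in enumerate(lines):
        before = lines[max(0, i - window):i]
        after = lines[i + 1:i + 1 + window]
        requests.append({
            "text": text,
            # Keys are hypothetical; None marks the start or end of the script.
            "previous_text": " ".join(before) or None,
            "next_text": " ".join(after) or None,
        })
    return requests
```

Because the context comes from the script rather than from audio that has already been generated, regenerating a single line later can reuse the same neighbouring text.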
When something fails, generation doesn't stop. Failed turns get a badge and an error detail; network errors and rate limits are retried automatically. Cancel anytime — your already-generated audio is preserved.
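The failure handling described above (retry transient errors, record the rest, keep going) can be sketched as a loop like the following. It is a simplified illustration under stated assumptions: `TransientError`, `generate_all`, the attempt count, and the backoff delays are hypothetical, and `synthesize` stands in for whatever call produces audio for one turn.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for network failures and rate limits."""


def generate_all(turns, synthesize, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Generate every turn; one failure never stops the rest of the run."""
    results = {}   # turn index -> generated audio
    failures = {}  # turn index -> error detail shown on the failed turn
    for index, turn in enumerate(turns):
        for attempt in range(1, max_attempts + 1):
            try:
                results[index] = synthesize(turn)
                break
            except TransientError as err:
                if attempt == max_attempts:
                    failures[index] = str(err)
                else:
                    # Exponential backoff: 0.5s, 1s, 2s, ... (assumed values)
                    sleep(base_delay * 2 ** (attempt - 1))
            except Exception as err:
                # Non-transient errors are recorded immediately, not retried.
                failures[index] = str(err)
                break
    return results, failures
```

Results are stored per turn as soon as they succeed, so a run that stops partway still leaves the finished audio in `results`.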
Regenerate one line. Compare takes. Move on.
Each turn has a play button, status badges, and a flag toggle. Tap a turn to start playback from there. Play from Here advances through the rest of the script automatically.
Swipe right on any turn to regenerate. The new audio saves as a variant — your prior takes are never overwritten. Open the variant picker to compare side-by-side and pick the best.
Mark music or SFX as underlay to play it under dialogue. LUFS-aware auto-ducking measures both tracks and computes the right level on its own. Or set ducking depth, attack, and release by hand.
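As a rough illustration of what "LUFS-aware" ducking might compute: measure the loudness of the dialogue and of the underlay, then attenuate the underlay until it sits a fixed margin below the dialogue. The formula, the 12 LU margin, and the 24 dB depth limit are assumptions for this sketch, not flexVox's published algorithm.

```python
def duck_gain_db(dialogue_lufs: float, underlay_lufs: float,
                 margin_lu: float = 12.0, max_depth_db: float = 24.0) -> float:
    """Gain (in dB, never positive) to apply to the underlay while dialogue plays."""
    target_lufs = dialogue_lufs - margin_lu   # where the underlay should sit
    gain = target_lufs - underlay_lufs        # negative means attenuate
    # Never boost the underlay, and never duck deeper than the limit.
    return max(-max_depth_db, min(0.0, gain))


def db_to_linear(gain_db: float) -> float:
    """Convert a dB gain into the linear amplitude multiplier a mixer applies."""
    return 10 ** (gain_db / 20)
```

With dialogue at -18 LUFS and music at -14 LUFS, this sketch ducks the music by 16 dB. Attack and release would then control how quickly that gain is reached and released around each line.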
00:00.0 → 00:03.4 · You hear that? The whole city is on a delay.
00:03.6 → 00:07.1 · [whispers] Three seconds behind the lightning. · 2 takes
00:00.0 → 00:09.0 · distant thunder · @underlay · volume 0.40
00:07.3 → 00:11.0 · [laughs] Promise me we'll never count down together. · RECAST
Mix, normalize, share.
The Export tab is a live teleprompter. Each turn lists with a timestamp and a speaker badge; the active turn highlights and auto-scrolls into view. When alignment data is available, words highlight in real time — bold and brand-colored as they're spoken.
Press Cmd+E or tap Export. Pick a platform preset: Apple Podcasts (−16 LUFS), Spotify (−14 LUFS), YouTube (−14 LUFS), Broadcast (−23 LUFS), or Custom with a slider from −30 to −6 LUFS. Transcript exports (SRT, VTT, JSON, plain text) live in the same sheet. Share via the iOS share sheet.
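The arithmetic behind a loudness preset is simple: the gain to apply is the target loudness minus the measured loudness of the mix. The sketch below uses the preset values listed above; the function names and the clamping of Custom values are assumptions, and a real exporter would typically also guard against clipping after applying the gain.

```python
PRESETS_LUFS = {
    "Apple Podcasts": -16.0,
    "Spotify": -14.0,
    "YouTube": -14.0,
    "Broadcast": -23.0,
}
CUSTOM_RANGE_LUFS = (-30.0, -6.0)  # slider limits from the export sheet


def target_lufs(preset: str, custom: float = None) -> float:
    """Resolve a preset name (or a Custom slider value) to a target in LUFS."""
    if preset == "Custom":
        if custom is None:
            raise ValueError("Custom preset needs a slider value")
        low, high = CUSTOM_RANGE_LUFS
        return max(low, min(high, custom))  # keep within the slider's range
    return PRESETS_LUFS[preset]


def normalization_gain_db(measured_lufs: float, preset: str, custom: float = None) -> float:
    """Static gain in dB that moves the measured mix to the preset's target."""
    return target_lufs(preset, custom) - measured_lufs
```

For example, a mix measured at -19.5 LUFS needs +3.5 dB to reach the Apple Podcasts target, while a mix at -12 LUFS needs -11 dB to reach the Broadcast target.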
M4A · AAC · normalized · ready for any host
That's the whole pipeline.
Five steps, three tabs, one M4A. Download and walk through it yourself — demo mode works without any account.