01 The parser

Read the room. Then read the script.

Paste a script in any reasonable format. flexVox figures out who's speaking, what's a stage direction, and which lines need a human eye — and confidence-scores every attribution so you only review what needs reviewing.

  • 4 input formats
  • Confidence scoring
  • Scenes & chapters
  • Modifier syntax
Paste:
[SCENE: cold open]
HOST: Welcome back to the show.
HOST: [warmly] Today, timing.
[SFX: rain on windows (5s) @underlay volume=0.3 loop]
GUEST: [laughs] You picked the right week.
[Music: warm rhodes intro (10s)]

Parsed:
  1. SCENE cold open
  2. HOST Welcome back to the show.
  3. HOST [warmly] Today, timing.
  4. SFX rain on windows · @underlay · loop · 5s
  5. GUEST [laughs] You picked the right week.
  6. MUSIC warm rhodes intro · 10s

Confidence key: confident · review · unsure

It reads four formats by default.

Writers don't agree on script formatting. flexVox doesn't make you pick. The parser recognizes the four most common conventions interchangeably and assigns the same internal turn structure to all of them.

01 · Colon
    HOST: Welcome to the show.
    The screenplay convention. Most common.
02 · Bracket
    [Host] Welcome to the show.
    Common in podcast scripts. Bracket-prefixed speaker.
03 · Paren
    (Host) Welcome to the show.
    Stage-direction style. We treat parens like brackets.
04 · Block
    HOST
    Welcome to the show.
    Speaker on its own line, dialogue on the next.
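
To make the idea of "one internal turn structure for all four conventions" concrete, here is a minimal sketch in Python. It is not flexVox's actual parser: the `Turn` type, the regular expressions, and the `parse_turns` function are all hypothetical, and the sketch ignores cue tags such as [SFX: ...] entirely.

```python
import re
from typing import NamedTuple


class Turn(NamedTuple):
    """Hypothetical normalized turn: every format ends up in this shape."""
    speaker: str
    text: str


# Assumed patterns, one per inline convention.
COLON = re.compile(r"^([A-Za-z][\w .'-]*):\s*(.+)$")    # HOST: text
BRACKET = re.compile(r"^\[([^\]:]+)\]\s*(.+)$")         # [Host] text
PAREN = re.compile(r"^\(([^)]+)\)\s*(.+)$")             # (Host) text
BLOCK_NAME = re.compile(r"^[A-Z][A-Z0-9 .'-]*$")        # HOST alone on a line


def parse_turns(script: str) -> list[Turn]:
    turns: list[Turn] = []
    pending = None  # speaker seen on its own line, waiting for dialogue
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            pending = None  # a blank line cancels a dangling block speaker
            continue
        if pending is not None:
            # Block format: the line after a bare speaker name is the dialogue.
            turns.append(Turn(pending, line))
            pending = None
            continue
        for pattern in (COLON, BRACKET, PAREN):
            match = pattern.match(line)
            if match:
                speaker = match.group(1).strip().upper()
                turns.append(Turn(speaker, match.group(2).strip()))
                break
        else:
            # No inline convention matched; an all-caps line may be a block speaker.
            if BLOCK_NAME.match(line):
                pending = line
    return turns
```

Feeding the four sample lines above through `parse_turns` yields four identical `Turn("HOST", "Welcome to the show.")` values. A real parser would need more care, for example telling a shouted one-word line apart from a block-format speaker name.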

Every attribution comes with a confidence score.

Detection is not certainty. The parser knows when it's guessing. Each turn carries a confidence value, surfaced in the review screen as a single colored dot — green for high confidence, orange for "you might want to look at this," red for "we genuinely didn't know."

Low-confidence rows are pinned to the top of the review screen so you can clear them in a few taps. The parser also suggests corrections for near-matches: "Did you mean HOST instead of HOSP?"

The point isn't perfect parsing. The point is you never spend time reviewing a hundred lines that were already right.
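
The review behavior described above can be sketched in a few lines of Python. Everything here is illustrative rather than flexVox's implementation: the thresholds (0.85 and 0.5), the similarity cutoff, and the function names are assumptions, and the near-match step simply uses the standard library's `difflib`.

```python
import difflib
from typing import Optional


def dot_color(confidence: float, high: float = 0.85, low: float = 0.5) -> str:
    """Map a confidence value to a dot color. Thresholds are assumed, not documented."""
    if confidence >= high:
        return "green"
    if confidence >= low:
        return "orange"
    return "red"


def suggest_speaker(name: str, known: list[str]) -> Optional[str]:
    """Return a 'did you mean' candidate for an unknown speaker, or None."""
    if name in known:
        return None  # already a known speaker, nothing to suggest
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.7)
    return matches[0] if matches else None


def review_order(rows: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Sort (line, confidence) rows so the least confident ones come first."""
    return sorted(rows, key=lambda row: row[1])
```

With known speakers HOST and GUEST, `suggest_speaker("HOSP", ["HOST", "GUEST"])` proposes HOST, and `review_order` puts the rows most in need of review at the top of the list.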

SFX and music aren't second-class.

Sound effects and music cues use the same tag syntax — and the parser treats them as first-class turns from the moment they land. They get their own row, their own badge, and their own context menu.

  • Speech: ALEX:, [Alex], (Alex), or standalone-name block
  • Sound effects: [SFX: prompt (duration)] · 0.5–30s
  • Music: [Music: prompt (duration)] · 3–600s
  • Scene: [SCENE: title] · groups turns under a collapsible header
  • Chapter: [CHAPTER: title], [ACT: title], or [PART: title]
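
As a rough illustration of the tag grammar listed above, the following Python sketch recognizes cue and header tags and checks durations against the stated ranges. The `Tag` class, the regular expressions, and the choice to flag out-of-range durations rather than reject or clamp them are assumptions; the page does not say how flexVox handles a duration outside the range. Trailing modifiers are ignored here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Duration ranges in seconds, taken from the list above.
DURATION_LIMITS = {"SFX": (0.5, 30.0), "MUSIC": (3.0, 600.0)}
HEADER_KINDS = {"SCENE", "CHAPTER", "ACT", "PART"}

TAG = re.compile(r"^\[(\w+):\s*(.*?)\]$")          # [KIND: body]
DURATION = re.compile(r"\((\d+(?:\.\d+)?)s\)")     # (5s) or (0.5s)


@dataclass
class Tag:
    """Hypothetical parsed tag; duration fields stay None for headers."""
    kind: str
    body: str
    duration: Optional[float] = None
    duration_ok: Optional[bool] = None


def parse_tag(line: str) -> Optional[Tag]:
    match = TAG.match(line.strip())
    if not match:
        return None
    kind, rest = match.group(1).upper(), match.group(2)
    if kind in HEADER_KINDS:
        return Tag(kind, rest.strip())
    if kind not in DURATION_LIMITS:
        return None  # not a tag kind this sketch knows about
    found = DURATION.search(rest)
    if not found:
        return Tag(kind, rest.strip())
    seconds = float(found.group(1))
    low, high = DURATION_LIMITS[kind]
    # The prompt is everything before the duration; modifiers after it are dropped.
    return Tag(kind, rest[:found.start()].strip(), seconds, low <= seconds <= high)
```

For instance, `parse_tag("[Music: warm rhodes intro (10s)]")` produces a MUSIC tag with the prompt "warm rhodes intro" and a 10-second duration that falls inside the 3–600s range.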

Modifiers are the secret door.

Every SFX and music tag accepts a trailing set of modifiers. You can write a complete mix instruction in one line of script — no separate audio panel, no timeline drag-and-drop.

  • @underlay: Play under dialogue instead of sequentially. Goes on a dedicated lane that auto-ducks.
  • volume=0.3: Underlay volume, 0 to 1. Defaults to a tasteful −12 dB.
  • loop: Generate seamless looping audio with no audible start or end. Perfect for rain, traffic, hum.
  • influence=0.8: How literally the AI follows your prompt, 0 (creative) to 1 (literal).

A complete example: [SFX: rain on tin roof (8s) @underlay volume=0.35 loop influence=0.9]. That one line gets you an 8-second looping ambient bed at 35% volume, ducked under dialogue, with the AI staying close to your prompt.
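
One way to read a modifier tail like the one in that example is sketched below in Python. The `Modifiers` class and `parse_modifiers` function are hypothetical, and raising an error on unknown or out-of-range modifiers is an assumption; the page does not describe how flexVox treats invalid input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Modifiers:
    """Hypothetical container for the four modifiers described above."""
    underlay: bool = False
    loop: bool = False
    volume: Optional[float] = None     # None means "use the default"
    influence: Optional[float] = None  # None means "use the default"


def parse_modifiers(tail: str) -> Modifiers:
    """Parse the text that follows the duration, e.g. '@underlay volume=0.35 loop'."""
    mods = Modifiers()
    for token in tail.split():
        if token == "@underlay":
            mods.underlay = True
        elif token == "loop":
            mods.loop = True
        elif "=" in token:
            key, _, value = token.partition("=")
            if key not in ("volume", "influence"):
                raise ValueError(f"unknown modifier: {key}")
            number = float(value)
            if not 0.0 <= number <= 1.0:
                raise ValueError(f"{key} must be between 0 and 1, got {number}")
            setattr(mods, key, number)
        else:
            raise ValueError(f"unknown modifier: {token}")
    return mods
```

Applied to the tail of the example tag, `parse_modifiers("@underlay volume=0.35 loop influence=0.9")` sets both flags and records 0.35 and 0.9 for volume and influence.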

Re-parse is destructive — and that's by design.

Editing the raw text and re-parsing replaces all existing speaker assignments and deletes any generated audio for the project. flexVox asks before doing this because it's a real choice: a totally fresh parse will be cleaner than a partial reconciliation, and the cost is visible up front rather than hidden in a half-broken state.

For surgical edits — fixing a typo, tweaking a line — use per-turn edit instead. It preserves voice assignments and audio.
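
The difference between the two paths can be modeled with a small Python sketch. The `Project` and `Turn` classes, the `ConfirmationRequired` exception, and the stand-in `simple_parse` function are all hypothetical. The sketch also leaves open a question the page does not answer: whether audio for an edited turn is regenerated afterwards.

```python
from dataclasses import dataclass, field
from typing import Optional


class ConfirmationRequired(Exception):
    """Raised when a destructive re-parse is attempted without confirmation."""


@dataclass
class Turn:
    speaker: str
    text: str
    voice: Optional[str] = None    # assigned voice, if any
    audio: Optional[bytes] = None  # generated audio, if any


def simple_parse(raw: str) -> list[Turn]:
    """Stand-in parser that handles only 'SPEAKER: text' lines."""
    turns = []
    for line in raw.splitlines():
        speaker, sep, text = line.partition(":")
        if sep and text.strip():
            turns.append(Turn(speaker.strip(), text.strip()))
    return turns


@dataclass
class Project:
    turns: list[Turn] = field(default_factory=list)

    def reparse(self, raw: str, confirmed: bool = False) -> None:
        """Destructive: rebuilds every turn, dropping voices and audio."""
        if not confirmed:
            raise ConfirmationRequired("re-parse discards assignments and audio")
        self.turns = simple_parse(raw)

    def edit_turn(self, index: int, text: str) -> None:
        """Surgical: changes one turn's text and leaves voice and audio alone."""
        self.turns[index].text = text
```

In this model, fixing a typo with `edit_turn` keeps the turn's `voice` and `audio` fields intact, while `reparse` replaces the whole list of turns and only runs once the caller passes `confirmed=True`.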