It reads four formats by default.
Writers don't agree on script formatting. flexVox doesn't make you pick. The parser recognizes the four most common conventions interchangeably and assigns the same internal turn structure to all of them.
- HOST: Welcome to the show.
  The screenplay convention. Most common.
- [Host] Welcome to the show.
  Common in podcast scripts. Bracket-prefixed speaker.
- (Host) Welcome to the show.
  Stage-direction style. We treat parens like brackets.
- HOST
  Welcome to the show.
  Speaker on its own line, dialogue on the next.
Every attribution comes with a confidence score.
Detection is not certainty. The parser knows when it's guessing. Each turn carries a confidence value, surfaced in the review screen as a single colored dot — green for high confidence, orange for "you might want to look at this," red for "we genuinely didn't know."
Low-confidence rows are pinned to the top of the review screen so you
can clear them in a few taps. The parser also suggests corrections for
near-matches: Did you mean HOST instead of HOSP?
The point isn't perfect parsing. The point is you never spend time reviewing a hundred lines that were already right.
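To make the triage idea concrete, here is a minimal sketch of how confidence could drive the review order and the "did you mean" prompt. The `Turn` class, the 0-to-1 confidence scale, the color thresholds, and the similarity cutoff are all assumptions for illustration; flexVox's actual scoring is not documented here.

```python
from dataclasses import dataclass
from difflib import get_close_matches


@dataclass
class Turn:
    speaker: str
    text: str
    confidence: float  # assumed scale: 0.0 (pure guess) to 1.0 (certain)


def dot_color(confidence: float) -> str:
    """Map a confidence value to a review dot. Thresholds are hypothetical."""
    if confidence >= 0.8:
        return "green"
    if confidence >= 0.5:
        return "orange"
    return "red"


def review_order(turns):
    """Pin non-green rows to the top; the stable sort keeps script order within each group."""
    return sorted(turns, key=lambda t: dot_color(t.confidence) == "green")


def suggest_speaker(name, known_speakers):
    """Return the closest known speaker for a probable typo, or None."""
    if name in known_speakers:
        return None
    matches = get_close_matches(name, known_speakers, n=1, cutoff=0.7)
    return matches[0] if matches else None


turns = [
    Turn("HOST", "Welcome to the show.", 0.95),
    Turn("HOSP", "Glad you could join us.", 0.30),
    Turn("GUEST", "Thanks for having me.", 0.60),
]
for turn in review_order(turns):
    hint = suggest_speaker(turn.speaker, ["HOST", "GUEST"])
    print(dot_color(turn.confidence), turn.speaker, f"(did you mean {hint}?)" if hint else "")
```

With this ordering, the reviewer sees only the uncertain rows first and can ignore the long run of green rows beneath them.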
SFX and music aren't second-class.
Sound effects and music cues use the same tag syntax — and the parser treats them as first-class turns from the moment they land. They get their own row, their own badge, and their own context menu.
- Speech
  ALEX:, [Alex], (Alex), or standalone-name block
- Sound effects
  [SFX: prompt (duration)] · 0.5–30s
- Music
  [Music: prompt (duration)] · 3–600s
- Scene
  [SCENE: title] · groups turns under a collapsible header
- Chapter
  [CHAPTER: title], [ACT: title], or [PART: title]
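The tag list above can be read as a small line grammar. The sketch below is one rough way to classify a single script line into those turn types and check cue durations against the stated ranges. The regular expressions and the `classify` function are assumptions inferred from the examples, not flexVox's real parser, and the standalone-name format is left out because it needs a two-line lookahead.

```python
import re

# Duration limits in seconds, taken from the list above.
DURATION_LIMITS = {"sfx": (0.5, 30.0), "music": (3.0, 600.0)}

# Hypothetical patterns; the real grammar may be stricter or looser.
CUE = re.compile(r"^\[(SFX|Music):\s*(.+?)\s*\((\d+(?:\.\d+)?)s\)[^\]]*\]$", re.IGNORECASE)
HEADER = re.compile(r"^\[(SCENE|CHAPTER|ACT|PART):\s*(.+?)\]$", re.IGNORECASE)
SPEECH = re.compile(r"^(?:([A-Za-z][\w ]*):|\[([^\]:]+)\]|\(([^):]+)\))\s*(.+)$")


def classify(line):
    """Classify one script line as sfx, music, scene, chapter, speech, or unknown."""
    line = line.strip()
    m = CUE.match(line)
    if m:
        kind, prompt, seconds = m.group(1).lower(), m.group(2), float(m.group(3))
        low, high = DURATION_LIMITS[kind]
        return {"kind": kind, "prompt": prompt, "seconds": seconds,
                "in_range": low <= seconds <= high}
    m = HEADER.match(line)
    if m:
        # CHAPTER, ACT, and PART are treated as the same turn type.
        kind = "scene" if m.group(1).lower() == "scene" else "chapter"
        return {"kind": kind, "title": m.group(2).strip()}
    m = SPEECH.match(line)
    if m:
        speaker = m.group(1) or m.group(2) or m.group(3)
        return {"kind": "speech", "speaker": speaker.strip(), "text": m.group(4)}
    return {"kind": "unknown", "text": line}


for sample in ["ALEX: Good evening.", "[Alex] Good evening.", "(Alex) Good evening.",
               "[SFX: door slam (2s)]", "[Music: soft piano (90s)]", "[SCENE: Kitchen]"]:
    print(classify(sample))
```

Checking cues before speech matters in this sketch: a bracketed cue would otherwise risk being read as a bracket-prefixed speaker.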
Modifiers are the secret door.
Every SFX and music tag accepts a trailing set of modifiers. You can write a complete mix instruction in one line of script — no separate audio panel, no timeline drag-and-drop.
- @underlay: Play under dialogue instead of sequentially. Goes on a dedicated lane that auto-ducks.
- volume=0.3: Underlay volume, 0 to 1. Defaults to a tasteful −12 dB.
- loop: Generate seamless looping audio with no audible start or end. Perfect for rain, traffic, hum.
- influence=0.8: How literally the AI follows your prompt, 0 (creative) to 1 (literal).
A complete example: [SFX: rain on tin roof (8s) @underlay volume=0.35 loop influence=0.9].
That one line gets you an 8-second looping ambient bed at 35% volume,
ducked under dialogue, with the AI staying close to your prompt.
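As an illustration of how that trailing modifier string might be read, here is a small sketch. The `parse_modifiers` function and its dictionary shape are hypothetical; only the modifier names and the 0-to-1 ranges come from the list above, and `None` stands in for "use the default" because the exact default values are not modeled here.

```python
def parse_modifiers(text):
    """Parse a trailing modifier string such as '@underlay volume=0.35 loop'.

    Hypothetical helper: returns flags plus optional numeric settings,
    with None meaning the modifier was not given.
    """
    mods = {"underlay": False, "loop": False, "volume": None, "influence": None}
    for token in text.split():
        if token == "@underlay":
            mods["underlay"] = True
        elif token == "loop":
            mods["loop"] = True
        elif "=" in token:
            key, _, raw = token.partition("=")
            if key not in ("volume", "influence"):
                raise ValueError(f"unknown modifier: {key}")
            value = float(raw)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{key} must be between 0 and 1, got {value}")
            mods[key] = value
        else:
            raise ValueError(f"unknown modifier: {token}")
    return mods


print(parse_modifiers("@underlay volume=0.35 loop influence=0.9"))
```

Rejecting unknown tokens, rather than ignoring them, keeps a typo such as `volme=0.3` from silently falling back to the default.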
Re-parse is destructive — and that's by design.
Editing the raw text and re-parsing replaces all existing speaker assignments and deletes any generated audio for the project. flexVox asks before doing this because it's a real choice: a totally fresh parse will be cleaner than a partial reconciliation, and the cost is visible up front rather than hidden in a half-broken state.
For surgical edits — fixing a typo, tweaking a line — use per-turn edit instead. It preserves voice assignments and audio.