One tag. One audio asset. Right placement, automatically.
Write [SFX: thunder crack (3s)] at the
point in the script where the thunder happens. flexVox generates the
audio from the prompt, places it as a first-class turn at that
position, and mixes it correctly when you export. No timeline
drag-and-drop. No separate audio panel. Just the words.
Music tags work the same way. [Music: warm rhodes
intro (10s)] generates a ten-second piece of music and inserts
it at that position. Both tags accept a duration in parentheses, and
both respect the platform's documented limits.
- SFX duration: 0.5 – 30 seconds. Set in the tag, e.g. (5s).
- Music duration: 3 – 600 seconds. Optional instrumental-only flag.
- Placement: The audio lands at the position you wrote the tag, no manual nudging.
- Regeneration: Don't like the first take? Swipe to regenerate. The new audio is a variant.
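To make the tag shape and the duration limits concrete, here is a minimal Python sketch of how a script could be scanned for tags. It is not flexVox's actual parser: the regular expression, the `AudioTag` class, and `find_audio_tags` are hypothetical names invented for illustration, and the sketch assumes every tag carries a duration in parentheses. Only the limits (0.5 to 30 seconds for SFX, 3 to 600 seconds for music) come from the text above.

```python
import re
from dataclasses import dataclass

# Duration limits in seconds, as listed above.
DURATION_LIMITS = {"sfx": (0.5, 30.0), "music": (3.0, 600.0)}

# Hypothetical pattern for: [Kind: prompt (Ns) optional modifiers]
TAG_PATTERN = re.compile(
    r"\[(?P<kind>SFX|Music):\s*(?P<prompt>[^\]()]+?)\s*"
    r"\((?P<seconds>\d+(?:\.\d+)?)s\)(?P<modifiers>[^\]]*)\]",
    re.IGNORECASE,
)


@dataclass
class AudioTag:
    kind: str        # "sfx" or "music"
    prompt: str      # text sent to the generator
    seconds: float   # requested duration
    modifiers: str   # raw modifier text, left unparsed in this sketch
    position: int    # character offset of the tag in the script


def find_audio_tags(script: str) -> list[AudioTag]:
    """Return every SFX/Music tag in script order, rejecting out-of-range durations."""
    tags = []
    for match in TAG_PATTERN.finditer(script):
        kind = match.group("kind").lower()
        seconds = float(match.group("seconds"))
        low, high = DURATION_LIMITS[kind]
        if not low <= seconds <= high:
            raise ValueError(
                f"{kind} duration {seconds}s is outside {low}-{high}s"
            )
        tags.append(
            AudioTag(
                kind=kind,
                prompt=match.group("prompt"),
                seconds=seconds,
                modifiers=match.group("modifiers").strip(),
                position=match.start(),
            )
        )
    return tags
```

Because each tag records its character offset, the generated audio can be slotted in as a turn at exactly the point where the tag was written.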
The modifier alphabet.
Every SFX and music tag takes a small set of modifiers that change how the audio is generated and placed. The defaults are deliberately conservative — quiet, sequential, lightly literal — because that's what works for most scripts. The modifiers are how you push past the defaults.
@underlay (placement): Move the audio onto the dedicated underlay lane so it plays under dialogue instead of sequentially. The mixer auto-ducks underlay when dialogue is active.
[Music: tape-piano hush (12s) @underlay]
volume=0.30 (level): The level of an underlay track, 0 to 1. The default is a tasteful −12 dB or so. Drop to 0.2 for whispered scenes.
[SFX: rain (5s) @underlay volume=0.20]
loop (treatment): Generate audio with no audible start or end. Ideal for ambient beds like rain, traffic, crowd noise, or engine hum that need to run under a whole scene.
[SFX: forest at dusk (8s) @underlay loop]
influence=0.85 (creativity): How literally the AI follows your prompt. 0 lets it interpret, 1 demands a faithful render. Pull this up for specific sounds, pull it down for evocative ambience.
[SFX: old typewriter (4s) influence=0.9]
Modifiers stack in any order. A complete tag:
[SFX: rain on tin roof (8s) @underlay volume=0.35 loop influence=0.9].
One line. Eight seconds. Looping. Ducked. Almost-literal.
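Since modifiers stack in any order, reading them amounts to looking at each whitespace-separated token on its own. The Python sketch below illustrates that idea under stated assumptions: `Modifiers` and `parse_modifiers` are hypothetical names, rejecting unknown tokens is a choice made here rather than documented behaviour, and `None` stands in for "use the platform default".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Modifiers:
    underlay: bool = False             # @underlay: play under dialogue
    loop: bool = False                 # loop: no audible start or end
    volume: Optional[float] = None     # 0 to 1; None means platform default
    influence: Optional[float] = None  # 0 to 1; None means platform default


def parse_modifiers(text: str) -> Modifiers:
    """Parse the modifier text that follows the duration in a tag."""
    mods = Modifiers()
    for token in text.split():
        if token == "@underlay":
            mods.underlay = True
        elif token == "loop":
            mods.loop = True
        elif token.startswith(("volume=", "influence=")):
            name, _, raw = token.partition("=")
            value = float(raw)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
            setattr(mods, name, value)
        else:
            # Assumption: unknown tokens are errors rather than silently ignored.
            raise ValueError(f"unknown modifier: {token}")
    return mods
```

For the complete tag above, `parse_modifiers("@underlay volume=0.35 loop influence=0.9")` yields an underlay, looping sound at volume 0.35 and influence 0.9, and shuffling the tokens gives the same result.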
Background music. A second pass, sized exactly right.
Some shows want music under the whole episode, not just one scene. That's what the background music feature is for. You describe a mood, pick a volume, and turn it on. flexVox generates dialogue and SFX first, measures the resulting duration, and then generates background music in a second pass sized to fit. The duration is always correct because it's always computed last. Background music is a Studio feature.
Most tools loop a track and clip it. flexVox composes one piece, the exact length your show ended up being. No fades-to-silence-at-the-end.
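The two-pass idea can be sketched in a few lines of Python. This is an illustration only, not flexVox's code: `Turn`, `episode_seconds`, and `plan_background_music` are hypothetical names, and the sketch assumes that underlay audio adds no length because it plays under dialogue.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    seconds: float            # length of the generated audio
    pause_after: float = 0.0  # silence following the turn
    underlay: bool = False    # assumed to overlap dialogue, adding no length
    excluded: bool = False    # excluded turns are not exported


def episode_seconds(turns: list[Turn]) -> float:
    """Pass one result: total length of the sequential, exported turns."""
    return sum(
        t.seconds + t.pause_after
        for t in turns
        if not t.excluded and not t.underlay
    )


def plan_background_music(turns: list[Turn], mood: str, volume: float) -> dict:
    """Pass two: build a music request sized to the measured episode length."""
    return {"prompt": mood, "seconds": episode_seconds(turns), "volume": volume}
```

The music request is built only after every other turn has a known length, which is why the duration cannot drift out of sync with the episode.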
What the mixer actually does.
Audio export concatenates all active, non-excluded turns into a single M4A. The mixer supports four parallel tracks, and every track gets peak-normalized independently before the final loudness pass.
- Track 1, Dialogue: Sequential speech turns with project-default or per-turn pauses.
- Track 2, Crossfade overlay: Used for smooth transitions between dialogue takes when needed.
- Track 3, Underlay: SFX and music marked @underlay. Auto-ducked under dialogue.
- Track 4, Background music: The episode-length music bed. Also ducked.
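As a rough picture of what "peak-normalized independently" and "auto-ducked" mean, here is a small Python sketch on plain lists of samples. It is a simplification under several assumptions, not the real mixer: the function names, the duck gain of 0.3, and the activity threshold are all invented for illustration, ducking is a hard per-sample gate rather than a smoothed envelope, and the final loudness pass is left out.

```python
def peak_normalize(samples: list[float], target_peak: float = 1.0) -> list[float]:
    """Scale a track so its loudest sample reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty track: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def duck(bed: list[float], dialogue: list[float],
         duck_gain: float = 0.3, threshold: float = 0.01) -> list[float]:
    """Lower the bed wherever dialogue is active (hypothetical gain and threshold)."""
    ducked = []
    for i, sample in enumerate(bed):
        speech = dialogue[i] if i < len(dialogue) else 0.0
        ducked.append(sample * duck_gain if abs(speech) > threshold else sample)
    return ducked


def mix(dialogue: list[float], crossfade: list[float],
        underlay: list[float], background: list[float]) -> list[float]:
    """Normalize each track on its own, duck tracks 3 and 4, then sum."""
    tracks = [peak_normalize(t) for t in (dialogue, crossfade, underlay, background)]
    speech = tracks[0]
    tracks[2] = duck(tracks[2], speech)
    tracks[3] = duck(tracks[3], speech)
    length = max(len(t) for t in tracks)
    return [sum(t[i] for t in tracks if i < len(t)) for i in range(length)]
```

Normalizing each track before mixing keeps a quiet generated sound from vanishing next to a loud one. Ducking then decides how much of the underlay and background survives while someone is speaking.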
The Sound Library — reusable across projects.
If you record your own SFX or have a library of music stings, the Sound Library lives at the app level and follows you between projects. Import M4A, MP3, WAV, or AIFF. Tag with comma-separated keywords. Search by name or tag, filter by category. Assign any sound to an SFX or music turn in post-production. Sound Library is a Studio feature.
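The tagging and search behaviour described above can be pictured with a short Python sketch. The `Sound` class, `parse_tags`, and `search` are hypothetical names, and the matching rules (case-insensitive substring match on the name, exact match on a tag) are assumptions rather than documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sound:
    name: str
    category: str
    tags: set[str] = field(default_factory=set)


def parse_tags(raw: str) -> set[str]:
    """Split comma-separated keywords, trimming spaces and ignoring case."""
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def search(library: list[Sound], query: str,
           category: Optional[str] = None) -> list[Sound]:
    """Match by name or tag, optionally filtered to one category."""
    needle = query.strip().lower()
    return [
        sound
        for sound in library
        if (category is None or sound.category == category)
        and (needle in sound.name.lower() or needle in sound.tags)
    ]
```

For example, a sound named "Door slam" tagged with `parse_tags("door, impact, Foley")` would be found by searching for "slam", "foley", or "door".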