SFX & music — flexVox deep dive

One tag. One audio asset. Right placement, automatically.

Write [SFX: thunder crack (3s)] at the point in the script where the thunder happens. flexVox generates the audio from the prompt, places it as a first-class turn at that position, and mixes it correctly when you export. No timeline drag-and-drop. No separate audio panel. Just the words.
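"Placement is position" can be shown concretely by splitting a script into ordered turns. The minimal Python sketch below makes two assumptions. First, any bracketed [SFX: ...] or [Music: ...] tag becomes its own audio turn. Second, the text between tags is speech. `split_turns` is a hypothetical name and is not flexVox's parser.

```python
import re

# Assumption: any bracketed SFX or Music tag is one audio turn.
AUDIO_TAG = re.compile(r"\[(?:SFX|Music):[^\]]*\]")


def split_turns(script: str) -> list[tuple[str, str]]:
    """Split a script into (kind, text) turns, kept in script order."""
    turns: list[tuple[str, str]] = []
    pos = 0
    for match in AUDIO_TAG.finditer(script):
        # Text before the tag becomes a speech turn (if non-empty).
        speech = script[pos:match.start()].strip()
        if speech:
            turns.append(("speech", speech))
        # The tag itself becomes an audio turn at exactly this position.
        turns.append(("audio", match.group(0)))
        pos = match.end()
    # Whatever follows the last tag is the final speech turn.
    tail = script[pos:].strip()
    if tail:
        turns.append(("speech", tail))
    return turns
```

Calling `split_turns("The sky split open. [SFX: thunder crack (3s)] Then silence.")` returns three turns: the speech before the tag, the thunder tag, and the speech after it. The thunder turn sits exactly where it was written.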

Music tags work the same way. [Music: warm rhodes intro (10s)] generates a ten-second piece of music and inserts it at that position. Both tags accept a duration in parentheses, and both respect the platform's documented limits.

SFX duration: 0.5–30 seconds. Set in the tag, e.g. (5s).
Music duration: 3–600 seconds. Optional instrumental-only flag.
Placement: The audio lands at the position where you wrote the tag. No manual nudging.
Regeneration: Don't like the first take? Swipe to regenerate. The new audio is a variant.
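The tag grammar and the limits above can be checked before anything is generated. The minimal Python sketch below rests on three assumptions. The pattern is `[Kind: prompt (Ns) modifiers]`. The duration is required. The names `parse_tag` and `AudioTag` are hypothetical. flexVox's own parser may accept more forms, or different ones.

```python
import re
from dataclasses import dataclass

# Documented duration limits in seconds, per tag kind.
LIMITS = {"SFX": (0.5, 30.0), "Music": (3.0, 600.0)}

# Assumed grammar: [Kind: prompt (Ns) optional modifiers]
TAG_RE = re.compile(
    r"\[(?P<kind>SFX|Music):\s*(?P<prompt>[^()\]]+?)\s*"
    r"\((?P<secs>\d+(?:\.\d+)?)s\)(?P<rest>[^\]]*)\]"
)


@dataclass
class AudioTag:
    kind: str
    prompt: str
    seconds: float
    rest: str  # raw modifier text after the duration, if any


def parse_tag(text: str) -> AudioTag:
    """Parse one SFX/Music tag and reject durations outside the limits."""
    match = TAG_RE.search(text)
    if match is None:
        raise ValueError(f"no SFX or Music tag in {text!r}")
    kind, seconds = match["kind"], float(match["secs"])
    low, high = LIMITS[kind]
    if not low <= seconds <= high:
        raise ValueError(f"{kind} duration {seconds}s is outside {low}-{high}s")
    return AudioTag(kind, match["prompt"], seconds, match["rest"].strip())
```

For example, `parse_tag("[SFX: thunder crack (3s)]")` yields kind `SFX`, prompt `thunder crack`, and 3.0 seconds. A 45-second SFX tag raises `ValueError`.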

The modifier alphabet.

Every SFX and music tag takes a small set of modifiers that change how the audio is generated and placed. The defaults are deliberately conservative — quiet, sequential, lightly literal — because that's what works for most scripts. The modifiers are how you push past the defaults.

@underlay (placement)

Move the audio onto the dedicated underlay lane so it plays under dialogue instead of sequentially. The mixer auto-ducks underlay when dialogue is active.

[Music: tape-piano hush (12s) @underlay]
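At heart, auto-ducking is a gain envelope on the underlay lane. The sketch below assumes dialogue activity arrives as one boolean per audio frame. While anyone is speaking, it ramps the underlay gain linearly toward a ducked level. `duck_envelope`, `duck_gain`, and `step` are hypothetical names with illustrative values, not flexVox's actual settings.

```python
def duck_envelope(
    dialogue_active: list[bool], duck_gain: float = 0.3, step: float = 0.1
) -> list[float]:
    """Per-frame gain for the underlay lane: ducked while dialogue is active."""
    gains: list[float] = []
    current = 1.0  # start at full level
    for active in dialogue_active:
        target = duck_gain if active else 1.0
        # Ramp toward the target by at most `step` per frame to avoid clicks.
        if current < target:
            current = min(target, current + step)
        else:
            current = max(target, current - step)
        gains.append(current)
    return gains
```

Multiplying each underlay frame by its gain produces the ducked signal. A smaller `step` makes the dip and the recovery gentler.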

volume=0.30 (level)

The level of an underlay track, from 0 to 1. The default sits at a tasteful −12 dB or so. Drop to 0.2 for whispered scenes.

[SFX: rain (5s) @underlay volume=0.20]
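Relating the 0-to-1 volume scale to decibels requires knowing the curve, and the page does not state it. The Python sketch below assumes `volume` is a linear amplitude gain, in which case dB = 20·log10(volume). `volume_to_db` and `db_to_volume` are hypothetical helpers.

```python
import math


def volume_to_db(volume: float) -> float:
    """Linear 0-1 gain to decibels relative to full level (0 dB)."""
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0 and 1")
    if volume == 0.0:
        return -math.inf  # silence
    return 20.0 * math.log10(volume)


def db_to_volume(db: float) -> float:
    """Decibels back to a linear 0-1 gain."""
    return 10.0 ** (db / 20.0)


for v in (0.35, 0.30, 0.20):
    print(f"volume={v:.2f} -> {volume_to_db(v):.1f} dB")
print(f"-12 dB -> volume={db_to_volume(-12):.2f}")
```

Under that assumption, −12 dB lands near volume=0.25, 0.30 is about −10.5 dB, and the whispered-scene 0.2 sits around −14 dB.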

loop (treatment)

Generate audio with no audible start or end — ideal for ambient beds like rain, traffic, crowd noise, or engine hum that need to run under a whole scene.

[SFX: forest at dusk (8s) @underlay loop]
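flexVox generates looping audio directly, so the technique below is not its method. It only illustrates what "no audible start or end" means. A common way to make an arbitrary clip loop is to crossfade its last `fade` samples into its first ones, so playback continues without a jump when it wraps around. The function name and the plain-list sample format are assumptions for this sketch.

```python
def make_seamless_loop(samples: list[float], fade: int) -> list[float]:
    """Fold the clip's tail into its head with a linear crossfade.

    The result is `fade` samples shorter than the input. Its last sample is
    followed, on wrap-around, by the first tail sample, so the loop point
    continues the original waveform instead of jumping.
    """
    n = len(samples)
    if not 0 < fade < n - fade:
        raise ValueError("fade must be positive and shorter than the rest of the clip")
    body = samples[: n - fade]
    tail = samples[n - fade :]
    looped = list(body)
    for i in range(fade):
        t = i / fade  # 0 at the loop point, rising toward 1
        looped[i] = body[i] * t + tail[i] * (1.0 - t)
    return looped
```

Linear weights keep a steady signal steady. For uncorrelated material such as noise, an equal-power (sine/cosine) curve is the usual alternative.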

influence=0.85 (creativity)

How literally the AI follows your prompt — 0 lets it interpret, 1 demands a faithful render. Pull this up for specific sounds, pull down for evocative ambience.

[SFX: old typewriter (4s) influence=0.9]

Modifiers stack in any order. A complete tag: [SFX: rain on tin roof (8s) @underlay volume=0.35 loop influence=0.9]. One line. Eight seconds. Looping. Ducked. Almost-literal.
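Every modifier is either a bare flag or a key=value pair, so parsing them in any order is straightforward. The minimal Python sketch below assumes modifiers are whitespace-separated tokens that follow the duration. `Modifiers` and `parse_modifiers` are hypothetical names. Unset values stay `None` so the defaults can apply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Modifiers:
    underlay: bool = False             # @underlay: move to the underlay lane
    loop: bool = False                 # loop: seamless, loopable render
    volume: Optional[float] = None     # None = use the default level
    influence: Optional[float] = None  # None = use the default literalness


def parse_modifiers(text: str) -> Modifiers:
    """Parse space-separated modifiers in any order, e.g. '@underlay loop volume=0.35'."""
    mods = Modifiers()
    for token in text.split():
        if token == "@underlay":
            mods.underlay = True
        elif token == "loop":
            mods.loop = True
        elif "=" in token:
            key, _, raw = token.partition("=")
            if key not in ("volume", "influence"):
                raise ValueError(f"unknown modifier {key!r}")
            value = float(raw)  # raises ValueError on non-numeric input
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{key}={value} is outside 0-1")
            setattr(mods, key, value)
        else:
            raise ValueError(f"unknown modifier {token!r}")
    return mods
```

`"@underlay volume=0.35 loop influence=0.9"` and `"influence=0.9 loop volume=0.35 @underlay"` parse to the same `Modifiers` value. That is the order-independence the tag syntax promises.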

Background music. A second pass, sized exactly right.

Some shows want music under the whole episode, not just one scene. That's what the background music feature is for. You describe a mood, pick a volume, and turn it on. flexVox generates dialogue and SFX first, measures the resulting duration, and then generates background music in a second pass sized to fit. The duration is always correct because it's always computed last. Background music is a Studio feature.

Most tools loop a track and clip it. flexVox composes one piece, the exact length your show ended up being. No fade to silence at the end.
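The "measure first, then compose" step comes down to summing the sequential timeline. The minimal Python sketch below rests on three assumptions. Underlay audio doesn't extend the episode. Excluded turns are skipped. Pauses sit only between consecutive turns. The `Turn` fields and the 0.5 s default pause are illustrative, not flexVox settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    seconds: float                       # rendered length of the turn
    pause_after: Optional[float] = None  # per-turn override; None = project default
    underlay: bool = False               # plays under dialogue, adds no time
    excluded: bool = False               # left out of the export


def episode_seconds(turns: list[Turn], default_pause: float = 0.5) -> float:
    """Length of the sequential timeline once dialogue and SFX are rendered."""
    timeline = [t for t in turns if not (t.excluded or t.underlay)]
    total = sum(t.seconds for t in timeline)
    # Pauses sit between consecutive turns, so the last turn adds none.
    for t in timeline[:-1]:
        total += t.pause_after if t.pause_after is not None else default_pause
    return total
```

The background-music request can then ask for exactly `episode_seconds(turns)` seconds. This only works once every other turn has been generated, which is why the music pass comes last.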

What the mixer actually does.

Audio export concatenates all active, non-excluded turns into a single M4A. The mixer supports four parallel tracks, and every track gets peak-normalized independently before the final loudness pass.

Track 1 (Dialogue): Sequential speech turns with project-default or per-turn pauses.
Track 2 (Crossfade overlay): Used for smooth transitions between dialogue takes when needed.
Track 3 (Underlay): SFX and music marked @underlay. Auto-ducked under dialogue.
Track 4 (Background music): The episode-length music bed. Also ducked.
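The normalize-each-track-then-sum structure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the flexVox mixer. Samples are plain float lists in the −1 to 1 range. `gains` stands in for each track's level after ducking. The final loudness pass is reduced to a simple clip guard; a real pass would measure integrated loudness, for example per ITU-R BS.1770. `peak_normalize` and `mix` are hypothetical names.

```python
def peak_normalize(track: list[float], target: float = 1.0) -> list[float]:
    """Scale a track so its loudest sample hits `target`."""
    peak = max((abs(s) for s in track), default=0.0)
    if peak == 0.0:
        return list(track)  # silent track: nothing to scale
    return [s * target / peak for s in track]


def mix(tracks: list[list[float]], gains: list[float]) -> list[float]:
    """Normalize each track independently, apply its gain, and sum."""
    length = max((len(t) for t in tracks), default=0)
    out = [0.0] * length
    for track, gain in zip(tracks, gains):
        for i, s in enumerate(peak_normalize(track)):
            out[i] += s * gain  # shorter tracks simply end early
    # Stand-in for the final loudness pass: pull the mix down only if it clips.
    peak = max((abs(s) for s in out), default=0.0)
    if peak > 1.0:
        out = [s / peak for s in out]
    return out
```

Normalizing each track first means a quiet SFX stem and a hot dialogue stem enter the sum on equal footing, so the per-track gains alone decide the balance.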

The Sound Library — reusable across projects.

If you record your own SFX or have a library of music stings, the Sound Library lives at the app level and follows you between projects. Import M4A, MP3, WAV, or AIFF. Tag with comma-separated keywords. Search by name or tag, filter by category. Assign any sound to an SFX or music turn in post-production. Sound Library is a Studio feature.
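The tagging and search behavior described above maps onto a small data model. The minimal Python sketch below assumes three things. Keywords are case-insensitive. Name search is a substring match. Tag search is an exact keyword match. `Sound`, `can_import`, `parse_keywords`, and `search` are hypothetical names. The extension list simply mirrors the four formats named above.

```python
from dataclasses import dataclass, field
from typing import Optional

# File extensions for the formats the library imports (AIFF may use either).
IMPORTABLE = (".m4a", ".mp3", ".wav", ".aif", ".aiff")


@dataclass
class Sound:
    name: str
    category: str
    tags: list[str] = field(default_factory=list)


def can_import(filename: str) -> bool:
    """True if the file extension is one of the supported formats."""
    return filename.lower().endswith(IMPORTABLE)


def parse_keywords(raw: str) -> list[str]:
    """Split comma-separated keywords, trimming blanks: 'door, Impact , ' -> ['door', 'impact']."""
    return [k.strip().lower() for k in raw.split(",") if k.strip()]


def search(library: list[Sound], query: str, category: Optional[str] = None) -> list[Sound]:
    """Match on name (substring) or tag (exact keyword), optionally within a category."""
    q = query.strip().lower()
    return [
        s for s in library
        if (category is None or s.category == category)
        and (q in s.name.lower() or q in s.tags)
    ]
```

Searching for `impact` finds a sound tagged `Impact`. Adding `category="Music"` narrows the same query to music stings only.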
