The audio pipeline

From microphone to PAD score in under 100 milliseconds.

Every FluentPlay game shares the same browser-side audio engine. Your microphone feeds a real-time analysis pipeline that classifies every audio frame, at about 60 frames per second, and emits the results as a feature stream. That stream feeds the PAD scorer. No audio is recorded. Nothing leaves the browser except one cloud call for phoneme-level pronunciation assessment.

Microphone (browser capture)
→ Analysis Engine (real-time, ~60 fps)
→ DFS (Disfluency Feature Stream)
→ PAD Scorer (per-syllable PAD score)
→ PAD Score (output, per syllable)

02 · What the pipeline produces

Four features per frame. Each one feeds a PAD component.

01 / 04

RMS Intensity

Root-mean-square energy of the audio frame. Tracks how much acoustic energy the speaker is producing. Drops during delayed-onset events, spikes during forced articulation. A stable RMS across syllables is a smoothness signal.

Feeds → A (Acoustic component)
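
As an illustration of the RMS feature described above, here is a minimal sketch of the standard root-mean-square calculation over one frame. The function name `rmsIntensity` and the assumption that samples are PCM values in the range [-1, 1] are ours, not taken from the FluentPlay engine.

```typescript
// Hypothetical helper: root-mean-square energy of a single audio frame.
// `samples` is assumed to hold PCM values in the range [-1, 1].
function rmsIntensity(samples: ArrayLike<number>): number {
  if (samples.length === 0) return 0; // an empty frame carries no energy
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}

// A constant-amplitude frame has an RMS equal to that amplitude.
const exampleFrame = new Float32Array([0.5, -0.5, 0.5, -0.5]);
console.log(rmsIntensity(exampleFrame)); // 0.5
```

Comparing this value across consecutive syllable windows is one way the stability described above could be observed.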

02 / 04

Onset Count

Cumulative count of voiced onsets detected in the session. Each time the pipeline detects a transition from the silent or building state to the voiced state, the counter increments. Multiple onsets within the same syllable window signal a repeated-onset event.

Feeds → P (Prediction component)
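
The onset counter described above can be sketched as a small state-transition count. The state names come from the description; the type `FrameState`, the function `countVoicedOnsets`, and the assumption that a session starts in the silent state are hypothetical.

```typescript
// The three per-frame states named in the text.
type FrameState = "silent" | "building" | "voiced";

// Hypothetical helper: counts transitions from a non-voiced state into "voiced".
function countVoicedOnsets(states: FrameState[]): number {
  let onsets = 0;
  let previous: FrameState = "silent"; // assumption: the session starts silent
  for (const state of states) {
    if (state === "voiced" && previous !== "voiced") onsets++;
    previous = state;
  }
  return onsets;
}

// Two separate entries into the voiced state give two onsets.
const exampleStates: FrameState[] = [
  "silent", "building", "voiced", "voiced", "silent", "voiced",
];
console.log(countVoicedOnsets(exampleStates)); // 2
```

Detecting repeated onsets inside one syllable window would additionally require syllable boundaries, which this sketch does not model.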

03 / 04

Voiced Duration

Cumulative time spent in the voiced state, in milliseconds. The ratio of voiced-to-total time tracks productive speech output. Long stretches in the building state that never reach the voiced state indicate motor-planning stalls in the pre-articulatory window.

Feeds → G (Gate component)
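
A minimal sketch of the voiced-duration bookkeeping described above, assuming each frame carries its state and its length in milliseconds. The `TimedFrame` shape and the `voicedSummary` function are illustrative names, not part of the FluentPlay API.

```typescript
// Hypothetical frame record: a state plus the frame's length in milliseconds.
interface TimedFrame {
  state: "silent" | "building" | "voiced";
  durationMs: number;
}

// Accumulates voiced time and the voiced-to-total ratio over a run of frames.
function voicedSummary(frames: TimedFrame[]): {
  voicedMs: number;
  totalMs: number;
  voicedRatio: number;
} {
  let voicedMs = 0;
  let totalMs = 0;
  for (const frame of frames) {
    totalMs += frame.durationMs;
    if (frame.state === "voiced") voicedMs += frame.durationMs;
  }
  // Guard against division by zero when no frames have been seen yet.
  const voicedRatio = totalMs > 0 ? voicedMs / totalMs : 0;
  return { voicedMs, totalMs, voicedRatio };
}

const exampleFrames: TimedFrame[] = [
  { state: "building", durationMs: 20 },
  { state: "voiced", durationMs: 20 },
  { state: "voiced", durationMs: 20 },
  { state: "voiced", durationMs: 20 },
];
console.log(voicedSummary(exampleFrames).voicedRatio); // 0.75
```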

04 / 04

Delayed-onset Flag

Binary flag that fires when the pipeline detects a sustained building state exceeding a duration threshold. The building state is not silence; it is active motor effort without articulation. When the flag fires, the PAD scorer adjusts the weighting of that syllable window through the attenuation parameter λ.

Feeds → λ (attenuation parameter)
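
The flag logic described above can be sketched as a running timer that resets whenever the frame state is not building. The function name `delayedOnsetFlags`, the fixed frame length, and the threshold used in the example are assumptions; the actual duration threshold is not stated here.

```typescript
// Hypothetical helper: returns one flag per frame, true while the current
// uninterrupted run of "building" frames has lasted longer than thresholdMs.
function delayedOnsetFlags(
  states: Array<"silent" | "building" | "voiced">,
  frameMs: number,
  thresholdMs: number,
): boolean[] {
  let buildingMs = 0;
  return states.map((state) => {
    // Any non-building frame ends the run and resets the timer.
    buildingMs = state === "building" ? buildingMs + frameMs : 0;
    return buildingMs > thresholdMs;
  });
}

// Illustrative values only: 20 ms frames and a 50 ms threshold.
const exampleFlags = delayedOnsetFlags(
  ["building", "building", "building", "building", "voiced"],
  20,
  50,
);
console.log(exampleFlags); // [ false, false, true, true, false ]
```

In this sketch the flag clears as soon as voicing begins; whether the real pipeline latches the flag for the rest of the syllable window is not specified.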

03 · Architecture

Design constraints that are features.

🌐

Browser-native

Runs in any modern browser. No install, no plugin, no app store. Single-file HTML deployments via Netlify Drop.

🔒

No audio recorded

Audio is analyzed in real time and discarded frame by frame. Nothing is stored. Nothing leaves the device except one cloud call for phoneme-level pronunciation assessment.

⚡

Sub-100ms latency

From speech onset to scored feature output in under 100 milliseconds. Frame rate of ~60 fps. Fast enough for real-time visual feedback during practice.
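
To put the two figures above side by side, the arithmetic below works out how many analysis frames fit inside the latency budget. The function names are illustrative, and the 60 fps and 100 ms inputs are the approximate figures quoted above.

```typescript
// Length of one analysis frame in milliseconds at a given frame rate.
function frameIntervalMs(framesPerSecond: number): number {
  return 1000 / framesPerSecond;
}

// Number of whole frames that fit inside a latency budget.
// Multiplying before dividing keeps the 60 fps / 100 ms case exact.
function framesWithinBudget(framesPerSecond: number, budgetMs: number): number {
  return Math.floor((budgetMs * framesPerSecond) / 1000);
}

console.log(frameIntervalMs(60).toFixed(1)); // "16.7"
console.log(framesWithinBudget(60, 100)); // 6
```

At roughly 16.7 ms per frame, a 100 ms budget spans about six frames.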

🔌

Separable layers

The audio pipeline and the PAD scorer are architecturally independent. License the pipeline, the scoring framework, or the full integrated stack.

The full picture

The first platform built around the pre-articulatory window.

The audio pipeline captures pre-articulatory timing instability in real time. PAD scores it per syllable. Every game in the FluentPlay library ships with both layers hardwired in. The pipeline and the scoring framework are architecturally independent — licensable separately or as an integrated stack. Patent pending under U.S. Provisional 64/016,001.

How PAD Works

End-to-end platform walkthrough. Start here.

Open walkthrough →

PAD Explainer

Scoring framework deep dive — Prediction, Acoustic, Gate, λ.

Open explainer →

PAD Signal Profile

Interactive 3D visualization. Drag to rotate, adjust the rubric.

Open 3D profile →

EEG Station

Pre-SMA ready-state research rig. fNIRS & EEG alignment.

Open station →

Request a meeting

Tell us what you're working on.

Whether you're evaluating the audio pipeline, the PAD scoring framework, or the full integrated stack — describe your use case and we'll schedule a founder call.
