VocalGender - Voice Gender Analyzer

How It Works

🎵

Pitch Detection

Your microphone signal is analyzed 30 times per second with the YIN-Lite pitch detection algorithm, which extracts your fundamental vocal frequency (F0). The Hz value is mapped to a voice-type band (Bass, Baritone, Tenor, Alto, Mezzo-Soprano, Soprano, High) so you get an immediate sense of where your voice sits.
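For readers curious about what a YIN-style detector does, here is a minimal sketch of the core steps of the standard YIN method (difference function, cumulative mean normalization, first dip below a threshold). It is not the app's YIN-Lite code. The function name `yin_pitch`, the 60-1000 Hz search range, and the 0.15 threshold are assumptions chosen for illustration.

```python
import math

def yin_pitch(frame, sample_rate, fmin=60.0, fmax=1000.0, threshold=0.15):
    """Estimate F0 in Hz for one frame, or return None if no clear pitch is found.

    Illustrative only: fmin, fmax and threshold are assumed values.
    """
    tau_min = max(1, int(sample_rate / fmax))   # shortest lag (period) to consider
    tau_max = int(sample_rate / fmin)           # longest lag (period) to consider
    window = len(frame) - tau_max
    if window <= 0:
        raise ValueError("frame too short for the requested fmin")

    # Step 1: difference function d(tau), small when the signal repeats after tau samples.
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[i] - frame[i + tau]) ** 2 for i in range(window))

    # Step 2: cumulative mean normalized difference, which removes the bias toward tiny lags.
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0

    # Step 3: take the first lag that dips below the threshold, then slide to its local minimum.
    tau = tau_min
    while tau <= tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
        tau += 1
    return None
```

A full YIN implementation also refines the lag with parabolic interpolation, which this sketch leaves out, so its resolution is limited to whole-sample periods.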

📊

Formant Analysis

A custom FFT runs on each audio frame to estimate the first three vocal tract resonances (F1, F2, F3). These formants reflect the physical shape of your vocal tract. Shorter vocal tracts tend to have higher formant frequencies, by around 10-20% compared with longer ones. Breathiness is estimated separately from spectral noise above the fundamental.
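The FFT-and-peak-picking idea can be sketched as follows. This is a deliberately simplified illustration, not the app's formant tracker: it runs a small radix-2 FFT on a Hann-windowed frame and returns the strongest spectral peaks. Real formant estimation usually smooths the spectrum into an envelope first, because raw peaks follow the harmonics of F0 rather than the resonances themselves. The names `fft` and `top_peaks` are hypothetical.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def top_peaks(frame, sample_rate, count=3):
    """Return the frequencies (Hz, ascending) of the `count` strongest spectral peaks."""
    n = len(frame)
    # A Hann window reduces leakage between neighbouring frequency bins.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    mags = [abs(c) for c in fft(windowed)[: n // 2]]
    # A peak is a bin that is larger than its left neighbour and not smaller than its right.
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    strongest = sorted(peaks, key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * sample_rate / n for k in strongest)
```

Fed a synthetic frame made of three sine tones, `top_peaks` recovers the three tone frequencies. On real speech the same code would mostly report harmonics, which is why envelope smoothing matters.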

🤖

AI Gender Classification

Every few seconds, a chunk of audio is resampled to 16 kHz and run through a Wav2Vec2 neural network trained on Mozilla's Common Voice dataset. It runs entirely in your browser via WebAssembly, so no audio ever leaves your device. The result is a Feminine / Masculine score that updates each cycle. Model: prithivMLmods/Common-Voice-Gender-Detection-ONNX
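The resampling step can be pictured with a small sketch. The app's actual resampler is not described here, so this uses plain linear interpolation as a stand-in. The function name `resample_linear` is hypothetical, and a production resampler would normally low-pass filter first to avoid aliasing.

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a mono signal to dst_rate by linear interpolation.

    Illustrative only: no anti-aliasing filter is applied before downsampling.
    """
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate          # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for j in range(n_out):
        pos = j * step                  # fractional position in the source signal
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)] # clamp at the final sample
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out
```

One second of 48 kHz audio passed through `resample_linear(audio, 48000)` comes back as 16,000 samples, the input length the description above says the model receives.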

🔊

Resonance Balance

Over a rolling 10-second window, spectral energy is split into three bands: Chest (low, body resonance), Mask (mid, forward resonance in the face), and Head (high, upper resonance). This gives a broader picture of where your voice resonates, one that a single pitch number can't capture.
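A rough sketch of the band split and rolling window is below. The band edges in `BANDS` are hypothetical placeholders, since the app's real cut-off frequencies are not stated, and the 30 frames-per-second default is borrowed from the pitch tracker as an assumption. The names `band_energy` and `RollingBalance` are likewise illustrative.

```python
from collections import deque

# Hypothetical band edges in Hz; the app's real cut-offs are not stated.
BANDS = {"chest": (80.0, 500.0), "mask": (500.0, 2500.0), "head": (2500.0, 5000.0)}

def band_energy(power_spectrum, bin_hz):
    """Sum one frame's power into the three bands (bin k sits at k * bin_hz)."""
    energy = {name: 0.0 for name in BANDS}
    for k, power in enumerate(power_spectrum):
        freq = k * bin_hz
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                energy[name] += power
                break
    return energy

class RollingBalance:
    """Keep the most recent frames and report each band's share of their energy."""

    def __init__(self, frames_per_second=30, seconds=10):
        # deque(maxlen=...) silently drops the oldest frame once the window is full.
        self.frames = deque(maxlen=frames_per_second * seconds)

    def push(self, energy):
        self.frames.append(energy)

    def shares(self):
        totals = {name: sum(f[name] for f in self.frames) for name in BANDS}
        grand = sum(totals.values())
        if grand == 0:
            return {name: 0.0 for name in BANDS}
        return {name: t / grand for name, t in totals.items()}
```

Summing energy across the window before normalizing means loud frames count for more than quiet ones. Averaging per-frame percentages instead would weight every frame equally, and which of the two the app uses is not specified.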

Why Are the Scores Always Near 100%?

It's a binary classifier. The model was trained with exactly two labels: Feminine and Masculine. There's no middle ground in the output. Every analysis picks one side, and the two scores always add up to 100%.

The math pushes scores to extremes. The model ends with a softmax function, which turns the gap between its two raw internal scores (logits) into probabilities and amplifies even modest gaps into very lopsided results. A gap of just 5 already comes out as roughly 99.3% vs 0.7%. Typical male and female voices differ quite a bit in pitch and formant frequencies, so the model is rarely uncertain, and the output reflects that.
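The effect is easy to check with a few lines of code. This is a generic two-class softmax, not the model's own code, and the logit values are made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logit gaps between the two classes.
for gap in (1.0, 2.0, 5.0):
    winner, loser = softmax([gap, 0.0])
    print(f"gap {gap}: {winner:.1%} vs {loser:.1%}")
# gap 1.0: 73.1% vs 26.9%
# gap 2.0: 88.1% vs 11.9%
# gap 5.0: 99.3% vs 0.7%
```

Only the difference between the two logits matters: adding the same constant to both leaves the probabilities unchanged.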

A 99% Feminine score doesn't mean your voice is locked in. It just means the AI made a confident binary call based on what it heard in that clip. Voices exist on a spectrum; this meter is one model's snapshot of a single moment.
