Voice recognition

Created: February 4, 2017 / Updated: December 21, 2025 / Status: in progress / Readability: technical / 2 min read (~201 words)
artificial-general-intelligence

Mel-frequency cepstrum (MFC) / mel-frequency cepstral coefficients (MFCC)
Recognize speakers using per-speaker models
- Start with a single model for all speakers, gradually work out when a given speaker is talking, then retrain that speaker's individual model so it becomes more and more accurate
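One way to picture the incremental per-speaker idea is as online clustering of per-segment feature vectors (for example, averaged MFCCs). The sketch below is an assumption made for illustration, not the method this project settled on: `SpeakerModel`, `assign`, and `new_speaker_threshold` are hypothetical names, each "model" is only a running mean, and a real system would use a proper statistical model per speaker.

```python
import math


class SpeakerModel:
    """Hypothetical speaker model: the running mean of the vectors assigned to it."""

    def __init__(self, vector):
        self.mean = list(vector)
        self.count = 1

    def distance(self, vector):
        # Euclidean distance between the model mean and a new feature vector.
        return math.dist(self.mean, vector)

    def update(self, vector):
        # Incremental mean update, so the model sharpens as more speech arrives.
        self.count += 1
        self.mean = [m + (v - m) / self.count for m, v in zip(self.mean, vector)]


def assign(models, vector, new_speaker_threshold):
    """Assign a segment to the closest model, or start a new model if none is close."""
    if models:
        best = min(range(len(models)), key=lambda i: models[i].distance(vector))
        if models[best].distance(vector) <= new_speaker_threshold:
            models[best].update(vector)
            return best
    models.append(SpeakerModel(vector))
    return len(models) - 1


models = []
segments = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.1, 0.0]]
labels = [assign(models, seg, new_speaker_threshold=1.0) for seg in segments]
```

The first segment seeds the single shared model. Later segments either refine an existing model or split off a new speaker, depending on the threshold, which would need tuning on real data.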

Determine vocal tract size
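A rough first approximation of vocal tract length treats the tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at F_n = (2n - 1) * c / (4 * L). The sketch below inverts that relation for each measured formant and averages the results. The function name and the speed of sound value (350 m/s) are assumptions, and the uniform-tube model only fits a neutral, schwa-like vowel reasonably well.

```python
def estimate_vocal_tract_length(formants_hz, speed_of_sound=350.0):
    """Estimate vocal tract length in meters from formants F1, F2, ... (in Hz).

    Assumes a uniform tube closed at one end, where F_n = (2n - 1) * c / (4 * L).
    """
    if not formants_hz:
        raise ValueError("at least one formant is required")
    estimates = [
        (2 * n - 1) * speed_of_sound / (4.0 * f)
        for n, f in enumerate(formants_hz, start=1)
    ]
    # Average the per-formant estimates to reduce the effect of any single formant.
    return sum(estimates) / len(estimates)


length_m = estimate_vocal_tract_length([500.0, 1500.0, 2500.0])
```

With formants at 500, 1500, and 2500 Hz, every per-formant estimate works out to 0.175 m under these assumptions.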

The goal of this project is to recognize a person from a recording of their voice.

Record audio
Convert a short window (e.g., 20 ms long) of the signal into the frequency domain using a fast Fourier transform
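The two steps above can be sketched in plain Python as follows. This is a minimal sketch, not the project's implementation: the names `fft` and `frame_spectrum`, the Hamming window, and the zero-padding to the next power of two are all assumptions made for illustration. In practice a library routine such as `numpy.fft.rfft` would normally replace the hand-written transform.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def frame_spectrum(signal, sample_rate, start, window_ms=20.0):
    """Return (magnitudes, bin_width_hz) for one windowed frame of the signal."""
    size = int(sample_rate * window_ms / 1000)
    frame = signal[start:start + size]
    if len(frame) < size:
        raise ValueError("not enough samples for a full frame")
    # A Hamming window reduces spectral leakage at the frame edges.
    windowed = [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (size - 1)))
        for i, s in enumerate(frame)
    ]
    # Zero-pad to the next power of two, as required by the radix-2 FFT above.
    n_fft = 1
    while n_fft < size:
        n_fft *= 2
    spectrum = fft(windowed + [0.0] * (n_fft - size))
    # Keep only the non-negative frequencies (the input is real-valued).
    magnitudes = [abs(c) for c in spectrum[: n_fft // 2 + 1]]
    return magnitudes, sample_rate / n_fft


# Usage: one second of a synthetic 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(sample_rate)]
magnitudes, bin_hz = frame_spectrum(tone, sample_rate, start=0)
peak_hz = max(range(len(magnitudes)), key=magnitudes.__getitem__) * bin_hz
```

At 16 kHz a 20 ms frame holds 320 samples, which is padded to 512 here. Sliding `start` forward by a fixed hop would produce a spectrogram, one frame at a time.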

Factors of variation
- Age
- Sex
- Accent
- Words spoken

Application matters
- Styles of speech
  - Read
  - Conversational
  - Spontaneous
  - Command/control
- Issues
  - Disfluency/Stuttering
  - Noise
  - Microphone quality/Number of channels
  - Far field
  - Reverb/Echo
  - Lombard effect
  - Speaker accents

Vowels can be classified by their first two formants (F1 and F2)
- Formants are resonant frequencies that can roughly be associated with the size of specific cavities in the vocal tract
- F1: Pharyngeal cavity
- F2: Front cavity
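As an illustration of classifying vowels from F1 and F2, the sketch below picks the reference vowel whose formant pair lies closest to a measured pair. The reference values are approximate averages commonly quoted for adult male American English speakers and should be treated as illustrative only. The table, the ASCII vowel labels, and the function name `classify_vowel` are assumptions. Plain Euclidean distance in Hz is used for simplicity, and a perceptual scale such as mel would be a reasonable alternative.

```python
import math

# Approximate (F1, F2) reference values in Hz; illustrative, not authoritative.
REFERENCE_FORMANTS = {
    "i": (270.0, 2290.0),   # as in "beet"
    "ae": (660.0, 1720.0),  # as in "bat"
    "a": (730.0, 1090.0),   # as in "father"
    "u": (300.0, 870.0),    # as in "boot"
}


def classify_vowel(f1_hz, f2_hz, references=REFERENCE_FORMANTS):
    """Return the label of the reference vowel nearest to (f1_hz, f2_hz)."""
    return min(
        references,
        key=lambda label: math.dist(references[label], (f1_hz, f2_hz)),
    )


high_front = classify_vowel(280.0, 2250.0)
low_back = classify_vowel(700.0, 1100.0)
```

Because formant values shift with age, sex, and vocal tract size, a fixed table like this would need per-speaker normalization before it could be relied on.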