Voice recognition
Created: February 4, 2017 / Updated: July 24, 2018 / Status: in progress / 2 min read (~201 words)
- Mel-frequency cepstrum (MFC) / mel-frequency cepstral coefficients (MFCC)
- Recognize speakers using per-speaker models
- Start with a single model for all speakers, gradually work out when a given speaker is talking, then retrain that speaker's individual model so it becomes more and more accurate
- Determine vocal tract size
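
As a rough illustration of the per-speaker model idea above, the sketch below fits one diagonal Gaussian per speaker and picks the speaker whose model gives the highest average log-likelihood. Everything here is an assumption made for illustration: the `DiagonalGaussian` class, the `identify` helper, the speaker names, and the synthetic 13-dimensional features standing in for MFCC frames. A single Gaussian is a deliberately simple stand-in for whatever model is eventually used, and the sketch does not cover adapting from a shared model.

```python
import numpy as np


class DiagonalGaussian:
    """Hypothetical per-speaker model: one Gaussian with a diagonal covariance."""

    def __init__(self, features):
        features = np.asarray(features, dtype=float)
        self.mean = features.mean(axis=0)
        # Small floor keeps the variance strictly positive.
        self.var = features.var(axis=0) + 1e-6

    def avg_log_likelihood(self, features):
        """Average per-frame log-likelihood of `features` under this model."""
        features = np.asarray(features, dtype=float)
        z = (features - self.mean) ** 2 / self.var
        per_frame = -0.5 * (np.log(2 * np.pi * self.var) + z).sum(axis=1)
        return per_frame.mean()


def identify(models, features):
    """Return the name of the model that best explains the feature frames."""
    return max(models, key=lambda name: models[name].avg_log_likelihood(features))


# Synthetic stand-ins for MFCC frames (rows are frames, columns are coefficients).
rng = np.random.default_rng(0)
train = {
    "alice": rng.normal(loc=0.0, scale=1.0, size=(200, 13)),
    "bob": rng.normal(loc=3.0, scale=1.0, size=(200, 13)),
}
models = {name: DiagonalGaussian(frames) for name, frames in train.items()}

utterance = rng.normal(loc=3.0, scale=1.0, size=(50, 13))
guess = identify(models, utterance)
```

With real recordings, the synthetic arrays would be replaced by feature frames extracted from each speaker's audio.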
The goal of this project is to recognize a person based on a recording of their voice.
- Record audio
- Convert short windows of the signal (e.g., 20 ms long) into the frequency domain using a fast Fourier transform (FFT)
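
The windowing step could look roughly like the sketch below, which uses NumPy. The `frame_spectra` name, the 10 ms hop, the Hamming taper, and the 16 kHz synthetic tone are assumptions chosen for illustration; the notes only specify a window of about 20 ms and an FFT.

```python
import numpy as np


def frame_spectra(signal, sample_rate, window_ms=20.0, hop_ms=10.0):
    """Cut a mono signal into overlapping windows and return magnitude spectra.

    Returns (freqs, spectra), where spectra has one row per window.
    """
    window_len = int(round(sample_rate * window_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < window_len:
        raise ValueError("signal is shorter than one window")

    n_frames = 1 + (len(signal) - window_len) // hop_len
    taper = np.hamming(window_len)  # reduces spectral leakage at window edges
    frames = np.stack([
        signal[i * hop_len : i * hop_len + window_len] * taper
        for i in range(n_frames)
    ])

    spectra = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(window_len, d=1.0 / sample_rate)
    return freqs, spectra


# Usage: one second of a 1 kHz tone sampled at 16 kHz (synthetic input).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)

freqs, spectra = frame_spectra(tone, sample_rate)
peak_hz = freqs[np.argmax(spectra[0])]  # frequency bin with the most energy
```

At 16 kHz a 20 ms window holds 320 samples, so the frequency bins are 50 Hz apart. Longer windows give finer frequency resolution at the cost of time resolution.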
- Factors of variation
  - Age
  - Sex
  - Accent
  - Words spoken
- Application matters
- Styles of speech
  - Read
  - Conversational
  - Spontaneous
  - Command/control
- Issues
  - Disfluency/stuttering
  - Noise
  - Microphone quality/number of channels
  - Far field
  - Reverb/echo
  - Lombard effect
  - Speaker accents
  - Styles of speech
- Vowels can be classified by their first two formants (F1 and F2)
  - Formants are resonant frequencies that can roughly be associated with the size of specific cavities in the vocal tract
    - F1: Pharyngeal cavity
    - F2: Front cavity
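
To make the F1/F2 idea concrete, here is a minimal nearest-neighbor sketch that maps a measured (F1, F2) pair to the closest reference vowel. The `classify_vowel` function, the choice of a log-frequency distance, and the reference table are all assumptions for illustration. The table holds rough textbook-style averages for adult male speakers, and real values vary widely with speaker and context.

```python
import math

# Approximate (F1, F2) values in Hz, illustrative only.
REFERENCE_FORMANTS_HZ = {
    "i": (270.0, 2290.0),   # as in "beet"
    "ae": (660.0, 1720.0),  # as in "bat"
    "a": (730.0, 1090.0),   # as in "father"
    "u": (300.0, 870.0),    # as in "boot"
}


def classify_vowel(f1_hz, f2_hz, references=REFERENCE_FORMANTS_HZ):
    """Return the reference vowel closest to (f1_hz, f2_hz).

    Distance is measured on log-frequency axes so that a given ratio counts
    the same at low and high frequencies (a design choice, not a standard).
    """
    def distance(item):
        ref_f1, ref_f2 = item[1]
        return math.hypot(math.log(f1_hz / ref_f1), math.log(f2_hz / ref_f2))

    return min(references.items(), key=distance)[0]


# Usage: a low F1 with a high F2 lands on the front close vowel.
guess = classify_vowel(290.0, 2200.0)
```

Estimating F1 and F2 from audio in the first place (for example from peaks in a smoothed spectrum) is a separate step that this sketch does not cover.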