Voice recognition

Created: February 4, 2017 / Updated: July 24, 2018 / Status: in progress / 2 min read (~201 words)

  • Mel-frequency cepstrum (MFC) / mel-frequency cepstral coefficients (MFCC)
  • Recognize speakers using per-speaker models
    • Start with a single model for all speakers, gradually work out when a given speaker is talking, then retrain that speaker's individual model so it becomes more accurate over time
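
The per-speaker idea above can be sketched roughly as follows. This is a minimal illustration, assuming each speaker is modeled by a single diagonal-covariance Gaussian over feature vectors (for instance MFCC frames) and that NumPy is available. The names `DiagonalGaussian` and `identify`, the speaker labels, and the synthetic features are all hypothetical, and the gradual retraining step is not shown.

```python
import numpy as np


class DiagonalGaussian:
    """Toy speaker model: one Gaussian with a diagonal covariance (an assumption)."""

    def __init__(self, frames):
        # frames has shape (n_frames, n_features), e.g. one MFCC vector per row.
        self.mean = frames.mean(axis=0)
        self.var = frames.var(axis=0) + 1e-6  # small floor avoids division by zero

    def avg_log_likelihood(self, frames):
        # Average per-frame log-likelihood under the diagonal Gaussian.
        diff = frames - self.mean
        per_frame = -0.5 * np.sum(
            np.log(2 * np.pi * self.var) + diff**2 / self.var, axis=1
        )
        return float(per_frame.mean())


def identify(models, frames):
    """Return the name of the speaker model that best explains the frames."""
    return max(models, key=lambda name: models[name].avg_log_likelihood(frames))


# Synthetic stand-in features: two hypothetical speakers with different means.
rng = np.random.default_rng(0)
train = {
    "alice": rng.normal(loc=0.0, scale=1.0, size=(500, 13)),
    "bob": rng.normal(loc=3.0, scale=1.0, size=(500, 13)),
}
models = {name: DiagonalGaussian(frames) for name, frames in train.items()}

unknown = rng.normal(loc=3.0, scale=1.0, size=(50, 13))
print(identify(models, unknown))
```

A single Gaussian is likely too simple for real voices; it is used here only to keep the scoring step (pick the model with the highest likelihood) easy to read.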

  • Determine vocal tract size
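
One rough way to estimate vocal tract length is sketched below. It assumes the tract behaves like a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1) * c / (4 * L), and a speed of sound of 350 m/s. Both are simplifying assumptions, and the function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air in the vocal tract


def tract_length_from_formant(freq_hz, n=1, c=SPEED_OF_SOUND_M_S):
    """Estimate tube length in metres from the n-th formant.

    Assumes a uniform tube closed at one end, so F_n = (2n - 1) * c / (4 * L).
    """
    if freq_hz <= 0 or n < 1:
        raise ValueError("freq_hz must be positive and n must be >= 1")
    return (2 * n - 1) * c / (4.0 * freq_hz)


def estimate_tract_length(formants_hz):
    """Average the per-formant length estimates (formants given as F1, F2, ...)."""
    estimates = [
        tract_length_from_formant(f, n) for n, f in enumerate(formants_hz, start=1)
    ]
    return sum(estimates) / len(estimates)


# A neutral vowel with formants near 500, 1500 and 2500 Hz gives about 0.175 m.
print(round(estimate_tract_length([500.0, 1500.0, 2500.0]), 3))
```

Real vowels deviate from the uniform-tube pattern, so this kind of estimate is probably most meaningful on neutral vowels or when averaged over a lot of speech.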

The goal of this project is to recognize a person from a recording of their voice.

  • Record audio
  • Convert each window of the signal (e.g., 20 ms long) into the frequency domain using a fast Fourier transform
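
The windowing and transform step above could look roughly like this. It is a sketch, assuming NumPy, a mono signal already loaded as an array, a Hann taper, and a 10 ms hop between windows; none of these choices come from the notes. The function name `frame_spectra` is hypothetical, and a synthetic tone stands in for the recording.

```python
import numpy as np


def frame_spectra(signal, sample_rate, window_ms=20.0, hop_ms=10.0):
    """Split a mono signal into overlapping windows and return magnitude spectra."""
    win = int(round(sample_rate * window_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < win:
        raise ValueError("signal is shorter than one window")
    n_frames = 1 + (len(signal) - win) // hop
    taper = np.hanning(win)  # reduces spectral leakage at the window edges
    frames = np.stack(
        [signal[i * hop : i * hop + win] * taper for i in range(n_frames)]
    )
    spectra = np.abs(np.fft.rfft(frames, axis=1))  # one spectrum per row
    freqs = np.fft.rfftfreq(win, d=1.0 / sample_rate)  # bin centres in Hz
    return freqs, spectra


# Stand-in for a recording: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

freqs, spectra = frame_spectra(signal, sample_rate)
peak_hz = freqs[spectra[0].argmax()]
print(spectra.shape, peak_hz)
```

With a 20 ms window the frequency bins are 50 Hz apart, so the reported peak can only be as precise as the nearest bin; longer windows trade time resolution for finer frequency resolution.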

  • Factors of variation
    • Age
    • Sex
    • Accent
    • Words spoken

  • Application matters
    • Styles of speech
      • Read
      • Conversational
      • Spontaneous
      • Command/control
    • Issues
      • Disfluency/Stuttering
      • Noise
      • Microphone quality/Number of channels
      • Far field
      • Reverb/Echo
      • Lombard effect
      • Speaker accents

  • Vowels can be classified by their first two formants (F1 and F2)
    • Formants are resonant frequencies that can roughly be associated with the size of specific cavities in the vocal tract
    • F1: Pharyngeal cavity
    • F2: Front cavity
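
A nearest-neighbor lookup is one simple way to illustrate classifying a vowel from F1 and F2. The sketch below assumes the formants have already been measured. The reference values are rough figures for an adult male speaker and are illustrative assumptions only; the function name `classify_vowel` and the choice of a log-frequency distance are hypothetical as well.

```python
import math

# Approximate (F1, F2) values in Hz for three vowels (illustrative assumptions;
# real values vary with speaker, accent and context).
REFERENCE_FORMANTS = {
    "i": (270.0, 2290.0),  # as in "beet"
    "a": (730.0, 1090.0),  # as in "father"
    "u": (300.0, 870.0),   # as in "boot"
}


def classify_vowel(f1, f2, references=REFERENCE_FORMANTS):
    """Return the reference vowel closest to (f1, f2) in log-frequency space."""

    def distance(ref):
        r1, r2 = ref
        # Log ratios make a 10% shift count the same at low and high frequencies.
        return math.hypot(math.log(f1 / r1), math.log(f2 / r2))

    return min(references, key=lambda name: distance(references[name]))


print(classify_vowel(290.0, 2200.0))
print(classify_vowel(700.0, 1150.0))
```

Because formant values shift with vocal tract size, a fixed reference table like this would likely need per-speaker normalization before it worked across ages and sexes.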