Research

Thesis: Probabilistic source-filter model for speech


Speech production:

Speech is a crucial part of human life. I am particularly interested in the process that converts thoughts into pressure variations along the airway, from the lungs to the lips. The intriguing part is that we only get to observe the one-dimensional pressure variation outside the lips.

The question is: “What can we infer from this one-dimensional signal about all the underlying processes?”

In my view, there are two main components of the speech generation process:

  1.  Learning: thoughts are mapped to time-varying muscle controls, and the muscles set the geometric configuration of the airway from the lungs to the lips.
  2.  Physics:  given the muscle controls, physical laws govern how the resulting pressure variation is created from the lungs to the lips.

Understanding:

I am working on building generative models from the laws of physics (the Navier-Stokes equations) to explain the speech generation process. The model should be able to describe general fluid motion, together with the acoustic field, through a time-varying tube. In my view, the main criterion is:

“It is not enough for the model to explain speech (as a CFD simulation does). Given the speech, it should also be possible to estimate the model's interpretable parameters.”
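As a minimal sketch of this criterion, and explicitly not the Navier-Stokes model itself, the classical linear source-filter model already shows what "generate, then invert" can look like. An impulse train at the pitch period stands in for the glottal source, a second-order all-pole filter stands in for a single vocal-tract resonance, and linear prediction (the autocorrelation method solved with the Levinson-Durbin recursion) recovers the filter coefficients from the waveform alone. The coefficients are then mapped back to an interpretable formant frequency and bandwidth. The sampling rate, pitch, formant values and all function names are hypothetical choices for illustration, and the sketch is deterministic, so it does not capture the probabilistic side of the model.

```python
import math

# Illustrative sketch: classical LPC view of the source-filter model,
# NOT the physics-based (Navier-Stokes) model. All values are hypothetical.


def resonator_coeffs(formant_hz, bandwidth_hz, fs):
    """All-pole coefficients [a1, a2] for one formant resonance.

    Pole radius r = exp(-pi * B / fs), pole angle theta = 2*pi*F / fs;
    the filter is s[n] = e[n] + a1*s[n-1] + a2*s[n-2].
    """
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2.0 * math.pi * formant_hz / fs
    return [2.0 * r * math.cos(theta), -r * r]


def synthesize(a, n_samples, period):
    """Impulse-train 'glottal' source passed through an all-pole 'vocal tract'."""
    s = []
    for n in range(n_samples):
        val = 1.0 if n % period == 0 else 0.0  # source: one pulse per pitch period
        for k, ak in enumerate(a):             # filter: recursive all-pole part
            if n - 1 - k >= 0:
                val += ak * s[n - 1 - k]
        s.append(val)
    return s


def autocorr(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of the whole signal."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for predictor coefficients a,
    with s[n] ~ sum_k a[k] * s[n-1-k]. Returns (a, final prediction error)."""
    a, err = [], r[0]
    for i in range(order):
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k]
        err *= 1.0 - k * k
    return a, err


def formant_from_coeffs(a, fs):
    """Map 2nd-order coefficients back to (formant Hz, bandwidth Hz).
    Assumes a complex-conjugate pole pair (a[1] < 0, |a[0]| < 2*sqrt(-a[1]))."""
    r = math.sqrt(-a[1])
    theta = math.acos(a[0] / (2.0 * r))
    return theta * fs / (2.0 * math.pi), -math.log(r) * fs / math.pi


fs = 8000                                    # sampling rate in Hz (hypothetical)
true_a = resonator_coeffs(500.0, 200.0, fs)  # hypothetical vowel-like formant
speech = synthesize(true_a, n_samples=4000, period=80)  # 100 Hz pitch
est_a, _ = levinson_durbin(autocorr(speech, 2), order=2)

print("true a:", [round(v, 3) for v in true_a])
print("est  a:", [round(v, 3) for v in est_a])
print("formant, bandwidth (Hz):", [round(v, 1) for v in formant_from_coeffs(est_a, fs)])
```

With these settings the estimated coefficients should land close to the ones used for synthesis, because the filter's impulse response has almost fully decayed within one pitch period. With shorter pitch periods or sharper resonances, the autocorrelation method becomes biased by the periodic excitation, which is one motivation for modelling the source explicitly rather than treating it as a nuisance.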

Application:

I am also motivated by the following (modified) quote from the WaveNet paper:

We believe that WaveNets provide a generic and flexible framework for tackling many applications that rely on audio generation (e.g. TTS, speech enhancement, voice conversion, speech separation).

I believe that models motivated by speech production can likewise provide a unified framework applicable to multiple problems, especially in the biomedical domain.

Speech Problems:

  1. Voice conversion
  2. Speech enhancement
  3. Spoof detection
  4. Speaker identification

Biomedical:

  1. Snorer group classification
  2. Parkinson's disease
  3. ALS
  4. Glottal dysfunction
  5. Cold detection
  6. Asthma detection

Other:

  1. Inferring information about the heart from heart sounds (modelling the oscillations of the heart valves)
  2. Lung sound classification