Research Expertise
We are actively engaged in research and have developed original techniques and algorithms in the following areas.
The techniques and code are available for outright sale on either an exclusive or a non-exclusive basis.
We undertake projects to customer requirements and offer consultancy.
Below is a brief presentation of our three major areas of research.
Speech Recognition:
Emphasis is on the front end: finding robust, speaker-independent, context-independent acoustic features.
Text-to-Speech Synthesis:
Emphasis is on articulatory modeling and the development of a synthesis model.
Speech Coding:
Use of a voice source model in LP (Voice Excited Linear Prediction, VSLP) and formant-based synthesis.
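The coding item above rests on linear prediction (LP). As a reference point, here is a minimal sketch of textbook LP analysis (the autocorrelation method with the Levinson-Durbin recursion) together with the common practice of reading formant estimates off the roots of the LP polynomial. It is not the VSLP coder or the formant tracker described on this page. The function names, the Hamming window, the 90 Hz / 400 Hz pruning limits and the synthetic two-resonance test signal are all assumptions made for illustration.

```python
import numpy as np


def lpc(frame, order):
    """LP polynomial a (a[0] = 1) and residual energy, using the
    autocorrelation method and the Levinson-Durbin recursion."""
    w = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient of stage i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        # Update a[1..i-1] from the previous stage, then append k
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def formants_from_lpc(a, fs, min_hz=90.0, max_bw_hz=400.0):
    """Formant estimates (Hz) from the complex roots of A(z)."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)   # pole angle -> frequency
    bws = -np.log(np.abs(roots)) * fs / np.pi    # pole radius -> bandwidth
    keep = (freqs > min_hz) & (bws < max_bw_hz)  # drop implausible candidates
    return np.sort(freqs[keep])


# Hypothetical test signal: white noise through resonances at 500 and 1500 Hz
fs = 8000
poles = [0.97 * np.exp(2j * np.pi * f / fs) for f in (500.0, 1500.0)]
a_true = np.real(np.poly(poles + [p.conjugate() for p in poles]))
rng = np.random.default_rng(0)
e = rng.standard_normal(fs)
x = np.zeros(fs)
for n in range(fs):
    # All-pole synthesis: x[n] = e[n] - sum_j a_true[j] * x[n - j]
    x[n] = e[n] - sum(a_true[j] * x[n - j] for j in range(1, 5) if n >= j)
a_est, _ = lpc(x, order=4)
print(formants_from_lpc(a_est, fs))
```

Because the LP order matches the two-resonance source, the printed estimates should lie close to 500 Hz and 1500 Hz. Real speech needs short frames, a higher order and some continuity constraint between frames before this becomes a tracker.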
Speech Recognition Techniques
- Manner classification and segment boundary detection: robust identification of silence, voiced (vowel vs. voiced consonant) and unvoiced (fricative vs. burst) segments. Silence detection is not merely a threshold on the intensity: non-speech regions whose intensity is above the threshold are still marked as silence. Segment boundaries are located using a new definition of the slope of intensity.
- Invariant acoustic cues: recognition is based on robust 'acoustic properties' referred to as computational distinctive features. Vowel identification is based on the computed features Front or Back, High or Low, and Jaw Open or Jaw Close. Each computed feature is compared with a threshold. The threshold is independent of speaker, gender and context, and the approach is speaker independent and language independent. Highly successful vowel identification has been achieved.
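The manner classification and boundary detection in the first item above can be approximated, for illustration, with frame-level energy, zero-crossing rate and autocorrelation periodicity. This is a generic stand-in and not the algorithm referred to here. The rules and every threshold (a -50 dB energy floor, a 0.25 zero-crossing rate, 0.5 periodicity, 6 dB per frame of slope) are assumptions. The boundary detector uses a plain frame-difference slope of smoothed log energy, not the page's "new definition of slope of intensity". Vowel vs. voiced consonant and fricative vs. burst are not separated. The sketch does show one way a frame above the energy floor can still be labelled silence.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def classify_frames(x, fs, frame_ms=25.0, hop_ms=10.0, floor_db=-50.0,
                    zcr_min=0.25, periodicity_min=0.5):
    """Label each frame 'S' (silence), 'V' (voiced) or 'U' (unvoiced).

    Hypothetical rules: below the energy floor -> 'S'; strongly periodic in
    the 60-400 Hz pitch range -> 'V'; aperiodic with a high zero-crossing
    rate -> 'U'; anything else above the floor -> still 'S' (non-speech).
    """
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    zcr = np.mean(np.diff(np.sign(frames), axis=1) != 0, axis=1)
    lag_min, lag_max = int(fs / 400), int(fs / 60)
    labels = []
    for frame, e_db, z in zip(frames, energy_db, zcr):
        f = frame - frame.mean()
        r = np.correlate(f, f, mode="full")[frame_len - 1:]  # lags 0..frame_len-1
        periodicity = r[lag_min:lag_max + 1].max() / (r[0] + 1e-12)
        if e_db < floor_db:
            labels.append("S")
        elif periodicity > periodicity_min:
            labels.append("V")
        elif z > zcr_min:
            labels.append("U")
        else:
            labels.append("S")
    return labels, energy_db


def segment_boundaries(energy_db, hop_s, frame_s, min_slope_db=6.0):
    """Frame-centre times (s) where the intensity slope peaks above a threshold."""
    padded = np.pad(energy_db, 1, mode="edge")
    smooth = np.convolve(padded, np.ones(3) / 3, mode="valid")  # 3-frame average
    mag = np.abs(np.gradient(smooth))                             # dB per frame
    return [i * hop_s + frame_s / 2 for i in range(1, len(mag) - 1)
            if mag[i] >= min_slope_db and mag[i] >= mag[i - 1] and mag[i] > mag[i + 1]]


# Hypothetical test signal: 0.3 s each of near-silence, a 150 Hz harmonic
# tone, white noise, and near-silence again
fs = 16000
rng = np.random.default_rng(0)
t = np.arange(int(0.3 * fs)) / fs
voiced = sum(0.3 * np.sin(2 * np.pi * f0 * t) for f0 in (150, 300, 450))
x = np.concatenate([1e-4 * rng.standard_normal(t.size), voiced,
                    0.1 * rng.standard_normal(t.size), 1e-4 * rng.standard_normal(t.size)])
labels, energy_db = classify_frames(x, fs)
print("".join(labels))
print(segment_boundaries(energy_db, hop_s=0.01, frame_s=0.025))
```

With these toy settings, the silence/speech edges near 0.3 s and 0.9 s produce large slope peaks. The smaller voiced-to-unvoiced energy step at 0.6 s falls below the 6 dB threshold. A plain slope like this has that weakness, and it is the kind of case a better-defined intensity slope would aim to handle.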
The figure below (left: male speaker; right: female speaker) shows the computed features High-Low and Front-Back for the vowels 'aa', 'ee', 'A', 'O' and 'oo'. The red line indicates the threshold. Note that vowel 'aa' is correctly identified as 'Back-Low', 'ee' as 'High-Front', 'A' as diphthongised, 'O' as 'Mid-Back' and 'oo' as 'High-Back'.
The figure below on the left shows the computed features High-Low, Front-Back and Jaw Open-Close for the vowel 'ee' in the word 'speech' spoken by four speakers. The red line indicates the threshold. Note that 'ee' is correctly identified as 'High-Front' for all speakers. The figure on the right shows 'ee' in different contexts (the E-set: bee, cee, dee, etc.) spoken by the same speaker. Again, 'ee' is correctly identified as 'High-Front' in all contexts.
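The page does not say how the High-Low and Front-Back features are computed. The sketch below illustrates the general idea of a binary feature compared with a fixed, speaker-independent threshold by swapping in a different, published technique: Bark-scale formant differences in the style of Syrdal and Gopal, with a 3 Bark critical distance. Under this rule a vowel is High if Z(F1) - Z(F0) < 3 and Front if Z(F3) - Z(F2) < 3. The Hz-to-Bark formula is Traunmüller's approximation. The formant values are approximate adult averages (after Peterson and Barney, 1952) and are used only as illustrative inputs. Jaw Open-Close and mid vowels are not covered.

```python
def hz_to_bark(f):
    """Traunmüller (1990) approximation of the Bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53


def vowel_features(f0, f1, f2, f3, critical_bark=3.0):
    """Binary High/Low and Front/Back decisions from Bark differences.

    Assumed features (Syrdal & Gopal style, not the features on this page):
      High-Low   : Z(F1) - Z(F0) < threshold -> High, else Low
      Front-Back : Z(F3) - Z(F2) < threshold -> Front, else Back
    """
    high_low = hz_to_bark(f1) - hz_to_bark(f0)
    front_back = hz_to_bark(f3) - hz_to_bark(f2)
    height = "High" if high_low < critical_bark else "Low"
    place = "Front" if front_back < critical_bark else "Back"
    return height + "-" + place


# (F0, F1, F2, F3) in Hz, approximate adult averages
examples = {
    "ee (male)": (136, 270, 2290, 3010),
    "ee (female)": (235, 310, 2790, 3310),
    "aa (male)": (124, 730, 1090, 2440),
    "oo (male)": (141, 300, 870, 2240),
}
for name, formants in examples.items():
    print(name, vowel_features(*formants))
```

The same 3 Bark threshold labels both the male and the female 'ee' as High-Front. That is the kind of speaker independence the text describes, obtained here from a normalised feature rather than raw formant frequencies.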
The fricatives 's', 'sh' and 'z' are also identified with high accuracy.
The results are used in articulation therapy for hard-of-hearing children, for example to distinguish 'snake' vs. 'sheep' or 'snake' vs. 'zebra', or to teach phonation (shown as sunshine) vs. frication (shown as rainfall).
Work on classification of other phonemes is in progress.
Speaker adaptation: adaptive estimation and cancellation of the spectral tilt caused by the speaker's voice.
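A common way to measure spectral tilt is the first-order LP coefficient r(1)/r(0) of each frame, where a value near 1 means a steeply falling spectrum. The sketch below assumes this measure. It smooths the measure across frames with an exponential forgetting factor and cancels the tilt with the inverse filter y[n] = x[n] - a x[n-1]. It illustrates adaptive tilt estimation and cancellation in general and is not the method used here. The function name, frame_len = 400 and forget = 0.9 are arbitrary, hypothetical choices.

```python
import numpy as np


def cancel_spectral_tilt(x, frame_len=400, forget=0.9):
    """Adaptively estimate and remove first-order spectral tilt.

    Per frame, the tilt is measured as r(1)/r(0); the running estimate is
    updated with an exponential forgetting factor and then removed with the
    first-order inverse filter y[n] = x[n] - a * x[n-1].
    """
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    a = 0.0      # running tilt estimate
    prev = 0.0   # last input sample of the previous frame
    history = []
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        r0 = np.dot(frame, frame)
        r1 = np.dot(frame[1:], frame[:-1])
        if r0 > 0:
            a = forget * a + (1.0 - forget) * (r1 / r0)
        delayed = np.concatenate(([prev], frame[:-1]))
        y[start:start + len(frame)] = frame - a * delayed
        prev = frame[-1]
        history.append(a)
    return y, np.array(history)


# Hypothetical test: AR(1) noise with a strong low-pass tilt (coefficient 0.9)
rng = np.random.default_rng(0)
e = rng.standard_normal(32000)
x = np.zeros_like(e)
x[0] = e[0]
for n in range(1, len(e)):
    x[n] = 0.9 * x[n - 1] + e[n]
y, tilt = cancel_spectral_tilt(x)
print(round(float(tilt[-1]), 2))
```

After a few dozen frames, the estimate settles near the true coefficient and the output is close to spectrally flat. The forgetting factor trades tracking speed against the noise in the estimate.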
Synthesis Model for TTS
- Default articulatory positions and their dynamics are specified
- Articulatory parameters are interpolated
- Formant data are computed
- Anantha's Voice Source Model is used
- Default intonation is used
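The steps listed above (interpolate parameters, obtain formants, excite with a voice source, apply intonation) can be pictured with a small formant synthesiser. This is only a sketch under stated assumptions. The articulatory-to-formant mapping is omitted, so formant targets are interpolated directly. Anantha's Voice Source Model is not described here, so a Rosenberg glottal pulse stands in for it. The resonators follow the Klatt (1980) second-order form. The "default intonation" is simply a linearly falling F0. All function names, the 16 kHz rate, the bandwidths and the target values are hypothetical.

```python
import numpy as np

FS = 16000  # sample rate in Hz (assumed)


def rosenberg_pulse(period, open_frac=0.4, close_frac=0.16):
    """One period of a Rosenberg glottal flow pulse (stand-in source model)."""
    n = np.arange(period)
    tp, tn = open_frac * period, close_frac * period
    g = np.zeros(period)
    opening = n <= tp
    g[opening] = 0.5 * (1.0 - np.cos(np.pi * n[opening] / tp))
    closing = (n > tp) & (n <= tp + tn)
    g[closing] = np.cos(np.pi * (n[closing] - tp) / (2.0 * tn))
    return g


def resonator_coeffs(freq, bw, fs=FS):
    """Second-order digital resonator coefficients (A, B, C), Klatt form."""
    c = -np.exp(-2.0 * np.pi * bw / fs)
    b = 2.0 * np.exp(-np.pi * bw / fs) * np.cos(2.0 * np.pi * freq / fs)
    return 1.0 - b - c, b, c


def synthesize(times, formant_targets, f0_targets, duration,
               bandwidths=(60.0, 90.0, 120.0), fs=FS):
    """Interpolate targets, excite with glottal pulses, filter by formants."""
    n = int(duration * fs)
    t = np.arange(n) / fs
    targets = np.asarray(formant_targets, dtype=float)  # shape (points, 3)
    # 1. Linear interpolation of the parameter tracks between target times
    tracks = [np.interp(t, times, targets[:, k]) for k in range(3)]
    f0 = np.interp(t, times, f0_targets)
    # 2. Voice source: one pulse per pitch period, period set by local F0
    source = np.zeros(n)
    i = 0
    while i < n:
        period = int(fs / f0[i])
        end = min(i + period, n)
        source[i:end] = rosenberg_pulse(period)[:end - i]
        i += period
    signal = np.diff(source, prepend=0.0)  # flow derivative ~ lip radiation
    # 3. Cascade of time-varying formant resonators F1 -> F2 -> F3
    for k in range(3):
        a, b, c = resonator_coeffs(tracks[k], bandwidths[k], fs)
        out = np.zeros(n)
        y1 = y2 = 0.0
        for j in range(n):
            out[j] = a[j] * signal[j] + b[j] * y1 + c * y2
            y2, y1 = y1, out[j]
        signal = out
    return signal / (np.max(np.abs(signal)) + 1e-12)


# Hypothetical glide from 'ee'-like to 'aa'-like formants with falling F0
wave = synthesize(times=[0.0, 0.3],
                  formant_targets=[(270, 2290, 3010), (730, 1090, 2440)],
                  f0_targets=[130.0, 100.0], duration=0.3)
```

The coefficient A = 1 - B - C gives each resonator unit gain at 0 Hz, so the cascade's overall level depends mainly on the source. The final normalisation only keeps the output within [-1, 1].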
Sample outputs using the articulatory synthesiser*:
Ex.1: Selected English words
Ex.2: An English sentence
Ex.3: A Hindi sentence in the context of an announcement on a railway platform
See below for the text.
Coding based on VSLP and Formant vocoder models
- A good formant tracking algorithm has been developed
- Anantha's Voice Source Model is used
- Robust voiced, unvoiced, silence and burst detection
- Segment boundaries are detected
- Low bit rate coding can be achieved by interpolating the quantized parameters between segment boundaries
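The last item above can be illustrated with a toy coder for a single parameter track. Values are quantised and sent only at segment boundaries, and the decoder rebuilds every frame by linear interpolation. The following are all assumptions: a uniform 6-bit quantiser over 200-1000 Hz, boundaries that are already known, boundary positions sent as 7-bit frame indices, 10 ms frames, and a hypothetical F1 track. A real coder would handle several parameters and the voicing decisions together.

```python
import math

import numpy as np


def quantize(values, lo, hi, bits):
    """Uniform scalar quantiser: map values in [lo, hi] to integer codes."""
    levels = 2 ** bits - 1
    v = np.clip(values, lo, hi)
    return np.round((v - lo) / (hi - lo) * levels).astype(int)


def dequantize(codes, lo, hi, bits):
    """Inverse of quantize: integer codes back to parameter values."""
    return lo + np.asarray(codes) / (2 ** bits - 1) * (hi - lo)


def encode(track, boundaries, lo, hi, bits):
    """Transmit only the boundary frame indices and the quantised values there."""
    idx = np.asarray(boundaries)
    return idx, quantize(track[idx], lo, hi, bits)


def decode(idx, codes, n_frames, lo, hi, bits):
    """Rebuild every frame by linear interpolation between boundary values."""
    return np.interp(np.arange(n_frames), idx, dequantize(codes, lo, hi, bits))


def bit_rate(n_boundaries, n_frames, bits, duration_s):
    """Bits/s for one parameter: value code plus frame position per boundary."""
    pos_bits = math.ceil(math.log2(n_frames))
    return n_boundaries * (bits + pos_bits) / duration_s


# Hypothetical F1 track: 100 frames of 10 ms, a glide then a steady vowel
track = np.concatenate([np.linspace(300.0, 700.0, 50), np.full(50, 700.0)])
idx, codes = encode(track, [0, 49, 99], lo=200.0, hi=1000.0, bits=6)
rebuilt = decode(idx, codes, len(track), lo=200.0, hi=1000.0, bits=6)
print(np.max(np.abs(rebuilt - track)))         # at most half a step, about 6.3 Hz
print(bit_rate(len(idx), len(track), 6, 1.0))  # 39.0, vs 6 * 100 = 600 bit/s per frame
```

The saving comes from the track being close to piecewise linear between boundaries. Where it is not, more boundaries, and therefore more bits, are needed.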
Examples of coding:
VSLP model output*
Speaker transformation using formant coding:
Original*
Transformed*
* Wave files.
Ex.1: Alcohol, Bucket, Capacity, Elastic, Post-office, Typical, Traffic Police
Ex.2: Calcutta is a big city.
Ex.3: Bhoopaal Calcutta express chaar per aa rahee hain. (English: The Bhopal Calcutta Express is arriving on [platform] four.)