SSL – Speech Science Lab

A comprehensive software package for education and research in speech science.

SSL is an easy-to-operate, user-friendly software package that requires no programming skills. SSL works with 16-bit mono PCM WAV files. The analysis/synthesis tools of SSL are language independent and compatible with Windows XP/7.
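
For illustration, here is a minimal Python/SciPy sketch (not part of SSL) that reads a signal in the expected format; the file name speech.wav is only a placeholder:

```python
import numpy as np
from scipy.io import wavfile

# Read a WAV file; SSL expects 16-bit mono PCM.
fs, x = wavfile.read("speech.wav")   # placeholder file name

assert x.dtype == np.int16, "expected 16-bit PCM samples"
assert x.ndim == 1, "expected a mono (single-channel) signal"

# Convert to floating point in [-1, 1) for further processing.
x = x.astype(np.float64) / 32768.0
print(f"sampling rate: {fs} Hz, duration: {len(x)/fs:.2f} s")
```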

Application Areas of SSL

  • Education and Research in Speech Science, Phonetics, Linguistics, Speech and Hearing.
  • Forensic Sciences
  • Speech Analysis, Synthesis, Perception Studies
  • Speech analysis to generate parameters for speaker recognition, speech recognition, coding, etc.
  • Spectral analysis in general (chanting, music, machinery noise, animal calls, bird songs, etc.)

Several doctoral theses and graduate-level dissertations, as well as a large number of projects, have been carried out using SSL. Some examples: speech analysis of Indian languages; development of speech in children; voice transformation; analysis of frog calls and cuckoo calls; quality control of automobile horns.

There are two modules under SSL: SSL-T and SSL-R.

SSL-T

SSL-T can be used for a practical, self-teaching laboratory course in speech science and speech signal processing. SSL-T works only on the specific database supplied along with it. For live projects and research, SSL-R has to be used; SSL-R includes SSL-T.

An Experimental Course in Speech Science
SSL can be used as an educational tool for an experimental course in speech science. The necessary speech database is provided. A comprehensive course book guides the user while systematically explaining the concepts of speech science. It covers topics such as temporal and spectral properties of speech signals, spectrogram reading, articulatory acoustics, etc.

SSL-R

Extensive Documentation
SSL-R comes with extensive documentation covering:

  • User Guide, Technical Support Manual and Quick Reference Guide: The User Guide explains the overall organization of the software and the formats of the various files created by SSL. Block diagrams and details of the analysis/synthesis methods are also given. The Technical Support Manual and Quick Reference Guide explain the procedure for recording signals, precautions to be taken while recording, proper choice of microphones, troubleshooting, etc.
  • SSL Tutorials:
    • Analysis-Editing-Synthesis with duration scaling
    • Analysis-Synthesis-Perception: manipulation of formant transition duration, generation of a series of stimuli, and conducting a perception experiment
    • Speaker verification
    • Spectrograph generation
  • Experimental Course in Speech Science: same as in the SSL-T module

Modules in SSL

WaveSpec: Displays a selected segment of the signal at the cursor location, its autocorrelation function, and its short-time spectrum with/without the LP envelope.
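
WaveSpec's internal algorithms are not documented in this overview; the following Python sketch only illustrates, for a single windowed segment (a NumPy array `segment` sampled at `fs`), one conventional way of computing the displayed quantities: the autocorrelation function, the short-time spectrum, and an LP spectral envelope obtained by the autocorrelation method.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def wavespec_view(segment, fs, order=12, nfft=512):
    """Autocorrelation, short-time spectrum and LP envelope of one segment (illustrative)."""
    x = segment * np.hamming(len(segment))            # taper the selected segment
    r = np.correlate(x, x, mode="full")[len(x)-1:]    # autocorrelation, lags 0..N-1

    # Autocorrelation method of LP analysis: solve the normal equations R a = r.
    a = solve_toeplitz(r[:order], r[1:order+1])       # predictor coefficients a_1..a_p
    A = np.concatenate(([1.0], -a))                   # inverse filter A(z)
    gain = np.sqrt(r[0] - np.dot(a, r[1:order+1]))    # prediction-error (source) gain

    freqs = np.arange(nfft//2 + 1) * fs / nfft
    spectrum_db = 20*np.log10(np.abs(np.fft.rfft(x, nfft)) + 1e-12)
    lp_envelope_db = 20*np.log10(gain / (np.abs(np.fft.rfft(A, nfft)) + 1e-12))
    return r, freqs, spectrum_db, lp_envelope_db
```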

WaveKep: Displays a selected segment of the signal at the cursor location, its cepstrum, its short-time spectrum, and the cepstrally smoothed envelope with/without the LP envelope.
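
Similarly, the cepstral quantities displayed by WaveKep can be illustrated with a short sketch; the liftering length below is a typical textbook value, not necessarily SSL's setting.

```python
import numpy as np

def wavekep_view(segment, n_lifter=30, nfft=512):
    """Real cepstrum and cepstrally smoothed log-spectral envelope of one segment."""
    x = segment * np.hamming(len(segment))
    log_spec = np.log(np.abs(np.fft.fft(x, nfft)) + 1e-12)   # log magnitude spectrum
    cepstrum = np.fft.ifft(log_spec).real                    # real cepstrum

    # Low-time liftering keeps the slowly varying (vocal-tract) part of the log
    # spectrum and discards the rapidly varying (pitch) part.
    lifter = np.zeros(nfft)
    lifter[:n_lifter] = 1.0
    lifter[-(n_lifter - 1):] = 1.0                           # symmetric counterpart
    smoothed = np.fft.fft(cepstrum * lifter).real            # cepstrally smoothed envelope

    return cepstrum[:nfft//2], log_spec[:nfft//2 + 1], smoothed[:nfft//2 + 1]
```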

Objective comparison for speech/speaker recognition: This facility is common to the above two modules. For two-channel signals, it gives an objective distance between the feature vectors of the selected segments and a graphic comparison of the patterns of the feature vectors. This can be used for a comparative study of the feature vectors of two different phones, or of the same phone spoken by different speakers.
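
The exact distance measure used by SSL is not specified in this overview; a Euclidean distance between the two feature vectors, as in the hypothetical sketch below, is one common choice.

```python
import numpy as np

def objective_distance(vec_a, vec_b):
    """Euclidean distance between two equal-length feature vectors
    (e.g. LPC, cepstral or MFCC vectors) as a simple objective measure."""
    vec_a = np.asarray(vec_a, dtype=float)
    vec_b = np.asarray(vec_b, dtype=float)
    return float(np.linalg.norm(vec_a - vec_b))

# Hypothetical cepstral vectors of the same phone from two speakers.
c_speaker1 = np.array([1.20, -0.40, 0.30, 0.10, -0.05])
c_speaker2 = np.array([1.00, -0.60, 0.50, 0.00, -0.10])
print("objective distance:", objective_distance(c_speaker1, c_speaker2))
```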

Spectrograph (one channel and two channels): A tool for efficient computation and presentation of spectrograms in a variety of formats (a minimal computation sketch follows this list):

  • Broad-band, Narrow-band and Very broad-band spectrograms
  • User selected duration
  • Gray or color spectrogram
  • Frequency axis can be in Hz, mel or Bark.
  • Frequency range: full, half or quarter
  • Contrast/Gain control
  • Superposed pole tracks, formant tracks and pitch (F0) track
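
As noted above, here is a minimal sketch of wide-band vs. narrow-band spectrogram computation; the window lengths and the mel/Bark formulas are standard textbook choices, not necessarily SSL's settings.

```python
import numpy as np
from scipy.signal import spectrogram

def speech_spectrogram(x, fs, band="wide"):
    """Wide-band (~5 ms window) or narrow-band (~25 ms window) spectrogram in dB."""
    win_ms = {"wide": 5, "narrow": 25}[band]
    nperseg = int(fs * win_ms / 1000)
    f, t, Sxx = spectrogram(x, fs=fs, window="hamming",
                            nperseg=nperseg, noverlap=nperseg // 2)
    return f, t, 10 * np.log10(Sxx + 1e-12)

def hz_to_mel(f):
    """Standard mel-scale conversion."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, float) / 700.0)

def hz_to_bark(f):
    """Bark-scale conversion (Traunmueller's approximation)."""
    f = np.asarray(f, float)
    return 26.81 * f / (1960.0 + f) - 0.53
```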

Labeling: There are three sub-modules:

  • Manual: Highlight and assign a phone label; specify a compression/expansion scale factor for interactive synthesis
  • Semi-automatic: Analyze the speech signal, ensure proper voiced/unvoiced/silence classification, and specify the phone string for labeling
  • Dynamic Time Warping: Use the analyzed parameters and labels to warp the durations of the phones of a selected utterance against a reference utterance and synthesize with the altered durations; obtain an objective distance measure after warping (a generic DTW sketch follows this list)
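
SSL's DTW constraints and distance measure are not detailed in this overview; the sketch below is a generic dynamic time warping over frame-wise parameter vectors, assuming `ref` and `test` are NumPy arrays of shape (frames, features).

```python
import numpy as np

def dtw(ref, test):
    """Generic dynamic time warping; returns total distance and warping path."""
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    n, m = len(ref), len(test)

    # Local frame-to-frame Euclidean distances (n x m matrix).
    local = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=2)

    # Accumulated distance with the usual step pattern (diagonal/right/down).
    acc = np.full((n, m), np.inf)
    acc[0, 0] = local[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(acc[i-1, j] if i > 0 else np.inf,
                       acc[i, j-1] if j > 0 else np.inf,
                       acc[i-1, j-1] if i > 0 and j > 0 else np.inf)
            acc[i, j] = local[i, j] + prev

    # Backtrack the optimal path from the end to the start.
    path, (i, j) = [(n-1, m-1)], (n-1, m-1)
    while (i, j) != (0, 0):
        steps = [(i-1, j-1), (i-1, j), (i, j-1)]
        i, j = min((s for s in steps if s[0] >= 0 and s[1] >= 0),
                   key=lambda s: acc[s])
        path.append((i, j))
    return acc[n-1, m-1], path[::-1]
```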

ACOPHON-I: Analysis-Editing-Synthesis:

  • Analysis options: Uniform frame rate analysis with user-controlled frame duration, frame rate, number of LPCs, pitch range, etc. to extract V/U/S labels, speech/source intensity, pitch (F0), glottal parameters, autocorrelation coefficients, LPCs, parcor coefficients, formant frequencies and bandwidths, MFCCs, etc. (a minimal LP analysis sketch follows this list)
  • Editing options: Manually correct any unexpected errors, with a facility to display the parameters along with the speech wave
  • Intentionally edit the parameters for perception experiments, for example scaling of F0, scaling of formant frequencies, interpolation, averaging, etc.
  • Synthesis options: Facility to synthesize the signal with a variety of options: (i) using the LP model, (ii) using a formant-based model with all or only a selected number of formants, (iii) pasting the unvoiced segments from the original, (iv) controlling the overall rate of speech (compress/expand), (v) controlling the duration of a specific labeled segment, etc.
  • PATPLAY Synthesis: This is another option where the acoustic parameters may be assigned to user chosen target locations and the parameters can be interpolated between the target locations for synthesis. A variety of stimuli for perception experiments can be quickly prepared using this option.
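
The exact analysis settings of ACOPHON-I are internal to SSL; the sketch below only illustrates uniform-frame-rate LP analysis and the estimation of formant frequencies and bandwidths from the roots of the LP polynomial (the window lengths and pruning thresholds are illustrative assumptions).

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def frame_signal(x, fs, frame_ms=25, hop_ms=10):
    """Split a signal into uniform-rate analysis frames."""
    flen, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    return [x[i:i + flen] for i in range(0, len(x) - flen + 1, hop)]

def lpc_formants(frame, fs, order=12):
    """Formant frequencies (Hz) and bandwidths (Hz) from the LP inverse filter roots."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x)-1:]
    a = solve_toeplitz(r[:order], r[1:order+1])        # autocorrelation method
    roots = np.roots(np.concatenate(([1.0], -a)))      # roots of A(z)
    roots = roots[np.imag(roots) > 0]                  # one root per conjugate pair

    freqs = np.angle(roots) * fs / (2 * np.pi)         # pole angle -> frequency
    bws = -np.log(np.abs(roots) + 1e-12) * fs / np.pi  # pole radius -> bandwidth
    order_idx = np.argsort(freqs)
    freqs, bws = freqs[order_idx], bws[order_idx]

    # Heuristic pruning: keep reasonably narrow resonances above ~90 Hz.
    keep = (freqs > 90) & (bws < 400)
    return freqs[keep], bws[keep]
```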

ACOPHON-II: Interactive Analysis-Editing-Synthesis:

  • Analysis-Editing: Interactive analysis can be used to obtain very high accuracy for a chosen segment of speech using the temporal match, spectral match and inverse-filtered signal. Analysis and editing can be done interactively till the desired accuracy is achieved. Pole-zero pairs can be introduced, and either a cascaded formant model or a hybrid model may be used. The estimated parameters may be saved into a file along with the time instant of the analysis and a phone label. Analysis is usually performed at select locations such as the beginning, middle and end of a phone, that is, at phone transition boundaries and mid-parts.
  • Synthesis-Editing: Since the analysis is done at sparse locations, the parameters are interpolated at the desired temporal resolution for synthesis (see the interpolation sketch after this list). During synthesis, the parameters can be edited. A variety of stimuli for perception experiments can be quickly prepared using this option.
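
A minimal sketch of this interpolation step, with made-up analysis instants and formant targets (all values hypothetical):

```python
import numpy as np

# Sparse analysis instants (s) and formant targets (Hz), e.g. measured at the
# beginning, middle and end of a phone; all values here are hypothetical.
anal_times = np.array([0.00, 0.08, 0.16])
f1_targets = np.array([300.0, 650.0, 700.0])
f2_targets = np.array([2200.0, 1100.0, 1050.0])

# Interpolate linearly onto a uniform 5 ms synthesis frame grid.
synth_times = np.arange(0.0, 0.16 + 1e-9, 0.005)
f1_track = np.interp(synth_times, anal_times, f1_targets)
f2_track = np.interp(synth_times, anal_times, f2_targets)
```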

ARTACO: Articulatory-Acoustics (vowels):

A mid-sagittal profile of the human vocal tract is presented. The positions of the articulators, such as the tongue body center, jaw angle, lip opening and protrusion, and hyoid height, can be controlled. For a given set of articulatory positions, the vowel sound can be synthesized. Also, the articulatory positions may be tuned to match user-specified formant data. This option can be used to learn a large number of concepts relating to articulatory-acoustic relations.
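
ARTACO's articulatory-to-acoustic mapping is internal to SSL; the sketch below only illustrates the final acoustic step of synthesizing a static vowel from a given set of formant frequencies and bandwidths, using a cascade of second-order resonators (all numeric values are illustrative, not SSL's).

```python
import numpy as np
from scipy.signal import lfilter

def synthesize_vowel(formants_hz, bandwidths_hz, f0=120.0, fs=16000, dur=0.4):
    """Static vowel from an impulse-train source and a cascade of formant resonators."""
    n = int(fs * dur)
    source = np.zeros(n)
    source[::int(fs / f0)] = 1.0                    # impulse train at F0
    source = lfilter([1.0], [1.0, -0.98], source)   # crude glottal spectral tilt

    y = source
    for f, bw in zip(formants_hz, bandwidths_hz):
        r = np.exp(-np.pi * bw / fs)                # pole radius from bandwidth
        theta = 2 * np.pi * f / fs                  # pole angle from formant frequency
        y = lfilter([1.0 - r], [1.0, -2*r*np.cos(theta), r*r], y)
    return y / (np.max(np.abs(y)) + 1e-12)

# Rough formant values for an /a/-like vowel (illustrative only).
vowel = synthesize_vowel([700, 1200, 2600], [90, 110, 160])
```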

Utility

  • Signal Recording: The speech signal can be recorded and saved into a file. The sampling frequency and the duration of the signal to be recorded can be specified.
  • Signal Display, Segmentation and Editing: A speech signal (even of very long duration) can be displayed partly or wholly for segmentation or editing. Selected parts can be segmented. Edit options: copy, delete, insert silence, insert file.
  • Signal Manipulation: Basic signal processing tasks such as adding signal files, scaling, normalizing, low-pass filtering, pre-emphasis, down-sampling, up-sampling, etc. are provided (a SciPy-based sketch of a few of these operations follows this list).
  • Play Batch or Prepare Audio Tape: For perception experiments, a sequence of stimuli can be randomized and presented with a controlled inter-stimulus interval in either manual or automatic mode. Also, the entire sequence can be transferred to a single file. In manual mode, the user's response (word or yes/no) can also be saved. Instructions can be played, and a tone can be played to calibrate the SPL output.
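
The Utility module's implementation is not described here; the sketch below only shows how a few of the same basic operations (pre-emphasis, low-pass filtering, down-sampling) are commonly implemented with SciPy, purely as an illustration.

```python
import numpy as np
from math import gcd
from scipy.signal import butter, filtfilt, resample_poly

def preemphasize(x, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def lowpass(x, fs, cutoff_hz):
    """Zero-phase low-pass filtering with a 4th-order Butterworth filter."""
    b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
    return filtfilt(b, a, x)

def resample(x, fs, target_fs):
    """Down- or up-sample using a polyphase filter (e.g. 16 kHz -> 8 kHz)."""
    g = gcd(int(target_fs), int(fs))
    return resample_poly(x, int(target_fs) // g, int(fs) // g), int(target_fs)
```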

Some Select Screenshots:

  • Typical screen output of the WaveSpec module for a single channel
  • (a) Comparison of feature vectors of a phone from two different utterances by the same speaker; (b) comparison of feature vectors of the same phone from two different utterances by two different speakers
  • Wide-band spectrogram on the Bark scale with superposed formant tracks for voiced segments
  • Screen output of the ACOPHON-I module: graphic editing

Modules of SSL
The modules of SSL are listed below; details of each module are given above.

  • WaveSpec: Temporal & Spectral properties and comparison of one or two signals using LPCs.
  • WaveKep: Temporal & Spectral properties and comparison of one or two signals using the cepstrum.
  • Spectrograph (Voice Print): One or two channels; gray/coloured; wide-band, narrow-band, very-narrow band; frequency axis in Hz, mel, bark; controllable frequency range; controllable contrast, brightness; superpose pole tracks, formant tracks, F0 track; etc.
  • Labeling: Manual mode, semi-automatic with phone input, Dynamic time warping.
  • ACOPHON-I: ACOPHON stands for acoustics-phonetics. Analysis-Editing-Synthesis: Uniform frame rate ANALYSIS for extracting vocal tract filter parameters and source parameters – LP and formant based. Powerful GRAPHIC EDITOR to correct/manipulate acoustic parameters. Controlled SYNTHESIS, compression, duration change of selected segments.
  • ACOPHON-II: Interactive Analysis-Editing-Synthesis: User-controlled, formant-based interactive analysis for high accuracy.
  • ARTACO: Articulatory-Acoustics: Articulatory synthesis: facility to set the articulatory position of jaw, lips, tongue, hyoid height; obtain formant data; synthesize vowels.
  • Utility Module: Signal recording, signal segmentation, signal editing, scaling, signal processing (low-pass filtering, pre-emphasis, etc.)