Speech Science Lab (SSL)
SSL-Professional Edition
A comprehensive software package for education and research in speech science
Application areas of SSL
- Education and Research in Speech Science, Phonetics, Linguistics, Speech and Hearing.
- Speech Analysis, Synthesis, Speech Recognition, Perception, Speaker Recognition etc.
- Spectral Analysis of chanting, music, animal sounds etc.
- Spectral Analysis of machinery noise etc.
- Forensic Sciences
Several doctoral theses and graduate-level dissertations have been completed using SSL.
A large number of projects have also been carried out with SSL. Some examples: speech analysis of Indian languages; development of speech in children;
voice transformation; analysis of frog calls and cuckoo calls; quality control of automobile
horns.
SSL is an easy-to-operate, user-friendly software package.
It requires no programming skills.
An Experimental Course in Speech Science
SSL can be used as an educational tool for an experimental
course in speech science. The necessary speech database is provided. A comprehensive course
book guides the user while systematically explaining the concepts of speech science.
It covers topics such as the temporal and spectral properties of speech signals, spectrogram
reading and articulatory acoustics.
Course Detail
Modules
Utility Module
Signal Recording, Display and Editing:
A speech signal can be recorded and saved to a file. The sampling frequency and
the duration of the signal to be recorded can be specified. The signal can be displayed
in part or in full, normalized, scaled, and played back in part or in full.
Edit options: copy, delete, insert silence, insert file.
Signal Manipulation:
Basic signal processing tasks such as adding signal files, scaling,
lowpass filtering, pre-emphasis, down-sampling, etc. are provided.
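Two of the operations named above, normalization and pre-emphasis, can be sketched in a few lines. This is a minimal illustration, not SSL's own code: the function names, the first-order difference form, and the default coefficient of 0.97 are assumptions chosen because they are common in the speech processing literature.

```python
def pre_emphasis(signal, coeff=0.97):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1].

    The first sample is passed through unchanged. The coefficient value
    is an assumption; SSL's actual setting is not documented here.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - coeff * signal[n - 1])
    return out


def normalize(signal, peak=1.0):
    """Scale the signal so that its largest absolute sample equals `peak`."""
    max_abs = max((abs(s) for s in signal), default=0.0)
    if max_abs == 0.0:
        return list(signal)  # silent or empty signal: nothing to scale
    return [s * peak / max_abs for s in signal]
```

Pre-emphasis boosts the higher frequencies relative to the lower ones, which is why it is often applied before spectral or linear prediction analysis.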
Wavespec Module:
The user can view the signal waveform along with the short-time spectrum,
spectral envelope and autocorrelation function. The analysis variables used to obtain these functions
can be set by the user. Intensity, fundamental frequency and formants for a
segment around a marked location are shown. Two signals can be opened for a comparative study.
Labeling Module
Labeling:
The signal file to be labeled is opened. The program automatically marks the segments
based on the manner of production into classes such as 'voiced', 'unvoiced',
'burst or stops', 'silence', 'mixed' etc. Tentative segment boundaries
are also shown. The user highlights a phonetically significant segment and
attaches a phoneme or allophone label and context. The user-assigned label,
along with the beginning and ending locations and the phonetic context, is saved in
a database along with the language and speaker's identity taken from a header
attached to the signal file. The label file can be printed in text mode with label,
context, and beginning and ending locations in msec.
Database Access
The user can retrieve the
speech signal corresponding to any desired phonetic segment.
For example, the user can inquire and retrieve segments of
vowel /a/ in the entire database occurring in a particular
CVC context for a particular speaker for a particular language.
All occurrences of vowel /a/ for the specified conditions can be
pulled out from the files and concatenated. Thus the variation in
vowel /a/ as a function of speaker or phonetic context or grammatical
category can be studied.
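The retrieval described above can be pictured as a filter over labeled segments followed by concatenation. The sketch below is purely illustrative: SSL's database format is not described on this page, so the `Segment` record, its field names and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One labeled segment. Every field name here is hypothetical."""
    label: str           # phoneme or allophone label, e.g. "a"
    left: str            # preceding phoneme (context)
    right: str           # following phoneme (context)
    speaker: str
    language: str
    begin_ms: float
    end_ms: float
    samples: List[float]


def find_segments(db, label, speaker=None, language=None, left=None, right=None):
    """Return segments with the given label that meet every condition that is not None."""
    wanted = {"speaker": speaker, "language": language, "left": left, "right": right}
    hits = []
    for seg in db:
        if seg.label != label:
            continue
        if all(v is None or getattr(seg, k) == v for k, v in wanted.items()):
            hits.append(seg)
    return hits


def concatenate(segments):
    """Join the samples of the retrieved segments into one signal."""
    out = []
    for seg in segments:
        out.extend(seg.samples)
    return out
```

For example, `find_segments(db, "a", speaker="S1", left="k", right="t")` would collect every /a/ spoken by a (hypothetical) speaker S1 between /k/ and /t/, and `concatenate` would join them for listening or comparison.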
Spectrograph Module
The spectrograph is a tool for generating a
'spectrogram', a three-dimensional pictorial representation of a signal.
The x-axis represents time, the y-axis represents frequency and
the energy at each time-frequency location is shown by the gray level or
in color. The dynamic variation of temporal and spectral properties in
the signal is clearly seen in the spectrogram.
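A spectrogram of this kind can be computed by windowing successive frames of the signal and taking the magnitude spectrum of each one. The sketch below shows the idea using only the Python standard library. It is not SSL's implementation: the Hamming window, the direct (slow) DFT and all function names are assumptions made for illustration.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (n must be greater than 1)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitude_db(frame):
    """Magnitude spectrum in dB for bins 0 .. N/2, computed with a direct DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(20.0 * math.log10(abs(acc) + 1e-12))  # small offset avoids log(0)
    return mags


def spectrogram(signal, frame_len, hop):
    """Return one row of dB values per frame: rows are time, columns are frequency."""
    win = hamming(frame_len)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * win[i] for i in range(frame_len)]
        rows.append(dft_magnitude_db(frame))
    return rows
```

Bin `k` corresponds to the frequency `k * fs / frame_len`. As a general rule, a short analysis window gives a broad-band spectrogram (good time resolution) and a long window gives a narrow-band one (good frequency resolution). A practical tool would use an FFT rather than the direct DFT shown here.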
The spectrograph was initially developed as an aid for
the hearing impaired, so that they could visualize
speech and read the spectrogram as a substitute
for auditory processing. The spectrogram is now an indispensable
tool for phoneticians to make fine distinctions and to note the
subtle variations of a given phoneme in different contexts.
SSL provides a tool for efficient computation
and presentation of spectrograms in a variety of formats. Broad-band,
narrow-band and very broad-band spectrograms of any desired duration
of a signal can be obtained. Optionally, a spectral section at the
marked time location can be obtained. The mouse pointer reads out the
dB level at any chosen time-frequency location. Gray or color spectrograms
can be obtained. The frequency axis can be in Hz, mel or Bark.
The frequency scale can be expanded. The contrast can be enhanced or
reduced and the dB level can be increased or decreased.
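The mel and Bark axes are perceptual rescalings of frequency in Hz. Several published approximations exist, and this page does not say which ones SSL uses. The sketch below assumes two widely quoted formulas, the 2595 log10(1 + f/700) form for mel and Traunmüller's approximation for Bark, purely as an illustration.

```python
import math


def hz_to_mel(f_hz):
    """Mel value for a frequency in Hz (assumed formula: 2595 * log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz):
    """Bark value for a frequency in Hz (assumed formula: Traunmueller's approximation)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53
```

With these formulas 1000 Hz maps to roughly 1000 mel and roughly 8.5 Bark. Both scales compress the high frequencies, so on a mel or Bark axis the lower formants occupy a larger share of the display.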
If the signal file has been labeled by
the user, the labels are shown below the spectrogram. The
label fields can be edited using the spectrogram as a reference.
If the signal file has been analyzed by the user,
formant and pitch (F0) tracks can be superimposed on the spectrogram.
A two-channel spectrogram
is useful for comparing the temporal
and spectral properties of two signals, such as original
and synthesized speech signals. Facilities are available for
contrast enhancement, gain editing, color vs. gray scale, and a y-axis
scale in Hz, mel or Bark.
ACOPHON
ACOPHON is an acronym for acoustic-phonetics. Analysis programs in SSL are of two types:
- Acophon-I: Block analysis or uniform frame rate analysis. There are two models: LP based and Formant based.
- Acophon-II: Interactive analysis at user-specified locations, interpolation and synthesis. Formant based.
ACOPHON-I Module
Block Analysis-Editing-Synthesis:
In analysis, the speech signal is divided into a number of
overlapping blocks or frames. For example, frames
of 40 msec with an inter-frame interval (resolution) of
10 msec yield: Frame #1 from 0 to 40 msec, Frame #2 from
10 to 50 msec, Frame #3 from 20 to 60 msec and so on. Thus
there are 100 frames per second of speech. For each frame
the following acoustic parameters are extracted: linear
prediction coefficients, cepstral coefficients, autocorrelation coefficients,
PARCOR coefficients, formant frequencies, F0 or pitch, source and speech intensity,
glottal parameters, and manner class (voiced/unvoiced/burst/mixed etc.).
Once the parameters have been extracted they can be used for a variety of
applications such as speech recognition, coding, efficient storage or
compression of speech, voice mail, speech synthesis etc.
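The framing scheme and three of the listed parameter sets (autocorrelation, linear prediction and PARCOR coefficients) can be sketched as follows. This is a textbook illustration, not SSL's analysis code: the function names are hypothetical, the autocorrelation method with the Levinson-Durbin recursion is assumed, and the sign convention of the PARCOR coefficients varies between texts.

```python
def frame_bounds_ms(duration_ms, frame_ms=40, hop_ms=10):
    """Start and end times (msec) of all full frames that fit in the signal."""
    bounds = []
    start = 0
    while start + frame_ms <= duration_ms:
        bounds.append((start, start + frame_ms))
        start += hop_ms
    return bounds


def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[0] .. r[max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r[0] .. r[order].

    Returns (a, k, err): predictor coefficients a[0..order-1] such that
    x_hat[n] = sum(a[j-1] * x[n-j] for j in 1..order), the reflection
    (PARCOR) coefficients k, and the final prediction error energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    k_list = []
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:               # silent frame: stop with zero coefficients
            k_list.extend([0.0] * (order - i + 1))
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
        k_list.append(k)
    return a[1:], k_list, err
```

With the defaults, `frame_bounds_ms` reproduces the example in the text: (0, 40), (10, 50), (20, 60) and so on, one new frame every 10 msec, which is the source of the figure of 100 frames per second.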
Two models are available for synthesis: a voice-source-excited linear
prediction model and a formant-based model. A graphic editor-cum-synthesis
tool is available to correct or deliberately manipulate the source and
filter parameters. Acoustic parameters can be averaged, linearly interpolated
or scaled between any two locations marked by the user, and the speech signal can then be synthesized.
Estimated and edited parameters can be transferred to a database of acoustic
parameters along with a phoneme (allophone) label and context. The parameters
are saved in the database along with the speaker's and language identity. The synthesized
signal can be saved. Acoustic parameters from the database can be loaded at any
desired location.
ACOPHON-II Module
Interactive Analysis-Synthesis:
There are two models: a Cascaded Formant Model and a Hybrid (Modified Klatt's) Model.
A speech signal is made up of phonemes. In an utterance, each phoneme has
three major events or targets: onset, mid-part and transition into the next phoneme.
In interactive analysis, the user selects the phonetically significant
events or targets in a given utterance. Analysis is performed interactively
at the chosen targets. Estimated parameters can be edited and validated using
a segmental analysis-synthesis approach. There is a facility to introduce
pole-zero pairs. The acoustic parameters analyzed at a target can be transferred
to a database. The analysis parameters are labeled according to the phoneme category
and context. Analyzed parameters corresponding to a series of targets can also be
saved in a file. An interactive editing tool can be used either to create a new
parameter file or to edit an existing file. Parameters can be loaded from a database
and concatenated. During editing, mixed excitation with controlled voiced vs. noise
intensities, source-filter interaction etc. can be introduced. The concatenated
parameter file can be saved into a database. The synthesized speech signal can also be saved.
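In a cascaded formant model, the source signal passes through a chain of second-order resonators, one per formant. The sketch below uses the digital resonator commonly associated with Klatt-style synthesizers. It is an assumption-laden illustration rather than SSL's model: the function names and the example formant values are hypothetical, and the page does not document how SSL's hybrid model modifies Klatt's design.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] for one formant.

    a is chosen so that a + b + c = 1, which gives unity gain at 0 Hz.
    """
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(x, freq_hz, bw_hz, fs_hz):
    """Filter the sequence x through a single formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(x, formants, fs_hz):
    """Pass x through one resonator per (frequency, bandwidth) pair, in series."""
    for freq_hz, bw_hz in formants:
        x = resonate(x, freq_hz, bw_hz, fs_hz)
    return x
```

For example, `cascade(source, [(500, 60), (1500, 90), (2500, 150)], 10000)` shapes a source signal with three illustrative formants. Because the resonators are in series, the relative formant amplitudes follow from the frequencies and bandwidths and do not need to be set individually.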
In ACOPHON-II synthesis,
the acoustic parameters are linearly interpolated between the targets.
After a sufficient number of sentences have been analyzed, generalizations can be
drawn, leading to a high-quality text-to-speech synthesis system for any desired language.
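The linear interpolation between targets can be sketched as below. The data layout (a time in msec paired with a dictionary of parameter values), the function name and the 10 msec default step are assumptions for illustration; SSL's parameter file format is not described here.

```python
def interpolate_targets(targets, step_ms=10.0):
    """Linearly interpolate parameter values between successive targets.

    `targets` is a time-sorted list of (time_ms, params) pairs, where params
    maps a parameter name (e.g. "F1") to its value at that target. Target
    times are assumed to be spaced by whole multiples of step_ms.
    """
    frames = []
    for (t0, p0), (t1, p1) in zip(targets, targets[1:]):
        n_steps = int(round((t1 - t0) / step_ms))
        for s in range(n_steps):
            w = s / n_steps  # 0.0 at the first target, approaching 1.0 at the next
            params = {name: p0[name] + w * (p1[name] - p0[name]) for name in p0}
            frames.append((t0 + s * step_ms, params))
    last_t, last_p = targets[-1]
    frames.append((last_t, dict(last_p)))
    return frames
```

For instance, with one target at 0 msec (F1 = 300 Hz, F2 = 2200 Hz) and another at 100 msec (F1 = 700 Hz, F2 = 1200 Hz), the frame at 50 msec has F1 = 500 Hz and F2 = 1700 Hz.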
There is sufficient flexibility in the models
for them to be adapted to the synthesis of speech sounds that occur in various
languages, such as aspirated stops, retroflex consonants and nasalized sounds.
Articulatory Acoustics (Vowels)
In SSL, it is possible to position the articulators and
obtain the formant frequencies for the set positions. The vowel sound can also be
synthesized and played. Conversely, given a vowel, its formant frequencies can be
estimated. The articulatory positions of the model can then be adjusted until the
spectrum generated by the model matches that of the signal. In this way the articulatory
positions for a given pronunciation can be deduced. This has yet to be extended to consonant
production.
PC Requirement, Support Hardware and Accessories
Copyright 2002 Voice and Speech Systems