Speech Perception Laboratory

Cochlear implants

Our research related to cochlear implants (CIs) spans a number of areas:


Many of our experiments make use of acoustic simulations of cochlear implant (CI) processing based on channel vocoders. Below are some audio examples of what speech sounds like through these “CI simulations.” The audio demos are accompanied by their respective spectrograms, which visually represent the speech signals in time-frequency space. The sample sentence we use below is taken from the IEEE corpus.


Sample sentence


The sample sentence, "The heap of fallen leaves was set on fire," is spoken by a male speaker. The bright bands in the spectrogram mark frequency regions where energy is concentrated, known as formants. The perceptual category of a spoken vowel varies predictably with the first and second formants, so the vowel changes heard in the audio can be tracked visually by following the two lowest bright bands in the spectrogram.
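A spectrogram of this kind can be computed with a short-time Fourier transform. The following is a minimal sketch in plain Python, not the code used to produce the figures on this page. The function name `spectrogram`, the frame length, the hop size, and the Hann window are all illustrative assumptions.

```python
import cmath
import math
from typing import List


def spectrogram(signal: List[float], frame_len: int = 64, hop: int = 32) -> List[List[float]]:
    """Return one list of dB magnitudes (bins 0..frame_len//2) per analysis frame.

    Illustrative only: a naive DFT is used so the sketch needs no third-party
    libraries. Real speech would call for longer frames and an FFT.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        magnitudes_db = []
        for k in range(frame_len // 2 + 1):
            # Direct evaluation of DFT bin k for this windowed segment.
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            # Small offset avoids log10(0) for silent bins.
            magnitudes_db.append(20 * math.log10(abs(acc) + 1e-12))
        frames.append(magnitudes_db)
    return frames
```

Each inner list corresponds to one vertical slice of the spectrogram. A formant would show up as a run of neighboring bins whose values stay high across consecutive frames.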


CI simulation of the sample sentence


The above audio example demonstrates the original sentence processed through a cochlear implant simulation. The speech is split into 22 frequency bands by a bank of bandpass filters, and at each time instant the 8 channels with the highest amplitudes are selected (maxima selection). The cutoff frequencies of the bandpass filters are the same as those used in the ACE processing strategy in Cochlear Ltd. devices. Compared with the unprocessed speech signal, the formant cues are only partially preserved and the fundamental frequency (F0) cues are not fully available to the listener.
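The maxima selection step described above can be sketched as follows. This is a minimal illustration in plain Python, not the lab's implementation: the function name `select_maxima` is hypothetical, the input is assumed to be one frame of per-channel envelope amplitudes that has already been extracted by the filter bank, and unselected channels are assumed to be set to zero.

```python
from typing import List


def select_maxima(frame: List[float], n_maxima: int = 8) -> List[float]:
    """Keep the n_maxima largest channel envelopes in one frame; zero the rest.

    `frame` holds one envelope amplitude per channel (22 in the simulation
    described in the text). Ties go to the lower-numbered channel, which is
    an arbitrary choice made for this sketch.
    """
    if n_maxima < 0:
        raise ValueError("n_maxima must be non-negative")
    if n_maxima >= len(frame):
        return list(frame)
    # Channel indices ordered from largest to smallest envelope amplitude.
    ranked = sorted(range(len(frame)), key=lambda ch: frame[ch], reverse=True)
    keep = set(ranked[:n_maxima])
    return [amp if ch in keep else 0.0 for ch, amp in enumerate(frame)]
```

Applying the function to every frame in turn gives the 8-of-22 selection pattern over time. Because the selection depends only on amplitude, a channel dominated by noise is chosen whenever its envelope is among the largest.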


Sample sentence in noise


In this demo, cafeteria noise is added to the original sentence at a signal-to-noise ratio of +10 dB. As can be seen in the horizontal structure of the spectrogram, F0 cues are still available to the listener.
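Mixing noise with speech at a given signal-to-noise ratio amounts to scaling the noise before adding it. The sketch below is one way to do this in plain Python. It assumes the SNR is defined from the root-mean-square (RMS) levels of the whole speech and noise signals, which the text does not specify, and the names `rms` and `mix_at_snr` are hypothetical.

```python
import math
from typing import List


def rms(x: List[float]) -> float:
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(speech: List[float], noise: List[float], snr_db: float = 10.0) -> List[float]:
    """Add noise to speech so that the RMS-based SNR equals snr_db.

    Assumption for this sketch: the noise is at least as long as the speech
    and is simply truncated to the same length.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        raise ValueError("noise is silent; SNR is undefined")
    # SNR_dB = 20*log10(rms_speech / (gain * rms_noise)), solved for gain.
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]
```

With `snr_db=10.0`, the scaled noise has an RMS level 10 dB below that of the speech, matching the +10 dB condition in this demo.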


CI simulation of the sample sentence in noise


The above mixture is processed with the same cochlear implant simulation that was applied to the sample sentence in quiet. Note that the maxima selection (as in the ACE processing strategy) has selected the noise-dominated channels in the 2-5 kHz region. CI users experience great difficulty in the presence of background noise and reverberation and must exert considerable listening effort to understand speech under such conditions.

