Spatializing Simultaneous Speech with Application to
B.Eng. (Honours) Thesis, April 1997.
supervisor: P. Kabal
author contact: plains@TSP.ECE.McGill.CA
How well we understand speech from several simultaneous speakers depends on many factors, including whether we process the speech serially or in parallel. Our ability to follow one speaker among many is quite robust, but understanding two speakers at once is a much greater challenge, especially when both are heard over headphones. An enhancement to the speech that aids the listener in this task is therefore desirable.
One proposed way to improve intelligibility is to cause the listener to perceive the speakers as separated in space. This paper examines how this can be done with digital signal processing, allowing the listener to hear the spatialized speech over headphones.
Our perception of a speaker's location depends on the speaker's direction relative to the listener and on the environment (room, open space, concert hall, etc.) around them. Direction can be simulated by filtering the speech through a stereo finite impulse response filter called an HRTF (Head-Related Transfer Function). Distance can be simulated by adding reverberation, which is composed of early and late reflections of sound off the surfaces of a room. The ratio of direct to reverberant sound is a strong cue to the distance of a sound source.
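The filtering described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the HRIRs (time-domain HRTF filters), the reverberation impulse response, and the `direct_ratio` mixing parameter are all placeholder assumptions for demonstration.

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right, reverb_ir=None, direct_ratio=0.7):
    """Render mono speech to stereo by convolving with left/right HRIRs.

    Optionally mixes in a reverberant copy of the signal; lowering
    direct_ratio lowers the direct-to-reverberant ratio, which cues
    a more distant source. All filters here are illustrative stand-ins.
    """
    # Direction cue: convolve the mono signal with each ear's impulse response.
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)

    if reverb_ir is not None:
        # Distance cue: a reverberant copy mixed with the direct sound.
        wet = np.convolve(mono, reverb_ir)
        n = max(len(left), len(right), len(wet))
        left = np.pad(left, (0, n - len(left)))
        right = np.pad(right, (0, n - len(right)))
        wet = np.pad(wet, (0, n - len(wet)))
        left = direct_ratio * left + (1 - direct_ratio) * wet
        right = direct_ratio * right + (1 - direct_ratio) * wet

    # Stack into a (samples, 2) stereo array for headphone playback.
    return np.stack([left, right], axis=1)
```

In practice the left and right HRIRs for a given direction would be taken from a measured HRTF set, and a real reverberator would use distinct early reflections plus a late-reverberation tail rather than a single impulse response.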
An algorithm was implemented to perform these transformations and was tested on several subjects. The subjects determined direction fairly well, although the well-documented front-back reversal error occurred often. Distance is difficult to model over headphones because the sound source sits right beside the ear; consequently, subjects' distance estimates were considerably shorter than the intended distances. Reverberation, however, clearly helped externalize the sound from the head.
Spatialization of sound was then applied to the problem of parallel speech understanding, and several tests were performed. The results indicated that parallel speech was indeed easier to understand when the speakers were separated in space and externalized from the head. Understanding was higher for spatially separated speakers than for either one speaker in each ear (one monophonic speech file per ear) or both speakers superimposed at a single location external to the head.
Testing 3-D spatialization.
Testing parallel listening / understanding.