Eddie Lun Tik Choy
Waveform Interpolation Speech Coder at 4 kb/s
M.Eng. Thesis, August 1998.
supervisor: P. Kabal
author contact: email@example.com
Speech coding at bit rates near 4 kb/s is expected to be widely deployed in applications such as visual telephony and mobile and personal communications. This research focuses on developing a speech coder based on the waveform interpolation (WI) scheme, with the aim of delivering near toll-quality speech at rates around 4 kb/s. A WI coder has been simulated in floating-point arithmetic using the C programming language. The high performance of the WI model has been confirmed by subjective listening tests in which the unquantized coder outperforms the 32 kb/s G.726 standard (ADPCM) 98% of the time under clean input speech conditions; the reconstructed speech is perceived to be essentially indistinguishable from the original. When fully quantized, the speech quality of the WI coder at 4.25 kb/s has been judged to be equivalent to or better than that of G.729 (the ITU-T toll-quality 8 kb/s standard) for 45% of the test sentences. Further refinement of the quantization techniques is warranted to bring the coder closer to the toll-quality benchmark. Even so, the existing implementation produces good-quality coded speech with a high degree of intelligibility and naturalness when compared with conventional coding schemes operating in the neighbourhood of 4 kb/s.
All speech files are sampled at 8 kHz. The first two groups of sentences show the performance of the unquantized WI model for clean speech and for speech with background acoustic noise. The last group of sentences compares the performance of a fully quantized WI coder operating at 4.25 kb/s with the ITU-T G.729 coder operating at 8 kb/s.
The test sentences were not used in the development of the WI coder. The WI coder does not include a postfilter.
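
To put the bit rates quoted above in perspective, the following minimal sketch converts each rate into bits per input sample at the stated 8 kHz sampling rate. The function names are hypothetical, and the 20 ms frame length used in the last line is only an illustrative assumption; the text above does not state the frame length of the WI coder.

```python
# Sampling rate stated for all speech files.
SAMPLE_RATE_HZ = 8000

# Bit rates in bits per second, as quoted in the text.
CODER_RATES_BPS = {
    "WI (fully quantized)": 4250,
    "G.729": 8000,
    "G.726 (ADPCM)": 32000,
}


def bits_per_sample(rate_bps, sample_rate_hz=SAMPLE_RATE_HZ):
    """Average number of coded bits spent on each input sample."""
    return rate_bps / sample_rate_hz


def bits_per_frame(rate_bps, frame_ms):
    """Bit budget for one frame of the given duration in milliseconds."""
    return rate_bps * frame_ms / 1000.0


for name, rate in CODER_RATES_BPS.items():
    print(f"{name}: {bits_per_sample(rate):.3f} bits/sample")

# Assumed 20 ms frame, for illustration only (not taken from the thesis).
print(f"WI budget for a 20 ms frame: {bits_per_frame(4250, 20):.0f} bits")
```

At 4.25 kb/s the coder spends roughly half a bit per sample on average, about half the budget of G.729 and about one eighth that of G.726.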