Telecommunications & Signal Processing Laboratory

Thesis Abstracts, 1999-2001

Nader Sheikholeslami Alagha

Modulation, Pre-Equalization and Pulse Shaping for PCM Voiceband Channels

Ph.D. Thesis, December 2001

Supervisor: P. Kabal

For the past few decades, analog voiceband modems have been used extensively to carry digital information over the Public Switched Telephone Network (PSTN). Conventional voiceband modems treat the PSTN as an analog communication channel. However, today's PSTN is mostly a digital network except for the basic telephone services provided via analog subscriber lines. The conventional model of analog voiceband channels is no longer adequate to characterize the physical connection between terminals with direct digital access to the network and voiceband modems connected to analog subscriber lines. Such a connection requires a different model in each direction. There are now international modem standards which support rates of up to 56 kbits/s for the down-stream channel.

This dissertation examines the more challenging part of the voiceband communication channel, i.e., the up-stream direction connecting an analog subscriber to the digital network. The major source of distortion on the up-stream channel is quantization error caused by the analog-to-digital conversion performed as part of Pulse-Code Modulation (PCM) encoding. A bandpass filter prior to the PCM encoder restricts the bandwidth, while the sampling rate of the PCM encoder is predetermined by the network. Signalling in the presence of such constraints leads to theoretical problems as well as practical concerns in modem design.
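To make the quantization error concrete, the sketch below applies the textbook continuous mu-law compressor, a uniform 256-level quantizer in the companded domain, and the matching expander. It is an illustration only and not the thesis model: the function names are hypothetical, the constant mu = 255 is assumed, and deployed PCM encoders use a segmented approximation of the companding curve (or A-law) rather than this continuous form.

```python
import math

MU = 255.0  # assumed companding constant (continuous mu-law)


def mu_law_compress(x: float) -> float:
    """Map an amplitude in [-1, 1] to the companded domain [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize(x: float, levels: int = 256) -> float:
    """Compress, quantize uniformly to `levels` cells, then expand."""
    y = mu_law_compress(max(-1.0, min(1.0, x)))  # clip to the valid range
    step = 2.0 / levels
    idx = min(levels - 1, int((y + 1.0) / step))  # cell index, 0..levels-1
    y_hat = -1.0 + (idx + 0.5) * step             # cell midpoint
    return mu_law_expand(y_hat)


if __name__ == "__main__":
    for amplitude in (0.01, 0.1, 0.9):
        error = abs(quantize(amplitude) - amplitude)
        print(f"x = {amplitude:<5} |error| = {error:.6f}")
```

Because the quantizer cells are uniform only in the companded domain, the absolute error grows with signal amplitude, which is the kind of constraint a transmitter has to design its constellation around.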

Communication models that characterize PCM voiceband channels are developed. We investigate modulation design and related issues including index mapping, constellation design and constellation probability assignment to match the pre-determined structure of the detector at the receiver, i.e., the PCM encoder at the central office.

We develop a framework for transmitter structures that can avoid or reduce Inter-Symbol Interference (ISI) at the receiver despite the limited bandwidth and the fixed sampling rate of the up-stream channel. Techniques employed include linear filtering, spectral shaping and precoding to reduce the ISI, while limiting the average transmitted signal power. A filterbank structure for pre-equalizing channels with spectral nulls is also described.

A new method for pulse shaping design is proposed. The new pulse shaping filters provide a compatible design that can be used for the up-stream PCM channel as well as for the cascade of the up-stream and the down-stream channels.

Compared to conventional modem design, the proposed modulation and pre-equalization techniques together allow data transmission rates in the up-stream direction to increase by up to 50%.

Ejaz Mahfuz

Packet Loss Concealment for Voice Transmission over IP Networks

M.Eng. Thesis, September 2001

Supervisor: P. Kabal

Voice-over-IP (VoIP) uses packetized transmission of speech over the Internet (IP network). However, at the receiving end, packets may be missing due to network delay (jitter), network congestion and network errors. This packet loss degrades the quality of speech at the receiving end of a voice transmission system in an IP network. Since voice transmission is a real-time process, the receiver cannot request retransmission of the missing packets. Concealment algorithms, either transmitter or receiver based, are used to replace these lost packets. The packet loss concealment (PLC) techniques described in the standards ANSI T1.521 (Annex B) and ITU-T Rec. G.711 (Appendix I) have good performance, but these algorithms do not use subsequent packets for reconstruction. Furthermore, there are discontinuities between the reconstructed and the subsequent packets, especially at the transitions from voiced to unvoiced and phoneme to phoneme.

The goal of this work is to develop an improved PLC algorithm, using the subsequent packet information when available. For this, we use the Time-Scale Modification (TSM) technique based on Waveform Similarity Over-Lap Add (WSOLA) to reconstruct the dropped or lost packets. The algorithm looks ahead for subsequent packets. If these packets are not available for reconstruction, the algorithm uses information from past packets. Subjective tests show that the proposed method improves the reconstructed speech quality significantly.
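Two building blocks commonly found in waveform-similarity overlap-add methods can be sketched as follows: a search for the segment of already received speech that best matches a template, and a linear crossfade that joins two segments without a discontinuity. This is a minimal illustration, not the algorithm of the thesis; the function names, the normalized cross-correlation measure and the linear fade are all assumptions made for the example.

```python
import math


def best_offset(history, template, max_offset):
    """Return the offset into `history` (0..max_offset) whose segment has
    the highest normalized cross-correlation with `template`."""
    n = len(template)
    template_energy = sum(b * b for b in template)
    best, best_score = 0, float("-inf")
    for off in range(max_offset + 1):
        seg = history[off:off + n]
        if len(seg) < n:  # ran past the end of the available samples
            break
        num = sum(a * b for a, b in zip(seg, template))
        den = math.sqrt(sum(a * a for a in seg) * template_energy)
        score = num / den if den > 0 else 0.0
        if score > best_score:  # ties keep the earliest offset
            best, best_score = off, score
    return best


def crossfade(tail, head):
    """Blend two equal-length segments, fading from `tail` to `head`."""
    if len(tail) != len(head):
        raise ValueError("segments must have equal length")
    n = len(tail)
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight on `head`, rises from 0 towards 1
        out.append((1.0 - w) * tail[i] + w * head[i])
    return out


if __name__ == "__main__":
    history = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
    print(best_offset(history, [1.0, 0.0, -1.0], 5))
    print(crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

In a look-ahead scheme of the kind described above, the template could be taken from the packet after the gap when it is available and from past packets otherwise.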

Wesley Pereira

Modifying LPC Parameter Dynamics to Improve Speech Coder Efficiency

M.Eng. Thesis, September 2001

Supervisor: P. Kabal

Reducing the transmission bandwidth and achieving higher speech quality are primary concerns in developing new speech coding algorithms. The goal of this thesis is to improve the perceptual speech quality of algorithms that employ linear predictive coding (LPC). Most LPC-based speech coders extract parameters representing an all-pole filter. This LPC analysis is performed on each block or frame of speech. To smooth out the evolution of the LPC tracks, each block is divided into subframes for which the LPC parameters are interpolated. This improves the perceptual quality without additional transmission bit rate. A method of modifying the interpolation endpoints to improve the spectral match over all the subframes is introduced. The spectral distortion and weighted Euclidean LSF (Line Spectral Frequencies) distance are used as objective measures of the performance of this warping method. The algorithm has been integrated into a floating-point C version of the Adaptive Multi-Rate (AMR) speech coder, and the results are presented.

Mohammad M. A. Khan

Coding of Excitation Signals in a Waveform Interpolation Speech Coder

M.Eng. Thesis, July 2001

Supervisor: P. Kabal

The goal of this thesis is to improve the quality of Waveform Interpolation (WI) coded speech at 4.25 kbps. The improvement focuses on efficient coding of voiced speech segments, while keeping the basic coding format intact. In the WI paradigm, voiced speech is modelled as a concatenation of Slowly Evolving pitch-cycle Waveforms (SEW). Vector quantization is the optimal approach to encode the SEW magnitude at low bit rates, but its complexity imposes a formidable barrier.

Product code vector quantizers (PC-VQ) are a family of structured VQs that circumvent the complexity obstacle. The performance of product code VQs can be traded off against their storage and encoding complexity. This thesis introduces split/shape-gain VQ, a hybrid product code VQ, as an approach to quantize the SEW magnitude. The amplitude spectrum of the SEW is split into three non-overlapping subbands. The gains of the three subbands form the gain vector, which is quantized using the conventional Generalized Lloyd Algorithm (GLA). Each shape vector, obtained by normalizing each subband by its corresponding coded gain, is quantized using a dimension conversion VQ along with a perceptually based bit allocation strategy and a perceptually weighted distortion measure. At the receiver, the discontinuity of the gain contour at the boundary of subbands introduces buzziness in the reconstructed speech. This problem is tackled by smoothing the gain versus frequency contour using a piecewise monotonic cubic interpolant. Simulation results indicate that the new method improves speech quality significantly.
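The split and shape-gain decomposition described above can be sketched in a few lines. This is an illustration under stated assumptions, not the coder from the thesis: the function name and band edges are hypothetical, the gain is taken to be the RMS value of each subband, and the unquantized gain stands in for the coded gain because no quantizer is included.

```python
import math


def split_shape_gain(magnitudes, edges):
    """Split a magnitude spectrum at `edges` and return (gains, shapes).

    Each gain is the RMS value of its subband (an assumed definition);
    each shape is the subband divided by that gain.
    """
    bounds = [0, *edges, len(magnitudes)]
    gains, shapes = [], []
    for lo, hi in zip(bounds, bounds[1:]):
        band = magnitudes[lo:hi]
        if not band:
            raise ValueError("each subband must contain at least one sample")
        gain = math.sqrt(sum(v * v for v in band) / len(band))
        gains.append(gain)
        # An all-zero band has no meaningful shape; return zeros for it.
        shapes.append([v / gain if gain > 0 else 0.0 for v in band])
    return gains, shapes


if __name__ == "__main__":
    spectrum = [3.0, 3.0, 1.0, 1.0, 2.0, 2.0]
    gains, shapes = split_shape_gain(spectrum, edges=(2, 4))
    print(gains)   # one gain per subband
    print(shapes)  # unit-RMS shape vectors
```

Separating gain from shape lets each part be quantized with a smaller codebook than a single VQ over the full spectrum would need, which is the complexity trade-off that product code VQs exploit.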

The necessity of SEW phase information in the WI coder is also investigated in this thesis. Informal subjective test results demonstrate that transmission of the SEW magnitude encoded by split/shape-gain VQ, together with a fixed phase spectrum drawn from a voiced segment of a high-pitched male speaker, obviates the need to send phase information.

Joachim Thiemann

Acoustic Noise Suppression for Speech Signals using Auditory Masking Effects

M.Eng. Thesis, May 2001

Supervisor: P. Kabal

The process of suppressing acoustic noise in audio signals, and speech signals in particular, can be improved by exploiting the masking properties of the human hearing system. These masking properties, where strong sounds make weaker sounds inaudible, are calculated using auditory models. This thesis examines both traditional noise suppression algorithms and ones that incorporate an auditory model to achieve better performance. The different auditory models used by these algorithms are examined. A novel approach, based on a method to remove a specific type of noise from audio signals, is presented using a standardized auditory model. The proposed method is evaluated with respect to other noise suppression methods in the problem of speech enhancement. It is shown that this method performs well in suppressing noise in telephone-bandwidth speech, even at low Signal-to-Noise Ratios.

Anne Kumar

Tandem Free Operation Simulator for Wireless Communications

M.Eng. Project, January 2001

Supervisor: M. L. Blostein

Speech coding is used as an effective method to achieve bandwidth efficiency. The disadvantage of using speech compression is that it introduces signal distortion, which becomes more severe if a signal is passed through a series of coder/decoder operations. There are encoders and decoders (or transcoders) present in the mobile handsets as well as in the base stations. The transcoders in the base stations provide a considerable gain in system capacity but degrade the speech quality. Any call originating at a mobile is first converted into a compressed digital bit stream at the base station, then decoded to a 64 kbps digital signal for transmission over the terrestrial network. If the call destination is another mobile, another coding from 64 kbps to the compressed digital bit stream must take place, degrading the speech signal a second time. The speech quality may be improved if the compressed signal is not decoded at the terrestrial interface, but rather is transmitted as a data signal to the destination mobile, avoiding one of the coding steps.

This approach has recently been standardized as the Tandem Free Operation (TFO) protocol. TFO bypasses the decoder if a call is destined to another mobile telephone. The motivation behind this is that by removing the transcoding performed in the base stations, better end-to-end speech quality is realized.

This report studies the behavior of the TFO protocol standard. TFO is applicable only if the transmitting and destination mobiles use the same speech coding algorithm. The standard describes two different state machines that perform the same TFO functionality; the difference is that one state machine has additional states to handle hand-offs and the other does not. The main goal of this report is to investigate the need for the extra states. In this study, a high-level language is used to simulate the TFO protocol. Specific tests are performed, for example for local and distant hand-offs. The results reveal that the additional hand-off information does not contribute significantly to the protocol's performance.

Simon E. M. Plain

Bit Rate Scalability in Audio Coding

M.Eng. Thesis, April 2000

Supervisor: P. Kabal

In recent years, audio coding has proved to be of great importance for applications such as mobile communication and multimedia systems. Our research has focused on adapting a low bit rate/fixed bit rate audio coder to handle signals at a wide range of bit rates and sampling rates. Bit rates as low as 4 kbps and sampling rates between 4 kHz and 16 kHz were incorporated. Further, the coder can change bit rates during transmission in order to adapt to the available channel bandwidth. The coder was implemented in the C programming language and its performance evaluated. It was found that the quality of the reproduced audio for this coder requires some improvement, but the scalability performs smoothly, with transitions causing no artifacts, and reasonable loss of quality at the lowest bit rates.

Tamanna Islam

Interpolation of Linear Prediction Coefficients for Speech Coding

M.Eng. Thesis, April 2000

Supervisor: P. Kabal

Speech coding algorithms have different dimensions of performance. Among them, speech quality and average bit rate are the most important performance aspects. The purpose of the research is to improve the speech quality within the constraint of a low bit rate.

Most of the low bit rate speech coders employ linear predictive coding (LPC) that models the short-term spectral information as an all-pole filter. The filter coefficients are called linear predictive (LP) coefficients. The LP coefficients are obtained from standard linear prediction analysis, based on blocks of input samples.

In transition segments, a large variation in energy and spectral characteristics can occur in a short time interval. Therefore, there will be a large change in the LP coefficients in consecutive blocks. Abrupt changes in the LP parameters in adjacent blocks can introduce clicks in the reconstructed speech. Interpolation of the filter coefficients results in a smooth variation of the interpolated coefficients as a function of time. Thus, the interpolation of the LP coefficients in the adjacent blocks provides improved quality of the synthetic speech without using additional information for transmission.

The research focuses on developing algorithms for interpolating the linear predictive coefficients with different representations (LSF, RC, LAR, AC). The LP analysis has been simulated, and its performance has been compared for different parameter settings (LP order, frame length, window offset, window length). Experiments have been performed on the subframe length and the choice of representation of LP coefficients for interpolation.
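As an illustration of subframe interpolation, the sketch below linearly interpolates a parameter vector (for example LSFs) between the previous and current frames. It shows plain linear interpolation only, not the energy-weighted scheme studied in the thesis; the function name and the convention that the last subframe receives the current frame's parameters are assumptions.

```python
def interpolate_lsf(prev_lsf, curr_lsf, num_subframes):
    """Return one parameter vector per subframe, moving linearly from the
    previous frame's vector to the current frame's vector."""
    if len(prev_lsf) != len(curr_lsf):
        raise ValueError("parameter vectors must have the same order")
    if num_subframes < 1:
        raise ValueError("need at least one subframe")
    subframes = []
    for k in range(1, num_subframes + 1):
        w = k / num_subframes  # weight on the current frame
        subframes.append(
            [(1.0 - w) * p + w * c for p, c in zip(prev_lsf, curr_lsf)]
        )
    return subframes


if __name__ == "__main__":
    for vec in interpolate_lsf([0.0, 1.0], [1.0, 2.0], 4):
        print(vec)
```

One reason LSFs suit this operation is that a weighted average of two ascending LSF vectors is itself ascending, so each interpolated vector still corresponds to a stable synthesis filter.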

Simulation results indicate that speech quality can be improved by an energy-weighted interpolation technique.

Hossein Najafzadeh-Azghandi

Perceptual Coding of Narrowband Audio Signals

Ph.D. Thesis, April 2000 (2002-03-12, corrections to graph axis label, p. 30)

Supervisor: P. Kabal

New applications such as Internet broadcast and communications, consumer multimedia products, digital AM broadcast and satellite networks are emerging. Those applications require moderate audio quality without annoying artifacts at bit rates below 16 kbit/s. Although speech coders provide high speech quality at bit rates around 8 kbit/s, they perform poorly when encoding audio signals. In this thesis, we present a novel transform coding paradigm based on the characteristics of the human hearing system. The proposed encoder, i.e., Narrowband Perceptual Audio Coder (NPAC), can accommodate a wide range of narrowband audio inputs without annoying artifacts at bit rates down to 8 kbit/s.

NPAC employs a variety of algorithms to remove the perceptually irrelevant parts and statistical redundancies of the input signal. The new algorithms used in NPAC include a perceptual error measure in training the codebooks and selecting the best codewords, perceptually-based bit allocation algorithms and an adaptive predictive scheme to vector quantize the scale factors.

The proposed encoder has moderate complexity and delivers good quality for narrowband audio inputs at around 1 bit/sample. Informal subjective tests have been conducted to compare the performance of NPAC with an 8 kbit/s commercially available audio coder. The test results show that NPAC performs better for both music and speech inputs.
