Perceptual Postfiltering for Low Bit Rate Speech Coders

M.Eng. Thesis, November 2007

Supervisor: P. Kabal

Adaptive postfiltering has become a common part of speech coding standards based on the Linear Prediction Analysis-by-Synthesis algorithm, where it is used to reduce audible coding noise. However, a conventional adaptive postfilter is based on empirical assumptions about masking phenomena, which sometimes make it difficult to balance noise reduction against speech distortion.

This thesis introduces a novel perceptual postfiltering system for low bit rate speech coders. The proposed postfilter works at the decoder, as is the case for the conventional adaptive postfilter. Specific human auditory properties are considered in the postfilter design to improve speech quality. A Gaussian Mixture Model (GMM) based Minimum Mean Squared Error (MMSE) estimate of the perceptual postfilter is computed from the information received at the decoder. Perceptual postfiltering is then applied to the reconstructed speech. Test results show that the proposed system yields better perceptual speech quality than conventional adaptive postfiltering.
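The conditional-mean form of such a GMM-based MMSE estimator can be sketched as follows. This is a generic illustration with made-up variable names, not the exact estimator designed in the thesis: given a GMM trained on the joint vector z = [x; y], the estimate of y from an observed x is a responsibility-weighted sum of per-component conditional means.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian density."""
    d = len(mean)
    diff = x - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ inv @ diff) / norm

def gmm_mmse_estimate(x, weights, means, covs, dx):
    """MMSE estimate E[y | x] under a joint GMM over z = [x; y].

    weights: (K,) mixture weights
    means:   (K, dx+dy) joint means
    covs:    (K, dx+dy, dx+dy) joint covariances
    dx:      dimension of the observed part x
    """
    K = len(weights)
    post = np.empty(K)
    cond_means = []
    for k in range(K):
        mx, my = means[k, :dx], means[k, dx:]
        Sxx = covs[k, :dx, :dx]
        Syx = covs[k, dx:, :dx]
        # posterior responsibility of component k given the observation x
        post[k] = weights[k] * gaussian_pdf(x, mx, Sxx)
        # per-component conditional mean E[y | x, k]
        cond_means.append(my + Syx @ np.linalg.solve(Sxx, x - mx))
    post /= post.sum()
    return sum(p * m for p, m in zip(post, cond_means))
```

With a single component this reduces to the classical linear MMSE (Wiener) estimator; the mixture lets the estimate vary with the region of the feature space that x falls in.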

M.Eng. Thesis, March 2007

Supervisors: R. Rose and P. Kabal

This thesis addresses the general problem of maintaining robust automatic speech recognition (ASR) performance under diverse speaker populations, channel conditions, and acoustic environments. To this end, the thesis analyzes the interactions between environment compensation techniques, frequency warping based speaker normalization, and discriminant feature-space transformation (DFT). These interactions were quantified by performing experiments on the connected digit utterances comprising the Aurora 2 database, using continuous density hidden Markov models (HMM) representing individual digits.

Firstly, given that the performance of speaker normalization techniques degrades in the presence of noise, it is shown that reducing the effects of noise through environment compensation, prior to speaker normalization, leads to substantial improvements in ASR performance. The speaker normalization techniques considered here were vocal tract length normalization (VTLN) and the augmented state-space acoustic decoder (MATE). Secondly, given that discriminant feature-space transformations are known to increase class separation, it is shown that performing speaker normalization using VTLN in a discriminant feature-space leads to improvements in the performance of this technique. Classes, in our experiments, corresponded to HMM states. Thirdly, an effort was made to achieve higher class discrimination by normalizing the speech data used to estimate the discriminant feature-space transform. Normalization, in our experiments, corresponded to reducing the variability within each class through the use of environment compensation and speaker normalization. Significant ASR performance improvements were obtained when normalization was performed using environment compensation, while our results were inconclusive for the case where normalization consisted of speaker normalization. Finally, a simple modification of MATE, aimed at increasing its noise robustness, is presented. This modification consists of using, during recognition, knowledge of the distribution of warping factors selected by MATE during training.
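As an illustration of the frequency warping underlying VTLN, a common piecewise-linear warping function can be sketched as follows. The 7/8 breakpoint is a typical choice in HTK-style implementations, not necessarily the one used in these experiments:

```python
def vtln_warp(f, alpha, f_max=8000.0):
    """Piecewise-linear VTLN frequency warp (HTK-style sketch).

    Scales frequencies by `alpha` below a breakpoint, then interpolates
    linearly so that f_max maps onto itself.
    """
    f0 = f_max * (7.0 / 8.0)
    if alpha > 1.0:
        f0 /= alpha  # keep the warped breakpoint inside the band
    if f <= f0:
        return alpha * f
    # straight line from (f0, alpha * f0) to (f_max, f_max)
    slope = (f_max - alpha * f0) / (f_max - f0)
    return alpha * f0 + slope * (f - f0)
```

A warping factor alpha is typically selected per speaker (or per utterance) by maximum likelihood over a small grid, e.g. 0.88 to 1.12 in steps of 0.02.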

Youssouf Ould-Cheikh-Mouhamedou

On Distance Measurement Methods for Turbo Codes

Ph.D. Thesis, November 2005

Supervisor: P. Kabal

New digital communication applications, such as multimedia, require very powerful error-correcting codes that deliver low error rates while operating at low to moderate signal-to-noise ratios (SNRs). Turbo codes have reasonable complexity and can achieve very low error rates if a proper interleaver design is in place. The use of well-designed interleavers results in very low error rates, especially for medium to long interleavers, where turbo codes offer the greatest potential for achieving high minimum distance (*d*_{min}) values.

The reliable determination of a code's error performance at very low error rates using simulations may take months or may not be practical at all. However, knowledge of *d*_{min} and its multiplicities can be used to estimate the error rates at high SNR. This thesis is concerned with efficient and accurate distance measurement methods for turbo codes. Since high *d*_{min} values can be caused by high input weights, say up to 20 or higher, a brute-force determination of *d*_{min} requires that all possible input sequences of input weight up to 20 be tested. Testing all possible input sequences becomes impractical as the size of the interleaver and the input weight increase. Thus, the accurate determination of the distance spectrum, or at least of *d*_{min} and its multiplicities, is a significant problem, especially for interleavers that yield high *d*_{min}.
Based on Garello’s true distance measurement method, this thesis presents an
efficient and accurate distance measurement method for single- and double-binary
turbo codes that uses proper trellis termination such as dual-termination or
tail-biting. This method is applied to determine the distance spectra for the
digital video broadcasting with return channel via satellite (DVB-RCS) standard
double-binary turbo codes. It is also used to design new interleavers for
DVB-RCS that yield a significant improvement in error performance compared to
those in the standard.
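As background for how *d*_{min} and its multiplicity translate into high-SNR error estimates, the dominant term of the union bound for BPSK over an AWGN channel gives a simple error-floor approximation. This is a generic sketch; the parameter values in the test are illustrative, not DVB-RCS figures:

```python
import math

def error_floor_fer(d_min, multiplicity, rate, ebno_db):
    """First union-bound term: high-SNR frame error rate estimate for a
    code with minimum distance d_min and that many minimum-weight
    codewords, BPSK signalling on an AWGN channel.

    FER ~ (A_dmin / 2) * erfc(sqrt(d_min * R * Eb/N0))
    """
    ebno = 10.0 ** (ebno_db / 10.0)  # convert dB to linear
    return 0.5 * multiplicity * math.erfc(math.sqrt(d_min * rate * ebno))
```

The exponential dependence on *d*_{min} is why interleavers that raise *d*_{min} lower the error floor by orders of magnitude.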

This method fits particularly well with tail-biting turbo codes that use structured interleavers: because the distance properties repeat, the method can use this knowledge to reduce the search space. The reduction in search space results in a significant reduction in complexity (i.e., execution time), which allows the determination of high *d*_{min} values in reasonable time. This efficiency is demonstrated for both single- and double-binary turbo codes, using structured interleavers that have high *d*_{min} values for various code rates. This method reduces the execution times by a factor of 40 to 400.

A Study of Bit Allocation for Gaussian Mixture Model Quantizers and Image Coders

M.Eng. Thesis, September 2005

Supervisors: P. Kabal and R. Rose

This thesis describes different bit allocation schemes and their performance when applied to coding line spectral frequencies (LSFs) using the GMM-based coder designed by Subramaniam and a simple image transform coder. The new algorithms are compared to the original bit allocation formula, the *Pruning* algorithm used by Subramaniam, Segall's method, and the *Greedy* bit allocation algorithm, using the log spectral distortion and the mean-square error for the LSF quantizer and the peak signal-to-noise ratio for the image coder.

First, a Greedy level allocation algorithm is developed that follows the philosophy of the Greedy algorithm but operates level by level, considering the benefit and bit cost yielded by each allocation. The Greedy level allocation algorithm is computationally intensive in general; we therefore discuss combining it with other algorithms to reduce its cost.
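For reference, the generic Greedy bit allocation that the level-by-level variant builds on can be sketched as follows, assuming the high-rate model in which each added bit quarters a component's MSE. This is an illustration of the general technique, not Subramaniam's exact implementation:

```python
import heapq

def greedy_bit_allocation(variances, total_bits):
    """Greedy bit allocation under the high-rate model
    D_i = var_i * 2**(-2 * b_i): repeatedly give one bit to the
    component whose distortion drops the most."""
    bits = [0] * len(variances)
    dist = list(variances)
    # Max-heap keyed on current distortion: one more bit removes 3/4 of
    # a component's distortion, so the largest distortion gives the
    # largest drop.
    heap = [(-d, i) for i, d in enumerate(dist)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        d, i = heapq.heappop(heap)
        bits[i] += 1
        dist[i] = -d / 4.0  # one more bit quarters the distortion
        heapq.heappush(heap, (-dist[i], i))
    return bits
```

Each step is locally optimal; the level-by-level variant described above replaces the one-bit step with a one-level step, weighing benefit against the (fractional) bit cost of the extra level.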

Second, another algorithm is proposed that solves the problems of negative bit allocations and non-integer level allocations. The level allocations are kept in a fixed ratio with respect to each other throughout the algorithm, so as to remain closest to the condition for lowest distortion. Moreover, the original formula assumes a 6 dB gain for each added bit, which is not generally true. The algorithm introduces a new parameter *k*, which controls the benefit of adding one bit and is set at 0.5 in the high-rate optimal bit allocation formula for MSE; the resulting method is called the Two-Stage Iterative Bit Allocation (TSIBA) algorithm. Simulations show that modifying the bit allocation formula effectively brings gains over the previous methods.
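The high-rate allocation formula with the benefit parameter *k* can be sketched as follows. The iterative two-stage fix-up for negative and non-integer allocations is omitted, and the variable names are illustrative:

```python
import numpy as np

def high_rate_allocation(variances, total_bits, k=0.5):
    """High-rate bit allocation b_i = B/N + k * log2(var_i / geo_mean).

    k = 0.5 recovers the classical MSE-optimal rule (the 6 dB-per-bit
    assumption); other values of k rescale the assumed benefit of each
    added bit. Resulting b_i may be negative or fractional; TSIBA's
    iterative stages (not reproduced here) repair that.
    """
    v = np.asarray(variances, dtype=float)
    n = len(v)
    geo_mean = np.exp(np.mean(np.log(v)))  # geometric mean of variances
    return total_bits / n + k * np.log2(v / geo_mean)
```

By construction the allocations sum to the bit budget, and components with above-average variance receive proportionally more bits as *k* grows.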

The formula containing the new parameter is then generalized by introducing a parameter that weights not only the variances but also the dimensions, with the new parameter trained on their distribution function. TSIBA is an *a-posteriori* decision algorithm: the value of *k* giving the lowest distortion is selected after computing all distortions. The Generalized TSIBA (GTSIBA), on the other hand, uses a training procedure to estimate the weighting factor for each dimension at a given bit rate. Simulation results show yet another improvement when using GTSIBA over all previous methods.

An Enhanced Joint Source-Channel Decoder

M.Eng. Thesis, May 2005

Supervisor: F. Labeau

Tandem coding and decoding have been demonstrated to yield arbitrarily low error rates provided a sufficiently large block length is used. When applied to practical systems, which are inherently limited to a finite complexity and therefore to finite block lengths, such a strategy may be largely suboptimal. Indeed, a tandem decoding strategy ignores two types of information: the source memory and the residual redundancy of the source coder. Moreover, conventional source decoders, within a tandem decoding strategy, are designed to perform the inverse operation of the source coder and may severely decrease performance if errors are still present at their input. One viable alternative, which has been demonstrated to yield gains, is the design of a joint source-channel decoding scheme that takes these additional sources of redundancy (the source memory and the source coder's residual redundancy) into account. In this context, we propose a novel, iterative joint source-channel decoding algorithm. The proposed scheme is derived from a Bayesian network representation of the coding chain and incorporates three types of information: the source memory, the residual redundancy of the source coder, and the redundancy introduced by the channel coder. Specifically, we modify an existing algorithm by first deriving a new, equivalent Bayesian network representation of the coding chain. Next, we derive a fully consistent methodology, within the framework of Bayesian networks, for performing the iterations. The proposed algorithm is shown to yield significant gains along with a drastic reduction in computational complexity compared with the existing one. Finally, we outline further possible improvements to the proposed algorithm, including methods for additional reductions in computational complexity at no cost in performance in one case, and at a slight cost in performance in the other.
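As a minimal illustration of the gain available from exploiting source memory (a generic Viterbi-style MAP sequence decoder with a first-order Markov source prior, not the Bayesian-network algorithm of the thesis):

```python
import numpy as np

def map_decode_markov_source(likelihoods, trans, prior):
    """MAP sequence estimation exploiting source memory (sketch).

    likelihoods: (T, S) channel likelihoods p(y_t | s_t) for S symbols
    trans:       (S, S) Markov transition probabilities p(s_t | s_{t-1})
    prior:       (S,) initial symbol distribution

    A decoder that ignores source memory would instead take the argmax
    of each row of `likelihoods` independently.
    """
    T, S = likelihoods.shape
    delta = np.log(prior) + np.log(likelihoods[0])
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(trans)  # (prev, cur)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(likelihoods[t])
    path = [int(delta.argmax())]
    for t in range(T - 1, 1 - 1, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With a strongly persistent source, a single noisy observation that weakly favours the wrong symbol is corrected by its neighbours, which is exactly the redundancy a tandem decoder discards.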

Quantization Noise Shaping in Oversampled Filter Banks

M.Eng. Thesis, April 2005

Supervisors: F. Labeau, P. Kabal

The use of a noise-shaping system in oversampled filter banks has been shown to improve the effective resolution of subband coders. Although the filter banks directly determine the noise-shaping coefficients, a comparison between theoretical and simulated results has not been made, nor has the effect of the choice of filter banks on the performance of the noise-shaping system been evaluated. Therefore, an algorithm for the generation of cosine-modulated perfect-reconstruction filter banks is presented, so that the generated filters can serve as a test bed. The optimal noise-shaping coefficients are then derived, and the noise-shaping system is inserted into the subband coder. It is found that the theoretical results agree with the simulations, but that the performance of the noise-shaping system is limited by ill-conditioning at higher system orders. An increase in filter length and an increase in the degree of overlap between neighbouring channels contribute independently to better performance. Also, near-perfect-reconstruction filter banks are seen to be limited by their reconstruction error, but yield good results at low bit rates.
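The basic mechanism studied here, error-feedback noise shaping around a quantizer, can be sketched as follows. This is a scalar illustration; the subband/filter-bank setting of the thesis is not reproduced:

```python
def noise_shaped_quantize(samples, step, coeffs):
    """Error-feedback noise shaping (sketch).

    Past quantization errors, filtered by `coeffs`, are subtracted from
    each input before quantization. The quantization error spectrum is
    thereby shaped by 1 - C(z), pushing noise power away from the band
    where the filter bank concentrates the signal.
    """
    errors = [0.0] * len(coeffs)  # most recent error first
    out = []
    for x in samples:
        feedback = sum(c * e for c, e in zip(coeffs, errors))
        u = x - feedback
        q = step * round(u / step)  # uniform mid-tread quantizer
        out.append(q)
        errors = [q - u] + errors[:-1]
    return out
```

With `coeffs = [0.0]` this degenerates to plain uniform quantization; the thesis derives the optimal `coeffs` from the filter bank itself.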

Simplified Trellis Decoding of Block Codes by Selective Pruning

M.Eng. Thesis, January 2005

Supervisor: F. Labeau

Error-correcting codes are of paramount importance for reliable communications. By adding redundancy to the transmitted data, they allow the decoder to detect and correct errors. In favourable channel conditions, however, part of this redundancy can be removed to increase throughput. Unfortunately, most coding schemes are poorly adapted to these higher coding rates. For example, the decoding complexity of block codes grows exponentially with code length. In this thesis we propose a novel solution to this problem: selective trellis pruning. Selective trellis pruning reduces decoding complexity by removing certain codewords from the trellis. This reduction is accomplished by making hard decisions on the bits of the received sequence whose reliability exceeds a certainty threshold. Thanks to the reduced trellis size, this method can produce near-optimal results with only a fraction of the operations required by full decoding. In this work we also introduce an innovative way of obtaining the pruned trellis directly from a simplified version of the generator matrix. By using this method we avoid the lengthy process of constructing and then pruning the full trellises, making the selective trellis pruning algorithm an efficient decoding tool. Finally, we apply this algorithm to a parallel concatenated turbo block code decoder in order to reduce its complexity.
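The bit-selection step behind selective pruning (deciding which received bits are reliable enough to fix) can be sketched as follows. This is a generic illustration using log-likelihood ratios, not the full pruned-trellis decoder:

```python
def select_pruned_bits(llrs, threshold):
    """Hard-decide the bit positions whose reliability |LLR| exceeds the
    threshold; the rest stay undecided (None). Fixing these bits removes
    every trellis path that disagrees with them, shrinking the trellis
    before decoding."""
    decisions = []
    for llr in llrs:
        if abs(llr) > threshold:
            decisions.append(0 if llr > 0 else 1)  # positive LLR favours bit 0
        else:
            decisions.append(None)
    return decisions
```

Raising the threshold prunes fewer bits and approaches full maximum-likelihood decoding; lowering it trades reliability for a smaller trellis.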

Audio Coding with an Excitation Pattern Distortion Measure

M.Eng. Thesis, January 2005

Supervisor: P. Kabal

A new distortion measure for audio coding is proposed, formulated as a distance measure on the space of excitation patterns. We investigate the psychoacoustic properties of the measure, as well as the implementation issues that arise under a constrained-distortion coding structure. Experimental results show that the excitation distortion metric produces higher-quality coded files than the usual noise-to-mask ratio measure at the same rate.
