Ye Li 1,* Jingde Xu 2 Qinghua Li 2 Huijuan Cui 2 Kun Tang 2

1. Shandong Computer Science Center, Shandong Provincial Key Laboratory of Computer Network

2. EE Department, Tsinghua University, Beijing, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2010.02.07

Received: 5 Sep. 2010 / Revised: 12 Oct. 2010 / Accepted: 17 Nov. 2010 / Published: 8 Dec. 2010

Index Terms

Speech coding, superframe, pitch quantization, LSF quantization


Parameter quantization strongly affects the synthetic speech quality of a vocoder. This paper proposes a new distortion measure for pitch and line spectral frequency (LSF) quantization in an ultra-low-bit-rate vocoder based on the mixed excitation linear prediction (MELP) vocoder, in which the parameters of several consecutive frames are grouped into a vector and jointly quantized to obtain high coding efficiency. The weighting factor for the pitch and LSF parameters of each speech frame within the group is the product of that frame's gain parameter and the sum of its band-pass voicing coefficients; the vector codebook is then searched using a weighted squared Euclidean distance measure. Compared with the traditional method, which assigns a constant weighting factor according to the voiced/unvoiced (UV) pattern of each speech frame, objective test results show that the pitch quantization distortion is reduced by 3.3% and the mean opinion score (MOS) is increased by almost 0.1 (3.5%).
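The weighted codebook search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the three-frame superframe, the five voicing bands per frame, the gain scale, and the toy two-entry pitch codebook are all assumptions made for illustration. Each frame's weight is taken as the sum of its band-pass voicing coefficients multiplied by its gain, and the codebook entry minimizing the weighted squared Euclidean distance is selected.

```python
from typing import List, Sequence


def frame_weights(bpvc: Sequence[Sequence[float]],
                  gains: Sequence[float]) -> List[float]:
    """Per-frame weight: (sum of band-pass voicing coefficients) * gain."""
    return [sum(v) * g for v, g in zip(bpvc, gains)]


def weighted_sq_distance(x: Sequence[float],
                         c: Sequence[float],
                         w: Sequence[float]) -> float:
    """Weighted squared Euclidean distance between vectors x and c."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def search_codebook(x: Sequence[float],
                    codebook: Sequence[Sequence[float]],
                    w: Sequence[float]) -> int:
    """Index of the codebook entry with the smallest weighted distance."""
    return min(range(len(codebook)),
               key=lambda k: weighted_sq_distance(x, codebook[k], w))


# Hypothetical superframe of three frames (values are illustrative only).
pitch = [40.0, 42.0, 60.0]                 # one pitch value per frame
bpvc = [[1, 1, 1, 1, 0],                   # strongly voiced frame
        [1, 1, 1, 0, 0],                   # partly voiced frame
        [0, 0, 0, 0, 0]]                   # unvoiced frame
gains = [2.0, 1.5, 0.5]                    # assumed linear gain per frame
codebook = [[40.0, 42.0, 50.0],
            [45.0, 45.0, 60.0]]

weights = frame_weights(bpvc, gains)
best_weighted = search_codebook(pitch, codebook, weights)
best_uniform = search_codebook(pitch, codebook, [1.0, 1.0, 1.0])
```

In this toy case the unvoiced third frame receives zero weight, so its pitch error does not influence the search, and the weighted search picks entry 0 while a uniform weighting picks entry 1. The same search applies to LSF vectors if each frame's weight is repeated across that frame's LSF components.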

Cite This Paper

Ye Li, Jingde Xu, Qinghua Li, Huijuan Cui, Kun Tang, "A NEW DISTORTION MEASURE FOR PARAMETER QUANTIZATION BASED ON MELP", IJIGSP, vol. 2, no. 2, pp. 46-52, 2010. DOI: 10.5815/ijigsp.2010.02.07

