To address the feature mismatch between experimental databases, a key problem in cross-corpus speech emotion recognition, an auditory attention model based on Chirplet is proposed for feature extraction. First, the auditory attention model is employed to detect variational emotion features and extract spectral features. Then, a selective attention mechanism model is proposed to extract the salient gist features, which are shown to be related to the expected performance in cross-corpus testing. Furthermore, Chirplet time-frequency atoms are introduced into the model. By forming a complete atom database, the Chirplet improves spectral feature extraction and increases the amount of information captured. Because samples drawn from multiple databases contain multiple components, the Chirplet expands the scale of the feature vector in the time-frequency domain. Experimental results show that, compared with the traditional feature model, the proposed feature extraction approach with a prototypical classifier achieves a significant improvement in cross-corpus speech emotion recognition. In addition, the proposed method is more robust when the training set and the testing set come from inconsistent sources.
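A Chirplet time-frequency atom is commonly defined as a Gaussian-windowed linear chirp. As a minimal sketch of how such an atom can be generated and correlated with a speech frame (all parameter names and values here are illustrative, not taken from the paper):

```python
import numpy as np

def chirplet_atom(t, t0, f0, c, sigma):
    """Gaussian-windowed linear chirp (Chirplet) atom.

    t0: time center (s), f0: start frequency (Hz), c: chirp rate (Hz/s),
    sigma: Gaussian window width (s). Values are illustrative.
    """
    envelope = np.exp(-0.5 * ((t - t0) / sigma) ** 2)
    phase = 2 * np.pi * (f0 * (t - t0) + 0.5 * c * (t - t0) ** 2)
    atom = envelope * np.cos(phase)
    return atom / np.linalg.norm(atom)   # normalize to unit energy

fs = 16000
t = np.arange(0, 0.05, 1 / fs)           # one 50 ms analysis frame
atom = chirplet_atom(t, t0=0.025, f0=300.0, c=4000.0, sigma=0.01)

# Projection of a frame onto the atom (matching-pursuit-style score);
# the stand-in frame below replaces a real speech frame.
frame = np.cos(2 * np.pi * 300.0 * t)
score = abs(np.dot(frame, atom))
```

Sweeping t0, f0, c, and sigma over a grid yields the complete atom database mentioned above; the projection scores over that grid form the time-frequency feature map.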
To improve on the traditional multichannel filter bank, whose design leads to speech quality degradation, an efficient design method for a non-uniform cosine modulated filter bank (CMFB) based on the audiogram is proposed for digital hearing aids. First, a low-pass prototype filter is designed by a linear iterative algorithm. Second, the uniform CMFB is obtained from the standard modulation formulas. Then, adjacent channels of the uniform filter bank covering frequency regions where the audiogram of the hearing-impaired person is flat or gently sloped are merged according to the trend of the audiogram. Finally, the corresponding non-uniform CMFB is obtained. Simulation results show that the signal processed by the proposed filter bank is similar to the original signal in both time-domain waveform and spectrogram, without significant distortion. Speech quality evaluations show that the perceptual evaluation of speech quality (PESQ) score of the non-uniform CMFB is 35% higher than that of the traditional design, and the hearing-aid speech quality index (HASQI) increases by about 40%.
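The uniform-then-merge construction above can be sketched as follows, using the standard CMFB modulation formula and a hypothetical audiogram-driven channel grouping (the windowed-sinc prototype stands in for the paper's linear iterative design, and the merge pattern is illustrative):

```python
import numpy as np

M = 8                                    # uniform channels
N = 64                                   # prototype filter order
# Stand-in prototype: windowed sinc with cutoff pi/(2M); the paper
# instead designs this by a linear iterative algorithm.
h = np.sinc(np.arange(-N / 2, N / 2 + 1) / (2 * M)) * np.hamming(N + 1)
h /= np.sum(h)

# Uniform CMFB analysis filters via cosine modulation of the prototype.
n = np.arange(N + 1)
analysis = [2 * h * np.cos((2 * k + 1) * np.pi / (2 * M) * (n - N / 2)
                           + (-1) ** k * np.pi / 4)
            for k in range(M)]

# Merge adjacent channels where the audiogram is flat, e.g. grouping the
# low-frequency channels pairwise (hypothetical grouping for one patient).
groups = [[0, 1], [2, 3], [4], [5], [6], [7]]
nonuniform = [np.sum([analysis[k] for k in g], axis=0) for g in groups]
```

Merging by summing adjacent analysis filters keeps the overall band coverage while reducing the channel count where fine frequency resolution is not needed for that patient's hearing loss profile.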
Sound source localization is a prerequisite and foundation for speech enhancement and speech recognition. Microphone-array-based sound source localization has become a major research topic, and its broad application prospects have attracted wide attention. This paper proposes a sound source localization algorithm based on the variable step size least mean square (VLMS) algorithm. The algorithm uses VLMS to adaptively estimate the impulse response coefficients from the sound source to each microphone, from which the time delays between microphones are estimated, and the source position in 3D space is then located by a geometric method. In addition, a sound source localization system based on the Cortex-A8 embedded platform is designed, and the corresponding hardware selection, debugging, and algorithm porting are carried out. Real-time experiments show that the proposed system is reasonable and effective, and achieves good sound source localization.
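The delay-estimation step can be sketched as follows: an LMS identifier with a decaying step size (an illustrative variable-step-size rule; the paper's exact VLMS update may differ) estimates each source-to-microphone impulse response, and the inter-microphone delay is read off from the peak positions:

```python
import numpy as np

def vlms_identify(x, d, L=32, mu0=0.5, alpha=0.999, mu_min=0.05):
    """Identify the channel from source x to microphone signal d using a
    normalized LMS whose step size decays over time (illustrative rule)."""
    w = np.zeros(L)
    mu = mu0
    for n in range(L, len(x)):
        u = x[n - L + 1:n + 1][::-1]        # tap inputs [x[n], ..., x[n-L+1]]
        e = d[n] - w @ u                    # a-priori estimation error
        w += mu * e * u / (u @ u + 1e-8)    # normalized LMS update
        mu = max(alpha * mu, mu_min)        # variable step size: decay to floor
    return w

rng = np.random.default_rng(0)
x = rng.standard_normal(4000)               # white stand-in for the source
mic1 = np.roll(x, 3)                        # 3-sample propagation delay
mic2 = np.roll(x, 7)                        # 7-sample propagation delay
d1 = int(np.argmax(np.abs(vlms_identify(x, mic1))))
d2 = int(np.argmax(np.abs(vlms_identify(x, mic2))))
tau = d2 - d1                               # inter-microphone delay in samples
```

Given such pairwise delays, the array geometry, and the speed of sound, the geometric step intersects the resulting hyperboloids to locate the source in 3D space.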
To alleviate the conflict between audibility and distortion in the conventional loudness compensation method, an adaptive multichannel loudness compensation method is proposed for hearing aids. The linear and wide dynamic range compression (WDRC) methods are alternately employed according to the dynamic range of the band-passed signal and the hearing range (HR) of the patient. To further reduce the distortion caused by the WDRC and improve the output signal-to-noise ratio (SNR) under noisy conditions, an adaptive adjustment of the compression ratio is presented. Experimental results demonstrate that the output SNR of the proposed method in babble noise is improved by at least 1.73 dB compared with the WDRC compensation method, and the average speech intelligibility is improved by 6.0% and 5.7%, respectively, compared with the linear and WDRC compensation methods.
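The per-band gain rule and the SNR-driven compression-ratio adaptation can be sketched as below. The knee point, gains, and the linear SNR-to-ratio mapping are hypothetical stand-ins, not the paper's fitted values:

```python
def wdrc_gain_db(level_db, knee_db=45.0, cr=3.0, gain0_db=20.0):
    """One band's input/output gain rule: linear gain below the
    compression knee, WDRC above it. Parameter values are illustrative."""
    if level_db <= knee_db:
        return gain0_db                            # linear region
    # Above the knee, output level grows 1/cr dB per input dB.
    return gain0_db - (level_db - knee_db) * (1.0 - 1.0 / cr)

def adaptive_cr(snr_db, cr_min=1.5, cr_max=3.0):
    """Hypothetical adaptive compression ratio: relax compression toward
    linear at low SNR to limit distortion, as motivated in the abstract."""
    w = min(max(snr_db / 20.0, 0.0), 1.0)          # map 0..20 dB SNR to 0..1
    return cr_min + w * (cr_max - cr_min)

# Example: a band at 60 dB SPL under 10 dB SNR babble noise.
cr = adaptive_cr(10.0)            # milder ratio than the quiet-condition 3.0
g = wdrc_gain_db(60.0, cr=cr)     # reduced gain above the knee
```

Switching between `wdrc_gain_db` in its linear region and its compressive region, per band, is what lets the method trade audibility against distortion as the band's dynamic range and the patient's hearing range dictate.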