Abstract:
Existing deep-learning-based acoustic echo cancellation algorithms mainly adopt an end-to-end structure, which makes the internal mechanisms of the neural network models difficult to interpret. To address this problem, an echo cancellation method combining the Kalman filter and Taylor residual expansion is proposed, which provides better interpretability for each layer of the network structure. The method consists of two parts: linear adaptive filtering and a deep neural network. First, Neural Kalman Filtering (NKF) is used as the adaptive filter to remove the linear echo and obtain a rough spectral estimate of the target speech. Then, a Taylor expansion is used to progressively refine this rough spectral estimate, suppressing nonlinear residual echo and gradually restoring the complex spectrum of the target speech. In the Taylor expansion neural network, an encoder-decoder network that integrates time-frequency features at different scales is designed for zero-order term estimation, and a lightweight high-order term estimation network is constructed to reconstruct the complex spectrum of the target speech from coarse to fine granularity. Experiments show that the proposed method significantly outperforms existing mainstream echo cancellation methods. In the double-talk case, the Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) scores are greatly improved. In the single-talk case, the Echo Return Loss Enhancement (ERLE) reaches 56.106 dB, a 6.5% improvement over the advanced UNET neural network method.
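
For readers unfamiliar with the single-talk metric, ERLE is commonly defined as the ratio, in decibels, of the microphone signal power to the residual signal power after cancellation. The following is a minimal sketch of that common definition, not the evaluation code used in this work; the function name `erle_db`, the `eps` guard, and the toy signals are assumptions made for illustration.

```python
import math


def erle_db(mic, residual, eps=1e-12):
    """Echo Return Loss Enhancement in dB (common textbook definition).

    mic      -- microphone samples containing only echo (single-talk)
    residual -- samples output by the echo canceller for the same frames
    eps      -- small constant (an assumption) to avoid log of zero
    """
    if len(mic) != len(residual) or len(mic) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    mic_power = sum(x * x for x in mic) / len(mic)
    res_power = sum(x * x for x in residual) / len(residual)
    return 10.0 * math.log10((mic_power + eps) / (res_power + eps))


# Toy example: the canceller attenuates the echo amplitude by a factor of 100,
# i.e. the power drops by a factor of 10^4, which corresponds to 40 dB.
mic = [math.sin(0.1 * n) for n in range(1000)]
residual = [0.01 * x for x in mic]
print(round(erle_db(mic, residual), 1))  # → 40.0
```

A higher ERLE means more echo energy was removed; the measure is only meaningful in single-talk segments, where the microphone signal contains no near-end speech.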