Time- and pitch-scale modification of speech

Author(s):

T. Geyer, A. Koster
Conference/Journal:

Diploma thesis at Bell Labs, Holmdel, NJ
Abstract:

We conducted a comparative study of time-domain algorithms for time- and pitch-scale (TS/PS) modification of speech signals, involving the AUTOC, SIFT and AMDF pitch period detectors and the pitch-synchronous overlap/add (PSOLA) algorithm. To perform pitch-synchronous TS/PS modification of speech, time-domain algorithms estimate the output of the excitation source by spectrally flattening the input signal and correlating the flattened signal. The structure of the algorithms implies a preprocessor comprising framing for block-wise processing, low-pass filtering and a silence detector. Following the preprocessor, the pitch detector determines a pitch period estimate, which is reviewed in the postprocessor and optionally smoothed. The TS/PS modification block decomposes the signal in order to modify the sequence of vocal tract impulse responses and finally overlap/adds the resulting signals, thus altering either the pitch or the length of the original signal.
Our approach was to implement the standard algorithms as suggested in the literature and to trace the reasons for failures and potential problems. We then enhanced the standard algorithms with our own improvements, notably a very efficient adaptive frame size algorithm, a pitch decision algorithm that corrects almost all erroneous decisions of the standard algorithms, and a simplified TS/PS modification block that avoids a complex algorithm and requires less computation. We also investigated suggested refinements such as postprocessor smoothing of pitch periods and additional modification stages in the TS/PS modification block (such as averaging), but found most of them to be superfluous. A C environment with graphical output was created in order to trace the reasons for sound degradation. Our stand-alone Windows 9x application can process any speech file in PCM format, without further parameter settings, for any fractional time-scale and/or pitch-scale modification factor. Parameters and plotting options are accessible through a menu to allow further fine-tuning. The resulting speech files are of better quality than those produced by the standard commercial applications we compared it with. Finally, we implemented the algorithm in assembly language on a Lucent DSP 16210 and optimized it for speed. For further information, please contact Tobias Geyer, geyer@aut.ee.ethz.ch.
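As an illustration of the pitch detection stage named in the abstract, the following is a minimal sketch of an AMDF (average magnitude difference function) pitch period estimator. It is not the thesis implementation: the function name `amdf_pitch_period`, the 8 kHz sample rate, the lag search range and the synthetic test frame are all assumptions chosen for the example, and the preprocessing, postprocessing and pitch decision steps described above are omitted.

```python
import math


def amdf_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) that minimises the AMDF of `frame`.

    Hypothetical helper for illustration only. For a periodic signal the
    AMDF has a deep minimum where the lag equals the pitch period.
    """
    best_lag, best_value = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(frame) - lag
        if n <= 0:
            break  # lag exceeds the frame length, nothing left to compare
        # Mean absolute difference between the frame and its shifted copy.
        value = sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n
        if value < best_value:
            best_lag, best_value = lag, value
    return best_lag


# Synthetic "voiced" frame (assumption): 100 Hz fundamental plus one
# harmonic, sampled at 8 kHz, so the true period is 80 samples.
SAMPLE_RATE = 8000
frame = [
    math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    for n in range(400)
]

# Search lags 20..150 samples (400 Hz down to about 53 Hz at 8 kHz).
period = amdf_pitch_period(frame, min_lag=20, max_lag=150)
pitch_hz = SAMPLE_RATE / period
```

The upper lag bound is kept below twice the true period here so that the minimum at two periods cannot be selected. Real speech is only quasi-periodic, so errors of this kind (period doubling or halving) are the sort of erroneous decision a pitch decision stage would have to correct.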

Year:

2000
Type of Publication:

Diploma/Master Thesis
Supervisor:

W. Etter
