Phase Vocoder

The Phase Vocoder is one of the most popular methods for the analysis/resynthesis of a sound's spectrum. It develops, in the digital domain, concepts that Homer Dudley had already adopted for the Vocoder. Traditionally applied to voice signals, the Phase Vocoder, like the analog device, is structured in two sections, one for analysis and one for synthesis.

Brief history – A first version of the Phase Vocoder was developed in 1966 by James Flanagan and Roger Golden, researchers at Bell Labs, where Dudley had developed the Vocoder. The two researchers started from the principle that passing a sound signal through a bank of parallel filters does not compromise the quality of the output signal. The same principle is applied in the Vocoder. But while the Vocoder transmits only amplitude values for each filter, the technique adopted by Flanagan and Golden also extended to the phase, hence the name Phase Vocoder.[1]

Final result – Measured against its initial aims, the research was not entirely satisfactory. The two researchers managed to improve the synthesis process, and therefore the audio quality, but the data collected during analysis produced a file larger than the original signal. In practical terms this meant a considerable computational effort. The first computer simulation was run on an IBM 7094 with software written in BLODI-B, a language specifically developed to process digitized speech signals.[2]
The difficulty of managing this data hindered the use of the Phase Vocoder for several years. Later, in the mid-seventies, the researcher Portnoff developed a more efficient Phase Vocoder by implementing it with the Fast Fourier Transform.[3]

Analysis/resynthesis – As in the Vocoder, in the Phase Vocoder the original signal passes through a bank of parallel filters that covers its entire bandwidth. In this case, the filters make it possible to measure both the amplitude and the phase of the sinusoidal component at each frequency. This is called the analysis step. From these values we derive two envelopes, one for amplitude and one for frequency. During the synthesis step, the data obtained from the analysis can be left unchanged (in which case, theoretically, the result is a signal identical to the original) or changed (for example by acting on the two envelopes) to achieve different timbres.
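
The round trip described above can be sketched on a single frame. This is only an illustration under stated assumptions: a naive discrete Fourier transform stands in for the bank of parallel filters, and the function names `analyze` and `resynthesize` are hypothetical, not taken from any Phase Vocoder implementation.

```python
import cmath
import math

def analyze(frame):
    """Measure amplitude and phase for each frequency channel of one frame (naive DFT)."""
    n = len(frame)
    spectrum = [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    amplitudes = [abs(c) for c in spectrum]
    phases = [cmath.phase(c) for c in spectrum]
    return amplitudes, phases

def resynthesize(amplitudes, phases):
    """Rebuild the time-domain frame from amplitude and phase values (inverse DFT)."""
    n = len(amplitudes)
    spectrum = [cmath.rect(a, p) for a, p in zip(amplitudes, phases)]
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

# One frame holding a sinusoid that completes 4 cycles in 64 samples (assumed test signal).
frame = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
amps, phases = analyze(frame)

# Data left unchanged: the rebuilt frame matches the original up to rounding error.
rebuilt = resynthesize(amps, phases)
max_error = max(abs(a - b) for a, b in zip(frame, rebuilt))

# Data changed: halving every amplitude halves the level of the output.
quieter = resynthesize([a / 2 for a in amps], phases)
```

A real Phase Vocoder repeats this measurement on many overlapping frames and tracks how the phase of each channel evolves from one frame to the next; the sketch only shows that amplitude and phase together are enough to rebuild a frame.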

Analysis step – The analysis stage is extremely important for the final result, so the user-defined parameters deserve particular attention. The choice of these parameters (Frame Size, Window, FFT Size, Hop Size) should be evaluated on the basis of both the intrinsic qualities of the original signal and the characteristics that the resynthesized sound should have. As a general rule, the better the analysis, the more closely the synthesized sound will resemble the original. The analysis is carried out with two factors in particular in mind: frequency and time. As regards frequency, the spectrum of the original signal is divided into frequency channels during the analysis, and the bandwidth of each channel is calculated by dividing the sampling frequency by the frame size. The number of channels, which is closely related to the characteristics of the original signal, is obtained by dividing the sampling frequency by the fundamental frequency of the signal.[1]
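
The two divisions mentioned above can be written out as a small worked example. The sampling frequency, frame size and fundamental used here are assumed values chosen for illustration, and the function names are hypothetical.

```python
def channel_bandwidth(sample_rate, frame_size):
    """Bandwidth of each analysis channel in Hz: sampling frequency / frame size."""
    return sample_rate / frame_size

def channel_count(sample_rate, fundamental_hz):
    """Number of channels: sampling frequency / fundamental frequency of the signal."""
    return sample_rate / fundamental_hz

sample_rate = 44100      # assumed sampling frequency, in Hz
frame_size = 1024        # assumed frame size, in samples
fundamental_hz = 110.0   # assumed fundamental frequency of the analyzed sound, in Hz

bandwidth = channel_bandwidth(sample_rate, frame_size)
channels = channel_count(sample_rate, fundamental_hz)
print(f"channel bandwidth: {bandwidth:.2f} Hz")
print(f"number of channels: {channels:.1f}")
```

With these assumed values each channel is roughly 43 Hz wide, which is narrower than the 110 Hz spacing between the harmonics of the sound.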

Frame Size – Frame size is one of the key parameters for a good analysis. It affects both frequency and time. The size of each frame, measured in samples, must be an integer power of two (64, 128, 256 samples, etc.). As regards frequency, the frame size determines how many frequencies of the original signal are stored, and therefore the frequency resolution. The larger the frame size, the greater the number of frequencies, and vice versa. If, for example, we want to analyze a very low-pitched sound, where frequencies are harder to separate, we tend to set a frame size large enough to resolve the frequencies present more accurately.[3] This parameter also affects time resolution. We said earlier that the spectrum is divided into frequency channels; it must be added that the original signal is also divided into portions of time. Unlike the previous case, however, good temporal resolution requires keeping the frame size small, because each frame yields a single spectrum measurement for its whole duration. For now, just remember that the frame size sets a trade-off: the more detailed the frequency analysis, the less accurate the time analysis, and vice versa.
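
The trade-off can be made concrete by tabulating, for a few power-of-two frame sizes, the width of each frequency channel and the duration of one frame. This is a minimal sketch assuming a 44100 Hz sampling frequency; the helper `resolution` is hypothetical.

```python
def resolution(sample_rate, frame_size):
    """Return (channel width in Hz, frame duration in ms) for one frame size."""
    channel_width_hz = sample_rate / frame_size
    frame_duration_ms = 1000.0 * frame_size / sample_rate
    return channel_width_hz, frame_duration_ms

sample_rate = 44100  # assumed sampling frequency, in Hz
rows = [(n, *resolution(sample_rate, n)) for n in (64, 256, 1024, 4096)]

for n, hz, ms in rows:
    print(f"{n:5d} samples  {hz:8.2f} Hz per channel  {ms:7.2f} ms per frame")
```

Reading down the table, the channels get narrower (finer frequency detail) while each frame covers a longer stretch of time (coarser time detail).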

Window Type – The choice of window type becomes very important when a very careful analysis is required, because every window tends to introduce distortion and therefore to alter the analysis. Today, a typical Phase Vocoder offers a choice of standard windows such as Hamming, Hann (often called Hanning), truncated Gaussian, Blackman-Harris or Kaiser.
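
As an illustration, two of the windows named above can be generated from their usual symmetric textbook definitions and applied to a frame. The function names are hypothetical, and the constant-valued frame is only there to make the shape of the window visible.

```python
import math

def hann(n):
    """Symmetric Hann window: tapers to 0 at both ends, reaches 1 in the middle."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def hamming(n):
    """Symmetric Hamming window: tapers to 0.08 at both ends instead of 0."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(frame, window):
    """Multiply a frame by a window, sample by sample."""
    return [s * w for s, w in zip(frame, window)]

frame = [1.0] * 9                      # assumed constant frame, for illustration only
hann_frame = apply_window(frame, hann(9))
hamming_frame = apply_window(frame, hamming(9))
```

The two windows differ mainly at the edges of the frame, which is one reason they alter the measured spectrum in different ways.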

FFT Size – The choice of this value depends on the degree of transformation we intend to apply to the original signal. The value must be an integer power of two, chosen so that it is at least twice the frame size. Since the FFT size affects processor performance, and since its value is closely tied to the frame size, the choice of frame size also matters for the FFT.
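
The rule above can be sketched as a small helper that picks the smallest power of two that is at least twice the frame size. Filling the extra positions with zeros (zero padding) is an assumption added here, since the text does not say how a frame is fitted into the larger FFT; both function names are hypothetical.

```python
def fft_size_for(frame_size):
    """Smallest integer power of two that is at least twice the frame size."""
    size = 1
    while size < 2 * frame_size:
        size *= 2
    return size

def zero_pad(frame, fft_size):
    """Assumed padding scheme: append zeros until the frame fills the FFT buffer."""
    return list(frame) + [0.0] * (fft_size - len(frame))

frame = [0.5] * 256                 # assumed frame of 256 samples
fft_size = fft_size_for(len(frame))
padded = zero_pad(frame, fft_size)
```

For a frame size that is already a power of two, the result is simply double the frame size; the loop also covers the case where the frame size is not a power of two.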

Hop Size – Hop size, also expressed as the window overlap factor, determines the number of samples that are skipped during analysis each time a new spectrum measurement is made. The smaller this value, the more the windows overlap. Do not forget that resynthesis requires a minimum number of overlaps (about eight).[3]
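
The relation between frame size, number of overlaps and hop size can be sketched as follows. The values are assumptions for illustration (a 1024-sample frame, eight overlaps, one second of audio at 44100 Hz), and the function names are hypothetical.

```python
def hop_size_for(frame_size, overlaps):
    """Hop size in samples when each sample is covered by `overlaps` windows."""
    return frame_size // overlaps

def frame_starts(num_samples, frame_size, hop_size):
    """Start index of every complete analysis frame in a signal."""
    return list(range(0, num_samples - frame_size + 1, hop_size))

frame_size = 1024                      # assumed frame size, in samples
hop = hop_size_for(frame_size, 8)      # eight overlaps, as suggested in the text
starts = frame_starts(44100, frame_size, hop)

print(f"hop size: {hop} samples, frames analyzed: {len(starts)}")
```

Halving the hop size doubles the number of spectrum measurements, which is one way the amount of analysis data grows beyond the size of the original signal.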

Conclusions – However far the study of these parameters and the reasoning behind their choice may be taken, one general point should not be forgotten: there are no values suitable for every situation. We can certainly find, within a certain range, values that are acceptable for a wide variety of situations, but the fact remains that every sound signal requires an analysis worked out case by case. Although it is popular because of its efficiency, the Phase Vocoder is also among the most computationally demanding methods; for this reason, various ways of reducing the size of the analysis file have been developed.
Among the many composers who have used the Phase Vocoder, Charles Dodge and Trevor Wishart stand out for having shown, more than others, interesting musical applications of this method. Finally, at the beginning of the eighties, Mark Dolson worked to implement the Phase Vocoder technique within the CARL System by Richard Moore and Gareth Loy.


For this topic I’ve read:

[1] John Gordon, John Strawn, An Introduction to the Phase Vocoder, Proceedings, CCRMA, Department of Music, Stanford University, February 1987.
[2] James Flanagan, Roger Golden, Phase Vocoder, The Bell System Technical Journal, November 1966.
[3] Curtis Roads, The Computer Music Tutorial, MIT Press, 2004.
