A deeper understanding of sound, electronics and applications
By Prashant Govindan
Early beginnings
The whole premise of electronic audio reproduction is the conversion of a sound wave into an electrical signal using a transducer. From the time Thomas Alva Edison successfully converted the human voice into a mechanical vibration that was “recorded” on his phonograph, it was imperative that this stored information be reproduced back into audible sound.

While early devices were primarily mechanical and were amplified acoustically using a horn, the invention of the vacuum tube (or valve, as it was commonly known) made it possible to reproduce and transmit an electrical signal with its magnitude increased several thousand times. This process is called amplification, and the invention of the diode and eventually the triode changed how we perceive sound once and for all.
Enter the solid-state semiconductor diode and transistor, and gone were the pre-war days of huge, bulky devices filled with innumerable glowing vacuum tubes that, though fascinating, were also highly inefficient and unreliable. The solid-state era changed electronics and laid the foundation of what would become modern audio amplification and signal processing devices.

R. Hilsch and R. W. Pohl in 1938 demonstrated a solid-state amplifier using a structure resembling the control grid of a vacuum tube, though this was largely unusable as a practical device as it had a cut-off frequency of one cycle per second. The first working transistor was a point-contact transistor invented by John Bardeen, Walter Houser Brattain, and William Shockley at Bell Labs in 1947. In 1954, physical chemist Morris Tanenbaum fabricated the first silicon junction transistor at Bell Labs. However, early junction transistors were relatively bulky devices that were difficult to manufacture on a mass-production basis, which limited them to a number of specialized applications.

What is an audio signal?
Without getting too deep into the theory, to put it simply: an audio signal is a representation of a real-world sound wave that needs to be captured, recorded or transmitted with minimal loss of the original information it contains. This could be speech, music or simply ambient sound. The invention of the microphone, along with the attendant electronics to amplify the signal and then transmit it for further processing or broadcast, laid the foundation of modern signal processing.
In a rudimentary sense, an audio signal can be visualized as a sine wave with a frequency within the human auditory range of 20Hz to 20kHz. This sine wave, or combination of sine waves, first needs to be amplified (pre-amplified) and combined (mixed); various aspects of the signal, such as its frequency content and its dynamics (the ratio between the quietest and the loudest sounds), are then shaped; and finally the amplitude of the signal must be brought up to a level that can drive a coil suspended in a magnetic field and attached to a diaphragm, which is essentially a loudspeaker.
This is the basis of signal processing, from the source to the destination, whether it be a small public address system, a recording system, or a transmission system for broadcasting. All of these systems require signal processing at some point to make the signal amenable to its targeted application.
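As a simple illustration, the short sketch below (assuming Python with NumPy; the 440Hz tone, 48kHz sample rate and half-scale amplitude are arbitrary example values) synthesizes one second of such a sine wave digitally.

```python
import numpy as np

# Example parameters (arbitrary choices for illustration)
sample_rate = 48_000   # samples per second
frequency = 440.0      # Hz, well inside the 20Hz to 20kHz audible range
duration = 1.0         # seconds
amplitude = 0.5        # relative to full scale

t = np.arange(int(sample_rate * duration)) / sample_rate
signal = amplitude * np.sin(2 * np.pi * frequency * t)

# Any real-world sound can be regarded as a sum of such sine waves
# at different frequencies, amplitudes and phases.
```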
The audio signal chain, as we see in the illustration above, comprises a source, which could be a microphone or a playback device; a pre-amplifier; a mixing or routing device (in case there are multiple sources); filters for attenuating undesired frequencies or complete bands of frequencies, or even boosting a set of frequencies; a set of additional filters to divide the signal into frequency bands that can be sent to loudspeakers best optimized for those frequencies; and finally power amplifiers that amplify the signal, or set of signals, to the level needed to drive the loudspeakers as intended. For recording, the same signals, after mixing and processing, are sent to recording devices that store either one source by itself or a “mix” that can later be retrieved and “mastered” into the final mix before being released on the intended medium, be it tape, vinyl, CD or a digital release.
Analog and Digital Signals
In the old days the signal chain was completely analog, which meant that the signal remained in the analog domain without being digitized. The mixers, filters and equalizers were all built from analog filter circuits that used a combination of resistors, capacitors and inductors to cut or pass frequencies. A complete live sound setup would usually involve multiple stages of filters, equalizers and mixers aside from microphone and instrument pre-amplifiers. This meant a room full of audio gear that was not only complex to set up and replicate but could also, in less skilled hands, sound far from optimal. Add to this the signal loss and noise picked up by the cables and the equipment themselves, and getting even a basic sound setup done within a reasonable time frame could be a frustrating experience.
With the advent of computer technology in the late 60s and early 70s, it was successfully demonstrated that any analog signal could be converted into a digital signal of 0s and 1s that could then be stored, replicated and transmitted with no loss. Furthermore, this digital signal could be converted back to analog for amplification to drive loudspeakers. This breakthrough also meant that the original analog signal could now be manipulated far more efficiently in the digital domain, as a digital signal stream, without the attendant losses encountered in the analog domain.
The only challenge that remained was the faithful reproduction of the original audio signal. As would be expected, the process of digitization is not completely one to one. In converting the signal from the analog domain to the digital one, a process called A/D conversion, key parameters such as the sampling rate and the quantization bit depth make all the difference. Fortunately, thanks to the many developments in computing, we now have access to A/D converters that can sample as many as 384,000 times a second (384kHz), with each sample stored in 32 bits, which far exceeds the demands of human hearing.
The Nyquist theorem states that an analog signal can be digitized without aliasing error only if the sampling rate is at least twice the highest frequency component in the signal. Since the highest frequency humans can hear is 20kHz, a sampling frequency of twice that, i.e. 40kHz, is sufficient; the 48kHz rate that is the de-facto standard in professional audio is therefore more than adequate.
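The aliasing behaviour the theorem describes can be checked numerically. The sketch below (a hypothetical example in Python with NumPy) samples two tones at 48kHz: the one below the Nyquist limit of 24kHz is captured faithfully, while the one above it folds back into the audible band.

```python
import numpy as np

fs = 48_000                 # sampling rate in Hz, the professional audio standard
t = np.arange(fs) / fs      # one second of sampling instants

def dominant_frequency(x, fs):
    """Return the frequency bin with the most energy in the sampled signal."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return freqs[np.argmax(spectrum)]

in_band = np.sin(2 * np.pi * 5_000 * t)          # 5kHz, below the 24kHz Nyquist limit
above_nyquist = np.sin(2 * np.pi * 30_000 * t)   # 30kHz, above the Nyquist limit

print(dominant_frequency(in_band, fs))        # ~5000 Hz, captured correctly
print(dominant_frequency(above_nyquist, fs))  # ~18000 Hz, the 48kHz - 30kHz alias
```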
Bit depth, on the other hand, determines the number of amplitude levels available to capture and reproduce everything from the quietest to the loudest sounds that can be heard by humans. This range is usually expressed in decibels, a logarithmic unit. Although an increase of 3dB represents a doubling of sound power, an increase of about 10dB is required before a sound subjectively appears to be twice as loud, and the smallest change we can typically hear is about 3dB; the subjective or perceived loudness of a sound is determined by several complex factors. 24-bit audio can theoretically encode 144dB of dynamic range, and 32-bit audio 192dB, but such figures are almost impossible to achieve in the real world, as even the best sensors and microphones rarely exceed 130dB.
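These dynamic-range figures follow directly from the bit depth: each additional bit doubles the number of quantization levels and adds roughly 6dB. A quick check (a minimal Python sketch; the function name is just for illustration):

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM audio at the given bit depth."""
    return 20 * math.log10(2 ** bits)   # roughly 6.02dB per bit

for bits in (16, 24, 32):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
# prints approximately 96.3, 144.5 and 192.7 dB respectively
```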
Once in the digital domain, the signal is a stream of 0s and 1s whose samples may be added, subtracted or multiplied just like any other digital value in a computer. These calculations are performed on fast, dedicated processors whose throughput is measured in millions of floating-point operations per second (MFLOPS).
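In fact, adding and multiplying samples is exactly what mixing and gain adjustment amount to in the digital domain, as this minimal sketch shows (Python with NumPy; the source names, levels and the -6dB fader setting are hypothetical):

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs
vocal = 0.4 * np.sin(2 * np.pi * 220 * t)    # stand-ins for two recorded sources
guitar = 0.4 * np.sin(2 * np.pi * 330 * t)

gain_db = -6.0                    # fader setting for the guitar channel
gain = 10 ** (gain_db / 20)       # convert decibels to a linear factor

mix = vocal + gain * guitar       # gain is a multiplication, mixing is an addition
mix = np.clip(mix, -1.0, 1.0)     # keep the result within full scale
```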
At their core, all digital signal processors are essentially banks of filters that enable the user to manipulate frequencies and mimic analog filters.
| DIGITAL FILTERS | ANALOG FILTERS |
| --- | --- |
| High Accuracy | Less Accuracy Due to Component Tolerances |
| Linear Phase (FIR Filters) | Non-Linear Phase |
| No Drift Due to Component Variations | Drift Due to Component Variations |
| Flexible, Adaptive Filtering Possible | Adaptive Filters Difficult |
| Easy to Simulate and Design | Difficult to Simulate and Design |
| Computation Must Be Completed Within the Sampling Period, Limiting Real-Time Operation | Analog Filters Required at High Frequencies and for Anti-Aliasing |
| Requires High-Performance ADC, DAC and DSP | No ADC, DAC or DSP Required |
Most analog filters, such as high-pass, band-pass or low-pass filters in the frequency domain, can be modeled in the digital domain. Generally, most digital filters are implemented using either the Finite Impulse Response (FIR) or the Infinite Impulse Response (IIR) model.
| IIR FILTERS | FIR FILTERS |
| --- | --- |
| More Efficient | Less Efficient |
| Analog Equivalent | No Analog Equivalent |
| May Be Unstable | Always Stable |
| Non-Linear Phase Response | Linear Phase Response |
| More Ringing on Glitches | Less Ringing on Glitches |
| CAD Design Packages Available | CAD Design Packages Available |
| No Efficiency Gained by Decimation | Decimation Increases Efficiency |
Unlike analog filters, the characteristics of digital filters can easily be changed simply by modifying the filter coefficients. This makes digital filters attractive in communications applications such as adaptive equalization, echo cancellation, noise reduction, speech analysis and synthesis, etc.
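For example, re-tuning a digital filter is just a matter of computing a new coefficient set. The sketch below (assuming Python with NumPy and SciPy; the 101-tap length and the 8kHz and 2kHz cut-offs are illustrative) designs two FIR low-pass filters that differ only in their coefficients and applies them through exactly the same code path.

```python
import numpy as np
from scipy import signal

fs = 48_000
x = np.random.randn(fs)     # one second of white noise as a test signal

# Two FIR low-pass filters: identical structure, different coefficient sets.
coeffs_8k = signal.firwin(numtaps=101, cutoff=8_000, fs=fs)
coeffs_2k = signal.firwin(numtaps=101, cutoff=2_000, fs=fs)

# The same routine realizes either response; only the coefficients change,
# which is what makes digital filters so flexible and adaptable.
y_8k = signal.lfilter(coeffs_8k, [1.0], x)
y_2k = signal.lfilter(coeffs_2k, [1.0], x)
```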
Applications of DSP in professional audio
DSP finds diverse applications across many domains in professional audio, including automatic mic mixing, gain-sharing mixers, matrix mixing, parametric equalizers, feedback suppression, compression, limiting and more.
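Of these, compression is perhaps the simplest to sketch: when the signal level exceeds a threshold, the gain is reduced according to a ratio. The fragment below is a deliberately simplified static gain computer in Python with NumPy (the threshold and ratio values are placeholders, and real designs add attack/release smoothing and a proper envelope detector):

```python
import numpy as np

def compress(x, threshold_db=-20.0, ratio=4.0):
    """Very simplified compressor: static gain computer, no attack/release."""
    eps = 1e-12
    level_db = 20 * np.log10(np.abs(x) + eps)         # instantaneous level in dB
    over = np.maximum(level_db - threshold_db, 0.0)   # amount above the threshold
    gain_db = -over * (1.0 - 1.0 / ratio)             # reduce the excess by the ratio
    return x * 10 ** (gain_db / 20)
```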
While most leading brands offer most of the above in a single unit as general-purpose DSP devices, others are purpose-built for applications such as far-end conferencing, sound reinforcement, speech processing, feedback elimination, loudspeaker processing and so on.
Within DSP devices, most are either fixed-path or open-architecture, the latter allowing the user to create their own processing chain. Fixed-path DSPs are usually designed for a certain set of applications, while open-architecture devices may be used for many. The downside is that an open-architecture device may not be optimized for any particular application, taking a one-size-fits-all approach instead.
Some of the earliest examples of fixed-path, purpose-designed DSP were in speech processing, automating the function of an operator manning an audio console. This is particularly useful in meeting rooms, conferencing facilities or classrooms where it is impractical or simply not feasible to have an audio mixing console. Some of the more popular models include the Shure SCM 810 and the Automix from Peavey. Other manufacturers such as Rane and Biamp also made analog automixers. The digital counterparts of these products have been implemented as programmable DSP software blocks, as shown below.
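The gain-sharing approach used by many such automixers can be sketched in a few lines: each microphone receives a share of the total gain in proportion to how much of the overall energy it is currently contributing, so the summed gain of all open microphones stays roughly constant. The example below is a simplified illustration of the principle in Python with NumPy, not any particular manufacturer's algorithm; block size, smoothing and attack/release behaviour are omitted.

```python
import numpy as np

def gain_sharing_mix(mic_blocks, eps=1e-12):
    """mic_blocks: array of shape (num_mics, block_size), one audio block per mic."""
    # Estimate each microphone's short-term energy over the block.
    energy = np.mean(mic_blocks ** 2, axis=1) + eps
    # Each channel's gain is its share of the total energy.
    gains = energy / np.sum(energy)
    return gains[:, None] * mic_blocks, gains

# Example: mic 0 carries speech-like energy, mics 1 and 2 mostly room noise.
rng = np.random.default_rng(0)
mics = np.vstack([0.5 * rng.standard_normal(480),
                  0.02 * rng.standard_normal(480),
                  0.02 * rng.standard_normal(480)])
mixed, gains = gain_sharing_mix(mics)
print(gains)   # the active microphone receives nearly all of the gain
```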
Similarly, analog products like graphic and parametric equalizers are now available in digital avatars as part of DSP hardware.
Most of these software implementations are filter designs based on IIR filters, modeled using software such as Matlab and then skinned with a GUI, as may be seen above. This ensures that the operator or engineer gets the same look and feel as the original analog product.
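A typical building block behind such a digital parametric equalizer is the second-order IIR "peaking" filter. The sketch below computes the coefficients for one band in Python with NumPy, following the widely circulated Audio EQ Cookbook formulation; the 1kHz centre frequency, +6dB gain and Q of 1.0 are illustrative values.

```python
import numpy as np

def peaking_eq_coefficients(fs, f0, gain_db, q):
    """Biquad (second-order IIR) peaking-EQ coefficients, Audio EQ Cookbook style."""
    a_lin = 10 ** (gain_db / 40)            # amplitude factor
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin])
    a = np.array([1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin])
    return b / a[0], a / a[0]               # normalize so that a[0] == 1

# Example band: +6dB boost centred at 1kHz with Q = 1.0 at a 48kHz sample rate.
b, a = peaking_eq_coefficients(fs=48_000, f0=1_000, gain_db=6.0, q=1.0)
# The band can be applied to audio with, for example, scipy.signal.lfilter(b, a, x);
# several such biquads in series make up a multi-band parametric equalizer.
```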
Most open-architecture DSPs will allow multiple equalizers on each input and output, depending on the availability of overall DSP processing resources. Similarly, a bank of filters may be used to shape the output to suit the application.
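One common use of such an output filter bank is a loudspeaker crossover that splits the signal into low and high bands for separate drivers. A minimal two-way split might look like the following (Python with SciPy; the 2kHz crossover point and fourth-order Linkwitz-Riley alignment are illustrative choices):

```python
import numpy as np
from scipy import signal

fs = 48_000
fc = 2_000                     # crossover frequency, an illustrative choice
x = np.random.randn(fs)        # stand-in for programme material

# A fourth-order Linkwitz-Riley crossover is two cascaded second-order Butterworth filters.
sos_lp = signal.butter(2, fc, btype='lowpass', fs=fs, output='sos')
sos_hp = signal.butter(2, fc, btype='highpass', fs=fs, output='sos')

low_band = signal.sosfilt(sos_lp, signal.sosfilt(sos_lp, x))     # to the low-frequency driver
high_band = signal.sosfilt(sos_hp, signal.sosfilt(sos_hp, x))    # to the high-frequency driver
```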
Amongst adaptive filter implementations, acoustic echo cancellation remains one of the more popular, with extensive usage in conferencing products that involve multiple microphones and connect over voice or video links.
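At the heart of such an echo canceller is an adaptive FIR filter that learns the echo path and subtracts its estimate of the echo from the microphone signal. A bare-bones normalized least-mean-squares (NLMS) update, one common choice, is sketched below in Python with NumPy; the filter length and step size are illustrative, and real products add double-talk detection and residual echo suppression.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, num_taps=256, mu=0.5, eps=1e-8):
    """Estimate the echo of `far_end` present in `mic` and subtract it (NLMS)."""
    w = np.zeros(num_taps)          # adaptive estimate of the echo path
    buf = np.zeros(num_taps)        # most recent far-end samples
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        echo_estimate = np.dot(w, buf)
        e = mic[n] - echo_estimate                     # microphone minus predicted echo
        out[n] = e                                     # echo-reduced signal sent to the far end
        w += mu * e * buf / (eps + np.dot(buf, buf))   # normalized LMS coefficient update
    return out
```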