Demystifying audio in video capture and live event streaming
As AV folks, we talk about audio encoding and codecs all the time, but what is an audio codec exactly? An audio codec is essentially a device or an algorithm capable of encoding and decoding a digital stream of audio.
In practical terms, the audio pressure waves that are transmitted through the air to our ears are continuous, analog signals. The signals are brought into the digital world by a device called an analog-to-digital converter (ADC) and back out again, for our enjoyment, by a digital-to-analog converter (DAC). The codec is found in between these two functions, and it is here where a number of important options can be adjusted to successfully capture, stream and record quality audio: the codec algorithm, the sample rate, the bit depth and the bit rate.
The three most common audio codecs are Pulse-Code Modulation (PCM), MP3, and Advanced Audio Coding (AAC). The codec you select determines the compression and quality of the recording.
PCM is a codec used in computers, Compact Discs, digital telephony and the less common super-audio discs. The source signal is sampled at regular intervals, and each sample represents the amplitude of the analog waveform as a digital value. PCM is the most basic form of encoding and is usually just the raw output of the analog-to-digital conversion process.
Given the right parameters, this digitized waveform can be perfectly reconstructed back to analog at the far end, which is why PCM is called 'lossless'. A lossless codec provides high fidelity to the original audio, but unfortunately it's not very economical, producing very large files that are not feasible for live streaming. I recommend using PCM when recording digital ISOs of your sources or when you're doing heavy audio post-production.
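To make the sampling idea concrete, here is a minimal sketch of what a PCM ADC does: sample a waveform at regular intervals and quantize each sample to a signed integer. The function name and parameters are mine, for illustration only.

```python
import math

def sample_pcm(freq_hz, sample_rate, bit_depth, num_samples):
    """Sample a sine wave at regular intervals and quantize each
    sample to a signed integer, as a PCM ADC would."""
    max_level = 2 ** (bit_depth - 1) - 1  # e.g. 32767 for 16-bit audio
    samples = []
    for n in range(num_samples):
        t = n / sample_rate                           # time of this sample
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # -1.0 .. 1.0
        samples.append(round(amplitude * max_level))     # quantize to integer
    return samples

# A 1 kHz tone captured at CD quality (44.1 kHz, 16-bit)
pcm = sample_pcm(1000, 44_100, 16, 8)
```

Because nothing is discarded beyond the quantization itself, these raw samples are exactly what a PCM file (such as a WAV) stores — hence the large file sizes the article mentions.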
Luckily, we have the choice of several other codec algorithms that can compress the digital data (compared to PCM) using some clever observations about how audio waveforms behave. The trade-off is that these algorithms are considered ‘lossy’ since it’s not possible to perfectly reconstruct the original signal, but the results are still good enough so that most people can’t tell the difference.
MP3 is an audio encoding format that uses a lossy algorithm to compress the same kind of sampled digital information into a much smaller file. MP3 is the most common codec for consumer music playback and storage. I recommend using MP3 only for streaming content, where its lower bandwidth is an advantage.
AAC is a newer, lossy digital audio encoding standard, designed as the successor to MP3 compression. AAC is part of the MPEG-2 and MPEG-4 specifications. It is essentially a compression codec that provides better sound quality than MP3 at similar bit rates. I recommend using this codec when live streaming.
Sample Rates (kHz)
Sample rate is the number of times a sample of audio is taken per second. Sample rates are measured in Hertz (Hz) or kilohertz (kHz), one kHz equaling 1,000 Hz. As an example, 44,100 samples per second can be represented as either 44,100 Hz or 44.1 kHz. The sample rate you select determines the maximum frequency that can be reproduced, and Harry Nyquist, a Swedish-born engineer, showed in the 1920s that the sample rate needs to be at least double the highest frequency in order to do the job.
As an example, the average human ear can interpret frequencies between 20 Hz and 20 kHz. Using this range of human hearing and the table below, we can see why 44.1 kHz was chosen as the sample rate for audio CDs and is still considered a very good rate for reproduction of the source material.
Below is a reference showing the maximum reproducible frequency (half the sample rate) for common sample rates:

44.1 kHz sample rate → 22.05 kHz maximum frequency
48 kHz sample rate → 24 kHz maximum frequency
96 kHz sample rate → 48 kHz maximum frequency
There are reasons to pick a higher sample rate, even though reproducing frequencies above the range of human hearing might seem wasteful. But the average listener will find 44.1 – 48 kHz more than good enough for most purposes.
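The Nyquist relationship above is simple enough to express directly in code. The function names here are mine, purely illustrative:

```python
def max_reproducible_freq(sample_rate_hz):
    """Nyquist: a given sample rate can capture frequencies
    up to half its value."""
    return sample_rate_hz / 2

def min_sample_rate(highest_freq_hz):
    """The sample rate must be at least double the highest
    frequency we want to reproduce."""
    return 2 * highest_freq_hz

# 44.1 kHz comfortably covers the ~20 kHz upper limit of human hearing
print(max_reproducible_freq(44_100))  # 22050.0
print(min_sample_rate(20_000))        # 40000
```

This is why 44.1 kHz works for CDs: its 22.05 kHz ceiling sits just above the 20 kHz limit of human hearing, leaving a small margin for the filtering the converter needs.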
Along with sample rate, there is also bit depth to consider. The bit depth is the number of digital bits of information used to encode each sample. In simple terms, bit depth measures "precision": the higher the bit depth, the more accurately a signal can communicate the amplitude of the actual analog sound source. At the lowest possible bit depth of 1 bit, there are only two values to describe the sound: 0 for complete silence and 1 for full volume. The higher the bit depth, the more precision you have over your encoded audio. As an example, CD-quality audio is a standard 16-bit, which gives 2^16 (or 65,536) amplitude levels to choose from.
Bit depth is fixed for PCM encoding, but for lossy compression codecs (like MP3 and AAC) it is calculated during encoding and can vary from sample to sample.
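The relationship between bit depth and precision can be sketched numerically. The 6.02 dB-per-bit figure below is the standard rule of thumb for quantization dynamic range; the function names are mine:

```python
def quantization_levels(bit_depth):
    """Number of distinct amplitude values a given bit depth can encode."""
    return 2 ** bit_depth

def dynamic_range_db(bit_depth):
    """Approximate dynamic range: each bit adds about 6.02 dB
    (20 * log10(2)) between the quietest and loudest encodable levels."""
    return 6.02 * bit_depth

print(quantization_levels(16))          # 65536 levels, the CD standard
print(round(dynamic_range_db(16), 1))   # 96.3 dB
```

Stepping up to 24-bit raises the count to over 16 million levels and roughly 144 dB of dynamic range, which is why higher bit depths are favoured for post-production headroom.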
Bit rate is the number of bits that are processed or transmitted per unit of time, generally expressed in bits (or kilobits) per second (often kbps or kbits/second). For linear PCM, bit rate is a simple calculation.
bit rate = sample rate × bit depth × channels
For systems like Pearl, which encode linear PCM at a bit depth of 16, this calculation can be used to determine how much extra bandwidth is needed for PCM audio. For example, for a stereo (two-channel) signal sampled at 44.1 kHz at 16 bits, the bit rate is calculated as follows (remember that 1 Hz is 1/second, so the units end up as kbits per second).
44.1 kHz × 16 bits × 2 = 1,411.2 kbits/second
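The formula above translates directly into a one-line helper (the function name is mine, for illustration):

```python
def pcm_bit_rate_kbps(sample_rate_hz, bit_depth, channels):
    """Linear PCM bit rate: sample rate x bit depth x channels,
    converted from bits/second to kbits/second."""
    return sample_rate_hz * bit_depth * channels / 1000

# CD-quality stereo: 44.1 kHz, 16-bit, 2 channels
print(pcm_bit_rate_kbps(44_100, 16, 2))  # 1411.2 kbits/second
```

The same helper shows why higher-spec PCM adds up quickly: 48 kHz at 24-bit stereo comes to 2,304 kbits/second.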
Meanwhile, lossy audio compression codecs like AAC and MP3 have fewer bits to transfer (that's their whole purpose), so they use much smaller bit rates, generally anywhere from 96 kbps to 320 kbps. For these codecs, the higher the bit rate you choose, the more bits are available to describe each sample, and the better your encoded audio will sound.
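A quick comparison puts those lossy bit rates in perspective against the CD-quality PCM figure calculated earlier. The function name is mine, illustrative only:

```python
def compression_ratio(pcm_kbps, lossy_kbps):
    """How many times smaller a lossy stream is than linear PCM
    at the same sample rate and channel count."""
    return pcm_kbps / lossy_kbps

cd_pcm = 44_100 * 16 * 2 / 1000  # 1411.2 kbps, from the PCM formula above
for lossy in (96, 128, 320):
    ratio = compression_ratio(cd_pcm, lossy)
    print(f"{lossy} kbps lossy audio is {ratio:.1f}x smaller than CD PCM")
```

Even at the generous 320 kbps end of the range, the lossy stream is still several times smaller than uncompressed PCM, which is exactly the saving that makes live streaming practical.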
Audio sample codecs, sample rates and bit rates in the real world
Audio CDs, one of the first popular consumer mechanisms for storing digital audio, use a sample rate of 44.1 kHz (covering the 20 Hz – 20 kHz human hearing range) and a bit depth of 16 bits. These values were chosen to fit as much audio as possible on the CD while maintaining good audio fidelity.
When video was added to audio by way of the DVD and later Blu-ray discs, a new standard was created. DVDs and Blu-rays usually use linear PCM with sample rates of 48 kHz (stereo) or 96 kHz (5.1 surround sound) and a bit depth of 24. These values were chosen to keep the audio in sync with the video and to get the best quality possible from the additional disc space available in these media.
With audio CDs, DVDs and Blu-rays, the goal is to present a high-quality program in a fixed way for replay: top-quality audio (and video) without much concern for the size of the resulting media, as long as it fits on the disc. These formats use linear PCM because of the quality it provides.
By contrast, mobile media and streaming media have a different goal – to use as low a bitrate as possible while still maintaining audio that is “good” enough for the listener. For this application, algorithms with compression are a better choice.
You can use the same principles in your own recordings.
When recording audio with your video…
Whenever possible, for a recording that will be used for post-production or as an ISO of your program, use PCM encoding with a sample rate of 48 kHz and the highest possible bit depth (16 or 24) to achieve the best quality audio. In the case of Pearl, I recommend PCM encoding with 48 kHz to achieve the highest audio quality.
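Because PCM recordings are uncompressed, it's worth estimating the storage they will need before a long event. This sketch applies the PCM bit-rate formula to a recording duration; the function name and the decimal-megabyte convention are my choices:

```python
def pcm_file_size_mb(sample_rate_hz, bit_depth, channels, seconds):
    """Approximate storage needed for uncompressed PCM audio,
    in decimal megabytes (1 MB = 1,000,000 bytes)."""
    total_bits = sample_rate_hz * bit_depth * channels * seconds
    return total_bits / 8 / 1_000_000  # bits -> bytes -> megabytes

# One hour of 48 kHz / 24-bit stereo PCM
print(round(pcm_file_size_mb(48_000, 24, 2, 3600)))  # ~1037 MB
```

Roughly a gigabyte per hour for the audio alone — manageable for an ISO recording, but a good illustration of why PCM is not streamed.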
When streaming audio with your video…
When streaming or creating recordings that will be played back via streaming (e.g. video on demand), get a good-quality audio track while using less bandwidth by using the AAC or MP3 audio codec with a sample rate of 44.1 kHz and a bit rate of 128 kbps or higher. This ensures that you maintain good audio fidelity while making your stream more widely accessible to your audience.