dBFS

A review and some remarks on lossless and lossy sound formats, audio codecs, and trends (fashions)

These are some remarks arising from my recent review of comparisons of bit rates and sample rates: the endless arguments about whether to use 128kbps, 192kbps, 256kbps, or even 320kbps with lossy audio compression formats like MP3 at 44.1kHz, and the debate about whether it matters that consumer recordings use "only" CD Quality 16-bit sample bit depth at a 44.1kHz sample rate, rather than higher sample bit depths like 24-bit and higher sample rates like 48kHz, 96kHz, or even 192kHz for lossless formats, so-called "high(er) definition" audio.

I also have a guide on "high definition audio" for Mac users here: Mac OS X: HOWTO adjust your system's sound quality, and record or find "high definition" audio sources.

Human beings happily enjoying music without being sad about bit rates and compression formats

WARNING: this page is fairly littered with links to MP3 bit rate and high definition audio comparison tests.
Find also various sample test files via: Audio engineering test/sample file resources, and online generators and online audio tests

Firstly, I'll put myself out there. If a musical composition and recording is basically quite good, and you can't enjoy it at all in MP3 format on a small personal music playing device at 128kbps or even "as low as" 96kbps, then your view of life is broken. You should go back in time to the trenches of WWI, or perhaps explore a firestorm of WWII and see how much fun that is, until you realise how lucky you are to be alive and able to own and carry a cheap personal music playing device smaller than your pocket that holds days, weeks, or even months of music.

I listen to SBS Chill on my nice DAB+ digital radio at "only" 56 kbps/AAC (HE-AAC) and I manage to enjoy it.
I would prefer them to allocate a bit more, but the music is simply amazing anyway.

It is of course quite nice if you can have your personal music collection at higher bit rates, and if you have genuine professional audio needs or very - and I mean very - fancy speakers at home it is "nice" to use 320kbps MP3 or perhaps even a lossless format at 16-bit 44.1kHz, assuming the original source was "only" CD quality.

Or if the original source of the incredibly cleanly recorded music really warrants it, you might convince yourself (despite numerous professionally, scientifically conducted double-blind experiments that prove you won't be able to tell the difference) that you need 24-bit sample depth for your personal music collection, possibly even at one of the higher sample rates like 96kHz, presumably because you have very - and I mean very - sharp hearing better than nearly all other human beings, including "audiophiles", who have participated in those blind tests and all failed to consistently notice any difference.

But I really don't believe your ears and home speakers are so good you ever "need" 24-bit/192kHz lossless in your personal music collection.

Some background on lossy compressed and lossless audio formats

So, enough of the opinions for now, and on to the research and summary. It is useful to know that:

- From Wikipedia: Bit rate:

'the number of bits that are conveyed or processed per unit of time.'

- From Wikipedia: Sample rate:

'defines the number of samples per unit of time (usually seconds) taken from a continuous signal to make a discrete signal.'
..
'The full range of human hearing is between 20 Hz and 20 kHz. The minimum sampling rate that satisfies the [Nyquist-Shannon] sampling theorem for this full bandwidth is 40 kHz. The 44.1 kHz sampling rate used for Compact Disc was chosen for this and other technical reasons.'

Dogs and cats and some other animals can hear higher frequencies it seems, but they don't usually use iPods, although they might listen to music on very expensive sound systems in the loungerooms of audiophiles.

I recommend also this tutorial series by Dave Marshall from 2001: Implications of Sample Rate and Bit Size

Comparing bit rates of lossy compressed samples without considering the sample rate of the encoded source is inconsistent. For example, one might compare MP3 bit rates of different sample files assuming the same sample rate (44.1 kHz for traditional reasons), but it "ain't necessarily so". Mostly it is (because mostly an MP3 is encoded from a 44.1 kHz CD quality source), but not always.

The bit rate works together with the sample rate in a subtle way to give what you perceive as sound quality. From Wikipedia:MP3:

'Compression efficiency of encoders is typically defined by the bit rate, because compression ratio depends on the bit depth and sampling rate of the input signal. Nevertheless, compression ratios are often published. They may use the Compact Disc (CD) parameters as references (44.1 kHz, 2 channels at 16 bits per channel or 2×16 bit), or sometimes the Digital Audio Tape (DAT) SP parameters (48 kHz, 2×16 bit). Compression ratios with this latter reference are higher, which demonstrates the problem with use of the term compression ratio for lossy encoders.'
..

'Several bit rates are specified in the MPEG-1 Audio Layer III standard: 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s, with available sampling frequencies of 32, 44.1 and 48 kHz.'

'A sample rate of 44.1 kHz is almost always used, because this is also used for CD audio, the main source used for creating MP3 files. A greater variety of bit rates are used on the Internet. The rate of 128 kbit/s is commonly used, at a compression ratio of 11:1, offering adequate audio quality in a relatively small space. As Internet bandwidth availability and hard drive sizes have increased, higher bit rates up to 320 kbit/s are widespread.'

'Uncompressed audio as stored on an audio-CD has a bit rate of 1,411.2 kbit/s, so the bitrates 128, 160 and 192 kbit/s represent compression ratios of approximately 11:1, 9:1 and 7:1 respectively.'

The compression ratio for 320kbps MP3 at 44.1kHz is 4.4:1, at which point, if you care about the sound quality so much, you might as well ask yourself why not just use a lossless compression format with a compression ratio of about 2:1, such as Apple Lossless (ALAC), which BTW is also (now) supported by all iOS device models (iPod, iPad, and iPhone), or FLAC.
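If you want to check these ratios yourself, here is a minimal sketch (using awk, which ships with Mac OS X) that derives the CD bit rate from first principles and then the MP3 compression ratios quoted above:

# CD bit rate = sample rate x bit depth x channels = 44100 x 16 x 2 = 1411200 bit/s
awk 'BEGIN {
    cd = 44100 * 16 * 2 / 1000                  # 1411.2 kbit/s
    printf "CD bit rate: %.1f kbit/s\n", cd
    n = split("128 160 192 320", r, " ")
    for (i = 1; i <= n; i++)                    # compression ratio = CD rate / MP3 rate
        printf "%s kbit/s MP3 -> %.1f:1\n", r[i], cd / r[i]
}'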

OK, let's look at bit depths for uncompressed lossless like WAV and AIFF:

- From Wikipedia: Audio bit depth:

'In digital audio using pulse-code modulation (PCM), bit depth is the number of bits of information in each sample, and it directly corresponds to the resolution of each sample. Examples of bit depth include Compact Disc Digital Audio, which uses 16 bits per sample, and DVD-Audio and Blu-ray Disc which can support up to 24 bits per sample.'

..

'Bit depth is only meaningful in reference to a PCM digital signal. Non-PCM formats, such as lossy compression formats like MP3, AAC and Vorbis, do not have associated bit depths. For example, in MP3, quantization is performed on PCM samples that have been transformed into the frequency domain.'

'The bit depth has no impact on the frequency response, which is constrained by the sample rate.'

- From Pulse Code Modulation (PCM):

'a method used to digitally represent sampled analog signals. It is the standard form of digital audio in computers, Compact Discs, digital telephony and other digital audio applications. In a PCM stream, the amplitude of the analog signal is sampled regularly at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps. PCM streams have two basic properties that determine their fidelity to the original analog signal: the sampling rate, the number of times per second that samples are taken; and the bit depth, which determines the number of possible digital values that each sample can take.'

Just quoting bit rates without stating specifics of the encoding method is dangerous (error prone). You can't compare bit rates between say lossy MP3 and AAC with sample/audio bit depths of lossless uncompressed WAV PCM or lossless compressed ALAC or FLAC without specifying exactly what was done and how in the processing and encoding. Also, in the past there was a wide range of quality in MP3 and AAC encoders, although this is less so in 2013.

So here is the crash course in what really counts, unless you are a genuine audio professional wrangling with issues specific to high-end professional audio production (not your iPod's music collection or your home movies):

- Music CDs use 16-bit, DVD-Audio and Blu-ray can support 24 bits per sample, and they can support a range of sample rates higher than the 44.1kHz used for CDs. A lot of people listened to 16-bit CDs for a long time and it didn't kill them, and it's still not dangerous to listen to "only" 16-bit at 44.1kHz if the music is good. See also my summary at: Mac OS X: HOWTO adjust your system's sound quality, and record or find "high definition" audio sources

- From Wikipedia: Advanced Audio Coding (AAC)

'a standardized, lossy compression and encoding scheme for digital audio. Designed to be the successor of the MP3 format, AAC generally achieves better sound quality than MP3 at similar bit rates'

There is increasing support for AAC in consumer devices, but MP3 is still probably more widely supported (still the "de facto"), although the gap is closing fast. Therefore:

MP3 (a lossy compression format) is not suddenly evil just because Advanced Audio Coding (AAC) (another lossy compression format) is usually clearly the better algorithm. A lot of devices still don't support AAC, and if you are preparing music for somebody whose device only handles MP3 (and many still do), or if you are unsure, then use MP3. There is no shame in it, and you will not be less cool than somebody who uses AAC; you will merely need a bit more (say 20% to 30%) disc/storage space and more kbps to get about the same sound quality, depending on the type of music (see comparison links at the end of this article).

And even if you use Apple devices (as I do), you are still allowed to use MP3s, in 2013.

- High Efficiency AAC is typically used by broadcasters who can only offer lower bit rates; the algorithm is tuned to work well with less data and with streaming.

- Apple's popular .m4a suffix does not automatically tell you what the audio format is. It is a container format and could, for example, contain AAC lossy compressed audio or ALAC lossless compressed audio. One needs to open the container and look inside to know.
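For example, FFmpeg (which we will use extensively below) reports the actual codec when probing a file; a quick sketch, with song.m4a as a hypothetical file name:

$ ffmpeg -i song.m4a
# Look for the audio stream line in the output, which names the actual codec:
#   Stream #0:0: Audio: alac ...   (lossless ALAC)
# versus
#   Stream #0:0: Audio: aac ...    (lossy AAC)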

- Some people claim that at lower bit rates the free open source Vorbis lossy compression format performs slightly better than AAC, but at higher bit rates above 128kbps it is likely indistinguishable.

- If you insist on using completely lossless compression, it does not matter much whether you use Apple's Lossless Audio Codec (ALAC) (often stored inside a special MP4 container with the filename extension .m4a) or the Free Lossless Audio Codec (FLAC). Really, it doesn't.

- ALAC 'Testers using a selection of music have found that compressed files are about 40% to 60% the size of the originals depending on the kind of music, which is similar to other lossless formats'

- FLAC 'Digital audio compressed by FLAC's algorithm can typically be reduced to 50–60% of its original size'.

So you can win a bet at a pub with an audiophile by quoting Wikipedia ! It depends a bit on the type of audio tested of course. But not that much.

If you are a musician or sound engineer working with professional sound recording and mixing you will need to stay lossless, and some sound editing systems support working "directly" in FLAC or ALAC - instead of uncompressed WAV or AIFF - and thus save typically around 50% storage space along the way (sometimes at the price of a bit of compression/decompression time).

Otherwise, unless you are being naughty distributing stolen PCM-sourced music via torrent sites and wish to save the torrent pirates some disk space and your torrent "customers" some download time, there is really barely any reason not to simply compress the music using a lossy format like AAC or MP3 (at 160kbps, or perhaps 192kbps, or, if you insist you can hear the difference, even at 320kbps), and you still win massively on storage space.

There is an excellent summary of the (ridiculously) large number of audio formats at: http://en.wikipedia.org/wiki/Comparison_of_audio_codecs. But it does not, it seems, show a comparison of file sizes across the various formats for "comparable" music styles.

Ok, time for some "expert" assessments of MP3 bitrates vs. CD quality

At the lower end of the scale, there is a "self-test" comparison by Daniel Potts (PC World) from 2002 at 'Audio compression formats compared'. It is interesting to note that he claims that MP3 with constant bit rate (CBR) can achieve 'CD quality' at 128kbps with a file size of 960kB/min, whereas AAC can achieve 'CD quality' at 80kbps with a file size of about 600kB/min, a significantly smaller file.
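Those per-minute file sizes follow directly from the bit rates (kbps divided by 8 gives kB/s, times 60 gives kB/min); a quick sanity check with awk:

awk 'BEGIN {
    printf "128 kbps -> %d kB/min\n", 128 / 8 * 60   # 960 kB/min (his MP3 figure)
    printf " 80 kbps -> %d kB/min\n",  80 / 8 * 60   # 600 kB/min (his AAC figure)
}'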

Really, 128kbps MP3 CBR ? That is probably far lower than most audio professionals would equate to CD quality. Here is another comparison from How Stuff Works, 'How MP3 Files Work' (and since they know how stuff works, we trust them more):

'Using a bit rate of 128 Kbps usually results in a sound quality equivalent to what you'd hear on the radio. Many music sites and blogs urge people to use a bit rate of 160 Kbps or higher if they want the MP3 file to have the same sound quality as a CD.'

'Some audiophiles - people who seek out the best ways to experience music - look down on the MP3 format. They argue that even at the highest bit rate settings, MP3 files are inferior to CDs and vinyl records. But other people argue that it's impossible for the human ear to detect the difference between an uncompressed CD file and an MP3 encoded with a 320 Kbps bit rate.'

[It is assumed above that they are talking here about Constant Bit Rate (CBR) not Variable Bit Rate (VBR).]

Here's another example of mp3 vs cd audio quality tests from Sam Lin asserting that CD quality requires a bit higher MP3 bit rate, including some frequency analysis and nice graphics to prove it. He not only tested an orchestral piano piece and a pop song, he also tested some pink noise. He did blind listening tests and used a range of different sorts of speakers:

'There has been much debate on the sound quality of MP3's vs the 16-bit linear PCM used in producing audio CD's. Not being able to find much in the way of critical test results, I set out to perform some tests of my own. As a baseline, I chose 192Kbps as the lowest MP3 bitrate, since this seems to be a commonly agreed upon threshold for "near CD quality," and most of the MP3's I've listened to encoded below 192Kbps have sounded too degraded for my tastes.'

Some opinions, from a musician (live performer)

Oops, what was that I read above ? Many experts seem to agree that most/many people can't hear the difference between a 16-bit 44.1 kHz CD and an MP3 at 192kbps (CBR). And I recall well when CDs came out that lots of people seemed to enjoy the music on them ! Maybe it was ... because of good music, with good musicians, with good songs played well .. maybe it wasn't so much because of every last digital bit.

Most posh wine "experts" can't tell the difference between red wine and white wine when their nose is pegged and some can't even tell the difference when just blindfolded; it must be true if The Guardian says it too. And many - if not most - audiophiles bathe in their own self-absorbed obsession with bit rates, bit depths, and sample rates. A similar sentiment is expressed in this humorous MacWorld article 'Listen- (or shut-) up', which includes some nice tests.

Having some of your music in only 128kbps MP3 (instead of 192kbps or 320kbps MP3, or better AAC, or even in a nice compressed lossless format like FLAC or ALAC) is not a good reason to be grumpy or whinge and complain:

If you are still sad, maybe you have chosen the wrong music instead of the wrong bit rate or audio format ? Or perhaps you could learn how to sing a song or play a musical instrument instead. It may even be more fun than fussing about bit rates. And it might even sound better live, too.

You are also not suddenly a better human being than somebody else if your entire personal music collection is all, only, strictly, religiously, in 320kbps. Up to the challenge? Do 320kbps mp3 files really sound better? Take the test!

Live is best, uncompressed !

I am a live musician, a performer, and I know from experience that as long as you are an entertainer, you can bring people enjoyment, if you have the will to do it. And I also know, that nothing, no recording technology, no digital anything, will ever reproduce the sound of a live instrument, ever, anywhere. Ask my bongos (wood and skins/hide); they do some amazing things that no computer will ever do, and no number of bits are ever going to match them. Or stand near a nice brass trumpet played well and listen to it. Doh, computers and most speakers are not made of wood or hide or brass !

But what if I am a big-time DJ playing big music on big speakers to a big crowd at a really big gig ?

You mean like at your girlfriend's 18th birthday party, where all of her (heavily drinking) friends might notice if you only use 192kbps MP3s of Lady Gaga ? Maybe, just to be safe, you should instead use only 32-bit floating point lossless with 192kHz sample rates, so they can hear those really special "bright" sounds (that were never present in the original recordings anyway) above the ... noise ?

Or maybe there is a dog or cat at the party (oops, "gig") with really sharp hearing ?

If you do seriously have the chance to present your audio wares professionally on quality audio equipment, for people who seriously care (or can even notice), then by all means use 320kbps MP3 (or similar high bit AAC) or, if you have the storage space to spare, then simply use uncompressed lossless WAV or AIFF, or compressed lossless ALAC or FLAC (assuming you can play FLAC directly without decompression on your Mac).

Some arguments for consistent use of 320kbps MP3s and even uncompressed lossless by DJs are made in the article A DJ’s Guide to Audio Files and Bitrates by Dan White (Sep 2012) (although the article also makes some terminology mistakes, such as in one place confusing bit rates of lossy codecs with sample bit depths and sample rates of lossless ones).

One potentially good argument is that if you are processing the music on the fly, such as tempo shifting for tempo matching, then - if you really have to work with lossy compressed MP3 - 320kbps is more forgiving, but of course it also means your processors have to work a bit harder. In any case, storage is now cheap and compact, and processing power is getting better all the time.

"And I am DJing with MP3s because ..."

If you really are a "big time" DJ, what are you working with MP3s for ? If you are seriously DJing professionally, you don't have to worry about whether or not all of your music resources will fit on your iPhone.

Audiophiles insisting they can hear better than 16-bit / 44.1kHz Compact Disc quality are (probably) kidding themselves

I recommend that anybody who still seriously doubts this reads this detailed article by Monty, Mar 2012, a humorous and technically rich challenge: 24/192 Music Downloads ...and why they make no sense. Accurate, scientific, fantastic ! It also tells you how your ears work, and provides some fabulous audio test files (including some very quiet ones, and some very high frequency ones, for you to _not_ hear noise). After explaining well why a 192kHz sample rate won't help you (and may even do some harm), he explains how dithering can push the dynamics of a 16-bit system below the usually quoted RMS figure of -96dB, down to -120dB, and gives test files to prove it ! And his conclusion:

'16 bits is enough to store all we can hear, and will be enough forever.'

..

When does 24 bit matter?

Professionals use 24 bit samples in recording and production for headroom, noise floor, and convenience reasons.

16 bits is enough to span the real hearing range with room to spare. It does not span the entire possible signal range of audio equipment. The primary reason to use 24 bits when recording is to prevent mistakes; rather than being careful to center 16 bit recording-- risking clipping if you guess too high and adding noise if you guess too low-- 24 bits allows an operator to set an approximate level and not worry too much about it. Missing the optimal gain setting by a few bits has no consequences, and effects that dynamically compress the recorded range have a deep floor to work with.

An engineer also requires more than 16 bits during mixing and mastering. Modern work flows may involve literally thousands of effects and operations. The quantization noise and noise floor of a 16 bit sample may be undetectable during playback, but multiplying that noise by a few thousand times eventually becomes noticeable. 24 bits keeps the accumulated noise at a very low level. Once the music is ready to distribute, there's no reason to keep more than 16 bits.

..

Listening tests

There are numerous controlled tests confirming this, but I'll plug a recent paper, Audibility of a CD-Standard A/D/A Loop Inserted into High-Resolution Audio Playback, done by local folks here at the Boston Audio Society.

..

This paper presented listeners with a choice between high-rate DVD-A/SACD
[DVD-Audio (supports up to 2-channel 24-bit 192 kHz) and Super Audio CD] content, chosen by high-definition audio advocates to show off high-def's superiority, and that same content resampled on the spot down to 16-bit / 44.1kHz Compact Disc rate. The listeners were challenged to identify any difference whatsoever between the two using an ABX methodology. BAS conducted the test using high-end professional equipment in noise-isolated studio listening environments with both amateur and trained professional listeners.

In 554 trials, listeners chose correctly 49.8% of the time. In other words, they were guessing. Not one listener throughout the entire test was able to identify which was 16/44.1 and which was high rate, and the 16-bit signal wasn't even dithered!'


Some other useful audio and sound engineering guides and resources concerning formats

- From Paul Sellars of Sound on Sound, an absolutely fabulous "must read" description of MP3s and the MP3 encoding/decoding process: Perceptual Coding: How Mp3 Compression Works (May 2000).

- From The Great MP3 Bitrate Test: My Ears Versus Yours:

' .. three songs chosen from vastly different genres, encoded from CD and transcoded into the various popular bitrates available for MP3s (64k, 96, 128, 160, 192, 256, and 320kbps with VBR off) ..'

His conclusion is that in some cases he could hear nothing better above 192kbps, but in some cases he reckons he could hear improvements at 256kbps and 320kbps.

- From PC Pro: 24-bit audio: the new way to make you pay more for music? By Barry Collins, Feb 2011

- Mac software to play FLAC files

- An Overview of Apple Lossless Compression Results by Kirk McElhearn, May 2011. He notes that the Apple Lossless codec has gone open source, and provides some nice tests on various styles of music demonstrating ALAC:

'The range of compression for these examples is from 36% to 68%, with the majority of the examples clustering around the 50% level.' .. ' (These file sizes are similar for other lossless formats, such as FLAC, SHN and APE.)'

So hopefully there's another argument, ALAC vs FLAC, that we no longer have to have (ever again).


I hope this review was of some interest to you and don't forget:

Entertainment, sentiment, performance, and participation are more important than bit rates and sample rates. Love beats technology, and live is best.

A summary of a review of music levels for broadcasting, personal use, recording and mastering, including the new LOUDNESS measures

This page started because I began reading recently about the new(ish) loudness measures and standards, especially those of the European Broadcasting Union (EBU) (I did not examine the slightly lower US SMPTE recommendation in depth). From EBU: Loudness:

'In August 2010, the EBU published its Loudness Recommendation EBU R128. It tells how broadcasters can measure and normalise audio using Loudness meters instead of Peak Meters (PPMs) only, as has been common practice.

..

-23 LUFS

Basically EBU R128 recommends to normalize audio at -23 LUFS +/- 1 LU, measured with a relative gate at -10 LU. The metering approach can be used with virtually all material. To make sure meters from different manufacturers provide the same reading, EBU Tech 3341 specifies the 'EBU Mode', which includes a Momentary (400 ms), Short term (3s) and Integrated (from start to stop) meter. Already more than 60 vendors have reported to support 'EBU Mode' in their products.'

Now I am not a broadcaster, but my review of this matter of loudness sent me on a very interesting trip right back to the fundamentals of analog and digital audio engineering and levels, and I attempt to share that journey here, including some examples of dBFS and LUFS statistics processing with some free tools for Mac OS X.

I include some tips and research links on how these loudness measures relate to metering, recording and mastering levels, and how to react to the broadcasting loudness measures and recommendations pragmatically, and ideally in advance:

- The "best" level(s) for digital recording are different from the best levels for digital delivery, and depend on whether you will use your recorded resources to be mixed with other music, or as an end mix (or for simple capture), and they also depend critically on what devices your end mixes and masters will be played on (served via), and to some extent also on the chosen audio format.

- Levels and loudness considerations for mastering are very different from levels for recording live music, and levels appropriate for preparing music collections for playing on personal music devices (as opposed to broadcasting) may be different again.

- There is a consensus that the -23 LUFS European Broadcasting Union (EBU) standard is fine for some media (TV, radio etc.) but not at all appropriate for personal music playing devices such as iPods, mobile/smart phones etc, where pushing it a good deal louder is handy.

- The recently refined EBU (and SMPTE) measures of loudness are beginning to penetrate the world of Digital Audio Workstation (DAW) software, with new loudness meters already included by many audio software vendors.


Some background on audio levels

In order to understand my summary one needs to at least be familiar with the following:

- From Wikipedia: Decibel:

'The decibel (dB) is a logarithmic unit used to express the ratio between two values of a physical quantity (usually measured in units of power or intensity). One of these quantities is often a reference value, and in this case the dB can be used to express the absolute level of the physical quantity.

The number of decibels is ten times the logarithm to base 10 of the ratio of the two power quantities.

A change in power by a factor of 10 is a 10 dB change in level. A change in power by a factor of two is approximately a 3 dB change. A change in voltage by a factor of 10 is equivalent to a change in power by a factor of 100 and is thus a 20 dB change. A change in voltage ratio by a factor of two is approximately a 6 dB change.

..
The decibel unit can also be combined with a suffix to create an absolute unit of electric power. For example, it can be combined with "m" for "milliwatt" to produce the "dBm". Zero dBm is the level corresponding to one milliwatt, and 1 dBm is one decibel greater (about 1.259 mW).

In professional audio, a popular unit is the dBu (see below for all the units). The "u" stands for "unloaded", and was probably chosen to be similar to lowercase "v", as dBv was the older name for the same thing. It was changed to avoid confusion with dBV. This unit (dBu) is an RMS measurement of voltage which uses as its reference approximately 0.775 V RMS. Chosen for historical reasons, the reference value is the voltage level which delivers 1 mW of power in a 600 ohm resistor, which used to be the standard reference impedance in telephone audio circuits.'

..
In professional audio, equipment may be calibrated to indicate a "0" on the VU meters some finite time after a signal has been applied at an amplitude of +4 dBu. Consumer equipment will more often use a much lower "nominal" signal level of -10 dBV. Therefore, many devices offer dual voltage operation (with different gain or "trim" settings) for interoperability reasons. A switch or adjustment that covers at least the range between +4 dBu and -10 dBV is common in professional equipment.

..

dBFS (digital)

dB(full scale) – the amplitude of a signal compared with the maximum which a device can handle before clipping occurs. Full-scale may be defined as the power level of a full-scale sinusoid or alternatively a full-scale square wave. A signal measured with reference to a full-scale sine-wave will appear 3dB weaker when referenced to a full-scale square wave, thus: 0 dBFS(ref=fullscale sine wave) = -3 dBFS(ref=fullscale square wave).

dBTP

dB(true peak) - peak amplitude of a signal compared with the maximum which a device can handle before clipping occurs. In digital systems, 0 dBTP would equal the highest level (number) the processor is capable of representing. Measured values are always negative or zero, since they are less than or equal to full-scale. '

Now before proceeding any further, let's look at one very important point about decibels as applied to dBFS. The formula for calculating dBFS is equivalent to the formula for calculating dB relative to a voltage (not a power), so the formula is:

L_dB = 10 * log10(V^2 / V0^2) = 20 * log10(V / V0)

where V0 is the reference. That is, the digital amplitude is handled as a "field" value just like electrical voltage, and not like a sound pressure or power ! Here are some typical values rounded for some amplitude ratios:

1.0000 =   0.000 dB
0.5000 =  -6.021 dB
0.2500 = -12.041 dB
0.1250 = -18.062 dB
0.1000 = -20.000 dB
0.0625 = -24.082 dB
0.0100 = -40.000 dB
0.0010 = -60.000 dB

This gives us a golden rule of thumb for digital:

Increasing the number of bits by 1 doubles the number of available quantisation levels,
and thus corresponds to about a 6dB increase in the dynamic range.
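You can reproduce both the table above and this rule of thumb yourself; a minimal sketch using awk (awk's log() is the natural logarithm, so we divide by log(10) to get log10):

awk 'BEGIN {
    n = split("1.0 0.5 0.25 0.125 0.1 0.0625 0.01 0.001", a, " ")
    for (i = 1; i <= n; i++)                 # L = 20 * log10(V/V0)
        printf "%.4f = %8.3f dB\n", a[i], 20 * log(a[i]) / log(10)
    for (bits = 8; bits <= 24; bits += 8)    # ~6.02 dB of dynamic range per bit
        printf "%2d bits -> ~%.0f dB dynamic range\n", bits, bits * 20 * log(2) / log(10)
}'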

- From Wikipedia: dBFS: decibels relative to full scale for digital:

'0 dBFS is assigned to the maximum possible digital level. For example, a signal that reaches 50% of the maximum level at any point would reach -6 dBFS at that point, 6 dB below full scale. Conventions differ for RMS measurements, but all peak measurements will be negative numbers, unless they reach the maximum digital value.'

- From Wikipedia: RMS levels:

'Since a peak measurement is not useful for qualifying the noise performance of a system, or measuring the loudness of an audio recording, for instance, RMS measurements are often used instead.

There is a potential for ambiguity when assigning a level on the dBFS scale to a waveform rather than to a specific amplitude, since some choose the reference level so that RMS and peak measurements of a sine wave produce the same number, while others want the RMS and peak values of a square wave to be equal, as they are in typical analog measurements.'

- From Wikipedia: Dynamic range:

'The measured dynamic range of a digital system is the ratio of the full scale signal level to the RMS noise floor. The theoretical minimum noise floor is caused by quantization noise. This is usually modeled as a uniform random fluctuation between −1/2 LSB and +1/2 LSB. (Only certain signals produce uniform random fluctuations, so this model is typically, but not always, accurate.)'

Some other useful audio and sound engineering guides and resources concerning levels and metering include:

- Understanding & Measuring Digital Audio Levels by Glen Kropuenske, 2006 (PDF) this is an excellent introduction to levels with some nice comparison graphics and discussion of digital vs analog:

'dB or decibels

Audio signal or sound levels are measured using a decibel (dB) system. The dB system is used to compare two levels or a change in signal voltage or power. One dB is the level change that is just noticeable by most people. A 6 dB change is considered to be about twice the volume.

Sound signal level in dB can be considered either as a power or as a voltage. The level in decibels is 10 times the logarithm of the ratio of two power levels: dB = 10 * log10(P / P_ref), where P is the measured power in watts and P_ref is a reference power in watts.

Sound signal level in dB can be considered as a voltage ratio. The level in decibels is 20 times the logarithm of the ratio of two voltage levels: dB = 20 * log10(V / V_ref), where V is the measured voltage and V_ref is a reference voltage.

The resistance is assumed to be the same so calculations using either the power or voltage formula agree.'

'Units of Sound Level Measurement

Sound signal level is expressed using various dB units of measurement including:

- dBm: decibels or dB referenced to 1 milliwatt (.001 watt)

- dBu or dBv: decibels or dB referenced to 0.775 volt (dBu is more commonly used)

- dBV: decibels or dB referenced to 1 volt'

'VU Meters

The VU (volume unit) meter is another voltage measurement method for analog audio level measurement. The VU meter is a voltmeter with a response time designed to reflect the loudness of live audio as the ear would interpret the loudness. Relating VU measurement units to the other dB units of measurement for audio can only be done with a sine wave test tone. In a professional audio balanced system, 0 VU corresponds to +4 dBu. You may also see 0VU as +4 dBm although this assumes 600 ohm balanced impedance. This is the only impedance in which 4 dBm equals 4 dBu'

'Analog vs. Digital Levels — the dBFS Scale

Digital audio levels are measured differently than analog audio levels. Yes, yet another and different dB system is used. The dB system in digital audio starts at the top and defines the loudest sound level that is to be digitized. This top or full scale view of the audio levels results in a full scale or "FS" system of dB measurement.'

[ED: Warning: The following numbers do not all agree well with some diagrams or statements made by others quoted below. Also, it does not state whether 16-bit or 24-bit is assumed (presumably 24-bit).]

'A 0 dBFS measurement unit is to be the highest audio level. Assuming this is to be at the highest audio level before clipping occurs, this corresponds to an analog level of +24 dBu. Therefore, +4 dBu (dBu = dBv) is the same as -20 dBFS or 0 VU.

While this is generally accepted as the range of digital audio, it is not a hard standard. When digital audio values are converted back to analog, some digital audio equipment provides level selections to shift the analog output levels of 0 VU to -18 dBFS or -14 dBFS. Lowering the dBFS relationship increases the audio sound levels output from the D/A converter.'

Some explanations with reference to standards are provided by Hugh Robjohns, technical editor of Sound on Sound, in Q. What are the reference levels in digital audio systems?, from which I borrow diagrams for EBU R68 (top) and SMPTE RP155 (below):


So what does this all mean for recording and mastering ?

Let's start getting into some concrete tips for recording (and compare them with mastering). I figure the Final Cut Pro people would know what they are talking about, and their recommendations agree well with the other tips and diagrams I provide below. From Final Cut Pro7: User Manual: About Audio Meters:

'There are several common digital levels used to correspond to 0 dB on an analog [VU] meter:

-12 dBFS: This level is often used for 16-bit audio such as DV audio, and for projects with compressed dynamic ranges, such as those for television or radio.

-18 or -20 dBFS: This level is more common on projects with higher dynamic range, such as professional post-production workflows using 20- or 24-bit audio.'

- And similarly from Final Cut Pro: Understanding Audio Meters :

'As a general guideline, if you are working with 16-bit audio, you should set your audio level around -12 dBFS. If you are working with 20- or 24-bit audio, you should set your audio level around -18 or -20 dBFS.'

- From Audio Metering Introduction: Audio Geek Zine:

'VU

Mic preamps, converters, and hardware effect processors are all designed to work optimally at 0 VU. They can usually handle more than that before distorting, but 0 VU is where the signal to noise is best. VU stands for Volume Unit and is the oldest analog metering system. VU meters are relatively slow moving, with a 300ms response time. This slow response of a VU meter better represents an averaged volume level, close to how our ears work. 0VU is equal to +4dBu or professional line level.

dBu

The dBu scale measures the analog voltage level in our equipment with 0dBu calibrated to about 0.775 Volts. The u in dBu stands for ‘unloaded’ which means that the voltage is measured with a zero resistance load. Again, 0VU or +4dBu is the ideal constant voltage of all your analog components in the recording and monitoring chain.

Here’s an example chain – microphone, mic preamp, compressor, audio interface line input, Analog to digital converter, recording software.

The microphone signal gets boosted up to line level by the preamp. Line level goes into and out of the compressor into the audio interface. The analog to digital converter assigns bits representing the voltage coming in and sends the data to your DAW.

Digital Meters

Once it’s in your DAW the level you see will not be 0 on your track meters, it will actually be closer to -18dBFS depending on the calibration. This may seem like a really low level, but this is actually the optimal level for all the analog components that come before it.

Once you build up your song with several other tracks, you’ll be happy you have that extra headroom and lower noisefloor.

0VU = +4dBu = -18dBFS: This is the only thing you need to remember
[ED: it is assumed he is talking about EBU Digital 24-bit, the equivalent for 16-bit digital would be -12dBFS, and for SMPTE digital 24-bit it's -20dBFS.]

dBFS

The dBFS meters show decibels relative to full scale: instantaneous digital levels below the 0dBFS absolute peak. When 3 consecutive samples pass 0, the clip light will come on.

dB RMS

Now what’s left is RMS metering. Some DAWs have this in addition to Peak metering on the master. Similar to how VU meters work, RMS meters show an average level. The RMS value relates to how loud a sound is perceived.

These days all music is mastered to peak just below 0dBFS, the unwritten standard is -0.3,
but the song with the higher RMS level will appear to be louder.

There isn’t a widespread calibration standard for RMS metering so you’ll have to compare values from a few references to what you’re working on.'

[ED: this mastering recommendation of -0.3dBFS is relatively high compared with some other recommendations.]

There is a fantastic resource at Audio Studio Recording: Mastering and Gain Structure that enables you to compare live dBFS mappings for different calibrations, along with excellent explanations of all aspects of every metering scale:

'dBFS meter description

dBFS meters are either hardware- or software-based digital meters that can run anywhere from - 40dB to - inf (- infinity) on the low end, but invariably end at 0dB on the high end. Color schemes for these meters vary (especially on the software versions) but typically turn red at or near the - 3dB to 0dB range at the top of the scale.

Many dBFS meters include a single LED or other type or illuminated indicator usually labeled either "OVER" or "CLIP".

dBFS meters display digital levels

dBFS meters visually indicate signal levels as defined by the values of the digital samples of an analog signal that has been converted to digital data. The top of the meter (0dBFS) indicates a digital value where all the bits of a digital sample have a value of 1. A digital value of all 1s is, by definition, the highest possible value that can be represented in a binary digital form. There is nothing louder than a digital value of all 1s. Therefore 0dBFS (the top of the meter) represents the maximum possible volume on any digital signal.

Note that this 0dBFS maximum is true regardless of the digital word length (a.k.a. bit depth) used. Whether we are recording at an 8-bit, 16-bit, or 24-bit word length doesn't matter here; as long as every bit in a sample has a value of 1, it will translate to 0dBFS on the meter.

dBFS meter calibration

dBFS meters do not directly represent analog voltages or signal levels, they provide a graphic representation of binary digital values only. As such, any correlation between analog levels and digital values is determined by the calibration of the analog-to-digital converter (ADC) circuitry in the recording signal chain.

Unfortunately there is no definitive standard of conversion in ADCs for converting from dBu to dBFS; it varies from brand to brand, model to model, even country to country. A pro-grade line level of +4dBu can typically equate to anywhere from -12dBFS to -20dBFS on the digital scale, depending on the individual ADC's calibration. Some ADCs even have switches on them offering multiple calibration settings.

There are many quality ADCs that convert +4dBu to -18dBFS as a default. For this reason, this is what many engineers quote as the conversion factor, and it is also the default display setting for our meter on the left. But the number of ADCs that do or can equate +4dBu to a different digital level than that are at least as numerous as those that equate it to - 18dBFS, so we need to check the specs on our ADCs to ensure we are using the right calibration standard for our recording.

[ED: one can use their cool Analog to Digital Conversion Calculator on the left in their page to see live how the dBFS meter levels can change based upon the calibration of a converter.]

More bits means more range

Because 0dBFS is the absolute top of the digital scale regardless of the number of bits used in our digital samples, the number of bits used does instead matter towards the bottom end of the dBFS scale. The more bits we have, the lower of a volume we can digitally represent, and the greater of a dynamic range we have to work with in the digital domain.

This range can be calculated by multiplying the number of bits by 6dB. Therefore 8 bits gives us a maximum range from 0dBFS to - 48dBFS. 16 bits will go from 0dBFS down to - 96dBFS, and 24 bits from 0dBFS down to - 144dBFS.

The "CLIP" or "OVER" indicator

Many dBFS meters include a separate indicator labeled "CLIP" or "OVER". This lights up when the meter "believes" that the incoming analog signal may have been higher than the digital 0dBFS. Because in the digital realm there can be nothing higher than 0dBFS, anything analog coming in higher than that is simply "clipped off" at 0dBFS during the conversion to digital. The "CLIP" or "OVER" indicators warn us when that clipping may be happening.

Because there is nothing above 0dBFS, the only way a meter can determine if clipping is occurring is by looking for consecutive samples of 0dBFS, the assumption being that a flattened waveform with a flat top of more than one sample in a row at maximum value most probably means that the top of a normal waveform has been clipped off.

Unfortunately here again there is no standard. Some clip lights are programmed to light up as soon as a single sample hits 0dBFS. Others wait for three consecutive 0dBFS samples to confirm that a real clip has taken place before lighting up. There are even others that will wait for as long as 8 consecutive samples before lighting up on the theory that shorter clips than that cannot be heard.'

'+4dBu "Pro" Line Level

In order for different pieces of analog audio gear to be able to properly send signals to each other without those signals being too weak or too strong for any given piece of gear, all such gear is designed to operate at a standard "line level".

"Line level" refers to the average signal voltage at which the standard line inputs and outputs of most of our audio gear is designed to operate. For this reason, the average-reading VU meters on most audio processing gear are calibrated so that a reading of 0VU indicates a line level voltage.

"Pro" line level

Most professional-grade and prosumer audio recording gear is designed to operate at a standard line level of +4dBu (~1.23 volts). However, some gear have switches or circuitry on them that let the user select between a "pro" line level of +4dBu and the "consumer" line level of -10dBV (approx. -7.8dBu or ~0.32 volts.)

Un-level playing fields
Because of the huge difference between "pro" and "consumer" line levels - "pro" line level is almost 4 times the voltage as "consumer" line level - It's important to know at which level your gear operates.

If you run a -10dBV "consumer" signal into a +4dBu "pro" input, the signal will be running almost 12dB lower than expected; having to boost the input that extra 12dB will also increase the noise level of the signal by almost 12dB.

Conversely, running a +4dBu signal into a -10dBV input will be inputting a signal almost 12dB hotter than expected, potentially cutting the amount of peak headroom in the device and opening up the possibility of extra signal distortion.'

I am going to offer you also, for your reference during this discussion, this excellent (respectfully borrowed) summary image of analogue and digital levels; don't be overwhelmed, for this summary I will focus on digital. It is from a highly recommended 2009 blog article by ZedBee, Digital Recording Levels - a rule of thumb. [ED: I can't hotlink to the image, and borrowing it here uploaded is a hopefully forgiven breach of copyright for educational purposes; do please read the original article too.]


Getting a hold on the new Loudness measures

Ok, so let's examine some of the new human-perception-based loudness measures (as opposed to sound pressure dB measures, voltage dB measures, or digital dB measures). I highly recommend you watch firstly this absolutely brilliantly clear screencast video tutorial with live examples by Ian Shepherd from Production Advice UK. He knows exactly what he is talking about, and shows you in RMS/peak on old VU meters and on the new LUFS loudness meters with a real music project (then do please come back here):

- YouTube: LUFS - the new Loudness Units. What do they mean ?

- LUFS, dBFS, RMS… WTF ?!? How to read the new loudness meters (this includes a Pink Noise download so you can compare with his results).

Alright, so you get that there are different kinds of Root Mean Square (RMS) measures for music/sound, and that one has to be careful comparing them, but basically LUFS is like RMS adjusted for human perception. It is also comparable with relative dB units (but always on the LUFS scale), so adjustments in dB will give a similar adjustment in LU. This means that even if you don't yet have an official LUFS monitor, you can get a feel (only) using just RMS measures.

Let's explore some RMS and Peak dBFS stats first

My software tips here are all Mac specific (currently Mountain Lion 10.8.5), but some of these tools are also available for other UNIX/Linux machines, and some even run on Windows (how about that).

I recommend that with some of your audio files and with the Ian Shepherd pink noise test WAV file you explore some stats. Visit also: Audio engineering test/sample file resources, and online generators and online audio tests.

Audacity is a free, open-source, cross-platform audio editor for Mac, GNU/Linux, Windows, etc. It's not the world's best audio editor (especially not for MP3 or AAC, because it imports, processes, then re-exports with a tiny quality loss, rather than, say, editing the MP3 directly), but it has lots of plugins and FX and is sufficient for experiments. Internally it works in 32-bit floating point LPCM at up to 96kHz.

There is an unofficial Wave Stats plugin for Audacity that performs excellent wave analysis over regions of about 30s length, which is enough for you to explore the difference between dBFS RMS and max peaks, and to get a feel for some different audio files. Typical output from the plugin:

Another useful audio analysis tool is ffmpeg used in command line mode. It is available on Mac using MacPorts. I got it working OK on Mac OS X Mountain Lion, but you should at least be a bit UNIX savvy to try this. You will also need the LAME MP3 Encoder port if you want to deal with MP3.

$ sudo port install ffmpeg

$ sudo port install lame

(As always with MacPorts, don't be scared to use that -f (force) option if you upgraded your OS recently !)

Store the following in a file at ~/bin/@ffmpeg-statistics:

#!/bin/bash
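# Run FFmpeg's "volumedetect" audio filter over the input file ($1),
# decoding to the null muxer so nothing is actually written out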
ffmpeg -i "$1" -filter:a "volumedetect" -vn -f null /dev/null

Make sure you make it executable with:

$ chmod +x ~/bin/@ffmpeg-statistics

And when running it on a file, do "quote" your audio file name if it contains any spaces:

$ @ffmpeg-statistics "my chill music audio file.mp3"

The statistics output, for a run on a 128kbps MP3 chill music file, looks like this:

Duration: 00:05:32.43, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 128 kb/s
Output #0, null, to '/dev/null':
Metadata:
..
encoder : Lavf55.12.100
Stream #0:0: Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (mp3 -> pcm_s16le)
..
size=N/A time=00:05:32.43 bitrate=N/A
video:0kB audio:57263kB subtitle:0 global headers:0kB muxing overhead -100.000038%
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] n_samples: 29318494
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] mean_volume: -17.1 dB
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] max_volume: -1.4 dB
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] histogram_1db: 12
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] histogram_2db: 583
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] histogram_3db: 9111
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] histogram_4db: 42170

There you have it: an indication of whether anything clipped (it didn't; the max is negative, below 0dBFS) and what your mean volume (RMS) is, without even loading the audio file in an editor.

You might also understand why I wanted you to see FFmpeg's basic RMS and Peak stats processing before we examine its EBU R128 filter (later): the RMS and Peak stats tell us important things - like whether we clipped at all - that we need to know anyway.

Here is the result on the Pink Noise WAV file from the Ian Shepherd LUFS tutorial:

Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'Pink_Noise-production-advice-lufs-dbfs-test.wav':
Metadata:
artist : Fred Nachbaur
Duration: 00:00:10.00, bitrate: 705 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s
Output #0, null, to '/dev/null':
Metadata:
artist : Fred Nachbaur
encoder : Lavf55.12.100
Stream #0:0: Audio: pcm_s16le, 44100 Hz, mono, s16, 705 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le -> pcm_s16le)
Press [q] to stop, [?] for help
size=N/A time=00:00:10.00 bitrate=N/A
video:0kB audio:861kB subtitle:0 global headers:0kB muxing overhead -100.002494%
[Parsed_volumedetect_0 @ 0x7ffe53000000] n_samples: 441000
[Parsed_volumedetect_0 @ 0x7ffe53000000] mean_volume: -14.7 dB
[Parsed_volumedetect_0 @ 0x7ffe53000000] max_volume: -2.1 dB
[Parsed_volumedetect_0 @ 0x7ffe53000000] histogram_2db: 13
[Parsed_volumedetect_0 @ 0x7ffe53000000] histogram_3db: 111
[Parsed_volumedetect_0 @ 0x7ffe53000000] histogram_4db: 566

The max_volume and mean_volume values agree well with the Peak and RMS values seen in the tutorial video meters.

Let's compare the FFmpeg stats with the Wave Stats plugin for Audacity applied on the same Pink Noise test:

The peak and RMS stats agree exactly with FFmpeg !

FFmpeg on the command line is very handy and fast; it's nice to not always have to load files in an editor, and it is very useful for batch runs over many files (with some simple UNIX Bash shell scripting).
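For example, a sketch of such a batch run, assuming the ~/bin directory holding the @ffmpeg-statistics wrapper from above is on your PATH:

for f in *.mp3; do
    echo "== $f =="
    # FFmpeg reports on stderr, hence the 2>&1 before filtering
    @ffmpeg-statistics "$f" 2>&1 | grep -E 'mean_volume|max_volume'
done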

But I found you have to be a bit careful with it. FFmpeg tries to detect the input format and bit rate or bit depth, but unless you explicitly give an output format, it will assume what it calls 16-bit "pcm_s16le" as the output format (which in the command form above gets thrown away anyway). This makes no difference to the stats calculated on the input file, but if, for example, we are examining a 24-bit pink noise file, then it might be better used thus:

ffmpeg -i PinkNoise-10mins-24bit-48kHz.aiff -acodec pcm_s24le -filter:a "volumedetect" -vn -f null /dev/null

This gives more sensible input and output format identification and mapping:

Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, aiff, from 'PinkNoise-10mins-24bit-48kHz.aiff':
Duration: 00:10:00.00, start: 0.000000, bitrate: 2304 kb/s
Stream #0:0: Audio: pcm_s24be, 48000 Hz, stereo, s32, 2304 kb/s
Output #0, null, to '/dev/null':
Metadata:
encoder : Lavf55.12.100
Stream #0:0: Audio: pcm_s24le, 48000 Hz, stereo, s32, 2304 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s24be -> pcm_s24le)
size=N/A time=00:10:00.00 bitrate=N/A
video:0kB audio:168750kB subtitle:0 global headers:0kB muxing overhead -100.000013%
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] n_samples: 57600000
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] mean_volume: -24.2 dB
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] max_volume: -12.0 dB
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] histogram_11db: 525
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] histogram_12db: 8270
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] histogram_13db: 38442
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] histogram_14db: 131459

It has correctly detected the input file as 24-bit 'pcm_s24be', and it now has a pseudo (discard) output also at 24-bit.

However, I found that FFmpeg failed to detect the bits within the format of a 24-bit FLAC file:

Input #0, flac, from 'PinkNoise-10mins-24bit-48kHz.flac':
Duration: 00:10:00.00, bitrate: 2305 kb/s
Stream #0:0: Audio: flac, 48000 Hz, stereo, s32
Output #0, null, to '/dev/null':
Metadata:
encoder : Lavf55.12.100
Stream #0:0: Audio: pcm_s24le, 48000 Hz, stereo, s32, 2304 kb/s

I also don't understand why in the above examples it mentions 's32' in the stream format (presumably because FFmpeg carries 24-bit samples in 32-bit words internally, having no native 24-bit sample format).

But for the sake of discussion of levels and loudness, the RMS and peak stats make sense and are consistent, so let's move on; this article is not supposed to be a tutorial on FFmpeg. For more on FFmpeg for audio, including description of formats, visit also: FFmpeg: command line and GUI audio/video conversion tool: audio references

We simply note for now that we have to be careful when comparing wave statistics between 16-bit and 24-bit sample depths.

Ok, so far we have looked at old RMS and Peak, but what about LUFS loudness stats ?

Unfortunately, as far as I can tell Audacity does not yet support LUFS, but there is apparently already a plan to change/enhance the VU Meter to conform to the EBU Standard R128.

But FFmpeg does now offer an EBU R128 audio filter as of at least version 2.0.2. To see whether it is available for your version use:

$ ffmpeg -filters | grep -i r128

ebur128 A->N EBU R128 scanner.

You can then perform loudness measurement runs like this (for example, on our pink noise WAV sample):

ffmpeg -i Pink_Noise-production-advice-lufs-dbfs-test.wav -filter:a "ebur128" -vn -f null /dev/null

Output (with most scan lines removed) is:

Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'Pink_Noise-production-advice-lufs-dbfs-test.wav':
Metadata:
artist : Fred Nachbaur
Duration: 00:00:10.00, bitrate: 705 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s
Output #0, null, to '/dev/null':
Metadata:
artist : Fred Nachbaur
encoder : Lavf55.12.100
Stream #0:0: Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le -> pcm_s16le)
Press [q] to stop, [?] for help
[Parsed_ebur128_0 @ 0x7f9452c2d880] t: 0.0999792 M:-120.7 S:-120.7 I: -70.0 LUFS LRA: 0.0 LU
[Parsed_ebur128_0 @ 0x7f9452c2d880] t: 0.199979 M:-120.7 S:-120.7 I: -70.0 LUFS LRA: 0.0 LU
..
[Parsed_ebur128_0 @ 0x7f9452c2d880] t: 10.0003 M: -14.4 S: -14.4 I: -14.4 LUFS LRA: 0.1 LU
size=N/A time=00:00:10.00 bitrate=N/A
video:0kB audio:938kB subtitle:0 global headers:0kB muxing overhead -100.002292%
[Parsed_ebur128_0 @ 0x7f9452c2d880] Summary:

Integrated loudness:
I: -14.4 LUFS
Threshold: -24.4 LUFS

Loudness range:
LRA: 0.1 LU
Threshold: -34.4 LUFS
LRA low: -14.5 LUFS
LRA high: -14.3 LUFS

At -14.4 LUFS integrated loudness, the pink noise is way above the EBU R128 broadcast standard of -23 LUFS.
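To bring this test file down onto that broadcast target you would attenuate by roughly 8.6dB (from -14.4 LUFS down to -23 LUFS); the FFmpeg volume filter discussed further below can apply that directly, e.g. (the output filename is a placeholder):

ffmpeg -i Pink_Noise-production-advice-lufs-dbfs-test.wav -af "volume=-8.6dB" pink-at-minus23.wav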

LUFS meters for Audacity

As far as I can tell there is no LUFS meter built specifically for Audacity; however, I found the following free plugin from Klangfreund, the LUFS Meter:

'EBU R128 compliant loudness measurement

The LUFS Meter plugin enables you to deliver loudness-calibrated content.

Multi-Platform, Multi-Format

Available as VST- and Audio Unit-plugin on Mac. On Windows, the LUFS Meter is available as a VST-Plugin. 32 and 64 bit. Support for Linux and other plugin formats is planned.

http://www.klangfreund.com/lufsmeter/download/'

Please note that for Audacity you must use the 32-bit version, as Audacity does not support 64-bit VST plugins !

I managed to get it to run (preview) over our pink noise test file within Audacity, but it kept crashing Audacity whenever I clicked the Ok button !

Visit also:

- Mac OS X: EBU R128 compliant loudness meters and batch processing

- Mac OS X: audio engineering plugins

Playing with the loudness: normalization, amplification, attenuation

Audacity has a nice enough Normalize function under Effects, but it works only in terms of the maximum (peaks); it does not let you set RMS targets, and certainly nothing as fancy as the new LUFS loudness measures.

There is also a command-line tool, 'normalize', which you can install using MacPorts:

sudo port install normalize

It seems to operate only on WAV files; I could not get it to see MP3 files.
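A minimal sketch of a run, assuming the MacPorts build follows the standard normalize(1) options (the filename and target are placeholders); note that, unlike Audacity's Normalize, this tool works to a target RMS amplitude (default -12dBFS) rather than a peak:

$ normalize -a -12dBFS track.wav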

Let's investigate an MP3 file of chill music that I have already normalised in Audacity to -2dBFS. The FFmpeg RMS stats run gives:

[Parsed_volumedetect_0 @ 0x7faab2000000] n_samples: 20731486
[Parsed_volumedetect_0 @ 0x7faab2000000] mean_volume: -14.4 dB
[Parsed_volumedetect_0 @ 0x7faab2000000] max_volume: -1.5 dB
[Parsed_volumedetect_0 @ 0x7faab2000000] histogram_1db: 23
[Parsed_volumedetect_0 @ 0x7faab2000000] histogram_2db: 741
[Parsed_volumedetect_0 @ 0x7faab2000000] histogram_3db: 9246
[Parsed_volumedetect_0 @ 0x7faab2000000] histogram_4db: 63090

Clearly the peak normalisation in Audacity to -2dBFS was not perfect, as the maximum is -1.5dB, but it is at least in the right ballpark. (Encoding to MP3 alters the waveform slightly, so peaks measured after encoding can differ from the pre-encode normalisation target.)
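For reference, the stats run here is just FFmpeg's volumedetect filter again, in the same discard-output pattern as the ebur128 run above (the filename is a placeholder):

ffmpeg -i chill-track.mp3 -af volumedetect -vn -f null /dev/null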

Compare with the LUFS runs:

Integrated loudness:
I: -12.2 LUFS
Threshold: -22.6 LUFS

Loudness range:
LRA: 3.0 LU
Threshold: -32.5 LUFS
LRA low: -14.1 LUFS
LRA high: -11.1 LUFS

The mean_volume was -14.4 dB, but the integrated loudness was -12.2 LUFS. This too is way above the EBU R128 broadcasting recommendation of -23 LUFS; yet it works just brilliantly on my iPhone used as an iPod !

Changing the loudness in LU units to a target value

As a rule of thumb, a gain change of X dB shifts the LUFS loudness by about X LU.

FFmpeg has a simple audio volume filter. See How to change audio volume up-down with FFmpeg:

'To turn the audio volume up or down, you may use FFmpeg's Audio Filter named volume, like in the following example. If we want our volume to be half of the input volume:'

ffmpeg -i input.wav -af 'volume=0.5' output.wav

However, this value is a linear amplitude factor, not dB; recall that halving the amplitude is the same as reducing by 6dB, so volume=0.25 is a reduction of about 12dB (20·log10(0.25) ≈ -12.04dB). Going back to my chill track at -12.2 LUFS, applying volume=0.25 should therefore reduce the loudness by about 12 LU, to roughly -24.2 LUFS. Performing this adjustment and rerunning the FFmpeg EBU R128 filter gives:

Integrated loudness:
I: -24.7 LUFS
Threshold: -35.0 LUFS

Loudness range:
LRA: 3.0 LU
Threshold: -45.0 LUFS
LRA low: -26.6 LUFS
LRA high: -23.6 LUFS

The rule of thumb has worked well enough: the integrated loudness is now -24.7 LUFS, against the -24.2 LUFS predicted from the -12dB reduction.
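Note that the volume filter also accepts gains in dB directly, which avoids converting between linear factors and decibels by hand; a minimal sketch (the filenames are placeholders, and I believe the dB suffix requires a reasonably recent FFmpeg):

ffmpeg -i input.wav -af "volume=-12dB" output.wav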

Audacity has the ability to easily adjust the volume in dB units: see Amplify and Normalize.

So there it is: we have examined peak and RMS statistics and LUFS loudness statistics for files, and made reasonably accurate loudness adjustments using completely free Mac tools, including on the command line. I know there is now a range of much fancier LUFS tools available for Mac, but it's nice to know one can at least do it this way for nothing. Right, let's throw away that adjusted file; it's far too quiet for playing on my iPod !

Some recommended loudness levels for different applications

So, time for some basic recommendations.

As already illustrated, I am currently into "chill" music, and I want large collections of chill music that play at roughly the same "loudness" for long stretches without my having to adjust the volume (say, when played through speakers while I am working at my computer; I don't want to have to keep getting up to adjust it).

And I want these collections to be usable on systems that do not use ReplayGain or Sound Check (Apple's proprietary equivalent for iTunes and iPod). (Besides, if I get it basically right without them, I can always also use the same collections with those technologies.)

I have chosen my rule-of-thumb standard for "iPod preparation" of chill stuff in MP3 as -2dBFS peak in cases where the 'mean_volume' RMS (according to FFmpeg) is around -17dBFS to -14dBFS, which is about -15LUFS to -12LUFS loudness according to FFmpeg on this kind of music.

This is clearly much higher than the comparable -23LUFS European broadcasting standard. But remember, this is for playing on home devices, iPods, iPhones etc.

I am perfectly aware that max peak values do not give a reliable indication of RMS values or LUFS loudness values, but I know in advance that the music I am treating in this case is quite compressed chill. I don't have the facility (yet) for automatically applying an LUFS loudness requirement in batch mode, whereas I can apply max peak normalisation easily, and in any case:

Blindly applying a high LUFS loudness target (as needed for, say, an iPod) does not in itself ensure there is no clipping !

I find the resulting loudness range when normalising to -2dBFS max peak (for this kind of music) works well on quality headphones on my Macbook Pro, and on my iPhone headphones walking along the street, and just as well when played from an iPod via a mixer through my Opera DB Live powered speakers (yes I use a musician's PA at home instead of high quality "audiophile" speakers).

My rule of thumb, using simple peak normalisation, can be applied safely and reliably to lots of different kinds of chill and other music after performing a stats run (and I always do the FFmpeg stats run). Note that -2dBFS still leaves a little room to play with at the top end.
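As a sketch, the same two-step rule of thumb can be applied purely on the command line (filenames and numbers here are illustrative, not from a real run). First the stats run:

$ ffmpeg -i track.wav -af volumedetect -vn -f null /dev/null

Suppose it reports max_volume: -5.3 dB; then a gain of 3.3dB lands the peak on the -2dBFS target:

$ ffmpeg -i track.wav -af "volume=3.3dB" track-at-2dBFS.wav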

Of course, many recent mastering standards/recommendations, especially for CDs, push it even higher, much closer to 0dBFS max peak, and far less dynamic range than was used in the past.

If you perform some measurements on a wide range of popular music ripped from CD, you will find max peaks typically ranging from as low as -5dBFS (mostly from the 1980s) right up to 0dBFS, with much heavier compression in recent years. See also the fantastic article The Death of Dynamic Range by Bob Speer of CD Mastering Services, with measurements and comparisons between decades, and this wise remark:

'You want your music to be loud? You can make it loud yourself [by TURNING UP YOUR STEREO'S VOLUME CONTROL] -- and the full quality and dynamic range of the music is preserved. .. But when all of your CDs are recorded to be loud right on the discs themselves, you don't have this choice anymore; you no longer have a variety of "loud" music and "quiet" music to choose from and to play at a volume level that suits your musical taste. The record companies are not only filling your CDs with distorted, corrupted audio, they are forcing you to listen to your music in a certain manner -- do you really want that?'

Also from Bob Speer, in 2001, comes What Happened To Dynamic Range?, with a wonderful animation of waveforms being squashed over the years.

Again: my -2dBFS max peak tip (for chill music) is not a recommendation for a professional TV or radio broadcaster or a movie theatre; it is for a personal music collection played through a range of devices at home or on the go, where the -23LUFS European broadcasting standard will likely prove way too low.

So what about live music recording levels for multi-track ?

The recommendation above (-2dBFS peak) is clearly also not a suitable level for most digital recording of live music, and especially not for multi-track recording, where you will likely be reusing and altering a track in different contexts, combining it with other tracks, and subjecting it to various FX, compression etc. Here is a recommendation for recording levels (based on some careful analysis of dynamic range) from dBzee: Digital Recording Levels - a rule of thumb:

'The rule of digital thumb

1. Record at 24-bit rather than 16-bit.

2. Aim to get your recording levels on a track averaging about -18dBFS. It doesn't really matter if this average floats down as low as, for example -21dBFS or up to -15dBFS.

3. Avoid any peaks going higher than -6dBFS.

That's it. Your mixes will sound fuller, fatter, more dynamic, and punchier than if you follow the "as loud as possible without clipping" rule.'

Also:

'Most interfaces are calibrated to give around -18dBFS/-20dBFS when you send 0VU from a mixing desk to their line-ins. This is the optimum level!
-18dBFS is the standard European (EBU) reference level for 24-bit audio and it's -20dBFS in the States (SMPTE).'

During my recent online research I found similar recommendations, based on very precise analysis of noise behaviour and of the capabilities of 24-bit digital systems and, above all, of typical analog-to-digital converters.

And another interesting discussion from Sound on Sound (SOS) Technical Editor Hugh Robjohns: Q How much headroom should I leave with 24-bit recording?:

'The basic idea is to treat -18dBFS as the equivalent of the 0VU mark on an analogue system’s meter, and that’s where the average signal level should hover most of the time. Peaks can be way over that, of course ..

If the material you are recording is well controlled and predictable in terms of its peak levels — like hardware synths tend to be, for example — you could legitimately reduce the headroom safety margin if you really want to. But in practice there is little point.

The only advantage to recording with less headroom is to maximise the recording system’s signal-noise ratio, but there’s no point if the source’s signal-noise ratio is significantly worse than the recording system’s, and it will tend to be that way with most analogue synth signals, or any acoustic instrument recorded with a mic in a normal acoustic space. The analogue electronic noise floor or the acoustic ambience will completely swamp the digital recording system’s noise floor anyway.

Recording ‘hot’, therefore, won’t improve the actual noise performance at all, and will just make it harder to mix against other tracks recorded with a more reasonable amount of headroom. One issue that comes up a lot is the confusion between commercially released media (CD, MP3, for example), which have no headroom margin at all (they peak to 0dBFS), and the requirement for a headroom margin when tracking and mixing.

Going back to traditional professional analogue audio systems, the practice evolved of recording signal levels that averaged around 0VU. OK, you could push things a few decibels hotter sometimes for effect with analogue tape, but a level of around 0VU was the norm, and that normally equated to a signal level of about +4dBu (VU meters are averaging meters and don’t show transient peaks at anything like their true level).

Analogue equipment is designed to clip at about +24dBu, so, in other words, the system was engineered to provide around 20dB of headroom above 0VU. It’s just that the metering systems we use with analogue don’t show that headroom margin, so we forget it’s there. Digital meters do show it, but so many people don’t understand what headroom is for, and so feel the need to peak everything to the top of the meter anyway. This makes it really hard to record live performances, makes mixing needlessly challenging and stresses the analogue monitoring chain that was never designed to cope with +20dBu signal levels all the time.

By recording in a digital system with a signal level averaging around -18 or -20 dBFS, you are simply replicating the same headroom margin as was always standard in analogue systems, and that headroom margin was arrived at through 100 years of development for very good practical reasons.

.. working with average levels of around -20dBFS or so is fine and proper, works in exactly the same way as analogue, and will generally make your life easier when it comes to mixing and processing.

The old practice of having to get the end result up to 0dBFS is a mastering issue, not a recording and mixing one. It is perfectly reasonable (after the mix is finished) to remove the (now redundant) headroom margin if that is what the release format demands.
..
A sensible headroom margin is essential when tracking, to avoid the risk of clipping and allow you to concentrate on capturing a great performance without panicking about the risk of ‘overs’. A similar margin is also required when mixing, to avoid overloading the mix bus and plug-ins (yes, I know floating-point maths is supposed to make that irrelevant, but there are compromises involved that can be easily avoided by maintaining some headroom!).

Once the mix is finished, the now redundant headroom can be removed, and that is a standard part of the mastering process for digital media like CD and MP3.'

So this is essentially what I am doing when I go for -2dBFS max peak and around -17 to -14dBFS RMS (about -15LUFS to -12LUFS according to FFmpeg) for chill music end mixes. Play it through headphones on your Mac laptop or iPod or iPhone and you'll find out pretty quickly why: most modern personal devices seem to benefit on playback from far more volume juice than the -23LUFS broadcast standard.

REMEMBER: preparing pre-recorded, pre-mixed music for playback on your personal playback devices (or capturing/stealing from computer audio sources like online radio streams), recording live music tracks, and mastering are completely different exercises !

Some more useful references on digital audio, quantization, and digital vs. analog levels

All About Digital Audio: Pt 2 by Hugh Robjohns: an excellent description of digital quantisation and digital noise; from 1998, but still very relevant:

'When it comes to quantising the individual samples of an analogue audio signal, it turns out that our ears can easily hear very small errors in the measurements -- even down to tiny errors as small as 90dB or more below the peak level -- so we have to use a very accurate measurement scale. Figure 1 shows a few audio samples being measured against a very crude quantising scale simply to show the principles involved. Each level in the scale is denoted by a unique binary number -- in this case, three bits are used to count eight levels (including the base line at zero).

Some samples will happen to be at exactly the same amplitude as a point on the measurement scale, but others will fall just above or below a division. The quantising process allocates each sample with a value from the scale, so sometimes the quantised value is slightly lower than the true size of the audio sample, and sometimes slightly bigger. These errors in the description of a sample's size are called quantising errors and they are an inherent inaccuracy of the process.

When the digital data representing the quantised amplitude values is used to reconstruct samples for replay, some of those samples will be generated slightly louder or quieter than the original analogue audio signal from which they were derived -- they will not be entirely accurate. However, whether an audio sample falls on, above, or below a quantising level, and by how much a level is missed is essentially random -- and a random signal is noise. Consequently, quantising errors tend to sound like hiss -- white noise -- added to the original audio signal.

The only way to make quantising noise quieter is to reduce the size of the quantising errors, and the only way that can be done is by making the quantising intervals smaller -- in other words, by using a finer, more accurate scale for the measurements -- just like in the carpet example earlier. The errors will still be there, but if you choose small enough quantising intervals, the errors become vanishingly small, as does the hiss. However, finer gradations require more quantising levels, and so more binary digits are needed to count them.

If the number of quantising levels is doubled, the spacing between individual levels must be halved, and so the potential size of quantising errors must be halved as well. A doubling or halving (in terms of dBs) is 6dB; so every time the number of quantising levels is doubled, the hiss caused by quantising errors is reduced by 6dB. In binary counting, each extra bit added to the number allows twice the number of levels to be counted -- three bits can count eight quantising levels, four bits count sixteen, and five bits count 32 levels. This relationship gives us a handy rule of thumb to estimate the potential dynamic range of a digital system: For each extra bit used to count quantising levels, quantising noise is reduced by 6dB.

So, for example, an 8-bit system should have a dynamic range of 48dB, a 16-bit system (such as DAT and CD) should have a range of around 96dB, and a 24-bit system about 144dB.'

From Vincent Kars (2012), The Well-Tempered Computer: 16 or 24 bits, which explains exactly why it is better to record at 24-bit:

1 bit=6 dB

SNR=6N+1.8 dB (N in bits) to be exact but for convenience sake, let’s use 6.

The loudest possible signal in digital audio (all bits are 1) is the reference, this is called 0 dBFS (dB Full Scale). All other measurements expressed in terms of dBFS will always be less than 0 dB (negative numbers). 16 bits will go down to -96 dBFS and 24 to -144 dBFS. In essence, 24 bits continue where 16 bits stops. It can resolve micro details 16 bits can’t.

Noise floor

The theoretical maximum signal-to-noise ratio in an analogue system is around 130dB. In practice 120 dB is a very good value. You can’t escape thermal noise.

A couple of specs:

Benchmark ADC1 (24 bits 192 kHz) A/D THD+N, 1 kHz at -1 dBFS -102 dBFS, -101 dB, 0.00089%
Benchmark DAC1 THD+N: (w/-3 dBFS input) -107 dB, 0.00045%
Prism Orpheus AD (line in) THD+N -111dB (0.00028%, -0.1dBFS)

Yes 24 bit can capture those very soft tiny details 16 bit can’t but pretty soon you end in the noise floor of the equipment.

The big debate

You can find many debates on the internet about 16 vs. 24. In the pro world this debate has been settled: almost everybody is recording with 24 bits today. They have some very good reasons to do so ..

Also useful concerning levels and metering:

- Meter Madness: Understanding meters and what they're telling us..., By Mike Rivers (RecordingMagazine): Excellent reading, includes the history of VU meters, and the move to digital metering.

- Final Cut Pro: Setting Proper Audio Levels.

- The Well-Tempered Computer: Volume control. In general a super site for discussions on audio. This article compares volume control, quantization errors, and signal-to-noise for 16-bit digital, 24-bit digital, and analog. Excellent calculations and comparison tables, and explains why some audiophiles recommend controlling volume if possible with analog rather than digital (even with 24-bit) to keep noise down (unless you are using floating point digital).

Related: ESS Digital vs Analog volume control slides (PDF). Has excellent graphs in frequency domain of progressive volume reduction in a digital system, showing why it encourages noise, and why (as long as you have nice smooth analog volume control) audiophiles generally avoid digital volume control.

- Wikipedia: dBFS has the following to say on comparing dBFS with analog levels (compare with the graph above):

dBFS is not to be used for analog levels, according to AES-6id-2006. There is no single standard for converting between digital and analog levels, mostly due to the differing capabilities of different equipment. The amount of oversampling also affects the conversion with values that are too low having significant error. The conversion level is chosen as the best compromise for the typical headroom and signal-to-noise levels of the equipment in question. Examples:

- EBU R68 is used in most European countries, specifying +18 dBu at 0 dBFS

- In Europe, the EBU recommend that -18 dBFS equates to the Alignment Level

- European & UK calibration for Post & Film is −18 dBFS = 0 VU

- UK broadcasters, Alignment Level is taken as 0 dBu (PPM4 or -4VU)

- US installations use +24 dBu for 0 dBFS

- American and Australian Post: −20 dBFS = 0 VU = +4 dBu

- The American SMPTE standard defines -20 dBFS as the Alignment Level

- In Japan, France and some other countries, converters may be calibrated for +22 dBu at 0 dBFS.

- BBC spec: −18 dBFS = PPM "4" = 0 dBu

- German ARD & studio PPM +6 dBu = −10 (−9) dBFS. +16 (+15)dBu = 0 dBFS. No VU.

- Belgium VRT: 0dB (VRT Ref.) = +6dBu ; -9dBFS = 0dB (VRT Ref.) ; 0dBFS = +15dBu.

[ED: Warning: the above does not specify the digital bits, usually 24-bit applies here.]

The EBU R68 standard summary 2000 (PDF) makes this important statement:

'The EBU recommends that, in digital audio equipment, its Members should use coding levels for digital audio signals which correspond to an alignment level which is 18 dB below the maximum possible coding level of the digital system, irrespective of the total number of bits available.'

Note that this does agree with standard practice in many application domains for 24-bit, but it is not what many people recommend for 16-bit ! Look at the chart above from Zed Brookes again, and notice the Pro Reference Levels:

+4dBu = 0dBVU = 0VU = -12dBFS(16-bit) = -18dBFS(24-bit EBU) = -20dBFS(24-bit SMPTE)

Some more useful references on loudness, and the "new" European standards vs the USA standards

This one from the BBC is excellent and, at only 13 pages with good summaries, well worth reading from top to bottom: White Paper, Jan 2011: Terminology for Loudness and Level dBTP, LU and all that by Senior Research Engineer Andrew Mason, available as a PDF download. It points out that:

'For broadcasting, there is one loudness measurement technique that we should know about. This has been relatively recently standardised by the ITU, and is known as Recommendation ITU-R BS.1770'

'The measurement uses a “K” weighting, so we have the subscript “K” for the quantity “L”. The result is expressed in “LUFS” – Loudness Units relative to Full Scale. 1770 still refers to “LKFS”,'

'The 1770 algorithm is defined such that a stereo sine wave at 1kHz, at -18 dBFS, will have a loudness level, LK, of -18 LUFS'

'Target level – the origin of “-23”

For the sake of a simple life, and reduced audience annoyance, EBU R 128 recommends that all programmes be normalised to an average foreground loudness level of -23 LUFS. The figure of -23 LUFS was chosen as the result of a careful study of broadcasting practice, dynamic range tolerance, and the capabilities of different transmission technologies. Note that this value assumes that gating is used in the measurement to prevent long pauses in a programme bringing down the average loudness.'
..

'True Peak

The general shift away from quasi-peak metering towards loudness metering is complemented by a move towards true peak metering as well. There are three “peak” metering terms that it might be useful to clarify:

- quasi-peak – not really peak at all. Historically measured with a mechanical meter with controlled rise and fall times, such as the well-known “PPM”. Now done in software for digital applications using, for example, a 10ms integration time.

- sample peak – digital measurement of the highest sample value in the signal;

- true peak – digital measurement, interpolating between the actual samples in order to take account of over-shoots that would occur later, with, for example, sampling rate conversion. Recommendation ITU-R BS.1770 includes an over-sampling true-peak meter.'
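On that last point: sufficiently recent FFmpeg builds can report peak levels alongside loudness via the ebur128 filter's peak option; a sketch, assuming your version includes it (check with ffmpeg -h filter=ebur128):

ffmpeg -i input.wav -af "ebur128=peak=true" -vn -f null /dev/null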

Wikipedia: Peak programme meter.

Wikipedia: Loudness monitoring

From Wikipedia on LUFS/LKFS:

'Loudness, K-weighted, relative to Full Scale (or LKFS) is a loudness standard designed to enable normalization of audio levels for delivery of broadcast TV and other video. LKFS is standardized in ITU-R BS.1770. Loudness units relative to Full Scale (or LUFS) is a synonym for LKFS that is used in EBU R128.'
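As a cross-check of the "-18 dBFS stereo sine reads -18 LUFS" anchor quoted from the BBC paper above, you can generate such a tone with FFmpeg's aevalsrc test source and measure it; a sketch, assuming a reasonably recent build (amplitude 0.125 ≈ -18dBFS; the two expressions make it stereo):

ffmpeg -f lavfi -i "aevalsrc=0.125*sin(2*PI*1000*t)|0.125*sin(2*PI*1000*t):s=48000:d=10" -af ebur128 -f null /dev/null

The summary should report an integrated loudness close to -18 LUFS.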

From Wikipedia: ReplayGain:

'ReplayGain is a proposed standard published by David Robinson in 2001 to measure the perceived loudness of audio in computer audio formats such as MP3 and Ogg Vorbis. It allows players to normalize loudness for individual tracks or albums. This avoids the common problem of having manually to adjust volume levels between tracks when playing audio files from albums that have been mastered at different loudness levels. ReplayGain is now supported in a large number of media players and portable media players and digital audio players. Although the standard is now formally known as ReplayGain, it was originally known as Replay Gain and is sometimes abbreviated RG.'

From Poll: Is 3 dB, 6 dB or 10 dB SPL double the sound pressure?, an interesting article discussing the difference between "volume/amplitude" increase and perceived "loudness" increase, with this rule of thumb (the first two follow directly from 10·log10(2) ≈ 3dB for power and 20·log10(2) ≈ 6dB for amplitude; the 10dB loudness figure is psychoacoustic):

'Doubling of the volume (loudness) should be felt by a level difference of 10 dB − acousticians say.
Doubling the sound pressure (voltage) corresponds to a measured level change of 6 dB.
Doubling of acoustic power (sound intensity) corresponds to a calculated level change of 3 dB.

+3 dB = twice the power (Power respectively intensity − mostly calculated).
+6 dB = twice the amplitude (Voltage respectively sound pressure − mostly measured).
10 dB = twice the perceived volume or twice as loud (Loudness nearly sensed − psychoacoustics).'

From Wikipedia: The Loudness War: an excellent discussion of most of the issues concerning loudness measures, with comparisons over the decades and remarks on dynamic range advocacy by engineer Ian Shepherd (see resources above).

A fantastic 6-part series by Hugh Robjohns in Sound on Sound from 1998 (but still relevant): start at All About Digital Audio, Part 2, which has links to the other parts of the series. Everything you ever wanted to know about quantisation, metering, headroom, and dither.

From Dennis Bohn, Rane Corporation, 2008/2012, on why there is No Such Thing as Peak Volts dBu:

'It is incorrect to state peak voltage levels in dBu. It is common but it is wrong.

It is wrong because the definition of dBu is a voltage reference point equal to 0.775 Vrms (derived from the old power standard of 0 dBm, which equals 1 mW into 600 Ω). Note that by definition it is an rms level, not a peak level.'

From Normalized Audio and 0dBFS+ Exposure (2012) by Greg Ogonowski:

'Because an analog-to-digital converter or sample rate converter sample clock generally has an arbitrary time relationship to a given piece of program material applied to its input, the same audio can be represented in an infinite number of ways if correctly dithered before the quantizer. Many CDs produced today are normalized to 0dBFS in the digital domain by digital signal processing that is not oversampled and is thus unaware of the peak values of the waveform following playback device D/A converters. Following reconstruction into the analog domain, the peak level of the audio waveform can exceed 0dBFS, a phenomenon commonly known as “0dBFS+,” “intersample peak clipping,” or “true peak clipping.” If the digital-to-analog converter in a consumer playback device does not have 3dB of headroom (3dB being the maximum possible increase in peak level if the reconstruction filter is phase-linear), the converter can produce massive clipping and aliasing distortion components on top of any distortion components introduced by the digital signal processing. Add these to the artifacts produced by the MP3/AAC encode/decode process and it is no wonder that much of today's aggressively mastered music sounds so unpleasantly distorted.

What is particularly pernicious is that if mastering engineers monitor their work through converters having the required 3dB of headroom and do not use meters that show intersample peaks, these engineers will be completely unaware of the additional distortion that many consumer playback devices will produce. Mastering engineers who do not use intersample peak meters are therefore likely to process more aggressively than they would if they were able to hear the additional distortion introduced by poorly designed playback components.

Over-processed audio simply creates bad sound. Bad sound in, more bad sound out. It is really no wonder at all why it is so difficult to make radio stations and netcasts sound good with modern material, because it is all grossly pre-distorted! We are pleased to note that in the last year or so, the mastering community has finally started to become more aware of the intersample peak problem, but we are still seeing many major-label CDs that produce intersample peaks above 0 dBFS.'

TIP: Audio Studio Recording: Mastering and Gain Structure: interactive graphic with different meter reference levels.

From an excellent series of audio production tutorials from The Tenth Egg: Production Tip 6: Preparing a mix for Mastering:

'One of the most frequent questions we get from new clients is how best to prepare their mixes for mastering and what format they should supply them in. Whether you intend to use a mastering service like ours (www.tenthegg.co.uk/mastering), tackle it yourself, or just want to keep a copy for archive, there are a couple of simple steps to ensure that the mix you have is fit for the job.

1. Bit Depth and Sample Rate

Though a standard Audio CD can reproduce only 16Bit 44.1kHz digital audio it makes sense to work at the best resolution possible throughout the recording, mixing and mastering stages to ensure maximum quality of the end product. Most soundcards, software packages and hardware recorders now support 24Bit 96kHz recording and while there is still some debate about the benefits of higher sample rates most engineers would agree that 24Bit is the way to go. When it comes to mixing down even if you’ve recorded at 16Bit then there are still benefits to bouncing down your mix at 24Bit. The combination of multiple 16Bit elements will most likely have created a signal with a greater dynamic range. At 24Bit the low level detail, which will be brought up during mastering, will also be more faithfully reproduced. Regarding sample rates, your best bet is to mix down at the same resolution as you recorded. There won’t be any benefit from selecting a higher sample rate and the resulting conversion and re-conversion at the mastering stage may affect quality. If you’re not sure then opt for 44.1kHz.

2. File type

There are often lots of options here (wav, aif, SDII etc.) and most mastering houses will be able to work with whichever format you provide. But for maximum compatibility we would recommend a .wav (broadcast wave) file. Certainly you should try to avoid compressed formats such as MP3 or AAC, but if you’re forced to work in one of these then try and use a data rate of at least 256kbps. Often you will also be presented with the option of ‘split’ or ‘interleaved’ stereo files, with ‘interleaved’ being the preferred option.

3. Headroom

A degree of headroom (the gap in level between the maximum possible and that of the audio) is very important. If a signal clips at any point, even if distortion is inaudible on mixdown, it can become evident during mastering and will limit the processing options. Generally all that is needed is to pull the master fader down so that the meters no longer jump into the red at any point.

If you’re mixing down at 24Bit then you can safely leave as much as 3dB headroom. If you’re working at 16Bit then you’re going to want to maximise dynamic range so closer to 0.5dB is recommended.

4. Mix processing

Most engineers like to add a touch of overall compression, EQ and maybe even limiting when they mix down. This essentially goes some way towards creating that mastered sound and can help mixes sound louder and play better across a range of audio systems. However, this kind of processing can again create problems at the mastering stage, especially if they have been overdone. If possible all overall mix processing should be avoided in the copy destined for mastering, as these processes can be better applied using the specialist equipment and experience available to the mastering engineer. Certainly they should be free from limiting, which can have a similar effect to clipping. If overall processing must be applied then it should be done as conservatively as possible, avoiding large EQ cuts or boosts and compression gain reduction of more than 3dB.

5. Burning questions

If you’re mastering the mixes yourself then job done, you’re ready to master. But if you’re passing them on to a mastering house then you’re probably going to need to burn a disc. You can’t go far wrong here, just ensure that you burn a Data CD rather than Audio CD or all your mixes will get converted down to 16Bit 44.1kHz and will need to be re-ripped at the other end. When burning your disc be sure to use one of the write speeds recommended on the disc to avoid data errors. Also try to avoid touching the surface of the disc before and after burning and refrain from using the disc more than once to verify its contents before sending it off.

Summary

Now if you’re relatively new to music production then all that might sound a bit daunting. But don’t worry, even if you aren’t able to meet every criterion in this list that doesn’t mean that mastering can’t make a massive difference to your mixes. Most mastering studios, including ours (www.tenthegg.co.uk/mastering) can help talk you through the best options for your particular project and help you prepare your mixes. What our recommendation represents is an ideal format that will maximise the benefits of the mastering process. i.e. a 24Bit .wav file at whichever sample rate you recorded, with around 2dB headroom and free from overall mix processing.'
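If your DAW has bounced to some other lossless format (32-bit float WAV, say) and you need to deliver 24-bit, FFmpeg can do the conversion; a minimal sketch with placeholder filenames:

ffmpeg -i final-mix-float.wav -c:a pcm_s24le final-mix-24bit.wav

(Reducing bit depth raises the dithering question, which the Bob Katz article below covers in depth.)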

From Bob Katz on Digital Domain: Keeping Your Digital Audio Pure from First Recording to Final Master: everything you ever wanted to know about dithering, and the cost of cumulative dithering, and the cost of not dithering.

From Sound on Sound: MASTERING MASTERS: CD Mastering On Your PC: Tools & Techniques (2001): including more on why you need to dither when mastering down from 24-bit to 16-bit CD quality.

From Sound on Sound by Paul White, Feb 1999: 20 Tips On Home Mastering

Turn Me Up! Bringing dynamics back to music, including Loudness War - The Movie.

A final word on the "loudness war" and why it matters

In Australia the loudness war has clearly been won by the retailer Harvey Norman who has the loudest and most annoying TV ads in the history of the world (not that I watch much commercial TV). They seem to have discovered a magic "penetration and annoyance" factor that is mixed into their also very visually loud ads. (I therefore refuse to shop in their shops ever, because they completely spoil any attempt to enjoy a movie on commercial TV in Australia.)