A summary of a review of music levels for broadcasting, personal use, recording and mastering, including the new LOUDNESS measures

This page started because I began reading recently about the new(ish) loudness measures and standards, especially those of the European Broadcasting Union (EBU) (I did not examine the slightly lower US SMPTE recommendation in depth). From EBU: Loudness:

'In August 2010, the EBU published its Loudness Recommendation EBU R128. It tells how broadcasters can measure and normalise audio using Loudness meters instead of Peak Meters (PPMs) only, as has been common practice.

..

-23 LUFS

Basically EBU R128 recommends to normalize audio at -23 LUFS +/- 1 LU, measured with a relative gate at -10 LU. The metering approach can be used with virtually all material. To make sure meters from different manufacturers provide the same reading, EBU Tech 3341 specifies the 'EBU Mode', which includes a Momentary (400 ms), Short term (3s) and Integrated (from start to stop) meter. Already more than 60 vendors have reported to support 'EBU Mode' in their products.'

Now I am not a broadcaster, but my review of this matter of loudness sent me on a very interesting trip right back to the fundamentals of analog and digital audio engineering and levels, and I attempt to share that journey here, including some examples of dBFS and LUFS statistics processing with some free tools for Mac OS X.

I include some tips and research links on how these loudness measures relate to metering, recording and mastering levels, and how to react pragmatically to the broadcasting loudness measures and recommendations. Here are the main conclusions up front:

- The "best" level(s) for digital recording are different from the best levels for digital delivery. They depend on whether your recordings will be mixed with other music, used as an end mix, or simply captured; they also depend critically on the devices your end mixes and masters will be played on (or served via), and to some extent on the chosen audio format.

- Levels and loudness considerations for mastering are very different from levels for recording live music, and levels appropriate for preparing music collections for personal music devices (as opposed to broadcasting) may be different again.

- There is a consensus that the -23 LUFS European Broadcasting Union (EBU) standard is fine for some media (TV, radio etc.) but not at all appropriate for personal music playing devices such as iPods, mobile/smart phones etc, where pushing it a good deal louder is handy.

- The recently refined EBU (and SMPTE) measures of loudness are beginning to penetrate the world of Digital Audio Workstation (DAW) software, with new loudness meters already included by many audio software vendors.


Some background on audio levels

In order to understand my summary one needs to at least be familiar with the following:

- From Wikipedia: Decibel:

'The decibel (dB) is a logarithmic unit used to express the ratio between two values of a physical quantity (usually measured in units of power or intensity). One of these quantities is often a reference value, and in this case the dB can be used to express the absolute level of the physical quantity.

The number of decibels is ten times the logarithm to base 10 of the ratio of the two power quantities.

A change in power by a factor of 10 is a 10 dB change in level. A change in power by a factor of two is approximately a 3 dB change. A change in voltage by a factor of 10 is equivalent to a change in power by a factor of 100 and is thus a 20 dB change. A change in voltage ratio by a factor of two is approximately a 6 dB change.

..
The decibel unit can also be combined with a suffix to create an absolute unit of electric power. For example, it can be combined with "m" for "milliwatt" to produce the "dBm". Zero dBm is the level corresponding to one milliwatt, and 1 dBm is one decibel greater (about 1.259 mW).

In professional audio, a popular unit is the dBu (see below for all the units). The "u" stands for "unloaded", and was probably chosen to be similar to lowercase "v", as dBv was the older name for the same thing. It was changed to avoid confusion with dBV. This unit (dBu) is an RMS measurement of voltage which uses as its reference approximately 0.775 V RMS. Chosen for historical reasons, the reference value is the voltage level which delivers 1 mW of power in a 600 ohm resistor, which used to be the standard reference impedance in telephone audio circuits.

..
In professional audio, equipment may be calibrated to indicate a "0" on the VU meters some finite time after a signal has been applied at an amplitude of +4 dBu. Consumer equipment will more often use a much lower "nominal" signal level of -10 dBV. Therefore, many devices offer dual voltage operation (with different gain or "trim" settings) for interoperability reasons. A switch or adjustment that covers at least the range between +4 dBu and -10 dBV is common in professional equipment.

..

dBFS (digital)

dB(full scale) – the amplitude of a signal compared with the maximum which a device can handle before clipping occurs. Full-scale may be defined as the power level of a full-scale sinusoid or alternatively a full-scale square wave. A signal measured with reference to a full-scale sine-wave will appear 3dB weaker when referenced to a full-scale square wave, thus: 0 dBFS(ref=fullscale sine wave) = -3 dBFS(ref=fullscale square wave).

dBTP

dB(true peak) - peak amplitude of a signal compared with the maximum which a device can handle before clipping occurs. In digital systems, 0 dBTP would equal the highest level (number) the processor is capable of representing. Measured values are always negative or zero, since they are less than or equal to full-scale. '

Now before proceeding any further, let's look at one very important point about decibels as applied to dBFS. The formula for calculating dBFS is equivalent to the formula for calculating dB relative to a voltage (not a power), so the formula is:

LdB = 10 * log10(V^2 / V0^2) = 20 * log10(V / V0)

where V0 is the reference. That is, the digital amplitude is handled as a "field" quantity just like an electrical voltage (or a sound pressure), and not like a power quantity such as electrical power or sound intensity ! Here are some typical values, rounded, for some amplitude ratios:

1.0000 =   0.000 dB
0.5000 =  -6.021 dB
0.2500 = -12.041 dB
0.1250 = -18.062 dB
0.1000 = -20.000 dB
0.0625 = -24.082 dB
0.0100 = -40.000 dB
0.0010 = -60.000 dB
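The table above can be reproduced from the 20 * log10 formula with a few lines of shell. This is only a sketch: the helper name ratio_to_db is made up for illustration, and it assumes an awk that provides log() (any POSIX awk should).

```shell
#!/bin/sh
# ratio_to_db RATIO: print 20*log10(RATIO) with three decimals.
# (Hypothetical helper, not part of any tool mentioned in this article.)
# awk's log() is the natural logarithm, so divide by log(10).
ratio_to_db() {
    LC_ALL=C awk -v r="$1" 'BEGIN { printf "%.3f\n", 20 * log(r) / log(10) }'
}

# Print one line per amplitude ratio from the table.
for r in 1 0.5 0.25 0.125 0.1 0.0625 0.01 0.001; do
    printf '%-7s = %8s dB\n' "$r" "$(ratio_to_db "$r")"
done
```

Each halving of the amplitude costs about 6.02 dB, which is where the "6 dB per bit" figure comes from.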

This gives us a golden rule of thumb for digital:

Increasing the number of bits by 1 doubles the number of available quantisation levels,
and thus corresponds to about a 6dB increase in the dynamic range.
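As a quick check of this rule of thumb, the sketch below computes 20 * log10(2^bits) for some common word lengths. The function name dynamic_range_db is hypothetical, and the result is the idealised ratio of full scale to one quantisation step only; it ignores dither and the noise of real converters.

```shell
#!/bin/sh
# dynamic_range_db BITS: print 20*log10(2^BITS) in dB with two decimals.
# (Hypothetical helper; idealised figure, no dither or converter noise.)
# 20*log10(2^n) is computed as 20 * n * ln(2) / ln(10).
dynamic_range_db() {
    LC_ALL=C awk -v n="$1" 'BEGIN { printf "%.2f\n", 20 * n * log(2) / log(10) }'
}

for bits in 8 16 24; do
    printf '%2s bits: about %s dB\n' "$bits" "$(dynamic_range_db "$bits")"
done
```

The exact step is about 6.02 dB per bit, so the commonly quoted 96 dB and 144 dB figures for 16 and 24 bits are slight roundings.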

- From Wikipedia: dBFS: decibels relative to full scale for digital:

'0 dBFS is assigned to the maximum possible digital level. For example, a signal that reaches 50% of the maximum level at any point would reach -6 dBFS at that point, 6 dB below full scale. Conventions differ for RMS measurements, but all peak measurements will be negative numbers, unless they reach the maximum digital value.'

- From Wikipedia: RMS levels:

'Since a peak measurement is not useful for qualifying the noise performance of a system, or measuring the loudness of an audio recording, for instance, RMS measurements are often used instead.

There is a potential for ambiguity when assigning a level on the dBFS scale to a waveform rather than to a specific amplitude, since some choose the reference level so that RMS and peak measurements of a sine wave produce the same number, while others want the RMS and peak values of a square wave to be equal, as they are in typical analog measurements.'

- From Wikipedia: Dynamic range:

'The measured dynamic range of a digital system is the ratio of the full scale signal level to the RMS noise floor. The theoretical minimum noise floor is caused by quantization noise. This is usually modeled as a uniform random fluctuation between −1/2 LSB and +1/2 LSB. (Only certain signals produce uniform random fluctuations, so this model is typically, but not always, accurate.)'

Some other useful audio and sound engineering guides and resources concerning levels and metering include:

- Understanding & Measuring Digital Audio Levels by Glen Kropuenske, 2006 (PDF) this is an excellent introduction to levels with some nice comparison graphics and discussion of digital vs analog:

'dB or decibels

Audio signal or sound levels are measured using a decibel (dB) system. The dB system is used to compare two levels or a change in signal voltage or power. One dB is the level change that is just noticeable by most people. A 6 dB change is considered to be about twice the volume.

Sound signal level in dB can be considered either as a power or as a voltage. The level in decibels is 10 times the logarithm of the ratio of two power levels. Where P is the measured power in watts and P Ref. is a reference power in watts.

Sound signal level in dB can be considered as a voltage ratio. The level in decibels is 20 times the logarithm of the ratio of two voltage levels. Where V is the measured voltage and V Ref. is a reference voltage.

The resistance is assumed to be the same so calculations using either the power or voltage formula agree.'

'Units of Sound Level Measurement

Sound signal level is expressed using various dB units of measurement including:

- dBm: decibels or dB referenced to 1 milliwatt (.001 watt)

- dBu or dBv: decibels or dB referenced to 0.775 volt (dBu is more commonly used)

- dBV: decibels or dB referenced to 1 volt'

'VU Meters

The VU (volume unit) meter is another voltage measurement method for analog audio level measurement. The VU meter is a voltmeter with a response time designed to reflect the loudness of live audio as the ear would interpret the loudness. Relating VU measurement units to the other dB units of measurement for audio can only be done with a sine wave test tone. In a professional audio balanced system, 0 VU corresponds to +4 dBu. You may also see 0VU as +4 dBm although this assumes 600 ohm balanced impedance. This is the only impedance in which 4 dBm equals 4 dBu'

'Analog vs. Digital Levels — the dBFS Scale

Digital audio levels are measured differently than analog audio levels. Yes, yet another and different dB system is used. The dB system in digital audio starts at the top and defines the loudest sound level that is to be digitized. This top or full scale view of the audio levels results in a full scale or "FS" system of dB measurement.'

[ED: Warning: The following numbers do not all agree well with some diagrams or statements made by others quoted below.
Also, it does not state whether this is for 16-bit or 24-bit (assumed).]

'A 0 dBFS measurement unit is to be the highest audio level. Assuming this is to be at the highest audio level before clipping occurs, this corresponds to an analog level of 24 dBu. Therefore, 4 dBu (dBu =dBv) is the same as - 20 dBFS or 0 VU.

While this is generally accepted as the range of digital audio, it is not a hard standard. When digital audio values are converted back to analog, some digital audio equipment provides level selections to shift the analog output levels of 0 VU to -18 dBFS or -14 dBFS. Lowering the dBFS relationship increases the audio sound levels output from the D/A converter.'
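To see how the choice of calibration shifts these numbers, here is a small sketch that maps an analog level in dBu to dBFS. It assumes the converter is described by a single figure, the analog level in dBu that lands on 0 dBFS; the function name dbu_to_dbfs and that parameter are my own illustration, not something taken from the quoted guide.

```shell
#!/bin/sh
# dbu_to_dbfs DBU FULLSCALE_DBU
# DBU is the analog level; FULLSCALE_DBU is the (assumed) analog level
# that the converter maps to 0 dBFS. Both scales are in dB, so the
# mapping is a simple subtraction.
dbu_to_dbfs() {
    LC_ALL=C awk -v dbu="$1" -v fs="$2" 'BEGIN { printf "%.1f\n", dbu - fs }'
}

# Where does 0 VU (+4 dBu) land for three example calibrations?
for fs in 24 22 18; do
    printf '0 dBFS = +%s dBu  ->  +4 dBu = %s dBFS\n' "$fs" "$(dbu_to_dbfs 4 "$fs")"
done
```

With full scale at +24 dBu, 0 VU sits at -20 dBFS as in the quote; moving full scale to +22 dBu or +18 dBu gives the -18 dBFS and -14 dBFS alignments.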

Some explanations with reference to standards are provided by Hugh Robjohns, technical editor of Sound on Sound, in Q. What are the reference levels in digital audio systems?, from which I borrow diagrams for EBU R68 (top) and SMPTE RP155 (below):


So what does this all mean for recording and mastering ?

Let's start getting into some concrete tips for recording (and compare them with mastering). I figure the Final Cut Pro people would know what they are talking about, and their recommendations agree well with the other tips and diagrams I provide below. From Final Cut Pro7: User Manual: About Audio Meters:

'There are several common digital levels used to correspond to 0 dB on an analog [VU] meter:

-12 dBFS: This level is often used for 16-bit audio such as DV audio, and for projects with compressed dynamic ranges, such as those for television or radio.

-18 or -20 dBFS: This level is more common on projects with higher dynamic range, such as professional post-production workflows using 20- or 24-bit audio.'

- And similarly from Final Cut Pro: Understanding Audio Meters :

'As a general guideline, if you are working with 16-bit audio, you should set your audio level around -12 dBFS. If you are working with 20- or 24-bit audio, you should set your audio level around -18 or -20 dBFS.'

- From Audio Metering Introduction: Audio Geek Zine:

'VU

Mic preamps, converters, hardware effect processors are all designed to work optimally at 0 VU. They can usually handle more than that before distorting, but 0 VU is where the signal to noise is best. VU stands for Volume Unit and is the oldest analog metering system. VU meters are relatively slow moving with at 300ms response time. This slow response of a VU meter better represents an averaged volume level close to how our hears work. 0VU is equal to +4dBu or professional line level.

dBu

The dBu scale measures the analog voltage level in our equipment with 0dBu calibrated to about 0.775 Volts. The u in dBu stands for ‘unloaded’ which means that the voltage is measured with a zero resistance load. Again, 0VU or +4dBu is the ideal constant voltage of all your analog components in the recording and monitoring chain.

Here’s an example chain – microphone, mic preamp, compressor, audio interface line input, Analog to digital converter, recording software.

The microphone signal gets boosted up to line level by the preamp. Line level goes into and out of the compressor into the audio interface. The analog to digital converter assigns bits representing the voltage coming in and sends the data to your DAW.

Digital Meters

Once it’s in your DAW the level you see will not be 0 on your track meters, it will actually be closer to -18dBfs depending on the calibration. This may seem like a really low level but this is actually the optimal level that all the analog components that come before it.

Once you build up your song with several other tracks, you’ll be happy you have that extra headroom and lower noisefloor.

0VU = +4dBu = -18dBFS: This is the only thing you need to remember
[ED: it is assumed he is talking about EBU Digital 24-bit, the equivalent for 16-bit digital would be -12dBFS, and for SMPTE digital 24-bit it's -20dBFS.]

dBFS

The dBFS meters show Decibels relative to full scale. Instantaneous digital levels below the 0dBFS absolute peak. When 3 consecutive samples pass 0 the clip light will come on.

dB RMS

Now what’s left is RMS metering. Some DAWs have this in addition to Peak metering on the master. Similar to how VU meters work, RMS meters show an average level. The RMS value relates to how loud a sound is perceived.

These days all music is mastered to peak just below 0dBFS, the unwritten standard is -0.3,
but the song with the higher RMS level will appear to be louder.

There isn’t a widespread calibration standard for RMS metering so you’ll have to compare values from a few references to what you’re working on.'

[ED: this mastering recommendation of -0.3dBFS is relatively high compared with some other recommendations.]

There is a fantastic resource at Audio Studio Recording: Mastering and Gain Structure that enables you to compare live dBFS mappings for different calibrations, along with excellent explanations of all aspects of every metering scale:

'dBFS meter description

dBFS meters are either hardware- or software-based digital meters that can run anywhere from - 40dB to - inf (- infinity) on the low end, but invariably end at 0dB on the high end. Color schemes for these meters vary (especially on the software versions) but typically turn red at or near the - 3dB to 0dB range at the top of the scale.

Many dBFS meters include a single LED or other type or illuminated indicator usually labeled either "OVER" or "CLIP".

dBFS meters display digital levels

dBFS meters visually indicate signal levels as defined by the values of the digital samples of an analog signal that as been converted to digital data. The top of the meter (0dBFS) indicates a digital value where all the bits of a digital sample have a value of 1. A digital value of all 1s is, by definition, the highest possible value that can be represented in a binary digital form. There is nothing louder than a digital value of all 1s. Therefore 0dBFS (the top of the meter) represents the maximum possible volume on any digital signal.

Note that this 0dBFS maximum is true regardless of the digital word length (a.k.a. bit depth) used. Whether we are recording at an 8-bit, 16-bit, or 24-bit word length doesn't matter here; as long as every bit in a sample has a value of 1, it will translate to 0dBFS on the meter.

dBFS meter calibration

dBFS meters do not directly represent analog voltages or signal levels, they provide a graphic representation of binary digital values only. As such, any correlation between analog levels and digital values is determined by the calibration of the analog-to-digital converter (ADC) circuitry in the recording signal chain.

Unfortunately there is no definitive standard of conversion in ADCs for converting from dBu to dBFS; it varies from brand to brand, model to model, even country to country. A pro-grade line level of +4dBu can typically equate to anywhere from -12dBFS to -20dBFS on the digital scale, depending on the individual ADC's calibration. Some ADCs even have switches on them offering multiple calibration settings.

There are many quality ADCs that convert +4dBu to -18dBFS as a default. For this reason, this is what many engineers quote as the conversion factor, and it is also the default display setting for our meter on the left. But the number of ADCs that do or can equate +4dBu to a different digital level than that are at least as numerous as those that equate it to - 18dBFS, so we need to check the specs on our ADCs to ensure we are using the right calibration standard for our recording.

[ED: one can use their cool Analog to Digital Conversion Calculator on the left in their page to see live how the dBFS meter levels can change based upon the calibration of a converter.]

More bits means more range

Because 0dBFS is the absolute top of the digital scale regardless of the number of bits used in our digital samples, the number of bits used does instead matter towards the bottom end of the dBFS scale. The more bits we have, the lower of a volume we can digitally represent, and the greater of a dynamic range we have to work with in the digital domain.

This range can be calculated by multiplying the number of bits by 6dB. Therefore 8 bits gives us a maximum range from 0dBFS to - 48dBFS. 16 bits will go from 0dBFS down to - 96dBFS, and 24 bits from 0dBFS down to - 144dBFS.

The "CLIP" or "OVER" indicator

Many dbFS meters include a separate indicator labeled "CLIP" or "OVER". This lights up when the meter "believes" that the incoming analog signal may have been higher than the digital 0dBFS. Because in the digital realm there can be nothing higher than 0dBFS, anything analog coming in higher than that is simply "clipped off" at 0dBFS during the conversion to digital. The "CLIP" or "OVER" indicators warn us when that clipping may be happening.

Because there is nothing above 0dBFS, the only way a meter can determine if clipping is occurring is by looking for consecutive samples of 0dBFS, the assumption being that a flattened waveform with a flat top of more than one sample in a row at maximum value most probably means that the top of a normal waveform has been clipped off.

Unfortunately here again there is no standard. Some clip lights are programmed to light up as soon as a single sample hits 0dBFS. Others wait for three consecutive 0dBFS samples to confirm that a real clip has taken place before lighting up. There are even others that will wait for as long as 8 consecutive samples before lighting up on the theory that shorter clips than that cannot be heard.'

'+4dBu "Pro" Line Level

In order for different pieces of analog audio gear to be able to properly send signals to each other without those signals being too weak or too strong for any given piece of gear, all such gear is designed to operate at a standard "line level".

"Line level" refers to the average signal voltage at which the standard line inputs and outputs of most of our audio gear is designed to operate. For this reason, the average-reading VU meters on most audio processing gear are calibrated so that a reading of 0VU indicates a line level voltage.

"Pro" line level

Most professional-grade and prosumer audio recording gear is designed to operate at a standard line level of +4dBu (~1.23 volts). However, some gear have switches or circuitry on them that let the user select between a "pro" line level of +4dBu and the "consumer" line level of -10dBV (approx. -7.8dBu or ~0.32 volts.)

Un-level playing fields
Because of the huge difference between "pro" and "consumer" line levels - "pro" line level is almost 4 times the voltage as "consumer" line level - It's important to know at which level your gear operates.

If you run a -10dBV "consumer" signal into a +4dBu "pro" input, the signal will be running almost 12dB lower than expected; having to boost the input that extra 12dB will also increase the noise level of the signal by almost 12dB.

Conversely, running a +4dBu signal into a -10dBV input will be inputting a signal almost 12dB hotter than expected, potentially cutting the amount of peak headroom in the device and opening up the possibility of extra signal distortion.'
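The "almost 12dB" and "almost 4 times the voltage" figures can be checked with a little arithmetic. The sketch below uses the reference voltages given earlier (0 dBu is about 0.775 V RMS, that is sqrt(0.6) V, and 0 dBV is 1 V RMS); the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for converting between dBu, dBV and volts RMS.
# 10^x is written as exp(x * ln(10)) to stay within POSIX awk functions.

# dbu_to_volts DBU: volts RMS, using the 0 dBu reference of sqrt(0.6) V.
dbu_to_volts() {
    LC_ALL=C awk -v d="$1" 'BEGIN { printf "%.3f\n", sqrt(0.6) * exp(d / 20 * log(10)) }'
}

# dbv_to_volts DBV: volts RMS, using the 0 dBV reference of 1 V.
dbv_to_volts() {
    LC_ALL=C awk -v d="$1" 'BEGIN { printf "%.3f\n", exp(d / 20 * log(10)) }'
}

# dbv_to_dbu DBV: same level on the dBu scale.
# 0 dBV is 20*log10(1/sqrt(0.6)) = -10*log10(0.6), about +2.22 dBu.
dbv_to_dbu() {
    LC_ALL=C awk -v d="$1" 'BEGIN { printf "%.2f\n", d - 10 * log(0.6) / log(10) }'
}

printf 'pro      +4 dBu  = %s V\n' "$(dbu_to_volts 4)"
printf 'consumer -10 dBV = %s V = %s dBu\n' "$(dbv_to_volts -10)" "$(dbv_to_dbu -10)"
```

The gap between +4 dBu and about -7.78 dBu is roughly 11.8 dB, and 1.228 V divided by 0.316 V is about 3.9, which matches the quoted figures.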

I am also going to offer, for your reference during this discussion, this excellent (respectfully borrowed) summary image of analogue and digital levels; don't be overwhelmed, for this summary I will focus on digital. It is from a highly recommended blog article, Digital Recording Levels - a rule of thumb, from 2009 by ZedBee. [ED: I can't hotlink to the image, and borrowing it here uploaded is a hopefully forgiven breach of copyright for educational purposes, do please read the original article too.]


Getting a hold on the new Loudness measures

Ok, so let's examine some of the new human-perception-based loudness measures (as opposed to sound pressure dB measures, voltage dB measures, or digital dB measures). I highly recommend you watch firstly this absolutely brilliantly clear screencast video tutorial with live examples by Ian Shepherd from Production Advice UK. He knows exactly what he is talking about, and shows you in RMS/peak on old VU meters and on the new LUFS loudness meters with a real music project (then do please come back here):

- YouTube: LUFS - the new Loudness Units. What do they mean ?

- LUFS, dBFS, RMS… WTF ?!? How to read the new loudness meters (this includes a Pink Noise download so you can compare with his results).

Alright, so you get that there are different kinds of Root Mean Square (RMS) measures for music/sound, and that one has to be careful comparing them, but basically LUFS is like RMS but adjusted for human perception (a frequency weighting plus gating of quiet passages). Loudness Units (LU) are relative units on the same scale as dB: a gain change of 1 dB changes the loudness reading by 1 LU. This means that even if you don't yet have an official LUFS meter, you can get a feel (only) using just RMS measures.

Let's explore some RMS and Peak dBFS stats first

My software tips here are all Mac specific (currently Mountain Lion 10.8.5) , but some of these tools are also available for other UNIX/Linux machines, and some even run on Windows (how about that).

I recommend that with some of your audio files and with the Ian Shepherd pink noise test WAV file you explore some stats. Visit also: Audio engineering test/sample file resources, and online generators and online audio tests.

Audacity is a free, open-source, cross-platform audio editor for Mac, GNU/Linux, Windows etc. It's not the world's best audio editor (especially not for MP3 or AAC, because it imports, processes, then re-exports with a tiny quality loss rather than, say, editing the MP3 directly), but it has lots of plugins and FX and is sufficient for experiments. Internally it works in 32-bit floating point LPCM at up to 96kHz.

There is an unofficial Wave Stats plugin for Audacity that performs excellent wave analysis over regions of about 30s length, which is enough for you to explore the difference between dBFS RMS and max peaks, and to get a feel on some different audio files. Typical output from the plugin:

Another useful audio analysis tool is ffmpeg used in command line mode. It is available on Mac using MacPorts. I got it working OK on Mac OS X Mountain Lion, but you should at least be a bit UNIX savvy to try this. You will also need the LAME MP3 Encoder port if you want to deal with MP3.

$ sudo port install ffmpeg

$ sudo port install lame

(As always with MacPorts, don't be scared to use that -f (force) option if you upgraded your OS recently !)

Store the following in a file at ~/bin/@ffmpeg-statistics:

#!/bin/bash
ffmpeg -i "$1" -filter:a "volumedetect" -vn -f null /dev/null

Make sure you make it executable with:

$ chmod +x ~/bin/@ffmpeg-statistics

And when running it on a file, do "quote" your audio file name if it contains any spaces:

$ @ffmpeg-statistics "my chill music audio file.mp3"

The statistics output, for a run on a 128kbps MP3 chill music file, is like:

Duration: 00:05:32.43, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 128 kb/s
Output #0, null, to '/dev/null':
Metadata:
..
encoder : Lavf55.12.100
Stream #0:0: Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (mp3 -> pcm_s16le)
..
size=N/A time=00:05:32.43 bitrate=N/A
video:0kB audio:57263kB subtitle:0 global headers:0kB muxing overhead -100.000038%
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] n_samples: 29318494
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] mean_volume: -17.1 dB
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] max_volume: -1.4 dB
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] histogram_1db: 12
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] histogram_2db: 583
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] histogram_3db: 9111
[Parsed_volumedetect_0 @ 0x7f8fa8412d00] histogram_4db: 42170

There you have it, an indication of whether anything clipped (it didn't, max is negative and less than 0dBFS) and what your mean volume (RMS) is without even loading the audio file in an editor.
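If you only want the two headline numbers, the log can be filtered. The following is a sketch only: the function name volumedetect_summary is made up, it assumes the mean_volume and max_volume lines look exactly like those in the output above, and the "crest" figure (peak minus mean, in dB) is my own addition rather than something FFmpeg reports.

```shell
#!/bin/sh
# volumedetect_summary: read an FFmpeg volumedetect log on stdin and print
# the mean level, the peak level and the gap between them (all in dB).
# (Hypothetical helper; assumes lines of the form
#  "[Parsed_volumedetect_0 @ ...] mean_volume: -17.1 dB".)
volumedetect_summary() {
    LC_ALL=C awk '
        /mean_volume:/ { mean = $(NF - 1) }   # value is the field before "dB"
        /max_volume:/  { max  = $(NF - 1) }
        END {
            if (mean == "" || max == "") exit 1   # nothing found in the log
            printf "mean=%.1f max=%.1f crest=%.1f\n", mean, max, max - mean
        }'
}

# Usage (FFmpeg writes its log to stderr, hence the 2>&1):
#   ffmpeg -i "file.mp3" -filter:a "volumedetect" -vn -f null /dev/null 2>&1 | volumedetect_summary
```

For the chill music file above this would report a mean of -17.1 dB, a peak of -1.4 dB and a gap of 15.7 dB between them.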

You might also understand why I wanted you to see the FFmpeg basic RMS and Peak stats processing before we examine its EBU R128 filter (later): the RMS and Peak stats tell us important things that we need to know anyway, like whether we clipped at all.

Here is the result on the Pink Noise WAV file from the Ian Shepherd LUFS tutorial:

Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'Pink_Noise-production-advice-lufs-dbfs-test.wav':
Metadata:
artist : Fred Nachbaur
Duration: 00:00:10.00, bitrate: 705 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s
Output #0, null, to '/dev/null':
Metadata:
artist : Fred Nachbaur
encoder : Lavf55.12.100
Stream #0:0: Audio: pcm_s16le, 44100 Hz, mono, s16, 705 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le -> pcm_s16le)
Press [q] to stop, [?] for help
size=N/A time=00:00:10.00 bitrate=N/A
video:0kB audio:861kB subtitle:0 global headers:0kB muxing overhead -100.002494%
[Parsed_volumedetect_0 @ 0x7ffe53000000] n_samples: 441000
[Parsed_volumedetect_0 @ 0x7ffe53000000] mean_volume: -14.7 dB
[Parsed_volumedetect_0 @ 0x7ffe53000000] max_volume: -2.1 dB
[Parsed_volumedetect_0 @ 0x7ffe53000000] histogram_2db: 13
[Parsed_volumedetect_0 @ 0x7ffe53000000] histogram_3db: 111
[Parsed_volumedetect_0 @ 0x7ffe53000000] histogram_4db: 566

The max_volume and mean_volume values agree well with the Peak and RMS values seen in the tutorial video meters.

Let's compare the FFmpeg stats with the Wave Stats plugin for Audacity applied on the same Pink Noise test:

The peak and RMS stats agree exactly with FFmpeg !

The FFmpeg command line is very handy and fast. It's nice not to always have to load files in an editor, and it is very useful for batch runs over many files (with some simple UNIX Bash shell scripting).
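A batch run of that kind might look like the sketch below. The function name batch_volumedetect is hypothetical; it reuses the same volumedetect command as above, assumes ffmpeg is on your PATH, and prints one tab-separated line per file.

```shell
#!/bin/sh
# batch_volumedetect FILE...: print "file<TAB>mean=... max=..." per file.
# (Hypothetical helper; assumes the volumedetect log format shown above.)
batch_volumedetect() {
    for f in "$@"; do
        # FFmpeg logs to stderr, so merge it into stdout before filtering.
        stats=$(ffmpeg -i "$f" -filter:a "volumedetect" -vn -f null /dev/null 2>&1 |
            awk '/mean_volume:/ { mean = $(NF - 1) }
                 /max_volume:/  { max  = $(NF - 1) }
                 END { printf "mean=%s max=%s", mean, max }')
        printf '%s\t%s\n' "$f" "$stats"
    done
}

# Usage: quoting "$@" and "$f" keeps file names with spaces intact.
#   batch_volumedetect ~/Music/*.mp3
```

The values are in dB as reported by FFmpeg; a file that FFmpeg cannot read will simply show empty values.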

But I found you have to be a bit careful with it. FFmpeg tries to detect the input format, and bit rate or bit depth, but unless you explicitly give an output codec, it will assume what it calls 16-bit "pcm_s16le" as the output format (which in the command form above gets thrown away anyway). It makes no difference to the stats calculated on the input file, but if, say, we are examining a 24-bit pink noise file, then it might be better used thus:

ffmpeg -i PinkNoise-10mins-24bit-48kHz.aiff -acodec pcm_s24le -filter:a "volumedetect" -vn -f null /dev/null

This gives more sensible input and output format identification and mapping:

Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, aiff, from 'PinkNoise-10mins-24bit-48kHz.aiff':
Duration: 00:10:00.00, start: 0.000000, bitrate: 2304 kb/s
Stream #0:0: Audio: pcm_s24be, 48000 Hz, stereo, s32, 2304 kb/s
Output #0, null, to '/dev/null':
Metadata:
encoder : Lavf55.12.100
Stream #0:0: Audio: pcm_s24le, 48000 Hz, stereo, s32, 2304 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s24be -> pcm_s24le)
size=N/A time=00:10:00.00 bitrate=N/A
video:0kB audio:168750kB subtitle:0 global headers:0kB muxing overhead -100.000013%
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] n_samples: 57600000
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] mean_volume: -24.2 dB
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] max_volume: -12.0 dB
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] histogram_11db: 525
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] histogram_12db: 8270
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] histogram_13db: 38442
[Parsed_volumedetect_0 @ 0x7f9fa2c15220] histogram_14db: 131459

It has correctly detected the input file as 24-bit 'pcm_s24be', and it now has a pseudo (discard) output also at 24-bit.

However, I found that FFmpeg failed to detect the bits within the format of a 24-bit FLAC file:

Input #0, flac, from 'PinkNoise-10mins-24bit-48kHz.flac':
Duration: 00:10:00.00, bitrate: 2305 kb/s
Stream #0:0: Audio: flac, 48000 Hz, stereo, s32
Output #0, null, to '/dev/null':
Metadata:
encoder : Lavf55.12.100
Stream #0:0: Audio: pcm_s24le, 48000 Hz, stereo, s32, 2304 kb/s

As for the 's32' in the stream format in the above examples: as far as I can tell, FFmpeg has no 24-bit sample format internally, so it carries 24-bit samples in 32-bit containers (s32).

But for the sake of discussion of levels and loudness, the RMS and peak stats make sense and are consistent, so let's move on, this article is not supposed to be a tutorial on FFmpeg. For more on FFmpeg for audio, including description of formats, visit also: FFmpeg: command line and GUI audio/video conversion tool: audio references

We simply note for now that we have to be careful when comparing wave statistics between 16-bit and 24-bit sample depths.

Ok, so far we have looked at old RMS and Peak, but what about LUFS loudness stats ?

Unfortunately, as far as I can tell Audacity does not yet support LUFS, but there is apparently already a plan to change/enhance the VU Meter to conform to the EBU Standard R128.

But FFmpeg does now offer an EBU R128 audio filter as of at least version 2.0.2. To see whether it is available for your version use:

$ ffmpeg -filters | grep -i r128

ebur128 A->N EBU R128 scanner.

You can then perform loudness measurement runs like this (for example, on our pink noise WAV sample):

ffmpeg -i Pink_Noise-production-advice-lufs-dbfs-test.wav -filter:a "ebur128" -vn -f null /dev/null

Output (with most scan lines removed) is:

Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'Pink_Noise-production-advice-lufs-dbfs-test.wav':
Metadata:
artist : Fred Nachbaur
Duration: 00:00:10.00, bitrate: 705 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s
Output #0, null, to '/dev/null':
Metadata:
artist : Fred Nachbaur
encoder : Lavf55.12.100
Stream #0:0: Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le -> pcm_s16le)
Press [q] to stop, [?] for help
[Parsed_ebur128_0 @ 0x7f9452c2d880] t: 0.0999792 M:-120.7 S:-120.7 I: -70.0 LUFS LRA: 0.0 LU
[Parsed_ebur128_0 @ 0x7f9452c2d880] t: 0.199979 M:-120.7 S:-120.7 I: -70.0 LUFS LRA: 0.0 LU
..
[Parsed_ebur128_0 @ 0x7f9452c2d880] t: 10.0003 M: -14.4 S: -14.4 I: -14.4 LUFS LRA: 0.1 LU
size=N/A time=00:00:10.00 bitrate=N/A
video:0kB audio:938kB subtitle:0 global headers:0kB muxing overhead -100.002292%
[Parsed_ebur128_0 @ 0x7f9452c2d880] Summary:

Integrated loudness:
I: -14.4 LUFS
Threshold: -24.4 LUFS

Loudness range:
LRA: 0.1 LU
Threshold: -34.4 LUFS
LRA low: -14.5 LUFS
LRA high: -14.3 LUFS

At -14.4 LUFS integrated loudness, the pink noise is way above the EBU R128 broadcast standard of -23 LUFS.
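If you want to collect these numbers for many files, the summary block can be picked apart with a few lines of Python. This is only a sketch: the function name parse_ebur128_summary is my own, the sample text is the summary shown above, and it assumes the summary layout stays as printed here (FFmpeg writes this log to stderr, so you would capture that stream).

```python
import re

def parse_ebur128_summary(text):
    """Return (integrated LUFS, loudness range LU) from an ebur128 summary."""
    # Keep only what follows the last "Summary:" so per-frame lines are ignored.
    summary = text.rsplit("Summary:", 1)[-1]
    integrated = re.search(r"^\s*I:\s*(-?\d+(?:\.\d+)?)\s*LUFS", summary, re.MULTILINE)
    lra = re.search(r"^\s*LRA:\s*(-?\d+(?:\.\d+)?)\s*LU\b", summary, re.MULTILINE)
    if integrated is None or lra is None:
        raise ValueError("no ebur128 summary found")
    return float(integrated.group(1)), float(lra.group(1))

SAMPLE = """[Parsed_ebur128_0 @ 0x7f9452c2d880] Summary:

Integrated loudness:
I: -14.4 LUFS
Threshold: -24.4 LUFS

Loudness range:
LRA: 0.1 LU
Threshold: -34.4 LUFS
LRA low: -14.5 LUFS
LRA high: -14.3 LUFS
"""

loudness, loudness_range = parse_ebur128_summary(SAMPLE)
# Distance from the EBU R128 target of -23 LUFS, in LU.
offset_from_r128 = loudness - (-23.0)
print(loudness, loudness_range, round(offset_from_r128, 1))
```

For the pink noise sample this puts the file about 8.6 LU above the -23 LUFS target.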

LUFS meters for Audacity

As far as I can tell there is no LUFS meter available specifically for Audacity, however I found the following FREE from Klangfreund: LUFS Meter:

'EBU R128 compliant loudness measurement

The LUFS Meter plugin enables you to deliver loudness-calibrated content.

Multi-Platform, Multi-Format

Available as VST- and Audio Unit-plugin on Mac. On Windows, the LUFS Meter is available as a VST-Plugin. 32 and 64 bit. Support for Linux and other plugin formats is planned.

http://www.klangfreund.com/lufsmeter/download/'

Please note that for Audacity you must use the 32-bit version, as Audacity does not support 64-bit VST plugins !

I managed to get it to run (preview) our pink noise test file within Audacity:

But it kept crashing Audacity whenever I clicked the Ok button !

Visit also:

- Mac OS X: EBU R128 compliant loudness meters and batch processing

- Mac OS X: audio engineering plugins

Playing with the loudness: normalization, amplification, attenuation

Audacity has a nice enough Normalize function under Effects, but it only works in terms of the maximum (peaks), it does not let you set RMS values, and certainly nothing fancy like the new LUFS loudness measures.

There is also a command line 'normalize' tool you can install using MacPorts:

sudo port install normalize

It seems to only run on WAV files, I could not get it to see MP3 files.

Let's investigate an MP3 file with chill music I have already normalised in Audacity to -2dBFS. The FFmpeg RMS stats run gives:

[Parsed_volumedetect_0 @ 0x7faab2000000] n_samples: 20731486
[Parsed_volumedetect_0 @ 0x7faab2000000] mean_volume: -14.4 dB
[Parsed_volumedetect_0 @ 0x7faab2000000] max_volume: -1.5 dB
[Parsed_volumedetect_0 @ 0x7faab2000000] histogram_1db: 23
[Parsed_volumedetect_0 @ 0x7faab2000000] histogram_2db: 741
[Parsed_volumedetect_0 @ 0x7faab2000000] histogram_3db: 9246
[Parsed_volumedetect_0 @ 0x7faab2000000] histogram_4db: 63090

Clearly the peak normalisation in Audacity to -2dBFS was not perfect, as the maximum is -1.5dB, but it is at least in the right ball park.

Compare with the LUFS runs:

Integrated loudness:
I: -12.2 LUFS
Threshold: -22.6 LUFS

Loudness range:
LRA: 3.0 LU
Threshold: -32.5 LUFS
LRA low: -14.1 LUFS
LRA high: -11.1 LUFS

The mean_volume was -14.4 dB, but the integrated loudness was -12.2 LUFS. This is also way above the EBU R128 broadcasting recommendation -23 LUFS; yet it works just brilliantly on my iPhone used as an iPod !

Changing the loudness in LU units to a target value

As a rule of thumb, the LUFS loudness can be adjusted in LU by making the same dBFS amplitude change in dB.

FFmpeg has a simple audio volume filter. See How to change audio volume up-down with FFmpeg:

'To turn the audio volume up or down, you may use FFmpeg's Audio Filter named volume, like in the following example. If we want our volume to be half of the input volume:'

ffmpeg -i input.wav -af 'volume=0.5' output.wav

However, this is not in dB, but we recall that halving the amplitude is the same as reducing by 6dB. If we go back to my chill track with loudness -12.2LUFS, applying volume=0.25 (two halvings, so -12dB) should reduce the loudness by 12LU to about -24.2LUFS. Performing this adjustment, and rerunning the FFmpeg EBU R128 filter gives:

Integrated loudness:
I: -24.7 LUFS
Threshold: -35.0 LUFS

Loudness range:
LRA: 3.0 LU
Threshold: -45.0 LUFS
LRA low: -26.6 LUFS
LRA high: -23.6 LUFS

The rule of thumb has worked well enough: the integrated loudness is now -24.7 LUFS, and the prediction based on a -12dB reduction was -24.2LUFS.
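The arithmetic behind that prediction can be sketched in Python. The helper names factor_to_db and db_to_factor are my own, and the -23 LUFS target is used only as an illustration. I believe the FFmpeg volume filter also accepts a value with a dB suffix (something like volume=-12dB), but check the documentation for your FFmpeg version before relying on that.

```python
import math

def factor_to_db(factor):
    """Convert a linear amplitude factor (as used by volume=0.5) to dB."""
    return 20.0 * math.log10(factor)

def db_to_factor(db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

measured_lufs = -12.2  # integrated loudness of the chill track

# Rule of thumb: a gain of x dB moves the loudness by about x LU.
predicted_lufs = measured_lufs + factor_to_db(0.25)

# Going the other way: which factor would aim at -23 LUFS?
gain_db = -23.0 - measured_lufs
factor = db_to_factor(gain_db)

print(round(factor_to_db(0.5), 2), round(predicted_lufs, 2))
print(round(gain_db, 1), round(factor, 4))
```

So volume=0.25 is strictly about -12.04dB, and a factor of roughly 0.29 would be the starting point for a -23 LUFS target; a fresh measurement run is still needed afterwards, as the 0.5 LU discrepancy above shows.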

Audacity has the ability to easily adjust the volume in dB units: see Amplify and Normalize.

So there it is, we have examined Peak and RMS statistics and LUFS loudness statistics for files and made reasonably accurate loudness adjustments using completely free Mac tools, including on the command line. I know there are now a range of much fancier LUFS tools available for Mac, but it's nice to know one can at least do it this way for nothing. Right, let's throw away that adjusted file, it's far too quiet for playing on my iPod !

Some recommended loudness levels for different applications

So, time for some basic recommendations.

As already illustrated, I am currently into "chill" music, and I want large collections of chill music that play at roughly the same "loudness" for a long time without having to adjust the volume (say if played through speakers while I am working at my computer, I don't want to have to frequently get up to adjust the volume).

And I want these collections to be usable on systems that do not use ReplayGain or Sound Check, the proprietary system for iTunes and iPod. (Besides, if I get it basically right without them I can always also use the same collections with those technologies as well.)

I have chosen my rule-of-thumb standard for "iPod preparation" of chill stuff in MP3 as -2dBFS peak in cases where the 'mean_volume' RMS (according to FFmpeg) is around -17dBFS to -14dBFS, which is about -15LUFS to -12LUFS loudness according to FFmpeg on this kind of music.

This is clearly much higher than the comparable -23LUFS European broadcasting standard. But remember, this is for playing on home devices, iPod, iPhone etc.

I am perfectly aware that max peak values do not give a reliable indication of RMS values or LUFS loudness values, but I know in advance that the music I am treating in this case is quite compressed chill. I don't have the facility (yet) for automatically applying an LUFS loudness requirement in batch mode, whereas I can apply max peak normalisation easily, and in any case:

Just applying a high LUFS loudness measure (needed for say iPod) blindly does not ensure there is no clipping !
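A quick way to see the problem is to apply the same gain to the measured peak as to the loudness. This Python sketch uses the figures from my chill track above (-12.2 LUFS integrated, -1.5 dB maximum); the function names and the -10 LUFS target are made up for illustration, and it works on sample peaks only, not true peaks.

```python
def gain_for_target(measured_lufs, target_lufs):
    """Gain in dB needed to move the integrated loudness to the target."""
    return target_lufs - measured_lufs

def would_clip(measured_lufs, peak_dbfs, target_lufs, ceiling_dbfs=0.0):
    """True if a plain gain change to the target pushes the peak over the ceiling."""
    new_peak = peak_dbfs + gain_for_target(measured_lufs, target_lufs)
    return new_peak > ceiling_dbfs

# Raising the track to a hypothetical -10 LUFS needs +2.2 dB,
# which would put the -1.5 dB peak above full scale.
print(would_clip(-12.2, -1.5, -10.0))

# Lowering it to the broadcast target of -23 LUFS is harmless for the peaks.
print(would_clip(-12.2, -1.5, -23.0))
```

The ceiling_dbfs parameter lets you keep a margin, for example -2.0 to match my -2dBFS peak habit.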

I find the resulting loudness range when normalising to -2dBFS max peak (for this kind of music) works well on quality headphones on my Macbook Pro, and on my iPhone headphones walking along the street, and just as well when played from an iPod via a mixer through my Opera DB Live powered speakers (yes I use a musician's PA at home instead of high quality "audiophile" speakers).

My rule-of-thumb, using simple peak normalisation, can be applied safely and reliably for lots of different kinds of chill and other music after performing a stats run (and I always do the FFmpeg stats run). Note that with -2dBFS I still leave a little bit of room to play with at the top end.

Of course, many recent mastering standards/recommendations, especially for CDs, push it even higher, much closer to 0dBFS max peak, and far less dynamic range than was used in the past.

If you perform some measurements on a wide range of popular music ripped from CD you will find that there is typically a max peak range from as low as -5 dBFS (mostly from the 1980s) right up to 0dBFS and with much higher compression in recent years: See also this fantastic article The Death of Dynamic Range from Bob Speer of CD Mastering Services with some measurements and comparisons between decades, and this wise remark:

'You want your music to be loud? You can make it loud yourself [by TURNING UP YOUR STEREO'S VOLUME CONTROL] -- and the full quality and dynamic range of the music is preserved. .. But when all of your CDs are recorded to be loud right on the discs themselves, you don't have this choice anymore; you no longer have a variety of "loud" music and "quiet" music to choose from and to play at a volume level that suits your musical taste. The record companies are not only filling your CDs with distorted, corrupted audio, they are forcing you to listen to your music in a certain manner -- do you really want that?'

Also from Bob Speer in 2001 comes What Happened To Dynamic Range?, which includes a wonderful animation.

Again: My -2dBFS max peak tip (for chill music) is not a recommendation for a professional TV or radio broadcaster or a movie theatre; It is for a personal music collection to be played through a range of devices at home or on the go that will likely find the -23LUFS European broadcasting standard way too low.

So what about live music recording levels for multi-track ?

The recommendation above (-2dBFS peak) is clearly also not a suitable level for most digital recording of live music, and especially not for multi-track recording, where you will be likely reusing and altering a track in different contexts, combined with other tracks, and subjected to various FX and compression etc. Here is a recommendation for recording levels (based on some careful analysis of dynamic range ) from dBzee: Digital Recording Levels - a rule of thumb:

'The rule of digital thumb

1. Record at 24-bit rather than 16-bit.

2. Aim to get your recording levels on a track averaging about -18dBFS. It doesn't really matter if this average floats down as low as, for example -21dBFS or up to -15dBFS.

3. Avoid any peaks going higher than -6dBFS.

That's it. Your mixes will sound fuller, fatter, more dynamic, and punchier than if you follow the "as loud as possible without clipping" rule.'
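The numbers in this rule of thumb are easy to turn into a check on the stats from a measurement run. A minimal sketch in Python, where the function name check_recording_levels is hypothetical and the thresholds are simply those quoted above (average between -21 and -15 dBFS, no peaks above -6 dBFS):

```python
def check_recording_levels(average_dbfs, peak_dbfs):
    """Return a list of warnings for a recorded track, empty if it follows the rule."""
    problems = []
    if average_dbfs > -15.0:
        problems.append("average level is hotter than -15 dBFS")
    elif average_dbfs < -21.0:
        problems.append("average level is below -21 dBFS")
    if peak_dbfs > -6.0:
        problems.append("peaks exceed -6 dBFS")
    return problems

# A track recorded according to the rule of thumb passes cleanly.
print(check_recording_levels(-18.0, -8.0))

# My iPod-ready chill track (mean -14.4 dB, max -1.5 dB) fails both tests,
# which is expected: it is a finished mix, not a raw recording.
print(check_recording_levels(-14.4, -1.5))
```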

Also:

'Most interfaces are calibrated to give around -18dBFS/-20dBFS when you send 0VU from a mixing desk to their line-ins. This is the optimum level!
-18dBFS is the standard European (EBU) reference level for 24-bit audio and it's -20dBFS in the States (SMPTE).'

During my recent online research I have found similar recommendations, based on very precise analysis of noise and of the capabilities of 24-bit digital systems and, above all, of typical analog-to-digital converters.

And another interesting discussion from Sound on Sound (SOS) Technical Editor Hugh Robjohns: Q How much headroom should I leave with 24-bit recording?:

'The basic idea is to treat -18dBFS as the equivalent of the 0VU mark on an analogue system’s meter, and that’s where the average signal level should hover most of the time. Peaks can be way over that, of course ..

If the material you are recording is well controlled and predictable in terms of its peak levels — like hardware synths tend to be, for example — you could legitimately reduce the headroom safety margin if you really want to. But in practice there is little point.

The only advantage to recording with less headroom is to maximise the recording system’s signal-noise ratio, but there’s no point if the source’s signal-noise ratio is significantly worse than the recording system’s, and it will tend to be that way with most analogue synth signals, or any acoustic instrument recorded with a mic in a normal acoustic space. The analogue electronic noise floor or the acoustic ambience will completely swamp the digital recording system’s noise floor anyway.

Recording ‘hot’, therefore, won’t improve the actual noise performance at all, and will just make it harder to mix against other tracks recorded with a more reasonable amount of headroom. One issue that comes up a lot is the confusion between commercially released media (CD, MP3, for example), which have no headroom margin at all (they peak to 0dBFS), and the requirement for a headroom margin when tracking and mixing.

Going back to traditional professional analogue audio systems, the practice evolved of recording signal levels that averaged around 0VU. OK, you could push things a few decibels hotter sometimes for effect with analogue tape, but a level of around 0VU was the norm, and that normally equated to a signal level of about +4dBu (VU meters are averaging meters and don’t show transient peaks at anything like their true level).

Analogue equipment is designed to clip at about +24dBu, so, in other words, the system was engineered to provide around 20dB of headroom above 0VU. It’s just that the metering systems we use with analogue don’t show that headroom margin, so we forget it’s there. Digital meters do show it, but so many people don’t understand what headroom is for, and so feel the need to peak everything to the top of the meter anyway. This makes it really hard to record live performances, makes mixing needlessly challenging and stresses the analogue monitoring chain that was never designed to cope with +20dBu signal levels all the time.

By recording in a digital system with a signal level averaging around -18 or -20 dBFS, you are simply replicating the same headroom margin as was always standard in analogue systems, and that headroom margin was arrived at through 100 years of development for very good practical reasons.

.. working with average levels of around -20dBFS or so is fine and proper, works in exactly the same way as analogue, and will generally make your life easier when it comes to mixing and processing.

The old practice of having to get the end result up to 0dBFS is a mastering issue, not a recording and mixing one. It is perfectly reasonable (after the mix is finished) to remove the (now redundant) headroom margin if that is what the release format demands.
..
A sensible headroom margin is essential when tracking, to avoid the risk of clipping and allow you to concentrate on capturing a great performance without panicking about the risk of ‘overs’. A similar margin is also required when mixing, to avoid overloading the mix bus and plug-ins (yes, I know floating-point maths is supposed to make that irrelevant, but there are compromises involved that can be easily avoided by maintaining some headroom!).

Once the mix is finished, the now redundant headroom can be removed, and that is a standard part of the mastering process for digital media like CD and MP3.'

So this is what I am basically doing when I go for -2dBFS max peak and around -17 to -14dBFS RMS (about -15LUFS to -12LUFS according to FFmpeg) for chill music end mixes. Play it through headphones on your Mac laptop or iPod or iPhone and you'll find out pretty quickly why. Most modern personal devices seem to benefit on playback from way more volume juice than the -23LUFS broadcast standard.

REMEMBER: preparing pre-recorded, pre-mixed music for playback on your personal playback devices (or capturing/stealing from computer audio sources like online radio streams), recording live music tracks, and mastering are completely different exercises !

Some more useful references on digital audio, quantization, and digital vs. analog levels

All About Digital Audio: Pt 2 by Hugh Robjohns: excellent description of digital quantization and digital noise, from 1998, but still very relevant:

'When it comes to quantising the individual samples of an analogue audio signal, it turns out that our ears can easily hear very small errors in the measurements -- even down to tiny errors as small as 90dB or more below the peak level -- so we have to use a very accurate measurement scale. Figure 1 shows a few audio samples being measured against a very crude quantising scale simply to show the principles involved. Each level in the scale is denoted by a unique binary number -- in this case, three bits are used to count eight levels (including the base line at zero).

Some samples will happen to be at exactly the same amplitude as a point on the measurement scale, but others will fall just above or below a division. The quantising process allocates each sample with a value from the scale, so sometimes the quantised value is slightly lower than the true size of the audio sample, and sometimes slightly bigger. These errors in the description of a sample's size are called quantising errors and they are an inherent inaccuracy of the process.

When the digital data representing the quantised amplitude values is used to reconstruct samples for replay, some of those samples will be generated slightly louder or quieter than the original analogue audio signal from which they were derived -- they will not be entirely accurate. However, whether an audio sample falls on, above, or below a quantising level, and by how much a level is missed is essentially random -- and a random signal is noise. Consequently, quantising errors tend to sound like hiss -- white noise -- added to the original audio signal.

The only way to make quantising noise quieter is to reduce the size of the quantising errors, and the only way that can be done is by making the quantising intervals smaller -- in other words, by using a finer, more accurate scale for the measurements -- just like in the carpet example earlier. The errors will still be there, but if you choose small enough quantising intervals, the errors become vanishingly small, as does the hiss. However, finer gradations require more quantising levels, and so more binary digits are needed to count them.

If the number of quantising levels is doubled, the spacing between individual levels must be halved, and so the potential size of quantising errors must be halved as well. A doubling or halving (in terms of dBs) is 6dB; so every time the number of quantising levels is doubled, the hiss caused by quantising errors is reduced by 6dB. In binary counting, each extra bit added to the number allows twice the number of levels to be counted -- three bits can count eight quantising levels, four bits count sixteen, and five bits count 32 levels. This relationship gives us a handy rule of thumb to estimate the potential dynamic range of a digital system: For each extra bit used to count quantising levels, quantising noise is reduced by 6dB.

So, for example, an 8-bit system should have a dynamic range of 48dB, a 16-bit system (such as DAT and CD) should have a range of around 96dB, and a 24-bit system about 144dB.'
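The "6dB per bit" rule can be checked with a couple of lines of Python. This is just the ratio between full scale and one quantising step expressed in dB; the function names are mine, and the second formula (6.02N + 1.76 dB) is the usual textbook signal-to-quantisation-noise figure for a full-scale sine wave, included for comparison rather than taken from the article quoted above.

```python
import math

def dynamic_range_db(bits):
    """Ratio of the number of quantising levels to one step, in dB."""
    return 20.0 * math.log10(2 ** bits)

def sine_snr_db(bits):
    """Textbook SNR of an ideal quantiser for a full-scale sine wave."""
    return 6.02 * bits + 1.76

for bits in (8, 16, 24):
    print(bits, round(dynamic_range_db(bits), 1), round(sine_snr_db(bits), 1))
```

The first column of results comes out at roughly 48, 96 and 144 dB, matching the rounded figures in the quote.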

From Vincent Kars, 2012, The Well-Tempered Computer: 16 or 24 bits, which explains exactly why it is better to record at 24-bit:

1 bit=6 dB

SNR=6N+1.8 dB (N in bits) to be exact but for convenience sake, let’s use 6.

The loudest possible signal in digital audio (all bits are 1) is the reference, this is called 0 dBFS (dB Full Scale). All other measurements expressed in terms of dBFS will always be less than 0 dB (negative numbers). 16 bits will go down to -96 dBFS and 24 to -144 dBFS. In essence, 24 bits continue where 16 bits stops. It can resolve micro details 16 bits can’t.

Noise floor

The theoretical maximum signal-to-noise ratio in an analogue system is around 130dB. In practice 120 dB is a very good value. You can’t escape thermal noise

A couple of specs:

Benchmark ADC1 (24 bits 192 kHz) A/D THD+N, 1 kHz at -1 dBFS -102 dBFS, -101 dB, 0.00089%
Benchmark DAC1 THD+N: (w/-3 dBFS input) -107 dB, 0.00045%
Prism Orpheus AD (line in) THD+N -111dB (0.00028%, -0.1dBFS)

Yes 24 bit can capture those very soft tiny details 16 bit can’t but pretty soon you end in the noise floor of the equipment.

The big debate

You can find many debates on the internet about 16 vs. 24. In the pro world this debate has been settled, almost everybody is recording with 24 bits today. They have some very good reasons to do so ..

Also useful concerning levels and metering:

- Meter Madness: Understanding meters and what they're telling us..., By Mike Rivers (RecordingMagazine): Excellent reading, includes the history of VU meters, and the move to digital metering.

- Final Cut Pro: Setting Proper Audio Levels.

- The Well-Tempered Computer: Volume control. In general a super site for discussions on audio. This article compares volume control, quantization errors, and signal-to-noise for 16-bit digital, 24-bit digital, and analog. Excellent calculations and comparison tables, and explains why some audiophiles recommend controlling volume if possible with analog rather than digital (even with 24-bit) to keep noise down (unless you are using floating point digital).

Related: ESS Digital vs Analog volume control slides (PDF). Has excellent graphs in frequency domain of progressive volume reduction in a digital system, showing why it encourages noise, and why (as long as you have nice smooth analog volume control) audiophiles generally avoid digital volume control.

- Wikipedia: DBFS has the following to say on comparing dBFS with analog levels (compare with the graph above):

dBFS is not to be used for analog levels, according to AES-6id-2006. There is no single standard for converting between digital and analog levels, mostly due to the differing capabilities of different equipment. The amount of oversampling also affects the conversion with values that are too low having significant error. The conversion level is chosen as the best compromise for the typical headroom and signal-to-noise levels of the equipment in question. Examples:

- EBU R68 is used in most European countries, specifying +18 dBu at 0 dBFS

- In Europe, the EBU recommend that -18 dBFS equates to the Alignment Level

- European & UK calibration for Post & Film is −18 dBFS = 0 VU

- UK broadcasters, Alignment Level is taken as 0 dBu (PPM4 or -4VU)

- US installations use +24 dBu for 0 dBFS

- American and Australian Post: −20 dBFS = 0 VU = +4 dBu

- The American SMPTE standard defines -20 dBFS as the Alignment Level

- In Japan, France and some other countries, converters may be calibrated for +22 dBu at 0 dBFS.

- BBC spec: −18 dBFS = PPM "4" = 0 dBu

- German ARD & studio PPM +6 dBu = −10 (−9) dBFS. +16 (+15)dBu = 0 dBFS. No VU.

- Belgium VRT: 0dB (VRT Ref.) = +6dBu ; -9dBFS = 0dB (VRT Ref.) ; 0dBFS = +15dBu.

[ED: Warning: the above does not specify the digital bits, usually 24-bit applies here.]
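Each of these calibrations boils down to a single number: the analog level in dBu that lands on 0 dBFS. Given that number, converting is a subtraction, as this Python sketch shows. The function name and the dictionary labels are my own shorthand, and the three calibration values are taken from the examples listed above.

```python
def dbu_to_dbfs(level_dbu, full_scale_dbu):
    """Digital level reached by an analog level, for a given converter calibration."""
    return level_dbu - full_scale_dbu

# dBu level that corresponds to 0 dBFS under each convention (from the list above).
FULL_SCALE_DBU = {
    "EBU R68": 18.0,
    "US": 24.0,
    "German ARD": 15.0,
}

# 0 dBu under EBU R68 lands on the -18 dBFS alignment level (the BBC spec).
print(dbu_to_dbfs(0.0, FULL_SCALE_DBU["EBU R68"]))

# +4 dBu (0 VU) under the US calibration lands on -20 dBFS.
print(dbu_to_dbfs(4.0, FULL_SCALE_DBU["US"]))

# +6 dBu under the German ARD calibration lands on -9 dBFS.
print(dbu_to_dbfs(6.0, FULL_SCALE_DBU["German ARD"]))
```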

The EBU R68 standard summary 2000 (PDF) makes this important statement:

'The EBU recommends that, in digital audio equipment, its Members should use coding levels for digital audio signals which correspond to an alignment level which is 18 dB below the maximum possible coding level of the digital system, irrespective of the total number of bits available.'

Note that this does agree with standard practice in many application domains for 24-bit, but it is not what many people recommend for 16-bit ! Look at the chart above from Zed Brookes again, and notice the Pro Reference Levels:

+4dBu = 0dBVU = 0VU = -12dBFS(16-bit) = -18dBFS(24-bit EBU) = -20dBFS(24-bit SMPTE)

Some more useful references on loudness, and the "new" European standards vs the USA standards

This one from the BBC is excellent, and at only 13 pages with good summaries well worth reading from top to bottom: White paper: Jan 2011: Terminology for Loudness and Level dBTP, LU and all that by Senior Research Engineer Andrew Mason, available as PDF download. It points out that:

'For broadcasting, there is one loudness measurement technique that we should know about. This has been relatively recently standardised by the ITU, and is known as Recommendation ITU-R BS.1770'

'The measurement uses a “K” weighting, so we have the subscript “K” for the quantity “L”. The result is expressed in “LUFS” – Loudness Units relative to Full Scale. 1770 still refers to “LKFS”,'

'The 1770 algorithm is defined such that a stereo sine wave at 1kHz, at -18 dBFS, will have a loudness level, LK, of -18 LUFS'

'Target level – the origin of “-23”

For the sake of a simple life, and reduced audience annoyance, EBU R 128 recommends that all programmes be normalised to an average foreground loudness level of -23 LUFS. The figure of -23 LUFS was chosen as the result of a careful study of broadcasting practice, dynamic range tolerance, and the capabilities of different transmission technologies. Note that this value assumes that gating is used in the measurement to prevent long pauses in a programme bringing down the average loudness.'
..

'True Peak

The general shift away from quasi-peak metering towards loudness metering is complemented by a move towards true peak metering as well. There are three “peak” metering terms that it might be useful to clarify:

- quasi-peak – not really peak at all. Historically measured with a mechanical meter with controlled rise and fall times, such as the well-known “PPM”. Now done in software for digital applications using, for example, a 10ms integration time.

- sample peak – digital measurement of the highest sample value in the signal;

- true peak – digital measurement, interpolating between the actual samples in order to take account of over-shoots that would occur later, with, for example, sampling rate conversion. Recommendation ITU-R BS.1770 includes an over-sampling true-peak meter.'

Wikipedia: Peak programme meter.

Wikipedia: Loudness monitoring

From LUFS/LKFS:

'Loudness, K-weighted, relative to Full Scale (or LKFS) is a loudness standard designed to enable normalization of audio levels for delivery of broadcast TV and other video. LKFS is standardized in ITU-R BS.1770. Loudness units relative to Full Scale (or LUFS) is a synonym for LKFS that is used in EBU R128.'

From Wikipedia: ReplayGain:

'ReplayGain is a proposed standard published by David Robinson in 2001 to measure the perceived loudness of audio in computer audio formats such as MP3 and Ogg Vorbis. It allows players to normalize loudness for individual tracks or albums. This avoids the common problem of having manually to adjust volume levels between tracks when playing audio files from albums that have been mastered at different loudness levels. ReplayGain is now supported in a large number of media players and portable media players and digital audio players. Although the standard is now formally known as ReplayGain, it was originally known as Replay Gain and is sometimes abbreviated RG.'

From Poll: Is 3 dB, 6 dB or 10 dB SPL double the sound pressure?, an interesting article that discusses the difference between "volume/amplitude" increase and "loudness" perception increase, with this rule of thumb:

'Doubling of the volume (loudness) should be felt by a level difference of 10 dB − acousticians say.
Doubling the sound pressure (voltage) corresponds to a measured level change of 6 dB.
Doubling of acoustic power (sound intensity) corresponds to a calculated level change of 3 dB.

+3 dB = twice the power (Power respectively intensity − mostly calculated).
+6 dB = twice the amplitude (Voltage respectively sound pressure − mostly measured).
10 dB = twice the perceived volume or twice as loud (Loudness nearly sensed − psychoacoustics).'
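The 3 dB and 6 dB figures fall straight out of the dB definitions (10·log10 for power ratios, 20·log10 for amplitude ratios), as this short Python check shows. The 10 dB figure for "twice as loud" is a psychoacoustic rule of thumb and cannot be derived this way; the last line only shows which amplitude ratio a 10 dB change corresponds to.

```python
import math

# Doubling the power: 10 * log10(2)
power_doubling_db = 10.0 * math.log10(2.0)

# Doubling the amplitude (voltage, sound pressure): 20 * log10(2)
amplitude_doubling_db = 20.0 * math.log10(2.0)

# Amplitude ratio behind a 10 dB change ("twice as loud" by the rule of thumb).
ratio_for_10_db = 10.0 ** (10.0 / 20.0)

print(round(power_doubling_db, 2), round(amplitude_doubling_db, 2), round(ratio_for_10_db, 2))
```

So something perceived as twice as loud needs a bit more than three times the amplitude, not twice.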

From Wikipedia: The Loudness War: An excellent discussion of most of the issues concerning loudness measures, and comparisons over the decades, and remarks on dynamic range advocacy by engineer Ian Shepherd (see resources above).

A fantastic 6-part series by Hugh Robjohns on Sound-on-Sound from 1998 (but still relevant): start at All About Digital Audio, Part 2, it has links to the other parts of the series. Everything you ever wanted to know about quantization, metering, headroom, and dither.

From Dennis Bohn, Rane Corporation, 2008/2012, on why there is No Such Thing as Peak Volts dBu:

'It is incorrect to state peak voltage levels in dBu. It is common but it is wrong.

It is wrong because the definition of dBu is a voltage reference point equal to 0.775 Vrms (derived from the old power standard of 0 dBm, which equals 1 mW into 600 Ω). Note that by definition it is an rms level, not a peak level.'
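The 0.775 V reference can be recovered from the 1 mW into 600 Ω definition with V = sqrt(P·R), and from there any dBu value converts to rms volts. A small Python sketch (the function name dbu_to_vrms is mine):

```python
import math

# 0 dBm = 1 mW dissipated in 600 ohms, so V = sqrt(P * R).
DBU_REFERENCE_VRMS = math.sqrt(0.001 * 600.0)

def dbu_to_vrms(level_dbu):
    """Convert a level in dBu to volts rms."""
    return DBU_REFERENCE_VRMS * 10.0 ** (level_dbu / 20.0)

print(round(DBU_REFERENCE_VRMS, 4))   # the 0 dBu reference
print(round(dbu_to_vrms(4.0), 3))     # +4 dBu, the usual 0 VU level
print(round(dbu_to_vrms(24.0), 2))    # +24 dBu, typical analogue clip point

# For a sine wave only, the peak voltage is rms * sqrt(2); it is stated in volts,
# since dBu by definition describes an rms level.
print(round(dbu_to_vrms(4.0) * math.sqrt(2.0), 3))
```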

From Normalized Audio and 0dBFS+ Exposure (2012) by Greg Ogonowski:

'Because an analog-to-digital converter or sample rate converter sample clock generally has an arbitrary time relationship to a given piece of program material applied to its input, the same audio can be represented in an infinite number of ways if correctly dithered before the quantizer. Many CDs produced today are normalized to 0dBFS in the digital domain by digital signal processing that is not oversampled and is thus unaware of the peak values of the waveform following playback device D/A converters. Following reconstruction into the analog domain, the peak level of the audio waveform can exceed 0dBFS, a phenomenon commonly known as “0dBFS+,” “intersample peak clipping,” or “true peak clipping.” If the digital-to-analog converter in a consumer playback device does not have 3dB of headroom (3dB being the maximum possible increase in peak level if the reconstruction filter is phase-linear), the converter can produce massive clipping and aliasing distortion components on top of any distortion components introduced by the digital signal processing. Add these to the artifacts produced by the MP3/AAC encode/decode process and it is no wonder that much of today's aggressively mastered music sounds so unpleasantly distorted.

What is particularly pernicious is that if mastering engineers monitor their work through converters having the required 3dB of headroom and do not use meters that show intersample peaks, these engineers will be completely unaware of the additional distortion that many consumer playback devices will produce. Mastering engineers who do not use intersample peak meters are therefore likely to process more aggressively than they would if they were able to hear the additional distortion introduced by poorly designed playback components.

Over-processed audio simply creates bad sound. Bad sound in, more bad sound out. It is really no wonder at all why it is so difficult to make radio stations and netcasts sound good with modern material, because it is all grossly pre-distorted! We are pleased to note that in the last year or so, the mastering community has finally started to become more aware of the intersample peak problem, but we are still seeing many major-label CDs that produce intersample peaks above 0 dBFS.'
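A tiny worked case makes the intersample peak idea concrete. Take a sine wave at one quarter of the sample rate, phased so that every sample falls 45 degrees away from the crest: all the samples have magnitude 0.707 of the true peak. If such a signal is normalised so that its sample peak reads 0dBFS, the underlying waveform peaks about 3dB higher. The Python sketch below is only an illustration under those assumptions: it evaluates the known continuous sine on a finer grid rather than implementing the oversampling true-peak meter of ITU-R BS.1770, and the names are my own.

```python
import math

def to_db(amplitude):
    """Amplitude relative to full scale (1.0), in dB."""
    return 20.0 * math.log10(amplitude)

def wave(t):
    """Sine at a quarter of the sample rate, offset by 45 degrees; t is in samples."""
    return math.sin(2.0 * math.pi * t / 4.0 + math.pi / 4.0)

# What a sample-peak meter sees: values taken only at whole sample times.
samples = [wave(n) for n in range(64)]
sample_peak = max(abs(s) for s in samples)

# Normalise so that the sample peak reads exactly 0 dBFS.
scale = 1.0 / sample_peak

# What the reconstructed waveform does between the samples (16 points per sample).
true_peak = max(abs(scale * wave(k / 16.0)) for k in range(64 * 16))

print(round(to_db(scale * sample_peak), 2), round(to_db(true_peak), 2))
```

The sample-peak reading is 0 dBFS while the waveform itself reaches about +3 dBFS, which is the headroom figure mentioned in the quote above.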

TIP: Audio Studio Recording: Mastering and Gain Structure: interactive graphic with different meter reference levels.

From an excellent series of audio production tutorials from The Tenth Egg: Production Tip 6 : Preparing a mix for Mastering:

'One of the most frequent questions we get from new clients is how best to prepare their mixes for mastering and what format they should supply them in. Whether you intend to use a mastering service like ours (www.tenthegg.co.uk/mastering), tackle it yourself, or just want to keep a copy for archive there are a couple of simple steps to ensure that the mix you have is fit for the job.

1. Bit Depth and Sample Rate

Though a standard Audio CD can reproduce only 16Bit 44.1kHz digital audio it makes sense to work at the best resolution possible throughout the recording, mixing and mastering stages to ensure maximum quality of the end product. Most soundcards, software packages and hardware recorders now support 24Bit 96kHz recording and while there is still some debate about the benefits of higher sample rates most engineers would agree that 24Bit is the way to go. When it comes to mixing down even if you’ve recorded at 16Bit then there are still benefits to bouncing down your mix at 24Bit. The combination of multiple 16Bit elements will most likely have created a signal with a greater dynamic range. At 24Bit the low level detail, which will be brought up during mastering, will also be more faithfully reproduced. Regarding sample rates, your best bet is to mix down at the same resolution as you recorded. There won’t be any benefit from selecting a higher sample rate and the resulting conversion and re-conversion at the mastering stage may affect quality. If you’re not sure then opt for 44.1kHz.

2. File type

There are often lots of options here (wav, aif, SDII etc.) and most mastering houses will be able to work with whichever format you provide. But for maximum compatibility we would recommend a .wav (broadcast wave) file. Certainly you should try to avoid compressed formats such as MP3 or AAC, but if you’re forced to work in one of these then try and use a data rate of at least 256kbps. Often you will also be presented with the option of ‘split’ or ‘interleaved’ stereo files, with ‘interleaved’ being the preferred option.

3. Headroom

A degree of headroom (the gap in level between the maximum possible and that of the audio) is very important. If a signal clips at any point, even if distortion is inaudible on mixdown, it can become evident during mastering and will limit the processing options. Generally all that is needed is to pull the master fader down so that the meters no longer jump into the red at any point.

If you’re mixing down at 24Bit then you can safely leave as much as 3dB headroom. If you’re working at 16Bit then you’re going to want to maximise dynamic range so closer to 0.5dB is recommended.

4. Mix processing

Most engineers like to add a touch of overall compression, EQ and maybe even limiting when they mix down. This essentially goes some way towards creating that mastered sound and can help mixes sound louder and play better across a range of audio systems. However, this kind of processing can again create problems at the mastering stage, especially if it has been overdone. If possible all overall mix processing should be avoided in the copy destined for mastering, as these processes can be better applied using the specialist equipment and experience available to the mastering engineer. Certainly they should be free from limiting, which can have a similar effect to clipping. If overall processing must be applied then it should be done as conservatively as possible, avoiding large EQ cuts or boosts and compression gain reduction of more than 3dB.

5. Burning questions

If you’re mastering the mixes yourself then job done, you’re ready to master. But if you’re passing them on to a mastering house then you’re probably going to need to burn a disc. You can’t go far wrong here, just ensure that you burn a Data CD rather than Audio CD or all your mixes will get converted down to 16Bit 44.1kHz and will need to be re-ripped at the other end. When burning your disc be sure to use one of the write speeds recommended on the disc to avoid data errors. Also try to avoid touching the surface of the disc before and after burning and refrain from using the disc more than once to verify its contents before sending it off.

Summary

Now if you’re relatively new to music production then all that might sound a bit daunting. But don’t worry, even if you aren’t able to meet every criterion in this list that doesn’t mean that mastering can’t make a massive difference to your mixes. Most mastering studios, including ours (www.tenthegg.co.uk/mastering), can help talk you through the best options for your particular project and help you prepare your mixes. What our recommendation represents is an ideal format that will maximise the benefits of the mastering process, i.e. a 24Bit .wav file at whichever sample rate you recorded with, with around 2dB headroom and free from overall mix processing.'
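
The bit depth and headroom advice above comes down to simple arithmetic, which can be sketched in Python as follows. This is a minimal illustration of my own and not part of The Tenth Egg's tutorial. It assumes the mix is available as floating point samples where 1.0 is digital full scale; the helper names (`peak_dbfs`, `headroom_db`, `gain_for_headroom`, `dynamic_range_db`) and the tiny `mix` list are hypothetical.

```python
import math

def peak_dbfs(samples):
    """Sample peak in dBFS, for floats scaled so that full scale is 1.0."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0 else 20 * math.log10(peak)

def headroom_db(samples):
    """Gap in dB between the sample peak and digital full scale (0 dBFS)."""
    return -peak_dbfs(samples)

def gain_for_headroom(samples, target_headroom_db):
    """Linear gain that moves the sample peak to -target_headroom_db dBFS."""
    change_db = headroom_db(samples) - target_headroom_db
    return 10 ** (change_db / 20)

def dynamic_range_db(bits):
    """Theoretical ratio of full scale to one quantisation step, in dB."""
    return 20 * math.log10(2 ** bits)

# A stand-in for a mixdown (assumed data, far too short to be real audio).
mix = [0.0, 0.5, -0.9, 0.25]

print(f"peak:     {peak_dbfs(mix):.2f} dBFS")
print(f"headroom: {headroom_db(mix):.2f} dB")

# "Pull the master fader down" until there is 3 dB of headroom.
fader = gain_for_headroom(mix, 3.0)
lowered = [s * fader for s in mix]
print(f"after fader: {peak_dbfs(lowered):.2f} dBFS")

print(f"16 bit: {dynamic_range_db(16):.2f} dB")   # 96.33
print(f"24 bit: {dynamic_range_db(24):.2f} dB")   # 144.49
```

The last two lines show why a few dB of headroom costs so little at 24 bit: the theoretical range is roughly 48 dB wider than at 16 bit. Note that this only checks sample peaks, so it will not catch intersample peaks.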

From Bob Katz on Digital Domain: Keeping Your Digital Audio Pure from First Recording to Final Master: everything you ever wanted to know about dithering, and the cost of cumulative dithering, and the cost of not dithering.

From Sound on Sound: MASTERING MASTERS: CD Mastering On Your PC: Tools & Techniques (2001): including more on why you need to dither when mastering down from 24-bit to 16-bit CD quality.
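
As a rough illustration of why dithering matters when reducing to 16 bit, here is a minimal Python sketch. It uses TPDF (triangular) dither, which is one common choice; I am not claiming it is the exact method recommended in either article above, and it leaves out noise shaping entirely. The function name `requantise_to_16bit` and the test signal are hypothetical.

```python
import math
import random

def requantise_to_16bit(samples, dither=True, seed=0):
    """Convert floats (full scale = 1.0) to 16-bit integers.

    With dither=True, TPDF noise spanning +/-1 LSB is added before rounding.
    """
    rng = random.Random(seed)  # seeded so that the example is repeatable
    out = []
    for s in samples:
        scaled = s * 32768.0
        if dither:
            # The sum of two uniform values has a triangular distribution.
            scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        q = math.floor(scaled + 0.5)             # round to the nearest step
        out.append(max(-32768, min(32767, q)))   # clamp to the 16-bit range
    return out

# A constant level of 0.3 of one 16-bit step: detail that a 24-bit file
# can hold but that is smaller than the 16-bit resolution.
quiet = [0.3 / 32768.0] * 20000

plain = requantise_to_16bit(quiet, dither=False)
dithered = requantise_to_16bit(quiet, dither=True)

print("without dither, distinct values:", sorted(set(plain)))
print("with dither, average value:", sum(dithered) / len(dithered))
```

Without dither the low level detail is simply truncated away to silence. With dither the individual samples are noisy, but their average stays close to the original 0.3 of a step, so the detail survives underneath a small, steady noise floor.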

From Sound on Sound by Paul White, Feb 1999: 20 Tips On Home Mastering

Turn Me Up! Bringing dynamics back to music, including Loudness War - The Movie.

A final word on the "loudness war" and why it matters

In Australia the loudness war has clearly been won by the retailer Harvey Norman, which has the loudest and most annoying TV ads in the history of the world (not that I watch much commercial TV). They seem to have discovered a magic "penetration and annoyance" factor that is mixed into their ads, which are also very loud visually. (I therefore refuse ever to shop in their stores, because they completely spoil any attempt to enjoy a movie on commercial TV in Australia.)