Volume Levelling and Replay Gain

Ever since recording began there has been the problem of varying volume levels: with a big horn gramophone you sat closer, with record players you kept getting up to adjust the volume. Sometimes this was the fault of lazy recording engineers but often it was inherent in the media; the peaks had to be limited to avoid clipping (or worse on vinyl) whilst still maintaining a respectable dynamic range. With digital media there should have been a solution at source. The dynamic range stretches below the threshold of hearing but recordings are usually made with the peak just a few dB off the maximum. The impact of this is that recordings with a wide dynamic range, such as acoustic and classical music, sound very quiet when compared to highly compressed pop music (compressing pop is an issue for another time).

DJs learn to “ride the fader” to keep the apparent volume steady over a session. At home we have to resort to the remote, but there is a solution with media that you have control over, or at least there should be. Digital formats such as MP3, WMA, FLAC and AAC have developed mechanisms to adjust the decoder output to a user-defined level. A good description of the technique is given on the MediaMonkey FAQ pages (scroll down to the last chapter).

The three different methods are

1. Recode the audio to the level you require; this is potentially destructive as information (and hence quality) is lost each time you do it and it is not reversible so few systems employ this method.

2. Code the reference level on each audio frame so that standard decoders can interpret it. This is the method used by MP3Gain and MediaMonkey “Level Playback Volume” and is the most universally successful. However, there is some doubt over whether it is truly reversible without loss. It is also not possible with files protected by rights management (DRM).

3. Code the adjustment required in the metadata without touching the audio stream. This truly does not lose any information but requires support from the player to interpret the tags. The tags are written by, amongst others, WMP “Volume Levelling”, MediaMonkey “Analyze Volume” and iTunes “Sound Check”, and are supported by players to varying degrees.

All three methods calculate the adjustment using what goes by the grand name of Psychoacoustic Analysis, which judges how loud the listener perceives the music to be. There are two ways to do this: by track, which used to be called “Radio Levelling” and is what you would want if you were a DJ or playing a party mix; and by album, which used to be called “Audiophile Levelling.” The latter preserves the relative volume of the tracks on an album to respect the artist/engineer’s requirements and is what you would want if listening to a symphony with each movement as a separate track.
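As a rough illustration of what a player does with these two values, here is a minimal sketch in Python. The function names and the fallback rule (use the track gain when no album gain is present) are my own assumptions and are not taken from any of the products mentioned; the only firm fact is the usual conversion from a gain in dB to a linear amplitude factor.

```python
def choose_gain(track_gain_db, album_gain_db, mode="track"):
    """Pick the gain to apply. 'album' mode keeps the relative loudness
    of tracks within an album; 'track' mode levels every track on its own.
    Falling back to the track gain is an assumption for this sketch."""
    if mode == "album" and album_gain_db is not None:
        return album_gain_db
    return track_gain_db


def gain_to_scale(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)


# A quiet movement of a symphony: track levelling would boost it,
# album levelling applies the same gain as the rest of the album.
track_gain, album_gain = 4.0, -2.0
party_scale = gain_to_scale(choose_gain(track_gain, album_gain, "track"))
symphony_scale = gain_to_scale(choose_gain(track_gain, album_gain, "album"))
```

In track mode the quiet movement is made louder than the engineer intended; in album mode it stays quiet relative to its neighbours, which is the whole point of “Audiophile Levelling.”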

The problem

The definitions of many of the metadata tags for method 3 are not standardised, particularly for the most common format, MP3. Different encoders do it in different ways, which means that players have to decide which, if any, they support. This is partly a consequence of multiple independent tagging systems.

Replay Gain

The first format to adopt what it called Replay Gain was FLAC (using Vorbis comments) and that is (nearly) standardised using the tags REPLAYGAIN_TRACK_PEAK, REPLAYGAIN_TRACK_GAIN & REPLAYGAIN_ALBUM_GAIN. There is also a REPLAYGAIN_ALBUM_PEAK which most encoders ignore. The loose point in the standard is that it specifies a reference volume of 83dB above the threshold of hearing, whereas everyone now accepts (and implements) 89dB as a better level to avoid stretching the analogue amplification too far. The technical description proposed for the standard is:

…the ReplayGain tags stored in the files are 6dB above the gain adjustments required to make the files “sound as loud” as a -20dB RMS pink noise signal when replayed in an SMPTE RP 200 calibrated system. The -20dB RMS pink noise signal will measure 83dB [89dB] SPL at the listener’s seat in such a system.

http://www.hydrogenaudio.org/forums/lofiversion/index.php/t83397.html
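The arithmetic in that description is simple enough to sketch. The function names below are my own, purely for illustration; the only fact taken from the quote is the 6dB offset between the 83dB and 89dB reference levels.

```python
# Offset between the 89 dB level everyone implements and the 83 dB
# level in the original proposal, as stated in the quoted description.
REFERENCE_OFFSET_DB = 6.0


def tag_gain_from_83db_gain(gain_83_db):
    """Gain value stored in the tag, given the adjustment needed to
    match the 83 dB reference."""
    return gain_83_db + REFERENCE_OFFSET_DB


def gain_83db_from_tag(tag_gain_db):
    """Inverse: recover the 83 dB adjustment from a stored tag value."""
    return tag_gain_db - REFERENCE_OFFSET_DB
```

So a loud pop track that would need -9dB to match the 83dB reference would carry a tag of -3dB.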

Encoder/Taggers such as MediaMonkey also use these as sub-tags of the TXXX {user defined text information} ID3v2 tag in MP3 files. They have the advantage that they are easy to read.
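To show how easy, here is a minimal Python sketch that reads the values from a set of tags. It assumes the tags have already been loaded into a plain dictionary by some tag library (not shown), and that gains are written as text like “-6.50 dB” and peaks as plain decimal numbers; those text formats are the usual convention rather than something guaranteed by every tagger.

```python
import re

# Assumed text format: optional sign, decimal number, optional "dB" suffix.
_GAIN_RE = re.compile(r"^\s*([+-]?\d+(?:\.\d+)?)\s*(?:dB)?\s*$", re.IGNORECASE)

GAIN_TAGS = ("REPLAYGAIN_TRACK_GAIN", "REPLAYGAIN_ALBUM_GAIN")
PEAK_TAGS = ("REPLAYGAIN_TRACK_PEAK", "REPLAYGAIN_ALBUM_PEAK")


def parse_gain(text):
    """Turn a string such as '-6.50 dB' into a float number of dB."""
    match = _GAIN_RE.match(text)
    if match is None:
        raise ValueError("unrecognised gain value: %r" % (text,))
    return float(match.group(1))


def read_replay_gain(tags):
    """Pick the Replay Gain values out of a {name: text} dictionary.
    Tag names are compared case-insensitively; other tags are ignored."""
    result = {}
    for key, value in tags.items():
        name = key.upper()
        if name in GAIN_TAGS:
            result[name] = parse_gain(value)
        elif name in PEAK_TAGS:
            result[name] = float(value)
    return result
```

For example, `read_replay_gain({"replaygain_track_gain": "-6.50 dB", "TITLE": "x"})` gives a dictionary containing only the track gain as the number -6.5.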

Windows Media Player

It is predictable that the proprietary organisations should do it differently. WMP “Volume Levelling” has a system for its own WMA format using PeakValue and AverageLevel tags (these are “track gain” only), which it also uses to code MP3 files as sub-tags of the PRIV {Private} ID3v2 tag. The values are coded in binary. I have seen it reported elsewhere on the internet that WMP uses WM/WMADRCAverageReference, WM/WMADRCPeakReference, WM/WMADRCAverageTarget, and WM/WMADRCPeakTarget tags but I can’t find evidence for this in my files. Whatever it does, WMP does it very slowly, just like its collection of other metadata.

Apple iTunes

“Sound Check” is different again. I can’t analyse AAC files as I can’t find a structure definition document but for MP3 files it writes an iTunNORM sub-tag of the COMM {Comment} ID3v2 tag. There are 40 bytes of binary data in there but what they mean I haven’t discovered.

LAME

Surprisingly, this very popular open source encoder also uses a unique system called the MP3 INFO tag. Replay Gain uses bytes 167-174 (not 175 as the documentation says) of the tag, coded in binary: Track Peak Amplitude (4 bytes floating point), Track Gain (2 bytes), Album Gain (2 bytes). The format of each of the latter two is as follows: 3 bits, type code (000 = not set, 001 = track, 010 = album); 3 bits, originator code (000 = unspecified, 001 = set by producer, 010 = set by user, 011 = calculated automatically); 1 bit, sign; 9 bits, value * 10 (that is, the gain in tenths of a dB).
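The bit layout is easier to follow in code. This is a minimal Python sketch, not LAME's own code, and it makes three assumptions that the description above does not state: the bytes are big-endian, the fields are packed most significant bit first in the order listed, and a sign bit of 1 means a negative gain. The 8 bytes are assumed to have already been sliced out of the tag.

```python
import struct

TYPE_NAMES = {0: "not set", 1: "track", 2: "album"}
ORIGINATORS = {0: "unspecified", 1: "producer", 2: "user", 3: "automatic"}


def decode_gain_field(field):
    """Decode one 16-bit gain field into (type, originator, gain in dB)."""
    type_code = (field >> 13) & 0x7    # top 3 bits
    originator = (field >> 10) & 0x7   # next 3 bits
    negative = (field >> 9) & 0x1      # sign bit (assumed: 1 = negative)
    magnitude = field & 0x1FF          # low 9 bits, gain * 10
    gain_db = magnitude / 10.0
    if negative:
        gain_db = -gain_db
    return (TYPE_NAMES.get(type_code, "reserved"),
            ORIGINATORS.get(originator, "reserved"),
            gain_db)


def decode_replay_gain_block(block):
    """Decode the 8-byte block: peak (4-byte float), track gain, album gain.
    Big-endian byte order is an assumption of this sketch."""
    if len(block) != 8:
        raise ValueError("expected exactly 8 bytes")
    peak, track_field, album_field = struct.unpack(">fHH", block)
    return peak, decode_gain_field(track_field), decode_gain_field(album_field)
```

Under these assumptions the 16-bit value 0x2A1F (binary 001 010 1 000011111) decodes as a track gain of -3.1dB set by the user.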

RGAD

As well as the Vorbis-type tags, MediaMonkey also writes an MP3 ID3v2 tag called RGAD {Replay gain adjustment} with 8 bytes of data supporting both track and album gain. I think there was some intention to get this standardised but I see no sign of it. The format (inside the tag) is the same as the LAME data described above.

What now

My immediate requirement is for my Sonos system to play at the correct volume. Sonos supports WMP tags for WMA & MP3, iTunes tags for AAC & MP3 and the standard tags in FLAC files. It only supports “track gain” (and, as I have discovered by experiment, only supports negative values, so it will lower the volume but not raise it). What I require is “album gain” on FLAC and MP3 files not written by the proprietary systems, so I need a method to write either iTunes or WMP type tags based on the MediaMonkey ones. As a start I am working on a MediaMonkey plugin that first saves the “track gain” in a custom field for safety, then copies the “album gain” to the “track gain” field to fool the player into supporting audiophile mode. To do the rest of the job I will need to discover what the binary means in the WMP or iTunes tags.
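The idea behind the plugin can be sketched as follows. This is Python operating on a plain dictionary, not the plugin itself and not MediaMonkey's scripting API; the backup field name is hypothetical, and the second function simply models the negative-only behaviour I observed on Sonos.

```python
def swap_album_into_track(tags, backup_field="CUSTOM_SAVED_TRACK_GAIN"):
    """Return a copy of the tags with the album gain copied over the
    track gain. The original track gain is saved in a custom field
    (hypothetical name) and is never overwritten on a second run."""
    updated = dict(tags)
    album_gain = updated.get("REPLAYGAIN_ALBUM_GAIN")
    if album_gain is None:
        return updated  # nothing to copy, leave the file alone
    if backup_field not in updated and "REPLAYGAIN_TRACK_GAIN" in updated:
        updated[backup_field] = updated["REPLAYGAIN_TRACK_GAIN"]
    updated["REPLAYGAIN_TRACK_GAIN"] = album_gain
    return updated


def negative_only_gain(gain_db):
    """Model of a player that will lower the volume but never raise it."""
    return min(gain_db, 0.0)
```

Keeping the backup means the swap can be undone later by copying the saved value back into the track gain field.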

[Edited 4 Jan 2012] to add information obtained from ReplayGain legacy metadata formats (with thanks).
