Technical question regarding lossy audio playback

humblebee

So, we know that lossy audio formats (MP3, AAC, etc.) encode only part of the data and discard the rest.

How much data is encoded is given by the equation:
Bitrate = channels x bit depth x sampling rate (twice of bandwidth)

So,
320 kbps = 2 x 16 x 10 kHz

And a 10 kHz sampling rate will encode frequencies up to 5000 Hz (Nyquist).

Now, my question is :
When we play back audio in, say, Winamp or PotPlayer, the software shows 44.1 kHz. This means the audio being rendered has frequencies up to 22.05 kHz.

So what fills the range from 5 kHz to 22.05 kHz?
Who decides whether anything gets filled in? The decoder API, like the FFmpeg decoder for MP3?
I mean, a decoder could fill in anything there if it wanted to.
 
The Nyquist frequency applies to sampling, e.g. to the PCM samples in a WAV file.

When audio is compressed, information that matters less is discarded. The bitrate of the track, e.g. 320 kbps, is the rate of the data that remains after compression. So this data rate is not the sampling data rate.
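To put rough numbers on that, here is a minimal sketch of the PCM formula applied to CD-style audio (2 channels, 16 bit, 44.1 kHz), compared with a 320 kbps track. The function name `pcm_bitrate_bps` is just a name made up for this illustration.

```python
def pcm_bitrate_bps(channels: int, bit_depth: int, sample_rate_hz: int) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return channels * bit_depth * sample_rate_hz


# Uncompressed stereo 16-bit audio sampled at 44.1 kHz.
cd_bps = pcm_bitrate_bps(channels=2, bit_depth=16, sample_rate_hz=44_100)

# A 320 kbps lossy track is a data rate after compression,
# so it cannot be plugged back into the PCM formula.
mp3_bps = 320_000
ratio = cd_bps / mp3_bps

print(f"PCM bitrate: {cd_bps / 1000} kbps")
print(f"Compression ratio versus 320 kbps: {ratio:.2f}:1")
```

So the sampling rate stays at 44.1 kHz; the 320 kbps figure only says that the encoder squeezed roughly 1411 kbps of PCM data into about a quarter of the space.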
 
Now, my question is :

When we play back audio in, say, Winamp or PotPlayer, the software shows 44.1 kHz. This means the audio being rendered has frequencies up to 22.05 kHz.


Well, not necessarily. The conversion to MP3 does not remove samples from the set of 44100 samples per second; rather, it alters the values of some or many of them and tries to fit them into a pattern. The pattern takes less storage space than the original values. In doing so, sample values which are very high, which means high frequencies, get shifted to lower values to fit the pattern. I may not be 100% correct in my reply since I also had similar confusion before.
 
The equation holds only for uncompressed PCM audio. For compressed music, algorithms are used to reduce the file size.

For example, take a pattern of alternating 1s and 0s. If you try to compress a file like that in WinZip, you'll get a very small file, because the pattern is uniform and predictable. On the other hand, if you use random numbers, you'll get a much bigger file. A pure sine wave can be compressed to a high degree without much loss. Complex music, on the other hand, is much harder to compress.
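The pattern-versus-random point can be tried out with a small sketch. It uses Python's `zlib` module as a stand-in for WinZip (an assumption for illustration; both are lossless compressors, which is a different thing from the lossy coding MP3 does, but the predictability effect is the same).

```python
import os
import zlib

n = 100_000  # number of bytes in each test input

# A perfectly predictable pattern: alternating 0 and 1 bytes.
patterned = b"\x00\x01" * (n // 2)

# Unpredictable data: bytes from the operating system's random source.
random_bytes = os.urandom(n)

patterned_size = len(zlib.compress(patterned))
random_size = len(zlib.compress(random_bytes))

print(f"patterned: {n} -> {patterned_size} bytes")
print(f"random:    {n} -> {random_size} bytes")
```

The patterned input shrinks to a tiny fraction of its size, while the random input stays at roughly its original size.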
 
As far as I know, audio compression algorithms do not resample the original signal to a lower sampling rate (although resampling is certainly possible). So if the original source file was 44.1 kHz with 16-bit depth, the compressed file remains 44.1/16. What gets thrown out are things like large dynamic swings, which take more bits to encode (so you get lower peaks); low-level information which one cannot hear unless listening on a highly resolving playback system (information which in any case doesn't show up on the majority of playback devices, thus saving some more bits); fast transients; etc.
 
Hi Humble bee,
green horn and jls are right about the compression; let me add to that.

1) Your equation holds for uncompressed audio but not for compressed audio.
2) As jls said, no resampling is done. The complete data is compressed.
Different methods are used, such as:
a) redundancy removal, as green horn said.
b) high-frequency removal, which we all hate.
c) quantisation or detail reduction: for example, representing all numbers as multiples of 2 reduces your set of sample values to half.
d) non-uniform symbol encoding, where data that is abundant is represented by fewer bits than data that is rare.
Many similar algorithms also help in compressing the data.

Hope this helps.
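Points c) and d) above can be sketched in a few lines. This is only a toy illustration, not what an MP3 encoder actually does internally: `quantise` rounds values down to multiples of a step, and `huffman_code_lengths` builds a textbook Huffman code (one common form of non-uniform symbol encoding). Both function names and the sample data are made up for this example.

```python
import heapq
from collections import Counter


def quantise(sample: int, step: int) -> int:
    """Round a sample down to the nearest multiple of `step`."""
    return (sample // step) * step


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code."""
    counts = Counter(symbols)
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


# c) Quantisation: 16 distinct values become 8 when forced to multiples of 2.
samples = list(range(-8, 8))
coarse = [quantise(s, 2) for s in samples]
distinct_before = len(set(samples))
distinct_after = len(set(coarse))
max_error = max(abs(a - b) for a, b in zip(samples, coarse))

# d) Non-uniform coding: frequent symbols get shorter codes.
data = "a" * 8 + "b" * 4 + "c" * 2 + "d" * 2
lengths = huffman_code_lengths(data)
variable_bits = sum(lengths[sym] for sym in data)
fixed_bits = 2 * len(data)  # 4 symbols need 2 bits each with a fixed-length code

print(distinct_before, distinct_after, max_error)
print(lengths, variable_bits, fixed_bits)
```

Quantisation is the lossy step (the rounding error cannot be undone), while the variable-length coding is lossless and simply spends fewer bits on common symbols.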
 
In doing so, sample values which are very high, which means high frequencies, get shifted to lower values to fit the pattern. I may not be 100% correct in my reply since I also had similar confusion before.

This is not correct.

In MP3 encoding, all high frequencies above a specified frequency are discarded, truncated, or chopped off.

You cannot shift one frequency to another. If you try doing that, what you will get is noise.

Cheers
 
This is not correct.

In MP3 encoding, all high frequencies above a specified frequency are discarded, truncated, or chopped off.

In the case of a 16-bit 44.1 kHz PCM audio source, it is limited to 20 kHz. Are you saying that an additional low-pass filter is applied during encoding? :/
 
In the case of a 16-bit 44.1 kHz PCM audio source, it is limited to 20 kHz. Are you saying that an additional low-pass filter is applied during encoding? :/

If you look at the spectrum of a 16/44 MP3 file in an audio editor, most of the content past 15-16 kHz will be chopped off.
 
The confusion here is that MP3 chops off most of the content past 15-16 kHz, yet it does not reduce the number of samples and still retains 44100 samples per second. So what changes does it make to the sample values?
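One way to see how both things can be true is a minimal sketch of a brick-wall low-pass filter. Note the assumptions: this is not the MP3 algorithm (real encoders use a filter bank and a psychoacoustic model), the 16 kHz cutoff and the two test tones are picked only for illustration, and a plain DFT is used so the example needs nothing beyond the standard library.

```python
import cmath
import math

SAMPLE_RATE = 44_100
N = 441  # 10 ms of audio, so each DFT bin is exactly 100 Hz wide


def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for a short demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


# Test signal: a 1 kHz tone plus a quieter 18 kHz tone.
low_tone = [math.sin(2 * math.pi * 1_000 * t / SAMPLE_RATE) for t in range(N)]
signal = [low_tone[t] + 0.5 * math.sin(2 * math.pi * 18_000 * t / SAMPLE_RATE)
          for t in range(N)]

# Zero every frequency bin above the cutoff (both positive and mirrored bins).
cutoff_hz = 16_000
bin_hz = SAMPLE_RATE / N
spectrum = dft(signal)
for k in range(N):
    freq = min(k, N - k) * bin_hz
    if freq > cutoff_hz:
        spectrum[k] = 0

filtered = idft(spectrum)

print(len(signal), len(filtered))  # same number of samples before and after
print(max(abs(a - b) for a, b in zip(filtered, low_tone)))  # close to zero
```

The output still has one value for every original sample, at the same 44.1 kHz rate. What changes is each sample's value: the part contributed by the 18 kHz tone is gone, so the filtered samples match the 1 kHz tone alone.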
 
Bitrate = channels x bit depth x sampling rate (twice of bandwidth)

So,
320 kbps = 2 x 16 x 10 kHz

I think there are some factors wrong in the calculation:

1. 16-bit depth.
Bit depth makes sense when dealing with PCM data. My understanding is that MP3 uses a different approach where bit depth is not a factor.

2. Left/right channels.
My understanding is that MP3 typically does not store the left/right channel data independently. My recollection is that it allocates maximum bandwidth to the data common to the two channels, and then stores the uncommon data with less bandwidth (called joint stereo, if I'm right).

I'm doing a lot of hand-waving with concepts I'm not entirely clear myself. But I'm pretty sure the above calculation does not hold good with MP3. The calculation is correct for PCM (WAV) data, but that's not the point of discussion.
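The "common data plus uncommon data" idea can be sketched as a mid/side transform, which is one of the techniques that goes under the joint stereo name. This is a toy sketch only: the function names and sample values are invented for illustration, and a real encoder works on frequency-domain data rather than raw samples like these.

```python
def to_mid_side(left, right):
    """Split stereo into mid (what the channels share) and side (how they differ)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Rebuild the left and right channels from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Two nearly identical channels, as is common in real recordings.
left = [100, 200, -150, 50]
right = [98, 203, -149, 50]

mid, side = to_mid_side(left, right)
rebuilt_left, rebuilt_right = from_mid_side(mid, side)

print(mid)   # large values: the shared content
print(side)  # small values: the difference, cheap to encode
```

When the two channels are similar, the side signal is small, so an encoder can spend most of its bits on the mid signal and far fewer on the side signal.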
 