Technical question regarding lossy audio playback

humblebee · Oct 11, 2015

So, we know that lossy audio formats (MP3, AAC, etc.) encode only part of the data and discard the rest.

The amount of data encoded is given by the equation:
Bitrate = channels x bit depth x sampling rate (twice the bandwidth)

So,
320 kbps = 2 x 16 x 10 kHz

And a 10 kHz sampling rate will encode frequencies up to 5000 Hz (Nyquist).

Now, my question is:
When we play back audio in, say, Winamp or PotPlayer, the software shows 44.1 kHz. This means the audio being rendered has frequencies up to 22.05 kHz.

So what fills the range from 5 kHz to 22.05 kHz?
Who decides whether anything gets filled in? The decoder API, such as the FFmpeg decoder for MP3?
I mean, a decoder could fill in anything there if it wanted to.

goldyrathore · Oct 11, 2015

The Nyquist frequency applies to sampling, e.g. to PCM samples in WAV format.

When audio is compressed, information that matters less is discarded. The data that remains, expressed as a data rate, is the bitrate of the track, e.g. 320 kbps. So this data rate is not the sampling data rate.
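
To put numbers on the difference between the two data rates, here is a minimal sketch. It applies the PCM equation from the first post to a CD-quality source (2 channels, 16 bits, 44.1 kHz) and compares the result with the 320 kbps figure. The function name `pcm_bitrate_bps` is made up for illustration.

```python
def pcm_bitrate_bps(channels: int, bit_depth: int, sample_rate_hz: int) -> int:
    """Raw data rate of uncompressed PCM audio, in bits per second."""
    return channels * bit_depth * sample_rate_hz


cd_rate = pcm_bitrate_bps(channels=2, bit_depth=16, sample_rate_hz=44_100)
mp3_rate = 320_000  # 320 kbps, the example bitrate used in this thread

print(cd_rate)             # 1411200 bits per second, i.e. 1411.2 kbps
print(cd_rate / mp3_rate)  # 4.41, the size reduction relative to the PCM source
```

Read this way, 320 kbps describes how many bits the encoder spends per second after compression. It is not the product of a smaller sampling rate and a bit depth.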

humblebee · Oct 11, 2015

I know. The equation above says so: data rate 320 kbps, sampling rate 10 kHz. But my question is something else.

amit11 · Oct 11, 2015

humblebee said:
Now, my question is:

When we play back audio in, say, Winamp or PotPlayer, the software shows 44.1 kHz. This means the audio being rendered has frequencies up to 22.05 kHz.

Well, not necessarily. The conversion to MP3 does not remove samples from the set of 44100 samples per second; rather, it alters the values of some or many of them and tries to fit them to a pattern. The pattern takes less storage space than the original values. In doing so, sample values which are very high, which means high frequencies, get shifted to lower values to fit the pattern. I may not be 100% correct in my reply, since I also had similar confusion before.

greenhorn · Oct 12, 2015

The equation holds good only for uncompressed PCM audio. For compressed music, algorithms are used to reduce the file size.

For example, take a pattern of alternating 1s and 0s. If you try to compress a file like that in WinZip, you'll get a very small file, because the pattern is uniform and predictable. On the other hand, if you use random numbers, you'll get a much bigger file. A pure sine wave can be compressed to a high extent without much loss. Complex music, on the other hand, would be much harder.
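
The predictable-versus-random point can be tried out with a small sketch. It uses Python's `zlib` module as a stand-in for WinZip (an assumption made for illustration). Note that `zlib` is lossless, unlike MP3, so this only demonstrates the redundancy idea.

```python
import os
import zlib

size = 100_000
pattern = b"\x01\x00" * (size // 2)  # alternating 1s and 0s, stored as bytes
noise = os.urandom(size)             # unpredictable random bytes

packed_pattern = zlib.compress(pattern)
packed_noise = zlib.compress(noise)

# The predictable input shrinks dramatically; the random input barely changes.
print(len(packed_pattern), len(packed_noise))

# Lossless: decompressing gives back exactly the original bytes.
print(zlib.decompress(packed_pattern) == pattern)  # True
```

The exact compressed sizes depend on the zlib version and on the random data, so none are quoted here.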

jls001 · Oct 12, 2015

As far as I know, audio compression algorithms do not resample the original signal to a lower sampling rate (although resampling is certainly possible). So if the original source file was 44.1 kHz with 16-bit depth, the compressed file remains 44.1/16. What gets thrown out are things like large dynamic swings, which take more bits to encode (so you get lower peaks); low-level information which one cannot hear unless listening on a highly resolving playback system (information that in any case doesn't show up on the majority of playback devices, so dropping it saves some more bits); fast transients; etc.

kul · Oct 12, 2015

greenhorn said:
The equation holds good only for uncompressed PCM audio. For compressed music, algorithms are used to reduce the file size.

For example, take a pattern of alternating 1s and 0s. If you try to compress a file like that in WinZip, you'll get a very small file, because the pattern is uniform and predictable. On the other hand, if you use random numbers, you'll get a much bigger file. A pure sine wave can be compressed to a high extent without much loss. Complex music, on the other hand, would be much harder.

jls001 said:
As far as I know, audio compression algorithms do not resample the original signal to a lower sampling rate (although resampling is certainly possible). So if the original source file was 44.1 kHz with 16-bit depth, the compressed file remains 44.1/16. What gets thrown out are things like large dynamic swings, which take more bits to encode (so you get lower peaks); low-level information which one cannot hear unless listening on a highly resolving playback system (information that in any case doesn't show up on the majority of playback devices, so dropping it saves some more bits); fast transients; etc.

Hi humblebee,
greenhorn and jls001 are right about the compression; let me add to that.

1) Your equation holds for uncompressed audio but not for compressed audio.
2) As jls001 said, there is no resampling done. The complete data is compressed.
There are different methods used, like:
a) redundancy removal, as greenhorn said.
b) high-frequency removal, which we all hate.
c) quantisation or detail reduction; for example, representing all numbers as multiples of 2 will reduce your set of sample values to half.
d) non-uniform symbol encoding, where values that are abundant are represented by fewer bits than those that are rare.
And many similar algorithms which help in compressing data.
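
Methods (c) and (d) can be sketched on a toy list of numbers. The sample values below are made up, and Huffman coding is used here as one common form of non-uniform symbol encoding. A real MP3 encoder quantises frequency-domain coefficients rather than raw samples, so treat this as an illustration of the idea only.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    counts = Counter(symbols)
    if len(counts) == 1:  # a lone symbol still needs one bit
        return {next(iter(counts)): 1}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far})
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, a = heapq.heappop(heap)
        n2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


samples = [0, 1, 0, 2, 1, 0, 3, 0, 1, 0, 7, 0, 1, 2, 0, 0]  # hypothetical data
quantised = [2 * (s // 2) for s in samples]  # (c) keep only multiples of 2

lengths = huffman_code_lengths(quantised)    # (d) frequent values get short codes
total_bits = sum(lengths[s] for s in quantised)

print(sorted(set(samples)), sorted(set(quantised)))  # [0, 1, 2, 3, 7] [0, 2, 6]
print(lengths)     # {6: 2, 2: 2, 0: 1}
print(total_bits)  # 20
```

With a fixed-length code, three distinct values would need 2 bits each, or 32 bits for these 16 samples. Giving the most common value a 1-bit code brings that down to 20 bits.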

Hope this helps.

venkatcr · Oct 12, 2015

amit11 said:
In doing so, sample values which are very high, which means high frequencies, get shifted to lower values to fit the pattern. I may not be 100% correct in my reply, since I also had similar confusion before.

This is not correct.

In MP3 encoding, all frequencies above a specified cutoff frequency are discarded, truncated, or chopped off.

You cannot shift one frequency to another. If you try doing that, what you will get is noise.

Cheers

greenhorn · Oct 13, 2015

venkatcr said:
This is not correct.

In MP3 encoding, all frequencies above a specified cutoff frequency are discarded, truncated, or chopped off.

In the case of a 16-bit 44.1 kHz PCM audio source, it is limited to 20 kHz. Are you saying that there is an additional low-pass filter applied during encoding? :/

reignofchaos · Oct 13, 2015

greenhorn said:
In the case of a 16-bit 44.1 kHz PCM audio source, it is limited to 20 kHz. Are you saying that there is an additional low-pass filter applied during encoding? :/

If you look at the spectrum of a 16/44 MP3 file in an audio editor, most of the content past 15-16 kHz will be chopped off.

amit11 · Oct 13, 2015

The confusion here is that MP3 chops off most of the content past 15-16 kHz, yet it does not reduce the number of samples and still retains 44100 samples per second. So what changes does it make to the sample values?
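
One way to see what happens to the sample values is a toy low-pass filter. The sketch below assumes a 16 kHz cutoff and a test signal made of a 1 kHz tone plus an 18 kHz tone. It uses a plain DFT over the whole signal, which is not how an MP3 encoder works internally (MP3 uses a filterbank and a per-frame transform). It only shows that removing high frequencies changes the sample values while the sample count and sampling rate stay the same.

```python
import cmath
import math

RATE = 44_100       # samples per second
N = 441             # 10 ms of audio, so each DFT bin is 100 Hz wide
CUTOFF_HZ = 16_000  # assumed cutoff, within the range mentioned in the thread


def tone(freq_hz):
    """N samples of a unit-amplitude sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / RATE) for n in range(N)]


low, high = tone(1_000), tone(18_000)
signal = [a + b for a, b in zip(low, high)]

# Forward DFT (naive O(N^2) version, fine for a toy example)
spectrum = [
    sum(signal[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
    for k in range(N)
]

# Zero every bin above the cutoff; bins k and N - k hold the same frequency.
for k in range(N):
    if min(k, N - k) * RATE / N > CUTOFF_HZ:
        spectrum[k] = 0

# Inverse DFT back to time-domain samples
filtered = [
    (sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N).real
    for n in range(N)
]

print(len(signal), len(filtered))  # 441 441: the sample count is unchanged
# Largest difference from the 1 kHz tone alone is tiny: only that tone remains.
print(max(abs(f - l) for f, l in zip(filtered, low)))
```

So the filtered signal still has 44100 samples per second. Each sample simply takes the value it would have had if the 18 kHz component had never been there.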

gryph0n · Nov 11, 2015

humblebee said:
Bitrate = channels x bit depth x sampling rate (twice the bandwidth)

So,
320 kbps = 2 x 16 x 10 kHz

I think there are some factors wrong in the calculation:

1. 16-bit depth.
Bit depth makes sense when dealing with PCM data. My understanding is that MP3 uses a different approach where bit depth is not a factor.

2. Left/right channels.
My understanding is that MP3 typically does not store the left/right channel data independently. My recollection is that it allocates most of the bandwidth to the data common to the two channels, and then stores the differing data with less bandwidth (called joint stereo, if I'm right).
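
The common/different split described here can be sketched as mid/side coding, which is one of the joint-stereo modes MP3 supports. The function names and sample values below are made up for illustration, and a real encoder applies this to frequency-domain data rather than raw samples.

```python
def to_mid_side(left, right):
    """Split two channels into their average (mid) and half-difference (side)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Rebuild the left and right channels from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [100, 200, 300, 250]   # hypothetical sample values
right = [98, 202, 300, 246]

mid, side = to_mid_side(left, right)
print(mid)   # [99.0, 201.0, 300.0, 248.0]
print(side)  # [1.0, -1.0, 0.0, 2.0]
print(from_mid_side(mid, side) == (left, right))  # True
```

When the two channels are similar, the side signal is small and needs few bits, so most of the bit budget can go to the mid signal.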

I'm doing a lot of hand-waving with concepts I'm not entirely clear on myself. But I'm pretty sure the above calculation does not hold for MP3. The calculation is correct for PCM (WAV) data, but that's not the point of discussion.
