Digital audio fundamental question

Guys, guys!

We were assembling some basic building blocks of understanding. Even if they might have looked puerile to an actual engineer or mathematician, they were getting us somewhere.

Then it all went crazy about the nature of analogue sound, which is the least of our worries. Isn't it almost enough to say that noises happen, without going into any detail, let alone controversy, about how they happen? By the time those noises reach an ADC, they are at least one step away from sound in air: they have become electrical signals already. Some of them may never have been sound in air in the first place.

Manoj: how many points do we require to reconstruct a curve?

--- If it is an arc of a circle then I think it is three (assuming of course, we don't know its centre and radius!)

--- If it is a compound curve, then it will depend on the curve

--- If it is a complex sound wave, then we need twice the highest frequency every second. Others have posted that already. Given that, we do not get a better, or more accurate, curve just by adding more samples. Whether we get a better, more accurate result from our DAC, because of the engineering involved in the filters, is indeed a different question. And whether any specific DAC product produces a more accurate sound with one sample rate (not necessarily the higher) than with another, is yet another product/engineering related question.

Thad,

Do understand one thing, I am all for digital. But my gripe is with the last bit, when people say only twice the highest frequency is needed to get an accurate curve. That is based on an assumption that the incoming wave is a perfect sine wave. I keep repeating this - nobody has said twice the frequency is the max needed. The theorem as applied says min 4 is needed.

That's all I am trying to say. Will more make it better? Who knows. But nobody is saying only 4 samples per frequency is all one needs to reconstruct the wave completely.
 
Here is my take:

Given a set of points, there is an infinite number of curves that fit those points. And also consider that reality need not be 3-dimensional, but multi-dimensional. But when we put limits/conditions - such as a min of 20 Hz and a max of 20 kHz - then that number reduces, and hence the curve is representable with a limited amount of information.

Yes - digital always starts on the premise of how much it can store and how much it will cover. It will drop (clip/compress) anything beyond those limits. Anytime it does that, there will be distortions in the frequencies near those limits.
 
I care because that's what the truth is. But rather than admitting it was a misunderstanding, the whole argument is created around it with more and more analogies.
Don't get me wrong. I am in the digital camp. The only analog I have is the cassette deck, and that's also because I have some cassettes from my college days. But it hardly gets used. It's there for sentimental reasons. :)

Yes, we just witnessed one here with sound wave not being continuous.
Well, if you care about the truth, perhaps you should keep an open mind too. Physics is a tricky subject.

Here's a wikipedia article for starters.

To quote:
"After quantization of the electromagnetic field, the EM (electromagnetic) field consists of discrete energy parcels, photons. Photons are massless particles of definite energy, definite momentum, and definite spin."

Lest you think this is one more digression - any electromagnetic field is the same as your notion of a sound wave. It is a means of transferring energy. But I quoted this article to point out that existing physics also allows for interpretation of a field (wave) in terms of discrete quanta of energy.

The point I am trying to make is - the wave model and the quantized (discrete pulse) model are simply two ways of thinking about (and explaining) these natural phenomena - such as sound.

So our whole argument boils down to two different notions (or ways of thinking) we have in our heads. It is not necessary for one of us to be wrong.

However, what is important is that we be in agreement about the consequence of these physical phenomena - and sound we hear in our ears is the real physical consequence.

I don't know how much physics I have studied, but I do read and try to understand. The analog to digital capture in this thread happens to be one such topic, and I tried to explain visually the way I understand it. I know you said it's wrong ...

Gah, I never said you are wrong. What I meant to say was that in the jagged analog wave you have visualized in your head, what matters is the sequential state transition (Y axis value), and how fine grained these state transitions are (smallest possible X axis value). so what you mentally think of as a continuous sound wave - I think of it as a sequential set of state transitions.

You are trying to superimpose the jagged looking sound wave into a sine wave. However, that is where your visualization is incorrect. In reality, each small piece of the sound wave (smallest slice) consists of a batch of perfect sine waves. The number of sine waves in this batch indicates the frequency. So a 0.5 second slice for a 20Hz analog wave will have 10 perfectly shaped sine waves in it. This is what 20Hz really means.

Thad - to your earlier question, all we need to draw these sine waves is the height and width. A simple math function like sin(x) draws this quite reliably. To digress, this is how vector graphics (Flash for example) works vs bitmapped graphics (GIF or JPEG file for example). Vectorized information stores instructions on how to reproduce something vs bitmapped information that stores every possible value. But I am sure you know all this too :)

You can use this graphing app for example, and enter sin(x) in the textbox on the left to see how easily a computer can draw perfect sine waves.

I know you said it's wrong, but I haven't heard a different explanation from you other than quantum physics, discrete data, and non-continuous waves, with eardrums + stones in water + what not thrown in. Well, that's probably what you explained above.
Well, I agree with you here. The ideal sound is a sine wave, but not so in reality. But think for a second, won't we need more data to accurately reconstruct the imperfect actual wave? Again - I don't know what that number is.

Well, all I want is the meaningful discussion, devoid of digressions. There may be theories, and one doesn't need to know those theories for enjoying the music. We can ignore those. But we cannot dismiss those theories and how those are applied in a discussion and totally go around creating analogies. Nobody has to take my word for it. If someone says I am wrong, then explain it to me.

And no - I am not talking about purity cause I am not in that camp. All the music or sounds we hear are processed sound.

Come on, cut me some slack :)

I am only trying out all these analogies because I am trying to use words and descriptions that make sense to me :) I cannot give a fourier transform explanation because I don't understand it myself (studied and forgotten, unfortunately).

And for the record, the tuning fork example I gave was not an analogy. That is a biological fact. We hear sound because we have 20,000-30,000 hairs behind the ear drum - each tuned to a different resonant frequency. Sound at certain frequencies ends up vibrating the appropriate hairs, which in turn send an electrical signal to our brain. So our "resolution" of sound is literally 20,000 discrete frequencies (or think of it as 20,000 pixels).

But I am sure our brain is also doing all kinds of fancy extrapolation and anti-aliasing so we probably end up hearing much more nuanced and delicate state transitions in sound.

Another interesting thing to note is that when the ear hears loud volume or loud bass, it actually flexes the ear muscles which make the ear drum more taut and stiff. This effectively acts as a (low pass?) filter and screens out much of these frequencies. This also routinely happens when we talk - because our own voice is super loud to our ears.
 
I think I understand where you are having difficulty in accepting this. Let me give it a shot at explaining this.

1. First the audio waveform - This is both continuous and discrete
- The whole song is not one perfect sine wave but at every discrete point in time there is at least one or more notes.
- Every note is at least one perfect sine wave of a given frequency and amplitude (one crest and one trough). The longer the note is played, the longer the wave becomes on the time scale (same height and width but many more crests and troughs)
- A song as a whole is a summation of many such perfect sine waves (This is why Poisson's summation is a part of the Proof for the Nyquist - Shannon Theorem)

2. Sampling in context
- these are not a conventional measurement in space and time which can later be used as (x,y) coordinates to form the curve
- sampling in this context is measurement as a derivative of the application of a transformation function which, when reversed, will have only one unique solution and not multiple
- in simpler terms we have already established a song is a summation of perfect sine waves. Every sine wave can be represented by a function x(t) resulting in an unique wave which will perfectly recreate that one sine wave
- The Shannon's version of the same theorem states and proves that

If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.

- for a 20,000 Hz wave you need a reading every 0.000025 seconds. It doesn't say how many readings are required; that will depend on the length of the sine wave on the time scale. If the time length of the wave is so short that it can accommodate only 4 x 0.000025-second intervals, then 4 readings will be enough.
- For every perfect wave, you will take as many readings as distanced by the above formula multiplied by the length of the wave in time scale

- When all the sine waves are re created and summed up you should have the whole song
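
As a rough illustration of the points above (a band-limited sum of sine waves is pinned down by its samples), here is a minimal Python sketch. It is not taken from any post in this thread: the three test tones, the 44.1 kHz rate, and the helper name `sinc_reconstruct` are my own assumptions. It samples a signal that is clearly not a single sine wave, then rebuilds it between the sample points with Whittaker-Shannon (sinc) interpolation and compares against the true values.

```python
import numpy as np

FS = 44_100.0  # assumed sample rate in Hz (the CD rate)


def signal(t):
    """A non-sinusoidal test signal: three tones, all below FS / 2."""
    return (1.00 * np.sin(2 * np.pi * 440 * t)
            + 0.50 * np.sin(2 * np.pi * 5_000 * t + 0.3)
            + 0.25 * np.sin(2 * np.pi * 15_000 * t + 1.1))


def sinc_reconstruct(samples, fs, t):
    """Whittaker-Shannon interpolation: a sum of shifted sinc pulses.

    np.sinc(x) is sin(pi * x) / (pi * x), which is the kernel we need.
    """
    n = np.arange(len(samples))
    return np.sinc(fs * t[:, None] - n[None, :]) @ samples


# Take 0.05 s worth of samples (2205 of them).
n = np.arange(int(FS * 0.05))
samples = signal(n / FS)

# Evaluate halfway between sample instants, in the middle of the record,
# so that cutting off the (in theory infinite) sum matters as little as possible.
t_between = (np.arange(1000, 1200) + 0.5) / FS
rebuilt = sinc_reconstruct(samples, FS, t_between)

max_error = np.max(np.abs(rebuilt - signal(t_between)))
print(f"largest difference between rebuilt and true signal: {max_error:.2e}")
```

The leftover difference comes from using a finite record instead of the infinite sum in the theorem; it shrinks as the record gets longer. Taking more samples per second than 2B does not change what curve comes out.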

Please see monty's video, he will demonstrate this using both digital and analogue oscilloscopes for frequencies 20kHz and above.

Hope this helped.

Thad,

Do understand one thing, I am all for digital. But My gripe is with the last bit, when people say only twice the highest frequency is needed to get accurate curve. That is based on an assumption that the incoming wave is a perfect sine wave. I am repeating here so many times - Nobody has said twice the frequency is the max needed. The theorem as applied says min 4 is needed.

That's all I am trying to say. Will more make it better? Who knows. But nobody is saying only 4 sample per frequency is all one needs to reconstruct the wave completely.
 
Gah, I never said you are wrong. What I meant to say was that in the jagged analog wave you have visualized in your head, what matters is the sequential state transition (Y axis value), and how fine grained these state transitions are (smallest possible X axis value). so what you mentally think of as a continuous sound wave - I think of it as a sequential set of state transitions.

You are trying to superimpose the jagged looking sound wave into a sine wave. However, that is where your visualization is incorrect. In reality, each small piece of the sound wave (smallest slice) consists of a batch of perfect sine waves. The number of sine waves in this batch indicates the frequency. So a 0.5 second slice for a 20Hz analog wave will have 10 perfectly shaped sine waves in it. This is what 20Hz really means.

Ahh, that jagged thing is because I used the mouse to draw it. I was just using it as a chalkboard and kind of showing how the data + wave is constructed. Didn't want to convey that the outcome is jagged.
Thad - to your earlier question, all we need to draw these sine waves is the height and width. A simple math function like sin(x) draws this quite reliably. To digress, this is how vector graphics (Flash for example) works vs bitmapped graphics (GIF or JPEG file for example). Vectorized information stores instructions on how to reproduce something vs bitmapped information that stores every possible value. But I am sure you know all this too :)

You can use this graphing app for example, and enter sin(x) in the textbox on the left to see how easily a computer can draw perfect sine waves.

Arun - there are two issues with this. Assumption 1: that the wave is a true sine wave, which in reality it may not be. Assumption 2: we don't have the width of the wave. All we have is amplitudes taken at fixed time gaps.

Come on, cut me some slack :)

I am only trying out all these analogies because I am trying to use words and descriptions that make sense to me :) I cannot give a fourier transform explanation because I don't understand it myself (studied and forgotten, unfortunately).
Understood.
 
I think I understand where you are having difficulty in accepting this. Let me give it a shot at explaining this.

1. First the audio waveform - This is both continuous and discrete
- The whole song is not one perfect sine wave but at every discrete point in time there is at least one or more notes.
- Every note is atleast perfect sine wave of a given frequency and amplitude (one crest and one trough). The longer the note is played the length of the wave increases in the time scale (same height and width but many more crests and troughs)
- A song as a whole is a summation of many such perfect sine waves (This is why Poisson's summation is a part of the Proof for the Nyquist - Shannon Theorem)

My understanding is - every note creation is unique and discrete. No argument there. But the resulting wave, once it's created, is continuous. The wave will be just as long as the duration of the note. But that wave is continuous till it dies.

- for a 20,000 Hz wave you need a reading every 0.000025 apart. It doesn't say so many reading are required, that will depend on the length of the sine wave in time scale. IF the time length of the wave is so short that it can accommodate only 4 x 0.000025 intervals then 4 readings will be enough.
- For every perfect wave, you will take as many readings as distanced by the above formula multiplied by the length of the wave in time scale

- When all the sine waves are re created and summed up you should have the whole song

Please see monty's video, he will demonstrate this using both digital and analogue oscilloscopes for frequencies 20kHz and above.

Hope this helped.

The way I understand it, the sampling rate is not different for different frequencies. In the case of CDs, a 20 Hz signal gets sampled 44.1k times/sec, the same as a 20 kHz signal. Samples are taken every 1/44100 seconds irrespective of what the incoming frequency is. If the wave is so short that you can't fit a minimum of 4 samples in a wavelength, then the DAC will not be able to reproduce it accurately.

Please enlighten me if it's otherwise.
 
I think even aliasing is fairly misunderstood as well. Let me try and explain what it is.

When you are sampling within a range of frequencies, say 20 Hz to 20 kHz, and your input audio also contains a signal above that range, say at 30 kHz, the samples of that higher signal come out identical to those of a lower frequency inside the range. When converted back to analogue, this creates a second, spurious waveform called an alias.

There are 2 techniques to combat this.

You can store extra information and extrapolate the waveform to approximately 30 kHz, but the problem is in identifying which is the original and which is the alias. Even if we did identify it, extrapolation would result in a degraded audio signal.

Instead, the easier approach is taken: since we can hear only between 20 Hz and 20 kHz, a low-pass filter is applied before sampling to keep only frequencies below 20 kHz. Used in this context, the low-pass filter is called an anti-aliasing filter.
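
To make the aliasing described above concrete, here is a minimal Python sketch. It assumes the 44.1 kHz CD sample rate mentioned elsewhere in this thread and no filter in front of the sampler; the variable names are mine, purely for illustration. Under those assumptions a 30 kHz tone produces the same sample values as a 14.1 kHz tone (44.1 - 30), so once sampled the two cannot be told apart.

```python
import numpy as np

fs = 44_100.0            # assumed sample rate in Hz
f_high = 30_000.0        # tone above fs / 2, i.e. outside the usable band
f_alias = fs - f_high    # where it folds back to: 14_100 Hz

n = np.arange(64)        # 64 consecutive sample indices
t = n / fs               # the instants at which samples are taken

x_high = np.cos(2 * np.pi * f_high * t)    # samples of the 30 kHz tone
x_alias = np.cos(2 * np.pi * f_alias * t)  # samples of the 14.1 kHz tone

identical = np.allclose(x_high, x_alias)
print(f"alias frequency: {f_alias:.0f} Hz, samples identical: {identical}")
```

This is why the filter has to come before the sampler: after sampling there is nothing left in the data to say which of the two tones was really there.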

Hope this helps.
Yes, this helps.
My understanding of why the Nyquist theorem talks of a minimum and not anything more is that this is how a good engineer would/should think in any sphere he is working in: what is the minimum that needs to be done to achieve a specified result or objective? Anything more is wasted resource or effort, and not good engineering, just as inadequately achieving the specified objective is not good engineering. The theorem therefore proves what minimum is needed to achieve the objective, taking care of both of the above characteristics of good engineering.
And the theorem refers to situations where the bandwidth is limited, with known limits that are not violated. This is not the case in audio, where this theorem was applied - it did not originate for digital audio use. Hence the aliasing thing, and the means referred to above to deal with it. Along with other things developed to deal with other technical issues that were observed to stand in the way of accurate ADC and DAC implementation in audio.
 
My understanding is - every note creation is unique and discrete. No argument there. But the resulting wave, once it's created, is continuous. The wave will be just as long as the duration of the note. But that wave is continuous till it dies.
Correct, the final wave is continuous and not a perfect sine wave of a single function, but when you split it along the time scale, it becomes a sum of perfect sine waves at that instant. Just as a sum of many notes of differing frequencies creates a continuous song. I think you understand this.
The way I understand is - the sampling rate is not different for different frequencies. In case CD's, the 20 Hz signal will also get sampled 44.1k times/sec as well as 20Khz signal. There are samples taken at (1/44100) seconds irrespective of what the incoming frequency is. If the wave is so short that you can't fit min 4 samples in a wavelength, then the DAC will not be able to produce it accurately and aliasing will be there.

Please enlighten me if it's otherwise.
You are correct. Band limiting between 20 Hz and 20 kHz requires you to measure every 0.025 seconds for a 20 Hz wave, 0.024 seconds for a 21 Hz wave, ..., and 0.000025 seconds for a 20000 Hz wave. When you are measuring every 1/44100 seconds, you are measuring every 0.000023 seconds.

That effectively gives you more measurement points than you actually need for any wave between 20 Hz and 20 kHz.

If the wave is so short that you can't fit 4 intervals, then given the sampling interval of 1/44100 seconds the wave lasted less than 0.000092 seconds. Multiplying this by the speed of sound (340.9 m/s) gives you 0.03 m, which is 30 mm.
- This would never have reached the recording mic unless it was connected to the instrument
- If it was on the instrument and recorded, you would still have to sit as close as 30 mm from the speaker to listen to it when recreated.
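
For anyone who wants to check the arithmetic in this post, here is a small Python sketch. It only redoes the sums quoted above: the 1/(2f) interval, the 1/44100 s CD interval, and the 340.9 m/s speed of sound all come from the post, while the function and variable names are my own.

```python
SPEED_OF_SOUND = 340.9   # m/s, the figure used in the post
CD_RATE = 44_100         # samples per second


def nyquist_interval(freq_hz):
    """Longest allowed gap between samples for a tone at freq_hz: 1 / (2 f)."""
    return 1.0 / (2.0 * freq_hz)


# Required spacing for a few of the frequencies mentioned.
intervals = {f: nyquist_interval(f) for f in (20, 21, 20_000)}
for f, dt in intervals.items():
    print(f"{f:>6} Hz -> one sample every {dt:.6f} s")

cd_interval = 1.0 / CD_RATE                   # actual spacing on a CD
four_intervals = 4 * cd_interval              # duration of 4 sample gaps
distance_m = four_intervals * SPEED_OF_SOUND  # how far sound travels in that time

print(f"CD spacing:        {cd_interval:.6f} s")
print(f"4 CD intervals:    {four_intervals:.6f} s")
print(f"distance covered:  {distance_m * 1000:.1f} mm")
```

The CD spacing is shorter than the tightest requirement in the table (the one for 20 kHz), which is the "more measurement points than you actually need" remark in numbers.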
 
Arun - there are two issues with this. Assumption 1: that the wave is true sine wave. Which in reality may not be. Assumption 2: We don't have the width of the wave. All we have is amplitudes taken at determinate time gaps.
1. One of the basic principles of this area of physics/math is that any given wave can be expressed completely accurately as a sum of perfect sine waves. Actually Rajagopal's explanation was a much better one.
2. You are correct, you don't have the width from a single sample. Which is why we take so many samples. The theorem says this sample rate is mathematically enough.

But perhaps a non-math way of thinking (there I go again!) is that in the case of a 20 kHz wave, two successive samples allow us to detect the peak and the trough and let us figure out that the wave was indeed 20 kHz (and thus allow our math to draw it). With a slightly longer 19 kHz wave, two samples are not enough, and no, none of our samples hit the peak or the trough. However, comparing successive samples (3, I think, in this case) allows us to detect the slope changes, and we can basically use this technique to still draw the 19 kHz wave correctly. This is enough information, and we have the math to do this reliably and accurately.
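
Here is a hedged Python sketch of the 19 kHz case described above, since it is easy to try. Assumptions that are mine and not from the thread: a 44.1 kHz sample rate, a unit-amplitude sine, a record of 4096 samples, and sinc interpolation as the "math" that redraws the wave. It looks at five consecutive samples, none of which lands on a peak, and then rebuilds the waveform between them.

```python
import numpy as np

fs = 44_100.0          # assumed sample rate in Hz
f = 19_000.0           # tone frequency in Hz (below fs / 2)
n = np.arange(4096)    # sample indices
samples = np.sin(2 * np.pi * f * n / fs)

# Look at a few consecutive samples in the middle of the record.
window = slice(2048, 2053)
max_sample = np.max(np.abs(samples[window]))

# Rebuild the wave between those samples on a grid 100 times finer,
# using Whittaker-Shannon (sinc) interpolation over the whole record.
fine_n = np.arange(2048, 2052, 0.01)            # fractional sample positions
kernel = np.sinc(fine_n[:, None] - n[None, :])  # np.sinc(x) = sin(pi x) / (pi x)
rebuilt = kernel @ samples
rebuilt_peak = np.max(np.abs(rebuilt))

print(f"largest raw sample in the window: {max_sample:.3f}")
print(f"peak of the rebuilt waveform:     {rebuilt_peak:.3f}")
```

The largest raw sample in that window falls short of the true amplitude of 1.0, while the rebuilt waveform gets very close to it, so the peaks are recovered even though they were never sampled directly. The small remaining gap is down to the finite record length and the finite grid.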
 
You are correct, band limiting between 20 and 20K Hz requires you to measure every 0.025seconds for a 20 Hz wave, 0.024 seconds for a 21 Hz wave, ...., 0.000025 seconds for a 20000 Hz wave. when you are measuring every 1/44100 seconds i.e. you are measuring every 0.000023 seconds.

Effectively giving you more measurement points than you actually need for any wave between 20Hz and 20KHz.

If the wave is so short that you can't fit 4 intervals then given the sampling rate of 1/44100 seconds the wave lasted less than 0.000092 seconds. Now multiply this by speed of sound (340.9 m/s) gives you 0.03 m which is 30mm.
- This would never have reached the recording mic unless it was connected to the instrument
- If it was on the instrument and recorded, you would still have to sit as close as 30mm from the speaker to listen to it when recreated.

Yes, you will be able to record it. No need to keep the mic within 30 mm of the instrument. A 15 kHz wave is 22 mm long. We are able to hear it without a problem.

Once the wave is created, even if it's 30 mm long, it will keep on moving till its energy is all depleted by pushing air molecules.
 
yes, you will be able to record it. No need to keep the mic within 30 mm of instrument. A 15Khz wave is 22 mm long. We are able to hear it without a problem.

Once the wave is created, even if its 30 mm long, it will keep on moving till its energy is all depleted by pushing air molecules.

There is a reason why you have mics almost inside the bell of a sax or right next to the cymbals and hi-hats of a drum kit.

You are forgetting the time scale. One instance of the 15 kHz wave will not be audible, as it will die before it reaches the mic. It has to be played for a long time, or the player has to sustain the note till it reaches the mic, meaning longer on the time scale, or multiple perfect sine waves for the note (more than one crest and trough). Remember, the sampling is for each pair of crest and trough. Hence again the summation.
 
I think we should get right perspective on mathematics and reality.

So, all that scientists do is to model reality in terms of mathematics. And then manipulate that reality using that mathematical understanding. Hence, we can verify that A to D to A is accurate in theory or not; but this is because we are able to do measurement of some underlying principle, and we believe that measurement captures the ultimate reality. Perhaps it does, and perhaps it doesn't!

Things like i (i.e. the square root of -1), which are an integral part of the mathematics used for digital audio, don't have an equivalent concept in reality that we can understand. And quantum aspects of reality make it still more bizarre. They make the observer influence this reality and how it unfolds...

One hit on head, and this man started seeing world in fractals. What can explain that?
A Beautiful Mind: Brain Injury Turns Man Into Math Genius

Life is indeed mysterious!
 
There is a reason why you have mics almost inside the bell of a sax or right next to the cymbals and hi-hats of a drum kit.

You are forgetting the time scale. One instance of the 15 kHz wave will not be audible, as it will die before it reaches the mic. It has to be played for a long time, or the player has to sustain the note till it reaches the mic, meaning longer on the time scale, or multiple perfect sine waves for the note (more than one crest and trough). Remember, the sampling is for each pair of crest and trough. Hence again the summation.
Yes, you need multiple to hear it, because it's our hearing. But a sensitive mic will be able to record it if there is enough amplitude, and electronic monitors can show that. Agree with the mic placement part too: they keep the microphone near instruments because they want to record without any external influence, and to capture even the low-amplitude signals. A mic 5 ft away can still record a drum beat, but not all the sounds created by it, compared to one placed right next to it.

About samples at crest and trough - there is no guarantee those samples will be taken at precisely crest and trough. Mainly because the sampling does not adjust itself as per the incoming frequency. Sampling happens at fixed duration and it just captures whatever the amplitude is. At that point, it will not be necessarily peaks and troughs but data anywhere on the wave.
 
Yes, you need multiple to hear it, because its our hearing. But a sensitive mic will be able to record it if there is enough amplitude and electronics monitors can show that.

It's not our hearing; this is where acoustic impedance comes into the picture. You need multiple waves for it to reach your ear to hear. The longer the wave sustains, the less is lost to acoustical resistance/impedance/friction, and the longer and clearer you hear the note. If it doesn't sustain enough to reach your ear, you don't hear it. As you say, "if there is enough amplitude": there has to be enough of it for you to hear, meaning the SPL has to be high, which is effectively a very large amplitude, or it has to be long by repetition. Basically it has to be long on the time scale, either as a single wave or multiple; usually it's the latter, because instruments simply don't have SPLs that large at such high frequencies.
About samples at crest and trough - there is no guarantee those samples will be taken at precisely crest and trough. Mainly because the sampling does not adjust itself as per the incoming frequency. Sampling happens at fixed duration and it just captures whatever the amplitude is. At that point, it will not be necessarily peaks and troughs but data anywhere on the wave.
I don't think you understand this part: the measurement points don't have to be at a crest or trough. There have to be 4 and they have to be 0.000025 seconds apart, that's all. Consider one sine wave. The shortest will be for 20 kHz; it requires a measurement every 0.000025 seconds and you need 4 measurements, so the wave has to sustain for 0.0001 seconds to be sampled perfectly. In comparison, it is the 20 Hz wave which should be difficult to capture, because it has to sustain for 0.1 second for you to get 4 measurements, but 0.1 second is still a very short time and it is easily captured.

Natural instruments sustain notes for well more than 0.1 seconds.
 
I don't think you understand this part, the points of measure don't require them to be crest of peak.
That's what I was saying, but I understood from your post that you mean to say we need data on peak and trough. Maybe we are not communicating properly.
There have to be 4 and they have to be 0.000025 seconds apart thats all. Consider one sine wave, the shortest will be for 20Khz, it requires a measurement every 0.000025 seconds and you need 4 measurements, then the wave has to sustain for 0.0001 seconds to be sampled perfectly. in comparison it is the 20 Hz wave which should be difficult to capture because it has to sustain for 0.1 second for you to get 4 measurements, but 0.1 second is still too short a time and is easily captured.

Natural instruments sustain notes for well more than 0.1 seconds.
You mean to say "20 Hz wave which shouldn't be difficult to capture"? Compared to 20 kHz? Because a 20 Hz wave will linger longer compared to 20 kHz. A 20 Hz wave is 17 m long compared to 17 mm for 20 kHz.
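
The wavelengths quoted in this part of the thread (17 m, 17 mm, and the 22 mm for 15 kHz a few posts back) follow from wavelength = speed of sound / frequency. A quick Python sketch to check them, assuming a speed of sound of 343 m/s (roughly room temperature; that value and the function name are my assumptions, and the 340.9 m/s used earlier in the thread gives nearly the same results):

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def wavelength_m(freq_hz):
    """Wavelength in metres of a sound wave at freq_hz."""
    return SPEED_OF_SOUND / freq_hz


for freq in (20, 15_000, 20_000):
    print(f"{freq:>6} Hz -> {wavelength_m(freq) * 1000:.1f} mm")
```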
 
That's what I was saying but I understood from your post that you mean to say we need data on peak and trough.

We need to be able to derive (and later redraw) the peaks and troughs from the sample data. That is achieved by mathematical functions.
 
This has turned out to be a very interesting and thought-provoking thread, with a lot of useful and patently knowledgeable contributions, way beyond what I expected when I started it.
Much obliged for these.
There have been a couple of weird one-off posts, but that is much better than par for the course for such forums! My experience is that if no one responds to them, the thread tends to remain closer to the topic.
Thought-provoking is always good to test and expand what one knows. With just this in mind, I have started another thread that should be just as much fun as this one, I imagine :)
http://www.hifivision.com/amplifiers/52267-determinants-amplifier-performance.html
 
Here is my take:

Given a set of points, there are infinite number of curves that fit those points.
As I understand it so far (and "so far" is an important qualification) if that is true, then Mr Nyquist was wrong.

Mr Nyquist's theorem makes digital music possible.

Mr vikoma's theorem makes digital music impossible.

Let me hurry to add that, for anything other than a simple curve (remember my metal strip from earlier in the thread?) I'd be inclined to agree with you. My pencil could connect those points in infinite ways. But my metal strip couldn't, and I still suspect that that is the crux of this thing.

And, I bet that one or more of us is listening to digital music right now, so it is possible. On this basis, I'm inclined to think that Mr Nyquist must have got it right, which leaves me the problem not of saying that he didn't, but of trying to understand how. I'm probably going to hit a brick wall at FFT, but ...I'll try.

And also consider that reality need not be 3-dimensional, but multi-dimensional. But when we put limits/conditions - such as min of 20 hz and max of 20khz, then that number reduces, and hence representable in limited amount of information.
We don't put 20-20k as a limit: our ears do. And that is at best. Nature thus reduced the amount of information we can cope with ---but it's ok, because the voices, the instruments, the composers, and so on, live with the same restrictions, so everything works out in the end :)



Thad,

Do understand one thing, I am all for digital.
Manoj, I know you are all for digital. I think everybody in the conversation is. We are bandying words about in the hope of getting them better and better each time. It helps us all.
But my gripe is with the last bit, when people say only twice the highest frequency is needed to get an accurate curve. That is based on an assumption that the incoming wave is a perfect sine wave.
Again, if this stuff only worked for sine waves, it would only work for sine waves. But it doesn't only work for sine waves: it works for music, and we have the evidence of that.

I am repeating here so many times - Nobody has said twice the frequency is the max needed. The theorem as applied says min 4 is needed. That's all I am trying to say. Will more make it better? Who knows. But nobody is saying only 4 sample per frequency is all one needs to reconstruct the wave completely.

I think it has been answered that no, more won't make it better, and it might even make it worse. I've been away from the net for a whole day, and have a page of posts to catch up on...

EDIT... now I see this post is a page or so behind the times...
 
Discrete v continuous

This has turned to be very interesting and thought provoking thread, with a lot of useful and patently knowledgeable contributions, way beyond what I expected when I started it.
There has been a lot of discussion about this here.
Just to keep the kettle boiling, allow me to throw in Zeno's paradox into the mix:).
He would argue that a vibrating tuning fork can be shown to not be moving at all.
Any relevance to the post subject?!
 