Reflections on talk by "Father of Perceptual Audio Coding"

ajinkya

Last week, I was lucky enough to attend a seminar given by James Johnston (JJ), who is presently the Chief Scientist of DTS, Inc. He has been called the father of perceptual audio coding for his pioneering contributions, including work that went into the MP3 standard, which revolutionized digital audio. His accomplishments during a 26-year career at AT&T Bell Labs have, among other achievements, allowed for the distribution of digital music and digital radio over the Internet. For more details, I've put a brief bio at the end of this post.

This post may become long and rambling at times, since I want to write down everything I remember from the seminar before it is lost to the annals of time and memory. I want to share whatever I gleaned from his talk with fellow forum members, since there were many interesting things I was exposed to during that one hour.

JJ gave his talk in a conference room full of some of the best people in the perceptual audio (and music) circle. That's why he started off by apologizing to all the musicians in the room for helping invent MP3. He then went on to elaborate on the science and art of acoustics, human hearing and digital music. Here's the synopsis:

1. If you want to hear live music, Be There!

JJ explained analytically the acoustic wave equations that result from a musical concert in small to large halls. I will gloss over the mathematics for brevity, but the short point is, they are exceedingly complex! Concert halls (or any room for that matter) set up multiple modes of resonance and patterns of constructive and destructive interference, all particular to the room geometry and materials. If you want to capture the acoustic experience and reproduce it from a system, theoretically you would need infinite channels to capture spatial and time information completely. For the mathematically inclined, the infinite assumption can be relaxed if you sample the spatial dimension at the Nyquist rate, based on the highest frequency produced during the performance. Then you have to sample at the Nyquist rate in time, at each spatial point as well. For the non-engineers, in English this means we need a lot of microphones (mics), placed in a lot of places in the hall, all sampling fast enough in time to be able to capture and accurately reproduce the original musical signal. This is a practical impossibility for most recording studios today, and no current recording medium could hold that much information.
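To put rough numbers on the spatial sampling argument, here is a back-of-the-envelope sketch in Python. It is only an illustration: the speed of sound (343 m/s), the 20 kHz upper limit and the 30 x 20 x 10 m hall are my own assumptions, not figures from JJ's talk. Spatial Nyquist sampling needs at least two sample points per shortest wavelength, so the largest allowed spacing is c / (2 f).

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 C


def max_mic_spacing_m(highest_freq_hz, speed=SPEED_OF_SOUND_M_S):
    """Largest mic spacing that still gives two samples per shortest wavelength."""
    wavelength = speed / highest_freq_hz
    return wavelength / 2.0


def mics_needed(hall_dims_m, highest_freq_hz):
    """Count the grid points needed to fill a box-shaped hall at that spacing."""
    spacing = max_mic_spacing_m(highest_freq_hz)
    count = 1
    for length in hall_dims_m:
        count *= math.ceil(length / spacing) + 1
    return count


if __name__ == "__main__":
    # Hypothetical hall dimensions, chosen only to show the order of magnitude.
    hall = (30.0, 20.0, 10.0)
    print(f"spacing at 20 kHz: {max_mic_spacing_m(20_000.0) * 1000:.1f} mm")
    print(f"grid points for the hall: {mics_needed(hall, 20_000.0):.2e}")
```

Under these assumptions the spacing comes out at under a centimetre and the count runs into the billions, which is why this can only ever be a thought experiment.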

This is also a theoretical impossibility, for the following reasons. The sound field is not uniform over the concert hall (obviously). The sound field is also not uniform between points separated by just 3-4 inches (non-obvious). This is again because of the complex relationship between the wavelengths of the music and the geometry of the concert hall. JJ pointed out a phenomenon which can be observed if you go to concerts regularly. I don't, so forum members who do can verify or comment on this. JJ said that most people will move their heads slightly from side to side during the performance to adapt to the changing soundfield. So, if a listener likes bright sound but the soundfield suddenly becomes diffuse around their head (for some frequency combinations), they will try to find the best local sweet spot by moving their head slightly till they are again happy with what they hear. Which means that the soundfield changes enough, even in the small space around a listener's head, for there to be perceptual differences at different points around the head region. Of course, JJ is an engineer and so he showed us measurements of this phenomenon as well. The point is, in order to accurately reproduce the musical experience for a single listener at some theoretical central point in the hall (maybe in an equilateral triangle from the musicians' stage), you would need mics placed in a hemisphere around the head, at very small spatial intervals. And, in something like Heisenberg's principle, the act of measuring will affect what is measured. In English, if you place so many mics, they will absorb sound energy (in order for the system to capture and reproduce the signal), and so the presence of mics will fundamentally affect the soundfield and change the very sound that the mics are trying to capture.

The whole point of the above discussion is that you cannot capture the complete physical experience of being in a concert hall just by placing a small number of mics on the soundstage and around the hall. The effect of the hall is invariably lost because of the physics of the acoustic phenomena. Which is also why stereo sound CANNOT capture the live music experience, no matter what recording techniques you try. Real acoustics are all around your head, from every direction. There is no front soundstage in isolation. And we do not have infinite bandwidth to store and capture all this information. In a nutshell, if you want to hear the concert exactly the way it sounded live, then be there when the concert is being played live.

2. A Centre channel centres the soundstage

JJ pointed out a series of interesting experiments carried out in 1934-36 where the effect of placing a centre speaker was measured on a large number of people. I don't recall the names of the experimenters now but if someone does, feel free to tell me. In each trial, the person could get a better sense of depth, localization of the front stage and slight height perception. Overall, the front soundstage was judged much better than with two (left-right) speakers. The interesting thing was, this was true for music as well as movie dialogue. JJ sadly shook his head and said that the audio industry had not learnt a lesson we have known for so many years. They still don't incorporate a centre speaker as standard. According to JJ, the centre channel is as important for music as the left-right pair, and not just for voice. The centre fixes the soundstage and conveys depth and distance (difference) information to our ears, which helps our brain form an impression of the placement of the musicians much more easily than with only two channels. As an engineer, this makes sense to me, assuming the information from the centre channel is correctly captured. JJ advised people to give as much care and attention to the centre speaker as they would to the stereo pair. Again, the message was clear: two-channel sound is not the be-all and end-all. At a minimum, three channels are needed for accurate audio rendition of the frontstage.

A brief discussion of quadraphonic sound came up during this time. JJ was emphatic: quadraphonic was a horrible, terrible mistake. All it did was take information from the stereo channels and add it as delayed sound to the rears. There was no centre channel and no new information being conveyed by the rear channels. People would ask, what exactly am I paying extra for? It was a flawed approach to begin with.

3. Hear Ye, Ear Ye

Our body is a wonderful piece of organic machinery. Our sensory organs have evolved to be sophisticated yet subtle conveyors of complex information to our brain, which then sorts and manages all these data streams before making a decision. JJ covered human hearing in five slides. He helpfully mentioned that it takes a new student at least a semester to understand what he had just spoken about. So I won't confuse forum members by writing about the details of the human auditory system beyond a few points that are experimentally verifiable. The theory behind the ears is very complex, most of it still unknown, evolving and open to new research. In fact, there was an interesting exchange between JJ and a professor of perception during these slides. JJ said, "And so this is what happens and we don't really know why." The professor said, "Well, now we do and I'll explain it to you after the lecture." JJ said, "No, now you think you know why and you're not sure yet if your theory is completely correct. I'll still be interested in listening to you while remaining a skeptic." This was typical of the question-and-answer exchanges which went on during the talk. The interesting point for me was that these were all excellent people in their field but they agreed to disagree while maintaining their faith in their understanding of the phenomenon. A lesson for all of us on the forum while arguing about our respective audio systems?

Back to the ear. The important results (for us) after analyzing what goes on in the ear are the following:
1. If you listen to something differently, you will REMEMBER different things. This is not an illusion.
2. If you have reason to assume things may be different, you will most likely listen differently. Therefore, you will remember different things.
This has been taken from this presentation online, which is an expanded version of the 5 slides JJ showed us: www.aes.org/sections/pnw/ppt/jj/hashighlevel.ppt

Which again goes to show that unless listening tests are conducted blind and without bias, there is a high chance of the listener hearing something that is not in the audio source. But s/he will hear it as reality. They are not lying, but the test has failed since non-audio parameters have marred its accuracy. It is the same with taste and smell, which are strongly associated. When you smell an apple and then bite into it, you will get an apple taste. If you close your nose completely and don't know what you're biting into, the apple may taste like a raw potato. When doing a listening test, try to isolate all bias and prejudice. This is almost humanly impossible to do. Which is why I now understand tests where listeners could not differentiate between two amplifiers of widely different price points in a blind test but could say which sound they preferred once they listened to them separately, non-blind. They heard differently because of their inherent bias in what they liked. And they truthfully reported what they heard. The fault was with the non-blind second test, not the listeners.
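For anyone who wants to score a blind comparison at home, one common approach (my own addition, not something JJ presented) is to run a fixed number of blind trials and ask how likely the listener's score would be from pure guessing. A minimal sketch in Python, assuming each trial is a 50/50 guess when no difference is audible:

```python
from math import comb


def guess_probability(correct, trials):
    """Chance of scoring at least `correct` out of `trials` by coin-flip guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # Hypothetical scores out of 16 blind trials.
    for score in (9, 12, 14):
        print(f"{score}/16 correct: p = {guess_probability(score, 16):.4f}")
```

Getting 12 out of 16 right would happen by luck a little under 4% of the time, so a score like that is decent evidence of a real audible difference, while 9 out of 16 is indistinguishable from guessing.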

I am going to end the first part of this post here (it's long enough already) and write up the second part in a few days. I'll keep editing as I recall any missed points. Here is JJ's bio for those interested:

He received the BSEE and MSEE from CMU in 1975 and 1976 and since then has worked at Bell Laboratories, AT&T Research, Microsoft, and DTS on basic research in audio signal processing, audio perception, perceptual coding of both audio and video, acoustics, acoustic processing, and related subjects. He invented a number of basic techniques used in perceptual audio coding, especially in MP3, and MPEG 2 Advanced Audio Coder (AAC). In addition he has developed loudness (as well as smart intensity) models, room correction algorithms, loudspeakers, acoustic diffusers, array loudspeakers, microphone techniques, and a variety of other things that combine physical acoustics, human perception, and signal processing. These achievements have also influenced international standards for audio transmission, such as the MP3 standard, widely used in computer networks, and they are the foundation of the electronic music distribution business, including AAC players and jukebox systems.
 
I am going to end the first part of this post here (it's long enough already) and write up the second part in a few days.
Nice article Jinx. When is the second part due:)?
 
Thanks Santhol. Second part in Dec...when I have some free time to put down thoughts into words. I thought I could get it done this month but work is never-ending. :rolleyes:
 
Thanks a ton for that Ajinkya. I am eagerly waiting for the next instalment. They should have the facility to thank multiple times. That stuff about perception and non-blind tests and the brain actually "hearing" stuff based on non-auditory stimuli....is the kind of stuff I've been thinking about in my head for the longest time, and arguing also, on these forums (in a clunky and non sophisticated/scientific way i concede). Good to hear it from an authoritative source. I can now cite this when someone tells me that "blind tests don't work because in that test in 1987 they said a pioneer AVR was the same as a halcro amp".

I love the stuff about the centre channel as well. Thanks!
 
Kingfisher,
This was held as part of a symposium in a US university. I was attending a (unrelated area) workshop at the same time and got a chance to sit in and listen to his lecture. Unfortunately did not get time to go for this 2 day workshop, where he was going to explain hows and whys of compression, coding and details of the human hearing experience... :sad:
Maybe some other year, I'll get another chance
 
Psychotropic,
Thanks for the kind words. Yes, I had a suspicion that a lot of our hearing is based on emotions rather than acoustics, as well. This month (late) I will post the second part, where I actually had a talk with him about some issues. I hope my memory retains most of the dialogue...memories are also based very much on emotions, rather than just the act of the brain recording the event :)

Ajinkya.
 
Part II

This is the second and final part of my reflections on the talk given by James Johnston (JJ) late last year. I apologise I could not upload it in December as promised. And sorry for the long post but there was a lot of information to put in. Happy reading!

4. Is 5.1 enough for Home Theatre?

JJ talked about how cinema halls (THX or Dolby certified) create the sound field for movie audiences. Specifically, he explained why 5 channels may not be enough to recreate the complete movie experience.

As I mentioned in Part I, as far back as the 1930s, Fletcher, Snow, and others showed that 3 front channels provide a much better depth illusion than stereo. Why does this effect happen? Because at half the angle (between the centre and each stereo speaker), the time delay is smaller (moving the interference up in frequency) and the higher frequency is more highly attenuated (reducing the size of the timbre change). There are other reasons to have at least 3 front channels. With 3 front channels, and using nearly coincident miking techniques, it is possible to create a good illusion of the front soundfield without having depth or elevation problems. Aside from avoiding the problem of interference, non-coincident 3-channel miking has other good effects, specifically a wider listening area, measured by subjective tests to be 6 to 8 times the listening area available in the 2-channel presentation. This is as far as direct sound is concerned.
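The "smaller delay moves the interference up in frequency" point can be illustrated with a very crude model. When the same signal comes from two speakers, each ear receives two copies slightly offset in time, and the first cancellation falls where that offset equals half a cycle, i.e. at 1 / (2 x delay). The sketch below is my own simplification, not JJ's analysis: it assumes straight-line paths, ignores head shadowing entirely, and uses an assumed 0.17 m ear spacing and 343 m/s speed of sound.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed
EAR_SPACING_M = 0.17        # assumed rough distance between the ears


def arrival_gap_s(angle_a_deg, angle_b_deg):
    """Difference in arrival time at one ear for the same signal from two speakers.

    Angles are measured from straight ahead; far-field, no head shadowing.
    """
    half_head = EAR_SPACING_M / 2.0
    path_a = half_head * math.sin(math.radians(angle_a_deg))
    path_b = half_head * math.sin(math.radians(angle_b_deg))
    return abs(path_a - path_b) / SPEED_OF_SOUND_M_S


def first_notch_hz(angle_a_deg, angle_b_deg):
    """Lowest frequency at which the two arrivals are half a cycle apart."""
    gap = arrival_gap_s(angle_a_deg, angle_b_deg)
    return math.inf if gap == 0.0 else 1.0 / (2.0 * gap)


if __name__ == "__main__":
    print(f"stereo pair at -30/+30 deg: {first_notch_hz(-30.0, 30.0):.0f} Hz")
    print(f"centre + left at 0/+30 deg: {first_notch_hz(0.0, 30.0):.0f} Hz")
```

With these assumed numbers, the first notch sits near 2 kHz for a conventional stereo pair and near 4 kHz for a centre-plus-side pair, which is the direction of the effect described above. Real heads and rooms will shift the exact figures.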

What about all the reverberations and reflections going on in a typical movie scene? Again, empirical research shows that two back channels can provide a good sense of reverberation and depth to the rear of the listener. The same two channels can create, if they are at the side of the listener, a good sense of side-to-side envelopment. This again has to do with the fact that sound is heard differently when it arrives directly from the side, as opposed to from the rear, because of the shape of the outer ear and what sound waves it blocks in each direction. Which means, counting 3 front, 2 side and 2 rear channels, one needs at least 7 channels/speakers to recreate a reasonable rendition of the movie scene.

That's not the end of the story. What about the ".1" part? Well, it seems that even here, the listener benefits from more speakers. And the reason is the following: although it is true that one cannot localize low frequency signals (below about 90 Hz), it turns out that our brains can detect spatial effects from inter-aural phase differences down to about 40 Hz. For the layman, this means that while one sub will do the job of enhancing the low frequencies in movies, two or more subs *placed correctly* will allow a more uniform spread of the bass, which can be heard and felt by the listener. JJ mentioned a disc, the AT&T Labs Perceptual Soundfield Reconstruction, which contained a very nice example of these effects, and how they can change boomy bass in the 2-channel case into bass spread about a room in the 5-channel case. Sadly, this demo is no longer freely available, but if you have some friends in the recording industry, maybe one of them has it or knows where to get it from. A typical movie theatre also has active equalization over all its speakers so that listeners in different parts of the room hear a balanced volume level during any scene. If this is not done correctly, you will get "sucked into" the speaker closest to you. This effect simply means that the speaker(s) closest to you will dominate the sound you hear and the spatial experience will be ruined.

To sum it up, a good, certified movie theatre does a lot more than simply place x number of speakers at the correct positions in a large listening area. There is a lot of room treatment, active volume levelling and engineering rules of thumb that go into making your movie experience larger than life. Or in this case, larger than your home theatre experience. And here's what JJ (and by extension DTS) thinks about the future of movie and music reproduction (paraphrased from dim memory by me): "Modern recording methods are just now to the point where we can capture a large number of channels in an accurate, clean fashion, and either transmit them or process them for transmission as fewer channels. This should, with associated hardware, processing, and authoring capabilities, give us the ability to provide appropriate direct and indirect signals to the listener, providing a more immersive experience, and one more like (for a live recording) the original venue."

At the end of JJ's talk, I actually got a chance to interact with him personally, after most people had left. Instead of writing it as a narrative, I thought I'd try to recreate the actual conversation that took place, reproduced as accurately as possible from memory. Hopefully this will give the reader a sense of the ambience and how the information exchange took place. AJ stands for me, Ajinkya, and JJ stands for, well, JJ.

5. The Recording Engineer is also an Artist

AJ: Thanks for a great talk. I really learned a lot today.

JJ: You're welcome. What more did you want to know, then?

AJ: First question, and one that's been burning me up for a long time. I keep hearing many of the audio enthusiasts talk about how an ideal audio system should be transparent, how it should reproduce music exactly like the artist intended. But from your talk, it seems to me that the recording process, the sound engineer, the room acoustics, the mic and equipment, all play an equally (if not more) important role in the entire listener chain. So, to put it simply, when I buy a CD, or vinyl, or DVD, who and what am I hearing? Is it really the artist only?

JJ: Good observation! And here's my answer in short. You're hearing what the recording engineer wants you to hear. End of story.

AJ: (perplexed, saddened, enlightened) But... but then, where does the artist and his/her music come in?

JJ: A good artist knows how to create good music. A good and commercially successful artist knows how to create music and also whom to trust to recreate it for the listener. In other words, once the artist creates their piece, they choose a good recording engineer, whose tastes and musical beliefs match their own, and trust that he will do a good job of conveying to the listener what they want to convey through their music. All successful bands (I think he named Dire Straits, Led Zep, and a few others I cannot recall) don't interfere in the recording process beyond a point. All the bad bands try their hand at recording as well, and usually they make a royal mess of it. The final recording comes out bright and harsh and noisy... not good listening.

AJ: So then the recording engineer is the artist for the listener?

JJ: Yeah, in a way, that's exactly right... yeah.

AJ: Thanks a lot. That clears up something that had been bothering me for a long time.

JJ: Here's another thing about the recording process you should think about. Heard of Chesky?

AJ: As in Chesky Records? Sure, I have a few of them, DVD-A and CDs. Always have pristine sound quality.

JJ: Damn right. And you know what makes David and Norman (JJ knows the Chesky brothers personally) sound so good each time?

AJ: They choose good music and artists?

JJ: Try again.

AJ: Ummm... they choose good recording studios and equipment?

JJ: Last try.

AJ (feeling like he was giving a qualifying exam): They choose good recording engineers?

JJ: Partly true... but here's the secret. They choose acoustically excellent recording venues.

AJ: Ahhhh

JJ: Yep. Go through their collections and see where each one was recorded. It's always a place with a great sound signature... a church that is not too reverberant, a concert hall that doesn't have parabolic reflecting mirror walls. All their venues have beautiful sound timbre. They know how to pick 'em. And of course, all the other reasons you mentioned play a large part too. But... location, location, location is important... and not just for real estate!

AJ: Interesting.

JJ: Let me tell you about a band we once recorded for. Jokers wanted to perform in an aircraft hangar... live. Maybe they took some banned substance, I don't know. We told them acoustically, it was a bad idea. But they were like, our fans deserve it. We said, no... no one deserves that. Anyway, we went ahead and rigged it up. Omni mics overhead, front stage mics, the whole works. And we captured the whole performance live, accurately. The reverbs, echoes, screaming fans, everything. Sounded ok live. Then we played it back over our studio system. Sounded like my neighbour's cat being strangled... over three days... while my other neighbour's dog was being beaten to death with a hammer... for a week. God, it sounded awful! All those reverbs became shrill when played back. What could we do... the original venue was acoustically horrible. So we had to heavily equalize it. Killed some surround information... basically killed off all the ambience. Then it became tolerable to listen to and you could hear the music. Moral of the story is the fans didn't get the pristine, live, accurate music event. They got our interpretation of it. Which is a good thing, otherwise the band would be long dead... killed by a fan who was subjected to the live thing.

AJ: Hehe... I understand now. It's an intricate chain, isn't it, this whole audio reproduction process.

6. Vinyl and CD, Audio and Bose, God and Satan, Good and Evil... and so on

AJ: A final point. Your thoughts on vinyl.

JJ: What about them?

AJ: Do you listen to vinyl?

JJ: I prefer SACD myself (wicked grin). But a lot of my friends like the vinyl sound. See, it's not about which medium is better or worse. Both are good at what they do. It's basically about what your listening tastes are. Vinyl is for people who prefer a softer, more organic sound. Is it a more accurate sound? No, nor is CD. As we've just discussed, the whole audio chain loses information so what you're hearing at home is in no way going to be the same as that recorded or performed live. It can come close, but never the same.

AJ: Then why do I hear so many people claiming CDs are metallic, they can never sound natural, and so on?

JJ: See, that was true in the early days, up to a point, when the CD manufacturing, recording, and playback technology was in its infancy. It is in no way true today. CD (and now SACD and newer formats) has all the fidelity that vinyl does. It's a matter of listener preference which medium they finally choose. I have been a fan of SACDs for a while but that's because I prefer their sound signature and convenience. I also like listening to vinyl, whenever I visit my friends who have records.

JJ gets up and goes to the drawing board. Starts drawing the frequency spectrum of some well-known albums on CD. Most graphs are heavily biased on the lower and higher frequency ends and droop down in the middle, some sharply, some less so.

JJ: Here's another trick which recording companies play to feed into this vinyl vs CD vs SACD perception. Many labels artificially compress and equalize the sound levels on CD under the mistaken notion that consumer listeners are attracted by loud sound and loud means better. This is also pushed by some artists who want their albums to sell. So the CD material is already compromised during the recording itself. See these spectrums? I measured them myself and this is how they look. How is it ever going to sound natural?

AJ: So why doesn't the same thing occur on vinyl?

JJ: Because the medium physically does not allow equalization and compression beyond a point. If the recording engineer tried to do what he does with CD, the recording stylus would jump off the LP or cut grooves a mile deep (JJ likes to emphasize his point)... you don't have the headroom to mess around with LP like that. So LP ends up sounding natural since in most cases, the recording process itself is true to the original master.

AJ: That makes a lot of sense.

JJ: Another trick that really burns me up. I bought this SACD, see, with both a CD layer and the high-fidelity version. The label was trying to show how much better SACD is. Which is fine, if they had the same recording quality on both layers. Which they did not! The CD layer (again I measured it myself) was horribly compressed compared to the SACD layer. Of course it was going to sound worse! That's not a fair test. And not what I expect from a label of that repute. My point is, pay attention to the recording before blaming the medium or the playback system. Sometimes, it's the biggest culprit of bad sound.
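JJ did not say how he measured the compression, so the following is only my guess at one simple check: the crest factor, i.e. the ratio of peak level to RMS level in dB. Heavily compressed or limited material tends to have a lower crest factor than the same music left alone. The sketch is plain Python, and the test tone and the hard limiter in it are hypothetical stand-ins rather than anything from a real master.

```python
import math


def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB; heavy compression or limiting pushes it down."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("signal is silent")
    return 20.0 * math.log10(peak / rms)


def hard_limit(samples, ceiling):
    """Crude stand-in for a limiter: clip everything beyond +/- ceiling."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


if __name__ == "__main__":
    # Hypothetical one-second test signal: a 440 Hz tone that swells and fades.
    rate = 44_100
    tone = []
    for n in range(rate):
        envelope = 0.2 + 0.8 * math.sin(math.pi * n / rate) ** 2
        tone.append(envelope * math.sin(2 * math.pi * 440 * n / rate))
    squashed = hard_limit(tone, 0.3)
    print(f"original: {crest_factor_db(tone):.1f} dB")
    print(f"limited:  {crest_factor_db(squashed):.1f} dB")
```

To compare two layers or two pressings of the same album, you would feed the decoded samples of the same passage from each into `crest_factor_db` and look at the difference; a noticeably lower figure on one of them hints at the kind of squashing JJ was complaining about.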

At this point, I remembered the numerous posts on HFV, where someone had mentioned that speaker X made the recording sound shrill compared to the way they remembered hearing it in their childhood. And after a series of investigative posts and back and forth, the original poster tried the same recording on another speaker type, with the same result. It always turned out that the recording (and our nostalgic memory of it) was the culprit, not the speakers.

JJ: Here's another interesting observation about people's tastes. I have two recording engineer gurus. (Name forgotten) really likes direct, crystal clear sound... records for bands like (some rock band names... my memory needs more gigabytes). He's the type who sits like 6 inches from the monitors while recording. So he prefers Genelec (?) and those types. Clear, sharp, hearing every detail. And that's how the final CD comes out as well. Go pick up a copy and hear what I'm saying. This other guy (name again forgotten) likes the diffuse sound, works with acoustic instruments and voice a lot, he's the fellow who did (some good group). He's the one you'll find sitting all the way back in the concert hall, away from the stage. And that's how his recordings come out... sounding like ol' whiskey and rum... diffuse but melodious. He prefers listening to music through Bose speakers.

AJ: Haha... sorry, I thought you just said Bose.

JJ: Yes, the original 901s as a matter of fact. That's what he likes, that's what his ears tell him. Don't give me that look (catches look of incredulity on AJ's face)... Not all Bose speakers are bad. Some of their earlier versions were actually quite pleasant. Look, the point is not whether Bose cuts it or not... the point is, different people have varying tastes and they'll get whatever moves them emotionally. That's what I'm trying to illustrate.

I again think back to the numerous debates on HFV with Bose-bashing threads. I actually agree with some posts that the AM series is not a technically good sounding system, but then I have never heard a 901. Maybe they are good to hear for some people's tastes. The point I now understand is, not everyone hears the same music, so unless a speaker is really technically deficient (boom boxes or 1" drivers that claim to go down to 30 Hz), one really cannot pass judgment on what a person should listen to. I mean, this person is a well-respected, successful, financially stable recording guru... he can buy whatever his heart wants... and his heart wants Bose 901s. So be it.

AJ: James, I see other people waiting to talk to you. So I'll take your leave. Thanks again for all your time.

JJ: You're welcome.

AJ exits the room, his heart full of happiness and the light of life, his brain full of exciting new ways he's now going to listen to and evaluate his music collection... the sun is shining, birds are chirping...
Well, that's the way I wish it would have ended, anyway. What actually takes place is, AJ exits the room, bumps into his professor, who spends the next half hour going over AJ's work, and telling him why he doesn't stand a chance of graduating till the next presidential election...
 

Thank you Very much for ur reply.:)
 
Thanks a Lot Ajinkya for taking the pains to put it in such a readable format.
It was really very informative and so well put.

BTW Am sure your Professor was just joking
 