Pure Stereo vs. Ambiophonics
Quoted from the 3D3A Lab website at Princeton University, with comments by Ralph Glasgal
Ambiophonics is a method which relies on using [non-optimized] crosstalk cancellation filters on two front and two back loudspeakers to produce a wide stereo image, and a number of additional surrounding loudspeakers to give a sense of hall ambience by emitting the sound that results from convolving the stereo signal with the impulse response of a concert hall [most often a different hall than that in which the recording was originally made].
Not quite right as regards Ambiophonics, and a bit biased. Two speakers are used for the front 180-degree arc of the stage, and two rear speakers are optionally used to create the rear 180-degree arc for a rear stage if one has been recorded (as in some 5.1 movies). In the case of music, if the recording engineer has recorded the hall ambience in the 5.1 rear channels, then these two rear speakers can easily reproduce the rear hall ambience with full half-circle directional fidelity. Where one has only two-channel media such as CDs to play back, one should convolve the front channels with a concert-hall impulse response to generate a similar and quite realistic concert-hall ambient field through inexpensive surround speakers. Not always having the impulse response of the hall in which the CD was made is not as catastrophic as the comments above would have you believe. Many CDs are made in studios and really benefit from being played in a full Ambiophonic domestic concert hall, using non-critical PC convolvers and surround speakers placed wherever convenient. The assumption that the original hall, necessarily recorded live, will always sound better than the same or a different hall convolved from a 3D impulse response is a subjective opinion. Also, the Princeton method requires that such a live recording be made with a special two-channel microphone, and so it cannot be applied to the vast existing library of non-Princeton 2.0/5.1 media.
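A minimal sketch of the convolution step just described, assuming nothing about any particular convolver product: each front channel is convolved with a hall impulse response to produce a surround feed. The synthetic decaying-noise impulse response, the function names, and the `level` trim are hypothetical stand-ins; a real setup would load measured (ideally 3D) hall impulse responses and use an FFT-based PC convolver rather than this direct-form loop.

```python
import math
import random


def synthetic_hall_ir(rt60_s, fs, length_s, seed=0):
    """Hypothetical stand-in for a measured hall impulse response:
    Gaussian noise under an exponential envelope that falls 60 dB in rt60_s."""
    rng = random.Random(seed)
    n = int(length_s * fs)
    # 20*log10(exp(-decay * t)) reaches -60 dB when decay * t = ln(1000).
    decay = math.log(1000.0) / rt60_s
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * i / fs) for i in range(n)]


def convolve(signal, ir):
    """Direct-form linear convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # skip silent input samples
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def surround_feeds(front_left, front_right, ir_left, ir_right, level=0.3):
    """Convolve each front channel with its own hall IR to drive the surround
    speakers. `level` is a hypothetical trim relative to the front speakers."""
    surr_l = [level * x for x in convolve(front_left, ir_left)]
    surr_r = [level * x for x in convolve(front_right, ir_right)]
    return surr_l, surr_r


fs = 8000  # low sample rate keeps the pure-Python loops quick
ir_l = synthetic_hall_ir(1.8, fs, 0.25, seed=1)
ir_r = synthetic_hall_ir(1.8, fs, 0.25, seed=2)  # decorrelated from ir_l
click = [1.0] + [0.0] * 99
surr_l, surr_r = surround_feeds(click, click, ir_l, ir_r)
```

Using a different impulse response for each surround speaker keeps the two ambience feeds decorrelated, which is the point of driving separate surround speakers rather than one summed channel.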
RACE is indeed optimized in the practical sense that it is non-critical in adjustment, works in virtually any room with any speakers, produces a full circle of direct sound from ordinary 4.0 (5.1) DVDs or CDs with enhanced depth, clarity, and resolution, and accommodates several listeners. The tradeoff for these benefits is that it does not achieve ILD (interaural level difference) levels above about 10 dB. This should be kept in perspective, since in concert halls, movie theaters, or recordings of halls and theaters this level is rarely exceeded. RACE does deliver the full 700-microsecond range of ITD (interaural time difference) possible, and the optional rear speakers provide the proper pinna directionality cues that front-only speakers necessarily lack, despite the use of HRTFs to ameliorate this problem.
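The size of the ITD range mentioned above can be checked with Woodworth's rigid spherical-head approximation, ITD = (a/c)(θ + sin θ), for a distant source at azimuth θ between 0 and 90 degrees. The sketch assumes a head radius of 8.75 cm and a speed of sound of 343 m/s; real heads vary, so treat the result as an order-of-magnitude check rather than a measurement.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, air at about 20 degrees C (assumed)
HEAD_RADIUS = 0.0875     # m, assumed average adult head radius


def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference in seconds for a distant source at the given
    azimuth (0 = straight ahead, 90 = fully to one side), spherical-head model."""
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


max_itd_us = woodworth_itd(90.0) * 1e6
print(f"maximum ITD ~ {max_itd_us:.0f} microseconds")
```

With these assumptions the maximum, for a source at 90 degrees, comes out at about 656 microseconds, in the same range as the figure quoted above.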
Ambiophonics uses non-optimized XTC filters [called RACE filters] that are designed in the time domain and that do not attempt to maximize XTC level while minimizing spectral coloration.
Yes, RACE is not able to achieve ILD levels of up to 20 dB as BACCH can. Using two speakers, RACE cannot reproduce a bee buzzing at your ear, or even moving behind your ear. This is a major achievement of the Princeton method and was thought to be impossible until they did it. But the issue of spectral coloration is much overblown. Consider a left-only input: the signal at the left speaker consists of the unchanged left signal, followed by a series of delayed and declining replicas of the same left signal. The right speaker similarly outputs a series of attenuated, decaying replicas of the left signal. Yes, if you look at these with an averaging meter or RTA, neither signal seems flat, particularly at frequencies above 3000 Hz.
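The reflection trains described above can be reproduced with a simplified, hypothetical model of the RACE recursion: each speaker feed is its own input minus an attenuated, delayed copy of the opposite speaker's feed. The delay (3 samples, about 68 microseconds at 44.1 kHz) and the gain (0.5) are illustrative assumptions, and no band-limiting is applied, unlike a practical RACE implementation. The second function gives the steady-state gain of the left feed for a left-only input, obtained by solving the recursion in the z-domain: H(z) = 1 / (1 - g^2 z^(-2d)).

```python
import cmath
import math


def race(left, right, delay, gain):
    """Simplified RACE-style recursion (hypothetical sketch): each output is its
    input minus `gain` times the other output `delay` samples earlier."""
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    n = len(left)
    out_l = [0.0] * n
    out_r = [0.0] * n
    for i in range(n):
        prev_r = out_r[i - delay] if i >= delay else 0.0
        prev_l = out_l[i - delay] if i >= delay else 0.0
        out_l[i] = left[i] - gain * prev_r
        out_r[i] = right[i] - gain * prev_l
    return out_l, out_r


def left_feed_gain_db(freq_hz, fs, delay, gain):
    """Magnitude (dB) of the left feed for a left-only input,
    H(z) = 1 / (1 - gain**2 * z**(-2 * delay)) evaluated on the unit circle."""
    z_term = cmath.exp(-2j * math.pi * freq_hz / fs * 2 * delay)
    return 20.0 * math.log10(1.0 / abs(1.0 - gain ** 2 * z_term))


# Left-only impulse with illustrative (not recommended) settings.
impulse = [1.0] + [0.0] * 11
silence = [0.0] * 12
spk_l, spk_r = race(impulse, silence, delay=3, gain=0.5)
print([round(x, 3) for x in spk_l])
print([round(x, 3) for x in spk_r])

for f in (0.0, 3675.0, 7350.0):
    print(f"{f:6.0f} Hz: {left_feed_gain_db(f, 44100, 3, 0.5):+.2f} dB")
```

For a left-only impulse the left feed is the impulse followed by replicas at 6, 12, ... samples scaled by 0.25, 0.0625, ..., while the right feed carries sign-inverted replicas at 3, 9, ... samples scaled by -0.5, -0.125, .... The left feed's response ripples between about +2.5 dB and -1.9 dB, with the first dip at fs/(4d), about 3.7 kHz for these settings, which is the kind of non-flatness an RTA would show.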
An ear that is not in the prime listening area will hear a normal, flat left-speaker signal followed by a train of early reflections, including the decaying reflections of earlier samples. While more research needs to be done on how humans react to such trains of reflections, they do resemble what one is exposed to in a concert hall from the seats and heads nearby: a sort of early, decaying diffuse field. Of course, at the home Ambio listening position this train is cancelled by the cancellation train coming from the right speaker. Thus the level and effect of these reflections are greatly reduced if the system is configured and operating properly. The use of line-source speakers such as electrostatics or ribbons also reduces any possibility of such effects being audible.
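Why the trains cancel at the listening position can be shown with an equally simplified head model, assumed here purely for illustration: each ear hears its own speaker directly and the opposite speaker attenuated by `head_gain` and delayed by the same number of samples RACE uses. When `head_gain` matches the RACE gain, the ear signals reduce exactly to the original inputs; a mismatched `head_gain`, loosely standing in for a listener away from the sweet spot, leaves residual crosstalk. The `race` and `ears` functions are hypothetical sketches, and `race` is repeated so the block runs on its own.

```python
import math


def race(left, right, delay, gain):
    """Simplified RACE-style recursion (hypothetical sketch)."""
    n = len(left)
    out_l, out_r = [0.0] * n, [0.0] * n
    for i in range(n):
        prev_r = out_r[i - delay] if i >= delay else 0.0
        prev_l = out_l[i - delay] if i >= delay else 0.0
        out_l[i] = left[i] - gain * prev_r
        out_r[i] = right[i] - gain * prev_l
    return out_l, out_r


def ears(spk_l, spk_r, delay, head_gain):
    """Toy head model (assumption): direct path unchanged, crosstalk path is
    the opposite speaker scaled by head_gain and delayed by `delay` samples."""
    n = len(spk_l)
    ear_l = [spk_l[i] + head_gain * (spk_r[i - delay] if i >= delay else 0.0)
             for i in range(n)]
    ear_r = [spk_r[i] + head_gain * (spk_l[i - delay] if i >= delay else 0.0)
             for i in range(n)]
    return ear_l, ear_r


impulse = [1.0] + [0.0] * 11
silence = [0.0] * 12
spk_l, spk_r = race(impulse, silence, delay=3, gain=0.5)

# Matched head: left ear gets the clean impulse, right ear gets nothing.
ear_l, ear_r = ears(spk_l, spk_r, delay=3, head_gain=0.5)

# Mismatched head: the reflection trains no longer cancel completely.
_, ear_r_off = ears(spk_l, spk_r, delay=3, head_gain=0.4)
residual_db = 20 * math.log10(max(abs(x) for x in ear_r_off))
print(f"residual crosstalk with a mismatched head: {residual_db:.1f} dB")
```

Algebraically, each ear receives its input plus (head_gain - gain) times a delayed copy of the opposite feed, so in this toy model the residual scales with the mismatch; here it is about -20 dB.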
Timbral distortion from later sidewall reflections is possible but unlikely, since the sound of both speakers mixes thoroughly in the room, making such a low-level delayed field hard to hear as a frequency-response distortion. In any case, the comparison should be made with stereo, where comb-filtering peaks and dips are rampant throughout the room and few people notice. Again, dipole speakers such as electrostatics minimize the influence of the room, and in Ambio, since the speakers are close together, it is possible to listen in the near field, where the ratio of direct sound to room reflections is higher.
Obviously, it is harder to get perfect cancellation at higher frequencies, where centimeters can matter. But the ears are less sensitive to crosstalk above, say, 4000 Hz, so one need not apply RACE in this region, and the coloration issue becomes almost moot. What actually happens is that one again has ordinary stereo at the higher frequencies, with its comb filtering in this region. But since the speakers are much closer together, the onset of this combing is about two octaves higher and does not affect the pinna as severely as in ordinary 60-degree stereo or 5.1.
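The upward shift of the comb filtering can be estimated with a deliberately crude model, assumed here for illustration: take the delay between the two speakers' arrivals at each ear to be the Woodworth interaural delay for the speaker angle, ignore head shadowing, and find the first cancellation notch of a centered phantom image. The ±30-degree case is standard stereo; the ±10 and ±5-degree spreads are assumed close-spaced layouts, not prescribed Ambiophonic settings.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s (assumed)
HEAD_RADIUS = 0.0875     # m (assumed average adult head)


def interaural_delay_s(half_angle_deg):
    """Woodworth spherical-head delay between near and far ear for a speaker
    at +/- half_angle_deg from straight ahead."""
    theta = math.radians(half_angle_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


def first_comb_notch_hz(half_angle_deg):
    """For a centered phantom image each ear gets the same signal from both
    speakers, the far one late by dt. The sum s(t) + s(t - dt) first cancels
    when dt is half a period: f = 1 / (2 dt). Head shadowing of the far
    speaker is ignored, so real notches are shallower than this model implies."""
    return 1.0 / (2.0 * interaural_delay_s(half_angle_deg))


stereo_notch = first_comb_notch_hz(30.0)  # standard 60-degree stereo triangle
for half_angle in (10.0, 5.0):            # hypothetical close-spaced layouts
    f = first_comb_notch_hz(half_angle)
    octaves = math.log2(f / stereo_notch)
    print(f"+/-{half_angle:.0f} deg: first notch ~{f:.0f} Hz, "
          f"{octaves:.1f} octaves above the {stereo_notch:.0f} Hz stereo notch")
```

Under these assumptions the first notch moves from roughly 1.9 kHz for stereo to roughly 5.6 kHz at ±10 degrees and about 11 kHz at ±5 degrees, that is, about one and a half to two and a half octaves higher depending on the spread chosen.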
Finally, what is wrong with processing in the time domain?
Since a low level of XTC often results in an exaggerated wide soundstage [especially for solo instrument or small ensembles] and a reduced 3D sense of depth and proximity, RACE filters cannot reproduce proximity effects and the same 3D sense of depth as BACCH filters.
In practice, the BACCH filters actually being marketed for field installations are quite different from the lab implementation, especially where the filters are used with speakers at 60 degrees or are not made from actual binaural microphone measurements taken at the listening position with the listener's own head. Thus, in general, the extreme results claimed for BACCH are not achieved, and the results are indeed comparable to those of RACE, as described below.
RACE is capable of delivering up to about 10 dB of interaural level difference and essentially the full range of interaural time differences, plus proper pinna cues for central sounds. This corresponds to better localization-cue delivery than is found on most recordings of music, movies, or games; it is comparable to what one hears from an average seat in a concert hall and much better than what one hears in a movie theater. Since RACE easily delivers whatever ILD has been captured on the recording (up to its 10 dB maximum), one simply hears the stage width that was recorded as a result of the microphone placement, spot-mic mixing, or panning algorithm. Obviously, if you put a stereo microphone one foot in front of a string quartet, you will likely record an ILD of 20 dB for some of the notes. BACCH will then actually produce a wider image than either RACE or stereo. Basically, both RACE and BACCH try to faithfully reproduce what you would have heard had you been at the microphone position, which can mean a very wide piano image, for example. Larger-than-life pianos abound in stereo as well, where one often hears pianos that stretch almost sixty degrees in front of you. In Ambio you can reduce this excess width by simply reducing the ILD with the RACE control provided, or by bypassing RACE altogether when playing such unusually close-miked recordings. Indeed, many solo vocal or instrumental recordings are better played back in plain mono through just one loudspeaker, so as to avoid the comb filtering of the stereo arrangement. With RACE, one can also move back from the speakers to reduce the stage width a bit, or move forward to hear plain stereo with its normal 60-degree stage.
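To put the 10 dB and 20 dB figures in perspective, the short sketch below converts an ILD in decibels into the amplitude ratio between the near and far ear, and measures ILD from two signals by their RMS levels. The function names are illustrative only.

```python
import math


def ild_db(left, right):
    """Level difference in dB between two equal-length signals, from their RMS
    levels; positive means the left signal is louder."""
    def rms(x):
        return math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(rms(left) / rms(right))


def amplitude_ratio(db):
    """Near-ear to far-ear amplitude ratio corresponding to an ILD in dB."""
    return 10.0 ** (db / 20.0)


for db in (10.0, 20.0):  # ceiling cited for RACE vs. level cited for BACCH
    print(f"{db:.0f} dB ILD: far-ear amplitude is 1/{amplitude_ratio(db):.2f} of the near ear")
```

An ILD of 10 dB means the far ear receives roughly 1/3.16 of the near ear's amplitude; 20 dB means 1/10, which is why the extra 10 dB matters mainly for extreme proximity effects.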
For extreme proximity effects, such as somebody whispering in your ear, one must have recordings and a reproducing system that can deliver ILDs well in excess of 10 dB. So it is true that BACCH easily outperforms RACE in this regard. Nevertheless, for normal orchestral recordings, operas, movies, etc., RACE still gives a greatly enhanced sense of depth (distance) compared to normal stereo or 5.1. Also, full Ambiophonics includes surround speakers that provide large-space ambience, which further enhances the 3D depth experience.
They [RACE filters] are better suited for the playback of recordings of large ensembles, where the perception of a large soundstage is not jarring.
RACE normally reproduces the perspective that the main recording microphone sees: you essentially hear the stage width you would have heard had you been at the microphone position. This is not necessarily a bad thing. The detail one can hear is exciting, and the added clarity and depth can be exhilarating. In practice one seems to be sitting third row center, or sometimes above the conductor. For movies this is not an issue. Some string quartet recordings do place the first violinist and the cellist some ten meters apart, but you can then hear each instrument as clearly as you wish via the cocktail-party effect. Since BACCH does an even better job of extracting ILD from such a recording, the quartet may span 20 meters, and some instruments will seem very close to you.
Furthermore, due to the significantly higher XTC levels achieved by the optimized BACCH filters, Pure Stereo does not require additional loudspeakers and extraneous convolutions with hall impulse responses to give a realistic sense of hall reverb and ambiance, which, if it exists in the recording, would, under high enough XTC levels, envelop the listener as realistically as it did the main stereo microphones during the recording.
Ah! But most stereo recordings do not contain proper ambient signal energy. Also, using BACCH or RACE alone to reproduce standard existing recordings does not solve the problem for a human ear that expects ambience to arrive at the pinna from the sides, the rear, and overhead. Yes, if you make a Pure Stereo recording using an advanced dummy head plus HRTFs, then two-speaker BACCH can do a great job of producing a near-spherical ambient sound field.
In Ambiophonics this basic psychoacoustic problem is solved for existing DVD/BD discs and movies by using two rear speakers and two free RACE programs running simultaneously. If the recording is indeed 4.0 in nature, with a direct sound stage at the rear, then one has a full 360-degree field without difficulty, high cost, or complexity, and one saves the expense of front and rear center speakers. For music recordings the rear channels are often just ambience, so the hall sound and applause coming from the rear sound quite natural. Presumably, a version of BACCH will appear that can reproduce standard 4.0 media using just two front speakers.
For playing CDs and LPs, where there is no rear surround signal pair, one can use hall impulse response convolution to generate the hall ambience missing from virtually all stereo recordings. The reason such hall ambience is seldom recorded is that if the microphone were moved back from the stage far enough to capture it, it would pick up more reverb than direct sound. When such a mixed (wet) signal is played back through frontal speakers, the result sounds like a recording made in a sewer. Thus, even if both RACE and BACCH were perfect, the ambient field recovered from standard 2.0 discs and played back via just two speakers is unlikely to be fully satisfactory, even if it is better than plain stereo.
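The microphone-placement dilemma can be quantified with the standard Sabine-based estimate of critical distance, r_c ≈ 0.057 * sqrt(Q * V / T60) (r_c in meters, V the room volume in cubic meters, T60 the reverberation time in seconds, Q the source directivity factor), beyond which reverberant energy exceeds direct energy. The hall volume and reverberation time below are hypothetical round numbers, and the model assumes an ideal diffuse field and an omnidirectional microphone; directional microphones push the effective distance out somewhat.

```python
import math


def critical_distance_m(volume_m3, rt60_s, directivity=1.0):
    """Distance at which direct and reverberant energy are equal, using the
    common approximation r_c ~= 0.057 * sqrt(Q * V / T60)."""
    return 0.057 * math.sqrt(directivity * volume_m3 / rt60_s)


def direct_to_reverberant_db(distance_m, volume_m3, rt60_s, directivity=1.0):
    """Direct energy falls as 1/r^2 while the diffuse reverberant level is taken
    as constant, so DRR = 20 * log10(r_c / r)."""
    r_c = critical_distance_m(volume_m3, rt60_s, directivity)
    return 20.0 * math.log10(r_c / distance_m)


# Hypothetical hall: 20,000 m^3 with a 2.0 s reverberation time, omni source.
VOLUME, RT60 = 20000.0, 2.0
rc = critical_distance_m(VOLUME, RT60)
for r in (3.0, rc, 15.0):
    print(f"mic at {r:5.1f} m: DRR = {direct_to_reverberant_db(r, VOLUME, RT60):+.1f} dB")
```

For this hypothetical hall the critical distance is only about 5.7 m, so a microphone placed back among the seats, say 15 m away, would pick up roughly 8 dB more reverberant than direct energy.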
Finally, normal hall reverberation does not have large values of ILD and ITD. Ceiling reverb and rear reverb are essentially mono, and even side reflections have a fair amount of correlation, so ambience is not really a stereo-like signal. The sense of ambience depends on pinna pattern sensing, left-to-right and front-to-rear energy differences, and, of course, timing and response differences between the direct sound and the reverberant field. Once all these signals are mushed together by the usual 2.0/5.1 recording microphones and console, it is difficult to see how just reproducing them with enhanced ILD via two frontal speakers will generate a concert-hall sound field. The 10 dB level of RACE certainly doesn't do much in this regard, although applause (and ambience) is nicely spread over 180 degrees, even if still frontal. Turning on the convolved surround speakers swamps the false frontal ambience (but not the applause) and is an obvious improvement if one really wants a domestic concert hall as opposed to a video/movie/game home theater.
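The point that hall reverberation is largely correlated between the ears can be expressed with the interaural cross-correlation coefficient (IACC): the peak of the normalized cross-correlation of the two ear signals within about ±1 ms of lag. The sketch below is a simplified broadband version (the standardized room-acoustics measure is computed on band-filtered binaural impulse responses over specified time windows), and its test signals are synthetic noise, not hall recordings.

```python
import math
import random


def iacc(left, right, fs, max_lag_ms=1.0):
    """Peak absolute normalized cross-correlation of two equal-length ear
    signals over lags of +/- max_lag_ms (simplified broadband IACC)."""
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if norm == 0.0:
        return 0.0
    n = len(left)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        # Only indices where both left[i] and right[i + lag] exist.
        for i in range(max(0, -lag), min(n, n - lag)):
            acc += left[i] * right[i + lag]
        best = max(best, abs(acc) / norm)
    return best


FS = 8000  # low rate keeps the pure-Python loops quick
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
other = [rng.gauss(0.0, 1.0) for _ in range(4096)]   # independent noise
delayed = [0.0] * 4 + noise[:-4]                      # 0.5 ms late copy

print(f"identical (mono-like):       {iacc(noise, noise, FS):.2f}")
print(f"delayed copy within 1 ms:    {iacc(noise, delayed, FS):.2f}")
print(f"independent (decorrelated):  {iacc(noise, other, FS):.2f}")
```

Identical, mono-like signals give an IACC of 1, a copy delayed within the lag window still scores close to 1, and independent noise, the idealized decorrelated case, scores near 0.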
Ralph Glasgal
January, 2011