Binaural recording is the technology behind this.
What Wikipedia says:
Binaural recording is a method of recording sound that uses two microphones, arranged with the intent to create a 3-D stereo sound sensation for the listener of actually being in the room with the performers or instruments. This effect is often created using a technique known as "dummy head recording", wherein a mannequin head is outfitted with a microphone in each ear. Binaural recording is intended for replay using headphones and will not translate properly over stereo speakers.
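Why not with stereo speakers? Because the recording is made with a dummy head: each microphone captures what one ear would hear, so each channel has to reach only the matching ear. Headphones do that; with speakers, both ears hear both channels and the effect is lost.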
Dummy head recording:
The dummy head, or Head and Torso Simulator (HATS), is based on the average dimensions of a human head and torso. It consists of acoustic materials fitted with ear and mouth simulators[5], as well as a microphone inserted in each ear canal, typically at the eardrum.
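To get a feel for one of the cues a dummy head captures, here is a rough Python sketch of the interaural time difference (the tiny delay between a sound reaching the near ear and the far ear). It uses Woodworth's spherical-head approximation, which is my choice for illustration and not something the quoted text describes. The head radius, the function names and the sample rate are all assumptions, and a real dummy head also records level differences and outer-ear filtering that this ignores.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius (illustrative value)
SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C


def itd_seconds(azimuth_deg):
    """Woodworth approximation of the interaural time difference.

    azimuth_deg: 0 is straight ahead, 90 is directly to the right,
    -90 is directly to the left. Only meaningful for -90..90.
    Positive result means the sound reaches the right ear first.
    """
    theta = abs(math.radians(azimuth_deg))
    sign = 1 if azimuth_deg >= 0 else -1
    return sign * (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


def delay_far_ear(mono, azimuth_deg, sample_rate=44100):
    """Turn a mono sample list into (left, right) by delaying the far ear.

    This models the time cue only; it is not a full binaural rendering.
    """
    delay = round(abs(itd_seconds(azimuth_deg)) * sample_rate)
    delayed = [0.0] * delay + list(mono)   # far ear: sound arrives later
    direct = list(mono) + [0.0] * delay    # near ear: padded to equal length
    if azimuth_deg >= 0:                   # source on the right
        return delayed, direct             # left ear is the far ear
    return direct, delayed
```

For a source directly to one side, this gives a delay of well under a millisecond, which is why the cue only survives when each channel goes to its own ear, as with headphones.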