Binaural Hearing and Head-Related Transfer Function
Mon, 17 Jul 2017 10:22:06
Binaural literally means "having or relating to two ears." Binaural hearing lets humans (and other animals) determine the direction a sound comes from. The auditory system does this by interpreting cues in the signals that reach each ear. Two kinds of cues are involved in sound localization: binaural cues, which compare the signals at the two ears, and monaural cues, which a single ear can pick up on its own. Let's explore each in a little more detail below.
Binaural Cues
A binaural cue is a difference between the signals that the two ears receive. The two main binaural cues are the Interaural Level Difference (ILD), the difference in loudness between the ears, and the Interaural Time Difference (ITD), the difference in arrival time. Both vary with frequency. Binaural cues are used to recognize the direction of a sound on the horizontal plane (its azimuth). So let’s take a look at those concepts in practice. If there’s a bell on the right side of your head, your right ear receives its sound directly, but your left ear receives it after a short delay, because the sound has to travel the extra distance around your head to reach the far ear. Your left ear also receives a quieter version of the sound, because your head shadows it: the signal is diffracted and reflected by your head, torso, pinna, and so on before it arrives. Essentially, human ears recognize the direction of sound on the horizontal plane using the time difference (ITD) and the level difference (ILD).
Binaural, having two ears
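To get a feel for the size of the time difference, the sketch below estimates ITD with Woodworth's spherical-head approximation, ITD = (r / c) * (θ + sin θ). This formula is not part of the explanation above; it is a common textbook simplification that treats the head as a rigid sphere, and the head radius and speed of sound used here are assumed typical values rather than measurements.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C
HEAD_RADIUS = 0.0875    # m, an assumed average; real heads vary


def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate ITD in seconds for a distant source.

    0 degrees is straight ahead and 90 degrees is directly to one side.
    The approximation is intended for azimuths between 0 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"{az:>2} deg -> {woodworth_itd(az) * 1e6:.0f} microseconds")
```

With these assumed values, even the largest delay (a source directly to one side) comes out well under a millisecond, which is the scale of timing difference the ears have to resolve.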
Monaural Cues
Binaural cues don’t give us the complete picture when it comes to sound localization. Because both ears sit at the same height, the time and level differences between them change very little when a sound source moves up or down. So in addition to binaural cues, humans use monaural cues to further determine the location of a sound in space. More specifically, monaural cues are used to recognize the elevation of a sound, because the frequency characteristics of the signal reaching the eardrum vary with the elevation angle. For example, a sound at ear height and a sound above ear height reflect off the pinna differently and excite different resonances, resulting in different peaks and notches in the spectrum.
Monaural, using pinna for elevation (Image credit: UC Davis)
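One simplified way to see how a pinna reflection produces a notch is to model the ear signal as the direct sound plus a single delayed, weaker copy. The sketch below does that; the delays, the reflection gain, and the function names are illustrative assumptions, not measured pinna data. In this model the first notch falls at 1 / (2 × delay), so a different reflection path moves the notch to a different frequency.

```python
import cmath
import math


def first_notch_hz(delay_s):
    """Lowest notch frequency for a reflection that arrives delay_s late."""
    return 1.0 / (2.0 * delay_s)


def magnitude_db(freq_hz, delay_s, gain=0.5):
    """Level of direct sound plus one reflection, relative to direct sound alone."""
    h = 1.0 + gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20.0 * math.log10(abs(h))


# Hypothetical reflection delays standing in for two source elevations.
for label, delay in (("ear level", 62.5e-6), ("above ear level", 50e-6)):
    notch = first_notch_hz(delay)
    depth = magnitude_db(notch, delay)
    print(f"{label}: notch near {notch:.0f} Hz, depth {depth:.1f} dB")
```

A real pinna produces several reflections and resonances at once, so measured spectra are more complex than this single-reflection model suggests.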
Replicating binaural hearing using Head-Related Transfer Function
These two kinds of cues (and therefore binaural hearing) can be replicated using a Head-Related Transfer Function (HRTF). An HRTF dataset describes how a sound source at a given distance and direction will sound at each ear. Imagine a sound reaching a listener in 3D space: the sound that arrives at the left ear cannot be the same as the one that arrives at the right ear. This is due to the difference in arrival time between the ears (the interaural time difference), the difference in loudness between the ears (the interaural level difference), the shape of each external ear (pinna), and so on. An HRTF captures all of these effects in a single filter per ear, and that filter changes with the location of the source. When a raw mono signal goes through the HRTF filter for a given location, it takes on the cues for that location and sounds like it’s coming from that specific place.
HRTFs are measured in an anechoic room so that the recording contains no reverberation. However, people almost never listen in an anechoic room in real life, so audio rendered with an HRTF dataset alone can sound unnatural. To mitigate this, a Binaural Room Transfer Function (BRTF), which also captures the reverberation of a room, can be used. It’s not perfect, however: because a BRTF includes the reverberation of one specific room, it can again sound unnatural when used in a room with different characteristics.
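In the time domain, an HRTF corresponds to a pair of head-related impulse responses (HRIRs), and passing a mono signal through the filter amounts to convolving it with the left-ear and right-ear responses. The following is a minimal sketch of that step, assuming tiny made-up impulse responses in place of a measured dataset; `convolve` and `render_binaural` are hypothetical helper names.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left/right impulse-response pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses for a source on the listener's right. These are
# made-up values, not measured data: the left ear hears the sound three
# samples later and at half the amplitude.
hrir_right = [1.0, 0.0, 0.0, 0.0]
hrir_left = [0.0, 0.0, 0.0, 0.5]

mono = [1.0, -1.0, 0.5]
left, right = render_binaural(mono, hrir_left, hrir_right)
print("left :", left)
print("right:", right)
```

A real renderer would load measured responses for the desired direction and would typically use FFT-based convolution for speed, but the structure stays the same.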
Listening through headphones over loudspeakers
As for the actual transducers for VR, headphones are a better option than loudspeakers. When you listen through loudspeakers, you are still stuck “here” in the room because you can hear the room’s own reflections and reverberation. This severely hinders the immersive potential of a VR experience. Headphones, on the other hand, block out much of the sound from the physical world, keeping local room artifacts out of what you hear and letting you “be there” in the virtual world. Additionally, with loudspeakers a virtual speaker configuration is limited by the physical size of the room, while headphones allow for unlimited virtual space. Perhaps most importantly, headphones allow for the interactivity that is a core part of the VR experience. It’s critical that users are not just sitting and staring forward. They need to be able to walk around in the scene, and static loudspeakers make it exceptionally hard to reflect their movement.
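With headphones, reflecting the listener's movement comes down to recomputing where each source sits relative to the head and then picking the matching filter. Below is a rough sketch of that bookkeeping in 2D, assuming a coordinate convention chosen only for this example (yaw 0 faces along +y, positive angles turn to the right); `relative_azimuth_deg` is a hypothetical helper.

```python
import math


def relative_azimuth_deg(listener_xy, head_yaw_deg, source_xy):
    """Azimuth of a source relative to the direction the head is facing.

    Assumed convention: yaw 0 faces along +y, positive angles turn to the
    right, and the result is wrapped to the range [-180, 180).
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    world_azimuth = math.degrees(math.atan2(dx, dy))
    return (world_azimuth - head_yaw_deg + 180.0) % 360.0 - 180.0


source = (1.0, 0.0)
# Listener at the origin facing +y: the source is to the right.
print(relative_azimuth_deg((0.0, 0.0), 0.0, source))
# Same position, head turned 90 degrees to the right: the source is ahead.
print(relative_azimuth_deg((0.0, 0.0), 90.0, source))
# Listener walks to a spot directly behind the source, facing +y again.
print(relative_azimuth_deg((1.0, -1.0), 0.0, source))
```

Each time the tracked position or yaw changes, the renderer would recompute this angle and switch to (or interpolate toward) the filter for the new direction.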