Spatial Audio, Immersive Audio, 3D Audio
Apple has joined Spatial Audio, and it is on track to become mainstream.
The latest iOS update (October 2020) lets users with an iPhone and AirPods Pro experience a new feature called Spatial Audio. Once you put the AirPods Pro in your ears and watch a video on your iPhone screen, you get an impressive experience in which you can’t tell whether the sound is coming from the iPhone or the AirPods. Like many of Apple’s new releases, Spatial Audio is not something new. However, I would say this will be the first year it gets hyped, because Apple has entered the market.
So, what is Spatial Audio?
Spatial Audio Coding (SAC)
Let’s go back 15 years. In 2005, at the MPEG standardization meeting, where audio experts from around the world gather to compete, the standard for Spatial Audio Coding (SAC) was on its way to being set. The project was meant to create a standard that could efficiently code and transmit multichannel audio, to welcome the era of Spatial Audio. MPEG is the organization that standardized the most popular audio codecs, such as MP3 and AAC (Advanced Audio Coding), and video codecs such as H.264 and HEVC. However, the standard “Spatial Audio Coding” was later renamed MPEG “Surround” (ISO/IEC 23003-1), at least in part because the term felt unfamiliar. (🤔Question 1: Here, do Spatial Audio and Surround have the same meaning?)
Spatial Audio Object Coding (SAOC)
In 2007, the same MPEG experts established another standard, “Spatial Audio Object Coding” (SAOC, ISO/IEC 23003-2), which extends the coding principle of Spatial Audio Coding (SAC) to better represent audio objects (in audio terms, an object is, for example, each instrument in a music performance). Neither “Spatial Audio” standard has had much luck in the market, and both may be hibernating on someone’s hard drive.
Immersive Audio
Meanwhile, industry experts came up with a new name, “Immersive Audio,” in place of “Spatial Audio,” which had faded in the market. The term Immersive Audio seems to have first been used to describe formats that enhance the sense of sound direction by adding ceiling speakers. It is also used as the audio term for VR, AR, and XR, representing the Immersive Media that has rapidly emerged in the market. (🤔Question 2: Does Spatial Audio have the same meaning as Immersive Audio?)
3D (Three Dimensional) Audio
3D Audio means three-dimensional spatial sound that adds space, clarity, and depth. One dimension (1D) is a line, such as a stereo speaker placement that separates left and right. Two dimensions (2D) can be defined as a horizontal plane with an additional axis for front and back, which can be represented with 5.1-channel speakers. The term “Surround” has been used to indicate this 2D space. Adding height (up and down) on top of 2D leads to three dimensions. The 5.1.2-channel and 7.1.4-channel layouts, which you see as multichannel formats these days, are three-dimensional examples. Some may argue that a stereophonic setup can already represent 3D space. That is true, but here we refer to a strict definition of dimension based on the physical speaker configuration.
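To make the speaker-layout labels above concrete, here is a minimal Python sketch that splits a label such as "5.1.2" into its parts and applies the strict, speaker-based definition of dimension used in this article. The names `Layout`, `parse_layout`, and `dimensions` are made up for illustration, the label "2.0" for stereo is my own assumption, and treating anything beyond two ear-level speakers as "2D" is a simplification rather than an industry rule.

```python
from typing import NamedTuple


class Layout(NamedTuple):
    """Hypothetical container for a speaker-layout label."""
    ear_level: int  # speakers on the horizontal plane around the listener
    lfe: int        # low-frequency effects (subwoofer) channels
    height: int     # ceiling / height speakers


def parse_layout(label: str) -> Layout:
    """Parse a label such as "5.1" or "7.1.4" into its channel counts."""
    parts = [int(p) for p in label.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected layout label: {label!r}")
    # A two-part label such as "5.1" has no height speakers.
    height = parts[2] if len(parts) == 3 else 0
    return Layout(parts[0], parts[1], height)


def dimensions(layout: Layout) -> int:
    """Classify a layout by physical speaker placement (simplified)."""
    if layout.height > 0:
        return 3  # height speakers add the up/down axis
    if layout.ear_level > 2:
        return 2  # assumed to surround the listener on a horizontal plane
    return 1      # left/right only: a line


for label in ("2.0", "5.1", "5.1.2", "7.1.4"):
    print(label, "->", f"{dimensions(parse_layout(label))}D")
```

The classification only looks at where speakers physically sit, which matches the strict definition above. It says nothing about what a listener may perceive from a stereo setup.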
MPEG-H 3D Audio (ISO/IEC 23008-3)
As time went by, the audio experts at MPEG came up with a new standard, MPEG-H 3D Audio (ISO/IEC 23008-3), in 2014. The standard carries an audio signal that spans 3D space as channels, objects, audio scene data (called Ambisonics), or any combination of these. “3D” is a rather traditional term to represent this standard’s identity, nothing new or fresh, because the term 3D audio was already in use in the 1960s… As expected, it is being called “MPEG-H Audio” these days.
MPEG-H Audio was originally standardized to meet the audio requirements of UHDTV applications. It has been adopted as a next-generation audio format for UHDTV in many countries and regions, including the US, EU, Japan, and Korea. It is also used as a delivery format for Immersive Audio in Tidal, Amazon Echo, and more. The alternative in the market is the combination of Dolby AC-4 (compression method) + Atmos (signal format), whereas MPEG-H is an International Standard that encompasses both format and compression.
Let’s clarify Spatial Audio, Immersive Audio, and 3D Audio.
To begin with the conclusion, I would say these terms pretty much have the same meaning.
3D audio, as mentioned earlier, is audio that represents three-dimensional space. However, the term was used too early in marketing, as in ‘3D Surround’, even before the pioneers of the audio industry had established the true meaning of three-dimensional. As a result, the term “3D” no longer makes an impression on people. That is why “Spatial” and “Immersive” came up as new terms.
Spatial Audio is already synonymous with 3D Audio, since 3D means a ‘space’. However, the term does not feel fresh, because Spatial was used up by Spatial Audio Coding in 2005 to describe 5.1 channels or Surround, which are two-dimensional.
Immersive is a term that expresses realism from the perspective of the person listening to the sound, rather than a technical definition of space and dimension. Because the noun Immersion means a state in which it is difficult to distinguish the boundary between the real and the virtual, Immersive Audio is audio realistic enough to give a natural listening experience even though it is virtual. It is also the audio that realizes “Being There,” the expression that describes virtual reality. To achieve the immersion of “Being There,” it is technically based on 3D or Spatial Audio.
But there is one more thing to consider. With the advent of VR, the listener can move in three-dimensional space in the virtual world. The listener’s point of view, or head orientation, changes along three axes called Yaw-Pitch-Roll (3DoF; Degrees of Freedom), and the listener’s position can additionally move along the three axes X-Y-Z. Together they make six degrees of freedom (6DoF), and immersive audio must be indistinguishable from reality wherever the listener goes and looks in the space. Therefore, Immersive Audio has come to be accepted as meaning the same thing as 6DoF audio. In other words, it is just 3D or Spatial Audio in a 6DoF environment.
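The six degrees of freedom described above can be sketched in Python as a listener pose (X-Y-Z position plus Yaw-Pitch-Roll orientation) and a function that works out where a sound source sits relative to the listener's head. This is only an illustration under my own assumptions: the axis convention (x forward, y left, z up), the rotation order (yaw, then pitch, then roll), and the names `ListenerPose` and `source_direction` are hypothetical and are not taken from any standard or from Apple's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class ListenerPose:
    """Hypothetical 6DoF pose: three position axes plus three rotation axes."""
    # Position (assumed axes: x forward, y left, z up).
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    # Head orientation in degrees. With these right-handed axes, positive
    # yaw turns the head to the left and positive pitch tilts the nose down.
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0


def source_direction(pose, source):
    """Return (azimuth_deg, elevation_deg, distance) of a world-space source
    as seen from the listener's head. Positive azimuth is to the left."""
    # 1) Translation: vector from the listener's position to the source.
    d = (source[0] - pose.x, source[1] - pose.y, source[2] - pose.z)

    # 2) Rotation: head-to-world matrix R = Rz(yaw) * Ry(pitch) * Rx(roll).
    cy, sy = math.cos(math.radians(pose.yaw)), math.sin(math.radians(pose.yaw))
    cp, sp = math.cos(math.radians(pose.pitch)), math.sin(math.radians(pose.pitch))
    cr, sr = math.cos(math.radians(pose.roll)), math.sin(math.radians(pose.roll))
    rot = (
        (cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr),
        (sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr),
        (-sp, cp * sr, cp * cr),
    )

    # Apply the inverse rotation (the transpose) to move d into the head frame.
    hx, hy, hz = (sum(rot[j][i] * d[j] for j in range(3)) for i in range(3))

    azimuth = math.degrees(math.atan2(hy, hx))
    elevation = math.degrees(math.atan2(hz, math.hypot(hx, hy)))
    distance = math.sqrt(hx * hx + hy * hy + hz * hz)
    return azimuth, elevation, distance


# A source one unit in front of the starting point, heard from three poses.
source = (1.0, 0.0, 0.0)
for pose in (ListenerPose(), ListenerPose(yaw=90.0), ListenerPose(x=2.0)):
    print(pose, "->", source_direction(pose, source))
```

With only the three rotation fields changing, this behaves like 3DoF: the source stays at the same distance and only its direction changes. Once the position fields change as well, both direction and distance change, which is what 6DoF adds.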
VR Audio, 360 Audio
So, VR Audio means audio for VR: Immersive Audio that guarantees 6DoF. In the same manner, 360 Audio is a kind of Immersive Audio that accompanies 360-degree video, a subcategory of VR in which video shot with 360-degree cameras responds with 3DoF to the movement of the head.
MPEG-I Immersive Audio
In 2014, the audio experts who had just completed the MPEG-H standard soon kicked off the standardization of Immersive Audio under the project name MPEG-I. It aims to realize 6DoF audio for the era of VR, AR, and XR. However, that era has not yet come, and the standards body is still wandering in the exploration stage even today.
Following the MPEG history, you can see that they have created standards in the order Spatial Audio (2005) ➡️ 3D Audio (2014) ➡️ Immersive Audio (2022?). At this point, you’re probably wondering whether these audio technology standards are all the same thing under different names. Well, they do sound very alike, but the underlying technology is different. As the summaries above show, the order and the concepts are pretty much mixed up, which is likely to cause confusion in the market.
Back to Apple’s Spatial Audio
If Apple had said that it added an “Immersive Audio” feature to the updated iOS instead of “Spatial Audio,” it would have been easier to sort out the terms in the industry. Either way, Spatial Audio seems to sound much cooler than the other terms nowadays because of Apple.
Apple explained at WWDC that when Spatial Audio receives an audio format consisting of 5.1 or more channels of audio signals or object signals, it lets users experience cinematic sound through the AirPods Pro. (Update 2020-12-17: Apple has since released AirPods Max, a headphone version of AirPods that, without a doubt, features Spatial Audio again. Unfortunately, it is already sold out, so you can’t get one before March 2021.) But this is only the beginning. I have a feeling that the use of Spatial Audio within Apple’s ecosystem will expand much further in the future.
I assume that, by early 2021, the Spatial Audio feature will be found everywhere, including from Android-based leaders such as Samsung, Oppo, Vivo, Xiaomi, Huawei, and more.
2020.12.18