Immersive Audio Representation
Tue, 25 Jul 2017 10:47:05
It’s not uncommon to hear someone working in VR audio complain, “it’s so frustrating how there’s no set format for VR audio and each platform has totally different specs.” Moving forward, it’s important to make sure that content creators’ projects are interoperable and can reach their full potential across multiple platforms. Future audio representation systems have the same need: interoperability should be simpler between the different stages of the audio production and delivery chain, as well as at playback.
International Telecommunication Union (ITU)
The ITU-R BS.2266-2 report classifies audio elements into channel-based, scene-based, and object-based representations and then proposes a single framework that can incorporate all of them. As a result, the framework should be able to serve as a common base for working with the different possible audio representations.
Framework of future audio representation systems
The key part of this framework is identifying the position of each loudspeaker within a superset of loudspeaker layouts. Studies have determined the speaker layouts that meet specific sound quality requirements, such as localization of phantom sound images in all directions, the sensation of a three-dimensional spatial impression, and directional stability of the frontal sound image over the entire image area. To fulfill these sound quality requirements, the following criteria are taken into account:
  • Elevation perception of phantom sound images in the frontal hemisphere, which is desired for UHDTV applications;
  • Sensation of “listener’s envelopment” (LEV), which is one of the primary features of a three-dimensional spatial impression (LEV here refers to the feeling of being surrounded by sound);
  • Localization and localization uncertainty of phantom sound images in the elevation direction generated by two speakers located above the listeners;
  • Influence of listening position on directional perception of frontal sound images.
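The idea of a superset of loudspeaker positions from which concrete layouts are drawn can be sketched in a few lines of Python. This is only an illustration: the labels, the angle convention (azimuth positive to the listener's left, elevation positive upward) and the angle values are assumptions made for the example and are not taken from the ITU report.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerPosition:
    label: str            # hypothetical label, not an ITU designation
    azimuth_deg: float    # assumed: 0 = straight ahead, positive = listener's left
    elevation_deg: float  # assumed: 0 = ear height, positive = above

    def unit_vector(self):
        """Direction of the speaker as (x, y, z): x front, y left, z up."""
        az = math.radians(self.azimuth_deg)
        el = math.radians(self.elevation_deg)
        return (math.cos(el) * math.cos(az),
                math.cos(el) * math.sin(az),
                math.sin(el))


# Illustrative superset only; the angles are placeholders, not values from the report.
SUPERSET = {p.label: p for p in (
    SpeakerPosition("front_center", 0.0, 0.0),
    SpeakerPosition("front_left", 30.0, 0.0),
    SpeakerPosition("front_right", -30.0, 0.0),
    SpeakerPosition("surround_left", 110.0, 0.0),
    SpeakerPosition("surround_right", -110.0, 0.0),
    SpeakerPosition("top_front_left", 30.0, 45.0),
    SpeakerPosition("top_front_right", -30.0, 45.0),
)}


def layout_from_labels(labels):
    """Build a concrete layout by selecting positions from the superset."""
    missing = [name for name in labels if name not in SUPERSET]
    if missing:
        raise ValueError(f"not in superset: {missing}")
    return [SUPERSET[name] for name in labels]


# A two-speaker layout is just a subset of the superset.
stereo = layout_from_labels(["front_left", "front_right"])
for speaker in stereo:
    print(speaker.label, [round(c, 3) for c in speaker.unit_vector()])
```

Because every layout refers back to the same set of named positions, a renderer written against the superset can, in principle, serve any layout that is a subset of it.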
Defining the exchange file format
Another important part of this framework is defining the exchange file format. For exchanging audio content, there should be a standard regarding how to store a sound scene so that all necessary information is delivered, rendered, coded and/or stored. This format should consider different fixed speaker layouts, as well as the carriage of channel-based, object-based and scene-based content or a mixture of these. The format should also be usable for all targeted playback systems. The format should meet these generic but important requirements:
  • The format should enable the delivery of an audio experience beyond the state-of-the-art;
  • It should deliver the artistic intent as faithfully as possible and support and promote new creative and technological opportunities;
  • The format shall provide a reasonable path for upgrading;
  • It should be able to carry existing channel-based formats as well as future channel-based, object-based and scene-based content;
  • The specification should be open and royalty-free;
  • It should minimize the changes needed on the production and delivery process while transporting the information needed for advanced rendering;
  • It should allow for advances in all related areas while ensuring compatibility;
  • It should enable content, equipment and software to remain future-proof;
  • It should be based on the work of international standards bodies and on industry standards.
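To make the "mixture of channel-based, object-based and scene-based content" concrete, here is a minimal sketch of a container that holds all three kinds side by side. It is not the exchange format itself: the class names and fields are hypothetical, each object is assumed to carry one mono track, and the scene-based part is assumed to be Ambisonics, where a full 3D signal of order N uses (N + 1)² channels.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class ChannelBed:
    """Channel-based content: one track per loudspeaker channel."""
    layout_name: str
    channel_labels: List[str]

    def track_count(self) -> int:
        return len(self.channel_labels)


@dataclass
class AudioObject:
    """Object-based content: audio plus positional metadata."""
    name: str
    position: Tuple[float, float, float]  # assumed (azimuth, elevation, distance)

    def track_count(self) -> int:
        return 1  # assumption: each object is a single mono track


@dataclass
class AmbisonicScene:
    """Scene-based content, assumed here to be full 3D Ambisonics."""
    order: int

    def track_count(self) -> int:
        return (self.order + 1) ** 2


Element = Union[ChannelBed, AudioObject, AmbisonicScene]


@dataclass
class SoundScene:
    """Hypothetical container mixing all three representations."""
    elements: List[Element] = field(default_factory=list)

    def track_count(self) -> int:
        return sum(e.track_count() for e in self.elements)

    def kinds(self) -> List[str]:
        return sorted({type(e).__name__ for e in self.elements})


scene = SoundScene([
    ChannelBed("5.1", ["L", "R", "C", "LFE", "Ls", "Rs"]),
    AudioObject("dialogue", (0.0, 0.0, 1.0)),
    AudioObject("flyover", (45.0, 30.0, 2.0)),
    AmbisonicScene(order=1),
])
print(scene.kinds(), scene.track_count())
```

Keeping the three kinds as separate element types inside one scene mirrors the requirement that the format carry existing channel-based material alongside newer content, rather than forcing everything into a single representation.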