The Future of VR Audio – 3 Trends to Track This Year

2018.07.05 · by Gaudio Lab


2017 pushed the VR industry forward in countless ways, including the recognition of audio as a critical element of VR experiences. Here are a few trending efforts creators are using to push the envelope with sound.


The industry will embrace object-based audio for every kind of experience.

Utilizing object-based audio gives content creators more creative freedom, since it is easier to manipulate post-production effects on a single sound: think of it as one raw element rather than a big, messy glob of sound. In addition, object-based audio works perfectly for 6DOF (six degrees of freedom) VR content, which is rapidly growing in popularity.


6DOF content behaves like a game: the character moves through the space in every direction and can interact with objects in the environment. When the character does either of these things, the sound needs to change accordingly. Because it is better at pinpointing individual sounds and reflecting changes during gameplay, object-based audio has already been used in 3D game engines for quite some time. As more 6DOF content is built on game engines, more audio engineers will likely have to learn to mix and master sound in game engines rather than in their traditional digital audio workstations.
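To make the idea concrete, here is a minimal sketch of per-object spatialization as a game engine might recompute it every frame: gain falls off with distance, and a stereo pan follows the object's bearing relative to the listener. The coordinate convention, the clamped inverse-distance law, and the equal-power pan are illustrative assumptions, not any particular engine's API.

```python
import math

def object_gain_and_pan(listener_pos, listener_yaw, source_pos, ref_dist=1.0):
    """Toy per-object spatialization for a 6DOF scene.

    Conventions (assumptions for this sketch): 2D positions with
    x = right, y = front; yaw in radians. Front/back ambiguity is
    ignored in this simplified model.
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    dist = math.hypot(dx, dy)
    gain = ref_dist / max(dist, ref_dist)      # inverse-distance attenuation, clamped
    angle = math.atan2(dx, dy) - listener_yaw  # bearing relative to facing direction
    pan = math.sin(angle)                      # -1 = hard left, +1 = hard right
    left = gain * math.sqrt((1 - pan) / 2)     # equal-power stereo panning
    right = gain * math.sqrt((1 + pan) / 2)
    return left, right
```

When the listener walks toward a source, `gain` rises; when they turn their head, only `angle` (and therefore the pan) changes, which is exactly the per-object update a pre-mixed soundtrack cannot perform.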


Quality VR content will be published with more players embracing spatial audio.

Out of the many reasons that have kept content publishing platforms from adopting spatial audio, the primary one has been the absence of a dedicated VR audio format with a compatible renderer. With more emphasis being put on object-based audio, Ambisonics alone is unlikely to remain the standard format of the future.

These limits on publishing platforms have discouraged content creators from fully embracing spatial audio in their productions. Still, as sound earns greater appreciation, renderers and players will eventually have to support spatial audio. When Vimeo launched Vimeo 360 in March to support 360 content, a large share of user requests involved a spatial audio feature, and the official help page states that support is planned for the near future. Smaller players and platforms will follow the path laid out by Facebook, YouTube, and Vimeo. As user standards for VR content quality continue to rise, spatial audio adoption will race to keep pace.


Creators will push beyond post-production to create new listening experiences.

Sound is not just a storytelling cue that nudges VR users to look in a certain direction. In some new use cases, you can choose to hear certain sounds over others within the same experience. In one recorded 360 video, for example, users hear the instrument they are looking at more clearly than the other instruments placed all around them.
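One way such gaze-dependent mixing could work is to weight each sound source by how closely its direction aligns with where the user is looking. The cosine-power emphasis below is a hypothetical choice for illustration, not how any particular production implements it.

```python
def gaze_weights(gaze_dir, source_dirs, focus=4.0):
    """Weight each source by its alignment with the gaze direction.

    All directions are unit 3-vectors. `focus` sharpens the emphasis
    on the looked-at source; the cosine-power law is an illustrative
    assumption, not a spec.
    """
    weights = []
    for d in source_dirs:
        cos = sum(g * s for g, s in zip(gaze_dir, d))  # alignment, -1..1
        weights.append(max(cos, 0.0) ** focus)          # sources behind get 0
    total = sum(weights) or 1.0
    return [w / total for w in weights]                 # normalized mix weights
```

Multiplying each instrument's signal by its weight (with some smoothing over time) brings whatever the user is looking at to the foreground while the rest recede.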

This new wave of sound won't just be an evolution of current techniques. In many cases, it will give way to revolutionary new forms of entertainment. The virtual canvas for artists expands to a full sphere around the listener, beyond the physical dimensions of a real-life stage. Musicians will now be able to play with "virtual location" alongside their traditional considerations of pitch, loudness, and timing. They are also learning how to exploit human auditory perception to shape these experiences at an even deeper level. Psychoacoustic principles you have already experienced in real life can be leveraged in VR to make each experience different at the individual level. While there is a great deal to consider in that realm, our collective knowledge continues to grow.

Gaudio Lab in the media
Futuresound: Will Accuracy Overcome Familiarity For VR Audio?

John Dewey, known for exploring the philosophy of pragmatism, proposed that experience rests on two principles. The first, continuity, holds that all experiences affect future experiences, for better or for worse. Experiences are therefore not independent of each other, nor are they one-time events: each one relates to something that happened in the past or will affect something in the future. For example, the fact that you are finally watching a VR piece may be related to the fact that you recently got a membership for a VR content subscription service like Wevr's Transport VR. Or maybe your enjoyable experience of The Mummy VR Zero Gravity Stunt Experience inspired you to go watch The Mummy in theaters.

Within this continuity framework, the VR listening experience has the potential to be very confusing. The way you hear sounds in VR might be totally different from the way you have been listening in the real world or while consuming more traditional content. At a concert or a conference, you see where the sound originates: the musician or the lecturer. However, unless you're a VIP every time, you are actually hearing the sound from loudspeaker locations, not from the actual sound sources. The same discrepancy occurs at the movie theater: the screen, filled with the things responsible for the sound, sits at the front, while the sound is projected from speakers in various locations. Fundamentally, though, people are familiar with this listening experience: visuals are in front of them, and sounds don't necessarily match what they are seeing.

Discrepancy in sound perception

We build new knowledge through experiences, and all of our current knowledge rests on the experiences that came before. Even our imagination, a realm that is not physically accessible in the real world, is shaped by previous experiences. The more experiences we have, the more creative we can try to be.

Consuming different styles and concepts of VR content will be the key to building our experience base moving forward. We might learn that positioning ambient sound to the sides is more comfortable than spreading it throughout the entire scene. We might learn that volume manipulation can help people adjust more easily to this relatively strange listening experience. Or the decision may still come down to "familiar but false" versus "strange but true," and even then only after thousands and thousands of experiences have accumulated.

What’s Next For VR Audio

Where VR Audio Is At

Before the age of VR, the 2D video story was not influenced by the end user's interaction. Spatial resolution for audio improved simply by adding more speakers around the end user's frontal rectangular screen. The biggest hurdle for immersion was instead the "present room" effect: one could never fully be there in the story, because the virtual world was limited by the screen size. That is now a different story, because the presence of the real world is blocked by wearing an HMD and headphones. VR certainly helps the content consumer be completely transported to a different world, but it delivers different levels of immersiveness depending on the type of content.

In 360 video, or 3DOF-type content, the world is already pre-rendered. The three-dimensional space is projected onto a sphere that you can look around in any direction at will, but you cannot walk through it; your position remains fixed in one spot. This is why the Ambisonics audio signal, a way of recording and reproducing 3D sound as a snapshot, became such a popular audio format for 360 videos. Just like the 360 video itself, this spherical audio format can easily be rotated to reflect head orientation in yaw, pitch, and roll. However, Ambisonics is limited to 360-type content, where the end user is fixed at one position. Increasing the order of Ambisonics does not add interactivity or 6DOF support; it merely increases the spatial resolution. Think of it as how increasing the pixel resolution doesn't transform a 360 video into a walkable one.

Meanwhile, full VR or 6DOF content is rendered in real time while the user interacts with and moves around in the scene. This requires the objects in the scene to be controlled individually, rather than as a chunk of pre-configured video and audio. When each sound source is delivered to the playback side as an individual object signal, it can truly reflect both the environment and the way the user interacts within it. This full-control capability of object-based audio can also be used in 2D or 360 video, but its potential is best realized in full VR.

VR Audio Moving Forward

While more and more VR content is being made in the full VR format, the audio industry is barely catching up with Ambisonics signals for 360 videos. Second-order Ambisonics already requires a minimum of 9 channels, and higher-order Ambisonics is often not feasible because network bandwidth is limited on mobile, not to mention the constrained processing power allocated to audio.

Some might argue that personalized audio is the most important challenge going forward. Until capturing exact anthropometric information becomes far less resource-intensive, customization for each person's ear shape and head size will remain the last step toward perfection. Luckily, four out of five people can already feel immersed in a VR scene with a generic binaural rendering process. What needs to be figured out in the foreseeable future is how to deliver interactive 3D audio without compromising content quality, from creators to consumers and across multiple platforms. Once best practices are determined and a recommended workflow is set, standardizing those practices should follow to improve interoperability.
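As a concrete illustration of the points above: full-sphere Ambisonics of order N carries (N + 1)² channels (hence the minimum of 9 for second order), and a first-order B-format frame can be rotated cheaply about the vertical axis to follow head-tracked yaw. The sketch below assumes traditional B-format channel ordering (W, X, Y, Z) with X front, Y left, Z up; the sign convention and function names are illustrative, not taken from any specific renderer.

```python
import math

def ambisonic_channels(order):
    """Full-sphere Ambisonics of order N needs (N + 1)**2 channels."""
    return (order + 1) ** 2

def rotate_foa_yaw(w, x, y, z, yaw):
    """Rotate one first-order B-format sample about the vertical axis.

    Assumed conventions: W/X/Y/Z ordering, X front, Y left, Z up,
    yaw in radians. A head-tracking renderer would apply the inverse
    of the measured head rotation; the sign here is illustrative.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    return w, c * x - s * y, s * x + c * y, z  # W and Z are invariant under yaw
```

Note that this simple 2x2 mix only works for yaw on the horizontal channels; full yaw/pitch/roll tracking at higher orders requires general spherical-harmonic rotation matrices, which is part of why processing cost grows quickly with Ambisonics order.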