Spatial Audio for VR with a Ricoh Theta S Camera and Zoom H2n Audio Recorder

I tried adding spatial audio to a VR video. In theory, this should add realism to a 360 VR video, because the audio can be processed to play back differently depending on the direction the viewer is facing.

I updated the Zoom H2n to firmware version 2.00 as described here https://www.zoom.co.jp/H2n-update-v2, and set it to record uncompressed WAV at 48 kHz, 16-bit.

I attached the audio recorder to my Ricoh Theta S camera. I orientated the camera so that the record button was facing toward me, and the Zoom H2n’s LCD display was facing away from me. I pressed record on the sound recorder and then on the video camera. I then needed a sound and visual indicator to synchronize the two in post-production, and clicking my fingers worked perfectly.

I installed the ambisonic plugins from http://www.matthiaskronlachner.com/?p=2015. I created a new project in Adobe Premiere, and a new sequence with Audio Master set to Multichannel, and 4 adaptive channels. Next I imported the audio and video tracks, and cut them to synchronize at the point where I clicked my fingers.
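
The finger click shows up as a sharp peak in the audio, so its position can also be found programmatically rather than by eye. The following is a minimal sketch, not part of the workflow described above: it assumes a 16-bit PCM WAV file that Python's standard `wave` module can open (older Python versions may reject multichannel files written in the WAVE_FORMAT_EXTENSIBLE layout), and that the click is the loudest sound in the recording. The function name `find_click_time` is made up for illustration.

```python
import struct
import wave


def find_click_time(path, channel=0):
    """Return the time (in seconds) of the loudest sample in one channel
    of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        n_channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # Samples are interleaved: frame 0 ch 0, frame 0 ch 1, ..., frame 1 ch 0, ...
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    channel_samples = samples[channel::n_channels]

    # The index of the largest absolute amplitude is taken as the click.
    peak_index = max(range(len(channel_samples)),
                     key=lambda i: abs(channel_samples[i]))
    return peak_index / rate
```

Running this on the recorder's WAV and on a WAV extracted from the camera's own audio track gives two timestamps; their difference is the offset to apply when lining up the clips.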

Exporting was slightly more involved. I exported two files, one for video and one for audio.

For the video export, I used the following settings:

  • Format: H.264
  • Width: 2048 Height: 1024
  • Frame Rate: 30
  • Field Order: Progressive
  • Aspect: Square Pixels (1.0)
  • Profile: Main
  • Bitrate: CBR 40 Mbps
  • Audio track disabled
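
Because the bitrate is constant, the size of the exported video file can be estimated before rendering. A small sketch of the arithmetic, where the helper name `cbr_size_megabytes` is invented for this example and container overhead is ignored:

```python
def cbr_size_megabytes(bitrate_mbps, duration_seconds):
    """Approximate size in megabytes (10^6 bytes) of a constant-bitrate stream."""
    bits = bitrate_mbps * 1_000_000 * duration_seconds
    return bits / 8 / 1_000_000


# One minute of video at CBR 40 Mbps:
print(cbr_size_megabytes(40, 60))  # 300.0
```

At these settings, each minute of footage comes to roughly 300 MB.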

For the audio export, I used the following settings:

  • Format: Waveform Audio
  • Audio codec: Uncompressed
  • Sample rate: 48000 Hz
  • Channels: 4 channel
  • Sample Size: 16 bit
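
Before moving on, it can be worth confirming that the exported file really has these properties. A minimal check using Python's standard `wave` module might look like the following; the names `EXPECTED`, `describe_wav` and `matches_export_settings` are hypothetical, and the sketch assumes the file is plain PCM that `wave` can read (multichannel files in the WAVE_FORMAT_EXTENSIBLE layout may need a recent Python version).

```python
import wave

# The audio export settings listed above: 4 channels, 48000 Hz, 16-bit (2 bytes).
EXPECTED = {"channels": 4, "sample_rate": 48000, "sample_width_bytes": 2}


def describe_wav(path):
    """Read the basic format fields from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
        }


def matches_export_settings(path):
    """Return True if the file matches the expected export format."""
    return describe_wav(path) == EXPECTED
```

Calling `matches_export_settings("ambisonic_audio.wav")` should return `True` for a correct export. At this format the audio runs at 48000 × 4 × 2 = 384,000 bytes per second.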

I then used FFmpeg to combine the two files with the following command:

ffmpeg -i ambisonic_video.mp4 -i ambisonic_audio.wav -channel_layout 4.0 -c:v copy -c:a copy final_video.mov
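
In this command, `-c:v copy` and `-c:a copy` copy the video and audio streams into the new container without re-encoding them, and `-channel_layout 4.0` labels the audio as four channels. If there are several clips to process, the same command can be driven from a script. The wrapper below is only a sketch: the function names `build_mux_command` and `mux` are invented here, and it assumes `ffmpeg` is available on the PATH.

```python
import subprocess


def build_mux_command(video_path, audio_path, output_path):
    """Build the same ffmpeg argument list as the command shown above."""
    return [
        "ffmpeg",
        "-i", video_path,          # video-only export
        "-i", audio_path,          # 4-channel WAV export
        "-channel_layout", "4.0",  # label the audio as 4 channels
        "-c:v", "copy",            # copy the video stream without re-encoding
        "-c:a", "copy",            # copy the audio stream without re-encoding
        output_path,
    ]


def mux(video_path, audio_path, output_path):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_mux_command(video_path, audio_path, output_path),
                   check=True)
```

For example, `mux("ambisonic_video.mp4", "ambisonic_audio.wav", "final_video.mov")` runs the command shown above.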

I then injected 360 metadata using the 360 Video Metadata app, making sure to tick both ‘My video is spherical (360)’ and ‘My video has spatial audio (ambiX ACN/SN3D format)’.

Finally, I uploaded it to YouTube. It took an extra five hours of waiting for the spatial audio track to be processed by YouTube. Both the web player and the native Android and iOS apps appear to support spatial audio.

If your sound recorder is orientated incorrectly, you can correct it using the plugins. In my case, I used the Z-axis rotation to effectively turn the recorder around.
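
To see why a Z-axis rotation can turn the recorder around, it helps to look at what the rotation does to the four first-order channels. In the ambiX format the channels are ordered W, Y, Z, X (ACN ordering). Rotating about the vertical axis leaves W and Z untouched and mixes only X and Y. The sketch below illustrates that maths and is not the plugin's actual code. It assumes X points forward, Y points left, and positive angles turn counterclockwise when seen from above. The plugin's sign convention may differ.

```python
import math


def rotate_z(frame, angle_degrees):
    """Rotate one first-order ambiX frame (W, Y, Z, X) about the vertical axis."""
    w, y, z, x = frame
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # W (omnidirectional) and Z (up/down) are unaffected by a turn about Z.
    new_x = x * cos_t - y * sin_t
    new_y = x * sin_t + y * cos_t
    return (w, new_y, z, new_x)
```

With an angle of 180 degrees this reduces, up to floating-point rounding, to negating X and Y: a sound that was recorded in front ends up behind, which is what turning the recorder around would have done.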

There are a lot of fascinating optimizations and explanations of ambisonic and spatial audio processing available to read at Wikipedia:

For comparison, the original in-camera audio (the Ricoh Theta S records in mono) can be viewed here: