Zach Ocean (@zachwe)
Tags: audio, code
October 2, 2020

A fun audio streaming bugfix

Today, a short tale of fun mystery-solving. I've been working on a project that involves a server dynamically generating audio files and streaming them to a client via a WebRTC session.

The dynamic audio generation process works like this: first, run a program that generates a .wav file. Then compress the .wav into an .ogg file containing an Opus audio stream, and deliver it over the network. The WebRTC portion is handled by the awesome Pion library, a pure Go implementation that makes customizing WebRTC (for example, by streaming a dynamically generated audio file) super easy.

So here was the bug: certain audio files were delivered successfully over the wire and played seamlessly in the browser. Other files weren't. I knew there wasn't anything fundamentally wrong with the files that wouldn't transmit because I could listen to them with any old audio player (Google Chrome, for example). All of the files were ogg + Opus sampled at 48 kHz.

Since I could open the files locally, I figured there was probably an issue with the network. Chrome has a handy tool at chrome://webrtc-internals for inspecting WebRTC sessions. Sure enough, this tool revealed that the client never actually received any bytes of the problematic ogg files. But why?

Ogg is a container format that can hold multiple logical data streams, each with its own encoding. In this case, the ogg files held a single logical Opus stream. Xiph.Org maintains a useful tool called opusinfo in the opus-tools package that inspects Opus streams. Here's what the opusinfo output looks like for one of the files that transmitted successfully:

$ opusinfo good_sound.ogg
Processing file "good_sound.ogg"...

New logical stream (#1, serial: 1fe69032): type opus
Encoded with Lavf58.45.100
User comments section follows...
        encoder=Lavc58.91.100 libopus
Opus stream 1:
        Pre-skip: 312
        Playback gain: 0 dB
        Channels: 2
        Original sample rate: 48000Hz
        Packet duration:   20.0ms (max),   20.0ms (avg),   20.0ms (min)
        Page duration:     20.0ms (max),   20.0ms (avg),   20.0ms (min)
        Total data length: 20863 bytes (overhead: 12.8%)
        Playback length: 0m:01.779s
        Average bitrate: 93.79 kb/s, w/o overhead: 81.78 kb/s
Logical stream 1 ended


And here's the output for one which didn't transmit:

$ opusinfo bad_sound.ogg
Processing file "bad_sound.ogg"...

New logical stream (#1, serial: 9f763f54): type opus
Encoded with Lavf58.45.100
User comments section follows...
        encoder=Lavc58.91.100 libopus
Opus stream 1:
        Pre-skip: 312
        Playback gain: 0 dB
        Channels: 2
        Original sample rate: 48000Hz
        Packet duration:   20.0ms (max),   20.0ms (avg),   20.0ms (min)
        Page duration:   1000.0ms (max),  900.0ms (avg),  800.0ms (min)
        Total data length: 18487 bytes (overhead: 1.6%)
        Playback length: 0m:01.779s
        Average bitrate: 83.11 kb/s, w/o overhead: 81.78 kb/s
Logical stream 1 ended


Pretty similar! But there's one obvious difference: the bad file's pages last 800 to 1000 ms, while the good file's pages last 20 ms each.
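opusinfo reports page durations, but what matters on the wire is bytes. As a rough sketch (not something from my actual project), the page sizes can be read straight out of the Ogg page headers: each page starts with a 27-byte header beginning with the capture pattern "OggS", followed by a segment table whose entries add up to the payload length. The function name `ogg_page_sizes` is made up for illustration, and the sketch skips CRC verification.

```python
import struct

# Fixed part of an Ogg page header (27 bytes, little-endian):
# capture pattern, version, header type, granule position,
# stream serial, page sequence number, CRC, segment count.
OGG_HEADER = struct.Struct("<4sBBqIIIB")


def ogg_page_sizes(data: bytes) -> list[int]:
    """Return the total size in bytes (header + payload) of each Ogg page.

    Illustrative helper: it walks the pages in order and does not
    verify checksums.
    """
    sizes = []
    offset = 0
    while offset + OGG_HEADER.size <= len(data):
        fields = OGG_HEADER.unpack_from(data, offset)
        capture, n_segments = fields[0], fields[-1]
        if capture != b"OggS":
            raise ValueError(f"no Ogg page at byte offset {offset}")
        table_start = offset + OGG_HEADER.size
        segment_table = data[table_start:table_start + n_segments]
        if len(segment_table) < n_segments:
            raise ValueError("truncated segment table")
        # Each segment table entry is the length of one payload segment.
        page_size = OGG_HEADER.size + n_segments + sum(segment_table)
        sizes.append(page_size)
        offset += page_size
    return sizes
```

Feeding it the contents of a file, for example `max(ogg_page_sizes(open("bad_sound.ogg", "rb").read()))`, gives the largest page in bytes, which is the number to compare against the packet size limit.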

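Could this make a difference? It could! When the ogg files are streamed over the WebRTC media channel, they are sent via RTP, a protocol that runs over UDP. Each RTP datagram contains one page of the ogg file. At roughly 82 kb/s, a page holding close to a second of audio is about 10 kB, which meant that the WebRTC server was attempting to send 10 kB datagrams. Too big! (The RTP packet size limit here is 1200 bytes, which keeps packets comfortably inside a typical 1500-byte network MTU.)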
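The 10 kB figure is back-of-the-envelope arithmetic: bitrate times page duration, divided by 8. Here's a small sketch using the numbers from the opusinfo output above (81.78 kb/s without overhead, 20 ms pages in the good file, 900 ms average in the bad one). The function name and the constant are mine, for illustration only.

```python
RTP_PACKET_LIMIT_BYTES = 1200  # the limit quoted in this post


def page_payload_bytes(bitrate_kbps: float, page_duration_ms: float) -> float:
    """Approximate audio bytes in one Ogg page at a given average bitrate."""
    bits = bitrate_kbps * 1000 * (page_duration_ms / 1000)
    return bits / 8


for label, duration_ms in [("good", 20.0), ("bad", 900.0)]:
    size = page_payload_bytes(81.78, duration_ms)
    verdict = "fits" if size <= RTP_PACKET_LIMIT_BYTES else "too big"
    print(f"{label}: ~{size:.0f} bytes per page ({verdict})")
# good: ~204 bytes per page (fits)
# bad: ~9200 bytes per page (too big)
```

These are averages, so individual pages will vary, but the gap between about 200 bytes and about 9 kB is large enough that the conclusion doesn't depend on the details.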

Why did the bad files have such large pages? They were generated by compressing a .wav with ffmpeg, invoked like so:

ffmpeg -i file.wav -c:a libopus -ac 2 file.ogg

But ffmpeg's Ogg muxer has a -page_duration setting that controls how the stream is sliced into pages. I hadn't known about this setting and wasn't using it. The default: 1000 ms (1,000,000 microseconds). And so, the 19-character fix for my bug:

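# page_duration unit is microseconds. 2000 us (2 ms) is shorter than one
# 20 ms Opus packet, so each page ends up holding a single packet.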
ffmpeg -i file.wav -c:a libopus -page_duration 2000 -ac 2 file.ogg
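If the encoding step is driven from a script rather than typed by hand, the same command can be wrapped so the page duration can't be forgotten again. This is only a sketch: the helper names `opus_ogg_command` and `encode` are hypothetical, the flags are exactly the ones from the command above, and it assumes ffmpeg is on the PATH.

```python
import subprocess


def opus_ogg_command(src: str, dst: str, page_duration_us: int = 2000) -> list[str]:
    """Build the ffmpeg argument list from this post, with -page_duration set.

    page_duration_us is in microseconds, matching the ffmpeg option.
    """
    if page_duration_us <= 0:
        # Guard for this sketch: only accept an explicit positive duration.
        raise ValueError("page_duration_us must be positive")
    return [
        "ffmpeg", "-i", src,
        "-c:a", "libopus",
        "-page_duration", str(page_duration_us),
        "-ac", "2",
        dst,
    ]


def encode(src: str, dst: str, page_duration_us: int = 2000) -> None:
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(opus_ogg_command(src, dst, page_duration_us), check=True)
```

Keeping the argument list in its own function makes it easy to check the flags without actually running ffmpeg.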


And all my files streamed happily ever after.
