Monday, October 12, 2009

Live screencasting using ffmpeg

I reconnected with an old friend from home this summer. He's living out in Utah, and I was working in Boston at the time, but I tracked him down on Skype. Remembering that I still had his copy of Myst III: Exile from years ago, I confessed that I'd never played all the way through the game: without having much free time, it was hard to keep at the puzzles. “It's too bad,” I lamented, “that we can't play it together.”

But wait ... perhaps we could?

I'd been aware for some time that it's possible to stream video using ffserver and ffmpeg. Sadly, it's not a completely intuitive process, and until this point, I'd never had the motivation to figure out how to put all the parts together. But now, motivated by the possibility of being spoon-fed (just barely enough) hints to finally finish Exile, I resolved to figure out how to stream my desktop to Utah. After some initial success, and a fair bit of tweaking of codecs and parameters, and then hours of fighting with wine to get Exile running, I got everything working. Below, I provide a simple guide that should enable anyone else running Ubuntu to set up a similar live screencast. (How I got Exile to run acceptably under wine, though, is a story for another day.)

The general process of streaming with ffmpeg looks something like the following. First, you run an ffserver instance. It's configured with a number of feeds (usually local files), which provide audio and/or video, and output streams, which are hooked up to incoming connections over RTP/RTSP/HTTP.

Our problem is slightly more complicated, in that we need to feed live data to ffserver. The easiest way to do this is to use an FFM stream: we can configure ffserver with a feed that's sourced to an FFM file, and, once ffserver is running, fire up ffmpeg and tell it to dump live audio and video into the FFM file. Then, a buddy in Utah can connect to the correct output stream and see what we're streaming. Sounds great, right?

Now that you can see the big picture, it's not too hard to get things going. First, install the packages you'll need:
sudo apt-get install ffmpeg \
    libavcodec-unstripped-52 \
    libx264-67 \
    libmp3lame0
Then you'll need to specify a configuration for ffserver with at least one feed and at least one output stream. I used libx264 for video encoding and libmp3lame for audio encoding, and found that the settings in the ffserver.conf below were more than generous enough for the available bandwidth:
Port 8090
BindAddress 0.0.0.0
MaxHTTPConnections 2000
MaxClients 1000
MaxBandwidth 1000
CustomLog -
NoDaemon

<Feed exile.ffm>
  File /tmp/exile.ffm
  FileMaxSize 16M
</Feed>

<Stream exile.asf>
  Feed exile.ffm
  Format asf

  AudioBitRate 64
  AudioChannels 1
  AudioSampleRate 22050

  VideoBitRate 128
  VideoBufferSize 400
  VideoFrameRate 15
  VideoSize 320x240

  VideoGopSize 12

  VideoHighQuality
  Video4MotionVector

  AudioCodec libmp3lame
  VideoCodec libx264
</Stream>
Put this configuration file wherever you like; then you can start ffserver and point it at the location you chose:
ffserver -f /path/to/your/ffserver.conf
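As a rough sanity check on the configuration above, the arithmetic below adds up what one viewer of exile.asf costs and compares it with MaxBandwidth. The numbers are copied from the ffserver.conf shown earlier, and all three settings are in kbit/s. The variable names are mine, and the calculation ignores container and HTTP overhead, so treat the result as an estimate only.

```shell
#!/bin/sh
# Rough bandwidth budget for the exile.asf stream.
# Values are copied from ffserver.conf; all are in kbit/s.
audio_kbps=64      # AudioBitRate
video_kbps=128     # VideoBitRate
max_kbps=1000      # MaxBandwidth

# Per-viewer cost, ignoring ASF container and HTTP overhead (an assumption).
per_client=$((audio_kbps + video_kbps))

# Integer division: how many viewers fit under MaxBandwidth.
max_clients=$((max_kbps / per_client))

echo "per-client: ${per_client} kbit/s, clients within MaxBandwidth: ${max_clients}"
```

With these settings each viewer needs about 192 kbit/s, so roughly five simultaneous viewers fit under the configured limit, which is plenty for one friend in Utah.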
The remaining piece of the puzzle is getting ffmpeg to capture audio and video from your desktop. Video is easy enough; you can tell ffmpeg to grab a chunk of the X11 display using -f x11grab -s [W]x[H] -r [R] -i :0.0+[X],[Y], where [W], [H], [X], and [Y] represent the width, height, x-offset, and y-offset, respectively, of your desired capture region, and :0.0 names the X display and screen to grab from. Additionally, [R] specifies the desired capture framerate, which should match the one you specify in your configuration file.
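To make the x11grab syntax concrete, here is a minimal sketch of a shell helper that assembles those options from a capture region. The function name x11grab_args and its argument order are made up for illustration; only the options it prints belong to ffmpeg.

```shell
#!/bin/sh
# x11grab_args W H X Y R [DISPLAY]
# Prints the ffmpeg input options for grabbing a WxH region whose
# upper-left corner is at (X, Y), captured at R frames per second.
# The function name and argument order are illustrative, not part of ffmpeg.
x11grab_args() {
    w=$1 h=$2 x=$3 y=$4 r=$5
    display=${6:-:0.0}    # default to the first screen of the local display
    printf '%s\n' "-f x11grab -s ${w}x${h} -r ${r} -i ${display}+${x},${y}"
}

# A 640x480 region offset 100 pixels right and 50 pixels down, at 15 fps:
x11grab_args 640 480 100 50 15
```

Leaving the offset off entirely, as in -i :0.0, grabs from the upper-left corner of the screen.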

For audio, things could get a bit hairier. Rather than mucking around with arcane ALSA magic in your .asoundrc or setting up JACK, I recommend taking advantage of PulseAudio here. It is maligned by many (perhaps deservedly: I caught it using 2.4 GB of memory in an unrelated incident earlier today), but it really shines in this use case. You'll need pavucontrol, the very helpful PulseAudio volume control, to make this work:
sudo apt-get install pavucontrol
You can then tell ffmpeg to record audio from a PulseAudio input channel by using -f alsa -i pulse. Adding to this the URL of the ffserver feed to which ffmpeg should be connecting, you have everything you need to invoke ffmpeg. For example, if you're using the ffserver.conf above, and you want to capture a 640x480 window at the upper-left corner of the screen, you'll want to invoke ffmpeg like this:
ffmpeg -f x11grab -s 640x480 -r 15 -i :0.0 \
       -f alsa -i pulse \
        http://localhost:8090/exile.ffm
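Since the capture framerate is meant to match the VideoFrameRate in ffserver.conf, one way to keep the two in sync is to read the value out of the configuration file instead of typing it twice. The sketch below assumes a hypothetical helper, conf_value, and a hypothetical FFSERVER_CONF environment variable for the config path; neither is part of ffmpeg or ffserver. It prints the resulting command instead of running it.

```shell
#!/bin/sh
# conf_value FILE KEY
# Prints the first value given for KEY in an ffserver.conf-style file.
# (A hypothetical helper; it does not distinguish between <Stream> sections.)
conf_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# FFSERVER_CONF is an assumed variable; the default path is a placeholder.
conf=${FFSERVER_CONF:-/path/to/your/ffserver.conf}

if [ -r "$conf" ]; then
    rate=$(conf_value "$conf" VideoFrameRate)
    # Print the command rather than running it, so it can be inspected first.
    echo ffmpeg -f x11grab -s 640x480 -r "$rate" -i :0.0 \
         -f alsa -i pulse \
         http://localhost:8090/exile.ffm
fi
```

If the file contains several streams with different frame rates, this picks up only the first, so it is best suited to a single-stream setup like the one here.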
With ffmpeg running, you'll then want to fire up pavucontrol and, on its Recording tab, make sure that ffmpeg is connected to the monitor feed for your sound card, as shown in the screenshot below:



And that's it! You can tell your friend in Utah to point a streaming client, e.g. vlc, at your stream, and it should just work. Just remember to use your outward-facing IP address, and to make sure the port's open on your router or firewall, if any. For the ffserver.conf above, if my external IP address were 123.45.67.8, the URL I'd send to my friend would be http://123.45.67.8:8090/exile.asf.

There is one thing to note: if your friend's client takes a long time to start up, it may be because ffserver, by default, connects clients to the live stream with no delay, so many clients pause for several seconds while they fill a buffer with data as it arrives. You can speed this up somewhat by appending a ?buffer=X parameter (where X is a number of seconds) to the end of the URL. This causes ffserver to start the client at a position in the stream that's X seconds in the past, so the client can retrieve those X seconds as fast as bandwidth will allow, possibly reducing the time spent buffering. In my experiments, this also cut back on the number of buffer underruns on my friend's end.
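To put the URL pieces together, here is a small sketch that builds the address to send to a viewer, with the optional buffer parameter. The function name stream_url is hypothetical, and the IP address is the same example address used above.

```shell
#!/bin/sh
# stream_url HOST PORT STREAM [BUFFER_SECONDS]
# Prints the HTTP URL of an ffserver output stream, adding ?buffer=N
# when a buffer length in seconds is supplied.
stream_url() {
    url="http://$1:$2/$3"
    if [ -n "${4:-}" ]; then
        url="${url}?buffer=$4"
    fi
    printf '%s\n' "$url"
}

stream_url 123.45.67.8 8090 exile.asf       # plain live stream
stream_url 123.45.67.8 8090 exile.asf 5     # start 5 seconds in the past
```

The viewer could then open the result directly, for example with vlc "$(stream_url 123.45.67.8 8090 exile.asf 5)". The right buffer length depends on the connection, so it is worth trying a few values.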