In this post I describe the capture setup we used to create screencasts on an Ubuntu Linux desktop. It covers just the recording process itself; I’ll go over the overall screencast development process and editing in separate posts.
Contents
- Overview
- Desktop
- Slides
- Terminal
- Camera
- Audio
- Putting it all together
Overview
These scripts were created to record screencasts for a class on Data Engineering, so the recordings need to cover both high-level conceptual material and detailed examples or tutorials.
To do that, we really wanted the flexibility to show slides and terminal or web interactions at the same time. We also figured it’s a good idea to be able to overlay a talking head when there’s not much other detailed interaction going on, so we wanted to be sure we captured camera footage during the recordings as well.
This setup is designed to capture raw footage of all of those channels at once.
We looked around, but couldn’t find any off-the-shelf tools that really met our needs for this. It turns out this is actually pretty easy to accomplish just using ffmpeg directly from a script.
Desktop
Note the desktop setup:
- Ubuntu desktop with three monitors set up within a single X session. You’ll want at least two to capture both slides and terminal/web at once
- Each monitor is 1920x1080, so the total big desktop is 3x1920 = 5760 pixels wide and 1080 pixels tall
- Terminals/browsers run on the left-hand monitor
- Slides are full-screen on the center monitor
- Webcam lives on top of the left monitor so we’re looking roughly towards the camera when going through a detailed example
- Sound comes from a lavalier mic plugged into a USB audio interface made available via standard Linux alsa devices
- I use the right-hand monitor to hold terminal windows to start/stop these scripts, but nothing from there is recorded
Below, we’ll go through each of the different capture channels used and then wrap it all up with a bow into a single script that follows the screencasts -> shots -> takes file organization that we used to keep track of all of this.
Slides
To capture a stream of slides, we’re using the x11grab ffmpeg interface. This is designed to just sample what the X server sees every so often ($framerate times per second) and then encode and save that as a video stream.

The tricky part is creating a command to record the correct monitor for slides. Since the middle monitor is running slides, we tell ffmpeg to capture a single monitor’s 1920x1080 worth of screen, but start from the geometry offset +1920,0: the top-left corner of the middle monitor.
The command
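```bash
# A sketch of the slide-capture command -- the encoder flags and the
# X display name (:0.0) here are assumptions; adjust to your setup.
ffmpeg -f x11grab \
    -framerate "$framerate" \
    -video_size 1920x1080 \
    -i ":0.0+1920,0" \
    -c:v libx264 -preset ultrafast \
    slides.mkv
```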
gets wrapped in a bash function to capture slides:
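```bash
# Sketch of the wrapper function; the output-directory argument and
# log redirection are assumptions based on the files named below.
capture_slides () {
    local outdir="$1"
    ffmpeg -f x11grab \
        -framerate "$framerate" \
        -video_size 1920x1080 \
        -i ":0.0+1920,0" \
        -c:v libx264 -preset ultrafast \
        "$outdir/slides.mkv" \
        > "$outdir/slides.log" 2>&1
}
```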
This saves to the files slides.mkv and slides.log.
Terminal
We’ll use x11grab to record the left-hand monitor as well. The offset here is just the top of the left-hand monitor, so +0,0 in X geometry speak:
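```bash
# Sketch of the terminal-capture function -- same pattern as the
# slides, but grabbing the left monitor at offset +0,0.
capture_terminal () {
    local outdir="$1"
    ffmpeg -f x11grab \
        -framerate "$framerate" \
        -video_size 1920x1080 \
        -i ":0.0+0,0" \
        -c:v libx264 -preset ultrafast \
        "$outdir/terminal.mkv" \
        > "$outdir/terminal.log" 2>&1
}
```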
This saves to the files terminal.mkv and terminal.log.
Camera
To capture the stream from the webcam, we’re relying heavily on the fact that the Logitech HD Pro Webcam C920 does hardware h264 encoding on the fly, and we’re just tapping into that using ffmpeg’s v4l2 interface to simply copy the video stream out to a file.

I also had some problems understanding the timestamps that the camera’s hardware encoder used, so I include the set of ffmpeg args that fixed that. YMMV depending on your camera.
Probably the most important thing to recognize is that this capture relies on the hardware encoding. If we were getting raw video and having to encode on the fly, the desktop’s computational capabilities may come more into play, which usually results in limiting the framerate you can actually record.
Here’s the function to capture the camera footage:
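```bash
# Sketch of the webcam-capture function; the device path and the
# timestamp flag are assumptions for a C920-style camera.
capture_webcam () {
    local outdir="$1"
    ffmpeg -f v4l2 \
        -input_format h264 \
        -use_wallclock_as_timestamps 1 \
        -i /dev/video0 \
        -c:v copy \
        "$outdir/webcam.mkv" \
        > "$outdir/webcam.log" 2>&1
}
```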
This saves to webcam.mkv and webcam.log.
Audio
Audio is coming in through a TASCAM US-2x2 USB audio interface, where I have a lavalier mic plugged in. This “just worked” through the alsa interface for ffmpeg, so we just need to copy the raw audio stream from the device:
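```bash
# Sketch of the audio-capture function; the alsa device name (hw:1)
# is an assumption -- check `arecord -l` for yours.
capture_audio () {
    local outdir="$1"
    ffmpeg -f alsa \
        -i hw:1 \
        "$outdir/audio.wav" \
        > "$outdir/audio.log" 2>&1
}
```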
which saves audio.wav and audio.log.
Putting it all together
So all of the above functions get rolled up into a single script named capture. This script kicks off the ffmpeg recordings at roughly the same time and saves all the output to
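```bash
# A sketch of the output path -- the exact directory naming here
# is an assumption, following the screencasts -> shots -> takes layout:
"${screencast}/shot_${shot}/take_${take}"
```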
where the variables in there are either defaults (like the shot number) or specified as arguments to the script. I typically use it like
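```bash
# A hypothetical invocation -- the argument handling is an assumption:
capture my-screencast 1
```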
which kicks off the recording and streams outputs to files such as
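```text
# illustrative paths, following the functions and layout above
my-screencast/shot_1/take_1/slides.mkv
my-screencast/shot_1/take_1/slides.log
my-screencast/shot_1/take_1/terminal.mkv
my-screencast/shot_1/take_1/terminal.log
my-screencast/shot_1/take_1/webcam.mkv
my-screencast/shot_1/take_1/webcam.log
my-screencast/shot_1/take_1/audio.wav
my-screencast/shot_1/take_1/audio.log
```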
This folder structure lets us keep things nice and tidy for editing.
So here’s the final script, sketched with assumed device names, display, and argument handling that you’d adapt to your own setup:
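```bash
#!/usr/bin/env bash
# capture -- a sketch of the combined recording script. The device
# paths, X display (:0.0), alsa device (hw:1), and argument handling
# are all assumptions; adjust them for your own hardware.

set -euo pipefail

framerate=30

# Screencast name is required; shot and take numbers default to 1.
screencast="${1:?usage: capture <screencast> [shot] [take]}"
shot="${2:-1}"
take="${3:-1}"

outdir="${screencast}/shot_${shot}/take_${take}"
mkdir -p "$outdir"

capture_slides () {
    # Middle monitor: 1920x1080 starting at offset +1920,0.
    local outdir="$1"
    ffmpeg -nostdin -f x11grab \
        -framerate "$framerate" \
        -video_size 1920x1080 \
        -i ":0.0+1920,0" \
        -c:v libx264 -preset ultrafast \
        "$outdir/slides.mkv" \
        > "$outdir/slides.log" 2>&1
}

capture_terminal () {
    # Left monitor: 1920x1080 starting at offset +0,0.
    local outdir="$1"
    ffmpeg -nostdin -f x11grab \
        -framerate "$framerate" \
        -video_size 1920x1080 \
        -i ":0.0+0,0" \
        -c:v libx264 -preset ultrafast \
        "$outdir/terminal.mkv" \
        > "$outdir/terminal.log" 2>&1
}

capture_webcam () {
    # Copy the camera's hardware-encoded h264 stream straight to disk.
    local outdir="$1"
    ffmpeg -nostdin -f v4l2 \
        -input_format h264 \
        -use_wallclock_as_timestamps 1 \
        -i /dev/video0 \
        -c:v copy \
        "$outdir/webcam.mkv" \
        > "$outdir/webcam.log" 2>&1
}

capture_audio () {
    # Raw PCM audio from the USB interface's alsa device.
    local outdir="$1"
    ffmpeg -nostdin -f alsa \
        -i hw:1 \
        "$outdir/audio.wav" \
        > "$outdir/audio.log" 2>&1
}

# Run each capture in the background so they all start at roughly the
# same time, and remember the pids so we can stop them together.
pids=()
capture_slides   "$outdir" & pids+=($!)
capture_terminal "$outdir" & pids+=($!)
capture_webcam   "$outdir" & pids+=($!)
capture_audio    "$outdir" & pids+=($!)

# On Ctrl-C, signal every ffmpeg so each finishes its file cleanly.
trap 'kill "${pids[@]}" 2>/dev/null' INT TERM

wait
```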
Note that each function is run in the background so they’re effectively kicked off in parallel.
If you have any questions or feedback, please feel free to share it with me on Twitter: @m_3