Seeking with FFmpeg

People (including me) struggle with FFmpeg. It’s very powerful but also a bit hard to understand for the average developer who doesn’t have a background in media decoding/encoding. I struggled with playback first, then found a good implementation on Stack Overflow and built on top of that. This post details that.

Soon after we get playback working, we need to get seeking working. If you are a Vrok (<=3) user, you might have experienced its buggy seeking. As I delved into FFmpeg this vacation, it all became clear with the help of dranger’s good ol’ tutorial. PTS/DTS are explained well there within the context of FFmpeg.

The FFmpeg documentation tells only half the story of its design, which is based on multiple streams. Each stream has its own context, and each context has its own time base. What’s a time base? It’s the length of one timestamp tick, expressed as a fraction of a second. A smaller time base (a larger denominator) lets you store finer-grained time intervals. But this has to be balanced against the media duration and the size of the variable that stores the timestamp.

FFmpeg has a separate context for the whole file (container in the code). This context uses a time base defined by AV_TIME_BASE, the number of timestamp units per second. This is the time base used to store durations for the whole file, namely the total duration! So the following code gives you the total duration in whole seconds.

duration_in_seconds = container->duration / AV_TIME_BASE;
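
One thing to watch in the line above: both operands are integers, so the division drops any fraction of a second. Here is a minimal sketch of the same conversion done both ways. It assumes AV_TIME_BASE is 1000000, and DEMO_TIME_BASE and the two function names are made up for illustration so the snippet compiles without the FFmpeg headers.

```c
#include <stdint.h>

/* Stand-in for FFmpeg's AV_TIME_BASE (assumed to be 1000000 here),
   redefined only so this sketch compiles without FFmpeg headers. */
#define DEMO_TIME_BASE 1000000

/* Whole seconds; the integer division truncates, like the one-liner above. */
static int64_t duration_whole_seconds(int64_t duration)
{
    return duration / DEMO_TIME_BASE;
}

/* Fractional seconds, handy for a progress bar or a mm:ss.ms display. */
static double duration_seconds(int64_t duration)
{
    return (double)duration / DEMO_TIME_BASE;
}
```

In a real player you would pass container->duration to either function.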

The audio stream has its own, different time base, which you can get by accessing:

container->streams[audio_stream_id]->time_base

This is expressed as an AVRational, a struct that holds a numerator (num) and a denominator (den); the length of one tick in seconds is num / den. So you can convert any PTS (Presentation Time Stamp) to seconds by multiplying it by this time base. This can be used to show the current playback position.
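
To make the PTS conversion concrete, here is a small sketch. DemoRational is a hypothetical stand-in that mirrors the num and den fields of AVRational, and pts_to_seconds is a made-up helper name. With the real headers you would pass the stream’s time_base directly, and av_q2d() can turn an AVRational into a double for you.

```c
#include <stdint.h>

/* Hypothetical stand-in mirroring the num/den fields of FFmpeg's AVRational. */
typedef struct DemoRational {
    int num; /* numerator */
    int den; /* denominator */
} DemoRational;

/* Convert a timestamp counted in time_base ticks to seconds:
   seconds = pts * num / den. */
static double pts_to_seconds(int64_t pts, DemoRational time_base)
{
    return (double)pts * time_base.num / time_base.den;
}
```

For example, with a time base of 1/44100, a PTS of 441000 works out to about 10 seconds.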

av_seek_frame(container, audio_stream_id, seek_to * audio_st->time_base.den / audio_st->time_base.num, AVSEEK_FLAG_ANY);

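This call is used to seek to the requested position. It is not exact but does the job: by default FFmpeg seeks to key frames, and AVSEEK_FLAG_ANY allows it to land on non-key frames as well. Here, seek_to is stored in seconds. As you can see, I’ve converted it to the stream’s time base units by dividing it by the time base, that is, multiplying by den / num (yes, the convention is a bit different from the AVFormatContext’s AV_TIME_BASE, which counts units per second rather than giving the length of one unit). Hope this clears up FFmpeg seeking! Refer to the Vrok source for sample code.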
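
The seconds-to-timestamp conversion used in the seek call can also be sketched on its own. As before, DemoRational is a hypothetical stand-in for AVRational and seconds_to_stream_ts is a made-up helper name, so treat this as an illustration of the arithmetic rather than the exact code in Vrok. FFmpeg also ships av_rescale_q() for rescaling timestamps between time bases with integer arithmetic, which may be the safer choice for very long files.

```c
#include <stdint.h>

/* Hypothetical stand-in mirroring the num/den fields of FFmpeg's AVRational. */
typedef struct DemoRational {
    int num; /* numerator */
    int den; /* denominator */
} DemoRational;

/* Convert a position in seconds to a timestamp in time_base ticks:
   ticks = seconds / (num / den) = seconds * den / num.
   The cast truncates toward zero. */
static int64_t seconds_to_stream_ts(double seconds, DemoRational time_base)
{
    return (int64_t)(seconds * time_base.den / time_base.num);
}
```

The result is the value you would hand to av_seek_frame() as the timestamp argument, together with the audio stream index.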