Implementing a custom captions renderer for Android VideoView

From Android KitKat (API level 19), programmers can use ‘addSubtitleSource()’ on VideoView to add a WebVTT format subtitle track. That’s it. Nothing else to do. The native subtitle handler even automatically listens to the system accessibility caption settings and changes its drawing styles accordingly. (FYI – the system captions settings are under Settings->Accessibility->Captions.) Awesome!

But that isn’t the whole story. Most application workflows have a captions button for the user to turn captions on/off. Unfortunately, there are no public member functions on the VideoView control that let the developer, on behalf of the app user, control captions visibility from within the app. All the developer can do is redirect the user to the system accessibility page using an ‘Intent’. Even then, the captions setting is global. Sure, if the user has a permanent hearing disability, this is no problem – he/she would want to turn on captions for all apps with one switch. Unfortunately, this is not always the case. Even a normal user may want to mute the sound and watch a video with captions in a quiet environment. The app then needs a button of its own for the user to toggle captions. Another, though minor, quirk is that ‘addSubtitleSource()’ works only with the WebVTT format, while the ubiquitous captions format is SRT. This requires wiring up an intermediate converter (a simple SRT to WebVTT converter that I wrote for testing can be downloaded here). Hence, there are at least two strong cases for implementing a custom captions handler.

Android Custom Captions Renderer

1. Get an instance of internal MediaPlayer

The VideoView control holds a private MediaPlayer object instance and implements ‘addSubtitleSource()’ on top of it. So, the first plan of attack is getting hold of this MediaPlayer object. One way of doing this is using Java’s reflection capabilities to call hidden member functions of the VideoView class – i.e. ‘VideoView.class.getDeclaredMethod("<Method name>").invoke(<Params>)’. This, of course, is very ‘hacky’, and we want to stay away from it as much as we can in production code. Fortunately, there’s another way. If we attach a listener with ‘setOnPreparedListener()’ (or derive from ‘VideoView’ and override ‘onPrepared()’), we receive the internal MediaPlayer object in the listener’s function parameter. When this event fires, the video player has loaded the video and is ready to accept a captions track. Using MediaPlayer’s ‘addTimedTextSource()’, we can then add an external captions track – and this one can be in SRT format.
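A minimal sketch of this step follows. The names ‘videoView’, ‘srtPath’ and ‘attachCaptions’ are illustrative, not part of the framework; only the MediaPlayer/VideoView calls are real API.

```java
import java.io.IOException;

import android.media.MediaPlayer;
import android.util.Log;
import android.widget.VideoView;

public class CaptionsHelper {

    // Sketch, assuming the activity holds a VideoView named 'videoView'
    // and 'srtPath' points to a local SRT file (hypothetical names).
    static void attachCaptions(VideoView videoView, final String srtPath) {
        videoView.setOnPreparedListener(new MediaPlayer.OnPreparedListener() {
            @Override
            public void onPrepared(MediaPlayer mp) {
                // 'mp' is VideoView's internal MediaPlayer instance.
                try {
                    mp.addTimedTextSource(srtPath,
                            MediaPlayer.MEDIA_MIMETYPE_TEXT_SUBRIP);
                } catch (IOException e) {
                    // Captions are optional; keep playing without them.
                    Log.w("Captions", "Could not add subtitle track", e);
                }
            }
        });
    }
}
```

Note that ‘MEDIA_MIMETYPE_TEXT_SUBRIP’ ("application/x-subrip") is the MIME type MediaPlayer expects for SRT tracks; it, like ‘addTimedTextSource()’, is available from API 19.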

2. Handle special condition where caption file is on the network

If your captions file is on the network, though, we have another hurdle to jump. The function accepts only a local file path or a local file descriptor, so we would have to download the captions to a file in a temp directory and then add code to clean the file up when it is no longer needed. If your code fails to clean up properly, you will clutter the user’s precious storage space. Alternatively, we could go another way by using ‘MemoryFile’. A MemoryFile is a file mapped into memory, normally used for inter-process communication. The nice thing about this is that the system takes care of the file maintenance part while we just read/write memory. Using Java’s reflection capabilities, we can call the hidden member ‘getFileDescriptor()’ on MemoryFile to get a ‘FileDescriptor’ to use with ‘addTimedTextSource()’. I would argue that even though ‘getFileDescriptor()’ is hidden, it is safe to call: it ties directly into the underlying Linux API, and there isn’t any reason for the Android framework to change or remove this function in the future.
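The MemoryFile route can be sketched like this. The method name ‘addCaptionsFromMemory’ and the parameters ‘player’ and ‘srtBytes’ are my own placeholders; ‘getFileDescriptor()’ is the hidden method discussed above, reached via reflection, so this may break on future platform versions.

```java
import java.io.FileDescriptor;
import java.lang.reflect.Method;

import android.media.MediaPlayer;
import android.os.MemoryFile;

public class MemoryCaptions {

    // Sketch, assuming 'player' is the MediaPlayer obtained in onPrepared()
    // and 'srtBytes' holds the SRT data already downloaded from the network.
    static void addCaptionsFromMemory(MediaPlayer player, byte[] srtBytes)
            throws Exception {
        MemoryFile memoryFile = new MemoryFile("captions.srt", srtBytes.length);
        memoryFile.writeBytes(srtBytes, 0, 0, srtBytes.length);

        // 'getFileDescriptor()' is hidden, so reach it via reflection.
        Method getFd = MemoryFile.class.getDeclaredMethod("getFileDescriptor");
        FileDescriptor fd = (FileDescriptor) getFd.invoke(memoryFile);

        player.addTimedTextSource(fd, MediaPlayer.MEDIA_MIMETYPE_TEXT_SUBRIP);
        // Keep 'memoryFile' referenced for the life of playback,
        // then close() it to release the shared memory region.
    }
}
```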

3. Rendering captions at the right time

When we set a listener using ‘MediaPlayer.setOnTimedTextListener()’, we have only asked MediaPlayer to notify us via the ‘onTimedText’ event when a caption’s start/end time is reached during playback. We still have to put a TextView on top of the VideoView to show the captions. This can be done easily in the video activity. When the listener is invoked, it receives a ‘TimedText’ object as its function parameter. The ‘getText()’ method of this parameter returns the caption text to be rendered at this playback time. When the end time of a caption is reached, it returns a null string, and we simply hide the TextView captions control. There is also a ‘getBounds()’ method on the ‘TimedText’ object. It returns a valid, non-null ‘Rect’ object when the input captions file contains positioning information. A rigorous implementation would use these bounds and move the TextView accordingly. One case where this matters is when an object/person important to the video’s subject matter is in the bottom part of the frame, which is usually occluded by the captions text.
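The listener wiring can be sketched as below. One detail worth calling out: besides setting the listener, the timed-text track must also be selected with ‘selectTrack()’, or ‘onTimedText’ never fires. The names ‘startCaptionRendering’, ‘player’ and ‘captionsView’ are mine.

```java
import android.media.MediaPlayer;
import android.media.TimedText;
import android.view.View;
import android.widget.TextView;

public class CaptionRenderer {

    // Sketch, assuming 'player' is the internal MediaPlayer and
    // 'captionsView' is a TextView layered over the VideoView.
    static void startCaptionRendering(MediaPlayer player,
                                      final TextView captionsView) {
        // The timed-text track must be selected, otherwise
        // onTimedText() is never invoked.
        MediaPlayer.TrackInfo[] tracks = player.getTrackInfo();
        for (int i = 0; i < tracks.length; i++) {
            if (tracks[i].getTrackType()
                    == MediaPlayer.TrackInfo.MEDIA_TRACK_TYPE_TIMEDTEXT) {
                player.selectTrack(i);
                break;
            }
        }

        player.setOnTimedTextListener(new MediaPlayer.OnTimedTextListener() {
            @Override
            public void onTimedText(MediaPlayer mp, final TimedText text) {
                // The callback may arrive off the UI thread; post to the view.
                captionsView.post(new Runnable() {
                    @Override
                    public void run() {
                        String caption =
                                (text == null) ? null : text.getText();
                        if (caption == null || caption.trim().isEmpty()) {
                            // End time of a cue: hide the captions.
                            captionsView.setVisibility(View.INVISIBLE);
                        } else {
                            captionsView.setText(caption.trim());
                            captionsView.setVisibility(View.VISIBLE);
                        }
                    }
                });
            }
        });
    }
}
```

A fuller implementation would also consult ‘text.getBounds()’ here and reposition the TextView when the cue carries placement information.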


The native SRT parser is confused by subtitle blocks that have sequence and timing information but empty caption text – for instance, caption sequence 2 below:

1
00:00:00,000 --> 00:00:00,100
Caption text 1

2
00:00:01,200 --> 00:00:02,000

3
00:00:04,000 --> 00:00:06,000
Caption text 3

One way to fix this is to replace the ‘\r\n\r\n\r\n’ (assuming Windows-style line breaks) that follows the end time of sequence 2 with ‘\r\n{any non-whitespace character here}\r\n\r\n’, so the cue body is no longer empty.
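This fix can be applied automatically in a small pre-processing pass before handing the file to ‘addTimedTextSource()’. The sketch below (the class name ‘SrtSanitizer’ is mine) inserts a non-breaking space as the placeholder, which keeps the cue body non-empty for the parser while still rendering as blank on screen:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SrtSanitizer {

    // Matches a timing line (e.g. "00:00:01,200 --> 00:00:02,000")
    // followed immediately by a blank line, i.e. a cue with no text.
    private static final Pattern EMPTY_CUE = Pattern.compile(
            "(\\d{2}:\\d{2}:\\d{2},\\d{3} --> "
            + "\\d{2}:\\d{2}:\\d{2},\\d{3}\\r?\\n)(\\r?\\n)");

    // Inserts a non-breaking space as the cue body so the native
    // SRT parser no longer sees an empty caption block.
    public static String sanitize(String srt) {
        Matcher m = EMPTY_CUE.matcher(srt);
        return m.replaceAll("$1\u00A0\r\n$2");
    }

    public static void main(String[] args) {
        String srt = "1\r\n00:00:00,000 --> 00:00:00,100\r\nCaption text 1\r\n\r\n"
                + "2\r\n00:00:01,200 --> 00:00:02,000\r\n\r\n"
                + "3\r\n00:00:04,000 --> 00:00:06,000\r\nCaption text 3\r\n";
        // The empty cue (sequence 2) now has a placeholder body.
        System.out.println(
                SrtSanitizer.sanitize(srt).contains("00:00:02,000\r\n\u00A0\r\n"));
    }
}
```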