Writing audio decoder with OSX Core Audio

This article covers details on how to start writing an audio decoder which can be invoked by QuickTime or other client applications. Though writing an audio encoder involves very similar steps, this article specifically covers decoders (well, just ’cause I have experience writing them). Unlike QuickTime 7 codecs, you write C++ code which is glued to a C interface using boilerplate code provided by Apple. This makes codec development very convenient. But there are some learning curves and pitfalls which any newcomer will encounter, some of which might be unexpected. Hence, this article discusses those ‘gotchas’.

Sample codecs for IMA4 format can be found at: Apple Developer Website

The sample code provided by Apple uses boilerplate code which implements all the C interfaces required for interacting with Component Manager and ACPlugin architecture. Component Manager is a deprecated plugin architecture and so is implemented only for backward compatibility. ACPlugin must be implemented for newer OSX operating systems. The C++ class hierarchy is as follows:

Core Audio Class Hierarchy

If you are only writing an encoder or a decoder, the ‘Codec’ class can be overridden without having to generate a new derived class from it. On the other hand, if you are implementing both, two different classes (one each for the encoder and decoder) will make the design much simpler (as shown by the sample code).

Some terminology

  • Codec – Refers to both encoder and decoder.
  • Linear Pulse Code Modulation (LPCM) – Refers to uncompressed audio data whose amplitude values are linear.
  • AudioStreamBasicDescription (ASBD) – Refers to a C struct which carries information about a format such as channel information, bytes per packet, frames per packet etc.
  • Sample – Refers to one audio data value for one channel.
  • Frame – Refers to one group of samples containing one sample for each channel.
  • Packet – In a compressed format, one packet contains many frames, but in uncompressed LPCM, one packet has exactly one frame. In addition, a compressed-format packet may carry header information such as presentation timestamps.
  • Magic Cookie – Opaque data sent by the client. An example of this data may be information about the audio from the file container header.

XCode project type

The project type must be a ‘bundle’ type and the final compilation must contain both a compiled .rsrc resource file (for Component Manager) and a .plist (for ACPlugin) detailing the codec name, company name, whether it’s a decoder and/or encoder, etc. Look for the .r (uncompiled resource file) and .plist files in Apple’s codec sample.

Functions to implement

The following are the main functions that you will need to implement:

  1. GetPropertyInfo(): This function is called by the client to query the size of the buffer (in bytes) it will need to provide to your codec to read a specific property value. After calling this function, the client will allocate that amount of space and call ‘GetProperty()’ to query that specific property.
  2. GetProperty(): This function allows clients to read a property from your codec. Properties of a codec include information like initialization state, supported input/output audio formats, what output formats are supported for a specific input format, maximum audio packet size for input, number of frames in an output packet, whether the frame bit rate is variable, etc. If you want to return the default value, call your parent class’ GetProperty() function.
  3. SetProperty(): This function allows clients to set a property. A codec may not support setting a specific property or may reject any attempt to set any property by throwing an unsupported exception.
  4. SetCurrentInputFormat(): This function allows clients to communicate to your codec what audio formats they will be giving to your codec. The format information is provided using ‘AudioStreamBasicDescription’ struct. You may reject this format by returning an error code.
  5. SetCurrentOutputFormat(): This function allows client to communicate what audio format they are expecting out of your codec. For decoders, this is usually Linear Pulse Codec Modulation (LPCM) format. Again, ‘AudioStreamBasicDescription’ struct is used to communicate this information. You may reject this format by returning an error code.
  6. Initialize()/Uninitialize(): ‘Initialize()’ is called by the client to put your codec in the initialized state. This means the client will not alter the input and output formats agreed upon earlier (which were set using the SetCurrentInput/OutputFormat() functions). The client can put the codec in the uninitialized state again by calling ‘Uninitialize()’. Do not assume that the client will destroy an instance of the class after calling ‘Uninitialize()’; the client is free to call ‘Initialize()’ again. The parameters of this function are the input format, the output format and a magic cookie (i.e. opaque data sent by the client, such as an importer), if any. Be advised that the parameters of this function are pointers and hence are optional. This means in some invocations, you may be provided with input and output formats but no magic cookie. Then the client may call ‘Uninitialize()’ and call your ‘Initialize()’ function with NULL pointers for the input and output formats but with a valid magic cookie pointer. Hence, you are expected to make a copy of whatever you get and save it in your class member variables.
  7. AppendInputData(): This function is called by the client to provide an input packet (a compressed audio packet in the case of a decoder). You are expected to save a copy of this packet in your circular buffer (implemented by the SimpleCodec class in the sample code). The client may provide more than one packet at a time. How many packets the client has provided is given by the ‘ioNumberPackets’ function argument. Hence, you must return how many packets you actually added to your queue using the same ‘ioNumberPackets’ reference argument. Be advised, when all packets have been provided, this function may be called with a ‘0’ value for ‘ioNumberPackets’. In this case, return from this function without doing anything and throw ‘kAudioCodecProducePacketsEOF’ in the ‘ProduceOutputPackets()’ function. If you specified ‘1’ for ‘kAudioCodecPropertyHasVariablePacketByteSizes’ and ‘kAudioCodecPropertyRequiresPacketDescription’ in ‘GetProperty()’, the client will provide a valid pointer for ‘inPacketDescription’ which points to a list of ‘AudioStreamPacketDescription’ structs. The number of elements of this list is given by the ‘ioNumberPackets’ parameter.
  8. ProduceOutputPackets(): This function is called by the client to ask your codec to process the packets provided earlier via ‘AppendInputData()’. The ‘ioNumberPackets’ function argument contains the number of packets you are expected to process from your circular buffer. The ‘outOutputData’ pointer parameter specifies a memory location where you are expected to write your output data and the ‘ioOutputDataByteSize’ parameter specifies the amount of space provided in bytes. This space is usually the number of frames per packet you reported in ‘GetProperty()’ multiplied by the output format channel count multiplied by the output format sample size. If you are given insufficient space, report the number of bytes you need in the ‘ioOutputDataByteSize’ reference variable and throw an insufficient-space exception. If you have successfully processed a packet and written the output data, you must notify your client of how many frames of data were written and how much of the provided buffer space was utilized. The number of frames written must always be less than or equal to the number of frames per packet you reported in ‘GetProperty()’, and ‘ioOutputDataByteSize’ must return the number of frames output multiplied by the output format channel count multiplied by the output format sample size. If you have more frames than the maximum value, you can return ‘kAudioCodecProduceOutputPacketSuccessHasMore’ to notify the client that you want to keep working with the same packet because you have more data to output.
  9. Reset(): This is usually called by the client asking your codec to discard your circular buffer contents and start with an empty buffer.

For a decoder, a QuickTime client may call your decoder class functions in the following order:

  1. Class constructor()
  2. GetPropertyInfo() with kAudioCodecPropertyFormatList
  3. GetProperty() with kAudioCodecPropertyFormatList: Return an ‘AudioStreamBasicDescription’ struct for each format supported by your codec
  4. Class destructor()
  5. Class constructor()
  6. GetProperty() with kAudioCodecPropertyNameCFString: Return the name of your codec
  7. Class destructor()
  8. Class constructor()
  9. GetPropertyInfo() with kAudioCodecPropertyOutputFormatsForInputFormat
  10. GetProperty() with kAudioCodecPropertyOutputFormatsForInputFormat: Return a list of ‘AudioStreamBasicDescription’ structs, one for each output format supported for a given input format. (NOTE: I could not get my QuickTime 7 client to accept LPCM with planar/non-interleaved audio format. I got ‘AppendInputData()’ but ‘ProduceOutputPackets()’ was never called. If anyone knows why, do let me know in the comments section.)
  11. GetProperty() with kAudioCodecIsInitialized: Return ‘0’ to show that the codec hasn’t been initialized
  12. Initialize() with valid input and output format parameters but NULL for magic cookie pointer
  13. GetProperty() with kAudioCodecIsInitialized: Return ‘1’ to show that the codec has been initialized
  14. GetProperty() with kAudioCodecPropertyInputFormat: Return the currently set input format
  15. GetProperty() with kAudioCodecPropertyOutputFormat: Return the currently set output format
  16. GetProperty() with undocumented property ‘grdy’: Pass to the lower base class, which throws an unknown property error
  17. GetProperty() with kAudioCodecPropertyInputBufferSize: Return the maximum size of your circular buffer in bytes
  18. GetProperty() with undocumented property ‘pakx’: Pass to the lower base class, which throws an unknown property error
  19. GetProperty() with kAudioCodecPropertyMaximumPacketByteSize: Return the maximum size of input packet you can handle in bytes
  20. GetProperty() with kAudioCodecPropertyPacketFrameSize: Return the number of output frames an input format packet has. If you have a variable number of frames per packet, return the maximum number.
  21. GetProperty() with kAudioCodecPropertyMinimumOutputPacket: Return ‘1’ to indicate that you output at least one packet. Passing handling to the base class will do the same thing
  22. GetProperty() with kAudioCodecIsInitialized: Return ‘1’ to show that the codec has been initialized
  23. GetPropertyInfo() with kAudioCodecPropertyCurrentOutputChannelLayout: Return the sizeof(struct AudioChannelLayout)
  24. GetProperty() with kAudioCodecPropertyCurrentOutputChannelLayout: Fill and return a ‘AudioChannelLayout’ struct specifying how audio channel data is mapped
  25. GetProperty() with kAudioCodecIsInitialized: Return ‘1’ to show that the codec has been initialized
  26. Uninitialize()
  27. Initialize() with NULL input and output format parameters but valid magic cookie pointer and size, if any
  28. Same calls from steps 13 to 21
  29. GetPropertyInfo() with kAudioCodecPropertyPrimeInfo: Return sizeof(struct AudioCodecPrimeInfo)
  30. GetProperty() with kAudioCodecPropertyPrimeInfo: Return any leading and trailing frame information
  31. GetProperty() with kAudioCodecIsInitialized: Return ‘1’ to show that the codec has been initialized
  32. GetPropertyInfo() with kAudioCodecPropertyUsedInputBufferSize: Return 0 at this point because your circular buffer is empty
  33. Reset()
  34. GetProperty() with kAudioCodecIsInitialized: Return ‘1’ to show that the codec has been initialized
  35. GetPropertyInfo() with kAudioCodecPropertyUsedInputBufferSize: Return 0 at this point because your circular buffer is empty
  36. AppendInputData()
  37. ProduceOutputPackets()
  38. Same calls in steps 36 and 37 until all packets are processed

Installing your codec

To install your codec bundle folder, copy it to:

  • For system wide installation: /Library/Audio/Plug-Ins/Components
  • For user specific installation: ~/Library/Audio/Plug-Ins/Components

cherry: A GPU accelerated slide show daemon for Raspberry Pi

Some time ago, I wrote a little daemon program for the Raspberry Pi to continuously slide-show a bunch of images. The project was intended to be commercial – utilizing the Raspberry Pi with its powerful GPU and low power requirements – and targeted shop owners who have full-HD monitors continuously flipping through a set of image advertisements around the clock.

I have decided to release the project into the public domain, hence you may use it any way you like. The program uses OpenCV for image loading and hence supports all image formats that OpenCV can load. Slide show blending occurs on the GPU using the dispmanx APIs, hence it completely bypasses both the CPU and the X window server for smooth performance.

Before you try to compile the source, make sure:

  1. OpenCV libraries are installed
  2. You understand that the program is intended to be a daemon
  3. You understand that the program can run without loading any display/window manager

Go to source repository

Load screen for cherry program


Writing file container reader and decoder for QuickTime 7: FAQ

Having just written a file container reader and decoder for a custom video format for QuickTime 7 (and having had to comb through scarce, deprecated documentation), it seemed appropriate to write a little FAQ for anyone else looking to do the same. If you are new to QuickTime component programming, skimming through this FAQ should save you days if not weeks. No thanks required, (:

Custom QuickTime component playing a WMV file

Should I be writing a QuickTime 7 component?

QuickTime 7 components are plugins for the 32-bit QuickTime world. They are intended to extend both QuickTime Player and programs which use QTKit (an interface for embedding QuickTime Player in user applications). Hence, if you want QuickTime Player or a program of yours that uses QTKit to play a custom format, you will have to write a plugin component. For newer applications (especially 64-bit ones), Apple recommends that you use AVFoundation instead. This FAQ does not cover AVFoundation.

Why not write QuickTime X component?

Apple hasn’t documented how to write a QuickTime X component yet. Also, QuickTime X exists exclusively for 64-bit machines and applications.

What is the difference between QuickTime 7 and QuickTime X?

QuickTime 7

  • Always uses 32-bit binaries
  • Is installed by default on both 32-bit and 64-bit machines, but the QuickTime 7 Player UI needs to be downloaded manually on 64-bit OSX. It appears in the ‘Applications->Utilities’ folder after installation
  • On a 64-bit Mac, QTKit uses QuickTime 7 as a fallback if QuickTime X is unable to play a media format

QuickTime X

  • Always uses 64-bit binaries
  • Both its component and UI are installed by default on 64-bit Macs

How long has Apple promised support for QuickTime 7?

There has been no word on how long the support will last. Even though a lot of APIs used by QuickTime 7 component have been marked deprecated, both Apple as well as large user applications continue to use and prefer QuickTime 7. There are also a number of components written for QuickTime 7 which are not ported to QuickTime X. Looking at this trend, QuickTime 7 should survive some major future OS releases.

What are the different components and which one should I write?

The components have a very confusing nomenclature. So, here’s a list of different components and what they do:

  • Import Component: If you intend to write a file container reader or want to allow users to convert a custom format media file to a QuickTime movie file (QuickTime’s mov file container format – NOTE: if Apple documentation states ‘media file’, it is referencing this format) through QuickTime 7 Player, you should write this component.
  • Export Component: If you intend to allow QuickTime to export a QuickTime movie file to a custom format media file, you should write this component.
  • Image Decompressor Component: If you intend to decode video stream frame data passed by an Import Component, you should write this component. Do note that this component can also be written for static image decompressors such as a JPEG reader.
  • Image Compressor Component: Opposite of what the Image Decompressor does.
  • Data Handler Component: If you intend to add the capability to read/write media data from a source not traditionally supported by the OS, you should write this component. This component need not understand, verify or parse the incoming/outgoing data. A data handler component is tasked by higher level components with reading and writing to a custom data source.
  • Derived Media Handler Component: If you intend to modify media samples just after they are ready to be drawn by the Base Media Handler, you should write this component. At this point, you already know the media duration, picture size, format and other information. This is a high level component.

Where are QuickTime 7 components installed and how can I register mine?

For system wide installation: /System/Library/QuickTime
For current user only installation: ~/Library/QuickTime (The directory ‘QuickTime’ may not exist. If so, create it)

Registration occurs automatically when you copy your ‘<ComponentName>.component’ bundle folder to either of the above installation locations. No other action is required. Beware that if your component had some important attributes (discussed later) missing and the registration failed, correcting the attribute in place will not trigger the component to register. Neither issuing a ‘touch’ command on the bundle nor restarting your system will force the system to retry registration. The Component Manager system will attempt to register only when you copy a new bundle to those locations. Hence, it’s a good idea to write a little post-build script which removes your previous copy and copies the newly built bundle after the Xcode build process finishes, to ease testing and avoid potential headaches.

How can I make sure my component has been registered?

There is a program (actually a sample program) called ‘Fiendishthngs’, provided by Apple, that lists all the components registered on your system. If your component is registered properly, it should appear in the list. Do know that plugin development using the component model is deprecated and should not be used for anything other than extending QuickTime 7.

Can different components, for instance an importer and a decoder, be in the same binary/bundle?

Yes.

How does QuickTime 7 invoke the components when asked to play a custom file format?

QuickTime 7 interacts with registered components by using the Component Manager interface. QuickTime, using Component Manager, goes through all the registered components at the installation locations and tries to enumerate all importers which can handle the format by matching the file extension. Information regarding whether a component is an importer, a decompressor or any other type is embedded in a compiled resource file (.rsrc file) inside the component bundle. The resource file also has information on what the entry point for the binary is called and what kind of frame data is output by the importer. Similarly, after the importer is done, QuickTime enumerates decoders by matching the frame data type against each decoder’s resource file. If there are several decoders for a type, QuickTime will select the one which best matches the requirements (such as destination surface pixel depth, color types, etc).

What does a typical component bundle contain?

/<ComponentName>.component

|-> Info.plist
|-> /MacOS -> <ComponentName> (executable file)
|-> /Resources -> <ComponentName>.rsrc (compiled resource file)

Where can I find a sample source code for writing a component?

Apple provides ‘ElectricImageComponent’ as an example. Look into ‘EI_ImageDecompressor’ for a decoder example. It also has a sample importer and an exporter. Perian and Xiph are two example open source projects – although they might be a little large for newbies.

What does a typical XCode project for a component look like?

Let’s call our decoder ‘TestDecoder’. Its project would look like:

/<ProjectName>.xcodeproj

|-> TestDecoderDispatch.h – Contains macros for auto-generating boilerplate functions, including the entry point function and the ‘CanDo’ function, which call the functions you implement
|-> TestDecoder.c – Contains implementations of the functions you want to handle
|-> TestDecoder.r – Contains atoms describing what component is implemented and how
|-> Info.plist – Information about the bundle such as manufacturer, product name, etc

What do these files contain?

  1. TestDecoderDispatch.h – If the ‘ComponentResult’ macro is used on a function name, it means you will have to implement it in ‘TestDecoder.c’. Use of the ‘ComponentError’ macro means the auto-generated function will return an error when called by Component Manager. Use of ‘ComponentNoError’ does the opposite. Use of ‘StdComponentCall’ calls the standard function implemented by the macro. Finally, use of ‘ComponentDelegate’ delegates the call to the base component’s implementation.
  2. TestDecoder.c – Along with the implementations of the functions you promised in ‘TestDecoderDispatch.h’ through the ‘ComponentResult’ macro, an important macro defined here (along with others) is ‘IMAGECODEC_BASENAME’. It specifies the prefix for the implemented functions. For instance, an ASF format importer might set its prefix as ‘ASFImporter’, hence the initialization function (called ‘MovieImportOpen()’ in the documentation) should be called ‘ASFImporterOpen()’. Finally, each of the implemented functions has as its first parameter a pointer to a structure that you will have to define. This structure carries context data across the different functions. Do not use global static data storage to communicate among your functions, as that would not be thread safe.
  3. TestDecoder.r – Different types of components use different types of atom structures here to communicate about themselves to the Component Manager. Specifying different atoms allows the Component Manager to see what kind of component is implemented, what properties it supports, what file type/MIME type it is associated with, etc. An important function of this resource file is to specify the entry point for a component. You should name your entry point using the same prefix specified using ‘IMAGECODEC_BASENAME’ in TestDecoder.c. For instance, if the prefix specified there was ‘ASFImporter’, then the entry point name should be ‘ASFImporterComponentDispatch’. This file is compiled by the ‘Rez’ program internally by Xcode, and the resultant compiled binary resource file ‘TestDecoder.rsrc’ is produced.

What settings are required for the project?

The only two requirements for a component project are that the project type should be ‘Bundle’ and the architecture to build must be set exclusively to ’32-bit’. The frameworks required are ‘QuickTime’ and ‘CoreServices’.

What are the potential pitfalls for a component project?

  1. If you are using C++ instead of C, be sure to wrap framework header includes in ‘extern “C”’. This is to make sure your binary’s entry point function name is not mangled by the C++ compiler.
  2. The contents of your ‘<Prefix>Dispatch.h’ must not be guarded using include-only-once macros/#pragma once.
  3. In a decoder project, when you assign ‘ImageSubCodecDecompressCapabilities *cap->decompressRecordSize’, an empty memory block of that size will be available for your use in ‘BeginBand()’, ‘DrawBand()’ and ‘EndBand()’, pointed to by ‘ImageSubCodecDecompressRecord *drp->userDecompressRecord’. This memory may be used to store per band/frame context information. You do not need to allocate the memory yourself.

How does the importer pass data to the decoder?

In your ‘MovieImportDataRef()’ (or just ‘MovieImportFile()’ if you only support importing from files), create a media track and add samples to it using ‘AddMediaSampleReference()’ (or ‘AddMediaSample2()’ if your frame data is in memory), specifying the file offset and size of each frame’s data and passing the media track back to the caller using the ‘usedTrack’ pointer. An ‘ImageDescription’ structure is also necessary; it is attached to the media to carry information like frame width, height, etc. across to the decoder. If you need to carry extra custom data to the decoder, the ‘AddImageDescriptionExtension()’ function can be used to append information to this structure. On the other end, QuickTime will read data from the file into memory using the provided offset and size, and provide the data for each frame to the decoder one by one.

Where is the codec data pointer and draw surface destination pointers on the decoder side?

You can access the codec data passed by the importer using the ‘ImageSubCodecDecompressRecord *drp->codecData’ pointer in the ‘BeginBand()’ and ‘DrawBand()’ parameter lists. The size of this codec data is available only in ‘BeginBand()’ via ‘CodecDecompressParams *p->bufferSize’. To use this value in ‘DrawBand()’, carry it across using ‘ImageSubCodecDecompressRecord *drp->userDecompressRecord’. The surface to draw on is pointed to by ‘ImageSubCodecDecompressRecord *drp->baseAddr’.

I have written an importer but my importer takes a lot of time adding frame data so the player UI takes sometime to appear. How can I remedy this?

Usually the decoder is only invoked after the importer is done adding sample references; this is the cause of the delay. To avoid it, in your ‘MovieImportDataRef()’, set the ‘movieImportResultNeedIdles’ flag in ‘outFlags’. You will have to add a placeholder track until the actual frame data is imported; this allows QuickTime to calculate the duration of your media. You should also implement ‘MovieImportSetIdleManager()’ to save the handle of the idle manager given by the system. An Idle Manager is used to get idle time from QuickTime for doing your work. The system will periodically call your ‘MovieImportIdle()’ function if the player is doing nothing – this is when you can do your importing. Finally, implement ‘MovieImportGetMaxLoadedTime()’ to tell QuickTime how much of the media is loaded.


FFmpeg LGPL v3 binaries for Windows

It seems that cross compiling FFmpeg under the LGPL v3 license (i.e. with non-LGPL v3 and non-free components stripped) for Windows is not only a lot of hassle but takes the best part of a day. So, this post provides links to statically linked LGPL v3 FFmpeg executables for both 32- and 64-bit architectures.

Download binaries with Debug information (has both 32 and 64 bit binaries)
Download binaries without Debug information (has both 32 and 64 bit binaries)

FFmpeg version: N-62439-g5e379cd

Statically linked library versions:
libavutil      52. 76.100 / 52. 76.100
libavcodec     55. 58.103 / 55. 58.103
libavformat    55. 37.100 / 55. 37.100
libavdevice    55. 13.100 / 55. 13.100
libavfilter     4.  4.100 /  4.  4.100
libswscale      2.  6.100 /  2.  6.100
libswresample   0. 18.100 /  0. 18.100

Build configuration:
--arch=x86_64 --target-os=mingw32 --pkg-config=pkg-config --enable-avisynth --enable-libmp3lame --enable-version3 --enable-zlib --enable-librtmp --enable-libvorbis --enable-libtheora --enable-libspeex --enable-libopenjpeg --enable-gnutls --enable-libgsm --enable-libfreetype --enable-libopus --disable-w32threads --enable-libvo-aacenc --enable-bzlib --extra-cflags=-DPTW32_STATIC_LIB --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-libschroedinger --enable-libvpx --enable-libilbc --enable-static --disable-shared --enable-libsoxr --enable-fontconfig --enable-libass --enable-libbluray --enable-iconv --enable-libtwolame --extra-cflags=-DLIBTWOLAME_STATIC --enable-libcaca --enable-libmodplug --extra-libs=-lstdc++ --extra-libs=-lpng --extra-cflags= --extra-cflags= --enable-runtime-cpudetect
