FFmpeg Audio Playback Sample

It was very hard to find a proper sample code that would compile with the newest FFmpeg library, and when some compiled they didn’t play correctly. This one however runs great, the planar audio channel playback was added by me, and later support for float playback(double PCM follows the same but wasn’t added due to media files being scarce) so now it plays the majority of MP3s, WMAs, AACs … and any audio format that you can name, I guess. I modified the code that was found here http://stackoverflow.com/questions/9799560/decode-audio-using-libavcodec-and-play-using-libao many thanks to him. This code was adapted from C, so somethings have not been thoroughly converted to C++; it compiles and runs anyway.

EDIT:

Please refer to this updated code

Compile with

g++ -std=c++11 test.cpp `pkg-config --libs --cflags libavcodec libavformat libavutil ao`

For MSVC, use this libao version: https://github.com/manushanga/libao-win32

C++ vs. Java benchmarks?

Note: I think this should go on the top: This post has been attracting alot of views(which I am very much enjoying). And a many of that think I am wrong to do it in the first place, guys! it’s something that I did in my spare time I have not come to any conclusion other than my optimmized code is faster than the one given by the original author. Who was flaming C++ for being slow. I do not know what content in this post makes my post look like that it was written by someone who says C++ is the only language that should be used. And if you are offended by this post, all I can tell is to read the last paragraph again. This post is not about “what’s the best language evar!”, it is about C++ optimizations and how it is easy to write slow code in C++. And ofcause, the links are given here for a reason read them first.

The internet is full of surprises, today I was searching for a better C++ IDE(I know about KDevelop and how awesome it is, yet there might be something better!). I stumbled upon this conversation,

http://www.cplusplus.com/forum/unices/67839/ and from a link from there this, http://www.cplusplus.com/forum/unices/64290/

Have to say that I was amazed to see them proving C++ slower than Java in the most uninformed way possible! The code has been crafted from a Java perspective and has been implemented in C++. Which is insane because, no one in their right mind would think that the given C++ code is optimal. Unlike in Java where the developer can do very little to optimize the code, C++ has a wide range of optimizations that should be done by the developer. C/C++ is considered not as easy as Java just because of that, it is so much easier to write slower code with it because there are too many ways to get one thing done.

Java is garbage collected, so it does no freeing of memory until its next collection round. This can be a huge problem if memory isn’t preallocated outside loops and again there’s the space-time trade off which any comp. sci. student would know mostly by experience GC systems achieve high performance by killing the space, it eats up as much memory as possible and gives you some performance increase. Given this amount of memory(preallocated) to unmanaged code you will see performance boosts more than you will ever see with a GC, that’s the only thing that I’ve changed in the code that the author claiming C++ is slower has given, I just did a preallocation. This will be faster if everything happened on stack, but that needs a resize of the stack to be run correctly so I used new and delete. shared_ptr was left because the user has claimed its being widely used and needed here(?). He should be informed that there are good garbage collector implementations in C++.

The Java code which is on the post(one similar to the below C++ code) runs in 5.6 seconds on host machine. With following arguments,

-XX:+UseCompressedOops -XX:+DoEscapeAnalysis -XX:+EliminateLocks -XX:+UseBiasedLocking

Host runs Arch Linux, so Java detected that its a server and added -server! Further, the extra arguments had no effect on this machine. And it was Oracle JDK.

#include <vector>
#include <string>
#include <memory>
#include <ctime>
#include <iostream>

void test() {
  const int COUNT = 1000000;

  std::shared_ptr<std::string> *v=(std::shared_ptr<std::string> *) ::operator new(sizeof( std::shared_ptr<std::string>)*COUNT);
  std::shared_ptr<std::string> sp(new std::string("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"));

  for(size_t i = 0; i < COUNT; ++i)
      v[i]=sp;
  delete v;
}

int main() {

    std::clock_t b, e;
    b = std::clock();

    for (int i = 0; i < 1000; ++i)
        test();

    e = std::clock();

    std::cout << "Time: " << (double(e - b)/CLOCKS_PER_SEC) << '\n';
    return 0;
}

Above code just runs on 4.6 seconds. With just -O3.

Memory usage

C++ code: 15 MB

Java code: 320 MB

Java VM only(taken from running a infinite loop): 8 MB

There’s a lot of optimizations one would do to the above C++ code, I just did one and mentioned another. But there is another, which runs even faster!

const int COUNT = 1000000;
char v[COUNT*sizeof(std::shared_ptr<std::string>)];
void test() {
  std::shared_ptr<std::string> sp(new std::string("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"));

  for(size_t i = 0; i < COUNT; ++i)
      memcpy((void*)&v[i*sizeof(std::shared_ptr<std::string>)],(void*)&sp,sizeof(std::shared_ptr<std::string>));
}

And still keeping shared_ptr! This one runs on 3.6 seconds! Original author might disagree with the usage of memcpy and static allocation(specially when Java doesn’t have static allocation), well C++ doesn’t have a GC by default either!

Languages have their differences and there are claims that people should avoid from when one of the languages in question has two levels of complexities one on actual hardware optimization and another on VM code optimization. Not to mention this post is not and was not about what’s the best language it is just a post detailing C++ optimizations.

Deadlock Synchronization

UPDATE: If you want a fast playback system which can do async requests for buffers better go with a lock-free audio queue. Vrok’s backend will be changed soon to a lock-free audio queue which is the most optimal design that I found after some research.

NOTE: If you want a bug free program in a constrained amount of time, this won’t be the most optimal method; for now at least.

ANOTHER NOTE: DO NOT use common lock names, std::mutex, pthread_mutex_lock ..etc and roll out your own and use Thread Error Detectors they actually think its the standard version of these classes/functions!

After you have read the above note, you can continue reading of my experience in using something that was always considered to avoid. I did it because it did not occur to me in the first place. I did not use much of the common standards that were in use in the field. But I however used the “producer-consumer” problem to gain some ground.

I am talking about my experience while implementing the decoder-playback loop of  Vrok. The loop is documented below,

Decoder Thread

mutex[0].lock()
// work
mutex[1].unlock()

Playback Thread

if (atomic_check(&pause_check))
    atomic_set(&pause, true)
    pause_mutex.lock()
    pause_mutex.unlock()
    atomic_set(&pause, false)
    atomic_set(&pause_check,false)
mutex[1].lock()
// work
mutex[0].unlock()

Anyone would be implementing something like this if they were to follow a producer-consumer approach, and then I remembered that this is a player and I have to pause this from time to time, and I did not pick to just pause the output or stop the looping altogether; I chose to deadlock purposely which led to alot of problems there after, thread error detector programs like Valgrind and Intel’s Inspector XE reported deadlocks in huge amounts that I started to wonder how does it even run at all with all of these deadlocks! And there were actual deadlocks because of the “irresponsible” implementations that I have done in the pause() function.

I had to make up my mind either to keep on using what ever that I have written or go read the source of some successful non-freezing player! It was then that I remembered that I haven’t refered to any documentation nor any proper engineering approach to answer this problem. After drawing the locking diagram, I could see it in a way that I did not see before. Thread Error detectors can help you if you have used the basic elements that are given by standard thread libraries, if you roll your own you are alone! After rethinking about the design I was able to get the more serious problems resolved, problems(i.e. freezing of the control thread: Qt’s main thread) do not appear at normal playback of a song but at the end or the start of a song due to the deadlocks that I have introduced purposely, forget to release the locks in a way that it gurantees that the system resolves the deadlock it is in you will suffer endless nights thinking what you did wrong. Only later you find out that you were doing it right but not in the correct order. I still haven’t seen any documentation on how to implement/debug systems that use deadlocks as a synchronization mechanism, I’d like to see some if some of you know of any drop a comment or a mail if you know of anything that would contribute.

For people who are interested in the technical details of the implementation,

Thread/lock implementation, you need a mutex that’s unlockable from any thread(not just the locked thread). I do know of RW locks but the rwlock() and rdlock() did not suite for this purpose for me because all the existing code was written to use a single lock() function.

Decoder thread code, the implementations of rewind() and pause() are now the same because of some fixes, this was done because to keep the API stable, it’ll be changed soon.

Playback thread code

Single header utility library for C++

For now it only includes string functions such as, split, to_string, from_string, to_upper, to_lower …etc. More will be added in the future, all functions are added to the std namespace if they are related to any std class, such as std::string. If they are not related they’ll be in the cpplib namespace.

License: Public Domain

https://github.com/manushanga/cpplib

UTF-16 reading and writing with C

It seems almost no one does this kind of stuff with C, it was very hard to find material for this on the net. So I thought of posting it so that everyone can benefit. The code is for an XML/HTML tag stripper, which strips all tags. Make sure you keep the error checker intact, it spits out some uncommon errors that at first glance you won’t know what it’s talking about.

A major plus point in using “ccs=UTF16[BE|LE]” is that it handles endianness and BOM automatically. But you have to know what you are dealing with, I *think* “ccs=UTF16” has Native Endianness, which means LE for majority of PCs that are in use the files I got to strip were BE and LE mixed so I had to check BOM manually; 0xFF 0xFE is Little Endian.

Note: This code was only tested on Linux, it should work without any changes on any POSIX system. Windows is a different story as always it may need some changes in “ccs=…” part I saw in some material UTF-16BE/UTF-16LE has been used, but it didn’t work on Linux that way if this code doesn’t work on Windows you might have to change to that.

#include <stdio.h>
#include <wchar.h>
#include <errno.h>
#include <string.h>

int main(int argc, char *argv[])
{
	int i;
	for (i=1;i<argc;i++){
		printf("%s\n",argv[i]);
		FILE *o=fopen(argv[i],"r");
		FILE *f;
		if (fgetc(o)!=0xFF){
			f=fopen(argv[i],"r,ccs=UTF16BE");	
		} else {
			f=fopen(argv[i],"r,ccs=UTF16LE");

		}
		fclose(o);
		char new[32];
		sprintf(new,"./striped/%s.txt",argv[i]);
		FILE *g=fopen(new,"w,ccs=UTF16LE");

		int write=1;
		wint_t c=fgetwc(f);
		while (c!=WEOF) {
			if (c==L'<')
				write=0;

			if (write){
				fputwc(c,g);
			}
			if (c==L'>')
				write=1;

			c=fgetwc(f);

		}
		printf("%s\n",strerror(errno));
		fclose(g);
		fclose(f);

	}
	return 0;
}

Unix toolset for Windows

If you need a Unix toolset on Windows install these packages on Windows and you’ll be having a Unix toolset and a more Unixy environment for Free!(most tools listed are FOSS ones)

MinGW

Needs no introduction, you know what it does. Just make sure you get “mingw-get” or its GUI installer it’s a hell of a deal that does the installation of MinGW in package manager style(the biggest thing you’ll ever miss when building stuff with dependencies you don’t even have a clue how to build successfully). It doesn’t include pkg-config, you can find a build for Windows if you know how to use google. I’ve tried to install zsh on MinGW some headers were lacking and adding cygwin’s version didn’t solve anything because it can’t make use of MinGW style paths.

ConEmu

Best alternative with split console windows.

Dr Memory

Valgrind like memory debugger for Windows(more goodies here).

AutoHotKey(freeware)

Bind a hotkey to console, we just need so many consoles.

qtcreator with cmake

You don’t need the Qt libraries if you aren’t going do a Qt project, but you can still use qtcreator, with cmake or make. You can link MinGW install to qtcreator with relative ease. And its best to make cmake from source and have it in MinGW other than having it installed on Windows, this makes cmake find the dependencies from MinGW distro.

Wrapper for fopen in Windows and OSes that implement POSIX

This is a wrapper that I wrote for Vrok to make fopen behave sanely and accept UTF8 strings on all systems. The defines FOPEN_RB, FOPEN_WB…etc had to be defined since systems other than Windows do not have a binary switch for fopen. You can add this code to a header and use fopenu instead of fopen, on POSIX systems there will be no overhead for calling fopenu since it’s just a #define.

#include <cstdio>
#if defined(_WIN32) && defined(UNICODE)
#include <winnls.h>
inline FILE *fopenu(const char *path,const char *opt){
    wchar_t wpath[1024];
    wchar_t wopt[8];
    MultiByteToWideChar(CP_UTF8, 0, path, -1, wpath, 1024);
    MultiByteToWideChar(CP_UTF8, 0, opt, -1, wopt, 8);

    return _wfopen(wpath, wopt);
}
#define FOPEN_RB "rb"
#define FOPEN_WB "wb"
#define FOPEN_AB "wb+"
#else
#define fopenu fopen
#define FOPEN_RB "r"
#define FOPEN_WB "w"
#define FOPEN_AB "w+"
#endif