Comparing Performance: stb_image vs libjpeg(-turbo), libpng and lodepng

I recently tried out Sean Barrett’s stb_image.h and was blown away by how fucking easy it is to use.
Integrating it into your project is trivial: just add the header and, in exactly one source file, do:

#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"

That’s all. (If you wanna use it in multiple files you just #include "stb_image.h" there without the #define.)

And the API is trivial too:

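int width, height, bytesPerPixel;
unsigned char *pixeldata, *pixeldata2;
// the last argument is the number of channels you want, 0 means "as in the file"
pixeldata = stbi_load("bla.jpg", &width, &height, &bytesPerPixel, 0);
// if you have already read the image file data into a buffer:
pixeldata2 = stbi_load_from_memory(bufferWithImageData, bufferLength,
                                   &width, &height, &bytesPerPixel, 0);
if(pixeldata == NULL || pixeldata2 == NULL)
    printf("Some error happened: %s\n", stbi_failure_reason());
// ... use the pixels, then free them once you're done:
stbi_image_free(pixeldata);
stbi_image_free(pixeldata2);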

There’s also a simple callback-API which allows you to define some callbacks that stb_image will call to get the data, handy if you’re using some kind of virtual filesystem or want to load the data from .zip files or something. And it supports lots of common image file types including JPEG, PNG, TGA, BMP, GIF and PSD.
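To give a rough idea of what those callbacks look like, here is a minimal sketch that serves the image data from a memory buffer (standing in for a virtual filesystem or an unpacked .zip entry). The MemReader struct and the memRead/memSkip/memEof names are made up for illustration; the three function signatures follow the stbi_io_callbacks struct declared in stb_image.h (read, skip, eof). The snippet deliberately doesn't include stb_image.h so it compiles on its own, and the actual wiring is only shown in the comment at the end.

```c
#include <string.h>

/* Hypothetical in-memory "file", standing in for a virtual filesystem entry. */
typedef struct {
    const unsigned char *data;
    int size;
    int pos;
} MemReader;

/* read callback: copies up to `size` bytes into `out`, returns how many were copied. */
static int memRead(void *user, char *out, int size)
{
    MemReader *r = (MemReader *)user;
    int left = r->size - r->pos;
    if (size > left) size = left;
    if (size <= 0) return 0;
    memcpy(out, r->data + r->pos, (size_t)size);
    r->pos += size;
    return size;
}

/* skip callback: moves `n` bytes forward (or back if `n` is negative), clamped to the buffer. */
static void memSkip(void *user, int n)
{
    MemReader *r = (MemReader *)user;
    int target = r->pos + n;
    if (target < 0) target = 0;
    if (target > r->size) target = r->size;
    r->pos = target;
}

/* eof callback: returns nonzero once all data has been consumed. */
static int memEof(void *user)
{
    MemReader *r = (MemReader *)user;
    return r->pos >= r->size;
}

/* With stb_image.h included, this would be wired up roughly like this:
 *
 *   stbi_io_callbacks cb = { memRead, memSkip, memEof };
 *   MemReader reader = { fileBytes, fileLength, 0 };
 *   unsigned char *pixels = stbi_load_from_callbacks(&cb, &reader,
 *                               &width, &height, &bytesPerPixel, 0);
 */
```

For a real virtual filesystem you would replace MemReader with whatever handle your filesystem layer gives you; the `user` pointer is passed through to the callbacks untouched.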

So I wondered if there are any downsides regarding speed.

In short: (On my machine) it’s faster than libjpeg, a bit slower than libjpeg-turbo, roughly 1.5 to 2 times as fast as lodepng (another one-file-png decoder which also has a nice API) and a bit slower than libpng. For smaller images stb_image’s performance is even closer to libpng/libjpeg-turbo. GCC produces faster code than Clang, especially for JPEG decoding. All in all I find the performance acceptable and will use stb_image more in the future (my first “victim” was Yamagi Quake II).

Average times in milliseconds for decoding a 4000x3000 pixel image, built with GCC and Clang at different optimization levels:

JPEG

libjpeg, libjpeg-turbo

I used libjpeg binaries from distributions, so compilers and optimization flags on my end didn’t make a difference.

  • Debian Wheezy's libjpeg8 8d1-deb7u1, no turbo: 130ms
  • Ubuntu 14.04's libjpeg-turbo8 1.3.0-0ubuntu2: 69ms

stb_image 2.02, using SSE intrinsics

  • clang -O0: 436ms
  • gcc -O0: 402ms
  • clang -O1: 179ms
  • gcc -O1: 97ms
  • clang -O2: 151ms
  • gcc -O2: 93ms
  • clang -O3: 150ms
  • gcc -O3: 85ms
  • gcc -O4: 85ms

Results for JPEG decoding

For JPEG, if you use Clang, stb_image is a bit slower than libjpeg (and a lot slower than libjpeg-turbo). If you use GCC (and at least -O1), the performance is between libjpeg and libjpeg-turbo.
Enabling optimization (-O1 or higher) yields significantly faster decoders than unoptimized (-O0) code (more than 4x as fast for GCC, almost 3x as fast for Clang at -O2 and above).

This also shows that GCC optimizes this code much better than Clang does (85-97ms vs. 150-179ms with optimization enabled).

So stb_image has competitive performance for loading jpegs.

Update: Test with a smaller image

I also did some tests with a 512x512pixel jpg image:

  • libjpeg-turbo: 3.21ms
  • stb clang -O0: 14.92ms
  • stb gcc -O0: 14.24ms
  • stb clang -O2: 5.19ms
  • stb gcc -O2: 3.72ms
  • stb gcc -O4: 3.33ms

libjpeg-turbo is still faster, but stb_image built with GCC only takes about 16% (-O2) or 4% (-O4) longer, so it’s much closer than with the big image (where it took about 23-35% longer).
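In case you want to redo this kind of comparison with your own timings, the arithmetic behind the "x% longer" and "Nx as fast" statements is just the following. This is a small sketch, and the function names are mine, not from any library.

```c
/* How much longer `candidateMs` is than `baselineMs`, in percent. */
static double percentLonger(double candidateMs, double baselineMs)
{
    return (candidateMs / baselineMs - 1.0) * 100.0;
}

/* How many times as fast the run taking `fastMs` is compared to the one taking `slowMs`. */
static double speedupFactor(double slowMs, double fastMs)
{
    return slowMs / fastMs;
}
```

For example, percentLonger(3.72, 3.21) is roughly 15.9 (stb_image gcc -O2 vs. libjpeg-turbo on the small image), percentLonger(3.33, 3.21) is roughly 3.7, and speedupFactor(402.0, 97.0) is roughly 4.1 (gcc -O0 vs. gcc -O1 on the big JPEG).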

PNG

I converted the 4000x3000 pixel JPEG used above to PNG with compression level 9, using Gimp. The PNG is pretty big, about 16MB.

libpng 1.2

I used Ubuntu 14.04’s libpng12 (1.2.50-1ubuntu2), so again the compiler and optimization flags didn’t matter.

  • libpng12: 293ms

stb_image 2.02

  • clang -O0: 905ms
  • gcc -O0: 923ms
  • clang -O1: 455ms
  • gcc -O1: 457ms
  • clang -O2: 432ms
  • gcc -O2: 408ms
  • clang -O3: 424ms
  • gcc -O3: 394ms
  • gcc -O4: 393ms

lodepng version 20150321

  • clang -O0: 1902ms
  • gcc -O0: 1862ms
  • clang -O1: 862ms
  • gcc -O1: 814ms
  • clang -O2: 698ms
  • gcc -O2: 680ms
  • clang -O3: 676ms
  • gcc -O3: 587ms
  • gcc -O4: 581ms

Results for PNG decoding:

  • stb_image is a lot faster than lodepng, with and without compiler optimization.
  • at -O2 and above gcc produces faster code than clang, but the difference is smaller than in the JPEG case (for stb_image at -O0 and -O1 they are practically even)
  • stb_image/lodepng decoders built with -O1 are about twice as fast as ones built without optimization (-O0)
  • libpng is fastest; stb_image built with gcc at -O2 or higher takes about 33-40% longer, lodepng built the same way takes about 100-130% longer
  • See below: For smaller images stb_image's performance is much closer to libpng.

So, I think stb_image’s png decoding speed is still acceptable. However, png in general is kinda slow and should probably not be used for games if you have lots of (big) textures.
If you have the same picture as JPG and PNG (as I had in my tests), decoding from JPG (with stb_image) is more than 4x as fast as decoding from PNG (also stb_image; similar for libjpeg-turbo vs libpng).

If loading performance is important to you (you load so many textures that it really slows you down), you should consider using DDS or a similar format that can be directly uploaded to the GPU. (DDS can be used with and without alpha channel.)
Rich Geldreich’s Crunch might be of interest: https://code.google.com/p/crunch/

Also note that if you load your game data from .zip files (like Doom3 .pk4) or another compressed archive format, compressed image files (like PNG or JPEG) loaded from such an archive will be decompressed twice: Once when loading from the .zip (or whatever) and then when decoding the image (i.e. what is benchmarked here).
This makes loading it more expensive without making the files smaller: Already compressed data usually doesn’t get any smaller when compressing it again.
There are two ways to prevent “paying twice” here:

  1. Add them to the archive uncompressed (at least zip allows you to just store files without further compression with -0), so loading them from the archive will be fast
    (=> no decompress when loading from archive, only when decoding image)
  2. You can create PNGs that are not compressed (usually by setting compression level to 0. With pngcrush you could use pngcrush -force -l 0 in.png out.png)
    Then the archiver can compress them with its own compression algorithm, which might even be better than the one used by PNG (deflate, same as zip uses)
    (=> decompressed when loading from archive, but not when decoding image)

Anyway, if PNGs work for you, using stb_image instead of libpng is feasible and might simplify both your code and your build process.

Update: Test with smaller images

I did some additional tests with a 512x512pixel png that has an alpha-channel, which is probably closer to game development requirements. Because it’s much faster to decode this, I ran 300 decode iterations instead of 100.
Furthermore, I only tested this with -O0 and -O2, which should be most relevant in practice (for debug and release builds).

  • libpng: 6.07ms
  • stb gcc -O0: 19.17ms
  • stb clang -O0: 19.96ms
  • stb gcc -O2: 6.53ms
  • stb clang -O2: 7.00ms
  • stb gcc -O4: 6.22ms
  • lodepng gcc -O0: 31.25ms
  • lodepng gcc -O2: 10.81ms

So while for the huge 24bit RGB png stb_image took about 33-40% longer to decode than libpng, for the small 32bit RGBA png it was less than 10% longer when built with GCC (about 8% at -O2, under 3% at -O4) and about 15% longer with clang -O2.

And some more for a 512x512 RGB picture without alpha-channel:

  • libpng: 5.00ms
  • stb gcc -O2: 4.99ms
  • stb gcc -O4: 4.69ms

In this case stb_image is even a bit faster than libpng (practically a tie at -O2, about 6% faster at -O4)!

How I tested:

I wrote a hacky test program that loads an image file into a buffer, measures how long it takes to decode that buffer 100 times with the tested codec and divides the result by 100; see imgLoadBench.c.
I ran that 3 times in a row and used the best result.
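The measuring scheme described above (average over N decodes, best of several runs) boils down to something like the following sketch. This is not the code from imgLoadBench.c: the DecodeFn type, all function names and the dummy "decoder" are made up so the snippet runs without any image library, and it uses the portable clock() from the C standard library (which measures CPU time) as a stand-in for whatever timer you prefer.

```c
#include <time.h>

/* Hypothetical signature: decodes the buffer once, returns nonzero on success. */
typedef int (*DecodeFn)(const unsigned char *buf, int len);

/* Current CPU time of this process in milliseconds. */
static double nowMs(void)
{
    return (double)clock() * 1000.0 / (double)CLOCKS_PER_SEC;
}

/* Average time in ms of one decode over `iterations` decodes, or -1 if a decode fails. */
static double averageDecodeMs(DecodeFn decode, const unsigned char *buf,
                              int len, int iterations)
{
    double start = nowMs();
    for (int i = 0; i < iterations; ++i) {
        if (!decode(buf, len))
            return -1.0;
    }
    return (nowMs() - start) / iterations;
}

/* Repeats the measurement `runs` times and keeps the best (lowest) average. */
static double bestOfRuns(DecodeFn decode, const unsigned char *buf,
                         int len, int iterations, int runs)
{
    double best = -1.0;
    for (int r = 0; r < runs; ++r) {
        double avg = averageDecodeMs(decode, buf, len, iterations);
        if (avg < 0.0)
            return -1.0;
        if (best < 0.0 || avg < best)
            best = avg;
    }
    return best;
}

/* Stand-in "decoder" so the sketch is runnable: it just sums up the bytes. */
static volatile unsigned long checksum;
static int dummyDecode(const unsigned char *buf, int len)
{
    unsigned long sum = 0;
    for (int i = 0; i < len; ++i)
        sum += buf[i];
    checksum = sum;
    return 1;
}
```

With a real decoder you would wrap e.g. stbi_load_from_memory() plus stbi_image_free() in a function matching DecodeFn and call bestOfRuns(thatFunction, fileData, fileLength, 100, 3). Taking the best of several runs is meant to filter out noise from other things happening on the machine.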

I used a random 4000x3000pixel JPG (about 2.6MB) image taken with a digital camera.
For the png tests I converted it to png (about 16MB) with Gimp, using highest compression level (9).
(I also tried compression level 1 - encoding with that is faster and the resulting file is slightly bigger, but decoding actually takes longer.)

I used clang 3.6 1:3.6.1~svn232753-1~exp1 from http://llvm.org/apt/trusty/ llvm-toolchain-trusty-3.6/main and Ubuntu 14.04’s gcc 4.8 4.8.2-19ubuntu1.
Tests were executed on an Intel Haswell i7-4771 system running Linux Mint 17.1 x86_64 with Kernel 3.16.0-29-lowlatency #39-Ubuntu SMP PREEMPT.

Yeah, all this is not highly scientific, but should give a rough idea of the performance of stb_image and lodepng compared to the “normal” libjpeg, libjpeg-turbo and libpng.

stb_image: Sean Barrett’s stb_ libs on Github
lodepng: Lode Vandevenne’s LodePNG
libjpeg-turbo: Project Homepage
libpng: Project Homepage
RBDoom3BFG: I stole the code to use libpng and libjpeg for comparison there

imgLoadBench.c: My crappy test program