Loading Tiff files has been slow in Matlab for many years. The recent introduction of the TIFF library improved things a lot, but when it comes to loading large datasets stored in Tiff files, Matlab functions are still not as good as they could be. Today I am going to introduce a few lines of code that will make all of this history for good.
Quite some time ago, I introduced “inlining” as an efficient way to speed up your code. The basic principle is to look for Matlab functionalities that are not built-in functions. If these happen to be slowing down your calculations, you can open the underlying M-file and fish out the few lines of code that are relevant to you. Today’s post is basically a comprehensive example of this technique, so even if you don’t care about Tiff files, it is interesting as a guideline for optimization.
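As a rough sketch of the first step of inlining, the snippet below checks whether a function is a true built-in or an M-file you can open and read. It only uses the standard exist and which functions; the variable names are mine, and imread is simply the example that matters for the rest of this post.

```matlab
% Is imread a true built-in, or an M-file whose code we can read?
FunctionName = 'imread';          % function under suspicion (illustrative choice)
Kind = exist(FunctionName);       % 2 = file on the search path, 5 = built-in function
SourceFile = which(FunctionName); % full path of the file that implements it

if Kind == 2
    fprintf('%s is implemented in %s\n', FunctionName, SourceFile);
    % edit(FunctionName) would open that file so you can look for the lines you need
else
    fprintf('%s is not a plain file on the path (exist code %d).\n', FunctionName, Kind);
end
```

If the function turns out to be a file, the profiler tells you which of its lines are worth copying.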
Okay, now that the introduction is done, let’s dig in.
Let’s suppose you have a Tiff file named ImageStack.tif with a series of images stored in it. In good old Matlab code, you would use this code to load it into a 3D matrix:

    FileTif='ImageStack.tif';
    InfoImage=imfinfo(FileTif);
    mImage=InfoImage(1).Width;
    nImage=InfoImage(1).Height;
    NumberImages=length(InfoImage);
    FinalImage=zeros(nImage,mImage,NumberImages,'uint16');
    for i=1:NumberImages
       FinalImage(:,:,i)=imread(FileTif,'Index',i);
    end

imfinfo is used to get the size of the movie stack so that the big matrix can be preallocated. Nothing is especially fancy in this code. This would be the way most people load a Tiff stack if there were no performance issues.
With this particular code and a decent dataset of 1575 images (256 by 256 pixels) in a single Tiff file, it takes approximately 200 seconds to run on my computer.
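To reproduce this kind of measurement without my dataset, here is a minimal sketch that writes a small synthetic stack to a temporary file and times the naive loader with tic/toc. The file name, the stack size (20 frames of 64 by 64 pixels) and the random content are assumptions chosen for illustration, so the absolute times will not match mine.

```matlab
% Build a small synthetic uint16 stack (hypothetical test file in tempdir).
FileTif = fullfile(tempdir, 'SyntheticStack.tif');
FramesToWrite = 20;
for i = 1:FramesToWrite
    Frame = uint16(randi(65535, 64, 64));
    if i == 1
        imwrite(Frame, FileTif);                        % create or overwrite the file
    else
        imwrite(Frame, FileTif, 'WriteMode', 'append'); % add one more page
    end
end

% Time the naive loader, including the imfinfo call.
tic;
InfoImage = imfinfo(FileTif);
mImage = InfoImage(1).Width;
nImage = InfoImage(1).Height;
NumberImages = length(InfoImage);
FinalImage = zeros(nImage, mImage, NumberImages, 'uint16');
for i = 1:NumberImages
    FinalImage(:,:,i) = imread(FileTif, 'Index', i);
end
ElapsedSeconds = toc;

fprintf('%d frames in %.3f s (%.1f ms per frame)\n', ...
    NumberImages, ElapsedSeconds, 1000*ElapsedSeconds/NumberImages);
delete(FileTif);                                        % clean up the test file
```

Swapping the body of the timed loop for one of the other loaders in this post gives a like-for-like comparison on the same file.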
To give you an idea of how awful this is, ImageJ, a program very widely used in microscopy, takes approximately 3 seconds to load the same stack.
To help solve this issue, Mathworks modified imread to allow feeding some additional info and avoid some overhead within imread, as mentioned in the help :
Note: When reading images from a multi-image TIFF file, passing the output of imfinfo as the value of the ‘Info’ argument helps imread locate the images in the file more quickly.
So, a couple of years ago, the new version of the code was:

    FileTif='ImageStack.tif';
    InfoImage=imfinfo(FileTif);
    mImage=InfoImage(1).Width;
    nImage=InfoImage(1).Height;
    NumberImages=length(InfoImage);
    FinalImage=zeros(nImage,mImage,NumberImages,'uint16');
    for i=1:NumberImages
       FinalImage(:,:,i)=imread(FileTif,'Index',i,'Info',InfoImage);
    end

On my computer, this new code takes 40 seconds. A big improvement, but we are still quite far from ImageJ and its 3 seconds.
For a few years, nothing happened on this front. Many voices were raised to clearly bring this issue to a higher priority at Mathworks.
And then, a miracle happened (Hallelujah!) with Matlab 2011!
Mathworks decided to port the TIFF library to Matlab. The new code to read a Tiff stack is now:

    FileTif='ImageStack.tif';
    InfoImage=imfinfo(FileTif);
    mImage=InfoImage(1).Width;
    nImage=InfoImage(1).Height;
    NumberImages=length(InfoImage);
    FinalImage=zeros(nImage,mImage,NumberImages,'uint16');
    TifLink = Tiff(FileTif, 'r');
    for i=1:NumberImages
       TifLink.setDirectory(i);
       FinalImage(:,:,i)=TifLink.read();
    end
    TifLink.close();

Again, a big improvement: the very same file now loads in 19 seconds. Still, this is rather slow compared to ImageJ, so I decided to really push on this front, as I load Tiff files in Matlab many times every single day, some of them with many more images than 1575 (10000 and more).
It turns out, as you dig into Mathworks' implementation of the TIFF library, that they did a very poor job of limiting the overhead when dealing with Tiff stacks. This is especially annoying as I believe that the main advantage of this move was to get faster at stacks.
Indeed, run the profiler on this code and look at the report: the number one process is Tiff.getTag. getTag is used to get some properties of the image. So they actually repeated the mistake they made with imread, as this function is called 28350 times to read my stack, that is 18 calls for each of the 1575 images!
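For readers who want to look at such a profile themselves, here is a minimal sketch that profiles the Tiff class loop on a small synthetic stack and prints the five most expensive functions. The test file and its size are assumptions for illustration, and the function names and call counts you see depend on your Matlab release.

```matlab
% Write a small synthetic stack to profile against (hypothetical test file).
FileTif = fullfile(tempdir, 'ProfiledStack.tif');
NumberImages = 20;
for i = 1:NumberImages
    Frame = uint16(randi(65535, 64, 64));
    if i == 1
        imwrite(Frame, FileTif);
    else
        imwrite(Frame, FileTif, 'WriteMode', 'append');
    end
end

% Profile only the Tiff class loader.
profile clear
profile on
TifLink = Tiff(FileTif, 'r');
for i = 1:NumberImages
    TifLink.setDirectory(i);
    Frame = TifLink.read();
end
TifLink.close();
profile off

% List the most expensive functions, slowest first.
Stats = profile('info');
[~, Order] = sort([Stats.FunctionTable.TotalTime], 'descend');
for k = Order(1:min(5, numel(Order)))
    Entry = Stats.FunctionTable(k);
    fprintf('%-40s %6d calls %8.3f s\n', ...
        Entry.FunctionName, Entry.NumCalls, Entry.TotalTime);
end
% profile viewer   % opens the same data as an interactive report
delete(FileTif);
```

The call counts are the interesting column here: a helper called many times per image is a good candidate for inlining.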
What we want to do now is to use the profiler to select the pieces of code that are relevant and get rid of the rest. So within the profiler I clicked on Tiff.read and realized that Tiff.read makes a call to Tiff.readAllStrips, which in turn makes many calls to Tiff.readEncodedStrip. There, deeply buried within a loop that goes over all the pixels of the data, was the real call to tifflib, the original compiled library.
This is a golden example of why inlining can be extremely efficient.
So I went through all these functions, copied and pasted some code, and tried to make a new loader that makes smarter use of tifflib for stacks. This is the new code:
    FileTif='ImageStack.tif';
    InfoImage=imfinfo(FileTif);
    mImage=InfoImage(1).Width;
    nImage=InfoImage(1).Height;
    NumberImages=length(InfoImage);
    FinalImage=zeros(nImage,mImage,NumberImages,'uint16');

    % Open the file once and read RowsPerStrip once, instead of once per image.
    FileID = tifflib('open',FileTif,'r');
    rps = tifflib('getField',FileID,Tiff.TagID.RowsPerStrip);

    for i=1:NumberImages
       tifflib('setDirectory',FileID,i);
       % Go through each strip of data.
       rps = min(rps,nImage);
       for r = 1:rps:nImage
          row_inds = r:min(nImage,r+rps-1);
          stripNum = tifflib('computeStrip',FileID,r);
          FinalImage(row_inds,:,i) = tifflib('readEncodedStrip',FileID,stripNum);
       end
    end
    tifflib('close',FileID);
What this code does is bypass the M-file wrapper that Mathworks wrote around their MEX file (the wrapper that is very bad at stacks). So I now call tifflib directly.
The problem is that Matlab does not place tifflib within your search path, so you MUST copy the compiled library from your own distribution of Matlab into your function folder. On my Mac, this file is in:
/Applications/MATLAB_R2011b.app/toolbox/matlab/imagesci/private and is called tifflib.mexmaci64. I copied this file into the folder where my M-file code is located.
This also means that, in my case, this function will only work on 64-bit Macs until I copy the MEX files for the other platforms.
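The copy can also be done from within Matlab. The sketch below assumes that the folder layout of other releases and platforms matches the R2011b path given above, which I have not checked, and that the current folder is the one holding your loader; matlabroot, mexext and copyfile are standard functions.

```matlab
% Folder layout taken from the R2011b path above; other releases may differ (assumption).
PrivateDir = fullfile(matlabroot, 'toolbox', 'matlab', 'imagesci', 'private');
MexName = ['tifflib.' mexext];     % e.g. tifflib.mexmaci64 on a 64-bit Mac
SourceMex = fullfile(PrivateDir, MexName);
TargetDir = pwd;                   % folder holding your loader (assumed: current folder)

if exist(SourceMex, 'file')
    copyfile(SourceMex, fullfile(TargetDir, MexName));
    fprintf('Copied %s to %s\n', MexName, TargetDir);
else
    warning('tifflib not found at %s; look for it in your own installation.', SourceMex);
end
```

Because mexext picks the extension for the machine it runs on, running the same lines on each platform collects the matching MEX file.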
Keep in mind that I also removed lots of the overhead that checks for particular Tiff types (in this example it is a chunky file), so you might want to create several loaders depending on the file type (instead of checking the file type at every pixel like Mathworks did). The current code works for my particular application.
With this in mind, and timed with TIC/TOC, this code now takes 1.5 seconds. Yes, I am not joking, Matlab is now FASTER than ImageJ.
I hope Mathworks is reading this for their next release… They might consider changing their wrapper, as I am not the only one around who uses Tiff stacks…
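One way to keep that check out of the inner loop is to run it once per file and then pick a loader. The sketch below uses the documented Tiff class for the check; the test file and the LoaderName variable are hypothetical, and the single frame is written with imwrite only so that the example runs on its own.

```matlab
% Write a one-frame test file (hypothetical name) so the check has something to open.
FileTif = fullfile(tempdir, 'LayoutCheck.tif');
imwrite(uint16(randi(65535, 64, 64)), FileTif);

% Check the layout ONCE per file, not once per strip or pixel.
TifLink = Tiff(FileTif, 'r');
IsChunky = TifLink.getTag('PlanarConfiguration') == Tiff.PlanarConfiguration.Chunky;
IsStripped = ~TifLink.isTiled();
TifLink.close();

if IsChunky && IsStripped
    LoaderName = 'chunky strip loader';  % the case the loader in this post handles
else
    LoaderName = 'Tiff class fallback';  % anything else: use the slower, general reader
end
fprintf('%s -> %s\n', FileTif, LoaderName);
delete(FileTif);
```

Falling back to the general reader for unexpected layouts costs speed but avoids silently misreading tiled or planar files.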
NOTE: A few months after I posted about this, Mathworks released a bug fix for the Tiff class that deals with this issue. The new class is far better and gives very decent loading times. I recommend you download it and replace your local copy of Tiff.m with the fixed version. Direct calls to these new libraries still provide a little boost, but not as drastic.