Here I discuss the most widespread piece of folk knowledge about Matlab: its inherent slowness with for loops. I am sure many of you will be surprised to learn that this is no longer the case (at least not to the extent people used to claim). I demonstrate this, explain why, and discuss some of the consequences.

It is Friday night and you are invited to a cocktail party with some of your colleagues. At some point during the evening, a discussion about programming languages starts up. You mention that you use Matlab for your data analysis, and one of your colleagues looks down on you with disdain:

**Him:** “Really? What a terrible language! It’s so terrible at for loops. How can you stand it?”

**You:** “I’m used to vectorizing my code now. It’s not so bad once you get the hang of it.”

**Someone picking up the conversation:** “Actually, Matlab is not so bad at for loops anymore.”

**You and Him:** “What are you talking about?”

**Someone:** “Yes, it’s because of the JIT.”

Who is right? As usual at these cocktail parties, everyone.

As an interpreted language, Matlab is supposedly slow at for loops because it has to read and interpret each line of the loop body again on every iteration. This was true many years ago. Matlab is no longer a purely interpreted language: it now incorporates JIT (Just-In-Time) compilation. Before we go into the details of what this is, let me demonstrate.

I need a very strong case here, as the slowness of Matlab at for loops is well-established knowledge that is still being taught in universities. I am going to take the best example of all: the Matlab documentation itself.

To demonstrate the power of vectorizing over for loops, Mathworks uses the following example both on their online page and in the documentation that ships with every Matlab installation:

    % For loop form
    i = 0;
    for t = 0:.01:10
        i = i + 1;
        y(i) = sin(t);
    end

    % Vectorized form
    t = 0:.01:10;
    y = sin(t);

Nothing fancy here.
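Before timing anything, it is worth confirming that the two forms really compute the same thing. The following is only a minimal sketch: the variable names `yVec`, `yLoop` and `maxDiff` are mine, and I compare with a tolerance because I am assuming, not asserting, that the scalar and vectorized `sin` paths round identically.

```matlab
% Sketch: check that the loop and vectorized forms agree.
t = 0:.01:10;

% Vectorized form
yVec = sin(t);

% For loop form, with the output preallocated
yLoop = zeros(1, numel(t));
for i = 1:numel(t)
    yLoop(i) = sin(t(i));
end

% Compare with a small tolerance rather than exact equality,
% since the two code paths are not guaranteed to round identically.
maxDiff = max(abs(yVec - yLoop));
fprintf('%d elements, largest difference: %g\n', numel(t), maxDiff);
```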

To make my case, I am going to make some very slight modifications to the Mathworks example. First, I separate memory allocation from the for loop itself. Second, I encapsulate each calculation in its own function to clearly separate their memory spaces. Finally, I add TIC/TOC statements to measure the actual time spent, exactly as advised in the documentation:

> The second example executes much faster than the first and is the way MATLAB is meant to be used. Test this on your system by creating scripts that contain the code shown, and then using the tic and toc functions to measure the performance.

So the new code is :

    function vecto
    TestVec
    TestFor

    function TestVec
    y=zeros(1,1001);
    tic;
    % Vectorized form
    t = 0:.01:10;
    y = sin(t);
    toc;

    function TestFor
    y=zeros(1,1001);
    tic;
    % For loop form
    i = 0;
    for t = 0:.01:10
        i = i + 1;
        y(i) = sin(t);
    end
    toc;

Here is what I get :

>> vecto

Elapsed time is 0.000044 seconds.

Elapsed time is 0.000063 seconds.

The for loop version takes about 63 µs, i.e. 19 µs more than the vectorized version (roughly 1.4 times as long). One could say: Q.E.D., Matlab is BAD at for loops. Yes, BUT it used to be worse, much much much much worse. I don’t personally consider this slow (and I am known to be demanding when it comes to computing speed).
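Timings of a few tens of microseconds are noisy when taken from a single tic/toc pair. As a rough cross-check, one could repeat the measurement with `timeit`, which runs the function several times and returns a typical time. This is only a sketch, assuming a Matlab release that ships `timeit`; the names `vecto_timeit`, `loopForm` and `vecForm` are hypothetical and not part of the original code.

```matlab
function vecto_timeit
% Sketch: time both forms with timeit instead of a single tic/toc.
% vecto_timeit, loopForm and vecForm are hypothetical names.
tLoop = timeit(@loopForm);
tVec  = timeit(@vecForm);
fprintf('for loop:   %.1f us\n', 1e6 * tLoop);
fprintf('vectorized: %.1f us\n', 1e6 * tVec);
fprintf('ratio:      %.2f\n', tLoop / tVec);
end

function y = loopForm
% For loop form, output preallocated as in TestFor
y = zeros(1, 1001);
i = 0;
for t = 0:.01:10
    i = i + 1;
    y(i) = sin(t);
end
end

function y = vecForm
% Vectorized form, as in TestVec
t = 0:.01:10;
y = sin(t);
end
```

The absolute numbers will differ from machine to machine and release to release; the ratio between the two forms is the interesting part.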

As I said, Matlab is not a purely interpreted language anymore. Some form of pre-compilation is now run each time you start an M-file. This is the infamous JIT, for Just-In-Time compilation. Mathworks decided not to document this part of their program, as they don’t want you to rely on the JIT, so it is very difficult to know exactly how it works. Apparently it is a fast-moving target in the source code. You can debate this choice (I personally dislike it very much), but a clear consequence is that few people really know about the JIT, and Matlab’s extreme slowness with for loops is still being taught (and discussed at cocktail parties).

That being said, how bad was Matlab with for loops in the past?

You can get an idea using another undocumented command:

feature accel off

Once you have executed this, the JIT is deactivated. The new result of our little ‘vecto’ function is :

>> vecto

Elapsed time is 0.000066 seconds.

Elapsed time is 0.001755 seconds.

Yes, without the JIT, the for loop is almost **30 times** slower (1755 µs versus 63 µs). I told you it used to be much much much much worse! I don’t personally consider for loops to be so bad nowadays.
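The two measurements can also be taken back to back in one session. The sketch below assumes that `feature accel on` is the counterpart of the undocumented `feature accel off` used above and that the switch takes effect immediately; since none of this is documented, it may behave differently, or do nothing at all, in other releases. The names `jit_toggle` and `timeLoop` are hypothetical.

```matlab
function jit_toggle
% Sketch: time the same loop with the accelerator off and on.
% 'feature accel' is undocumented; it may behave differently,
% or have no effect, in other Matlab releases.
feature accel off
tOff = timeLoop();
feature accel on        % assumed counterpart: restore the default
tOn = timeLoop();
fprintf('JIT off:  %.6f s\n', tOff);
fprintf('JIT on:   %.6f s\n', tOn);
fprintf('slowdown: %.1fx\n', tOff / tOn);
end

function elapsed = timeLoop
% Same loop as TestFor, returning the elapsed time instead of printing it
y = zeros(1, 1001);
tic;
i = 0;
for t = 0:.01:10
    i = i + 1;
    y(i) = sin(t);
end
elapsed = toc;
end
```

Remember to leave the accelerator switched on afterwards, otherwise everything else in the session will run at the old speed.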

Then, if even Mathworks’ own example is not convincing anymore, why vectorize?

For two important reasons :

- It is a much more elegant way to do mathematics, and the code is usually easier to read and maintain.
- The whole computer industry is going multi-core, and most of Matlab’s routines are now inherently multi-core. The for loop will not use your multiple cores, while the vectorized code will.

Indeed, if you scale our previous code up to many more elements, like this:

    function vecto
    TestVec
    TestFor

    function TestVec
    y=zeros(1,1000001);
    tic;
    % Vectorized form
    t = 0:.00001:10;
    y = sin(t);
    toc;

    function TestFor
    y=zeros(1,1000001);
    tic;
    % For loop form
    i = 0;
    for t = 0:.00001:10
        i = i + 1;
        y(i) = sin(t);
    end
    toc;

You should get something like :

>> vecto

Elapsed time is 0.014492 seconds.

Elapsed time is 0.062195 seconds.

The gap between the for loop version and the vectorized version has grown a lot: the loop now takes more than four times as long. What’s happening?

I have a dual-core machine, and the vectorized version makes good use of it whereas the for loop version does not. Luckily, you can also disable multi-core execution by relaunching Matlab in single-thread mode using:

matlab -singleCompThread

And this time, you will get :

>> vecto

Elapsed time is 0.032697 seconds.

Elapsed time is 0.059152 seconds.

Nice, isn’t it? The vectorized version is now about two times slower, and the gap between the two versions is back to something similar to our previous run with fewer elements.
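If relaunching Matlab is inconvenient, a similar comparison can probably be made within one session using `maxNumCompThreads`, which sets the maximum number of computational threads and returns the previous setting. This is a sketch under that assumption, not what I ran for the numbers above; the variable names are mine, and a single tic/toc per case remains a rough measurement.

```matlab
% Sketch: compare the vectorized form with one thread and with the default.
t = 0:.00001:10;

% Limit Matlab to one computational thread; the call returns the previous setting
nDefault = maxNumCompThreads(1);
tic; y1 = sin(t); tSingle = toc;

% Restore the previous setting and measure again
maxNumCompThreads(nDefault);
tic; y2 = sin(t); tMulti = toc;

fprintf('1 thread:   %.6f s\n', tSingle);
fprintf('%d thread(s): %.6f s\n', nDefault, tMulti);
```

On a single-core machine, or for a function that is not multi-threaded, the two times should come out about the same.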

The first conclusion from all this is that Matlab is not really slow at for loops anymore. So make use of them.

The second conclusion is that you should still vectorize to make use of multiple cores.

And the last conclusion is that you have a lot to say at your next cocktail party!


I’m not sure if you’re right about multicores. The main benefit of vectorising is that you make use of Single Instruction Multiple Data routines in the floating point units…each core in modern computers is accompanied by its own floating point unit that does arithmetic quickly.

Basically these SIMD routines take huge contiguous blocks of data and apply low level functions (e.g. sine, plus) to the data in batches of 16 or more at a time…depending on how good your computer is. The routines are also good at optimizing use of all the relevant caches so that there is no waiting around for the next batch of data. Things to google: SIMD, BLAS, SSE.

There are many different levels at which parallelisation can occur, but vectorising is synonymous with SIMD.

I am not sure I see your point. I do compare the vectorized code with a single worker and two workers using the command “matlab -singleCompThread”. The exact way it makes use of both cores is not necessarily obvious (it will depend on my particular processor I believe) but I would tend to think it does.

OMG!

I can’t believe that 😐

https://dl.dropboxusercontent.com/u/21297963/omg.PNG

Thanks, you make my point even clearer!

memory preallocation helps a lot.

I also noticed not much difference with a single for loop between matlab and compiled code. However, with a nested for loop, the compiled equivalent was much faster.

Can you show that with an example? Perhaps you can take the same examples you used above and post the processor usage pics here.

I never saw matlab use multiple cores ever since dual core/thread processors debuted in 2004 although my codes are vectorised to a high degree. That is of course without PCT.

I suppose you didn’t use the right functions. Matlab has been inherently multi-threaded for a while.

see here : http://www.mathworks.com/support/solutions/en/data/1-4PG4AN/?solution=1-4PG4AN.

Note that SIN is in the list of multi-threaded functions.

Just checked and I’m able to see multicore usage. With vectorization, CPU usage jumps to 200% (as shown by the CPU activity monitor). What’s interesting is that, without vectorization, Matlab still uses two different cores during computation, but almost exclusively one at a time, not simultaneously.

Looks like one of the conditions wasn’t satisfied in instances I explored cpu usage earlier. My hottest pursuit to speed up computations was during 2004-2006, before multithread processing was introduced as per the article. Should be interesting to run those models on later versions. Thanks for getting back.

Elapsed time is 0.000044 seconds.

Elapsed time is 0.000063 seconds.

(.000063-.000044)/.000044 *100% = 43.1818%

Elapsed time is 0.032697 seconds.

Elapsed time is 0.059152 seconds.

(.059152-.032697)/.032697 *100% = 80.9095%

>I don’t personaly consider this slow

What *do* you consider slow? I’d say a 40-80% increase in runtime is reason enough to avoid for loops.

Sorry, the point of this post is to introduce the JIT to the many Matlab programmers who were not aware of its existence. The particular code you choose is always a balance between readability and efficiency. I reported, as have others, that in some conditions for loops do their job better than expected, sometimes running as fast as vectorized code. This represents a significant change in how we use Matlab compared with the days when for loops took on average 3000% longer.