Here, I discuss the most widespread piece of knowledge about Matlab: its inherent slowness with for loops. I am sure many of you will be surprised to learn that this is no longer the case (at least not to the extent people used to claim). I will demonstrate this statement, explain why, and discuss some of its consequences.

It is Friday night and you are invited to a cocktail party with some of your colleagues. At some point during the evening, a discussion randomly starts about programming languages. You mention that you use Matlab for your data analysis, and one of your colleagues looks down on you with disdain:

**Him:** “Really? What a terrible language! It’s so terrible at for loops. How can you stand it?”

**You:** “I am used to vectorizing my code now. It’s not so bad once you get the hang of it.”

**Someone picking up the conversation:** “Actually, these days, Matlab is not so bad at for loops anymore.”

**You and Him:** “What are you talking about?”

**Someone:** “Yes, it’s because of the JIT.”

Who is right? As usual at these cocktail parties: everyone.

As an interpreted language, Matlab is supposedly slow at for loops, since it has to read each line of code, including all the lines generated by the for loop. But this was true many years ago. Matlab is no longer a purely interpreted language and incorporates some JIT (Just-In-Time) compilation routines. Before we go into the details of what this is, let me demonstrate.

I need a very strong case for this, as the slowness of Matlab at for loops is firmly established knowledge that is still being taught in universities. So I am going to take the best example of all: the Matlab documentation itself.

To demonstrate the power of vectorizing over for loops, Mathworks used the following example in both their online page and the documentation that is provided to all Matlab installations:

% For loop form
i = 0;
for t = 0:.01:10
    i = i + 1;
    y(i) = sin(t);
end

% Vectorized form
t = 0:.01:10;
y = sin(t);

Nothing fancy here.

To demonstrate my case, I am going to make some very slight modifications to this code. One is to separate memory allocation from the for loop itself. I am going to encapsulate each calculation in a separate function, to clearly separate the two memory spaces, AND I will add tic/toc statements to measure the actual time spent, exactly as advised in the documentation:

The second example executes much faster than the first and is the way MATLAB is meant to be used. Test this on your system by creating scripts that contain the code shown, and then using the tic and toc functions to measure the performance.

So the new code is:

function vecto
TestVec
TestFor

function TestVec
y = zeros(1,1001);
tic;
% Vectorized form
t = 0:.01:10;
y = sin(t);
toc;

function TestFor
y = zeros(1,1001);
tic;
% For loop form
i = 0;
for t = 0:.01:10
    i = i + 1;
    y(i) = sin(t);
end
toc;

Here is what I get:

>> vecto

Elapsed time is 0.000044 seconds.

Elapsed time is 0.000063 seconds.

The for loop version takes about 63 µs, i.e. about 19 µs more than the vectorized version. One might say: Q.E.D., Matlab is BAD at for loops. Yes, BUT it used to be worse, much, much worse. I don’t personally consider this slow (and I am known to be demanding about computation speed).

As I said, Matlab is no longer a purely interpreted language. Some form of pre-compilation is now run each time you execute an M-file. This is the infamous JIT, for Just-In-Time compilation. Mathworks decided not to document this part of their program, as they don’t want you to rely on the JIT, so it is very difficult to know exactly how it works. Apparently, it is a fast-moving target in the source code. You can debate this choice (I personally dislike it very much), but a clear consequence is that few people really know about it, and Matlab’s extreme slowness with for loops is still being taught (and discussed at cocktail parties).

This being said, how bad was Matlab at for loops in the past?

You can get a clue about this using another undocumented command:

feature accel off

Once you have executed this, the JIT is deactivated. The new result of our little ‘vecto’ function is:

>> vecto

Elapsed time is 0.000066 seconds.

Elapsed time is 0.001755 seconds.

Yes, without the JIT, the for loop is **30 times** slower. I told you it used to be much, much worse! I don’t personally consider for loops to be that bad nowadays.
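For completeness, here is a sketch of the toggle sequence used above. Keep in mind that `feature` is undocumented: its behavior varies between releases, and newer execution engines may ignore the switch entirely.

```matlab
% Undocumented accelerator switch; behavior varies by Matlab release.
feature('accel', 'off')   % functional form of "feature accel off": disable the JIT
vecto                     % re-run the timing comparison without the JIT
feature('accel', 'on')    % restore the default accelerated mode
```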

Then, if even Mathworks’ own example is no longer convincing, why vectorize?

For two important reasons:

- It is a much more elegant way to do mathematics and is usually easier to read and maintain.
- The whole computer industry is moving to multi-core, and many Matlab routines are now inherently multi-threaded. A for loop will not use your multiple cores, while vectorized code will.

Indeed, if you push our previous code to many more elements, like this:

function vecto
TestVec
TestFor

function TestVec
y = zeros(1,1000001);
tic;
% Vectorized form
t = 0:.00001:10;
y = sin(t);
toc;

function TestFor
y = zeros(1,1000001);
tic;
% For loop form
i = 0;
for t = 0:.00001:10
    i = i + 1;
    y(i) = sin(t);
end
toc;

You should get something like:

>> vecto

Elapsed time is 0.014492 seconds.

Elapsed time is 0.062195 seconds.

The gap between the for loop version and the vectorized version has increased a lot. What’s happening?

I have a dual-core machine, and the vectorized version makes nice use of it whereas the for loop version does not. Luckily, you can deactivate multi-threading by relaunching Matlab in single-thread mode using:

matlab -singleCompThread

And this time, you will get:

>> vecto

Elapsed time is 0.032697 seconds.

Elapsed time is 0.059152 seconds.

Nice, isn’t it? The vectorized version is now twice as slow as with both cores, and the gap between the two codes is back to something similar to our previous run with fewer elements.

The first conclusion of all this is that Matlab is no longer really slow at for loops. So make use of them.

The second conclusion is that you should still vectorize to make use of multiple cores.

And the last conclusion is that you have a lot to say at your next cocktail party!

**Code of "Matlab is no longer slow at for loops" 1.35 KB**

I’m not sure you’re right about multi-core. The main benefit of vectorising is that you make use of Single Instruction Multiple Data (SIMD) routines in the floating-point units… each core in a modern computer is accompanied by its own floating-point unit that does arithmetic quickly.

Basically, these SIMD routines take huge contiguous blocks of data and apply low-level functions (e.g. sine, plus) to the data in batches of 16 or more at a time, depending on how good your computer is. The routines are also good at optimizing use of all the relevant caches so that there is no waiting around for the next batch of data. Things to google: SIMD, BLAS, SSE.

There are many different levels at which parallelisation can occur, but vectorising is synonymous with SIMD.

I am not sure I see your point. I do compare the vectorized code with one and two computational threads using the command “matlab -singleCompThread”. The exact way it makes use of both cores is not necessarily obvious (it will depend on my particular processor, I believe), but I would tend to think it does.

OMG!

I can’t believe that 😐

https://dl.dropboxusercontent.com/u/21297963/omg.PNG

Thanks, you make my point even clearer!

Memory preallocation helps a lot.
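Indeed. A minimal sketch of the difference (the variable names here are just for illustration):

```matlab
n = 1e5;

% Growing the array inside the loop: Matlab reallocates on every iteration.
clear y1;
tic;
for i = 1:n
    y1(i) = sin(i);   % y1 grows each time through the loop
end
toc;

% Preallocating once: a single allocation, then in-place writes.
tic;
y2 = zeros(1, n);
for i = 1:n
    y2(i) = sin(i);
end
toc;
```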

I also noticed not much difference between Matlab and compiled code with a single for loop. However, with a nested for loop, the compiled equivalent was much faster.

Can you show that with an example? Perhaps you can take the same examples you used above and post the processor usage pics here.

I never saw Matlab use multiple cores since dual-core/dual-thread processors debuted in 2004, although my codes are vectorised to a high degree. That is, of course, without the PCT.

I suppose you didn’t use the right functions. Matlab has been inherently multi-threaded for a while.

See here: http://www.mathworks.com/support/solutions/en/data/1-4PG4AN/?solution=1-4PG4AN.

Note that SIN is in the list of multi-threaded functions.
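As a quick check, you can query or cap the number of computational threads from within Matlab. A sketch (note that `maxNumCompThreads` has been flagged as deprecated in some releases, so treat it as a diagnostic tool rather than something to rely on):

```matlab
n = maxNumCompThreads        % current number of computational threads
old = maxNumCompThreads(1);  % force single-threaded computation, keep old value
% ... time your vectorized code here to see the single-thread cost ...
maxNumCompThreads(old);      % restore the previous setting
```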

Just checked, and I’m able to see multi-core usage. With vectorization, CPU usage jumps to 200% (as shown by the CPU activity monitor). What’s interesting is that, without vectorization, Matlab still uses two different cores during computation, but one at a time, not simultaneously.

Looks like one of the conditions wasn’t satisfied in the instances where I explored CPU usage earlier. My hottest pursuit of computation speed-ups was during 2004–2006, before multithreaded processing was introduced, as per the article. It should be interesting to run those models on later versions. Thanks for getting back.

Elapsed time is 0.000044 seconds.

Elapsed time is 0.000063 seconds.

(.000063 - .000044)/.000044 * 100% = 43.18%

Elapsed time is 0.032697 seconds.

Elapsed time is 0.059152 seconds.

(.059152 - .032697)/.032697 * 100% = 80.91%

>I don’t personally consider this slow

What *do* you consider slow? I’d say a 40-80% increase in runtime is reason enough to avoid for loops.

Sorry, but the point of this post is to introduce the JIT to the many Matlab programmers who were not aware of its existence. The particular code you choose is always a balance between readability and efficiency. I reported, as others have, that under some conditions for loops perform their job better than expected, sometimes running as fast as vectorized code. This represents a significant change in our usage of Matlab from when for loops took on average 3000% longer.

Pingback: Python:for loop in python is 10x slower than matlab – IT Sprite

Pingback: MatLab | New ThinKing

The example used is probably not ideal, because Octave simply vectorises this for statement and the execution times are the same. When I used a simple y(i) = y(i-1) in the for statement, the execution time was extremely bad! You can try it in Matlab :))
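For readers who want to try this, here is a sketch of such a loop-carried dependency: each iteration reads the previous result, so the loop cannot be vectorized element-wise. For this particular recurrence (a running sum), `cumsum` happens to be the idiomatic vectorized rewrite.

```matlab
n = 1e6;
x = rand(1, n);

% For loop with a dependency on the previous iteration.
y = zeros(1, n);
y(1) = x(1);
tic;
for i = 2:n
    y(i) = y(i-1) + x(i);   % running sum: depends on y(i-1)
end
toc;

% Equivalent vectorized form for this specific recurrence.
tic;
z = cumsum(x);
toc;
```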

I’m running Matlab R2015b, and I encountered some very strange results when I ran the above experiment. Details below:

I first created a function titled vecto.m whose contents are:

function [ ] = vecto()
TestVec
TestFor
end

function [] = TestVec()
y = zeros(1,1001);
tic;
% Vectorized form
t = 0:.01:10;
y = sin(t);
toc;
end

function [] = TestFor()
y = zeros(1,1001);
tic;
% For loop form
i = 0;
for t = 0:.01:10
    i = i + 1;
    y(i) = sin(t);
end
toc;
end

Then I typed vecto in the Matlab command line, and here was my first result:

>> vecto

Elapsed time is 0.054629 seconds.

Elapsed time is 0.021166 seconds.

Somehow, TestFor was twice as fast.

But stranger still was what happened when I didn’t change anything else and simply ran vecto ~10 more times. Here were the results:

>> vecto

Elapsed time is 0.037829 seconds.

Elapsed time is 0.001041 seconds.

>> vecto

Elapsed time is 0.000068 seconds.

Elapsed time is 0.000086 seconds.

>> vecto

Elapsed time is 0.000062 seconds.

Elapsed time is 0.000065 seconds.

>> vecto

Elapsed time is 0.000109 seconds.

Elapsed time is 0.000099 seconds.

>> vecto

Elapsed time is 0.000064 seconds.

Elapsed time is 0.000065 seconds.

>> vecto

Elapsed time is 0.000066 seconds.

Elapsed time is 0.000065 seconds.

>> vecto

Elapsed time is 0.000071 seconds.

Elapsed time is 0.000058 seconds.

>> vecto

Elapsed time is 0.000062 seconds.

Elapsed time is 0.000057 seconds.

>> vecto

Elapsed time is 0.000061 seconds.

Elapsed time is 0.000064 seconds.

>> vecto

Elapsed time is 0.000070 seconds.

Elapsed time is 0.000058 seconds.

>> vecto

Elapsed time is 0.000061 seconds.

Elapsed time is 0.000057 seconds.

>> vecto

Elapsed time is 0.000061 seconds.

Elapsed time is 0.000057 seconds.

>> vecto

Elapsed time is 0.000061 seconds.

Elapsed time is 0.000059 seconds.

>> vecto

Elapsed time is 0.000066 seconds.

Elapsed time is 0.000063 seconds.

Notice the 2nd result where TestVec was 37 times slower.

Why is this happening?

Notice also that with every subsequent call the times go down (and eventually appear to stabilize), but on many of them TestVec is still slower!

Question: Why are subsequent calls of the function faster?

Question: Why is TestVec slower in my example?

I’m not doubting the validity of your article, because I find everything on this site very helpful, but this is a very confusing result for me. Can you please explain this?

When writing the vecto function, Matlab is warning me against pre-allocation, saying that it is not correct to use it in “line 7” which is under TestVec. Does this have something to do with it?

I really hope you can explain this… thanks in advance.

Follow up comment:

When I “pushed our previous code” further, making y=zeros(1,1000001); and t=0:.00001:10

I got the expected result that TestFor is slower.

However, it was about 4 to 6 times slower! So now, I’m even more confused about this article. I thought JIT would make TestFor operate just slightly slower than TestVec.

These were the explicit results:

>> vecto2

Elapsed time is 0.011950 seconds.

Elapsed time is 0.057968 seconds.

>> vecto2

Elapsed time is 0.009618 seconds.

Elapsed time is 0.056805 seconds.

>> vecto2

Elapsed time is 0.010805 seconds.

Elapsed time is 0.043946 seconds.

>> vecto2

Elapsed time is 0.010209 seconds.

Elapsed time is 0.060344 seconds.

>> vecto2

Elapsed time is 0.011321 seconds.

Elapsed time is 0.056583 seconds.

>> vecto2

Elapsed time is 0.012950 seconds.

Elapsed time is 0.060555 seconds.

>> vecto2

Elapsed time is 0.012303 seconds.

Elapsed time is 0.060541 seconds.

>> vecto2

Elapsed time is 0.010759 seconds.

Elapsed time is 0.058935 seconds.

>> vecto2

Elapsed time is 0.011128 seconds.

Elapsed time is 0.058968 seconds.

>> vecto2

Elapsed time is 0.007729 seconds.

Elapsed time is 0.036107 seconds.

Hi,

Thanks for sharing your findings! The JIT is an undocumented moving target, so it’s hard to know how things evolve with each Matlab release.

1- The fact that the second run is much faster than the first one is exactly due to the JIT: the second run is actually using compiled code.

2- I don’t know why you find a larger difference when increasing the array size. I would be curious to know how many cores your machine has. Following the article’s finding that vectorized code makes better use of multiple cores, you may have a 6-core CPU (which is pretty common these days).
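One practical consequence of point 1: to get stable numbers, it is better to discard the first (cold) run, or to use `timeit`, which warms up the function and takes the median of several runs. A sketch, assuming TestVec and TestFor are saved as separate functions on the path with the tic/toc lines removed (since `timeit` does its own timing):

```matlab
% timeit (R2013b and later) handles warm-up and repetition for you.
tVec = timeit(@TestVec);
tFor = timeit(@TestFor);
fprintf('vectorized: %.6f s, for loop: %.6f s\n', tVec, tFor);
```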