The art of vectorizing – Part 1

Vectors and matrices are at the heart of any Matlab program. Matlab is extremely efficient at doing any operation with these. Vectorizing is the art of transforming a calculation done element by element into an operation on vectors. Here I start a serie of posts on how to vectorize your code.

I had a very good teacher in chemistry when I was in college. He used to say something like this :

To understand chemistry is easy, you just have to understand the very few building blocks on which all this science is based upon. And they don’t take more space than a metro ticket…

Programming is similar, I believe. They are some important principles and all the rest is just addon knowledge built on these principles. If you were to ask me, vectorizing is one of the few Matlab core principles that should sit on this metro ticket. Some day I will make a post to actually draw this metro ticket in detail…

I already talked about why you should vectorize in numerous occasions here, here and here. So today I am going to dive straight into the hard core. How to vectorize?

They are many ways to process data in blocks and I want to cover all the tips and tricks I have used in my Matlab life so I will make as many posts as needed on this.

But I guess to start, we should talk about what I consider the most important Matlab operator : the point.

Such a small thing and yet I remember my first days with Matlab. This point was not intuitive. Part of the reason is because I was not taught Mathematic this way.

Let’s assume you have two matrices A and B of the same size. There are two ways to multiply them element by element :

A=rand(100,1);
B=rand(100,1);
for i=1:100
   C(i)=A(i)*B(i);
end

Or you could use the magical . operator. The point actually modify the multiplication operator so that it operates element by element, instead of doing a complicated matrix multiplication.

A=rand(100,1);
B=rand(100,1);
C=A.*B;

Please note that the small point. A.*B is not equal to A*B. Here A*B will give you an error because you are not using properly the matrix multiplication (mathematically correct operations would be A’*B or A*B’ here where ‘ is the transpose). So the point MODIFIES the definition of * to make a multiplication of the entire A and B, element by element.

If you were to ask me, I believe this should be the default behavior of matrix multiplication but I can’t restart Mathematic history…

The point is (I couldn’t resist…) that this is EXTREMELY usefull and is probably one of the simplest form of vectorization you could do. The vectorize version A.*B is much faster than the for loop as Matlab operates on the entire matrix at once via its built-in routines.

In the same way, you can modify the behavior of mathematical operators like / 0r ^.

So you could equally do A./B to divide element by element and A.^B to take the power of each element of A with the corresponding element in B.

Already you can do a lot with this.

Vectorizing is more or less the art of getting rid of for loops. With this point operator, there is a type of for loops that you can easily simplify.

For instance let’s assume you are doing the following :

A=rand(100,1);
B=rand(100,1);
for i=1:100
   if B(i)>0.5
      C(i)=A(i)^2;
   else
      C(i)=exp(B(i));
   end
end

Vectorizing this is straight forward if you make use of the point and create an intermediate boolean matrix D.

A=rand(100,1);
B=rand(100,1);
D=(B>0.5);
C=D.*(A.^2)+(~D).*exp(B);

Here D is a matrix of 1 and 0 of the same size of B. For each element in B that is superior to 0.5, there is a 1 or true in the corresponding element in D (and vice versa).

~ is the logical not operator so it is the exact opposite of D.

C is the sum of two matrix which are filled with A(i)^2 when D(i) is 1 or exp(B(i)) when ~(D(i)) is 1.

Related Posts

This entry was posted in Beginners, Optimizing your code. Bookmark the permalink.

7 Responses to The art of vectorizing – Part 1

  1. Brian says:

    Hi, I would like you to know that I have found your website really useful, and even enjoyable. over the years, I look topics up via Google and I’ve come across your website so many times that I recognize it without reading anything on the page! I’m finally going to bookmark your website, but, I’ve been learning from you all along. I’m an undergrad by the way; Applied Math and Materials Engineering major – thank you for your hard work.

  2. Richard says:

    I have always heard vectorizing is the way go but never tried too hard to go out of my way to make sure I was doing it — I guess because what I was doing just wasn’t that time sensitive. Today I thought I’d try out your suggestion. However I’m seeing some unusual results.

    Can you think of any reason why if I use tic…toc around the 2 examples you give above, the for loop comes out on top?

    Here’s the code:

    A=rand(100,1);
    B=rand(100,1);

    tic
    for i=1:100
    if B(i)>0.5
    C(i)=A(i)^2;
    else
    C(i)=exp(B(i));
    end
    end
    toc

    tic
    D=(B>0.5);
    C=D.*(A.^2)+(~D).*exp(B);
    toc

    Here’s the results:

    Elapsed time is 0.000007 seconds.
    Elapsed time is 0.000777 seconds.

    Am I missing something?

    • Jerome says:

      Hi Richard,

      I think you are seeing a mixture of preallocation time + JIT (for Just In Time) compilation. JIT can speed up computation of for loop quite significantly. The JIT is especially efficient on these simple for loops.

      Try to read my post on FOR loop here :
      http://www.matlabtips.com/matlab-is-no-longer-slow-at-for-loops/

      Try to deactivate the JIT and compare. Also typically, the first time you run the code should be much slower (especially on the for loop side).

      • Richard says:

        Hey Jerome,

        Thanks for the reply. That does make a lot of sense. I think I did run it a couple times before the numbers in my original post and I had about the JIT improving things. I have an older version of MATLAB kicking around on another computer that I may try this on out of curiosity, as well.

        Thanks again for the reply and for this site (I just stumbled upon it and plan on digging around a bit more).
        Cheers

  3. Pieter van Vugt says:

    I tried the following code, but i ran into trouble. A as a vector of real numbers, some of which are NaN, because sometimes a division by zero occurs when calculating A. I want to replace these NaN’s with zeros.

    NANs=(isnan(A));
    A=(~NANs).*A + (NANs).*0;

    It doesn’t seem to work, because, 0 * NaN = NaN.

    Do you have a clever way to do this in a vectorized manner, or do I have to resort to a for-loop?

    • Jerome says:

      A(isnan(A))=0 ?

      • Pieter van Vugt says:

        Aah! I didn’t expect that to work, because I know e.g. “A([0 1])=0″ returns an error because the first index is zero. However your suggestion works because isnan() returns an array of logicals the same size as A, and not an array of numbers.

        Brilliant in its simplicity! Thanks

Leave a Reply