Today I would like to talk about one very important concept that is often overlooked when you learn to use a computer for data analysis: rounding errors. In Matlab, as in other languages, a number is represented as a series of 0s and 1s in a way that depends on its type (double or int16, for instance). Each number type is inherently designed to provide a certain precision. Because of this, mathematical operations on these numbers can behave differently from what you would expect from simple mathematical formulas. In this post, I hope to help you identify these behaviors and avoid the associated problems.
To get you acquainted with this problem, let’s immediately start with some code. Let’s try the following commands at the command line :
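A minimal sketch of such a session, assuming x = 1 and a y that is larger by 1e-10 (both values are illustrative assumptions):

```matlab
x = 1;            % assumed reference value
y = 1 + 1e-10;    % assumed value, clearly different from x
x == y            % displays 0 (false)
```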
This is expected. x and y are clearly different numbers so their comparison gives 0.
Now, please try :
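A sketch of the same comparison with a difference of 1e-15, still assuming x = 1:

```matlab
x = 1;            % assumed reference value
y = 1 + 1e-15;    % differs from x by about 1e-15
x == y            % displays 0 (false)
```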
Again, expected. y is very slightly different from x (by 1e-15).
Now, and I am sure you see where I am going, let’s try the following :
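A sketch of the comparison with a difference of 1e-16, again assuming x = 1:

```matlab
x = 1;            % assumed reference value
y = 1 + 1e-16;    % the sum is rounded back to exactly 1
x == y            % displays 1 (true)
```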
y is now different from x by 1e-16, and apparently this is too small for Matlab to notice the difference. Why, you might ask? Simply put, x and y are stored, by default in Matlab, as double-precision floating-point numbers. This means that each value is stored on 8 bytes, that is, a series of 64 bits (0s or 1s). As a result, it cannot have the infinite precision that mathematical real numbers require. Matlab actually provides a function that gives the smallest noticeable difference between a number and the next representable one: eps. Given the floating-point design, this spacing is different for every number. Please now try :
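A short sketch of what eps reports, assuming the number of interest is 1 (the value 1000 is added only to illustrate that the spacing grows with the magnitude):

```matlab
e1 = eps(1)        % 2.2204e-16, spacing of doubles just above 1
e2 = eps(1000)     % 1.1369e-13, spacing of doubles just above 1000
```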
For a number close to 1, eps is about 2.2e-16. Now we understand why, once we went below 1e-15, Matlab started to mess things up.
I am sure you are thinking, this is way too small, why should I care about this?
First, if you use single instead of double, the precision will actually drop to 1e-7 (for a variable valued at 1).
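The difference between the two types can be checked directly with eps, here for a variable valued at 1:

```matlab
ed = eps(1)            % 2.2204e-16 for a double
es = eps(single(1))    % 1.1921e-07 for a single
```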
Second, this sort of problem can happen more easily than you might think.
For instance, let’s suppose you record the value of a particular variable as a double and then, at some point in your code, you change it to single to save some space as you create large arrays. I am going to use pi, as it is a good and commonly used example of a real number.
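A minimal sketch of that scenario (the variable names x and y are assumptions):

```matlab
x = pi;           % stored as a double by default
y = single(x);    % converted to single to save memory
class(x)          % 'double'
class(y)          % 'single'
```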
Now later on in your code you check that y is valued at pi :
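A sketch of such a check, assuming x = pi and y = single(x). The conversion back to double is written explicitly here so that the comparison is unambiguous, and the tolerance-based test at the end is one possible workaround, not the only one:

```matlab
x = pi;                     % double-precision value
y = single(x);              % single-precision copy
same = (double(y) == x)     % displays 0: the single copy is not exactly pi
d = abs(double(y) - x)      % about 8.7e-08

% Possible workaround: compare within a tolerance tied to the
% lower-precision type instead of testing exact equality.
tol = double(eps(y));       % spacing of singles near pi, about 2.4e-07
close = (d < tol)           % displays 1
```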
This can cause all sorts of problems because here Matlab is, behind the scenes, comparing two numbers with different precisions. In some contexts, you would not be aware at all that there is a rounding error here, as checking the numbers would give you :
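A sketch of that check, assuming x = pi and y = single(x) with the default short display format. The sprintf calls are added only to reveal the hidden digits:

```matlab
x = pi;
y = single(x);
format short
disp(x)                     % 3.1416
disp(y)                     % 3.1416

% Printing more digits reveals the difference.
sx = sprintf('%.10f', x)    % 3.1415926536
sy = sprintf('%.10f', y)    % 3.1415927410
```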
These problems can cause nasty little bugs that can be very hard to find. Sometimes your program might actually run without you noticing the problem. A good historical example of this kind of numerical error (although slightly more complicated) is the famous explosion of the first Ariane 5. Somewhere in the rocket control system, a 64-bit floating-point number related to the horizontal velocity of the rocket was converted to a 16-bit signed integer. During the flight, that value went over the range that such an integer can hold, and the conversion failed. The control program of the rocket then acted on invalid data and concluded that the rocket was going off course (which it was not), causing it to falsely correct its trajectory and go off course for real.
It is also my experience that these rounding error problems occur when you change the precision of your numbers and forget about it.
Here is a revealing example that I found, and that could have taken you a while to figure out :
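To illustrate what going over the range of a variable means, here is a small sketch in Matlab with a hypothetical value. Note that Matlab silently saturates at the largest representable integer, which is not how the Ariane software behaved:

```matlab
v = 40000;          % hypothetical value, larger than intmax('int16')
v16 = int16(v)      % 32767: saturated, with no error or warning
```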
Another example using integers :
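A minimal sketch of such an example, assuming int16 as the integer type and the values 1 and 2:

```matlab
x = int16(1);     % assumed integer type
y = int16(2);
z = x / y         % 1, not 0.5
class(z)          % 'int16'
```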
x and y are stored as integers. When you divide x by y, Matlab returns an integer as well, so the result ends up being rounded to 1 (instead of 0.5). Therefore, if you acquire datasets stored as integers, don’t forget to convert them to the right data type before processing, or you are likely to get a lot of rounding errors without any actual error or warning being displayed.
Maybe the most famous example of a numerical error is the year 2000 bug. At the time, many programs stored the year as 2 digits, so that 1990 would be stored as the integer 90. As a result, it was completely unclear how all these programs would deal with the year 2000. Although many predicted the end of the world, most programs were updated and very few glitches happened.
It is perhaps worth mentioning that the U.S. Naval Observatory, which runs the master clock that keeps the country’s official time, displayed the date on its website as 1 Jan 19100.
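One way to avoid this, sketched here with the same assumed int16 values, is to convert to double before dividing:

```matlab
x = int16(1);
y = int16(2);
z = double(x) / double(y)    % 0.5
```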
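A date like this typically appears when the year is stored as the number of years since 1900 and printed after a hard-coded "19". The sketch below reproduces the effect under that assumption. It is not the actual code of that website:

```matlab
year = 2000;
offset = year - 1900;           % 100, assumed internal representation
s = sprintf('19%d', offset)     % '19100'
```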