How and when to convert between data types

Using the right data type is an act of wisdom.

[Image: Confucius, “Do not use a cannon to kill a mosquito”]

In this post, we talk about data types and expand on our previous post on the matter. You will learn how to convert between them and, more importantly, when and why you should consider doing so.

As usual, we also work through more advanced examples of data conversion that should be of interest to a wide audience.

“Do not use a cannon to kill a mosquito”

Though Confucius didn’t have data types in mind when he penned this saying, the quote applies neatly to the idea of using the appropriate data type for a particular application.

If you’ve read our previous post on Matlab data types, you are probably wondering how to convert between the different available data types and how you can use this to your advantage.

The two biggest reasons that people will be concerned with converting between data types are:

  • Computation Speed
  • Size in Memory

For very small matrices and small applications, you won’t see huge improvements from using, say, an int8 rather than a double. The benefit of choosing the appropriate data type becomes apparent with larger matrices (think of pictures, videos, big data for artificial intelligence, huge neuronal recordings).

To help explain some of these concepts, we’ll work through a simple example, starting with the gray Confucius image above. Since 8-bit grayscale images are typically represented by a matrix in which each pixel is an integer between 0 and 255, it makes sense to use uint8 as the data type. We therefore store the image as a <900×598 uint8> matrix. What would happen if we instead wanted to represent the image with the double data type? To do this, use the following command:

>> double_confucious = double(gray_confucious);

Now, if you look in the variable editor, you have two variables (see the workspace screenshot): the gray image represented by a matrix of uint8 values and the same image represented by a matrix of double values.
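As a quick sanity check, here is a minimal sketch of the same conversion on a tiny matrix. The variable names `gray_img`, `double_img` and `round_trip` are hypothetical stand-ins for the real image. It shows that double( ) changes the storage class but not the pixel values.

```matlab
% Hypothetical stand-in for the grayscale image: a small uint8 matrix.
gray_img = uint8([0 64; 128 255]);

% Convert to double: the numeric values are unchanged, only the class differs.
double_img = double(gray_img);

disp(class(gray_img))     % prints: uint8
disp(class(double_img))   % prints: double

% Converting back recovers the original exactly, because every uint8
% value can be represented exactly as a double.
round_trip = uint8(double_img);
disp(isequal(round_trip, gray_img))   % prints: 1
```

The reverse direction is only lossless here because the values started out as integers between 0 and 255.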

When a value is converted from one data type to another, it is adjusted to fit into the new data type. Briefly, Matlab chooses the nearest value that the new data type can represent, as you can tell from the following examples (from double to uint8):

>> uint8(240.1)
ans = 240

>> uint8(240.7)
ans = 241

>> uint8(275)
ans = 255

Since the uint8 data type is limited to integers in the range 0 to 255, the value passed to the uint8( ) conversion function is rounded to the nearest integer. In the last example, the argument is larger than the 255 limit, so it saturates at the nearest uint8 value: 255. Likewise, negative values become 0. It is very important that you remember this; forgetting it can lead to bugs that are usually very hard to track down.
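The same rules apply to arithmetic on integer types, not only to explicit conversions. The short sketch below collects the cases worth keeping in mind; the variable names are ours, chosen only for illustration.

```matlab
% Rounding: values halfway between two integers round away from zero.
a = uint8(240.5);              % 241

% Saturation at both ends of the uint8 range.
b = uint8(-12);                % 0
c = uint8(275);                % 255

% Arithmetic between uint8 values saturates as well.
d = uint8(200) + uint8(100);   % 255, not 300
e = uint8(10)  - uint8(20);    % 0, not -10

% intmin and intmax report the limits of an integer class.
lo = intmin('uint8');          % 0
hi = intmax('uint8');          % 255
```

Checking a result against intmax( ) is a cheap way to detect that a computation has saturated.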

With a little background on how the conversion works, let’s explore why you would want to convert between data types. Returning to the Confucius image example, the first thing to consider is the memory it takes to store the uint8 image versus the double image. Use the whos( ) function to look at information about your stored variables.

>> whos()

  Name                   Size              Bytes  Class     Attributes

  double_confucious      900x598         4305600  double
  gray_confucious        900x598          538200  uint8

Here you can see one of the most important reasons to choose the data type (or class) deliberately. The image stored as double is 8 times as large as the uint8 image, because each double takes 8 bytes while each uint8 takes 1. You can see how this becomes an issue with even larger sets of data such as sequential images (e.g. a movie).
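If you want these numbers in a script rather than in the command window, whos( ) can also return them as a structure. This is a minimal sketch that uses zero-filled matrices of the same size as hypothetical stand-ins for the two images.

```matlab
% Hypothetical stand-ins with the same dimensions as the Confucius image.
rows = 900;
cols = 598;
img8  = zeros(rows, cols, 'uint8');
img64 = zeros(rows, cols, 'double');

% Called with an output argument, whos returns a struct with a bytes field.
info8  = whos('img8');
info64 = whos('img64');

fprintf('uint8 : %d bytes\n', info8.bytes);                 % 538200
fprintf('double: %d bytes\n', info64.bytes);                % 4305600
fprintf('ratio : %d\n', info64.bytes / info8.bytes);        % 8
```

The byte count is simply the number of elements times the element size: 900 × 598 × 1 for uint8 and 900 × 598 × 8 for double.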

Now, let’s take a look at the other main advantage of specifying the right data type: speed. In general, using the smallest suitable data type tends to lead to faster calculation. Smaller elements mean less data to move through memory, and at the CPU level more 8-bit integers than 64-bit doubles fit into a single vectorized instruction, even when the values are the same.

But, and this is also important, this is not always the case. Let’s look at an example in Matlab.

In this example, we are going to use an edge detection technique on both the uint8 image and the double image. A morphological closing is then performed on both images. The imclose function is typically used to connect unconnected lines after edge detection.

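% Requires the Image Processing Toolbox (edge, imclose, strel).
% edge returns a new logical array, so im1 and im2 need no preallocation.

% Edge detection followed by morphological closing on the uint8 image
tic
im1 = edge(gray_confucious);
im1 = imclose(im1,strel('disk',2));
toc

% Same operations on the double image
tic
im2 = edge(double_confucious);
im2 = imclose(im2,strel('disk',2));
toc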

Surprisingly, you get the following:

Elapsed time for the uint8 image: 0.172707 seconds
Elapsed time for the double image: 0.061692 seconds

As you can tell from the above example, the uint8 version of this operation takes over 2.5 times longer than the double version! Taking a look at the code of the edge function suggests why. The default edge detector, the Sobel detector, is a derivative-based method. Since this derivative is not always an integer, Matlab spends valuable time converting values from the uint8 input image to double internally in the edge function.
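A single tic/toc measurement can be skewed by first-call overhead and by the order in which the two cases are run. Here is a sketch of a more robust comparison using timeit( ), which runs a function several times and returns the median run time. The random image is a hypothetical stand-in for the Confucius image, and your timings will depend on your machine and Matlab version.

```matlab
% Requires the Image Processing Toolbox for edge.
% Hypothetical stand-in image with the same size as the Confucius image.
rng(0);                                    % make the random image repeatable
img8  = uint8(randi([0 255], 900, 598));
img64 = double(img8);

% timeit takes a function handle, calls it repeatedly and returns the
% median run time in seconds, which reduces first-call and ordering effects.
t8  = timeit(@() edge(img8));
t64 = timeit(@() edge(img64));

fprintf('edge on uint8 : %.6f s\n', t8);
fprintf('edge on double: %.6f s\n', t64);
```

If the two timings come out close, the gap seen with a single tic/toc run may owe more to measurement noise than to the data type.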

More generally, one of the most common situations where you should upgrade your data class is averaging, and this for two reasons. We will illustrate this point with images.

  1. If you are averaging a large number of uint8 images, you need to sum the corresponding pixels together. Depending on the number of images you sum, the running sum will saturate at the upper limit of your data type (255 for uint8). You will therefore lose precious information if you don’t store your sum in a wider type.
  2. When creating the final averaged image, you need to perform a division, which in general produces a non-integer result. Because you are averaging many images, your final data resolution can be better than that of each individual image. Therefore, you should store the final result in a higher-precision class to acknowledge that your result can go beyond the initial resolution of your images (e.g. even if your images are acquired at 8 bits, you can obtain 12- or even 16-bit images). This technique is used extensively in microscopy (notably intrinsic imaging, for the neuroscientists).
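Both points can be seen on a toy example. The sketch below uses a hypothetical stack of four 2×2 frames (the names `frames`, `naive_sum`, `acc` and `avg` are ours). It compares a sum kept in uint8 with a sum accumulated in double.

```matlab
% Hypothetical stack of four 2x2 uint8 frames (third dimension = frame index).
frames = uint8(cat(3, [100 200; 50 255], ...
                      [101 201; 50 255], ...
                      [100 200; 51 255], ...
                      [101 201; 50 255]));
n = size(frames, 3);

% Naive approach: the running sum stays in uint8 and saturates at 255.
naive_sum = frames(:, :, 1);
for k = 2:n
    naive_sum = naive_sum + frames(:, :, k);
end
disp(naive_sum)   % [255 255; 201 255]: three of four pixels have saturated

% Safer approach: accumulate in double, then divide.
acc = zeros(size(frames, 1), size(frames, 2));
for k = 1:n
    acc = acc + double(frames(:, :, k));
end
avg = acc / n;
disp(avg)         % [100.5 200.5; 50.25 255]: fractional values are preserved
```

The averaged image contains values such as 100.5 and 50.25 that no single 8-bit frame could hold, which is exactly the gain in resolution described in the second point.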

The choice of the right data type is therefore very dependent on the types of calculations you expect to be done on that data. Once you have decided what data type your data will fit into, you can speed up your programs and use less memory by always converting to the right type.


2 Responses to How and when to convert between data types

  1. Dan says:

    When I run tic/edge/toc in the reversed order (so first for double_confucious and next for gray_confucious), uint8 is faster. Do you have the same? If so, this suggests that the difference in speed does not come from the data type used.
