This post serves as an introduction to numerical computing and as an overview of its current landscape. I also present some of my perspectives on the future of the field, which may be of general interest, even to long-time, experienced programmers. I don’t intend to be prophetic, but I hope to generate an interesting discussion on the topic.
We are living through a numerical revolution. All computers are now entangled in a complicated but extremely efficient network of numerical identities. This has already greatly changed the way we think and work. Surprisingly enough, computers have been networked for quite a while. In retrospect, the personal computer era has been quite short in the history of computing. Before the 80s, you would already access a computer via a terminal, as can be seen in the famous TRON movie.
In any case, numerical calculation has always been at the heart of the history of computer science. The one and only reason computers were initially built was to solve tedious mathematical problems. For those living in California, I highly recommend paying a visit to the Computer History Museum on that topic. The progress we have made with computers in the last 50 years is absolutely amazing. If you can’t reach that museum, I encourage you to consult the Wikipedia page on this; it is a very interesting read.
Matlab played an important role in the semi-recent history of numerical calculation. But before we talk about Matlab, we should probably say a few words about Fortran. Fortran preceded Matlab by a wide margin. It was a general-purpose programming language that quickly took over as a powerful numerical calculation language in the 1950s. A large number of people still use Fortran for their data analysis, either because they already have a large amount of code in that language or because they rely on high-performance calculations also written in Fortran.
Matlab really started in the late 1970s with the idea that numerical calculation should be easily accessible to all scientists. It started as a tool for mathematicians and engineers and expanded to all data-driven fields. Cleve Moler initially wrote Matlab as a wrapper around numerical libraries that perform linear algebra (LINPACK and EISPACK). If you are interested in learning more about the origin of Matlab, you can watch this video made by Cleve. 40 years later, Matlab provides a comprehensive environment that contains a large number of resources and functionalities for testing new algorithms very quickly, as well as some of the best documentation around.
Looking at all the progress made on data analysis in the last 20-30 years, I think I can safely say that the improvement in hardware was the most important. Moore’s law has been self-fulfilling (Moore predicted in 1965 that transistor density would double every year, and revised this in 1975 to a doubling every 2 years). As a result, processing speed, memory capacity and hard drive size are all many orders of magnitude greater than they were at the dawn of Matlab. You, reading this post, have in your hands a very sophisticated machine that any data scientist could only have dreamt of 20 years ago. All the calculation libraries that constitute the core of Matlab have improved as well, but I suspect their gains barely scratch the surface of the improvements we have seen in hardware.
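To get a feeling for what “many orders of magnitude” means, here is a back-of-the-envelope sketch. It assumes an idealized, perfectly steady doubling every 2 years over the roughly 40 years since Matlab’s beginnings; real hardware did not follow such a clean curve, and the function name is only illustrative.

```python
def moore_growth_factor(years, doubling_period_years=2.0):
    """Growth factor after `years` of steady doubling (idealized assumption)."""
    return 2.0 ** (years / doubling_period_years)


# 40 years at one doubling every 2 years means 20 successive doublings.
factor = moore_growth_factor(40)
print(f"Idealized growth over 40 years: about {factor:,.0f}x")
```

Twenty doublings give a factor of about one million, that is, six orders of magnitude.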
The present and future
We are living in a very exciting time for data analysis. Things are changing very quickly, especially in the last few years. The present already has one foot in the future, so I am going to merge my “present” and “future” paragraphs into one big list of changes that are occurring right now and are shaping how we deal with data. It’s really hard to predict where we will be 10 years from now, but I will do my best.
I think it’s fair to say that Matlab largely dominated the data processing market in the 1990s. The 90s were interesting years of dominance in the computer industry. Just as Microsoft was everywhere, Matlab was in every university. Slowly but surely, Python and its associated numerical packages have made their way and are now challenging MathWorks quite seriously. R, another free and open-source programming language, now dominates the statistical field. My own field, neuroscience, is slowly transitioning from Matlab hegemony to exploring many new software avenues for analysing data. The pros and cons of everyone’s preferred language are often discussed at scientists’ cocktail parties. That’s our present. If you ask me, I don’t think the real challenger to Matlab is R or Python. The future is in the cloud. All major software companies have understood that data does not stay put on our old desktop computers anymore. As a result, analysis can in principle be run directly on powerful connected clusters of computers. The network hardware is there, the bandwidth too, and people are ready to do that, but the software is just not up to the task… yet.
It’s extremely likely that the future winners in data analysis are going to be those who understood that early. In other words, those who have already been working on performing data analysis directly online: companies like Amazon with its high-performance computing platforms, or Google with its Google App Engine.
The advantage of the cloud is that you don’t need to worry about the hardware and that you can scale up really easily. What still needs to be done is to make these platforms easy to use for an audience as large as that of Matlab or Python. I dream of a not-so-distant future where we don’t need to worry about which programming language to use. We would log onto a remote cluster, program our analysis and use standardized portals and object types to send data between programming languages easily, to benefit from them all. Indeed, Matlab is better documented for image processing while R is just better at statistics, so you want to use both! It’s only in the cloud that you can really do that.
In short, this is why I am still here, using Matlab. I don’t see the point in switching to Python or anything else, as I know Matlab sufficiently well. I am waiting for that online revolution to unfold.
Software development is one of those things that you want the next guy to do for you. It literally takes forever to do anything. You have to anticipate every minute detail to make sure everything is done right, as computers are still quite dumb creatures. In that context, we NEED collaboration to perform more and more sophisticated data analysis. The advent of the Internet and social networks has recently helped a lot on that front. GitHub understood very well the potential of mixing social networking with writing code. Researchers are not very good at sharing programs, as it takes a lot of effort and time to clean up your code and help collaborators use your work properly. But slowly but surely, this is changing. We have realized that, faced with an overwhelming amount of data analysis, we need to connect our dots to achieve our goals.
To make everyone’s code more available, it will also be necessary to ease the creation of intuitive interfaces. The App approach, introduced by Apple and recently adopted by Matlab, still has a lot of potential applications in that regard.
Most importantly, I believe the future of collaborative data programming is also in the hands of the cloud. As more and more analysis is performed online, it will become possible to standardize data types. As a result, we will be able to easily connect independently created pieces of code. Right now, every time you take someone else’s code, you have to adopt a new data convention. Imagine a world in which you can “just” drag and drop anybody’s routines onto your data and immediately see how they work…
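To make the idea of a standardized data type a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `SharedArray` container, its fields and the choice of JSON as the exchange format are hypothetical, not an existing standard. The point is only that a routine written by someone else can work on your data as long as both sides agree on one convention.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class SharedArray:
    """Hypothetical language-neutral container for a 1-D numeric signal."""
    name: str
    units: str
    sampling_rate_hz: float
    values: list = field(default_factory=list)

    def to_json(self) -> str:
        # JSON is used here only because almost every language can read it.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "SharedArray":
        return cls(**json.loads(text))


def mean_value(record: SharedArray) -> float:
    """An independently written routine that relies only on the shared convention."""
    return sum(record.values) / len(record.values)


# One side produces the data and serializes it...
trace = SharedArray("membrane_potential", "mV", 1000.0, [-65.0, -64.5, -66.0])
payload = trace.to_json()

# ...and the other side, possibly written in another language, reads it back.
received = SharedArray.from_json(payload)
print(received.name, received.units, mean_value(received))
```

A real exchange format would also need to handle multi-dimensional arrays, metadata and large files, which plain JSON does poorly; this sketch leaves all of that out.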
I can’t finish this post without a few words on hardware. I do believe there is now more to expect from software than from hardware. However, one important thing that is happening right now is the growing use of GPU computing. Most graphics cards have several hundred processing units that run simultaneously. Used in the right way, they provide a lot of computational power. As time progresses, I believe the use of cloud computing will hide any progress made in hardware, as the hardware will be more and more out of our control.
In any case, I am very interested in what you think. What is, in your opinion, the future of numerical computing?