Organizing your code is crucial if you want your code base to grow, especially if you plan to keep it for a long time and/or pass it on to new programmers. There are multiple ways to do this in Matlab. In this post, I talk about them, as well as some of the basic principles to keep in mind when you organize your code.
Building a nice program is a long and iterative procedure. Matlab is designed to let you test things quickly, so it puts no constraints whatsoever on the way you organize your code. You have infinite freedom. And THAT is the problem. Democracy is the hardest of all regimes to organize. For programming, you want to be a crazily bad dictator who micromanages every single detail. So you want to settle on a structure that is JUST flexible enough to accommodate all of your future evolutions.
In my experience, this has a great deal of influence on how your program grows, and whether it can grow at all. I won’t give you a perfect recipe, as there is none. I am still learning about this every day, and I feel like I will never stop. I hope to give you enough support to make an informed decision on how you should organize things.
- Rule number one : use version control
Whatever scheme you decide to follow, PLEASE use version control for your code. There is absolutely no reason today not to.
What is version control?
Version control is a system that stores your code and all of its evolution. This is particularly useful if you want to share code and work on it with other people. But even if you are, and always will be, the only programmer, it makes tons of sense to store every intermediate version of your program. Matlab files are text files, so all iterations of your program will only take a few MB. Version control programs do all of this work for you and more. Believe me, I started using one a few years ago. First, I wish I had started earlier, and second, I know I will never go back to the good old saving techniques.
There are many version control systems, such as CVS, SVN and Git. Each one has its specific features. CVS and SVN are centralized, i.e. they have a central repository, a server that stores all the code. Git was created by Linus Torvalds (who also created the Linux kernel) and is distributed, i.e. every local copy of the program is in essence a full repository. I started with SVN and I am now moving to Git. For starters, I recommend Git. You can go to GitHub and set up your system. They have very easy-to-use software (too polished, in my opinion, but a good start). I have been using SmartSVN and SmartGit very successfully as well.
Matlab is supposedly capable of integrating with version control software, but forget about it: it’s not worth your time (in my opinion again).
- Rule number two : avoid code duplication
Now that you have set up source control for your code, or plan to, how do you start organizing?
The most basic way to organize your code is to make functions. I already talked about this. You need to identify pieces of code that are used multiple times. Avoid repeating code at all costs. If you need to repeat the same code multiple times, you have a design issue to address first. Clarify your ideas and shape your code to make use of functions. I always recommend that you first take a piece of paper, sit down with your favorite drink and organize your code before rushing into coding.
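As a minimal sketch of this rule, the script below replaces a formula that would otherwise be typed twice with a single function. The function name rescaleToUnit and the sample vectors are made up for illustration, and the function assumes its input is not constant. Local functions at the end of a script require R2016b or later; on older versions, the function would go in its own M-file.

```matlab
% Script file, e.g. rescaleDemo.m (hypothetical name).

a = [2 4 6];
b = [10 20 30 40];

% Instead of typing "(x - min(x)) / (max(x) - min(x))" once per vector,
% call one function that owns the formula.
aScaled = rescaleToUnit(a);
bScaled = rescaleToUnit(b);

disp(aScaled)
disp(bScaled)

function y = rescaleToUnit(x)
% rescaleToUnit  Linearly map the values of x onto the interval [0, 1].
% Hypothetical helper; assumes max(x) > min(x).
    lo = min(x);
    hi = max(x);
    y = (x - lo) / (hi - lo);
end
```

If the formula ever needs to change, there is now exactly one place to edit.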
- Rule number three : understand how functions can be stored in M-files
Now that you have a bunch of separate functions, how do you organize them in files? Rule number three is homework. If you have not done so already, you need to understand how functions interact with each other. Indeed, there are many, many ways to store functions and sub-functions. You can create an individual file for each function, create nested functions, or write methods in a class. M-files can be private or not. Functions can also live in a package, be passed around as function handles, and so on. The list is long, and at some point I was lost among all the possibilities.
In my opinion, the best way to decide which storage scheme suits your function is to first find out its scope. Is this function going to be used only locally as a subfunction, or is it an important piece of code that is used all over your program? You need to settle on a scope. Scope is a fundamental concept in programming languages. It is EXTREMELY important that you understand that concept. I invite you to consult the Wikipedia page on that subject. Matlab has its own set of rules on that particular subject. These are referred to as function precedence rules. I invite you again to consult the corresponding page from MathWorks. Most of the time you only have to decide among a few options: a nested function, a separate local function in your current M-file, or a separate function in its own M-file. There are other things to consider, like methods in an object-oriented class or function handles, but let’s start simple. By combining what you know about the scope your code needs with the function scopes Matlab offers, you should be able to settle on the best solution.
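To make the first two options concrete, here is a small sketch of one M-file that contains a main function, a nested function and a local function. All names (scopeDemo, addOffset, sumOfSquares) are hypothetical. The third option would simply mean moving sumOfSquares into its own file, sumOfSquares.m, somewhere on the path.

```matlab
% File: scopeDemo.m (hypothetical name). Everything below is in one M-file.

function total = scopeDemo(values)
% Main function: the only one that can be called from outside this file.
    offset = 10;                  % shared with the nested function below

    function y = addOffset(x)
        % Nested function: reads "offset" from the parent's workspace.
        y = x + offset;
    end

    total = sumOfSquares(addOffset(values));
end

function s = sumOfSquares(x)
% Local function: callable from anywhere in this file, but it has its
% own workspace and cannot see "offset".
    s = sum(x .^ 2);
end
```

Calling scopeDemo([1 2]) adds 10 to each value and then sums the squares, while neither addOffset nor sumOfSquares is reachable from the command line.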
- Rule number four : understand how M-files can be organized in folders
Most file systems are hierarchical nowadays: files sit on a tree. So to organize your files, you must organize them as a hierarchy where each branch is a folder. Unfortunately (in my opinion), Matlab does not always treat its code files as a hierarchy; instead it uses the precedence rules mentioned above to find code. Again, it is important that you understand these rules as you create your hierarchy of folders. In that regard, there are several important ‘types’ of folder that you can make in Matlab.
- Standard folders can be added to the Matlab path using addpath. Code in there is immediately available everywhere, provided nothing with the same name comes ahead of it in the precedence rules. Keep in mind that Matlab doesn’t care at which level your file sits in the hierarchy of folders. What matters is the order of its own search path. I am not sure I consider this very elegant, but that’s the way it works.
- private folders : Just create a folder named ‘private’, and only the M-files in the folder immediately above it (i.e. next to the private folder) or inside it will see the functions it contains. These folders are useful if your functions are only relevant to local files, if you want to protect them from being overloaded and/or if you want to reuse these function names in another location.
- + or package folders : This is useful if you are going for something large. Package folders are probably familiar to all Java users. The advantage here is that code can really be organized as a hierarchy. Basically, if you place a file named ‘test.m’ in a folder called ‘+foo’, you will need to type foo.test to access that function, or use import to bring the function into local scope.
- @ or class folders : This is useful if you go into object-oriented programming and start making very large classes. The last two categories, + and @, can be combined if you have multiple classes.
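The first three folder types can be tried out with the sketch below. It builds a throwaway project in the temporary folder, so nothing in your real code is touched. The project name folder_demo and the functions callHelper, helper and writeTextFile are hypothetical, and foo.test reuses the example from the list above. The script assumes R2016b or later because it ends with a local function.

```matlab
% Script: folderDemo.m (hypothetical name).

root = fullfile(tempdir, 'folder_demo');        % hypothetical project root
if ~exist(fullfile(root, '+foo'), 'dir'),    mkdir(fullfile(root, '+foo'));    end
if ~exist(fullfile(root, 'private'), 'dir'), mkdir(fullfile(root, 'private')); end

% +foo/test.m is a package function, reachable as foo.test
writeTextFile(fullfile(root, '+foo', 'test.m'), { ...
    'function out = test()', ...
    '    out = ''package'';', ...
    'end'});

% private/helper.m is visible only from the folder just above "private"
writeTextFile(fullfile(root, 'private', 'helper.m'), { ...
    'function out = helper()', ...
    '    out = ''private'';', ...
    'end'});

% callHelper.m sits next to "private", so it is allowed to call helper
writeTextFile(fullfile(root, 'callHelper.m'), { ...
    'function out = callHelper()', ...
    '    out = helper();', ...
    'end'});

addpath(root);                % standard folder: only the root goes on the path

fromPackage = foo.test();     % fully qualified package call
fromPrivate = callHelper();   % reaches helper through a neighbouring file

disp(fromPackage)
disp(fromPrivate)

function writeTextFile(filename, lines)
% Write a cell array of character vectors to a text file, one per line.
    fid = fopen(filename, 'w');
    fprintf(fid, '%s\n', lines{:});
    fclose(fid);
end
```

Note that only the project root is added to the path: the +foo and private folders are found through it and are not meant to be added to the path themselves.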
- Last rule : Keep your main file short
This rule just comes from my gut feeling. Most programs have a main M-file. This is usually the one you started with. We all tend to accumulate code in this file. Try to break that habit and move as much code as possible to sub-files. Usually, the shorter the main file is, the easier the program is to maintain and read.
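As a rough illustration, a short main file can read like a table of contents. The step names below (loadData, removeOutliers, summarize) and the data are placeholders. In a real project each of these functions would sit in its own M-file or in one of the folder types above; they are written as local functions here only so that the sketch runs as a single file (R2016b or later).

```matlab
% Script: main.m. It only states the sequence of steps.
data    = loadData();
cleaned = removeOutliers(data, 100);
result  = summarize(cleaned);
disp(result)

% Stand-ins for functions that would normally live in separate files.
function data = loadData()
    data = [1 2 3 1000];          % placeholder for real file reading
end

function x = removeOutliers(x, limit)
    x = x(abs(x) <= limit);       % keep values within +/- limit
end

function s = summarize(x)
    s = mean(x);                  % one number describing the data
end
```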
And you, how do you organize your files?