Here I present ideas for organizing large datasets in Matlab. This is not meant to be a definitive answer, as this is a complicated subject that depends a lot on your particular application. I propose a tree organization and demonstrate how to build it in Matlab with native objects. I also think this post is interesting for its advanced usage of structures and cell arrays. This is also the first time I introduce recursive functions.
I have run into this problem several times. Organizing your data in a convenient way is not easy, but it matters because this choice will affect all of your programs. Depending on how you set it up, your coding can be a pain or relatively easy.
The inherent problem is that each experiment, recording, trial, … has some specific details that you want to keep with the data even if everything else is the same. Another issue is that some trials can have different numbers of data points or different sampling frequencies. And so on. I am sure many of you have faced this problem.
Plain arrays are not suited to this problem: if your recordings have different lengths, you can’t aggregate all of your data points in a single matrix.
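To make the limitation concrete, here is a minimal sketch. The variable names (ShortTrace, LongTrace, Recordings) are made up for this illustration; the lengths are arbitrary.

```matlab
% Two recordings of different lengths (random numbers stand in for real data)
LongTrace  = rand(1000,1);
ShortTrace = rand(300,1);

% Putting them side by side in one matrix fails because the lengths differ
concatFailed = false;
try
    Combined = [LongTrace ShortTrace];
catch
    concatFailed = true;
end

% A cell array accepts both, each element keeping its own length
Recordings = {LongTrace, ShortTrace};
```

The cell array does not care that the second element is shorter than the first, which is the property the rest of this post relies on.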
I believe one elegant solution to this is a tree. When you record or accumulate data, you inherently create a tree with multiple branches. One experimental day is a branch and each “leaf” is an individual recording. Within a single branch you can also perform multiple types of recording, leading to sub-branches of your branch.
Another reason why I believe this is a good solution is how we store data. A file and folder hierarchy is also a tree, so any data organization that mimics it can easily be loaded from and saved to a hard drive. It is also intuitive, as we are used to organizing hard drives this way.
Some folks use complicated databases to achieve this. I believe a simpler solution based on a mixture of cell arrays and structures can do the job. I think simplicity is something we should really enforce here.
Let’s assume you have two days of experiments with 50 time traces recorded on each day. Here is how I would organize them in this scheme:
DataTree{1}.data='06/21/12';
DataTree{1}.type='day';
DataTree{2}.data='06/22/12';
DataTree{2}.type='day';
% Filling recording on day1
for i=1:50
DataTree{1}.branch{i}.data=rand(1000,1);
DataTree{1}.branch{i}.type='traces';
DataTree{1}.branch{i}.sampling=0.1; % for 0.1 Hz for instance
end
% Filling recording on day2
for i=1:50
DataTree{2}.branch{i}.data=rand(1000,1);
DataTree{2}.branch{i}.type='traces';
DataTree{2}.branch{i}.sampling=0.1; % for 0.1 Hz for instance
end
As you can see, I first make a cell array using { } and fill its cells with structures. Then each structure has a field named branch that contains a cell array again. And the syntax repeats itself as deeply as you need.
Each field name indicates some relevant piece of information, so you can decide to have as many fields as you need in any structure.
Using both objects is advantageous because all structures are independent of each other, so they can have different field names. Using cell arrays also makes sense because this is where repetition comes into play. The advantage of this approach is that you can do whatever you want: you can add fields at your convenience. It keeps the freedom but allows growing to very large datasets.
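As a small illustration of that freedom, the sketch below builds two leaves on the same branch, only one of which carries an extra field. The tree name DemoTree and the field temperature are hypothetical, chosen only for this example.

```matlab
% One day with two recordings of different lengths
DemoTree{1}.type = 'day';
DemoTree{1}.data = '06/21/12';

DemoTree{1}.branch{1}.type = 'traces';
DemoTree{1}.branch{1}.data = rand(1000,1);
DemoTree{1}.branch{1}.sampling = 0.1;

DemoTree{1}.branch{2}.type = 'traces';
DemoTree{1}.branch{2}.data = rand(500,1);
DemoTree{1}.branch{2}.sampling = 0.2;

% Add a field to the first recording only (hypothetical field name)
DemoTree{1}.branch{1}.temperature = 37;

% The two leaves now have different field lists, which a cell array allows
hasTemp1 = isfield(DemoTree{1}.branch{1}, 'temperature');
hasTemp2 = isfield(DemoTree{1}.branch{2}, 'temperature');
```

Had the leaves been stored in a structure array instead of a cell array, every element would be forced to share the same set of fields.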
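Another important point here is that the objects don’t need to be contiguous in memory, because a cell array only holds references to its contents. So adding more objects doesn’t require moving the data already stored; only the cell array’s list of references has to grow.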
Let’s assume now that you have an entire tree of data stored this way. Each branch is at a different depth in the tree. How do you retrieve the relevant piece of data from such a large and complicated architecture?
Browsing through such a tree looks, at first, like quite a complicated task: you would need multiple nested for loops that search for a particular data type. However, just as for browsing through a hard drive, there is a very elegant solution to this problem.
First, I would decide on a few fixed field names such as ‘type’ and ‘branch’; this helps your search algorithm a little.
Then, you should use recursive functions (functions that call themselves). An example of this technique is provided here:
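function FoundData=RecursiveFunction(FoundData,CurrentNode)
% Collect the data of every node of type 'traces', at any depth.
% FoundData is a cell array; CurrentNode is a cell array of structures.
for i=1:numel(CurrentNode)
    if strcmp(CurrentNode{i}.type,'traces')
        % Leaf: append as a new cell so traces of different lengths can coexist
        FoundData{end+1}=CurrentNode{i}.data;
    elseif isfield(CurrentNode{i},'branch')
        % Internal node: search its sub-branches
        FoundData=RecursiveFunction(FoundData,CurrentNode{i}.branch);
    end
end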
As you can see, this function calls itself until there are no more branches to explore. This way you can place the relevant data (here the type ‘traces’) at any depth in the tree. The function will locate it and collect in FoundData every data field whose type is ‘traces’.
You call this function with an empty cell array as the starting point:
FoundData=RecursiveFunction({},DataTree)
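Once the search returns, FoundData is a cell array with one trace per cell. The sketch below shows a few things you might do with it. To keep it runnable on its own, it uses a small stand-in FoundData built by hand rather than the result of the search, and the names TraceLengths, TraceMeans, and TraceMatrix are made up for this illustration.

```matlab
% Stand-in for the result of the search: three traces, one shorter than the others
FoundData = {rand(1000,1), rand(1000,1), rand(300,1)};

% Number of points in each trace
TraceLengths = cellfun(@numel, FoundData);

% One summary value per trace, whatever its length
TraceMeans = cellfun(@mean, FoundData);

% Traces that share a length can be packed into an ordinary matrix,
% one column per trace
SameLength  = FoundData(TraceLengths == 1000);
TraceMatrix = [SameLength{:}];
```

Converting to a matrix only at this last step keeps the tree free to hold recordings of any length.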
You can easily modify this search function to collect only the data found under branches of a particular type. For example, you could have a branch with a property that specifies some experimental condition and limit the data collection to some chosen values of this condition.
Here is an example of such a DataTree that would work as well:
DataTree{1}.data='06/21/12';
DataTree{1}.type='day';
DataTree{1}.branch{1}.type='baseline';
% Filling recording on day1 - baseline
for i=1:50
DataTree{1}.branch{1}.branch{i}.data=rand(1000,1);
DataTree{1}.branch{1}.branch{i}.type='traces';
DataTree{1}.branch{1}.branch{i}.sampling=0.1; % for 0.1 Hz for instance
end
DataTree{1}.branch{2}.type='stimulation';
% Filling recording on day1 - stimulation
for i=1:50
DataTree{1}.branch{2}.branch{i}.data=rand(300,1);
DataTree{1}.branch{2}.branch{i}.type='traces';
DataTree{1}.branch{2}.branch{i}.sampling=0.3; % for 0.3 Hz for instance
end
The beauty of this system is that you can keep your search algorithm constant. It will work whatever the depth of your tree, so you are more flexible in the way you organize your data. For example, if one day you run your experiment with more conditions, you can add a branch. Or if your days are grouped as well, you can make several branches for the days.
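Coming back to the idea of restricting the search to one experimental condition, here is one possible sketch. The function name CollectUnder and its extra arguments (parentType, inside) are my own choices for this illustration, and the tree is a reduced version of the baseline/stimulation layout with three traces per condition. The local function at the end of a script requires a reasonably recent Matlab release; otherwise, save it in its own file.

```matlab
% Reduced two-condition tree: one day, 'baseline' and 'stimulation' branches
Tree{1}.type = 'day';
Tree{1}.data = '06/21/12';
Tree{1}.branch{1}.type = 'baseline';
Tree{1}.branch{2}.type = 'stimulation';
for i = 1:3
    Tree{1}.branch{1}.branch{i}.type = 'traces';
    Tree{1}.branch{1}.branch{i}.data = rand(1000,1);
    Tree{1}.branch{2}.branch{i}.type = 'traces';
    Tree{1}.branch{2}.branch{i}.data = rand(300,1);
end

% Collect only the traces located somewhere under a 'stimulation' branch
StimTraces = CollectUnder({}, Tree, 'stimulation', false);

function Found = CollectUnder(Found, Nodes, parentType, inside)
% Found      : cell array of traces collected so far
% Nodes      : cell array of structures (one level of the tree)
% parentType : type of the branch the traces must belong to
% inside     : true once an ancestor of type parentType has been crossed
for i = 1:numel(Nodes)
    node = Nodes{i};
    if strcmp(node.type, 'traces')
        if inside
            Found{end+1} = node.data;
        end
    elseif isfield(node, 'branch')
        % Remember whether this node (or any ancestor) matches parentType
        Found = CollectUnder(Found, node.branch, parentType, ...
            inside || strcmp(node.type, parentType));
    end
end
end
```

The only change from the original search is the extra flag passed down the recursion, so the function still works at any depth.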








