Bristol GIN for Silicon Probe Data#

Create GIN Account#

To follow this tutorial, you need a GIN account either on Bristol GIN or on the public GIN service. The account creation process is explained in the GIN web interface documentation here.

After creating the account, log in by typing:

gin login

Create Repository to Store Your Research Data#

The example repository is organised according to the Tonic Research Project Template. You can organise your data according to this template by following a few simple steps. First, go to your GIN web page (GIN-domain-name/your-username) and click on the Import Repository button.

../_images/import-repo.png

       Figure 1. Import Repository

You should be brought to the Import Repository page. As the Clone Address, specify the Tonic template GitHub page: tonic-team/Tonic-Research-Project-Template. Give your repository a name and a concise description of the kind of data it will store. In my case, I am creating a repository containing dual silicon probe recordings of spontaneous neural activity in various brain areas of the mouse. Then click the green Migrate Repository button.

../_images/import-repo2.png

       Figure 2. Describe Your New Repository

You should see the template contents cloned under your new repository name.

Now open the terminal and navigate to the folder on your local file system where you would like to keep your research data. Download the remote repository to that location by typing this command (replace the repository path with your own):

gin get dervinism/infraslow-dynamics

You can now use this empty repository to store all of your newly acquired research project data and documents. If you already have data that was generated in the past, you can copy it here.

Set up Your Research Data Repository#

I am going to copy old research data into this newly created repository and rearrange the existing folder structure to better suit the data structures created earlier in my research project. I am also going to use mock research data to reduce the size of the repository, so that repository management actions can be performed quickly in this tutorial. The mock data repository is available on GIN. You can download it by typing the lines below in your terminal:

gin get dervinism/mock-ecephys-project
cd mock-ecephys-project
gin get-content

Once downloaded, open the repository and copy its contents except for the .git folder. Delete the contents of the infraslow-dynamics repository and paste in the copied contents of the mock-ecephys-project repository. Edit the README file to reflect the new name of the repository and other info.

Record Your Local Research Data Repository Changes#

Once your repository is set up (data folders organised, data files placed in the right locations, etc.), you should register the state of your repository with the local version control system. Doing so creates a snapshot of your repository that can always be reverted to in the future, should the need arise. When committing local repository changes to your version control system, you typically provide a concise message describing the changes. By convention, the message length should not exceed 50 characters. For the first record, we type:

gin commit . -m "Initial commit"

This action commits all changes locally. The dot means that the command is executed on the contents of the current working directory; therefore, make sure that you are inside the root folder of your repository when carrying out this action. The -m flag is used to pass the commit message. When you make new changes to the repository, whether editing text files or manipulating your data files, you should commit these changes periodically to your local versioning system by executing a similar command in the terminal:

gin commit . -m "A message describing new changes"

Update Your Remote Research Data Repository#

All of the changes committed previously were made locally: we were working on a local copy of our research data repository. To update our remote research data repository, whether residing on Bristol GIN or on the public GIN server, we need to push our local changes to the remote copy of our repository. We do so by simply executing the line below in the terminal:

gin upload .

Any new files and any changes to existing files are now uploaded to the remote repository, and we should be able to see them by navigating to the repository web page. Alternatively, we can update the remote repository using the web interface: navigate to the repository web page and click the blue Upload file button.

../_images/upload-repo.png

       Figure 3. Update Remote Repository via Web

The limitation of the web interface is that each upload is restricted to 100 files at a time, with each file no larger than 10 gigabytes. It is therefore more efficient to use the command line tools, which have none of these limitations.

When you use the web interface, you can specify the commit message title (by convention, no more than 50 characters) and the commit message body (by convention, wrapped at no more than 72 characters per line).
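For example, a web commit message following these conventions might look like this (the content here is purely illustrative):

Add session 01 NWB files

Converted the derived spiking and behavioural data of recording
session 01 to the NWB format and placed the resulting files in
the derived data folders.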

Remove Content of Your Local Research Data Repository#

One advantage of using GIN for your data repository management is that you do not need to keep duplicate repositories to prevent accidental detrimental changes to your main repository. One reason for this is the version control system. The other is that you can safely remove the content of your local repository and replace it with pointers to the original files, saving space on your local hard drive. To remove the local content, type the following line in your terminal:

gin remove-content

Local files larger than 10 megabytes will be replaced by symbolic links. If you want to remove the content of specific files only, you can type:

gin remove-content <absolute or relative path to file or folder>

For example, to remove the raw research data from our silicon probe recording repository, we type:

gin remove-content 03_data/001_uol_neuronexus_exp_raw_derived
gin remove-content 03_data/002_uol_neuropixels_exp_raw_derived

To restore the file content, simply type:

gin get-content

If you no longer need to work on your repository and its remote copy is up to date with the local copy, you can simply delete the local repository copy altogether. You should always be able to restore your repository and all of its contents on your local machine by executing these commands in your terminal (replace the repository path as appropriate):

gin get dervinism/mock-ecephys-project
cd mock-ecephys-project
gin get-content

Compress Raw Data#

Silicon probe recording raw data files take up a lot of space, especially Neuropixels recordings. These files often consume more hard-disk space than any other files produced during data processing, so it is advisable to compress them. The International Brain Laboratory provides easy-to-use compression software for this purpose, with instructions on how to install it and compress your files provided here. Briefly, you can install it by typing in your terminal:

pip install mtscomp

Make sure you also have the required dependencies installed.

Once installed, compression is straightforward. All you need to do is specify the name of your binary file, the number of recording channels, the sampling rate, and the binary data type, as in the example below:

mtscomp <data-filename>.bin -n 385 -s 30000 -d int16

The compression software will produce a compressed <data-filename>.cbin file and a <data-filename>.ch JSON metadata file describing the compression parameters. These two files should be preserved, while the original binary data file can be deleted. To restore the original file, type:

mtsdecomp <data-filename>.cbin -o <data-filename>.bin

You can expect roughly a three-fold reduction in file size after compression, so this is a good way to greatly reduce the size of your repositories.
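mtscomp can also be used from within Python. Below is a minimal sketch based on the usage described in the mtscomp documentation (the file names are placeholders):

import numpy as np
from mtscomp import compress, decompress

# Compress a raw binary file into a .cbin/.ch pair
compress('data.bin', 'data.cbin', 'data.ch',
         sample_rate=30000., n_channels=385, dtype=np.int16)

# Decompression is lazy: the returned object can be sliced like a
# NumPy array, decompressing only the requested chunk from disk
arr = decompress('data.cbin', 'data.ch')
first_second = arr[:30000, :]  # first second of data, all channels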

Convert Your Data to Standardised Format#

To increase your research data's adherence to the FAIR principles of scientific data management, and in particular to increase the interoperability of your data and its chances of being reused beyond the original purpose of collection, it is highly advisable to convert your data into one of the more popular standard file formats for neuroscience data. One such format is Neurodata Without Borders (NWB), which is suitable for most neurophysiology data. Programming interfaces for converting your data are available in both Matlab and Python. Here we explain how to convert your data in both languages, keeping the focus on extracellular electrophysiology data.

Convert to NWB Using Matlab#

Install MatNWB Toolbox#

To begin the Matlab demonstration, you need to install the MatNWB toolbox. To download the toolbox, type in your terminal:

git clone https://github.com/NeurodataWithoutBorders/matnwb.git

Move the downloaded repository to the folder where you keep your code libraries. Then type the following in the Matlab command line:

cd matnwb
addpath(genpath(pwd));
generateCore();

You can now start using MatNWB. MatNWB interface documentation can be accessed here.

Record Metadata#

We have prepared example repositories containing single-session extracellular electrophysiology recording data collected with Neuronexus and Neuropixels probes, together with Matlab scripts that convert the data to the NWB format. They can be used to familiarise yourself with the Matlab NWB conversion scheme for spiking data combined with behavioural measurements. Both conversion scripts are very similar and, thus, we will focus on the Neuropixels use case.

To download the Neuropixels repository, type in your terminal:

gin get dervinism/convert2nwbEcephysMatNpx
cd convert2nwbEcephysMatNpx
gin get-content

It will take some time to download the full repository. Once the download is complete, you can open the convert2nwb.m file and execute it right away. The script loads derived spiking and behavioural data from the convert2nwbEcephysMatNpx/npx_derived_data/M200324_MD/M200324_MD.mat file, converts it to the NWB format, and saves it inside the convert2nwbEcephysMatNpx/npx_derived_data_nwb folder as the ecephys_session_01.nwb file.

We will now analyse the conversion script in more detail. The script starts by executing three parameter files that set up the conversion environment. The first parameter file, nwbParams.m, contains the most general parameters, which apply to all animals and recording sessions of the experiment, like:

projectName = 'Brainwide Infraslow Activity Dynamics';
experimenter = 'Martynas Dervinis';
institution = 'University of Leicester';
publications = 'In preparation';
lab = 'Michael Okun lab';
dataset = 'neuropixels';
videoFrameRate = 25; % Hz

The names of most of these parameters are self-explanatory. The videoFrameRate variable contains the frame rate of the camera recording the animal's pupil. Input and output folders are also defined at the bottom of the parameter file. They include:

  • rawDataFolder which contains probe recording channel maps and unit waveform files;

  • derivedDataFolder which contains processed spiking data;

  • derivedDataFolderNWB which is the output folder where converted NWB files are saved.

As the name implies, the nwbAnimalParams.m file contains parameters specific to a particular animal and common to all recording sessions for that animal. They include:

animalID = 'M200324_MD';
dob = '20200206'; % yyyymmdd
...
strain = 'C57BL/6J';
sex = 'M';
species = 'Mus musculus';
weight = [];
description = '025'; % Animal testing order.

The names of these parameters are self-explanatory. The script also defines input and output folders specifically for the animal of interest.

The nwbSessionParams.m file stores parameters about recording sessions. Some of these parameters are defined at the top of the file explicitly for each individual session and some even explicitly for each recording probe, like:

sessionID = {'20200324161349'};
sessionDescription = {'anaesthesia'};
sessionNotes = {'...'};
endCh{1}{1} = [88 117 149 342 384]; % Corresponding probe end channels starting from the tip of the probe. Corresponding and previous end channels are used to work out probe channels that reside in the corresponding brain area.
endCh{1}{2} = [41 091 138 217 304 384];

The for loop that follows next sets the remaining parameters that are common across recording sessions, like electrodeName, electrodeDescription, electrodeManufacturer, nShanks (number of probe shanks), nChannelsPerShank (number of recording channels per shank), nCh (total number of probe channels), areas (brain areas that probes span), electrodeCoordinates (electrode insertion coordinates), electrodeLabel, and electrodeImplantationType (i.e., acute or chronic).

Most of the parameters defined in the three parameter files constitute metadata. The way you define your metadata may differ: for example, you may have your own custom scripts that contain the metadata, or you may store your metadata in files organised according to one of the standard neuroscientific metadata formats, like odML. Whatever your preference, this part of the NWB conversion procedure will vary from researcher to researcher.

The initialisation process is completed by generating the Matlab NWB classes by calling

generateCore;

within the convert2nwb.m script file.

The conversion process then goes through every recording session and generates NWB files individually for each session inside the for loop. We start by creating an NWBFile object and defining session metadata:

% Assign NWB file fields
nwb = NwbFile( ...
  'session_description', sessionDescription{iSess},...
  'identifier', [animalID '_' sessionID{iSess}], ...
  'session_start_time', sessionStartTime{iSess}, ...
  'general_experimenter', experimenter, ... % optional
  'general_session_id', sessionID{iSess}, ... % optional
  'general_institution', institution, ... % optional
  'general_related_publications', publications, ... % optional
  'general_notes', sessionNotes{iSess}, ... % optional
  'general_lab', lab); % optional

Each file must have a session_description, an identifier, and a session_start_time. We then initialise the Subject object and the metadata it contains:

% Create subject object
subject = types.core.Subject( ...
  'subject_id', animalID, ...
  'age', age, ...
  'description', description, ...
  'species', species, ...
  'sex', sex);
nwb.general_subject = subject;

Construct Electrodes Table#

Storing extracellular electrophysiology data is not possible without defining the electrodes table, which is a DynamicTable object. We first create a Matlab table array using code wrapped inside the createElectrodeTable function. We put the table generation code inside a function because it is reused for each probe. The function call for probe 1 is executed by the code below:

% Create electrode tables: Info about each recording channel
input.iElectrode = 1;
input.electrodeDescription = electrodeDescription{iSess};
input.electrodeManufacturer = electrodeManufacturer{iSess};
input.nShanks = nShanks{iSess};
input.nChannelsPerShank = nChannelsPerShank{iSess};
input.electrodeLocation = electrodeLocation{iSess};
input.electrodeCoordinates = electrodeCoordinates{iSess};
input.sessionID = sessionID{iSess};
input.electrodeLabel = electrodeLabel{iSess};
if probeInserted{iSess}{input.iElectrode} && ~isempty(endCh{iSess}{1})
  tbl1 = createElectrodeTable(nwb, input);
else
  tbl1 = [];
end

The table array for probe 1 is then created within the function:

% Create a table with given column labels
variables = {'channel_id', 'channel_local_index', 'x', 'y', 'z', 'imp', 'location', 'filtering', 'group', 'channel_label', 'probe_label'};
tbl = cell2table(cell(0, length(variables)), 'VariableNames', variables);

% Register the probe device
device = types.core.Device(...
  'description', input.electrodeDescription{iEl}, ...
  'manufacturer', input.electrodeManufacturer{iEl} ...
  );
nwb.general_devices.set(['probe' num2str(iEl)], device);

for iShank = 1:nSh{iEl}
  
  % Register a shank electrode group (only one because this is a single shank probe)
  electrode_group = types.core.ElectrodeGroup( ...
    'description', ['electrode group for probe' num2str(iEl)], ...
    'location', input.electrodeLocation{iEl}{end}, ...
    'device', types.untyped.SoftLink(device), ...
    'position', table(input.electrodeCoordinates{iEl}(1,1), ...
    input.electrodeCoordinates{iEl}(1,2), ...
    input.electrodeCoordinates{iEl}(1,3), ...
    'VariableNames',{'x','y','z'}) ...
    );
  nwb.general_extracellular_ephys.set(['probe' num2str(iEl)], electrode_group);
  group_object_view = types.untyped.ObjectView(electrode_group);
  
  % Populate the electrode table
  for iCh = 1:nCh{iEl}
    if iCh < 10
      channelID = str2double([input.sessionID num2str(iEl) '00' num2str(iCh)]);
    elseif iCh < 99
      channelID = str2double([input.sessionID num2str(iEl) '0' num2str(iCh)]);
    else
      channelID = str2double([input.sessionID num2str(iEl) num2str(iCh)]);
    end
    channel_label = ['probe' num2str(iEl) 'shank' num2str(iShank) 'elec' num2str(iCh)];
    tbl = [tbl; ...
      {channelID, iCh, input.electrodeCoordinates{iEl}(iCh, 1), input.electrodeCoordinates{iEl}(iCh, 2), input.electrodeCoordinates{iEl}(iCh, 3),...
      NaN, input.electrodeLocation{iEl}{iCh}, 'unknown', group_object_view, channel_label, input.electrodeLabel{iEl}}]; %#ok<*AGROW>
  end
end

The code initialises an empty table with the given column labels. It then registers the probe Device (an object itself) inside the NWBFile object. Next, we create an ElectrodeGroup object, which is used to define electrode groupings within the probe. In our case we have a single-shank probe and, therefore, we define all recording channels to be part of a single group; grouping on some other basis is also possible. You may have noticed that the device property is set as a SoftLink object, which links to an already existing NWB object (as in our case) or to an object within an NWB file using a path. Similarly, the ElectrodeGroup is used to create an ObjectView object, which works in a very similar way to the SoftLink object and is later used in constructing the electrodes table. The position property is a Matlab table array whose columns are stereotaxic coordinates. Finally, the electrodes table is filled in channel by channel (rows) with channel subtables.

The same procedure is then repeated for probe 2. Finally, the combined table array is converted into a DynamicTable object using the util.table2nwb function, which also takes the table description as its second argument:

tbl = [tbl1; tbl2];
electrode_table = util.table2nwb(tbl, 'all electrodes');
nwb.general_extracellular_ephys_electrodes = electrode_table;

Note

You can add any number of custom columns to a DynamicTable object and, therefore, you can expand an electrodes table object with any additional metadata you deem necessary.

Load and Initialise Spiking Data#

The next line of code loads processed spiking data from the Matlab MAT file by calling the getSpikes function:

[spikes, metadata, derivedData] = getSpikes(derivedData, animalID, sessionID{iSess}, tbl);

This is a custom function containing a loading algorithm that depends heavily on the processed data structure stored inside the MAT file. I will not go into detail about how the function runs, as your own data are very likely to be structured differently. However, you are welcome to explore the code yourself, as it is generously commented. It suffices to say that the function outputs the spikes variable, which is a 1-by-n cell array of unit spike times in seconds, where n is the number of units. The function also outputs the metadata variable, which is a Matlab table array with rows corresponding to individual clusters (units) and columns to various metadata types describing unit properties, like cluster_id, local_cluster_id, type, peak_channel_index, peak_channel_id, local_peak_channel_id, rel_horz_position, rel_vert_position, isi_violations, isolation_distance, area, probe_id, and electrode_group. You can find the description of each of these properties in the getSpikes function definition.

Once the spike times are extracted, we convert them into VectorData and VectorIndex objects by executing the line below:

[spike_times_vector, spike_times_index] = util.create_indexed_column(spikes);

The spike_times_vector.data property is simply a vector of spike times, with the spike times of all units arranged consecutively and grouped on a unit basis. Meanwhile, the spike_times_index.data property is a vector of indices marking the end of each unit's spike times within the data vector (i.e., the row breaks). In conjunction, VectorData and VectorIndex objects can be used to encode ragged arrays: arrays whose rows have different numbers of columns. All units taken together with their spike times form such a ragged array, as illustrated in the figure below.

../_images/ragged-array.png

       Figure 4. Ragged Array
       Original image taken from here.
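The encoding itself is language-agnostic and easy to illustrate with a toy sketch (shown here in Python, with made-up spike times): three units with 2, 1, and 3 spikes pool into a single data vector accompanied by an index vector marking the row breaks.

spikes = [[0.1, 0.2], [0.35], [0.5, 0.6, 0.7]]  # spike times of three units

data = [t for unit in spikes for t in unit]     # VectorData: pooled spike times
# data  -> [0.1, 0.2, 0.35, 0.5, 0.6, 0.7]

index, count = [], 0                            # VectorIndex: end of each unit's spikes
for unit in spikes:
    count += len(unit)
    index.append(count)
# index -> [2, 3, 6]: unit 1 is data[0:2], unit 2 is data[2:3], unit 3 is data[3:6]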

Load Waveforms#

Before converting the spiking data, we load the unit waveforms. Waveform data can also be stored as a ragged array, even a double-indexed one; you can find more information here on how to construct such arrays. In our case, the waveform arrays are rather simple: we are only interested in the average waveform on the probe recording channel with the largest waveform amplitude. These data are stored in the waveformMeans variable, a cell array of average waveforms with cells corresponding to individual units. The variable is constructed after loading and reshaping the waveforms located inside the convert2nwbEcephysMatNpx/npx_raw_derived_data/M200324_MD/<session-ID>/waveforms.mat files:

% Load and reshape unit waveforms
...
for iWave = 1:numel(waveformMeans)
  if isempty(waveformMeans{iWave})
    waveformMeans{iWave} = nan(1,nWaveformSamples);
  end
end

Construct Units Table#

To store spiking data and other related data, we construct the units table, which, like the electrodes table, is a DynamicTable object. As such, it supports the addition of extra metadata columns. The code that constructs the units table is shown below:

nwb.units = types.core.Units( ...
  'colnames', {'cluster_id','local_cluster_id','type',...
  'peak_channel_index','peak_channel_id',... % Provide the column order. All column names have to be defined below
  'local_peak_channel_id','rel_horz_pos','rel_vert_pos',...
  'isi_violations','isolation_distance','area','probe_id',...
  'electrode_group','spike_times','spike_times_index'}, ...
  'description', 'Units table', ...
  'id', types.hdmf_common.ElementIdentifiers( ...
  'data', int64(0:length(spikes) - 1)), ...
  'cluster_id', types.hdmf_common.VectorData( ...
  'data', cell2mat(metadata{:,1}), ...
  'description', 'Unique cluster id'), ...
  'local_cluster_id', types.hdmf_common.VectorData( ...
  'data', cell2mat(metadata{:,2}), ...
  'description', 'Local cluster id on the probe'), ...
  'type', types.hdmf_common.VectorData( ...
  'data', metadata{:,3}, ...
  'description', 'Cluster type: unit vs mua'), ...
  'peak_channel_index', types.hdmf_common.VectorData( ...
  'data', cell2mat(metadata{:,4}), ...
  'description', 'Peak channel row index in the electrode table'), ...
  'peak_channel_id', types.hdmf_common.VectorData( ...
  'data', cell2mat(metadata{:,5}), ...
  'description', 'Unique ID of the channel with the largest cluster waveform amplitude'), ...
  'local_peak_channel_id', types.hdmf_common.VectorData( ...
  'data', cell2mat(metadata{:,6}), ...
  'description', 'Local probe channel with the largest cluster waveform amplitude'), ...
  'rel_horz_pos', types.hdmf_common.VectorData( ...
  'data', cell2mat(metadata{:,7})./1000, ...
  'description', 'Probe-relative horizontal position in mm'), ...
  'rel_vert_pos', types.hdmf_common.VectorData( ...
  'data', cell2mat(metadata{:,8})./1000, ...
  'description', 'Probe tip-relative vertical position in mm'), ...
  'isi_violations', types.hdmf_common.VectorData( ...
  'data', cell2mat(metadata{:,9}), ...
  'description', 'Interstimulus interval violations (unit quality measure)'), ...
  'isolation_distance', types.hdmf_common.VectorData( ...
  'data', cell2mat(metadata{:,10}), ...
  'description', 'Cluster isolation distance (unit quality measure)'), ...
  'area', types.hdmf_common.VectorData( ...
  'data', metadata{:,11}, ...
  'description', ['Brain area where the unit is located. Internal thalamic ' ...
  'nuclei divisions are not precise, because they are derived from unit locations on the probe.']), ...
  'probe_id', types.hdmf_common.VectorData( ...
  'data', metadata{:,12}, ...
  'description', 'Probe id where the unit is located'), ...
  'spike_times', spike_times_vector, ...
  'spike_times_index', spike_times_index, ...
  'electrode_group', types.hdmf_common.VectorData( ...
  'data', metadata{:,13}, ...
  'description', 'Recording channel groups'), ...
  'waveform_mean', types.hdmf_common.VectorData( ...
  'data', cell2mat(waveformMeans), ...
  'description', ['Mean waveforms on the probe channel with the largest waveform amplitude. ' ...
  'MUA waveforms are excluded. The order that waveforms are stored match the order of units in the unit table.']) ...
);

The colnames property specifies the units table column names and should be used to indicate the column order. The description property simply describes what kind of information is stored in the table. When constructing the table, you have to provide the id property, an ElementIdentifiers object containing indices starting at 0 that identify the DynamicTable rows. Custom properties are passed in as VectorData objects with their own data and description properties. In the above code snippet these include cluster_id, local_cluster_id, type, peak_channel_index, peak_channel_id, local_peak_channel_id, rel_horz_pos, rel_vert_pos, isi_violations, isolation_distance, area, and probe_id. You can also pass in a few optional properties, like the electrode_group VectorData object, which specifies the electrode group that each spike unit came from, and the electrodes DynamicTableRegion object, which references the rows of the electrodes table corresponding to the electrodes that each spike unit came from, among others. Finally, we also pass in the waveform_mean property containing a matrix of mean unit waveforms as a VectorData object. The latter property will not be one of the table columns, as it is not listed in the colnames property, but it will be accessible via dot indexing.

Note

VectorData objects should not have cell arrays of integers as their data property (cell arrays of strings are fine), as these are not encoded properly during the NWB conversion and will generate an error. You can use Matlab's cell2mat function to convert them into regular numeric arrays.

Add Behavioural Module: Pupil Area Size#

We load the pupil area size data from the same file containing the processed spiking data. We then convert these data into a TimeSeries object, which is designed to store any general-purpose time series data. This object has a few mandatory properties, like data, data_unit, and starting_time_rate, and a few optional ones, like timestamps, control, control_description, and description. The data property can be any 1-D, 2-D, 3-D, or 4-D array where the first dimension is time. We use it to store our pupilAreaSize data vector. The data_unit property has to be a string, while the starting_time_rate property is a scalar of float32 type. The timestamps property gives the data sampling times and should be stored as a one-dimensional numeric array of float64 type. The control property is used for labelling data samples with integers; the length of this array should match the length of the first data dimension representing time. In our case, I am using the integers 0 and 1 to mark acceptable quality data. The meaning of these labels is provided by the control_description property, a cell array of strings describing the labels in increasing order. The full code that converts the pupil area size data into the appropriate form is shown below:

pupilAreaSize = types.core.TimeSeries( ...
  'data', pupilAreaSize, ...
  'timestamps', videoFrameTimes, ...
  'data_unit', 'pixels^2', ...
  'starting_time_rate', videoFrameRate,...
  'control', uint8(acceptableSamples),...
  'control_description', {'low quality samples that should be excluded from analyses';...
  'acceptable quality samples'},...
  'description', ['Pupil area size over the recording session measured in pixels^2. ' ...
  'Acceptable quality period starting and ending times are given by data_continuity parameter. ' ...
  'The full data range can be divided into multiple acceptable periods.'] ...
);

pupilTracking = types.core.PupilTracking('TimeSeries', pupilAreaSize);
behaviorModule = types.core.ProcessingModule('description', 'contains behavioral data');
behaviorModule.nwbdatainterface.set('PupilTracking', pupilTracking);

As you will notice, our TimeSeries object is assigned to a PupilTracking object, which can hold one or more TimeSeries objects. We then create a ProcessingModule container object to store the behavioural data. The final line adds our PupilTracking object to the nwbdatainterface property of the ProcessingModule object.

Add Behavioural Module: Total Facial Movement#

Converting the facial movement data into an appropriate form for NWB storage is almost identical to the conversion applied to the pupil area size data. First, we load the data and store it inside a TimeSeries object; the steps are identical. Next, we store our TimeSeries object inside a BehavioralTimeSeries object, which is designed to hold one or more TimeSeries objects:

behavioralTimeSeries = types.core.BehavioralTimeSeries('TimeSeries', totalFaceMovement);

We then add our BehavioralTimeSeries object to the nwbdatainterface property of the ProcessingModule container object:

behaviorModule.nwbdatainterface.set('BehavioralTimeSeries', behavioralTimeSeries);

Finally, we create the behaviour module within the NWBFile object:

nwb.processing.set('behavior', behaviorModule);

Save NWB File#

We call the nwbExport function to save our data in the NWB format:

if iSess < 10
  nwbExport(nwb, [animalDerivedDataFolderNWB filesep 'ecephys_session_0' num2str(iSess) '.nwb']);
else
  nwbExport(nwb, [animalDerivedDataFolderNWB filesep 'ecephys_session_' num2str(iSess) '.nwb']);
end

Read NWB File#

Now, if you want to open the NWB file you just saved in Matlab, you can issue the command

nwb2 = nwbRead('ecephys_session_01.nwb');

which will read the file passively. The action is fast, as it does not load all of the data at once but rather makes it readily accessible. This is useful because it allows you to read data selectively without loading the entire file content into the computer memory.

If you want to read the entire units table, which holds all unit spiking data and associated metadata, the easiest way is to issue the command

unitsTable = nwb2.units.loadAll.toTable();

which will convert the DynamicTable object into a Matlab table array. The latter is a common Matlab data type and can be manipulated easily using regular Matlab syntax and indexing.

In contrast, if you are interested in the data of a single unit, you can load it using the following line of code:

unitRow = nwb2.units.getRow(1);

To load only the spiking data for the same unit, the following line of code will do:

unitRowSpikeTimes = nwb2.units.getRow(1).spike_times{1};

If you want to load mean unit waveforms, the following line of code will load them as a floating-point number array:

unitsWaveforms = nwb2.units.waveform_mean.loadAll.data;

Meanwhile, behavioural data can be accessed by running the code below:

pupilAreaSize = nwb2.processing.get('behavior'). ...
    nwbdatainterface.get('PupilTracking'). ...
    timeseries.get('TimeSeries').data(:);

totalFacialMovement = nwb2.processing.get('behavior'). ...
    nwbdatainterface.get('BehavioralTimeSeries'). ...
    timeseries.get('TimeSeries').data(:);

Other associated behavioural properties can be accessed by replacing the data property with timestamps, control, and so on.

Some metadata are often directly available as properties of the NWBFile object, like:

sessionDescription = nwb2.session_description;

Subject metadata are available via commands like:

animalID = nwb2.general_subject.subject_id;

The electrode metadata can be accessed in the following way:

electrodesTable = nwb2.general_extracellular_ephys_electrodes.toTable();

The electrodes table is a DynamicTable object and, therefore, the methods for accessing it are the same as those discussed for the units table. For a more detailed account of how you can access and manipulate DynamicTable objects, refer to the external MatNWB DynamicTables Tutorial.

Validate NWB File#

MatNWB does not provide its own NWB file validator. However, you can validate NWB files generated with Matlab using Python. For instructions, refer to the corresponding PyNWB validation section of this tutorial.

Resources#

This section explained how you can use Matlab to convert your processed spiking and behavioural data into NWB files. It is not an exhaustive overview of the MatNWB toolbox's extracellular electrophysiology functionality; there are other potential use cases involving local field potentials (LFPs) and different types of behavioural data. A number of external tutorials cover other aspects of converting extracellular electrophysiology data to the NWB file format using Matlab.

Convert to NWB Using Python#

Install PyNWB#

To convert your extracellular electrophysiology data, you will need to install PyNWB first. To do so, type in your terminal:

pip install -U pynwb

Now you’re ready to start working with PyNWB. The full installation instructions are available here.

Record Metadata#

Just like in the Matlab tutorial, we have prepared example repositories containing single-session extracellular electrophysiology recording data collected with Neuronexus and Neuropixels probes, together with Python scripts that convert the data to the NWB format. They can be used to familiarise yourself with the Python NWB conversion scheme for spiking data combined with behavioural measurements. Both conversion scripts are very similar and, thus, we will focus on the Neuropixels use case.

To download the Neuropixels repository, type in your terminal:

gin get dervinism/convert2nwbEcephysPyNpx
cd convert2nwbEcephysPyNpx
gin get-content

It will take some time to download the full repository. Once the download is complete, you can open the convert2nwb.py file and execute it right away. The script loads derived spiking and behavioural data from the convert2nwbEcephysPyNpx/npx_derived_data/M200324_MD/M200324_MD.mat file, converts it to the NWB format, and saves it inside the convert2nwbEcephysPyNpx/npx_derived_data_nwb folder as the ecephys_session_01.nwb file.

We will now analyse the conversion script in more detail. The script starts by importing general and pynwb module dependencies, as well as custom functions located inside the localFunctions.py module file.

from datetime import datetime
from dateutil.tz import tzlocal
import numpy as np

from pynwb import NWBFile, NWBHDF5IO, TimeSeries
from pynwb.behavior import PupilTracking, BehavioralTimeSeries
from pynwb.core import DynamicTable
from pynwb.ecephys import ElectricalSeries, LFP
from pynwb.file import Subject

from localFunctions import createElectrodeTable, array2list, getSpikes, reshapeWaveforms, concatenateMat, parsePeriod, markQualitySamples

These are not all of the modules we need; some are loaded inside the parameter files. These other modules include pathlib, os, re, datetime, numpy, h5py, and scipy, among others.

We then execute three parameter files that set up the conversion environment. The first parameter file, nwbParams.py, contains the most general parameters, which apply to all animals and recording sessions of the experiment, like:

projectName = 'Brainwide Infraslow Activity Dynamics'
experimenter = 'Martynas Dervinis'
institution = 'University of Leicester'
publications = 'In preparation'
lab = 'Michael Okun lab'
dataset = 'neuropixels'
videoFrameRate = 25.0 # Hz

The names of most of these parameters are self-explanatory. The videoFrameRate variable contains the frame rate of the camera recording the animal's pupil. Input and output folders are also initialised as global variables and defined at the bottom of the parameter file. They include:

  • rawDataFolder which contains probe recording channel maps and unit waveform files;

  • derivedDataFolder which contains processed spiking data;

  • derivedDataFolderNWB which is the output folder where converted NWB files are saved.

As the name implies, the nwbAnimalParams.py file contains parameters specific to a particular animal and common to all recording sessions for that animal. They include:

animalID = 'M200324_MD'
dob = '20200206'
...
strain = 'C57BL/6J'
sex = 'M'
species = 'Mus musculus'
weight = []
description = '025' # Animal testing order.

The names of these parameters are self-explanatory. The script also defines input and output folders specifically for the animal of interest.

The nwbSessionParams.py file stores parameters about recording sessions. Some of these parameters are defined at the top of the file explicitly for each individual session and some even explicitly for each recording probe, like:

sessionID = ['20200324161349']
sessionDescription = ['anaesthesia']
sessionNotes = ['...']
endCh = [[None] * 2 for i in range(len(sessionID))]
endCh[0][0] =  np.array([88, 117, 149, 342, 384]) # Corresponding probe end channels starting from the tip of the probe. Corresponding and previous end channels are used to work out probe channels that reside in the corresponding brain area.
endCh[0][1] = np.array([41, 91, 138, 217, 304, 384])

The for loop that follows next sets the remaining parameters that are common across recording sessions, like electrodeName, electrodeDescription, electrodeManufacturer, nShanks (number of probe shanks), nChannelsPerShank (number of recording channels per shank), nCh (total number of probe channels), areas (brain areas that probes span), electrodeCoordinates (electrode insertion coordinates), electrodeLabel, and electrodeImplantationType (i.e., acute or chronic).

Most of the parameters defined in the three parameter files constitute metadata. The way you define your metadata may differ: for example, you may have your own custom scripts that contain the metadata, or you may store your metadata in files organised according to one of the standard neuroscientific metadata formats, like odML. Whatever your preference, this part of the NWB conversion procedure will vary from researcher to researcher.

The conversion process then goes through every recording session and generates NWB files individually for each session inside the for loop of the convert2nwb.py file. We start by creating an NWBFile object and defining session metadata:

# Assign NWB file fields
nwb = NWBFile(
  session_description = sessionDescription[iSess],
  identifier = animalID + '_' + sessionID[iSess], 
  session_start_time = sessionStartTime[iSess], 
  experimenter = experimenter,  # optional
  session_id = sessionID[iSess],  # optional
  institution = institution,  # optional
  related_publications = publications,  # optional
  notes = sessionNotes[iSess],  # optional
  lab = lab) # optional

Each file must have a session_description, an identifier, and a session_start_time. We then initialise the Subject object and the metadata it contains:

# Create subject object
nwb.subject = Subject(
  subject_id = animalID,
  age = age,
  description = description,
  species = species,
  sex = sex)

Construct Electrodes Table#

Storing extracellular electrophysiology data is not possible without defining the electrodes table, which is a DynamicTable object. We first create a numpy array using code wrapped inside the createElectrodeTable function. We put the table generation code inside a function because it is reused for each probe. The function call for probe 1 is executed by the code below:

input = {
  "iElectrode": 0,
  "electrodeDescription": electrodeDescription[iSess],
  "electrodeManufacturer": electrodeManufacturer[iSess],
  "nShanks": nShanks[iSess],
  "nChannelsPerShank": nChannelsPerShank[iSess],
  "electrodeLocation": electrodeLocation[iSess],
  "electrodeCoordinates": electrodeCoordinates[iSess],
  "sessionID": sessionID[iSess],
  "electrodeLabel": electrodeLabel[iSess]}
if probeInserted[iSess][input["iElectrode"]] and endCh[iSess][input["iElectrode"]].any():
  (tbl1, columnLabels, columnDescription) = createElectrodeTable(nwb, input)
else:
  tbl1 = []; columnLabels = []; columnDescription = []

The numpy table array for probe 1 is then created within the function:

# Parse input
iEl = input["iElectrode"]
nSh = input["nShanks"]
nCh = input["nChannelsPerShank"]

# Create a table with given column labels
columnLabels = ['channel_id', 'channel_local_index', 'x', 'y', 'z', 'imp', 'location', 'filtering', 'group', 'channel_label', 'probe_label']
columnDescription = [
  'A unique probe channel ID formed by combining session ID, probe reference number, and channel number relative to the tip of the probe.',
  'Channel index relative to the tip of the probe. Channel indices are only unique within a probe.',
  'Channel AP brain surface coordinate (probe insertion location mm).',
  'Channel ML brain surface coordinate (probe insertion location mm).',
  'Channel location relative to the tip of the probe in mm.',
  'Channel impedance.',
  'Channel brain area location.',
  'Type of LFP filtering applied.',
  'Channel electrode group (e.g., shank 1).',
  'Channel_label.',
  'Probe_label.'
]
tbl = np.empty((0, len(columnLabels)))

# Register the probe device
device = nwb.create_device(
  name = 'probe' + str(iEl+1),
  description = input["electrodeDescription"][iEl],
  manufacturer = input["electrodeManufacturer"][iEl])

for iShank in range(nSh[iEl]):
  
  # Register a shank electrode group (only one because this is a single shank probe)
  electrode_group = nwb.create_electrode_group(
    name = 'probe' + str(iEl+1),
    description = 'electrode group for probe' + str(iEl+1),
    location = input["electrodeLocation"][iEl][-1],
    device = device,
    position = input["electrodeCoordinates"][iEl][-1])
  
  # Populate the electrode table
  for iCh in range(nCh[iEl]):
    if iCh < 10-1:
      channelID = str(input["sessionID"] + str(iEl+1) + '00' + str(iCh+1))
    elif iCh < 99-1:
      channelID = str(input["sessionID"] + str(iEl+1) + '0' + str(iCh+1))
    else:
      channelID = str(input["sessionID"] + str(iEl+1) + str(iCh+1))
    channel_label = 'probe' + str(iEl+1) + 'shank' + str(iShank+1) + 'elec' + str(iCh+1)
    tbl = np.append(tbl, np.matrix([
      channelID, iCh+1, input["electrodeCoordinates"][iEl][iCh][0], input["electrodeCoordinates"][iEl][iCh][1], input["electrodeCoordinates"][iEl][iCh][2],
      np.nan, input["electrodeLocation"][iEl][iCh], 'unknown', electrode_group, channel_label, input["electrodeLabel"][iEl]]), axis=0)
  
return np.array(tbl), columnLabels, columnDescription

The code initialises an empty table with the given column labels. It then registers the probe Device inside the NWBFile object (see the method documentation here). Next, we create an ElectrodeGroup object, which is used to define electrode groupings within the probe (see the method documentation here). In our case we have a single-shank probe and, therefore, we define all recording channels to be part of a single group; grouping on some other basis is also possible. We set the device property to the previously created Device object. The location property defines the location on the cortical surface where the probe was inserted, while the position property is a numpy array whose columns are stereotaxic coordinates. Finally, the electrodes table is filled in channel by channel (rows) with channel subtables.

The same procedure is then repeated for probe 2. The numpy arrays for the two probes are then combined and converted into a column list:

if len(tbl1) and len(tbl2):
  tbl = array2list(np.append(tbl1, tbl2, axis=0), columnLabels, columnDescription)
elif len(tbl1) and not len(tbl2):
  tbl = array2list(tbl1, columnLabels, columnDescription)
elif not len(tbl1) and len(tbl2):
  tbl = array2list(tbl2, columnLabels, columnDescription)
else:
  tbl = []

Finally, this list is converted into an electrodes table DynamicTable object. We start by defining the electrodes table columns using the add_electrode_column NWBFile method (documented here). Next, we add table entries electrode by electrode using the add_electrode method (documented here). We finish by converting the electrodes table into a pandas DataFrame object by calling the to_dataframe DynamicTable method (documented here).

if len(tbl):
  for iColumn in range(len(columnLabels)):
    if columnLabels[iColumn] != 'location' and columnLabels[iColumn] != 'group':
      nwb.add_electrode_column(name=columnLabels[iColumn], description=columnDescription[iColumn])
  
  for iElec in range(len(tbl[0].data)):
    nwb.add_electrode(
      channel_id=tbl[0].data[iElec],
      channel_local_index=tbl[1].data[iElec],
      x=tbl[2].data[iElec],
      y=tbl[3].data[iElec],
      z=float(tbl[4].data[iElec]),
      imp=tbl[5].data[iElec],
      location=tbl[6].data[iElec],
      filtering=tbl[7].data[iElec],
      group=tbl[8].data[iElec],
      channel_label=tbl[9].data[iElec],
      probe_label=tbl[10].data[iElec]
    )

  nwb.electrodes.to_dataframe()

Note

You can add any number of custom columns to a DynamicTable object and, therefore, you can expand an electrodes table object with any additional metadata you deem necessary.

Load and Initialise Spiking Data#

The next line of code loads processed spiking data from the Matlab MAT file by calling the getSpikes function:

(spikes, metadata, derivedData, columnLabels, columnDescription) = getSpikes(derivedData, animalID, sessionID[iSess], tbl)

This is a custom function containing a loading algorithm that depends heavily on the processed data structure stored inside the MAT file. I will not go into detail about how the function runs, as your own data are very likely to be structured differently. However, you are welcome to explore the code yourself, as it is generously commented. It suffices to say that the function outputs the spikes variable, which is a 1-by-n list of numpy arrays of unit spike times in seconds, where n is the number of units. The function also outputs the metadata variable, which is a numpy array with rows corresponding to individual clusters (units) and columns to various metadata types describing unit properties, like cluster_id, local_cluster_id, type, peak_channel_index, peak_channel_id, local_peak_channel_id, rel_horz_position, rel_vert_position, isi_violations, isolation_distance, area, probe_id, and electrode_group. You can find the description of each of these properties inside the getSpikes function definition.

For convenience and computational efficiency, the function also outputs the already loaded MAT file, which can be reused later (the derivedData variable). The column labels of the metadata array and their descriptions are also provided as the columnLabels and columnDescription variables, respectively.

Note

Spike times are stored as a ragged array inside the NWB file. Because each unit has a different number of spikes, the spiking data from all units are pooled into a single vector, with spikes grouped on a unit basis. An index vector marking the final spike of each unit (i.e., the row breaks of the initial numpy array) is constructed and stored alongside it inside the NWB file. You can read more about ragged arrays here.

Load Waveforms#

Before converting the spiking data, we load the unit waveforms. Waveform data can also be stored as a ragged array, even a double-indexed one; you can find more information here on how to construct such arrays. In our case, the waveform arrays are rather simple: we are only interested in the average waveform on the probe recording channel with the largest waveform amplitude. These data are stored in the waveformMeans variable, a numpy array of average waveforms with rows corresponding to individual units (MUAs are stored as NaNs). The variable is constructed by calling the reshapeWaveforms function after loading the waveforms located inside the convert2nwbEcephysPyNpx/npx_raw_derived_data/M200324_MD/<session-ID>/waveforms.mat files:

# Load data fields
if len(waveforms):
  maxWaveforms = np.array(waveforms.get('maxWaveforms')).transpose()
  cluIDs = np.array(waveforms.get('cluIDs')).transpose()
else:
  maxWaveforms = []
  cluIDs = []

# Load waveforms
metadataInds = np.where(np.array(np.isin(metadata[:,11], 'probe' + str(iEl))))[0]
metadata = metadata[metadataInds]
waveformsMean = []
if len(maxWaveforms) and len(maxWaveforms.shape):
  nWaveformSamples = maxWaveforms.shape[1]
else:
  nWaveformSamples = 200

for iUnit in range(metadata.shape[0]):
  row = np.where(np.isin(cluIDs, metadata[iUnit,1]))[0]
  if row.size:
    waveformsMean.append(maxWaveforms[row])
  else:
    waveformsMean.append(np.full([1,nWaveformSamples], np.nan))

return waveformsMean

Construct Units Table#

To store spiking data and other related data, we construct the units table, which, like the electrodes table, is a DynamicTable object. As such, it supports the addition of extra metadata columns. The code that constructs the units table is shown below:

for iColumn in range(len(columnLabels)-1):
  nwb.add_unit_column(name=columnLabels[iColumn], description=columnDescription[iColumn])
for iUnit in range(len(spikes)):
  nwb.add_unit(
    cluster_id=metadata[iUnit,0],
    local_cluster_id=metadata[iUnit,1],
    type=metadata[iUnit,2],
    peak_channel_index=metadata[iUnit,3],
    peak_channel_id=metadata[iUnit,4],
    local_peak_channel_id=metadata[iUnit,5],
    rel_horz_pos=metadata[iUnit,6],
    rel_vert_pos=metadata[iUnit,7],
    isi_violations=metadata[iUnit,8],
    isolation_distance=metadata[iUnit,9],
    area=metadata[iUnit,10],
    probe_id=metadata[iUnit,11],
    electrode_group=metadata[iUnit,12],
    spike_times=spikes[iUnit],
    waveform_mean=waveformMeans[iUnit]
  )

Like when constructing the electrodes table, we first define the columns of the units table by invoking the NWBFile add_unit_column method (you can find its documentation here). The column names are derived from the columnLabels variable, which is defined inside the getSpikes function discussed in one of the previous subsections; the column descriptions are derived from the columnDescription variable, defined inside the same function. Once the columns are defined, we add metadata, spike times, and waveforms for each unit individually using the add_unit method (documented here).

Add Behavioural Module: Pupil Area Size#

The code inside the convert2nwb.py file up to this point described how to convert spiking data into the NWB format. If no behavioural or other data were recorded, the NWB file could be saved now. That is rarely the case, however, and this and the next subsections describe how to convert certain types of behavioural data into this format. We start with the data extracted by analysing recordings of the pupil area size.

We load the pupil area size data from the same file containing the processed spiking data. We then convert these data into a TimeSeries object, which is designed to store any general-purpose time series data. This object has a few mandatory properties, like name, data, and unit, and a few optional ones, like timestamps, control, control_description, and description. The data property can be any 1-D, 2-D, 3-D, or 4-D array where the first dimension is time. We use it to store our pupilAreaSize numpy vector. The unit property has to be a string, while the timestamps property gives the data sampling times and should be stored as a one-dimensional numpy array. The control property is used for labelling data samples with integers; the length of this array should match the length of the first data dimension representing time. In our case, I am using the integers 0 and 1 to mark acceptable quality data. The meaning of these labels is provided by the control_description property, a string describing the labels in increasing order. The full code that converts the pupil area size data into the appropriate form is shown below:

pupilAreaSize = TimeSeries(
  name='TimeSeries',
  data=pupilAreaSize.squeeze(),
  timestamps=videoFrameTimes.squeeze(),
  unit='pixels^2',
  control=acceptableSamples.squeeze(),
  control_description='low quality samples that should be excluded from analyses; acceptable quality samples',
  description='Pupil area size over the recording session measured in pixels^2. ' +
    'Acceptable quality period starting and ending times are given by data_continuity parameter. ' +
    'The full data range can be divided into multiple acceptable periods.'
)
behaviour_module = nwb.create_processing_module(name="behavior", description="contains behavioral data")
pupilTracking = PupilTracking(pupilAreaSize)
behaviour_module.add(pupilTracking)

As you will notice, our TimeSeries object pupilAreaSize is assigned to a PupilTracking object, which can hold one or more TimeSeries objects. We also create a ProcessingModule container object to store the behavioural data using the NWBFile create_processing_module method (documented here). The final line adds the PupilTracking object to the ProcessingModule object by invoking the add method (documented here).

Add Behavioural Module: Total Facial Movement#

Converting the facial movement data into an appropriate form for NWB storage is almost identical to the conversion applied to the pupil area size data. First, we load the data and store it inside a TimeSeries object; the steps are identical. Next, we store our TimeSeries object inside a BehavioralTimeSeries object, which is designed to hold one or more TimeSeries objects:

behavioralTimeSeries = BehavioralTimeSeries(totalFaceMovement)

We then add the BehavioralTimeSeries object to the ProcessingModule container object:

behaviour_module.add(behavioralTimeSeries)

Save NWB File#

We use the NWBHDF5IO object to write our NWBFile data in the NWB format:

if iSess < 9:
  nwbFilename = os.path.join(animalDerivedDataFolderNWB, 'ecephys_session_0' + str(iSess+1) + '.nwb')
else:
  nwbFilename = os.path.join(animalDerivedDataFolderNWB, 'ecephys_session_' + str(iSess+1) + '.nwb')
with NWBHDF5IO(nwbFilename, "w") as io:
  io.write(nwb)

Read NWB File#

Reading NWB files in Python is done via the NWBHDF5IO class. Hence, import it by calling:

from pynwb import NWBHDF5IO

Now, if you want to open the NWB file you just saved in Python, you can issue the following commands:

file_path = './npx_derived_data_nwb/M200324_MD/ecephys_session_01.nwb'
io = NWBHDF5IO(file_path, mode="r")
nwb2 = io.read()

These commands read the file passively. The action is fast, as it does not load all of the data at once but rather makes it readily accessible. This is useful because it allows you to read data selectively without loading the entire file content into the computer memory.
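For example, assuming the file contains the behavioural module created earlier, you can slice just a portion of a dataset while the rest stays on disk:

# Read only the first 1000 pupil area samples; nothing else is loaded
partialPupilData = nwb2.processing['behavior']['PupilTracking']['TimeSeries'].data[:1000]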

If you want to read the entire units table, which holds all unit spiking data and associated metadata, the easiest way is to issue the command

units = nwb2.units.to_dataframe()

which will load and convert the HDF5 DynamicTable object into a pandas DataFrame, a common Python data type that can be manipulated using regular pandas syntax and indexing.
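For example, the converted table can be filtered like any other DataFrame. The column names below are the ones defined earlier in the conversion script; the area label is purely illustrative:

# Keep only units located in a given brain area ('Th' is an illustrative label)
thalamicUnits = units[units['area'] == 'Th']
print(thalamicUnits[['cluster_id', 'probe_id', 'isolation_distance']])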

In contrast, if you are interested in the data of a single unit, you can load it using the following lines of code:

unitRowColLabels = units.columns.values
unitRowValues = units.iloc[0].values

To load only the spiking data for the same unit, the following line of code will do:

unitRowSpikeTimes = units["spike_times"][0]

If you want to load mean unit waveforms, the following line of code will load them as a numpy array:

unitWaveforms = units["waveform_mean"].values

Meanwhile, behavioural data can be accessed by running the code below:

pupilAreaSize = nwb2.processing['behavior']['PupilTracking']['TimeSeries'].data[:]
totalFacialMovement = nwb2.processing['behavior']['BehavioralTimeSeries']['TimeSeries'].data[:]

Other associated behavioural properties can be accessed by replacing the data property with timestamps, control, and so on.
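For instance, the pupil timestamps and the quality-control labels stored earlier can be read like this:

pupilTimestamps = nwb2.processing['behavior']['PupilTracking']['TimeSeries'].timestamps[:]
pupilControl = nwb2.processing['behavior']['PupilTracking']['TimeSeries'].control[:]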

Some metadata are often directly available as properties of the NWBFile object, like:

sessionDescription = nwb2.session_description

Subject metadata are available via commands like:

animalID = nwb2.subject.subject_id

The electrode metadata can be accessed in the following way:

electrodesTable = nwb2.ec_electrodes.to_dataframe()

The electrodes table is a pandas DataFrame object and, therefore, the methods for accessing it are the same as those discussed for the units table.

For more detailed account of how you can access and manipulate DynamicTable objects refer to an external HDMF DynamicTable Tutorial.

Validate NWB File#

To validate NWB files, use the command line and type

python -m pynwb.validate ./npx_derived_data_nwb/M200324_MD/ecephys_session_01.nwb

to validate the file that was recently created. The output should be:

Validating ./npx_derived_data_nwb/M200324_MD/ecephys_session_01.nwb against cached namespace information using namespace 'core'.
 - no errors found.

The program exit code should be 0. On error, the exit code is 1 and the list of errors is output.
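You can also run the validator from within Python. Below is a minimal sketch; the exact signature of the validate function may vary between PyNWB versions, so check the documentation of your installed version:

from pynwb import NWBHDF5IO, validate

# Validate an open NWB file; an empty list of errors means the file is valid
with NWBHDF5IO('./npx_derived_data_nwb/M200324_MD/ecephys_session_01.nwb', mode='r') as io:
    errors = validate(io=io)
print(errors)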

For more details on the validation process, refer to an external resource.

Resources#

This section explained how you can use Python to convert your processed spiking and behavioural data into NWB files. It is not an exhaustive overview of the PyNWB package's extracellular electrophysiology functionality; there are other potential use cases involving local field potentials (LFPs) and different types of behavioural data. A number of external tutorials cover other aspects of converting extracellular electrophysiology data to the NWB file format using Python.

Examine NWB File Structure Using HDFView#

You can use the HDFView software to visualise the structure of your NWB files via a simple graphical interface.

Upload Your Standardised Data to Remote Repository#

Let's say that you have completed the conversion of your data to the NWB format and now want to upload the files to the remote repository. First, copy the NWB files to the appropriate folders within your local repository. For the Neuronexus and Neuropixels example data in this tutorial, the NWB files should be copied to these locations:

infraslow-dynamics\03_data\101_uol_neuronexus_exp_derived_data\M190114_A_MD
infraslow-dynamics\03_data\102_uol_neuropixels_exp_derived_data\M200324_MD

Subsequently, navigate to the repository root folder and issue the following command in your terminal:

gin commit . -m "Converted derived data to NWB"

This will commit changes to the local repository. To upload them to the remote repository, just type:

gin upload .

Roll back Repository to an Earlier Version#

Now let’s assume you are not happy with the converted NWB files and you want to roll back to an earlier version of your repository prior to the conversion. Before we can do that, we need to find out the latest commit IDs by typing:

gin version

You should see a terminal output that looks similar to this one:

[ 1]  798d1e6 * Fri Nov 11 10:11:41 2022 (+0000)

    Converted derived data to NWB

  Added
    03_data/101_uol_neuronexus_exp_derived_data/M190114_A_MD/ecephys_session_01.nwb,
    03_data/102_uol_neuropixels_exp_derived_data/M200324_MD/ecephys_session_01.nwb

[ 2]  f288303 * Mon Sep 19 13:10:27 2022 (+0100)

    Added a few larger files

  Modified
    03_data/001_uol_neuronexus_exp_raw_derived/M190114_A_MD/201901221911031/continuous.ap_CAR.cbin,
    03_data/001_uol_neuronexus_exp_raw_derived/M190114_A_MD/201901221911031/continuous.lf.cbin,
    03_data/001_uol_neuronexus_exp_raw_derived/M190114_A_MD/2019012219110326/continuous.ap_CAR.cbin,
    03_data/001_uol_neuronexus_exp_raw_derived/M190114_A_MD/2019012219110326/continuous.lf.cbin,
    03_data/001_uol_neuronexus_exp_raw_derived/M190114_A_MD/video_79938.mp4,
    03_data/002_uol_neuropixels_exp_raw_derived/M200324_MD/202003241613491/continuous.ap_CAR.cbin,
    03_data/002_uol_neuropixels_exp_raw_derived/M200324_MD/202003241613491/continuous.lf.cbin,
    03_data/002_uol_neuropixels_exp_raw_derived/M200324_MD/202003241613492/continuous.ap_CAR.cbin,
    03_data/002_uol_neuropixels_exp_raw_derived/M200324_MD/202003241613492/continuous.lf.cbin,
    03_data/002_uol_neuropixels_exp_raw_derived/M200324_MD/video_67929_low_quality.mp4
...

We are interested in restoring the repository to the version just prior to the conversion, whose ID is f288303. Now we type:

gin version --id f288303

The local repository should now be rolled back to its previous version.

Rolling back versions may produce some unintuitive results. A roll back replaces files with their earlier versions; however, files that were not present in the earlier version are kept rather than deleted. Such new files therefore have to be deleted manually after the roll back. This may seem like a bug, but it is intentional behaviour.