neurosift-blog

Analyzing NWB Datasets on DANDI with Dendro

INCF Neuroinformatics Assembly, September 26, 2024

Jeremy Magland, Center for Computational Mathematics, Flatiron Institute

Developed in collaboration with CatalystNeuro

Dendro overview

The goal of Dendro is to provide a user-friendly interface for analyzing DANDI NWB files in a shared and collaborative environment.

Note: Dendro is still at an early stage of development.

Dendro example: computing autocorrelograms

Most Neurosift features are client-side only, meaning that the raw data is pulled directly from the NWB file and all calculations are performed on the local machine. However, more advanced features require pre-processing of the data (e.g., using Python scripts). This is where Dendro comes in.

For example, suppose we want to view the Autocorrelograms for the units in the 000458 dataset. Sometimes this information is included in the Units table, but in this case it is not. We can use Dendro to calculate the Autocorrelograms and then view them in Neurosift.
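
To make concrete what this job computes, here is a minimal numpy sketch of an autocorrelogram for a single unit. This is only an illustration under assumed conventions (millisecond bins, symmetric lags); the actual Dendro processor has its own implementation and parameters.

import numpy as np

def autocorrelogram(spike_times_sec, bin_size_ms=1.0, window_ms=50.0):
    # Collect inter-spike time differences within +/- window_ms
    t = np.sort(np.asarray(spike_times_sec)) * 1000.0  # seconds -> milliseconds
    diffs = []
    for i in range(len(t)):
        j = i + 1
        while j < len(t) and t[j] - t[i] <= window_ms:
            diffs.append(t[j] - t[i])
            j += 1
    diffs = np.array(diffs)
    all_diffs = np.concatenate([-diffs, diffs])  # mirror to get negative lags
    bins = np.arange(-window_ms, window_ms + bin_size_ms, bin_size_ms)
    counts, _ = np.histogram(all_diffs, bins=bins)
    return bins[:-1], counts

# Synthetic example: a ~10 Hz Poisson unit
spike_times = np.cumsum(np.random.exponential(0.1, size=1000))
lags_ms, counts = autocorrelogram(spike_times)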

Go back to our 000458 example, open the Units panel by clicking on the “Units” link, and then click on the “Units Summary” tab.

image

We notice that somebody has already run the Dendro job to calculate the units summary, and so we can already see the Autocorrelograms.

Click on the job ID link to get more information on the job that was executed.

image

We see that this was completed 3 days ago with an elapsed time of 41 seconds.

Scroll down to see more information.

image

We can see the URLs of the input and output files as well as the values of the parameters that were used. You can also see the console output of the job and resource utilization graphs.

Computing autocorrelograms for a new dataset

Suppose you navigate to an NWB file that does not have the Autocorrelograms pre-computed. You can run a new Dendro job yourself.

Select a different 000458 example from here (note that some of them don’t have a Units table).

As above, click on the “Units” link and then the “Units Summary” tab.

If nobody has run this job yet, it will say “no jobs found”. You can specify your desired parameters and then click “SUBMIT JOB”.

image

Here’s where you have a choice. You can either (1) submit the job to the shared pool of public compute resources, or (2) host your own Dendro compute client and run the job there.

Let’s focus on the second option since it’s more interesting and will illustrate how the Dendro system works!

Hosting your own Dendro compute client

First you’ll need to send your GitHub user name to Jeremy so he can give you permission to fulfill jobs for Neurosift.

The following can be done either on Dandihub or on your local machine (see above for setup).

Create a new directory for your compute client (on Dandihub open a new terminal using File -> New -> Terminal)

mkdir dendro_compute_client
cd dendro_compute_client

Create the compute client

dendro register-compute-client

image

You can restrict to particular users or not. If you do not restrict, then your compute client is added to the public network and can be used by all Neurosift users.

You can also adjust the available CPU/GPU/RAM resources to match your machine.

Now run your compute client!

CONTAINER_METHOD=apptainer dendro start-compute-client

Note: If you don’t specify CONTAINER_METHOD=apptainer, the default is to use Docker (which is not available on Dandihub).

Make a note of the compute client ID and leave your terminal open. You can use tmux or screen if you are worried about closing the terminal or losing connection.

Back to computing autocorrelograms

Now that you have your compute client running, you can go back to Neurosift and submit the job to your compute client.

Enter your compute client ID in the “Compute client (optional)” field and click “SUBMIT JOB”.

You should see some activity in the terminal where your compute client is running. It does the following: picks up the submitted job, downloads the container image, builds the Apptainer container, runs the processing, and uploads the output.

You can use the refresh button in Neurosift to see the status of the job or click on the job ID to see more details such as the console output. It might be in “starting” status for a bit while the Docker image is downloaded and the Apptainer container is built. Subsequent runs will be faster because the container is cached.

When the job is complete, you’ll see the Autocorrelograms! And in the future, anyone who views this NWB file in Neurosift will see the Autocorrelograms without needing to recompute them.

For those in this workshop, if you did not restrict your compute client to only your user, we now have a shared pool of compute resources that can be used for everyone’s Neurosift jobs, making efficient use of idle CPU/GPU resources. This has not really been tested yet with a large number of users, so we’ll see how it goes!

Dendro Provenance Tab in Neurosift

Let’s go back to our 000458 example and click on the Dendro tab.

image

You can see that this file was used as input for two Dendro jobs. You can click on those to see the job details.

Now check out this example which is the result of spike sorting. Click on the Dendro tab to see the provenance pipeline of the files that were used to generate this output.

image

Tip: you can also generate a Python script to resubmit the job (or modify the parameters). Click to open the job in Dendro and then click the “Python Script” button.

CEBRA embedding example

CEBRA is a machine-learning method that can be used to compress time series in a way that reveals otherwise hidden structures in the variability of the data.
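
For orientation, the cebra Python package exposes a scikit-learn-style interface. Here is a minimal sketch with made-up data and illustrative hyperparameters; the settings actually used by the Dendro processor may differ.

import numpy as np
from cebra import CEBRA

# Binned spike counts: (n_time_bins, n_units); random data just for illustration
X = np.random.poisson(1.0, size=(5000, 142)).astype(np.float32)

model = CEBRA(
    model_architecture='offset10-model',
    output_dimension=3,   # embed into 3 dimensions
    batch_size=512,
    max_iterations=1000,
)
model.fit(X)                    # unsupervised (time-contrastive) training
embedding = model.transform(X)  # shape (5000, 3)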

Let’s take a look at

Dandiset 000140 – MC_Maze_Small: macaque primary motor and dorsal premotor cortex spiking activity during delayed reaching

Open the one session in Neurosift

image

We’ve got a trials table (100 trials), three SpatialSeries objects (cursor_pos, eye_pos, hand_pos), and 142 neural Units.
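
If you’d rather explore this session programmatically, one option is to stream it with the remfile and pynwb packages. The asset URL below is a placeholder; copy the real download URL from Neurosift or the DANDI API.

import h5py
import pynwb
import remfile

# Placeholder: copy the real asset download URL from DANDI/Neurosift
url = 'https://api.dandiarchive.org/api/assets/<asset-id>/download/'

rem = remfile.File(url)  # lazy HTTP reads of the remote HDF5 file
h5f = h5py.File(rem, 'r')
with pynwb.NWBHDF5IO(file=h5f, mode='r') as io:
    nwbfile = io.read()
    print(nwbfile.trials.colnames)  # columns of the trials table
    print(len(nwbfile.units), 'units')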

Click on the “Units” link and then the “CEBRA” tab. Here you can queue up a job to compute a CEBRA embedding for the Units.

image

This produces a new NWB file with the CEBRA embedding added as a new TimeSeries object. Click on the “View output in Neurosift” link to view the output file.

image

Notice there is a new object at processing/CEBRA/embedding.

Tick the checkboxes for “trials” and “embedding” and then click “View 2 items” in the left panel to get a synchronized view of the trials and the CEBRA embedding.

image

You can see that the embedding has periodic structure that matches the trial structure! This is significant because in this case we did not provide the trial structure or the behavioral data to the CEBRA process. It was able to infer the trial structure from the neural data alone.

LINDI - output NWB files contain references to the input NWB files

In that last CEBRA example, the input was an NWB file from DANDI, and the output was a new NWB file containing all the information and data from the input file plus the CEBRA embedding. This was a relatively small file, but what happens when the input file is very large (e.g., contains raw electrophysiology data)? That’s where LINDI comes in.

Read more about LINDI here.

You can use the lindi Python package to read .lindi.json and .lindi.tar files as though they were HDF5 files, and you can even use pynwb!

For example, to load that embedding object in Python, copy the following into a Jupyter notebook (e.g., on Dandihub):

import lindi

url = 'https://tempory.net/f/dendro/f/hello_world_service/hello_cebra/cebra_nwb_embedding_6/ujOk88BJmLM1zjGH4Xwr/output/output.nwb.lindi.tar'

# Load the remote file
f = lindi.LindiH5pyFile.from_lindi_file(url)

# load the neurodata object
X = f['/processing/CEBRA/embedding']

starting_time = X['starting_time'][()]
rate = X['starting_time'].attrs['rate']
data = X['data']

print(f'starting_time: {starting_time}')
print(f'rate: {rate}')
print(f'data shape: {data.shape}')
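
Since the embedding is stored as a regularly sampled TimeSeries, its time axis is implicit. Continuing from the snippet above, you can reconstruct it from starting_time and rate:

import numpy as np

# Reconstruct the implicit time axis of the regularly sampled series
timestamps = starting_time + np.arange(data.shape[0]) / rate
print(f'first/last timestamps: {timestamps[0]:.3f} / {timestamps[-1]:.3f} seconds')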

You can then read the same file using pynwb:

import pynwb

io = pynwb.NWBHDF5IO(file=f, mode='r')
nwbfile = io.read()
embedding = nwbfile.processing['CEBRA']['embedding']
print(embedding)

That’s a remote .nwb.lindi.tar file that has embedded references to the remote HDF5 .nwb file on DANDI, and we are able to load it as though it were a local .nwb file!

Viewing DANDI .avi video files

DANDI supports uploading of .avi files, but currently there is no way to preview/stream those files in the browser. Neurosift provides a workaround by using Dendro to precompute .mp4 files associated with portions of those .avi files. Here is an example.

image

“Hello world” Dendro examples

Note to self: determine at the time of the workshop whether we will have time to work through these examples.

Here’s an introduction on submitting simple “hello world” jobs and pipelines to Dendro.

For creating your own containerized Dendro apps, check out these examples.

Example: Dandiset 000363

000363: Mesoscale Activity Map Dataset

View Dandiset in Neurosift

Let’s explore this example session: sub-440956/sub-440956_ses-20190208T133600_behavior+ecephys+ogen.nwb

Behavior timeseries from DeepLabCut: Jaw tracking, Nose tracking, and Tongue tracking:

image

image

Table of trials:

image

1735 Units:

image

Task from Vincent Prevosto:

We created a special Dendro function to perform these tasks. Here is the source code.

We then submitted this custom job using the following Python script (you need to set your DENDRO_API_KEY environment variable as in the hello world example above):

from dendro.client import submit_job, DendroJobDefinition, DendroJobRequiredResources, DendroJobInputFile, DendroJobOutputFile, DendroJobParameter

# https://neurosift.app/?p=/nwb&url=https://api.dandiarchive.org/api/assets/0eab806c-c5c3-4d01-bd7c-15e328a7e923/download/&dandisetId=000363&dandisetVersion=draft
input_url = 'https://api.dandiarchive.org/api/assets/0eab806c-c5c3-4d01-bd7c-15e328a7e923/download/'

service_name = 'hello_world_service'
app_name = 'hello_neurosift'
processor_name = 'tuning_analysis_000363'
job_definition = DendroJobDefinition(
    appName=app_name,
    processorName=processor_name,
    inputFiles=[
        DendroJobInputFile(
            name='input',
            url=input_url,
            fileBaseName='input.nwb'
        )
    ],
    outputFiles=[
        DendroJobOutputFile(
            name='output',
            fileBaseName='output.nwb.lindi.tar'
        )
    ],
    parameters=[
        DendroJobParameter(
            name='units_path',
            value='/units'
        ),
        DendroJobParameter(
            name='behavior_paths',
            value=[
                '/acquisition/BehavioralTimeSeries/Camera0_side_JawTracking',
                '/acquisition/BehavioralTimeSeries/Camera0_side_NoseTracking',
                '/acquisition/BehavioralTimeSeries/Camera0_side_TongueTracking'
            ]
        ),
        DendroJobParameter(
            name='behavior_dimensions',
            value=[
                1,
                1,
                1
            ]
        ),
        DendroJobParameter(
            name='behavior_output_prefixes',
            value=[
                'jaw',
                'nose',
                'tongue'
            ]
        )
    ]
)
required_resources = DendroJobRequiredResources(
    numCpus=4,
    numGpus=0,
    memoryGb=4,
    timeSec=60 * 50
)

job = submit_job(
    service_name=service_name,
    job_definition=job_definition,
    required_resources=required_resources,
    target_compute_client_ids=None,
    tags=[],
    skip_cache=False,
    rerun_failing=True,
    delete_failing=True
)

print(job.job_url, job.status)

To see the results, go to the Dendro tab in our example.

image

Click on the “output” for the tuning_analysis_000363 job.

It’s the same NWB file but with some additional objects.

image

Tick the two checkboxes shown in the screenshot, then “View 2 items”, and you’ll be able to see the computed phase compared with the position of the jaw.

image

If you open the Units table and scroll to the right, you’ll see new columns for the tuned phase, including the p-values.

image

Next steps

Spike sorting

Spike sorting is CHALLENGING due to several factors:

Despite these challenges, we are still excited to tackle this problem with Dendro! (It’s actually our motivating example.)

Currently in the early proof-of-concept stage, we are looking for labs willing to test it as we continue to refine and develop its capabilities.

If we succeed, Neurosift/Dendro will:

Spike sorting example 000463

Let’s head over to Dandiset 000463 - Electrophysiological Recordings in Anesthetized Rats in the Primary Somatosensory Cortex with Phased Ultrasound Array Stimulation

Open the first session in Neurosift.

We’ve got an ElectricalSeries with 32 channels for 9 minutes. Good to start small.

image

Click to open this ElectricalSeries, set number of visible channels to 32, increase the spacing between the channels, and zoom in to see the traces.

image

Next head over to the “Ephys Summary” tab. It looks like a Dendro job has already been executed there, so you should see estimated firing rates and power spectra for the electrodes. This will give an idea of which channels to include in the sorting.

image
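
For a rough idea of what the power-spectrum part of that summary involves, here is a hedged scipy sketch with synthetic data; the actual Ephys Summary processor has its own implementation and parameters.

import numpy as np
from scipy.signal import welch

# Synthetic stand-in for one channel of the ElectricalSeries
fs = 30000.0  # sampling rate in Hz (illustrative)
x = np.random.randn(int(fs * 10))  # 10 seconds of fake traces

# Welch estimate of the power spectral density for this channel
freqs, psd = welch(x, fs=fs, nperseg=4096)
print(freqs.shape, psd.shape)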

Now click on the “Spike Sorting (WIP)” tab. There are three steps in the spike sorting pipeline:

image

The interface is a bit difficult to navigate at this point, but it will improve over time.

Drill down to one of the post processing results and click “View output in Neurosift”.

You will see a new units table in processing/ecephys/units_mountainsort5. This has the spike trains, quality metrics, and other data needed to visualize autocorrelograms and average waveforms.

image
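
If you want the spike trains and quality metrics programmatically rather than through the viewer, the same lindi pattern from earlier works on this output file too. The URL below is a placeholder; copy the real output URL from the job page in Dendro.

import lindi
import pynwb

# Placeholder: copy the actual output URL from the Dendro job page
url = 'https://.../output.nwb.lindi.tar'

f = lindi.LindiH5pyFile.from_lindi_file(url)
with pynwb.NWBHDF5IO(file=f, mode='r') as io:
    nwbfile = io.read()
    units = nwbfile.processing['ecephys']['units_mountainsort5']
    print(units.colnames)           # includes the quality metrics
    print(units['spike_times'][0])  # spike train of the first unit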

In this particular example, MountainSort 5 only finds two units, and Kilosort 4 crashes (I think it can’t handle fewer than 64 electrodes). As mentioned, spike sorting is a challenging business!

Spike sorting with neuropixels: Dandiset 000409

Let’s head over to Dandiset 000409 - IBL - Brain Wide Map

Navigate to sub-CSHL045 and the first session.

Click on Acquisition/ElectricalSeriesAp and go to the “Spike Sorting (WIP)” tab.

I selected a 20 minute segment and 64 channels. This time Kilosort 4 worked and found 85 units!

Click on the spike_sorting_post_processing for kilosort4 and then click “View output in Neurosift”. Then go to processing/ecephys/units_kilosort4. We can see the autocorrelograms, average waveforms, unit locations, and quality metrics.

image

image

image

image

To see the spike trains overlaid on top of the electrical series, click on the EphysAndUnits link next to acquisition/ElectricalSeriesAp_pre, and then “EphysAndUnits: /processing/ecephys/units_kilosort4”.

image

Increase the number of visible channels, increase the channel separation, sort by number of spikes, select some units, and zoom in.

image

So there we have a proof-of-concept for spike sorting with neuropixels data in Neurosift/Dendro!