neurosift-blog

October 2024


Neurosift AI agent

2024-10-31

I have made a preliminary version of the Neurosift AI agent available for public use. I have intentionally limited its scope to a few well-defined capabilities so that it is less likely to make up incorrect information. I also instructed it to be scientific and concise, so hopefully it’s not annoying. :)

It can probe the DANDI Archive in a variety of ways: lexical search, semantic search, search for neurodata types, search for column names in Units tables, inspection of the data structure of individual NWB files, and more. It can also respond with inline interactive plots (although for now this is limited to timeseries alignment plots).

It doesn’t yet know how to write Python code for loading/processing data, but that’s coming next.

Feel free to try it out!

Or check out some example saved chats.

Experimental AI Agent for Finding Relevant Datasets on DANDI

2024-10-25

Today, I implemented an experimental AI agent capable of locating and describing relevant datasets on DANDI. This agent collaborates with “helper assistants” designed specifically to retrieve precise information without cluttering the main chat context. Here’s how it works:

To start, the main agent runs a semantic search, comparing an embedding of the user’s query with a database of precomputed embeddings for public Dandisets (as described below). Once it finds relevant datasets, it consults a helper assistant for more detail. The helper assistant directly accesses the DANDI API and extracts the necessary metadata—like authors, dataset size, asset counts, or experimental details—then condenses this information into a concise summary. This keeps the main conversation streamlined, even when multiple Dandisets are being summarized. Importantly, the potentially lengthy response from the DANDI API doesn’t directly get included in the context.
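
To make that concrete, here is a minimal sketch of such a helper in Python, assuming OpenAI’s chat completions API. The function name and model choice are illustrative, and the exact DANDI endpoints and prompts used here are not spelled out in this post.

# Helper-assistant pattern: fetch the (potentially large) DANDI API
# response, condense it with a separate LLM call, and hand only the
# short summary back to the main agent's context.
import json
import urllib.request
from openai import OpenAI

client = OpenAI()

def summarize_dandiset(dandiset_id: str) -> str:
    # The raw metadata never enters the main chat context
    url = f'https://api.dandiarchive.org/api/dandisets/{dandiset_id}/'
    with urllib.request.urlopen(url) as r:
        raw = json.load(r)
    resp = client.chat.completions.create(
        model='gpt-4o-mini',  # illustrative model choice
        messages=[
            {'role': 'system', 'content': 'Condense this Dandiset metadata into 2-3 sentences: authors, dataset size, asset counts, experimental details.'},
            {'role': 'user', 'content': json.dumps(raw)},
        ],
    )
    return resp.choices[0].message.content  # only the summary is returned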

Below is a screenshot of an example session. In it, I asked the agent: “I’m looking for dandisets that will help me test out spike sorting algorithms.” The left panel shows sub-discussions with the helper assistants, and in the main window, we see a clean, summarized response.

This is still in the early stages, so there’s no public link yet. If interested, reach out for access.

[screenshot: example session with helper-assistant sub-discussions]

Support for iEEG Data in EDF Files on OpenNeuro

2024-10-22

Neurosift now lets you visualize iEEG data stored in EDF files, commonly found on OpenNeuro. European Data Format (EDF) is a standard format for storing biological timeseries data such as EEG.
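
If you want to poke at one of these files locally, one option (among several; this is not necessarily what Neurosift uses internally) is MNE. A quick sketch with a hypothetical filename:

# Read an iEEG EDF file with MNE (filename is hypothetical)
import mne

raw = mne.io.read_raw_edf('sub-01_ses-01_ieeg.edf', preload=False)
print(raw.ch_names)       # channel labels
print(raw.info['sfreq'])  # sampling frequency in Hz
raw.plot(duration=10.0)   # browse the signals in a 10-second window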

Try it out:

Here’s an example.

The hope is to integrate Neurosift directly into OpenNeuro for easier access.

[screenshot: EDF iEEG visualization]

Update 10/23/2024: There is now a left panel where you can view the channel labels and sampling frequencies and select/deselect channels for viewing.

Preview TwoPhotonSeries as Videos

2024-10-18

Neurosift now lets you preview TwoPhotonSeries data as videos, making it easier to assess data quality, detect motion artifacts or corruption, or simply explore the dataset. Since TwoPhotonSeries arrays in NWB files are often large and not optimized for streaming, loading can be slow. But with this Dendro-enabled view, you can quickly preview a section of video to get an overview.

How it works:

When you open a TwoPhotonSeries object, you’ll see a “Movie” tab where you can watch the video preview. If the Dendro job has already run, the video loads immediately. Otherwise, you can submit a new job, provided you have the necessary API key. The default preview duration is one minute, but you can adjust it for longer previews, though this will take more time to process.
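
For reference, here is roughly what pulling a short preview segment looks like in plain Python. The actual Dendro processor renders a proper video file; the URL and acquisition name below are placeholders.

# Stream a short segment of a TwoPhotonSeries and animate it locally.
# Only the requested slice of the array is downloaded.
import lindi
import pynwb
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

url = 'https://lindi.neurosift.org/dandi/dandisets/000871/assets/<asset-id>/nwb.lindi.json'  # placeholder
f = lindi.LindiH5pyFile.from_lindi_file(url)
io = pynwb.NWBHDF5IO(file=f, mode='r')
nwbfile = io.read()

tps = nwbfile.acquisition['TwoPhotonSeries']  # object name varies by file
rate = tps.rate if tps.rate is not None else 30.0  # fall back if timestamps are used
num_frames = int(60 * rate)  # ~one minute, matching the default preview duration
frames = tps.data[:num_frames]

fig, ax = plt.subplots()
im = ax.imshow(frames[0], cmap='gray')
anim = FuncAnimation(fig, lambda i: im.set_data(frames[i]), frames=len(frames), interval=1000 / rate)
plt.show()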

Here’s an example from Dandiset 000871 (click the Movie tab).

[screenshot: TwoPhotonSeries Movie tab]

Semantic Search for Dandisets by Abstract

2024-10-18

Neurosift has a feature that lets you do a semantic search for relevant Dandisets by pasting in a scientific abstract (click on “Search by abstract”). This is powered by a database of embeddings of Dandiset summaries that were automatically generated here. These summaries were generated by an LLM, using prompts based on the content of the NWB files in each Dandiset along with the Dandiset’s metadata. You can check out the repo to see all the generated prompts and summaries for many of the public datasets. Right now this is just a snapshot in time, so unfortunately the search results do not include the most recent Dandisets.
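
Under the hood this amounts to embedding the pasted abstract and ranking Dandisets by cosine similarity against the precomputed summary embeddings. A minimal sketch, assuming OpenAI embeddings (the actual embedding model used is not stated here):

# Rank Dandisets by cosine similarity between the abstract's embedding
# and precomputed embeddings of the Dandiset summaries.
import numpy as np
from openai import OpenAI

client = OpenAI()

def search_by_abstract(abstract, summary_embeddings, top_k=10):
    # summary_embeddings: dict mapping dandiset_id -> 1D numpy array
    resp = client.embeddings.create(model='text-embedding-3-small', input=abstract)
    q = np.array(resp.data[0].embedding)
    q = q / np.linalg.norm(q)
    scores = {
        did: float(np.dot(q, e / np.linalg.norm(e)))
        for did, e in summary_embeddings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]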

[screenshot: search by abstract]

Multiscale Spike Density and Rastermap Integration

2024-10-17

Today, I came across a new paper in Nature Neuroscience that presents an exciting technique, which I decided to incorporate into Neurosift/Dendro (thanks Ben). Rastermap is a method for visualizing large-scale neural recordings by sorting neurons based on their activity patterns. To build this into Neurosift, I added two new Dendro-enabled features.

Multiscale Spike Density Plot. The first new Dendro processor I implemented is a multiscale spike density plot. This plot downscales the spike train data (from an NWB units table) at multiple temporal resolutions, so users can zoom all the way out and still have a responsive interface. By preparing the multiscale spike density array via a Dendro job, Neurosift now handles large spike train datasets much more efficiently. Here’s the source code.
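
Stripped of the Dendro plumbing, the core idea is just repeated pairwise summing of binned spike counts. A minimal sketch (see the linked source for the real processor):

# Bin spike counts at a fine resolution, then repeatedly downscale by
# factors of 2 so the client can pick the level matching the zoom.
import numpy as np

def multiscale_spike_density(spike_times_per_unit, bin_size_s=0.01, num_levels=8):
    t_max = max(st[-1] for st in spike_times_per_unit if len(st) > 0)
    num_bins = int(np.ceil(t_max / bin_size_s))
    base = np.zeros((len(spike_times_per_unit), num_bins), dtype=np.float32)
    for i, st in enumerate(spike_times_per_unit):
        counts, _ = np.histogram(st, bins=num_bins, range=(0, num_bins * bin_size_s))
        base[i] = counts
    levels = [base]
    for _ in range(num_levels - 1):
        x = levels[-1]
        if x.shape[1] % 2 == 1:  # pad to an even number of bins
            x = np.pad(x, ((0, 0), (0, 1)))
        levels.append(x[:, 0::2] + x[:, 1::2])  # sum adjacent bins
    return levels  # levels[k] has bin size bin_size_s * 2**k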

Rastermap Integration. The second feature is the integration of the Rastermap method. I added a new Dendro processor that computes the Rastermap ordering of the units, allowing for a more structured view of neural activity. Once the job is complete, Neurosift users can click a checkbox to display the spike density map using this Rastermap ordering, providing a refined visualization of unit activity based on their temporal relationships. Here’s the source code.
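
Computing the ordering itself takes only a few lines with the rastermap package (parameters here are the package defaults, not necessarily what the Dendro processor uses):

# Compute a Rastermap ordering of units from a spike-count matrix
import numpy as np
from rastermap import Rastermap

spike_counts = np.load('spike_counts.npy')  # (num_units, num_bins); hypothetical file

model = Rastermap().fit(spike_counts)
unit_order = model.isort  # permutation of unit indices
ordered = spike_counts[unit_order]  # rows reordered for display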

I tried this out on a random Neuropixels example on Dandiset 000409 (IBL - Brain Wide Map). Go to this random example, wait for the spike density map to load, and then click the “Use rastermap order” checkbox.

Here are screenshots without (top) and with (bottom) the rastermap order:

[screenshot: without rastermap order]

[screenshot: with rastermap order]

This can now be applied to any DANDI NWB file with a units table. You just need a Dendro API key with permission to run jobs.

Speedups in time series rendering

2024-10-16

Today, I made improvements to how time series data are rendered in Neurosift. Previously, users had to zoom in before they could see the data clearly; this saved computational resources, but it wasn’t the most user-friendly experience. The new approach uses adaptive downsampling, sampling minimum and maximum values based on the zoom level and the width of the window, so data can now be displayed across a much wider range of zoom levels.
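
The technique itself is simple. Here is the idea in a few lines of Python (Neurosift’s implementation runs client-side in the browser, so this is only illustrative):

# Min/max downsampling: keep one (min, max) pair per output pixel column,
# so narrow spikes survive no matter how far you zoom out.
import numpy as np

def minmax_downsample(y, target_width):
    factor = max(1, len(y) // target_width)
    m = (len(y) // factor) * factor
    blocks = y[:m].reshape(-1, factor)
    return blocks.min(axis=1), blocks.max(axis=1)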

The downsampling occurs entirely on the client side, so while the rendering performance has been improved, the same amount of data is downloaded to the browser as before. Optimizing the amount of data transferred is a more complex problem that involves preparing multiscale downsampled data ahead of time, which I’m actively working on. However, that will take more time to implement.

These enhancements should make the initial data viewing smoother and faster, especially for those working with large time series datasets.

[screenshot: time series rendering]

Conditional loading of documentation using AI function calls

2024-10-15

The Neurosift AI assistant can conditionally load documentation resources as needed based on the user’s prompt. Here are the instructions that the assistant receives at this point:

The following external resources are available using the "load_external_resource" tool - you should utilize these when appropriate.
...

* Create 2D tuning curves using Pynapple: https://github.com/magland/dandiset-notes/blob/main/dandisets/000582/000582.ipynb
* Load data objects from an NWB file using Pynapple: ./pynapple-docs.md
* Run spike sorting on data from this NWB file: ./spike-sorting-docs.md
* Iterate through the NWB files in a Dandiset: https://github.com/magland/dandiset-notes/blob/main/howto/dandi/iterate_through_nwb_files_in_dandiset.ipynb

The assistant/agent can call the load_external_resource tool to load the resource into the context. Typically tool calls or function calls are used to retrieve records from a database or to do a calculation, so I wasn’t sure if it would work for loading a document into the context. But based on preliminary testing with the gpt-4o model, it seems to work!
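
For reference, here is roughly what this looks like with the OpenAI chat completions API. The tool schema and handler below are a sketch, not Neurosift’s actual code.

# Register load_external_resource as a tool and feed the fetched document
# back into the context as the tool result.
import json
import urllib.request
from openai import OpenAI

client = OpenAI()

tools = [{
    'type': 'function',
    'function': {
        'name': 'load_external_resource',
        'description': 'Load an external documentation resource into the context.',
        'parameters': {
            'type': 'object',
            'properties': {'url': {'type': 'string'}},
            'required': ['url'],
        },
    },
}]

messages = [
    {'role': 'system', 'content': '...instructions listing the available resources...'},
    {'role': 'user', 'content': 'How do I iterate through NWB files on Dandiset 000888?'},
]

resp = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools)
msg = resp.choices[0].message
if msg.tool_calls:
    messages.append(msg)
    for call in msg.tool_calls:
        url = json.loads(call.function.arguments)['url']
        with urllib.request.urlopen(url) as r:
            doc = r.read().decode()
        messages.append({'role': 'tool', 'tool_call_id': call.id, 'content': doc})
    # Second call: the model now answers with the document in its context
    resp = client.chat.completions.create(model='gpt-4o', messages=messages, tools=tools)
print(resp.choices[0].message.content)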

For example, on the main Neurosift page you can open the chat and ask "how to iterate through NWB files on Dandiset 000888?". You’ll then see the assistant respond with "load_external_resource https://github.com/magland/dandiset-notes/blob/main/howto/dandi/iterate_through_nwb_files_in_dandiset.ipynb", and then respond with the script based on what it learned from the content of the notebook.

From my preliminary experiments, other LLM models don’t do as well with this type of thing.

Pynapple 2D tuning curves

2024-10-14

Expanding on the previous post, I wanted to give the AI agent knowledge about creating 2D tuning curves using Pynapple (this was suggested by Ben). So I adapted this tutorial notebook into this notebook, which is meant to be ingested by the AI agent. As described in the previous post, I then added an annotation to Dandiset 000582 on Neurosift pointing to this notebook. Now when you open any NWB file from 000582 (for example this one), the AI agent will give you the option of including it as a resource in the context, and you can ask something like: “Plot 2D tuning curves”. There was an issue with the number of rows in the figure the first time I tried this, so I had to follow up with “You should take into consideration the number of units when you are generating the plot figure.” The agent then adjusted the code accordingly and produced the following script:

# Import necessary libraries
import pynwb
import lindi
import matplotlib.pyplot as plt
import pynapple as nap
import seaborn as sns

# Define the URL for the NWB file
url = 'https://lindi.neurosift.org/dandi/dandisets/000582/assets/834ac48a-9bdf-4e79-a028-cbbf3136b01e/nwb.lindi.json'

# Load the NWB file
f = lindi.LindiH5pyFile.from_lindi_file(url)
io = pynwb.NWBHDF5IO(file=f, mode='r')
nwbfile = io.read()

# Load the NWB file with Pynapple
nwbp = nap.NWBFile(nwbfile)

# Access the units and position data
units = nwbp["units"]  # TsGroup
position = nwbp["SpatialSeriesLED1"]  # 2D position data

# Compute 2D tuning curves
tc, binsxy = nap.compute_2d_tuning_curves(units, position, 20)

# Adjust plot size based on the number of units
num_units = len(units)
cols = 4
rows = (num_units + cols - 1) // cols  # Calculate rows needed
plt.figure(figsize=(15, 4 * rows))

# Styles for plotting
custom_params = {"axes.spines.right": False, "axes.spines.top": False}
sns.set_theme(style="ticks", palette="colorblind", font_scale=1.5, rc=custom_params)

# Plot 2D tuning curves
for i, (unit_id, tc_data) in enumerate(tc.items()):
    plt.subplot(rows, cols, i + 1)
    plt.imshow(tc_data, origin="lower", aspect="auto")
    plt.title(f"Unit {unit_id}")
plt.tight_layout()
plt.show()

# Close the file
io.close()

And upon execution you get a figure like the one on the Pynapple docs.

Of course there are other Dandisets that could benefit from this resource, and in the future it would be nice if we didn’t need to manually attach the notebook annotation to each Dandiset. Instead we would want to include this as a general resource that the agent could access when needed.

AI-Driven Exploration of NWB Files: A Hands-on Experience with Dandiset 001037

2024-10-11

The last post focused on the Neurosift AI assistant in the context of Dandisets and DANDI as a whole. Today I extended the functionality of Neurosift’s AI assistant to provide a more hands-on experience with individual NWB files within a Dandiset, especially for generating Python code for exploration and analysis, leveraging metadata from the NWB file, the surrounding Dandiset, and external resources.

Experimenting with 001037

I started by experimenting with Dandiset 001037, aiming to reproduce a figure from the associated paper, “Causal evidence of a line attractor encoding an affective state”. While the assistant could load and explore the structure of the NWB file via pynwb, I quickly realized that it needed to be guided through the process. This ultimately resulted in a Jupyter notebook that was able to reproduce Figure 1d of the paper. I uploaded it to GitHub.

Then I realized it would be great to provide this specific notebook as a resource for my future interactions with the assistant, as well as for others coming across the same dataset. I used Neurosift’s existing annotations feature to attach the URL of the notebook to the Dandiset. Then I added functionality to the assistant to detect linked resources and offer to include them in the chat context for each session (for me or others).

Now in my chat (after selecting the notebook resource) I can just write “produce figure 1d for this NWB file” and it writes the code for me in a tidy script. Of course, it’s just reproducing what I did in the notebook. However, the usefulness became apparent when I tried it with a different NWB file in the same Dandiset. Again, I selected the notebook in the assistant’s context and prompted it to “produce figure 1d for this NWB file.” The assistant analyzed the structure of the new NWB file and adjusted the code accordingly, tweaking the necessary names and parameters, and successfully generated code for producing the corresponding figure for the new file.

Try it out

  1. Go to Dandiset 001037.
  2. Open one of the NWB files.
  3. Click the chat icon in the left panel and select the notebook resource.
  4. Ask the assistant to “produce figure 1d for this NWB file.”
  5. Copy the generated code and run it in your Python environment.
  6. Repeat the process with another NWB file in the same Dandiset.

[screenshot: chat session with the notebook resource selected]

The figure for the first NWB file (from the paper): [image]

The figure for the second NWB file (not in the actual paper): [image]

Implications: Of course the question is: why does the second figure look weird? Could it be a problem with the AI-generated code? Is something odd with the data? Or is more context and information needed? These are valuable questions, and ones we never would have known to ask without this type of exploration. The assistant is not a replacement for a human, but it can be a powerful tool for guiding exploration.

Next Steps: Try this type of thing with other Dandisets.

AI assistant developments

2024-10-10

There is now a chat assistant side panel on the main DANDI exploration page. You can ask about DANDI, Neurosift, or how to search for data. It also has the ability to do a lexical search on its own, so you can ask it for data on a specific topic and it will return relevant Dandisets.
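
The lexical search presumably maps onto the DANDI REST API’s search parameter. Here is a sketch of what such a tool call could do (the endpoint is real; whether Neurosift calls it exactly this way is an assumption, and the result field layout may vary):

# Lexical search for Dandisets via the DANDI REST API
import json
import urllib.parse
import urllib.request

def lexical_search(query):
    url = 'https://api.dandiarchive.org/api/dandisets/?search=' + urllib.parse.quote(query)
    with urllib.request.urlopen(url) as r:
        page = json.load(r)
    results = []
    for d in page['results']:
        version = d.get('most_recent_published_version') or d.get('draft_version') or {}
        results.append((d['identifier'], version.get('name', '')))
    return results

for identifier, name in lexical_search('spike sorting'):
    print(identifier, name)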

A more interesting new assistant is on the Dandiset page. As an agent, it can autonomously retrieve detailed information about the Dandiset, such as its metadata and the neurodata types found in its NWB files.

So, for example, you can ask it “What are the neurodata types in this Dandiset?” and it will do the necessary research. Try it out on Dandiset 000472. You can even follow up with questions about the neurodata objects, as shown in the screenshot.

[screenshot: Dandiset page assistant]

Next steps: The next step is to develop the assistant for the single NWB file view. It should be aware of the data objects, available visualizations, and available Dendro analyses. It should also be able to generate Python code for loading the data objects and creating basic figures using matplotlib. Ultimately, with Dendro integration, it should be able to semi-automatically create and execute analysis pipelines.

Future: Ultimately, we would like to be able to ask the assistant high level questions, and have it do research across all public datasets on DANDI.