Inspiration

Neuroscientists spend weeks manually analyzing neural recordings to identify functional cell types - a tedious, subjective process that slows discovery. We were inspired by the challenge of turning thousands of raw spike trains into interpretable insights about how the brain controls movement. What if AI could automatically discover distinct neuron populations based on their firing patterns during behavior?

What it does

Our tool automatically classifies neurons into functional groups based on their activity during motor tasks. It:

  • Loads neural firing rate data from behavioral experiments (leading vs. trailing limb movements)
  • Applies unsupervised machine learning (hierarchical clustering + K-means) to discover natural groupings
  • Generates publication-ready visualizations: dendrograms, heatmaps, PCA plots, and schematic diagrams
  • Identifies neurons with distinct response profiles (e.g., "early LEAD preference", "sustained TRAIL response")

Input: MATLAB files with neural firing rates × time
Output: Clustered cell types with interpretable functional signatures
Speed: What took days now takes <5 minutes
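
One way to turn a cluster's average response into a label like the ones quoted above is a simple peak-based heuristic. The sketch below is illustrative only: `describe_cluster`, the "first half = early" rule, and the synthetic traces are assumptions, not the naming scheme the project actually used.

```python
import numpy as np

def describe_cluster(mean_lead, mean_trail):
    """Label a cluster by which condition has the larger peak and when it occurs.

    Hypothetical heuristic: 'early' means the peak falls in the first half
    of the trial window, 'late' means the second half.
    """
    mean_lead = np.asarray(mean_lead, dtype=float)
    mean_trail = np.asarray(mean_trail, dtype=float)
    if mean_lead.max() >= mean_trail.max():
        condition, trace = "LEAD", mean_lead
    else:
        condition, trace = "TRAIL", mean_trail
    peak_bin = int(np.argmax(trace))
    timing = "early" if peak_bin < trace.size / 2 else "late"
    return f"{timing} {condition} preference"

# Synthetic cluster averages: Gaussian bumps over 100 time bins.
t = np.arange(100)

def bump(center, amplitude):
    return amplitude * np.exp(-0.5 * ((t - center) / 5.0) ** 2)

label_a = describe_cluster(bump(20, 2.0), bump(20, 1.0))
label_b = describe_cluster(bump(80, 1.0), bump(80, 2.0))
print(label_a)  # early LEAD preference
print(label_b)  # late TRAIL preference
```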

How we built it

Tech Stack:

  • Python (scipy, scikit-learn, matplotlib, seaborn)
  • K-means & hierarchical clustering with silhouette score optimization
  • Z-score normalization to handle neurons with different baseline firing rates
  • PCA for dimensionality reduction and visualization

Workflow:

  1. Load MATLAB .mat files containing neural activity during LEAD/TRAIL forelimb movements
  2. Concatenate firing rate time series (201 neurons × 100 timepoints × 2 conditions)
  3. Normalize and cluster using Ward's linkage + K-means (K=3-6)
  4. Validate with silhouette scores to find optimal cluster count
  5. Generate interpretable visualizations showing each cluster's functional signature
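
The workflow above can be sketched roughly as follows. This is a minimal sketch on synthetic Poisson data, not the project's actual code: the variable names, the random data, and the way the two clustering methods are combined (K-means scored by silhouette to pick K, then a Ward tree cut at that K) are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the recordings: 201 neurons x 100 time bins per condition.
rng = np.random.default_rng(0)
n_neurons, n_time = 201, 100
lead = rng.poisson(5.0, size=(n_neurons, n_time)).astype(float)
trail = rng.poisson(5.0, size=(n_neurons, n_time)).astype(float)

# Step 2: concatenate the two conditions along time -> (201, 200).
features = np.hstack([lead, trail])

# Step 3a: z-score each neuron across its own time course.
features = zscore(features, axis=1)

# Steps 3b and 4: K-means for K = 3..6, scored by mean silhouette.
scores = {}
for k in range(3, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    scores[k] = silhouette_score(features, km.labels_)
best_k = max(scores, key=scores.get)

# Ward linkage tree (this is what a dendrogram would plot), cut at best_k.
tree = linkage(features, method="ward")
ward_labels = fcluster(tree, t=best_k, criterion="maxclust")

print("silhouette by K:", {k: round(float(s), 3) for k, s in scores.items()})
print("chosen K:", best_k)
```

On purely random data like this the silhouette scores will be near zero; the point is only to show the shape of the pipeline.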

Challenges we ran into

  • MATLAB ↔ Python interop: Navigating nested struct arrays from .mat files required careful indexing (mat_contents['plot_data'][0,0]['field'])
  • Choosing the right features: Should we use raw rates, z-scored rates, temporal derivatives, or peak times? We iterated on feature engineering
  • Interpretability: Clustering worked, but making results meaningful to neuroscientists required domain-specific visualizations (schematic diagrams with annotations)
  • Normalization dilemma: High-firing neurons dominated clustering until we applied per-neuron z-scoring
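
The nested-struct indexing mentioned in the first bullet can be reproduced with a small round trip through `scipy.io`. This is a sketch under assumptions: the field names `lead_rates` and `trail_rates` are hypothetical placeholders, and the struct is written to an in-memory buffer rather than read from the project's real files.

```python
import io

import numpy as np
from scipy.io import loadmat, savemat

# Write a MATLAB-style struct to an in-memory buffer (field names are hypothetical).
buf = io.BytesIO()
savemat(buf, {"plot_data": {"lead_rates": np.ones((5, 100)),
                            "trail_rates": np.zeros((5, 100))}})

# Default loading: a struct comes back as a 1x1 structured array,
# so it has to be unwrapped with [0, 0] before accessing a field.
buf.seek(0)
mat_contents = loadmat(buf)
plot_data = mat_contents["plot_data"]
lead = plot_data[0, 0]["lead_rates"]
print(plot_data.shape, lead.shape)

# Alternative: let loadmat squeeze singleton dimensions and expose
# struct fields as attributes, which avoids the [0, 0] step.
buf.seek(0)
flat = loadmat(buf, squeeze_me=True, struct_as_record=False)
trail = flat["plot_data"].trail_rates
print(trail.shape)
```

The second form is often easier to read for deeply nested structs, at the cost of also squeezing any singleton dimensions in the data itself.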

Accomplishments that we're proud of

  • Fully automated pipeline from raw data → clustered results with zero manual intervention
  • Discovered 4-6 distinct functional cell types in motor cortex data
  • Publication-quality figures ready for scientific papers
  • Generalized solution: Works on any neural dataset with trial-aligned firing rates
  • Fast: 201 neurons clustered in ~30 seconds on a laptop

What we learned

  • Unsupervised ML reveals hidden structure: Even without labels, clustering found biologically meaningful patterns (neurons preferring specific movement phases)
  • Visualization >> metrics: A good dendrogram tells you more than a single silhouette score
  • Domain knowledge matters: Understanding neuroscience (e.g., LEAD/TRAIL = ipsilateral/contralateral limb control) was crucial for interpreting clusters
  • Prompt engineering for AI tools: Effectively using Gemini for code generation required precise data structure descriptions

Technical lessons:

  • Hierarchical clustering with Ward linkage works well for high-dimensional neural data
  • Z-scoring per neuron (not per time bin) preserves temporal dynamics while normalizing magnitude
  • PCA's first 2 components captured 40-60% variance - surprisingly good for 200-dim data
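
The difference between the two normalization axes, and how the explained-variance figure is read off a PCA fit, can be sketched as below. The synthetic rates (a shared sinusoidal profile with random per-neuron gain and baseline) are an assumption for illustration, so the printed variance fraction says nothing about the real dataset.

```python
import numpy as np
from scipy.stats import zscore
from sklearn.decomposition import PCA

# Synthetic rates: every neuron shares one temporal profile but has its
# own baseline and gain, mimicking low- and high-firing cells.
rng = np.random.default_rng(1)
n_neurons, n_features = 201, 200
t = np.linspace(0.0, 1.0, n_features)
profile = np.sin(2 * np.pi * t)
baselines = rng.uniform(0.0, 100.0, size=(n_neurons, 1))
gains = rng.uniform(1.0, 50.0, size=(n_neurons, 1))
rates = baselines + gains * profile + rng.normal(0.0, 1.0, size=(n_neurons, n_features))

# Per-neuron z-scoring (axis=1): each row gets mean 0 and std 1, so the
# shape of the time course is kept and only magnitude is removed.
per_neuron = zscore(rates, axis=1)

# Per-time-bin z-scoring (axis=0): each column is normalized across neurons,
# which rescales every bin independently and distorts each neuron's time course.
per_bin = zscore(rates, axis=0)

# Fraction of variance captured by the first two principal components.
pca = PCA(n_components=2).fit(per_neuron)
explained = pca.explained_variance_ratio_
print("variance explained by PC1+PC2:", round(float(explained.sum()), 3))
```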

What's next for this neural project

  • Real-time classification: Deploy as a web app where researchers upload data and get instant cluster assignments
  • Multi-region analysis: Extend to classify neurons across brain areas (thalamus, cortex, cerebellum)
  • Temporal dynamics: Add
