
neurodatasets Documentation

Welcome

The neurodatasets package provides a curated collection of neuroscience and brain-related datasets for data analysis, statistical modeling, and machine learning research. It includes datasets related to biomedical voice recordings, MRI brain imaging, Alzheimer's and Parkinson's disease biomarkers, epilepsy seizures, ADHD symptoms, dopamine and serotonin measurements, cognitive impairment, sleep studies, PTSD, and migraine treatment.

Examples include biomedical voice features from Parkinson's patients, longitudinal MRI data from demented and nondemented adults, plasma protein biomarkers for Alzheimer's research, speech signal features for Parkinson's classification, hippocampal lesion measurements, gray and white matter patterns from the ENIGMA consortium, and brain cell marker genes for humans and mice, among many others, all drawn from curated R packages on CRAN.

Philosophy

The author's vision is to create specialized dataset packages focused on specific themes and topics. Instead of searching through multiple generic data packages to find relevant datasets, users can go directly to a thematic package where all datasets are carefully curated around a particular subject.

In the case of neurodatasets, every dataset is exclusively focused on neuroscience and brain-related research, making it the go-to resource for researchers, data scientists, statisticians, neurologists, and students working in the fields of neurology, psychiatry, cognitive science, bioinformatics, and machine learning.

Cross-Platform Ecosystem

neurodatasets has a sibling package in the R ecosystem, NeuroDataSets. The two packages are kept consistent, so users can work with the same high-quality datasets whether they prefer Python or R.

This cross-platform approach reflects our commitment to making specialized datasets accessible to the widest possible audience, regardless of their preferred data analysis environment.

Getting Started

Installation

The easiest way to install neurodatasets is directly from PyPI:

pip install neurodatasets

From GitHub (Latest Development Version)

To get the latest development version with the newest features and bug fixes:

pip install git+https://github.com/lightbluetitan/neurodatasets-py
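
To confirm which version pip installed, the standard library's importlib.metadata (Python 3.8+) can be queried. This is a minimal sketch that assumes the distribution name is neurodatasets, as in the install commands above; the helper name installed_version is hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "neurodatasets") -> Optional[str]:
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "neurodatasets is not installed")
```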

Quick Start Tutorial

1. Import the Package

import neurodatasets as nd

2. List Available Datasets

See all datasets included in the package:

# Get list of all datasets
datasets = nd.list_datasets()
print(datasets)
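
The exact return type of nd.list_datasets() is not specified on this page. Assuming it yields dataset names as strings, a small hypothetical helper such as filter_datasets can narrow the list by keyword. The sample names below are taken from the dataset list on this page.

```python
from typing import Iterable, List


def filter_datasets(names: Iterable[str], keyword: str) -> List[str]:
    """Return the names that contain `keyword` (case-insensitive), sorted."""
    key = keyword.lower()
    return sorted(name for name in names if key in name.lower())


# Hypothetical usage with the package (assumes list_datasets() yields strings):
#   import neurodatasets as nd
#   print(filter_datasets(nd.list_datasets(), "brain"))

sample = ["chimp_brains", "mammal_brains", "brain_size_iq", "sleep_study_college"]
print(filter_datasets(sample, "brain"))  # ['brain_size_iq', 'chimp_brains', 'mammal_brains']
```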

3. Load a Dataset

Load any dataset as a pandas DataFrame:

# Load the cocaine_dopamine dataset
df = nd.load_dataset('cocaine_dopamine')

# Display first rows
print(df.head())

# Check dataset dimensions
print(f"Shape: {df.shape}")
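
A mistyped dataset name can be caught before loading. The sketch below is hypothetical: load_checked is not part of neurodatasets. It accepts any loader callable (for example nd.load_dataset) plus the available names, and uses the standard library's difflib.get_close_matches to suggest near misses.

```python
import difflib
from typing import Any, Callable, Iterable


def load_checked(loader: Callable[[str], Any], name: str, available: Iterable[str]) -> Any:
    """Call loader(name) only if name is known; otherwise raise with suggestions."""
    names = list(available)
    if name not in names:
        # Suggest up to three similar names using difflib's similarity ratio.
        hints = difflib.get_close_matches(name, names, n=3)
        hint_text = f" Did you mean: {', '.join(hints)}?" if hints else ""
        raise ValueError(f"Unknown dataset {name!r}.{hint_text}")
    return loader(name)


# Hypothetical usage with the package:
#   import neurodatasets as nd
#   df = load_checked(nd.load_dataset, "cocaine_dopamin", nd.list_datasets())
```

Taking the loader as a parameter keeps the check independent of the package, so it can be tried with a stand-in function.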

4. Describe a Dataset

Print a dataset's description:

# Describe a dataset
print(nd.describe("chimp_brains"))
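
Assuming nd.describe returns the description as a string (the example above prints its result), a hypothetical helper can build a one-line catalog for several datasets. first_lines is not part of the package; it accepts any describe-like callable, so it can be tried without neurodatasets installed.

```python
from typing import Callable, Dict, Iterable


def first_lines(describe: Callable[[str], str], names: Iterable[str], width: int = 60) -> Dict[str, str]:
    """Map each dataset name to the first non-empty line of its description, truncated to `width`."""
    catalog: Dict[str, str] = {}
    for name in names:
        text = str(describe(name))
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        first = lines[0] if lines else ""
        # Truncate long lines and mark the cut with an ellipsis.
        catalog[name] = first if len(first) <= width else first[: width - 3] + "..."
    return catalog


# Hypothetical usage with the package:
#   import neurodatasets as nd
#   for name, line in first_lines(nd.describe, ["chimp_brains", "mammal_brains"]).items():
#       print(f"{name}: {line}")
```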

Basic Concepts

Dataset Naming Convention

All dataset names in neurodatasets follow a consistent naming pattern:

  • Lowercase with underscores: chimp_brains
  • Descriptive names that reflect content
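
The naming rule above can be expressed as a regular expression. This is an illustrative check, not a function shipped with neurodatasets; it assumes names use only lowercase letters, digits, and single underscores between words. to_dataset_name is likewise hypothetical.

```python
import re

# Lowercase words (letters/digits) joined by single underscores, e.g. chimp_brains.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_valid_name(name: str) -> bool:
    """Return True if name follows the lowercase_with_underscores convention."""
    return NAME_PATTERN.fullmatch(name) is not None


def to_dataset_name(text: str) -> str:
    """Normalize free text such as 'Chimp Brains' to 'chimp_brains'."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "_".join(words)


print(is_valid_name("chimp_brains"))    # True
print(is_valid_name("ChimpBrains"))     # False
print(to_dataset_name("Chimp Brains"))  # chimp_brains
```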

Some Datasets Available in neurodatasets

A selection of the included datasets, all focused on neuroscience and brain-related topics for data analysis, statistical modeling, and machine learning:

  • gray_matter_patterns: Expected patterns of gray matter derived from large-scale meta-analyses by the ENIGMA consortium.
  • white_matter_patterns: Expected patterns of white matter derived from large-scale meta-analyses by the ENIGMA consortium.
  • encephalitis_herpes: Cases of herpes encephalitis in children observed in Bavaria and Lower Saxony between 1980 and 1993.
  • sleep_study_college: Sleep patterns and academic performance data for college students.
  • alzheimer_smoking: Smoking history and disease classification from a case-control study on Alzheimer's disease.
  • brain_size_iq: Brain size, IQ scores and gender data from psychology students for intelligence research.
  • mammal_brains: Brain weight, body weight, gestation length and litter size across 96 mammal species.
  • chimp_brains: Brodmann's area 44 asymmetry measurements in chimpanzees by sex.
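
To work with several of the datasets listed above at once, the loads can be collected into a dictionary keyed by name. load_many is a hypothetical helper, not part of neurodatasets; it takes any loader callable (for example nd.load_dataset) and records names that fail to load instead of stopping at the first error.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def load_many(loader: Callable[[str], Any], names: Iterable[str]) -> Tuple[Dict[str, Any], List[str]]:
    """Load each name with loader; return (loaded, failed_names)."""
    loaded: Dict[str, Any] = {}
    failed: List[str] = []
    for name in names:
        try:
            loaded[name] = loader(name)
        except Exception:  # broad on purpose: the loader's error types are not documented here
            failed.append(name)
    return loaded, failed


# Hypothetical usage with the package:
#   import neurodatasets as nd
#   frames, missing = load_many(nd.load_dataset, ["gray_matter_patterns", "white_matter_patterns"])
#   for name, df in frames.items():
#       print(name, df.shape)
```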

Disclaimer: The datasets included in neurodatasets are provided strictly for educational, research, and informational purposes. For clinical or medical decision-making, always consult qualified healthcare professionals.

Data Licenses

All datasets are sourced from the R package ecosystem (CRAN) and retain their original open-source licenses:

  • Most datasets use GPL-2 or GPL (>= 2)
  • Some use MIT License or GPL-3
  • The neurodatasets package itself is licensed under GPL-3.0
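
To see what license metadata the installed neurodatasets distribution declares, the standard library's importlib.metadata can read it. This is a sketch: which fields are populated (License, License-Expression, or Classifier entries) depends on how the package was built, so any of them may be empty. It reports the package's own license, not the licenses of individual datasets; license_info is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, metadata
from typing import Dict, List, Optional


def license_info(dist_name: str = "neurodatasets") -> Optional[Dict[str, object]]:
    """Return declared license fields for an installed distribution, or None if absent."""
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return None
    classifiers: List[str] = meta.get_all("Classifier") or []
    return {
        "license": meta.get("License"),
        "license_expression": meta.get("License-Expression"),
        # Keep only trove classifiers that describe the license.
        "license_classifiers": [c for c in classifiers if c.startswith("License ::")],
    }


print(license_info())
```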