
legaldatasets Documentation

Welcome

The legaldatasets package provides a curated collection of legal, criminal justice, and political datasets for data analysis, statistical modeling, and machine learning research. It includes datasets related to U.S. crime rates, court decisions, asylum and immigration data, election results, lawsuits, incarceration statistics, insurance claims, foreign aid, corruption investigations, domestic violence, jury behavior, and death penalty sentencing.

The package also covers takeover bids received by U.S. firms, state-level drunk driving and fatality statistics, U.S. incarceration records, county-level crime rates in North Carolina, personal injury and liability insurance claims, employment sex discrimination, the Bush–Gore ballot controversy, mock jury sentencing decisions, Xi Jinping's anti-corruption campaign, Mafia presence across Italian provinces, UK asylum cases, U.S. law school enrollments, bankruptcy claims, and more, all drawn from curated R packages on CRAN.

Philosophy

The author's vision is to create specialized dataset packages focused on specific themes and topics. Instead of searching through multiple generic data packages to find relevant datasets, users can go directly to a thematic package where all datasets are carefully curated around a particular subject.

In the case of legaldatasets, every dataset is exclusively focused on legal, criminal justice, and political research, making it the go-to resource for researchers, data scientists, statisticians, lawyers, and students working in the fields of criminology, political science, law, public policy, and machine learning.

Getting Started

Installation

The easiest way to install legaldatasets is directly from PyPI:

pip install legaldatasets

From GitHub (Latest Development Version)

To get the latest development version with the newest features and bug fixes:

pip install git+https://github.com/lightbluetitan/legaldatasets-py
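To confirm which version ended up installed (for example, after switching between the PyPI release and the GitHub build), you can query the installed distribution's metadata with Python's standard library. This is a minimal sketch: `installed_version` is a hypothetical helper, not part of legaldatasets, and it assumes the distribution name matches the `pip install` name above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "legaldatasets") -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper; assumes the distribution is named like the pip package.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "legaldatasets is not installed")
```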

Quick Start Tutorial

1. Import the Package

import legaldatasets as ld

2. List Available Datasets

See all datasets included in the package:

# Get a list of all datasets
datasets = ld.list_datasets()
print(datasets)
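If the full list is long, a small filter can narrow it down by keyword. The sketch below assumes `ld.list_datasets()` returns an iterable of dataset-name strings (this page does not state its exact return type); `find_datasets` is a hypothetical helper, and the sample names are taken from the dataset list on this page.

```python
from typing import Iterable, List


def find_datasets(names: Iterable[str], keyword: str) -> List[str]:
    """Return the dataset names containing `keyword` (case-insensitive), sorted.

    Hypothetical helper; not part of legaldatasets.
    """
    needle = keyword.lower()
    return sorted(name for name in names if needle in name.lower())


# Sample names copied from this page; with the package installed you would
# pass ld.list_datasets() instead (assuming it yields name strings).
sample = [
    "us_lawschool_enrollments",
    "bankruptcy_claims",
    "economic_benefits_justice",
    "corporate_takeover_bids",
    "trump_lawsuits",
]

print(find_datasets(sample, "law"))  # ['trump_lawsuits', 'us_lawschool_enrollments']
```

With the package installed, the same helper could be called as `find_datasets(ld.list_datasets(), "claims")`, provided the return type matches that assumption.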

3. Load a Dataset

Load any dataset as a pandas DataFrame:

# Load us_lawschool_enrollments
df = ld.load_dataset('us_lawschool_enrollments')

# Display first rows
print(df.head())

# Check dataset dimensions
print(f"Shape: {df.shape}")
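Since each dataset loads as a pandas DataFrame, standard pandas methods work for a first look at column types and missing values. The `profile` helper below is a hypothetical convenience function built only on pandas (`DataFrame.dtypes`, `isna`, `nunique`); the small frame is placeholder data for illustration, not a real dataset.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column: dtype, number of missing values, number of distinct values.

    Hypothetical helper; not part of legaldatasets.
    """
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),   # column data types as strings
            "missing": df.isna().sum(),       # count of NaN/None per column
            "unique": df.nunique(),           # distinct non-missing values per column
        }
    )


# Placeholder frame for illustration only (not real data).
toy = pd.DataFrame({"value": [1.0, 2.0, None], "category": ["a", "b", "b"]})
print(profile(toy))
```

To apply it to a real dataset, pass the result of `ld.load_dataset(...)`, for example `profile(ld.load_dataset('us_lawschool_enrollments'))`.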

4. Describe a Dataset

Print the description of a dataset:

# Describe a dataset
print(ld.describe("us_lawschool_enrollments"))

Basic Concepts

Dataset Naming Convention

All dataset names in legaldatasets follow a consistent naming pattern:

  • Lowercase with underscores: bankruptcy_claims
  • Descriptive names that reflect content
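The naming pattern can be expressed as a regular expression, which is handy for checking a name before passing it to `ld.load_dataset`. This is a sketch of one reading of the convention: allowing digits and rejecting double or trailing underscores are assumptions, and the package itself may not enforce any such rule.

```python
import re

# Lowercase words joined by single underscores, e.g. "bankruptcy_claims".
# Digits are allowed here as an assumption; the documented rule only says
# "lowercase with underscores".
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def follows_convention(name: str) -> bool:
    """Return True if `name` is lowercase snake_case under the pattern above."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["bankruptcy_claims", "BankruptcyClaims", "bankruptcy-claims", "trump__lawsuits"]:
    print(candidate, follows_convention(candidate))
```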

Some Datasets Available in legaldatasets

Every dataset is exclusively focused on legal, criminal justice, and political topics for data analysis, statistical modeling, and machine learning:

  • us_lawschool_enrollments: U.S. law school enrollment statistics by year and gender from 1963 to 2015.
  • bankruptcy_claims: Creditor claims in a bankruptcy case by creditor type and claim amount.
  • economic_benefits_justice: Cross-national data on justice systems and economic indicators including FDI, growth, and development.
  • corporate_takeover_bids: Takeover bids received by U.S. firms.
  • trump_lawsuits: Federal and state lawsuits involving Donald Trump, including case details and court locations.

Disclaimer: The datasets included in legaldatasets are provided strictly for educational, research, and informational purposes. For legal advice, case strategy, or any law-related decision-making, always consult a qualified legal professional.

Data Licenses

All datasets are sourced from the R package ecosystem (CRAN) and maintain their original open-source licenses:

  • Most datasets use GPL (>= 2), GPL-2, or GPL-3
  • Some use MIT + file LICENSE
  • The legaldatasets package itself is licensed under GPL-3.0