legaldatasets Documentation
Welcome
The legaldatasets package provides a curated collection of legal, criminal justice, and
political datasets for data analysis, statistical modeling, and machine learning
research. It includes datasets related to U.S. crime rates, court decisions,
asylum and immigration data, election results, lawsuits, incarceration statistics,
insurance claims, foreign aid, corruption investigations, domestic violence,
jury behavior, and death penalty sentencing.
Specific datasets cover takeover bids received by U.S. firms, state-level drunk driving and fatality statistics, U.S. incarceration records, county-level crime rates in North Carolina, personal injury and liability insurance claims, employment sex discrimination, the Bush–Gore ballot controversy, mock jury sentencing decisions, Xi Jinping's anti-corruption campaign, Mafia presence across Italian provinces, UK asylum cases, U.S. law school enrollments, and bankruptcy claims, among others. All of them are drawn from curated R packages on CRAN.
Philosophy
The author's vision is to create specialized dataset packages focused on specific themes and topics. Instead of searching through multiple generic data packages to find relevant datasets, users can go directly to a thematic package where all datasets are carefully curated around a particular subject.
In the case of legaldatasets, every dataset is exclusively focused on legal,
criminal justice, and political research, making it the go-to resource for
researchers, data scientists, statisticians, lawyers, and students working in
the fields of criminology, political science, law, public policy, and machine learning.
Getting Started
Installation
From PyPI (Recommended)
The easiest way to install legaldatasets is directly from PyPI:
pip install legaldatasets
From GitHub (Latest Development Version)
To get the latest development version with the newest features and bug fixes:
pip install git+https://github.com/lightbluetitan/legaldatasets-py
Quick Start Tutorial
1. Import the Package
import legaldatasets as ld
2. List Available Datasets
See all datasets included in the package:
# Get list of all datasets
datasets = ld.list_datasets()
print(datasets)
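Once the names are listed, a small helper can narrow them down by keyword. The sketch below assumes that ld.list_datasets() returns a list of strings (the return type is not stated here), and find_datasets is a hypothetical helper, not part of the package. A hard-coded list of names taken from this documentation stands in for the real call so the example runs on its own.

```python
def find_datasets(names, keyword):
    """Return the dataset names that contain `keyword`, ignoring case."""
    needle = keyword.lower()
    return sorted(name for name in names if needle in name.lower())


# Stand-in for ld.list_datasets() (assumed to return a list of strings),
# using dataset names mentioned in this documentation.
example_names = [
    "us_lawschool_enrollments",
    "bankruptcy_claims",
    "economic_benefits_justice",
    "corporate_takeover_bids",
    "trump_lawsuits",
]

matches = find_datasets(example_names, "claims")
print(matches)  # ['bankruptcy_claims']
```

With the package installed, the same helper could be called as find_datasets(ld.list_datasets(), "claims"), provided the assumption about the return type holds.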
3. Load a Dataset
Load any dataset as a pandas DataFrame:
# Load us_lawschool_enrollments
df = ld.load_dataset('us_lawschool_enrollments')
# Display first rows
print(df.head())
# Check dataset dimensions
print(f"Shape: {df.shape}")
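A mistyped dataset name is an easy mistake to make, so it can help to check the name before loading. The following is a minimal sketch, not part of the package: load_checked is a hypothetical wrapper that takes the list of available names and a loader function as arguments. How ld.load_dataset itself reacts to an unknown name is not described here, so the wrapper raises its own KeyError. Stand-in values are used so the example runs without the package installed.

```python
import difflib


def load_checked(name, available, loader):
    """Load `name` via `loader`, or raise a KeyError that lists near matches."""
    if name not in available:
        # Suggest up to three names that are similar to the one requested.
        suggestions = difflib.get_close_matches(name, available, n=3, cutoff=0.6)
        hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
        raise KeyError(f"Unknown dataset '{name}'.{hint}")
    return loader(name)


# Stand-ins (assumptions) for ld.list_datasets() and ld.load_dataset.
available = ["us_lawschool_enrollments", "bankruptcy_claims", "trump_lawsuits"]


def fake_loader(name):
    return f"<data for {name}>"


print(load_checked("bankruptcy_claims", available, fake_loader))

try:
    load_checked("bankruptcy_claim", available, fake_loader)
except KeyError as err:
    print(err)  # the message suggests 'bankruptcy_claims'
```

In real use, the call would look like load_checked("bankruptcy_claims", ld.list_datasets(), ld.load_dataset), again assuming that ld.list_datasets() returns a list of names.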
4. Describe a Dataset
# Print the description of us_lawschool_enrollments
print(ld.describe("us_lawschool_enrollments"))
Basic Concepts
Dataset Naming Convention
All dataset names in legaldatasets follow a consistent naming pattern:
- Lowercase with underscores: bankruptcy_claims
- Descriptive names that reflect content
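The naming pattern above can be checked with a regular expression. The exact rule is an assumption made for illustration (lowercase letters or digits, with words joined by single underscores), and follows_convention is a hypothetical helper rather than part of the package.

```python
import re

# Assumed pattern: lowercase letters or digits, words joined by single underscores.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def follows_convention(name):
    """Return True if `name` matches the assumed lowercase_with_underscores rule."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["bankruptcy_claims", "Bankruptcy-Claims", "trump_lawsuits_"]:
    print(candidate, follows_convention(candidate))
# bankruptcy_claims True
# Bankruptcy-Claims False
# trump_lawsuits_ False
```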
Some Datasets Available in legaldatasets
Every dataset is exclusively focused on legal, criminal justice, and political topics for data analysis, statistical modeling, and machine learning:
- us_lawschool_enrollments: U.S. law school enrollment statistics by year and gender from 1963 to 2015.
- bankruptcy_claims: Creditor claims in a bankruptcy case by creditor type and claim amount.
- economic_benefits_justice: Cross-national data on justice systems and economic indicators including FDI, growth, and development.
- corporate_takeover_bids: Takeover bids received by U.S. firms.
- trump_lawsuits: Federal and state lawsuits involving Donald Trump, including case details and court locations.
Disclaimer: The datasets included in legaldatasets are provided strictly for educational, research, and informational purposes. For legal advice, case strategy, or any law-related decision-making, always consult a qualified legal professional.
Data Licenses
All datasets are sourced from the R package ecosystem (CRAN) and maintain their original open-source licenses:
- Most datasets use GPL (>= 2), GPL-2, or GPL-3
- Some use MIT + file LICENSE
- The legaldatasets package itself is licensed under GPL-3.0