library(infectiousR)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)

Introduction

The infectiousR package provides a seamless interface to access real-time data on infectious diseases through the disease.sh API, a RESTful API offering global health statistics. The package enables users to explore up-to-date information on disease outbreaks, vaccination progress, and surveillance metrics across countries, continents, and U.S. states.

It includes a set of API-related functions to retrieve real-time statistics on COVID-19, influenza-like illnesses from the Centers for Disease Control and Prevention (CDC), and vaccination coverage worldwide.

infectiousR also offers a built-in function to view the datasets available within the package, along with curated datasets on infectious diseases such as influenza, measles, dengue, Ebola, tuberculosis, meningitis, and AIDS. Together these make it a resource for both real-time monitoring and historical analysis of global infectious disease data.

Functions for infectiousR

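The infectiousR package provides several core functions to retrieve real-time infectious disease data from the disease.sh API. The two used in this vignette are get_us_states_covid_stats(), which retrieves COVID-19 statistics for U.S. states, and get_covid_stats_by_country(), which retrieves COVID-19 statistics by country.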

These functions return up-to-date, structured information on infectious diseases, which can be combined with tools such as dplyr and ggplot2 for epidemiological analysis and visualization. The next two sections demonstrate how to visualize COVID-19 data retrieved with infectiousR.

US COVID-19 Statistics: Top 5 States by Total Cases


# CRAN-safe handling: avoid Internet dependency failures
covid_data_safe <- tryCatch(
  get_us_states_covid_stats(),
  error = function(e) NULL
)

# Fallback static dataset (to ensure vignette builds without Internet)
if (is.null(covid_data_safe) || nrow(covid_data_safe) == 0) {
  covid_data_safe <- data.frame(
    state = c("California", "Texas", "Florida", "New York", "Illinois"),
    cases = c(12000000, 9500000, 8200000, 7000000, 5800000)
  )
}

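# Clean and plot safely: rank by total cases (when that column exists)
# so the chart shows the true top 5 rather than the first 5 rows returned
covid_clean <- covid_data_safe %>%
  arrange(across(any_of("cases"), desc)) %>%
  slice_head(n = 5) %>%
  select(where(~ !all(is.na(.))))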

if ("cases" %in% names(covid_clean) && any(!is.na(covid_clean$cases))) {
  ggplot(covid_clean, aes(x = reorder(state, -cases), y = cases, fill = state)) +
    geom_bar(stat = "identity") +
    scale_y_continuous(labels = function(x) format(x, big.mark = ",", scientific = FALSE)) +
    labs(
      title = "COVID-19: Total Reported Cases by State (Top 5)",
      x = "State",
      y = "Total Cases"
    ) +
    theme_minimal() +
    theme(legend.position = "none")
} else {
  message("No valid COVID-19 data available to plot.")
}
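
The fetch-then-fall-back pattern above can be factored into a small helper. The sketch below is only an illustration: `fetch_or_fallback()` and its arguments are hypothetical names, not part of infectiousR, and it assumes the fetch function either raises an error or returns `NULL` or an empty data frame when the API cannot be reached.

```r
# Hypothetical helper (not part of infectiousR): call `fetch`, and return
# `fallback` if the call errors or yields no usable rows.
fetch_or_fallback <- function(fetch, fallback) {
  result <- tryCatch(fetch(), error = function(e) NULL)
  if (is.null(result) || !is.data.frame(result) || nrow(result) == 0) {
    return(fallback)
  }
  result
}

# Illustrative fallback values copied from the vignette code above
fallback_states <- data.frame(
  state = c("California", "Texas", "Florida", "New York", "Illinois"),
  cases = c(12000000, 9500000, 8200000, 7000000, 5800000)
)

# A stand-in fetcher that always fails, simulating a build without Internet
failing_fetch <- function() stop("network unreachable")

states <- fetch_or_fallback(failing_fetch, fallback_states)
```

In the vignette itself, `fetch` would be `get_us_states_covid_stats`, passed without parentheses so the helper controls when the call happens.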

COVID-19 Case Rates in Latin America


# CRAN-safe: gracefully handle Internet or data retrieval failure
covid_data_safe <- tryCatch(
  get_covid_stats_by_country(),
  error = function(e) NULL
)

# Ensure covid_data_safe is always a valid data frame
if (is.null(covid_data_safe) || !is.data.frame(covid_data_safe) || nrow(covid_data_safe) == 0) {
  covid_data_safe <- data.frame(
    country = c("Argentina", "Brazil", "Chile", "Colombia", "Mexico"),
    cases = c(12000000, 36500000, 6000000, 7200000, 9800000),
    population = c(45000000, 214000000, 19000000, 51000000, 128000000)
  )
}

# Proceed only if the data frame exists and has the required columns
if (all(c("country", "cases", "population") %in% names(covid_data_safe))) {
  
  covid_latam <- covid_data_safe %>%
    filter(country %in% c(
      "Argentina", "Bolivia", "Brazil", "Chile", "Colombia",
      "Costa Rica", "Cuba", "Dominican Republic", "Ecuador",
      "El Salvador", "Guatemala", "Honduras", "Mexico"
    )) %>%
    mutate(case_rate = (cases / population) * 100000)
  
  # Plot only if valid numeric data exists
  if ("case_rate" %in% names(covid_latam) && any(!is.na(covid_latam$case_rate))) {
    ggplot(covid_latam, aes(x = reorder(country, -case_rate), y = case_rate, fill = country)) +
      geom_col() +
      scale_fill_manual(values = rainbow(n = nrow(covid_latam))) +
      labs(
        title = "COVID-19 Case Rates in Latin America",
        subtitle = "Cases per 100,000 population",
        x = NULL,
        y = "Cases per 100k"
      ) +
      theme_minimal() +
      theme(
        axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(face = "bold"),
        legend.position = "none"
      )
  } else {
    message("No valid COVID-19 case rate data available to plot.")
  }
  
} else {
  message("COVID-19 data lacks the required columns (country, cases, population); skipping plot.")
}
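
To make the case-rate calculation concrete, here is a minimal base R sketch of the same formula, cases / population * 100000, applied to the placeholder values from the fallback data frame above. Those numbers are illustrative stand-ins used when the API is unavailable, not real statistics.

```r
# Placeholder values copied from the vignette's fallback data frame
# (illustrative only, not real statistics)
latam <- data.frame(
  country = c("Argentina", "Brazil", "Chile", "Colombia", "Mexico"),
  cases = c(12000000, 36500000, 6000000, 7200000, 9800000),
  population = c(45000000, 214000000, 19000000, 51000000, 128000000)
)

# Cases per 100,000 population
latam$case_rate <- latam$cases / latam$population * 100000

# Sort from highest to lowest rate, matching the bar order in the plot
latam <- latam[order(-latam$case_rate), ]
latam[, c("country", "case_rate")]
```

With these placeholder values, Brazil has the most cases in absolute terms but ranks third by rate, which is why the plot normalizes by population before comparing countries.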

Dataset Suffixes

Each dataset in infectiousR is labeled with a suffix to indicate its type and structure:

  • _df: A standard data frame.

  • _tbl_df: A tibble, a modern version of a data frame with better formatting and functionality.

  • _ts: A time series.
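
The suffix convention can be checked programmatically. The sketch below uses a hypothetical helper, `dataset_type()`, which is not part of infectiousR, and it assumes the suffix always sits at the very end of the dataset name.

```r
# Hypothetical helper (not part of infectiousR): map a dataset name to the
# structure its suffix indicates. `_tbl_df` is tested before `_df` because
# every name ending in `_tbl_df` also ends in `_df`.
dataset_type <- function(name) {
  if (grepl("_tbl_df$", name)) {
    "tibble"
  } else if (grepl("_df$", name)) {
    "data frame"
  } else if (grepl("_ts$", name)) {
    "time series"
  } else {
    "unknown"
  }
}

# Two dataset names mentioned in this vignette
vapply(c("spanish_flu_df", "meningitis_df"), dataset_type, character(1))
```

The order of the checks matters: testing `_df` first would classify every tibble as a plain data frame.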

Datasets Included in infectiousR

In addition to API functions, infectiousR includes several preloaded datasets on infectious diseases such as influenza, measles, dengue, Ebola, tuberculosis, meningitis, AIDS, and others. For example:

  • spanish_flu_df: Contains daily mortality records from the 1918 influenza pandemic.

  • fungal_infections_df: Provides clinical treatment outcomes for systemic fungal infections.

  • aids_azt_df: Documents AIDS symptom progression and zidovudine (AZT) treatment responses.

  • meningitis_df: Records meningococcal disease cases with treatment response metadata (includes missing data indicators).
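
A bundled dataset can be loaded with base R's `data()` and inspected before analysis. The sketch below is an assumption-laden illustration: `count_missing()` is a hypothetical helper, not part of infectiousR, and it makes no assumptions about column names, only that the dataset can be coerced to a data frame.

```r
# Hypothetical helper (not part of infectiousR): load one bundled dataset
# into a private environment and count missing values per column.
# Returns NULL when the package or the dataset is not available.
count_missing <- function(name, package = "infectiousR") {
  if (!requireNamespace(package, quietly = TRUE)) {
    return(NULL)
  }
  env <- new.env()
  suppressWarnings(utils::data(list = name, package = package, envir = env))
  if (!exists(name, envir = env, inherits = FALSE)) {
    return(NULL)
  }
  colSums(is.na(as.data.frame(get(name, envir = env))))
}

# Missing values per column in the meningitis dataset, if it is available
count_missing("meningitis_df")
```

Loading into a private environment keeps the dataset out of the global workspace, which is convenient when checking several datasets in a loop.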

Conclusion

The infectiousR package provides a robust toolkit for accessing and analyzing global infectious disease data through the disease.sh API and curated epidemiological datasets. From real-time COVID-19 statistics to historical records of bacterial, viral, and fungal infections (including tuberculosis, AIDS, meningitis, and the 1918 influenza pandemic), infectiousR empowers researchers to conduct comprehensive disease surveillance and trend analysis.