Manipulate EpiTrax Data and Generate Reports • epitraxr

Overview

EpiTrax is a central repository for epidemiological data developed by the Utah Department of Health and Human Services (DHHS). It is now used by several other states. Through EpiTrax, public health officials have access to many different types of disease surveillance data, which they use to produce regular (e.g., weekly, monthly, annual) reports on their respective jurisdictions. Producing these reports can be a tedious, time-intensive process, often involving multiple spreadsheets.

The epitraxr package makes it fast and easy to process EpiTrax data and generate reports. With epitraxr, you set up a report pipeline once. After that, you simply hit “run”, select your latest EpiTrax export, and epitraxr will do the rest!

Installation

You can install the stable version of epitraxr from CRAN with:

install.packages("epitraxr")

To get a bug fix or to use a feature from the development version, you can install the development version of epitraxr from GitHub with:

# install.packages("devtools")
devtools::install_github("EpiForeSITE/epitraxr")

For local health departments, we recommend going through the steps under Installation For Local Health Departments (below). Our experience is that LHD-managed machines need additional tools and packages installed first, and that section walks you through installing them.

Installing Package Dependencies

Many features of epitraxr will work with the standard installation. However, there are several optional features, such as PDF report generation and the R Shiny app, that require you to install additional dependencies. You can do that by adding the dependencies = TRUE flag when installing the package:

# Stable version from CRAN
install.packages("epitraxr", dependencies = TRUE)

# Development version from GitHub
# install.packages("devtools")
devtools::install_github("EpiForeSITE/epitraxr", dependencies = TRUE)

For PDF reports, you may need to install a LaTeX distribution as well. We recommend using TinyTeX as described in the R Markdown Cookbook. First, install the tinytex package:

install.packages("tinytex")

Then, install TinyTeX (the LaTeX distribution) using the tinytex package:

tinytex::install_tinytex()
# to uninstall TinyTeX, run
# tinytex::uninstall_tinytex()

After that, you can use epitraxr to generate PDF reports.
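Before generating PDFs, it can help to confirm that R can actually find a LaTeX engine. The following is a minimal sketch using only base R. The helper name has_latex_engine is hypothetical (it is not part of epitraxr or tinytex), and the list of engines it looks for is an assumption.

```r
# Hypothetical helper (not part of epitraxr or tinytex): returns TRUE if at
# least one of the given LaTeX engines can be found on the PATH.
has_latex_engine <- function(engines = c("pdflatex", "xelatex", "lualatex")) {
  # Sys.which() returns "" for every program it cannot locate
  paths <- Sys.which(engines)
  any(nzchar(paths))
}

if (has_latex_engine()) {
  message("LaTeX engine found; PDF report generation should be possible.")
} else {
  message("No LaTeX engine found; try tinytex::install_tinytex().")
}
```

If the check still fails right after installing TinyTeX, restarting R may be enough for the new installation to be picked up.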

Installation For Local Health Departments

In our experience, machines managed by local health departments (LHDs) are less likely to have installed all necessary packages and tools. For these users, we recommend installing using the following steps.

Install the latest version of R: https://cran.rstudio.com
Install RStudio: https://posit.co/download/rstudio-desktop/
Install devtools

install.packages("devtools")

Install the tinytex package

install.packages("tinytex")

Install TinyTeX (the LaTeX distribution) using the tinytex package

tinytex::install_tinytex()

Install epitraxr

# Stable version from CRAN
install.packages("epitraxr", dependencies = TRUE)

Installing the Development Version of epitraxr

We recommend installing the stable CRAN version, but you can install the development version from GitHub instead. If you’re on Windows, you’ll first need to install the version of Rtools that corresponds to your installed version of R. Rtools is needed to build and install certain types of packages. Then install the development version using:

# Development version from GitHub
devtools::install_github("EpiForeSITE/epitraxr", dependencies = TRUE)

Note: If you previously attempted to install epitraxr from GitHub and ran into an error (such as one solved by the above steps), you will likely need to install the package again with the option force = TRUE to ensure the full package is downloaded and installed properly:

# Development version from GitHub
devtools::install_github("EpiForeSITE/epitraxr", dependencies = TRUE, force = TRUE)

Usage

Inputs

Input Data

epitraxr expects input data in CSV format (not Excel) that contains EpiTrax data exported with the following columns:

patient_mmwr_week (integer)
patient_mmwr_year (integer)
patient_disease (character)

For example:

"patient_mmwr_week","patient_mmwr_year","patient_disease"
"26","2020","Chlamydia trachomatis infection"

Your dataset can include other columns, but epitraxr will ignore them.

Rows with missing or NA values in the input dataset are ignored, and a warning is printed to the Console. The warning only informs you that the input had missing/NA values, in case you want to correct them and re-run the report generation. It does not mean epitraxr was unable to generate reports from the rest of the data.
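If you want to check an export before handing it to epitraxr, the requirements above can be sketched in base R. This is an illustration only: check_epitrax_csv is a hypothetical helper, not an epitraxr function, and the sample rows are made up.

```r
# Hypothetical helper (not part of epitraxr): checks that an export has the
# required columns and counts rows with missing values in those columns.
check_epitrax_csv <- function(path) {
  required <- c("patient_mmwr_week", "patient_mmwr_year", "patient_disease")
  data <- read.csv(path, stringsAsFactors = FALSE)

  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }

  # A value counts as missing if it is NA or an empty/blank string
  is_blank <- function(x) is.na(x) | trimws(as.character(x)) == ""
  incomplete <- Reduce(`|`, lapply(data[required], is_blank))

  list(rows = nrow(data), incomplete_rows = sum(incomplete))
}

# Usage with a small made-up export: one complete row, one with no disease
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  '"patient_mmwr_week","patient_mmwr_year","patient_disease","extra"',
  '"26","2020","Chlamydia trachomatis infection","x"',
  '"27","2020","","y"'
), tmp)
result <- check_epitrax_csv(tmp)
```

Here result$rows is 2 and result$incomplete_rows is 1. The extra column does not affect the check, since only the three required columns are inspected.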

Report Settings

To configure your reports, epitraxr accepts three OPTIONAL files:

Internal Disease List: CSV file listing the diseases to include in internal reports as they are represented in EpiTrax.
- This file is named internal_report_diseases.csv in our example scripts.
- This file can have up to two columns:
  - EpiTrax_name: A list of all diseases to include in the report AS THEY ARE NAMED IN EPITRAX. This column is required.
  - Group_name: The disease group (e.g., “Zoonotic Disease”) for each disease in the EpiTrax_name column. This column is optional and is only used by the grouped disease report functions (create_report_grouped_stats() and epitrax_report_grouped_stats()). If omitted, most epitraxr functions will operate normally, but the grouped disease report functions will assign all diseases to the single group “Uncategorized”.
- If the Internal Disease List is not provided, epitraxr will default to using whatever diseases are found in the input dataset.
Public Disease List: CSV file listing the diseases to include in public reports as they are represented in EpiTrax.
- This file is named public_report_diseases.csv in our example scripts.
- This file should have two columns:
  - EpiTrax_name: A list of all diseases to include in the report AS THEY ARE NAMED IN EPITRAX.
  - Public_name: How each disease should be named in the public report. For example, converting “Chlamydia trachomatis infection” to “Chlamydia”.
    - NOTE: If multiple diseases have the same Public_name their report entries will be combined. For example, to combine “Syphilis, primary” and “Syphilis, secondary” into “Syphilis”, simply set the Public_name of both diseases to “Syphilis”.
- If the Public Disease List is not provided, the program will default to using whatever diseases are found in the input dataset.
Report Config: YAML file providing additional configuration information:
- current_population: Population for converting case counts for the current year to rates per 100k. Defaults to 100,000.
- avg_5yr_population: Population for converting case counts for the 5-year historical average to rates per 100k. Defaults to current_population.
- rounding_decimals: Number of decimal places to round decimal values to. Defaults to 2.
- generate_csvs: Whether to generate CSVs. When FALSE, only Excel files will be generated. Defaults to TRUE.
- trend_threshold: Threshold for determining the “Trend” column in reports that compute this statistic. A percent change above or below the trend_threshold will result in a change in the trend. Defaults to 0.15 (15%).
- The Report Config file is named report_config.yaml in our example scripts.
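To make the population settings concrete, here is a minimal sketch of the usual rate-per-100k arithmetic. The function rate_per_100k is hypothetical and is not epitraxr's internal code, and the population of 56,000 is a made-up value. The defaults mirror the ones listed above.

```r
# Hypothetical helper (not part of epitraxr): converts a case count to a
# rate per 100,000 population, rounded to a given number of decimals.
rate_per_100k <- function(count, population = 100000, digits = 2) {
  round(count / population * 100000, digits)
}

# 292 cases in a made-up population of 56,000
current_rate <- rate_per_100k(292, population = 56000)
current_rate
#> [1] 521.43

# With the default population of 100,000, the rate equals the count
rate_per_100k(5)
#> [1] 5
```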

Again, these files are optional, but they significantly improve the quality of your reports.
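The three settings files described above can be created by hand in any text editor, or from R as in the sketch below. The file names, column names, and config keys come from the descriptions above. The group names, population values, and the flat "key: value" YAML layout are assumptions made for illustration.

```r
# A temporary folder is used so the sketch can be run safely; in practice
# these files would live in your report_settings/ folder.
settings_dir <- file.path(tempdir(), "report_settings")
dir.create(settings_dir, showWarnings = FALSE, recursive = TRUE)

# Internal Disease List: EpiTrax_name is required, Group_name is optional.
# The group names are illustrative.
internal <- data.frame(
  EpiTrax_name = c("Chickenpox", "Measles", "Lyme disease"),
  Group_name = c("Vaccine-Preventable Disease", "Vaccine-Preventable Disease",
                 "Zoonotic Disease")
)
internal_file <- file.path(settings_dir, "internal_report_diseases.csv")
write.csv(internal, internal_file, row.names = FALSE)

# Public Disease List: both syphilis stages share one Public_name
public <- data.frame(
  EpiTrax_name = c("Chlamydia trachomatis infection", "Syphilis, primary",
                   "Syphilis, secondary"),
  Public_name = c("Chlamydia", "Syphilis", "Syphilis")
)
public_file <- file.path(settings_dir, "public_report_diseases.csv")
write.csv(public, public_file, row.names = FALSE)

# Report Config: assumed flat "key: value" layout, made-up populations
config_file <- file.path(settings_dir, "report_config.yaml")
writeLines(c(
  "current_population: 56000",
  "avg_5yr_population: 57000",
  "rounding_decimals: 2",
  "generate_csvs: TRUE",
  "trend_threshold: 0.15"
), config_file)
```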

Note: Disease lists (both internal and public) might include diseases for which there are no case data in the input dataset. In this case, their values will simply be 0s in the generated reports.
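The two behaviors described for disease lists (entries with the same Public_name are combined, and listed diseases without case data appear as 0) can be illustrated in base R. This sketch only mimics the described outcome and is not epitraxr's implementation. The counts are made up, and "Pertussis" is included solely as an example of a listed disease with no data.

```r
# Made-up case counts, as they might look after tallying an EpiTrax export
counts <- data.frame(
  EpiTrax_name = c("Chlamydia trachomatis infection", "Syphilis, primary",
                   "Syphilis, secondary"),
  cases = c(120, 7, 5)
)

# Public Disease List; "Pertussis" has no cases in the data above
public_list <- data.frame(
  EpiTrax_name = c("Chlamydia trachomatis infection", "Syphilis, primary",
                   "Syphilis, secondary", "Pertussis"),
  Public_name = c("Chlamydia", "Syphilis", "Syphilis", "Pertussis")
)

# Keep every listed disease and fill diseases without data with 0
merged <- merge(public_list, counts, by = "EpiTrax_name", all.x = TRUE)
merged$cases[is.na(merged$cases)] <- 0

# Sum the counts of diseases that share a Public_name
public_report <- aggregate(cases ~ Public_name, data = merged, FUN = sum)
public_report
```

The result has one row per Public_name: Chlamydia with 120 cases, Pertussis with 0, and Syphilis with 12 (7 primary plus 5 secondary).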

Filesystem

epitraxr can output reports as CSVs, Excel files, or PDFs. To write reports to disk, you need to give the output functions the folders where epitraxr should write them. Generally, we’ll use three folders:

report_settings/: Holds settings files for report generation.
internal_reports/: Holds reports intended for internal use by the health department.
public_reports/: Holds reports intended for public use.

You can create these manually or you can use the setup_filesystem() function to create the directories. The example scripts in the scripts/ folder all start by setting up the filesystem.

WARNING: Do not place any of your own files in the internal_reports and public_reports folders you give epitraxr, because typically EVERYTHING in these folders is DELETED each time the script runs (depending on how you configure setup_filesystem()). This is intended to remove reports from previous runs so they don’t clutter your filesystem. If you want to save old reports, move them out of the internal_reports/public_reports folders before running your epitraxr script again.
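One way to keep old reports is to copy them into a timestamped archive folder before each run. The sketch below uses only base R. The helper archive_reports and the report_archive folder name are hypothetical and not part of epitraxr, and the archive folder must live outside the report folders so it is not deleted along with them.

```r
# Hypothetical helper (not part of epitraxr): copies every file in
# reports_dir into a timestamped subfolder of archive_dir and returns the
# paths of the copies.
archive_reports <- function(reports_dir, archive_dir) {
  files <- list.files(reports_dir, full.names = TRUE)
  files <- files[!dir.exists(files)]  # skip subfolders
  if (length(files) == 0) return(character(0))

  target <- file.path(archive_dir, format(Sys.time(), "%Y%m%d-%H%M%S"))
  dir.create(target, recursive = TRUE, showWarnings = FALSE)

  copied <- file.copy(files, target)
  file.path(target, basename(files))[copied]
}

# Usage with temporary folders and a made-up report file
reports_dir <- file.path(tempdir(), "internal_reports")
archive_dir <- file.path(tempdir(), "report_archive")
dir.create(reports_dir, showWarnings = FALSE)
writeLines("disease,2024", file.path(reports_dir, "annual_counts.csv"))

archived <- archive_reports(reports_dir, archive_dir)
```

Copying rather than moving leaves the report folders untouched, so the clean-up at the start of the next run behaves exactly as before.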

Executing epitraxr Code

epitraxr can be used in either “standard” mode or “piped” mode (recommended).

Standard Mode

library(epitraxr)

data_file <- system.file(
  "sample_data/sample_epitrax_data.csv",
  package = "epitraxr"
)
epitrax_data <- read_epitrax_data(data_file)

report <- create_report_annual_counts(
  data = epitrax_data,
  diseases = c("Chickenpox", "Measles", "Lyme disease")
)

report
#>        disease 2019 2020 2021 2022 2023 2024
#> 1   Chickenpox  218  318  263  234  249  292
#> 2 Lyme disease    0    0    0    0    0    0
#> 3      Measles  211  326  292  414  586  304

Piped Mode (recommended)

library(epitraxr)

config_file <- system.file(
  "sample_data/sample_config.yml",
  package = "epitraxr"
)

disease_list_file <- system.file(
  "sample_data/sample_disease_list.csv",
  package = "epitraxr"
)

epitrax <- create_epitrax_from_file(data_file) |>
  epitrax_set_config_from_file(config_file) |>
  epitrax_set_report_diseases(list(
    internal = disease_list_file,
    public = disease_list_file
  )) |>
  epitrax_ireport_annual_counts()

epitrax$internal_reports$annual_counts
#>        disease 2019 2020 2021 2022 2023 2024
#> 1   Chickenpox  218  318  263  234  249  292
#> 2 Lyme disease    0    0    0    0    0    0
#> 3      Measles  211  326  292  414  586  304

Piped mode makes it super simple to add more reports:

epitrax <- create_epitrax_from_file(data_file) |>
  epitrax_set_config_from_file(config_file) |>
  epitrax_set_report_diseases(list(
    internal = disease_list_file,
    public = disease_list_file
  )) |>
  epitrax_ireport_annual_counts() |>
  epitrax_ireport_monthly_avgs() |>
  epitrax_ireport_ytd_counts_for_month()

list(epitrax$internal_reports)
#> [[1]]
#> [[1]]$annual_counts
#>        disease 2019 2020 2021 2022 2023 2024
#> 1   Chickenpox  218  318  263  234  249  292
#> 2 Lyme disease    0    0    0    0    0    0
#> 3      Measles  211  326  292  414  586  304
#> 
#> [[1]]$`monthly_avgs_2019-2024`
#>        disease  Jan    Feb    Mar    Apr    May    Jun    Jul  Aug    Sep
#> 1   Chickenpox 23.5 22.333 21.000 26.667 19.833 19.167 24.167 20.0 18.833
#> 2 Lyme disease  0.0  0.000  0.000  0.000  0.000  0.000  0.000  0.0  0.000
#> 3      Measles 27.0 31.500 24.667 34.833 25.667 24.667 37.333 27.5 28.167
#>      Oct    Nov    Dec
#> 1 25.333 19.000 22.500
#> 2  0.000  0.000  0.000
#> 3 36.333 29.167 28.667
#> 
#> [[1]]$ytd_counts
#>        disease Current_YTD_Counts Avg_5yr_YTD_Counts
#> 1   Chickenpox                292              256.4
#> 2 Lyme disease                  0                0.0
#> 3      Measles                304              365.8

Shiny App

The epitraxr package includes a Shiny app, which provides a user-friendly interface for running report generation operations with epitraxr.

To run the Shiny app, you’ll need to install the optional dependencies, such as shiny and DT (if you didn’t already do this during initial package installation).

devtools::install_github("EpiForeSITE/epitraxr", dependencies = TRUE)

Next, load the package:

library(epitraxr)

Finally, use run_app() to launch the app:

run_app()

Scripts

We’ve developed a couple of handy scripts that are included in the GitHub repo under the scripts/ folder:

epitraxr-piped.R: Showcases report generation using epitraxr’s Piped Mode (recommended)
epitraxr-standard.R: Showcases report generation using epitraxr’s Standard Mode

These scripts provide you with a full report generation process that should run successfully if you have the package installed with all additional dependencies (see above). They provide identical output to the original project (currently inactive), though epitraxr-piped.R additionally provides a few new reports and generates PDFs.

Troubleshooting: Common Usage Problems

If epitraxr unexpectedly fails to generate reports after previously succeeding, check that none of the generated reports are open on your machine. Often, an epitraxr script will start by deleting existing reports from folders before generating new ones. If any of those files are open in another program, epitraxr cannot delete them. Close any open files and try running the script again.

Getting Help

If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub.

Code of Conduct

Please note that the epitraxr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Acknowledgments

This project was made possible by cooperative agreement CDC-RFA-FT-23-0069 from the CDC’s Center for Forecasting and Outbreak Analytics. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Centers for Disease Control and Prevention.
