Enhancing reproducibility, replicability, and extensibility in epidemiology

Epiconductor


Supporting and Enhancing the Reproducibility, Replicability, and Extensibility of Epidemiology Research


In epidemiology, large and complex datasets make it difficult for potential users to access and analyze the data. Researchers also need better mechanisms for publishing shareable scripts that reproduce their analyses.

Epiconductor aims to address these challenges by establishing a community that will:

  1. Package and share code to standardize data and data access
  2. Become a resource for:
    • Sharing and accessing software packages
    • Training materials
    • Connecting with fellow researchers to analyze these data resources

Our first effort is to containerize the NHANES data resource. We welcome comments and contributions to this effort.


Building on our earlier work to improve access to the National Health and Nutrition Examination Survey (NHANES) through the nhanesA R package (Ale et al.), our team developed a containerized software package, built with Docker, that addresses common hurdles in handling epidemiological data.

NHANES, which is intended to characterize the nutritional and health status of the non-institutionalized United States population, has been conducted every two years since 1999. With approximately 10,000 people sampled in each survey cycle, it has cumulatively sampled more than 134,000 participants and contains approximately 4,700 variables. These variables are grouped into five categories: Demographic Information, Dietary Consumption, Physical Examination Results, Laboratory Results, and Questionnaire Results.

The challenges of working with NHANES data are common to many large epidemiological datasets:

  • Data is stored in hundreds of separate files
  • Inconsistencies:
    • Nomenclature (e.g., the same variable is renamed partway through the survey's history)
    • Units (e.g., a measurement's unit changes between cycles, so values must be rescaled before they can be compared)
    • Category labels (e.g., changed labels can lead to misclassification)
  • Replicates
  • Oversampling (NHANES deliberately oversamples underrepresented groups, so analyses must apply sample weights to produce nationally representative estimates)
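To make three of these challenges concrete (renamed variables, changed units, and oversampling), here is a minimal Python sketch. It is not part of the Epiconductor container or of nhanesA. It assumes two hypothetical cycles in which the same glucose measurement appears as GLU_V1 (in mg/dL) and GLU_V2 (in mmol/L), with a per-participant sample weight in a column named weight. All cycle labels, variable names, and values are invented for illustration, and the glucose conversion factor of 18 is approximate. The sketch maps both cycles onto one name and unit, then compares the unweighted mean with a survey-weighted mean.

```python
from typing import Dict, List, Optional

# Hypothetical per-cycle conventions: the same lab measurement is stored under
# a different variable name and unit in each cycle. Names are illustrative only.
CYCLE_SPECS = {
    "cycle_1": {"column": "GLU_V1", "to_mg_dl": 1.0},   # already in mg/dL
    "cycle_2": {"column": "GLU_V2", "to_mg_dl": 18.0},  # mmol/L -> mg/dL (approximate factor)
}

Row = Dict[str, Optional[float]]


def harmonize(cycle: str, rows: List[Row]) -> List[Dict[str, float]]:
    """Map one cycle's rows onto a common variable name and unit (mg/dL)."""
    spec = CYCLE_SPECS[cycle]
    harmonized = []
    for row in rows:
        value = row.get(spec["column"])
        if value is None:  # drop participants with no measurement
            continue
        harmonized.append({
            "glucose_mg_dl": value * spec["to_mg_dl"],
            "weight": row["weight"],
        })
    return harmonized


def weighted_mean(rows: List[Dict[str, float]], key: str) -> float:
    """Point estimate sum(w * x) / sum(w); ignores strata/PSUs, so no standard errors."""
    total_weight = sum(r["weight"] for r in rows)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(r["weight"] * r[key] for r in rows) / total_weight


# Toy data: with unequal weights, the weighted and unweighted means differ.
# Sample weights are how survey analyses undo the effect of oversampling.
cycle_1 = [{"GLU_V1": 90.0, "weight": 1.0}, {"GLU_V1": 110.0, "weight": 3.0}]
cycle_2 = [{"GLU_V2": 5.0, "weight": 2.0}, {"GLU_V2": None, "weight": 1.0}]
pooled = harmonize("cycle_1", cycle_1) + harmonize("cycle_2", cycle_2)

unweighted = sum(r["glucose_mg_dl"] for r in pooled) / len(pooled)
print(round(unweighted, 2))                               # 96.67
print(round(weighted_mean(pooled, "glucose_mg_dl"), 2))   # 100.0
```

This only shows the point estimate. A real NHANES analysis would also need the survey's design variables (strata and primary sampling units) to obtain valid standard errors, which this sketch deliberately leaves out.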

To streamline the path from data retrieval to analysis, we developed a Docker container that runs on virtually any operating system, including most laptop computers. Our aim is for researchers who are not familiar with database technologies to be able to use the software readily in their analyses.

These resources, along with instructions for using them, are available on our Software page. Please send feedback and questions through our Contact Form.


Access packages and tools that support the reproducible analysis of epidemiological data.

Vignettes and other materials that guide users through applying the packages to their own data analyses.

Submit vignettes, software packages, and feedback to support reproducible epidemiological research.


We are committed to advancing the field of epidemiology by fostering a collaborative, open, and efficient research environment. Join us in shaping the future of public health research.


References:

Nguyen VK, Middleton LYM, Huang L, Zhao N, Verly E Jr, Kvasnicka J, Sagers L, Patel CJ, Colacino J, Jolliet O. Harmonized US National Health and Nutrition Examination Survey 1988-2018 for high throughput exposome-health discovery. medRxiv [Preprint]. 2023 Feb 8:2023.02.06.23284573. doi: 10.1101/2023.02.06.23284573. PMID: 36798185; PMCID: PMC9934713.