Director: David Reif
The Data Management and Analysis Core (DMAC) is designed to integrate results from all data streams into a true synthesis that advances environmental and public health related to per- and polyfluoroalkyl substances (PFAS).
DMAC will manage the integration of data streams generated by the Projects and Cores of the NC State University Superfund Research Center. The DMAC Specific Aims lay out a data-centric approach to synthesize this multi-scale integration into actionable discoveries that advance public health related to PFAS. The translational impacts of DMAC will include new methods for tackling the interface between environmental chemical sampling and resultant public health consequences, distributed software to make these methods available to Superfund Research Program (SRP) teams addressing similar questions in other compound families, and promotion of computational fluency in the next generation of Environmental Health Scientists (EHS).
The goals of DMAC are to coordinate development of project/core Data Management Plans (DMP), implement a software pipeline that standardizes the transfer of data between projects and the Analytical Core, facilitate interactive statistical and bioinformatics analyses that enhance rigor and reproducibility, ensure Center-wide data access and interoperability, visualize results to foster communication across projects and cores, and advance training of the next generation of Data Scientists.
To achieve these goals, DMAC has five specific aims. The DMAC Specific Aims support the goals of the NC State Center by: (1) assuring that all Center components manage and analyze data according to best practices of Data Science; (2) facilitating interactive statistical and bioinformatics analyses that enhance rigor and reproducibility; (3) facilitating Center-wide data access and interoperability; (4) visualizing results to foster translation of the science of the Center to key stakeholders, the nationwide network of Superfund centers, and vulnerable communities; and (5) providing quantitative interdisciplinary training to the next generation of EHS researchers. The DMAC is configured as the “one-stop shop” for informatics needs of SRP Investigators.
Specific Aim 1: Embed principles of Data Science into every aspect of the Center through coordinated development of project/core DMPs within a formal Comprehensive DMP framework.
Coordinating key aspects of experimental design, analysis, and data structures into formal project/core DMPs prior to data collection will allow creation of a Comprehensive DMP that unifies the Center towards a data-ready computational infrastructure that includes solid Quality Assurance and Quality Control (QA/QC).
Specific Aim 2: Implement a software pipeline that standardizes the transfer of data between projects and the Analytical Core.
The pipeline will enable seamless data sharing (and prevent data leakage) between data-generating elements of the Center. Key DMAC Investigators will leverage their transdisciplinary Chem- and Bioinformatic collaboration to develop standardized data structures that can be used by modular software to automate data delivery.
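As an illustration only, a standardized data structure of the kind described above might be sketched as follows. The field names, the `Measurement` type, and the `standardize` helper are all hypothetical; the source does not specify a schema or an implementation, so this is a minimal sketch of the general idea (validate incoming rows against one shared record type before delivery), not the DMAC pipeline itself.

```python
from dataclasses import dataclass

# Hypothetical field names; the source does not define a schema.
REQUIRED_FIELDS = ("sample_id", "analyte", "value", "units", "source")


@dataclass(frozen=True)
class Measurement:
    """One record in an assumed shared format for data transfer."""
    sample_id: str
    analyte: str
    value: float
    units: str
    source: str  # project or core that generated the record


def to_measurement(row: dict) -> Measurement:
    """Validate one raw row and convert it to the shared record type."""
    missing = [f for f in REQUIRED_FIELDS if f not in row or row[f] in ("", None)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return Measurement(
        sample_id=str(row["sample_id"]).strip(),
        analyte=str(row["analyte"]).strip(),
        value=float(row["value"]),  # raises ValueError if not numeric
        units=str(row["units"]).strip(),
        source=str(row["source"]).strip(),
    )


def standardize(rows):
    """Split rows into validated records and rejected (row, reason) pairs."""
    accepted, rejected = [], []
    for row in rows:
        try:
            accepted.append(to_measurement(row))
        except (ValueError, TypeError) as exc:
            rejected.append((row, str(exc)))
    return accepted, rejected
```

In this sketch, rows that fail validation are returned with a reason rather than silently dropped, so that the generating project or core could correct them before resubmission.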
Specific Aim 3: Create data management resources that facilitate Center-wide data interactivity.
User-friendly browser interfaces will be developed to summarize data contents and serve digestible results to Center members. This ability to interact with data and results in a useful manner is key to achieving Findable, Accessible, Interoperable, Reusable (FAIR) principles.
Specific Aim 4: Visualize results in a manner that fosters communication across projects and cores.
Effective visualization and data graphics are proven means of interdisciplinary communication, and the DMAC will work with Center investigators to implement visualization methods or develop novel visual analytics as appropriate.
Specific Aim 5: Advance training of the next generation of Data Scientists.
The creation of standardized data structures, shareable software pipelines, visualization, and browsers will invite trainees from all Center components to be intimately involved with data management and analysis. The DMAC will promote fluency in usage of these computational tools through nanocourses with the Research Experience and Training Coordination Core (RETCC), distributed documentation and learning resources, and development of new courses focused on computational methods in EHS.
The DMAC structure innovates by leveraging institutional strengths and established collaborations to forge a comprehensive synthesis from all Project/Core data streams. DMAC investigators have already established successful collaborations (i.e., publications and funding) with Center Project PIs and Core leaders that integrate data streams from environmental sampling of perfluorinated compounds, biological assays, cheminformatic models, and database resources. DMAC will continue to strengthen its interdisciplinary Bioinformatic/Cheminformatics collaboration and leverage state-of-the-art Data Science resources at NC State University to link the engineering, environmental, analytical chemistry, and biomedical aspects of the Center and foster innovation in the study of PFAS.
The DMAC has begun building the infrastructure for coordination to foster data sharing and interoperability across the Center. In response to the laboratory disruptions brought on by COVID-19, the DMAC pivoted effort from analysis of newly generated data to the creation of modular data structures and software tools. We presented our prototype DMAC data integrator at the “SRP Progress in Research Webinar Part 4: Emerging Exposures” in November. This data integrator is an interactive web application (web app) built to foster integration of Center projects and cores. It is intended as a front-end for Center users so that they can recombine data streams to create integrative analyses without having to write code.
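To make the idea of recombining data streams concrete, the following is a minimal sketch of the kind of operation such a front-end could perform behind the scenes. It assumes, hypothetically, that every stream is a list of records sharing a `sample_id` key; the `integrate` function and all names in it are illustrative and are not taken from the DMAC data integrator.

```python
from collections import defaultdict


def integrate(streams):
    """Merge records from several named streams by a shared sample_id key.

    `streams` maps a stream name to a list of dict records. Each output
    entry collects every field seen for one sample, prefixed with the
    stream it came from so that sources stay traceable.
    """
    merged = defaultdict(dict)
    for stream_name, records in streams.items():
        for record in records:
            sample_id = record["sample_id"]  # assumed shared key
            for key, value in record.items():
                if key != "sample_id":
                    merged[sample_id][f"{stream_name}.{key}"] = value
    return dict(merged)
```

Prefixing each field with its stream name is one possible design choice: it avoids collisions when two streams use the same field name and keeps the origin of every value visible in the combined view.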