ARM Reprocessing Toolkit: Towards Efficient and Timely Delivery of Quality-Controlled ARM Data

 

Authors

Jitendra Kumar — Oak Ridge National Laboratory
Michael Giansiracusa — ORNL CCSI ARM Data Center
Alka Singh — Oak Ridge National Laboratory
Kyle K Dumas (Quicklooks) — Oak Ridge National Laboratory
James William Tonkin — Oak Ridge National Laboratory
Kavya Guntupally — Oak Ridge National Laboratory

Category

ARM infrastructure

Description

The Atmospheric Radiation Measurement (ARM) Climate Research Facility collects observations from hundreds of instruments at fixed and mobile atmospheric observatories around the globe to study the effects and interactions of clouds and aerosols and their impact on the Earth's energy balance. To ensure accuracy and characterize measurement uncertainties, ARM conducts a comprehensive data quality assessment on all collected data before making it publicly available to the scientific community. Reprocessing is a critical component of ARM's data infrastructure: it fixes the data issues identified by quality assessments. A reprocessing task may entail correcting part or all of the time series of a single datastream, or end-to-end reprocessing of an entire instrument's data. Reprocessing produces consistent, higher-quality data and so improves its usability. However, reprocessing can be complicated and time-consuming. We have developed a new, computationally efficient Reprocessing Toolkit architecture to automate the reprocessing workflow, improve accuracy, and reduce the time needed to complete a task. An improved web-based interface for Data Quality Report (DQR) submissions would allow mentors/PIs to provide symbolic equations for data corrections. Using information contained in the DQR, the data files needed for a task are identified and retrieved from ARM disk or tape archives using Globus protocols over a dedicated high-speed network. Our symbolic equation processor applies the correction to all of these files to generate corrected data files. The corrected data are then reviewed for accuracy and consistency before being versioned and archived. The workflow uses ARM computing resources to reprocess large volumes of data in parallel. Once the data are corrected, the DQR is updated with a summary of the corrections performed, and all users of that datastream are notified of the updated dataset.
In this presentation, we will share the new automation functionality developed within the Reprocessing Toolkit and engage with the ARM facility and science community for feedback. We expect that the workflow implemented within the Reprocessing Toolkit will allow timely delivery of corrected, high-quality ARM data sets to the scientific community.
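
To make the idea of a symbolic correction equation concrete, here is a minimal Python sketch of how an equation supplied as text might be applied to every record in a datastream. It is an illustration only, not the Reprocessing Toolkit's implementation: the function names (`apply_correction`, `_evaluate`), the record layout (a list of dictionaries), and the variable names in the example equation are all assumptions. The sketch parses the equation with Python's standard `ast` module and evaluates only whitelisted arithmetic, so arbitrary code in a submitted equation is rejected.

```python
import ast
import operator

# Hypothetical whitelist of arithmetic allowed in a correction equation.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _evaluate(node, variables):
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, variables)
    if isinstance(node, ast.Constant):
        # Accept plain numbers only (bool is excluded on purpose).
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
        raise ValueError("only numeric constants are allowed")
    if isinstance(node, ast.Name):
        if node.id not in variables:
            raise KeyError(f"unknown variable: {node.id}")
        return variables[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _evaluate(node.left, variables)
        right = _evaluate(node.right, variables)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand, variables))
    # Function calls, attribute access, etc. are rejected.
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def apply_correction(equation, records, target):
    """Return corrected copies of `records`, with `target` replaced by the
    value of `equation` evaluated against each record's variables."""
    tree = ast.parse(equation, mode="eval")  # parse once, reuse for every record
    corrected = []
    for record in records:
        new_record = dict(record)  # leave the original record untouched
        new_record[target] = _evaluate(tree, record)
        corrected.append(new_record)
    return corrected


if __name__ == "__main__":
    # Hypothetical records: a temperature reading with a known sensor offset.
    raw = [{"temp": 10.0, "offset": 0.5}, {"temp": 20.0, "offset": 0.5}]
    for row in apply_correction("temp - offset", raw, target="temp"):
        print(row)
```

Returning corrected copies rather than modifying records in place mirrors the workflow described above, where corrected data are reviewed and versioned while the original files remain available.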