Machine-Readable Data Quality Reports

 
Authors

Sean T. Moore (Orbital ATK Inc.)
Giri Prakash (Oak Ridge National Laboratory)
Kenneth Kehoe (ARM Data Quality Office, University of Oklahoma, CIWRO)
Biva Shrestha (ARM Data Archive)

Category

ARM Infrastructure

Description

Data issues that ARM discovers and confirms after data ingest are documented in Data Quality Reports (DQRs). These reports are delivered as text files alongside the data files ordered from the ARM Archive. Because text-based reports are not easily consumed by automated processing codes, they have often been underused or ignored. ARM now offers several ways for users to exclude data affected by known problems without manually interpreting the reports or tediously hand-editing their data holdings. During the ordering process, users can ask the Archive to automatically remove data records affected by selected types of DQR; the Archive then generates a custom dataset in which the affected data are marked as 'missing'. If new problems are discovered after a user has downloaded data, the user receives a notification. At that point, the user can either resubmit the data order to remove the newly affected records or use a web service from within their processing code to query the DQR database directly. Details on using the web service and proposed future enhancements to the automated DQR process will be presented.
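The web-service route can be illustrated with a minimal client-side sketch: fetch DQR records for a datastream, keep only the DQR types the user cares about, and set every sample that falls inside an affected time range to missing (NaN), mirroring what the Archive does for custom orders. Everything about the service contract here is an assumption for illustration: the pipe-delimited line format `DQR_ID|CATEGORY|START|END|VARIABLE`, the category labels, the inclusive time-range rule, and the example DQR IDs are hypothetical, and `fetch_dqr_text` takes whatever URL the real ARM DQR web service documents rather than a hard-coded endpoint.

```python
"""Sketch: apply DQR time ranges to a time series by marking affected samples missing.

ASSUMPTIONS (hypothetical, not the actual ARM DQR web-service contract):
- The service returns plain-text lines "DQR_ID|CATEGORY|START|END|VARIABLE"
  with naive ISO 8601 timestamps (UTC implied); lines starting with '#' are comments.
- A sample is affected if START <= sample_time <= END.
"""
from __future__ import annotations

import math
from dataclasses import dataclass
from datetime import datetime
from urllib.request import urlopen


@dataclass(frozen=True)
class DQRRange:
    dqr_id: str
    category: str
    start: datetime
    end: datetime
    variable: str


def fetch_dqr_text(url: str, timeout: float = 30.0) -> str:
    """Download the raw DQR response; the URL must come from ARM's service docs."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_dqr_response(text: str) -> list[DQRRange]:
    """Parse the (assumed) pipe-delimited response into DQRRange records."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        dqr_id, category, start, end, variable = line.split("|")
        ranges.append(DQRRange(dqr_id, category,
                               datetime.fromisoformat(start),
                               datetime.fromisoformat(end),
                               variable))
    return ranges


def mask_affected(times, values, variable, ranges, categories=None):
    """Return a copy of values with samples covered by a matching DQR set to NaN.

    categories: optional set of DQR categories to apply; None applies all of them.
    """
    relevant = [
        r for r in ranges
        if r.variable == variable and (categories is None or r.category in categories)
    ]
    return [
        math.nan if any(r.start <= t <= r.end for r in relevant) else v
        for t, v in zip(times, values)
    ]


if __name__ == "__main__":
    # Illustrative response text; the IDs and categories are made up.
    sample = """# dqr_id|category|start|end|variable
D0001|incorrect|2017-01-01T06:00:00|2017-01-01T12:00:00|temp_mean
D0002|suspect|2017-01-01T18:00:00|2017-01-01T20:00:00|temp_mean
"""
    times = [datetime(2017, 1, 1, h) for h in (0, 6, 12, 18)]
    print(mask_affected(times, [1.0, 2.0, 3.0, 4.0], "temp_mean",
                        parse_dqr_response(sample), categories={"incorrect"}))
```

Filtering by category on the client side corresponds to the abstract's idea of excluding only "certain types of DQR"; replacing values with NaN rather than dropping records keeps the time axis intact, which matches the Archive's approach of marking affected data as missing.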