Big Data Systems at the ARM Archive II: data processing opportunities

 

Authors

James H. Mather — Pacific Northwest National Laboratory
Raymond A. McCord — retired
Matt Macduff — Pacific Northwest National Laboratory
Giri Prakash — Oak Ridge National Laboratory
Jay F. Manneschmidt — Oak Ridge National Laboratory
Pete Eby — Oak Ridge National Laboratory

Category

Infrastructure & Outreach

Description

The ARM Archive is adding a new infrastructure resource: a system dedicated to data processing tasks that involve the very large data volumes produced by new ARM instrumentation (e.g., scanning radars and lidars). Processing tasks on this "Big Data System" (BDS) will focus on generating secondary data products that will be archived and made accessible to the ARM user community. Each task will be proposed to ARM as a "virtual field campaign" preproposal that identifies the user's objectives and resource requirements. BDS users may include members of the ARM infrastructure and the ASR research community. Users are responsible for specifying the required input data, describing the expected output data, providing the data processing software, monitoring processing progress, and reviewing the results. Data processing software can be developed using the ARM Integrated Software Development Environment (ISDE) tools, either on a development system at the Archive or on other computing resources the user can obtain. Archive staff are responsible for supervising the data flow to and from the Mass Storage System (MSS), obtaining approval for user access to the system, and assisting with installation of data processing software. The system is designed to support user-provided software and processing tasks requiring ~10–30 TB of input and output data. Installing the system adjacent to the Archive MSS will optimize data transfers for these tasks. The poster will provide more details about the hardware and software configurations, example processing tasks, and operational procedures, and will request additional user input about the requirements for this system.