ARM Computing Clusters: Extending High-Performance Computing Environments for Analysis and Delivery of ARM Data

 

Authors

Robert James Records — Oak Ridge National Laboratory
Giri Prakash — Oak Ridge National Laboratory
Jitendra Kumar — Oak Ridge National Laboratory
Anthony Clodfelter — Oak Ridge National Laboratory

Category

ARM infrastructure

Description

The Atmospheric Radiation Measurement (ARM) Climate Research Facility collects a wide range of atmospheric measurements using state-of-the-art ground-based and aerial instruments at four permanent sites and three mobile facilities. Data collected at these sites are transferred to the ARM Data Center (ADC), where they pass through routine data ingest, data quality analysis, and generation of value-added science data products. The data are then archived in a High-Performance Storage System (HPSS) and distributed to users through the Data Discovery interface. The ARM Data Archive, located in the ARM Data Center at Oak Ridge National Laboratory (ORNL), currently stores and distributes over 1.4 petabytes (PB) of data from about 11,000 products, including instrument data streams, value-added products (VAPs), and Principal Investigator (PI) contributed datasets. To support the high volume of archived and incoming data and to increase the efficiency of operations, analysis, modeling, and simulation, a High-Performance Computing (HPC) infrastructure has been put in place. The ADC computing cluster architecture integrates the current ADC cyberinfrastructure with two additional computing clusters and provides tiered access to computing and storage resources for the ARM scientific community. The first tier is a small-scale, 1,080-core cluster deployed and operated as part of the ORNL Compute and Data Environment for Science (CADES) facility. The second tier is a mid-scale, 12,096-core cluster deployed and operated by the Oak Ridge Leadership Computing Facility (OLCF). The mid-scale cluster will be deployed in phases, with a 4,032-core block in the first phase. At the highest tier, the facility will connect to the computational resources at OLCF. Because all of these resources are co-located within the ORNL computing infrastructure, a fast network will enable communication and seamless data movement across all the computational and data storage resources.
In this presentation, we will share the current status of the cluster(s) and the enhanced workflows, and discuss future requirements with the ARM facility and scientific community. We expect that, with community involvement, the ARM clusters will provide a platform for large-scale simulations and data analytics.