Integrated Cloud and High-Performance Computing Platform for Interactive Analysis of ARM Data

Authors

Jitendra Kumar — Oak Ridge National Laboratory
Zach Price — Oak Ridge National Laboratory
Anthony Clodfelter — Oak Ridge National Laboratory
Robert James Records — Oak Ridge National Laboratory
Giri Prakash — Oak Ridge National Laboratory

Category

ARM infrastructure

Description

Large-scale scientific experiments and observation networks around the world have produced an unprecedented wealth of data, offering insight into Earth and environmental processes at regional to global scales and over long time periods. For example, the data archive at the Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) facility, collected from a range of sensors distributed across the globe, recently reached the 1.4-petabyte mark and continues to grow. Analysis and scientific discovery from such large data sets require computationally efficient algorithms and tools. High-performance computing (HPC) has enabled new scientific discoveries through efficient and scalable analysis of these large-scale data sets. ARM’s high-performance computing ecosystem, available to the atmospheric science community, provides a big-data analytics platform that co-locates ARM’s atmospheric science observations with computing and storage resources. Scientific discovery, however, is a creative, exploratory, and iterative process that often requires interactive analysis and development. ARM’s Next-Generation computing platform therefore also provides a customizable JupyterLab-based environment for interactive development and analysis through a web-based user interface that supports Python and a range of other languages. We will share the capabilities of ARM’s Next-Generation computing platform and early science results from data-intensive machine learning studies that use ARM data and were enabled by the new computational infrastructure.