Breakout Summary Report

 

ARM/ASR User and PI Meeting

10 - 13 June 2019

Using ARM compute resources to do CESD science at scale
10 June 2019
9:00 AM - 11:30 AM
Number of attendees: 20
Jitendra (Jitu) Kumar, Scott Collis, Bobby Jackson, Anthony Clodfelter

Breakout Description

ARM collects many large data sets that are difficult to download and require substantial computing infrastructure to analyze. To alleviate this, ARM makes substantial heterogeneous computing resources available to users and stakeholders. In addition, the ARM Data Center (ADC) has been investing in high-quality open-source data science tools such as JupyterHub and Dask. This tutorial, which caters to a spectrum of skill levels, aims to equip attendees with a set of techniques for distributing analysis tasks across many computational cores. The course will be taught in the Python programming language and will draw on a range of data available in the ARM Data Center. ADC-developed tools for automating data ordering and staging will also be covered, with a focus on server-side analysis.

Main Discussion

The workshop introduced the ARM/ASR community to the computational facilities and resources offered by the ARM facility. Jitu Kumar introduced attendees to the high-performance computing (HPC) resources available within ARM and to their design and specifications for enabling big data analysis using ARM observations. Data access and movement are core aspects of ARM data science, so he also presented the custom data access and download tools available within the facility and showed how to integrate them into automated data analysis workflows. Because participants ranged from novice to seasoned HPC users, Bobby Jackson presented a Python refresher tutorial to bring everyone up to speed, and Scott Collis introduced concepts of parallel programming using Dask and Python.

The workshop was designed to be interactive. Jitu Kumar and Scott Collis demonstrated computational concepts through a series of examples, ranging from simple to complicated, that used a sample ARM data set and that participants were able to write and execute themselves. The interactive session used ARM’s new cloud-based JupyterLab service: working in a browser on their personal laptops, participants wrote parallel code and ran their jobs on ARM’s Stratus cluster.
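As an illustration of the kind of task parallelism discussed in the session, the following is a minimal sketch and not taken from the workshop notebooks. It assumes each day of data can be processed independently, uses synthetic values in place of real ARM files, and relies on hypothetical helper names (make_synthetic_day, daily_mean, parallel_daily_means). It uses dask.delayed when Dask is installed and otherwise falls back to a standard-library thread pool so that the sketch still runs.

```python
import math
from concurrent.futures import ThreadPoolExecutor

try:
    import dask
    HAVE_DASK = True
except ImportError:  # Dask is optional for this sketch
    HAVE_DASK = False


def make_synthetic_day(day, n_samples=1440):
    """Return one hypothetical day of minute-resolution temperatures (deg C).

    Stand-in for reading a real data file; the values are synthetic.
    """
    return [15.0 + day + 5.0 * math.sin(2.0 * math.pi * i / n_samples)
            for i in range(n_samples)]


def daily_mean(day):
    """Independent per-day task: obtain the samples and average them."""
    samples = make_synthetic_day(day)
    return sum(samples) / len(samples)


def parallel_daily_means(days):
    """Compute one mean per day, handing the independent tasks to workers."""
    if HAVE_DASK:
        # Build a lazy task graph, then execute all tasks together.
        tasks = [dask.delayed(daily_mean)(d) for d in days]
        return list(dask.compute(*tasks))
    # Fallback when Dask is unavailable: standard-library thread pool.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(daily_mean, days))


if __name__ == "__main__":
    means = parallel_daily_means(range(7))
    print([round(m, 2) for m in means])
```

The same pattern applies when daily_mean is replaced by a function that opens and reduces one real file per task. On a cluster, the tasks would typically be sent to a distributed scheduler rather than to local threads.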

Key Findings

Participants learned basic programming concepts for developing parallel workflows to analyze ARM observations, automating data access, and getting started on ARM HPC clusters. They were also able to get help from workshop organizers and ARM personnel with any questions or issues they had.
All workshop materials are publicly available at: https://github.com/ARM-DOE/armhpc-tutorial

Issues

Some participants had not registered for an ARM computing account ahead of time. They were able to sign up and, with Anthony Clodfelter from the ADC present, had their accounts set up quickly enough to take part in the interactive programming.

Future Plans

We plan to continue to develop the notebook collection to include more examples. The JupyterLab setup should be a good platform for future workshops.