Breakout Summary Report


ARM/ASR User and PI Meeting

10 - 13 June 2019

Applications of Machine Learning to ARM/ASR Science
10 June 2019
4:00 PM - 6:00 PM
70
Joseph Hardin, Jennifer Comstock, Shaocheng Xie, Ed Luke

Breakout Description

Machine learning (ML) has seen an explosion of interest in many fields, including atmospheric science, because it offers a tool capable of solving previously intractable problems. ARM has recently funded several machine learning-based projects, and researchers throughout ASR have started applying machine learning methods to their research.



This session covered the following discussion points:



  • Updates from ARM/ASR funded machine learning projects

  • Presentation of novel applications of machine learning to ARM data

  • Discussion of ML techniques for learning from modeling and observational data sets to improve
    model parameterizations.

  • Ways in which ARM and ASR can improve their accessibility to ML solutions.

  • Discussion of techniques from the field that could help ARM/ASR address uncertainty
    quantification.

  • Are there any immediate needs the ARM/ASR community has that could benefit from machine
    learning?

  • Is there anything we as a community could do better to reach out to the broader ML community?

Main Discussion

The breakout was well attended, with more than 60 people present (standing room only). The session started with an introduction to machine learning taxonomies and to the machine learning resources available within ARM, including the Stratus and Cumulus computing clusters. A series of six talks then discussed applications of machine learning both to ARM data and to data from the wider community. These talks covered a wide range of applications, including modeling, data quality, and retrievals. The session concluded with a discussion of the role of machine learning in ARM and the atmospheric sciences. The mood was optimistic about the role machine learning could play, with some caution urged against blindly applying ML methods to problems. It was agreed that a similar session should be held next year and that a more organized machine learning presence is needed to disseminate resources and connect practitioners.

Key Findings

The talks covered a wide variety of topics, roughly divided into 1) applications to models, 2) data quality and cleanup, and 3) retrievals. Rao Kotamarthi presented results from using a convolutional neural network to reproduce planetary boundary layer (PBL) parameters throughout the model column from the model's surface values. The goal was a faster physics emulator. He showed how model topologies with different interconnection strategies affected performance, and that the network showed skill in reproducing the model column. Next, Yangang Liu presented an ML model for cloud microphysics parameterization designed to emulate a cloud parcel model.



Moving into the data quality portion, Shuaiqi Tang showed how a support vector machine (SVM) with a nonlinear kernel transformation could be used to flag microwave radiometer data-quality issues caused by rain contamination of the radome. Compared with the currently implemented method, which relies primarily on the SSI index from brightness temperatures, the SVM approach flagged less time as contaminated and better identified the contaminated periods. Following this, Ed Luke presented work on using a convolutional neural network (CNN) to remove sea clutter from XSAPR2 scans at the Eastern North Atlantic (ENA) site. Using a mask generated from the radar correlation coefficient as the initial supervisory signal, the team is training on spatiotemporal input data from 87,000 radar scans to automate sea clutter removal. This is work in progress, but Ed showed data illustrating the concept.



The next three talks focused on retrievals. Vanessa Przybylo showed results from a CNN model that classifies ice crystal images from the Cloud Particle Imager (CPI) by habit. The talk showed that the model accurately classified aggregates, columns, and junk ice, and identified bullets as a class with room for improvement; the ultimate goal is to compare the ML retrievals with results from an ice simulation model. Next, Yuping Lu showed results from a deep learning model for hydrometeor identification (HID), trained against WSR-88D-reported HID types to reproduce four HID classes. Finally, V. Chandrasekar wrapped up the presentations with an overview of how ML has been used over the last 20 years in radar retrievals, including quantitative precipitation estimation (QPE), hydrometeor classification, and data quality.



We then discussed opportunities and challenges for machine learning in ARM/ASR and the wider atmospheric science community.

Needs

The needs of the community were a major point of discussion during this breakout. When attendees were asked "Who currently uses machine learning and who would like to start using it?", the room split roughly 50/50. A common refrain was wanting to get into machine learning but lacking the expertise. Attendees requested an easier way to find machine learning experts within ARM and at their home institutions, and asked whether there could be a better way to connect those interested in machine learning with those who need it for their problem domains. There was also brief discussion of better ways to reach out to the ML community, bring its experts into our community, and translate atmospheric problems into forms the ML community can tackle.



Attendees pointed out that most newer ML methods seem designed for images and speech, and that they often struggled to see a clear way to apply them to time-series data. Overall, the general lack of labeled data hindered many researchers and projects. There was also interest in whether ML could help with uncertainty quantification.



Another recurring theme was the need to avoid using ML as a “black box”: how do we make machine learning interpretable, so that it supports science rather than subsuming it? There was discussion of whether physical representations could be embedded in ML models. ML models need to support process-level understanding while respecting physical constraints.


Future Plans

We decided to set up a mailing list to coordinate discussion among the researchers using machine learning in ARM. This list is now in place at ml@arm.gov, and participants from last year's and this year's breakouts will be added to it. We will also propose this as a breakout at next year's meeting.

Action Items

1) Establish a mailing list (this has been completed now, ml@arm.gov).
2) Propose breakout session at next year’s meeting.
3) Communicate ML community needs to the ARM Data Center (ADC) and work with them to improve the onboarding process for ML on Stratus and Cumulus.
4) Develop a strategy and/or suggestions for how machine learning can be leveraged for analysis of ARM data and for process understanding under ASR.