Improving low-cloud fraction prediction through machine learning

Submitter

Li, Zhanqing — University of Maryland
Zhang, Haipeng — University of Maryland, College Park

Area of research

Cloud Processes

Journal Reference

Zhang H, Y Zheng, and Z Li. 2024. "Improving Low‐Cloud Fraction Prediction Through Machine Learning." Geophysical Research Letters, 51(15), e2024GL109735, 10.1029/2024GL109735.

Science

Low clouds exert a strong radiative cooling effect on Earth’s climate. Predicting low-cloud fraction (LCF) is nonetheless challenging for global climate models (GCMs), partly because of deficiencies in their cloud parameterization schemes. Two longstanding issues persist in GCMs: (a) they predict too few low clouds, and (b) they simulate an overly rapid stratocumulus-to-cumulus transition (SCT). Machine learning (ML) models may help close these gaps because they offer an efficient, economical, and accurate way to make predictions.

Impact

To improve LCF prediction, ML models (XGBoost) are built on major large-scale meteorological factors, circumventing the need to represent unresolved small-scale processes (e.g., boundary-layer turbulence and entrainment). LCFs from the ML models, from two generations of the Community Atmosphere Model (CAM5 and CAM6), and from ERA5 reanalysis data are evaluated against the CERES SYN Ed4 satellite LCF product. The ML models perform best at predicting LCF, both in overall error statistics (bias, RMSE, and correlation coefficient) and in spatial patterns across the full range of atmospheric stability and large-scale vertical velocity. These improvements alleviate the common problem of “too few” low clouds in GCMs. The ML models also represent the stratocumulus-to-cumulus transition markedly better, in contrast to the overly rapid decrease in LCF simulated by the two CAMs and ERA5.
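The three overall error statistics named above can be sketched as follows. This is a minimal illustration, not the study's evaluation code: it assumes predicted and observed LCF values have already been matched sample by sample (for example, on a common grid), and the optional `weights` argument (such as cos(latitude) for area weighting) is a hypothetical addition that the summary does not describe.

```python
import math


def lcf_error_stats(pred, obs, weights=None):
    """Return (bias, rmse, r) for predicted vs. observed LCF samples.

    pred, obs: equal-length sequences of LCF values in the same units.
    weights: optional non-negative per-sample weights (hypothetical option,
    e.g. cos(latitude) for area weighting); None means equal weights.
    """
    n = len(pred)
    if n == 0 or n != len(obs):
        raise ValueError("pred and obs must be non-empty and equal in length")
    w = [1.0] * n if weights is None else [float(v) for v in weights]
    if len(w) != n or sum(w) <= 0:
        raise ValueError("weights must match the data length and sum to > 0")
    wsum = sum(w)
    triples = list(zip(w, pred, obs))

    # Weighted means of prediction and observation.
    mean_p = sum(wi * p for wi, p, _ in triples) / wsum
    mean_o = sum(wi * o for wi, _, o in triples) / wsum

    # Bias: mean prediction minus mean observation (negative => too few clouds).
    bias = mean_p - mean_o

    # RMSE: square root of the weighted mean squared difference.
    rmse = math.sqrt(sum(wi * (p - o) ** 2 for wi, p, o in triples) / wsum)

    # Pearson correlation from weighted covariance and variances.
    cov = sum(wi * (p - mean_p) * (o - mean_o) for wi, p, o in triples) / wsum
    var_p = sum(wi * (p - mean_p) ** 2 for wi, p, _ in triples) / wsum
    var_o = sum(wi * (o - mean_o) ** 2 for wi, _, o in triples) / wsum
    if var_p > 0 and var_o > 0:
        r = cov / math.sqrt(var_p * var_o)
    else:
        r = float("nan")  # correlation undefined for constant series
    return bias, rmse, r


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    print(lcf_error_stats([0.3, 0.5, 0.7], [0.2, 0.5, 0.6]))
```

A negative bias in this convention corresponds to the “too few” low clouds problem; RMSE and correlation then capture the magnitude of errors and how well the spatial or temporal pattern is reproduced.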

Summary

This study evaluates how well ML models (specifically XGBoost) predict LCF compared with the traditional cloud schemes employed in CAM5, CAM6, and the ERA5 reanalysis. To reduce the impact of errors in the simulated large-scale meteorological factors, we nudged the wind speeds, temperature, and moisture in CAM5 and CAM6 toward ERA5. Using an interpretable ML approach (SHAP), we find that including the effect of moisture source in the ML models is crucial to representing spatiotemporal variations in LCF in the midlatitudes; notably, this key variable has not been used in popular cloud fraction parameterizations. The results indicate that ML can significantly improve the simulation of low clouds and promise to address the longstanding issues of too few low clouds and too rapid an SCT in GCMs. Despite these promising results, further work is needed to refine the ML models before they can be employed in climate models.
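SHAP attributes each prediction to its input features using Shapley values from cooperative game theory. The sketch below computes exact Shapley values by enumerating all feature coalitions for a tiny model. It is an illustration of the concept only: the two-feature toy model and its feature names (`eis`, `moist`) are hypothetical and are not the study's XGBoost models or predictors, and absent features are simply set to a baseline value, whereas SHAP implementations for tree models may define the absent-feature expectation differently.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value
    (a simple "baseline Shapley" value function). Cost grows as 2**n,
    so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 (excluding feature i)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_lcf(z):
    """Hypothetical LCF model with an interaction term (illustration only)."""
    eis, moist = z
    return 0.3 + 0.02 * eis + 0.1 * moist + 0.01 * eis * moist


if __name__ == "__main__":
    x, base = [5.0, 1.0], [0.0, 0.0]
    phi = exact_shapley(toy_lcf, x, base)
    print(phi, sum(phi), toy_lcf(x) - toy_lcf(base))
```

The attributions always sum to the difference between the prediction and the baseline prediction (the efficiency property), and the interaction term is split equally between the two features, which is what makes per-feature rankings, such as the importance of moisture source reported above, interpretable.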

Comparison of global LCF between observations (CERES SYN Ed4 product) and different models. (a) Climatological mean LCF for 2004–2005 from observations. The other panels show LCF differences between each model and the observations: (b) CAM6 with COSP enabled, (c) CAM5 with COSP enabled, (d) ERA5, (e) XGB7, and (f) XGB10. The black 10° × 10° boxes mark the eight selected regions where stratocumulus decks are prevalent, as identified by Klein & Hartmann (1993): North Atlantic (NA), North Pacific (NP), Northeast Pacific (NEP), Northeast Atlantic (NEA), Southeast Pacific (SEP), Southeast Atlantic (SEA), Southeast Indian Ocean (SEI), and Southern Ocean (SO). Image from the journal article.