Large-Scale Data Analysis and Visualization Using NoSQL Technologies for LASSO, Radar Data, and Beyond

 
Authors

Kyle K Dumas (Quicklooks) — Oak Ridge National Laboratory
Kyle K Dumas — Oak Ridge National Laboratory
William I. Gustafson — Pacific Northwest National Laboratory
Andrew M. Vogelmann — Brookhaven National Laboratory
Tami Fairless — Pacific Northwest National Laboratory
Giri Prakash — Oak Ridge National Laboratory

Category

ARM next generation – Megasite and LES activities

Description

This paper presents a new approach to ARM data discovery through data analysis and visualization services built on NoSQL technology. Two technologies are currently being implemented: Apache Cassandra (a NoSQL database) and Apache Spark (a distributed analytics framework). Both were designed to run in a distributed environment and can therefore store and analyze large data volumes. D3.js is a JavaScript library that generates interactive data visualizations in web browsers using the widely adopted SVG, HTML5, and CSS standards. A NoSQL architecture was recently built and applied to the LES ARM Symbiotic Simulation and Observation (LASSO) workflow. LASSO packages LES output and observations in “data bundles,” and users need to analyze the observations and the LES model output either individually or together across multiple time periods. The LASSO implementation strategy implies that storing these quantities requires a very large amount of storage. NoSQL was therefore used to store portions of the data so that users can search on each simulation’s traits through a web application. Based on the user’s selection, plots are created dynamically along with ancillary information that helps users locate and download the data that match their required traits. The NoSQL architecture and the visualization solutions were also scaled to support radar data, producing profile and time series plots. A second application being implemented with NoSQL technology is advanced data search. Currently, ARM data are searched by metadata such as site name, instrument name, and date. Search can be improved by also allowing lookups on the measurement values themselves. To test the performance of NoSQL for ARM data, we will use ARM’s popular measurements to locate data based on their values.
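
The value-based search described above can be illustrated with a minimal sketch. It is not the ARM implementation and does not use Cassandra or Spark. It is a plain-Python, in-memory stand-in that mimics one plausible layout, assuming rows are grouped by a (site, measurement) partition and kept sorted by measurement value so that a range lookup only touches the matching slice. The class name `ValueIndex`, the measurement name `temperature_c`, and all sample readings are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


class ValueIndex:
    """Toy in-memory index: one partition per (site, measurement), sorted by value.

    Hypothetical illustration only; a real deployment would use a database table.
    """

    def __init__(self):
        # (site, measurement) -> (sorted values, timestamps in the same order)
        self._partitions = {}

    def insert(self, site, measurement, timestamp, value):
        values, stamps = self._partitions.setdefault((site, measurement), ([], []))
        pos = bisect_right(values, value)  # keep the partition ordered by value
        values.insert(pos, value)
        stamps.insert(pos, timestamp)

    def search(self, site, measurement, low, high):
        """Return (timestamp, value) pairs with low <= value <= high, ordered by value."""
        values, stamps = self._partitions.get((site, measurement), ([], []))
        start = bisect_left(values, low)
        end = bisect_right(values, high)
        return list(zip(stamps[start:end], values[start:end]))


# Made-up sample readings, not real ARM data.
index = ValueIndex()
index.insert("sgp", "temperature_c", "2016-06-11T12:00", 31.2)
index.insert("sgp", "temperature_c", "2016-06-11T13:00", 33.8)
index.insert("sgp", "temperature_c", "2016-06-11T03:00", 19.5)
index.insert("nsa", "temperature_c", "2016-06-11T12:00", 2.1)

# Find when the temperature at one site fell between 30 and 40 degrees.
hot = index.search("sgp", "temperature_c", 30.0, 40.0)
print(hot)
```

The point of the layout is that the query starts from the measurement value and returns the times at which it occurred, whereas a metadata search starts from site, instrument, and date and leaves the values to be inspected after download.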