Enabling Big Data Analysis Platform for Popular ARM Data Streams

Authors

Kyle K Dumas (Quicklooks) — Oak Ridge National Laboratory
Jitendra Kumar — Oak Ridge National Laboratory
Giri Prakash — Oak Ridge National Laboratory

Category

ARM infrastructure

Description

The aim of this poster is to present a modular analytics infrastructure for exploring ARM data. This involves working with scientific users to build end-to-end solutions for the conceptual, physical, and analytical storage and delivery of data, using current technologies in NoSQL databases, processing frameworks for data analysis, and visualization tools. The high-level requirements were gathered from various stakeholders during past ARM and ASR meetings. These include scientific users who want advanced search capabilities based on measurement values; translators who want to use visualization to summarize and study the data, so that variables can be assessed without downloading large amounts of data; and the DQ Office, which wants to perform long-term data quality (DQ) analysis of historical datasets using a NoSQL platform and machine learning algorithms.

This work will therefore consist of prototypes based on the requirements gathered from stakeholders, along with improvements to an existing production application, the LASSO Bundle Browser. The author will present working prototypes of advanced data querying applications for a few popular datastreams, as well as expanded outlier identification based on the valid minimum and maximum values defined for a variable in netCDF. These prototypes will provide a means of interacting with stakeholders and capturing their feedback.
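The outlier identification mentioned above rests on a simple check: a value is flagged when it falls below a variable's valid minimum or above its valid maximum. The sketch below is a minimal illustration under stated assumptions, not the prototype's actual code. It assumes the variable's attributes arrive as a plain mapping, as a netCDF reader would expose them. It follows the CF convention that a two-element `valid_range`, if present, supplies both bounds. It treats `_FillValue` and NaN entries as missing data rather than outliers. The names `valid_bounds`, `flag_outliers`, and `OutlierReport` are hypothetical, and the sample values are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional, Sequence, Tuple


@dataclass
class OutlierReport:
    indices: List[int]          # positions of values outside the valid bounds
    n_checked: int              # number of non-missing values examined
    valid_min: Optional[float]  # lower bound used (None if not defined)
    valid_max: Optional[float]  # upper bound used (None if not defined)


def valid_bounds(attrs: Mapping[str, object]) -> Tuple[Optional[float], Optional[float]]:
    """Return (valid_min, valid_max) from netCDF-style variable attributes.

    Assumption: a CF-style two-element valid_range takes precedence;
    otherwise valid_min and valid_max are read individually, and either
    may be absent.
    """
    if "valid_range" in attrs:
        lo, hi = attrs["valid_range"]
        return float(lo), float(hi)
    lo = attrs.get("valid_min")
    hi = attrs.get("valid_max")
    return (None if lo is None else float(lo), None if hi is None else float(hi))


def flag_outliers(values: Sequence[float], attrs: Mapping[str, object]) -> OutlierReport:
    """Flag values outside the variable's valid bounds, skipping missing data."""
    lo, hi = valid_bounds(attrs)
    fill = attrs.get("_FillValue")
    indices: List[int] = []
    n_checked = 0
    for i, v in enumerate(values):
        if fill is not None and v == fill:
            continue  # fill value: missing data, not an outlier
        if v != v:
            continue  # NaN: also treated as missing
        n_checked += 1
        if (lo is not None and v < lo) or (hi is not None and v > hi):
            indices.append(i)
    return OutlierReport(indices, n_checked, lo, hi)


if __name__ == "__main__":
    # Hypothetical air temperature series (degC) with invented attributes.
    temps = [21.3, 22.0, -9999.0, 75.2, -55.0, 23.1]
    attrs = {"valid_min": -40.0, "valid_max": 50.0, "_FillValue": -9999.0}
    report = flag_outliers(temps, attrs)
    print(report.indices)  # [3, 4]
```

In this example the fill value at index 2 is skipped, and the values 75.2 and -55.0 are flagged. Skipping NaN matters because some readers already replace fill values with NaN on load. One example is xarray with its default CF decoding. Keeping the bounds lookup separate from the scan means the same function could be applied per file or per time chunk when building summaries for a NoSQL store, though that pairing is an assumption here.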