Residential Area Energy Consumption Big Data Analytics and Visualization

Gupta, Ragini

Date

2018-06

Authors

Gupta, Ragini

Advisor

Al-Ali, Abdulrahman
Zualkernan, Imran

Type

Thesis

Files

35.232-2018.21 Ragini Gupta.pdf

Adobe PDF, 3.21 MB

Description

A Master of Science thesis in Computer Engineering by Ragini Gupta entitled “Residential Area Energy Consumption Big Data Analytics and Visualization”, submitted in June 2018. The thesis advisor is Dr. Abdulrahman Al-Ali and the thesis co-advisor is Dr. Imran Zualkernan. Soft and hard copies are available.

Abstract

As Internet of Things (IoT) technology and open source distributed file system applications evolve, home appliances can be monitored and controlled via an IoT-based home gateway. These gateways collect energy consumption readings from home appliances and therefore generate a large amount of data. Because of this volume, utility companies require platforms that enable them to store, process, analyze, visualize, and monetize the energy consumption data, and to gain meaningful insights into load profiles. This thesis proposes a residential area smart energy management system that enables home owners and utilities to monitor the consumption patterns of each home, community, state, and country. Using open source distributed file system tools, home owners can monitor the energy consumption of their home appliances on a periodic basis. Utilities can additionally monitor consumption at the neighborhood, community, state, and country levels. The architecture was tested by processing data from one million smart meters. This data was synthetically generated based on one year of real consumption data from a single home. The data was stored in a four-node Hadoop cluster. Dimensional modeling was used to develop benchmarking queries that feed a real-time dashboard consisting of charts, graphs, and reports for home owners and utilities. Both Spark and Hive were used to implement the benchmarking queries, and Spark was found to outperform Hive in terms of latency and processor throughput. When processing data from one million smart meters on the four-node cluster, Spark's average latency was fifteen minutes with an average throughput of 2400 MBps, while Hive's average latency was thirty-four minutes with an average throughput of 2200 MBps. To validate the outcomes of the proposed system, the results were compared with existing proprietary tools such as IBM's TimeSeries and with relational database management systems. Spark and Hive showed intermediate performance in comparison to IBM's proprietary tool and the relational database management systems. The results demonstrate that the proposed solution can be used to provide energy consumption data visualization for consumer and utility provider stakeholders, with Spark as the backend processing engine for low latency, performance gain, and high throughput.
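
The reported Spark and Hive figures can be put side by side with a little arithmetic. The sketch below is only an illustration based on the four averages quoted in the abstract; it is not code from the thesis, and the names `REPORTED` and `compare` are hypothetical.

```python
# Averages as quoted in the abstract (one million smart meters, four-node cluster).
# The dictionary layout and function name are illustrative assumptions.
REPORTED = {
    "Spark": {"latency_min": 15, "throughput_mb_per_s": 2400},
    "Hive": {"latency_min": 34, "throughput_mb_per_s": 2200},
}


def compare(fast, slow):
    """Return relative latency and throughput differences between two engines."""
    return {
        # How many times longer the slower engine takes.
        "speedup": slow["latency_min"] / fast["latency_min"],
        # Fraction of the slower engine's latency that is saved.
        "latency_reduction": 1 - fast["latency_min"] / slow["latency_min"],
        # Fractional throughput increase of the faster engine.
        "throughput_gain": fast["throughput_mb_per_s"] / slow["throughput_mb_per_s"] - 1,
    }


result = compare(REPORTED["Spark"], REPORTED["Hive"])
print(f"Speedup: {result['speedup']:.2f}x")
print(f"Latency reduction: {result['latency_reduction']:.1%}")
print(f"Throughput gain: {result['throughput_gain']:.1%}")
```

On these averages, Spark completes the benchmark in a bit less than half of Hive's time, while the throughput difference is under ten percent. The abstract reports averages only, so this comparison says nothing about the variation across individual queries.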