TY - GEN
T1 - Machine Learning and Statistical Techniques for Outlier Detection in Smart Home Energy Consumption
AU - Krishna, N. Sri
AU - Kumar, Y. V. Pavan
AU - Prakash, K. Purna
AU - Reddy, G. Pradeep
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - With the continuing worldwide growth of smart homes, large volumes of energy consumption data have gained the attention of data scientists. Smart meters capture energy consumption readings at a predefined rate and store them in a database. High-quality databases are needed for accurate analysis and decision-making. However, these readings often contain anomalies, namely missingness, redundancy, and outliers, caused by issues in the meter/data communication networks. Among these, outlier readings indicate abnormal load behavior (e.g., nonlinearity, unpredicted load switching, or system faults). Hence, it is essential to detect and visualize such anomalies so that they can be treated. With this motivation, this paper implements several key machine learning and statistical techniques, namely autoregressive integrated moving average (ARIMA), autoencoder, density-based spatial clustering of applications with noise (DBSCAN), isolation forest, k-means, hierarchical density-based spatial clustering of applications with noise (HDBSCAN), one-class support vector machine (SVM), local outlier factor (LOF), long short-term memory (LSTM), winsorization, interquartile range (IQR), and Z-score. The results revealed that DBSCAN consistently demonstrated the most accurate performance in detecting outliers in energy data, while Z-score, IQR, and winsorization provided reasonable outcomes but were limited in handling complex and nonlinear data patterns. Autoencoder, isolation forest, and one-class SVM showed moderate success, but their performance depended on the specific dataset characteristics. K-means exhibited mixed results. ARIMA, LOF, LSTM, and HDBSCAN had limited success in detecting outliers in the time-series data. Thus, this analysis recommends DBSCAN as the best technique, as it consistently outperformed the other machine learning and statistical techniques in accurately detecting outliers in smart home energy consumption data.
UR - https://www.scopus.com/pages/publications/85196147155
U2 - 10.1109/eStream61684.2024.10542609
DO - 10.1109/eStream61684.2024.10542609
M3 - Conference contribution
AN - SCOPUS:85196147155
T3 - 2024 IEEE Open Conference of Electrical, Electronic and Information Sciences, eStream 2024 - Proceedings
BT - 2024 IEEE Open Conference of Electrical, Electronic and Information Sciences, eStream 2024 - Proceedings
A2 - Navakauskas, Dalius
A2 - Paulikas, Sarunas
A2 - Sledevic, Tomyslav
A2 - Udris, Dainius
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 11th IEEE Open Conference of Electrical, Electronic and Information Sciences, eStream 2024
Y2 - 25 April 2024
ER -