TY - GEN
T1 - Statistical Detection of Data Drift in Real-time Social Network Conversations
AU - Pujari, Chetana
AU - Sumith, N.
AU - Pooja, S.
AU - Chandrakala, C. B.
AU - Prabhu, Vibha
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - The increasing reliance on conversational datasets for natural language processing (NLP) applications necessitates a comprehensive understanding of potential data drift. This paper investigates data drift within conversational datasets over time, aiming to develop effective methods for its detection and mitigation. Our approach analyzes temporal changes in the distribution of conversation data, focusing on linguistic patterns, user preferences, and contextual nuances. We propose a novel framework that leverages advanced statistical methods and machine learning techniques to quantify and detect data drift within a dataset. The methodology is designed to adapt to the evolving nature of language use, capturing subtle shifts in conversational dynamics that may impact model performance. Experimental results on a diverse set of conversational datasets demonstrate the efficacy of our approach in identifying and characterizing data drift. The findings highlight the importance of continuous monitoring and adaptation to evolving linguistic patterns in ensuring the robustness and generalization capability of NLP models over time. This research contributes to the broader understanding of data drift in conversational datasets and provides a foundation for developing adaptive NLP models capable of maintaining high performance in dynamic linguistic environments. The proposed framework not only enhances the reliability of existing models but also lays the groundwork for future research on the evolving challenges posed by data drift in natural language conversations.
UR - http://www.scopus.com/inward/record.url?scp=85211081739&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85211081739&partnerID=8YFLogxK
U2 - 10.1109/ICCCNT61001.2024.10724556
DO - 10.1109/ICCCNT61001.2024.10724556
M3 - Conference contribution
AN - SCOPUS:85211081739
T3 - 2024 15th International Conference on Computing Communication and Networking Technologies, ICCCNT 2024
BT - 2024 15th International Conference on Computing Communication and Networking Technologies, ICCCNT 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th International Conference on Computing Communication and Networking Technologies, ICCCNT 2024
Y2 - 24 June 2024 through 28 June 2024
ER -