Int J Performability Eng, 2024, Vol. 20, Issue (4): 224-231. doi: 10.23940/ijpe.24.04.p4.224231


A Framework for Analyzing the Context of Discussion in Crowd Clusters

Bibal Benifa J. V*, Joel Mathew Philip, Christy K T, and Anu K P   

  1. Indian Institute of Information Technology Kottayam, Kerala, India
  • Contact: * E-mail address: benifa@iiitkottayam.ac.in

Abstract: Social media platforms are now widely used by the public to express opinions on various sensitive matters. Web content mining and context analysis remain active research challenges in social media analytics. Such analysis helps identify events or incidents that are actively discussed among many users, who may be concentrated in specific geographical areas, and identifying these events can provide early warning of unusual situations [1]. However, semantic processing of social media content is challenging because of its complexity, ambiguity, and unstructured nature. In this work, we propose a framework for identifying crowd clusters and analyzing the context of discussion within them, using the Online Spherical K-means algorithm together with Natural Language Processing (NLP) techniques. First, tweets are scraped from Twitter and undergo suitable preprocessing. Next, clusters are identified from the cleaned data using the Online Spherical K-means algorithm. Finally, the context of discussion in each cluster is analyzed and visualized with suitable NLP methods. The proposed method is evaluated on tweets scraped for three hashtags: #blacklivesmatter, #Superbowl, and #Texasfreeze. For performance evaluation, we computed the homogeneity score, completeness score, Calinski-Harabasz index, and V-measure. These metrics indicate that the proposed method yields promising results.
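The abstract names Online Spherical K-means for clustering and homogeneity, completeness, and V-measure for evaluation. The following is a minimal, dependency-free Python sketch of both, assuming tweets have already been converted into numeric vectors (for example TF-IDF rows or embeddings). The function names, the random centroid initialization, and the exponential learning-rate decay are illustrative assumptions, not details taken from the paper. In practice, scikit-learn's `sklearn.metrics` module provides `homogeneity_score`, `completeness_score`, `v_measure_score`, and `calinski_harabasz_score`.

```python
import math
import random
from collections import Counter

# Illustrative sketch only: not the authors' implementation.


def normalize(vec):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product; for unit vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def online_spherical_kmeans(data, k, epochs=5, eta0=1.0, eta_final=0.01, seed=0):
    """Cluster vectors by cosine similarity, updating one centroid per sample.

    Each sample is assigned to the centroid with the highest dot product,
    that centroid is moved toward the sample by the current learning rate,
    and the result is re-normalized onto the unit sphere. The learning rate
    decays exponentially from eta0 to eta_final (assumed schedule).
    """
    rng = random.Random(seed)
    points = [normalize(x) for x in data]
    # Assumed initialization: k samples drawn without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    total_steps = epochs * len(points)
    step = 0
    for _ in range(epochs):
        order = list(range(len(points)))
        rng.shuffle(order)
        for i in order:
            x = points[i]
            eta = eta0 * (eta_final / eta0) ** (step / total_steps)
            best = max(range(k), key=lambda j: dot(centroids[j], x))
            moved = [c + eta * xi for c, xi in zip(centroids[best], x)]
            centroids[best] = normalize(moved)
            step += 1
    # Final hard assignment of every sample to its nearest centroid.
    labels = [max(range(k), key=lambda j: dot(centroids[j], p)) for p in points]
    return labels, centroids


def _entropy(counts, n):
    """Shannon entropy (natural log) of a label distribution given its counts."""
    return -sum((m / n) * math.log(m / n) for m in counts if m > 0)


def homogeneity_completeness_v(labels_true, labels_pred):
    """Return (homogeneity, completeness, V-measure) computed from label entropies."""
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    h_c = _entropy(class_sizes.values(), n)
    h_k = _entropy(cluster_sizes.values(), n)
    # Conditional entropies H(C|K) and H(K|C) from the contingency counts.
    h_c_given_k = -sum((m / n) * math.log(m / cluster_sizes[kl])
                       for (_, kl), m in joint.items())
    h_k_given_c = -sum((m / n) * math.log(m / class_sizes[cl])
                       for (cl, _), m in joint.items())
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    total = homogeneity + completeness
    v_measure = 0.0 if total == 0 else 2 * homogeneity * completeness / total
    return homogeneity, completeness, v_measure


if __name__ == "__main__":
    # Toy stand-ins for tweet vectors; the values are made up for illustration.
    vectors = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
    gold = [0, 0, 1, 1]
    labels, _ = online_spherical_kmeans(vectors, k=2, seed=1)
    print(labels, homogeneity_completeness_v(gold, labels))
```

Because V-measure is the harmonic mean of homogeneity and completeness, it is invariant to how cluster IDs are numbered, so predicted labels do not need to match the ground-truth label values.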

Key words: text processing, clustering, entity recognition module, tweets, embeddings, visualization