A Framework for Analyzing the Context of Discussion in Crowd Clusters

doi:10.23940/ijpe.24.04.p4.224231

Abstract

Social media platforms are now widely used by the public to express opinions on sensitive matters. One of the active research challenges in social media analytics is web content mining and context analysis, which helps identify events or incidents that are actively discussed among many users, possibly from specific geographical areas. Identifying such events or incidents can provide an early warning in many unusual situations [1]. However, semantic processing of social media content is challenging because of its high complexity, ambiguity, and unstructured nature. In this work, we propose a framework for identifying crowd clusters and analyzing the context of discussion within them, using the Online Spherical K-means algorithm and Natural Language Processing (NLP) techniques. First, tweets are scraped from Twitter and passed through suitable data preprocessing steps. Next, clusters are identified from the cleaned data using the Online Spherical K-means algorithm. Finally, the context of discussion in each cluster is analyzed and visualized with suitable NLP methods. The proposed method is evaluated on tweets scraped with three hashtags: #blacklivesmatter, #Superbowl, and #Texasfreeze. For performance evaluation, we computed the homogeneity score, completeness score, Calinski-Harabasz index, and V-measure. The performance metrics show that the proposed method yields promising results.
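To make the clustering step concrete, the following is a minimal sketch of online spherical k-means in plain Python. It is not the authors' implementation. It assumes each tweet has already been converted to a numeric vector; the 2-D toy vectors, the function names, the exponentially decaying learning rate, and all parameter defaults are illustrative assumptions. Each vector is scaled to unit length, the centroid with the highest cosine similarity wins, and only that centroid is nudged toward the vector and re-normalized.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product; for unit vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def online_spherical_kmeans(vectors, k, epochs=10,
                            eta_start=1.0, eta_end=0.01, seed=0):
    """Cluster vectors by cosine similarity with one-sample-at-a-time updates.

    All parameter names and defaults here are assumptions for illustration.
    """
    rng = random.Random(seed)
    data = [normalize(v) for v in vectors]
    # Initialise centroids from k distinct data points.
    centroids = [list(c) for c in rng.sample(data, k)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            # Learning rate decays exponentially from eta_start towards eta_end.
            eta = eta_start * (eta_end / eta_start) ** (step / total_steps)
            # Winner-take-all: pick the centroid most similar to x.
            winner = max(range(k), key=lambda j: dot(x, centroids[j]))
            # Move the winner towards x, then project back onto the unit sphere.
            moved = [c + eta * xi for c, xi in zip(centroids[winner], x)]
            centroids[winner] = normalize(moved)
            step += 1
    # Final assignment of every vector to its most similar centroid.
    labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in data]
    return labels, centroids


# Toy stand-ins for tweet embeddings: two groups pointing in different directions.
toy = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2],
       [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]]
labels, centroids = online_spherical_kmeans(toy, k=2)
print(labels)
```

Because similarity is measured by direction rather than magnitude, this style of clustering suits text embeddings, where vector length carries little meaning. The cluster ids themselves are arbitrary and depend on the random seed.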

Keywords: text processing, clustering, entity recognition module, tweets, embeddings, visualization

Bibal Benifa J. V, Joel Mathew Philip, Christy K T, and Anu K P. A Framework for Analyzing the Context of Discussion in Crowd Clusters [J]. Int J Performability Eng, 2024, 20(4): 224-231.


References

[1] Watanabe K., Ochi M., Okabe M., and Onai R. Jasmine: A Real-Time Local-Event Detection System Based on Geolocation Information Propagated to Microblogs. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 2541-2544, 2011.
[2] Bird S. NLTK: The Natural Language Toolkit. In Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions, pp. 69-72, 2006.
[3] Cer D., Yang Y., Kong S.Y., Hua N., Limtiaco N., John R.S., Constant N., Guajardo-Cespedes M., Yuan S., Tar C., and Strope B. Universal Sentence Encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 169-174, 2018.
[4] Sato K., Wang J., and Cheng Z. Detecting Real-Time Events using Tweets. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, pp. 1-6, 2016.
[5] Khalifa M.B., Diaz Redondo R.P., Vilas A.F., and Rodríguez S.S. Identifying Urban Crowds using Geo-Located Social Media Data: A Twitter Experiment in New York City. Journal of Intelligent Information Systems, vol. 48, pp. 287-308, 2017.
[6] Bojanowski P., Grave E., Joulin A., and Mikolov T. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, vol. 5, pp. 135-146, 2017.
[7] Zhong S. Efficient Online Spherical K-Means Clustering. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, IEEE, vol. 5, pp. 3180-3185, 2005.
[8] d'Sa A.G., Illina I., and Fohr D. BERT and fastText Embeddings for Automatic Detection of Toxic Speech. In 2020 International Multi-Conference on: "Organization of Knowledge and Advanced Technologies" (OCTA), IEEE, pp. 1-5, 2020.
[9] Sriram S. An Evaluation of Text Representation Techniques for Fake News Detection using: TF-IDF, Word Embeddings, Sentence Embeddings with Linear Support Vector Machine, 2020.
[10] JafariAsbagh M., Ferrara E., Varol O., Menczer F., and Flammini A. Clustering Memes in Social Media Streams. Social Network Analysis and Mining, vol. 4, pp. 1-13, 2014.
[11] Huang F., Li X., Zhang S., Zhang J., Chen J., and Zhai Z. Overlapping Community Detection for Multimedia Social Networks. IEEE Transactions on Multimedia, vol. 19, no. 8, pp. 1881-1893, 2017.
