Identifying Mobile Owner based on Authorship Attribution using WhatsApp Conversation

  • Received: 2021.07.05
  • Published: 2021.07.30

Abstract

Social media is increasingly part of daily life as a means of communicating with one another. Many tools and applications support this communication, and identity theft is therefore a common problem among their users. A new form of identity theft occurs when cybercriminals break into a WhatsApp account, pose as real friends, and demand money or resort to emotional blackmail. To help prevent such attacks, data mining can be applied to text classification (TC) for authorship attribution (AA) analysis, so that the original sender of a message can be recognized. Arabic is one of the most widely spoken languages in the world and has many variants. In this research, we built a machine learning model that mines and analyzes Arabic messages in the Saudi dialect to identify their author. The work addresses several aspects of authorship attribution mining and analysis, including the collection of Arabic messages in the Saudi dialect and the filtering of message tokens. Classification uses a cross-validation technique and different machine-learning algorithms (Naïve Bayes and Support Vector Machine). The average accuracy of Naïve Bayes and Support Vector Machine is reported, and suggestions for future work are presented.
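The pipeline the abstract outlines (tokenize messages, train a classifier per author, score it with cross-validation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not describe the features, tooling, or data, so the bag-of-words tokenizer, the add-one smoothing, the fold assignment, the names `tokenize`, `NaiveBayes`, and `cross_validate`, and the toy messages are all hypothetical. Only Naïve Bayes is shown; the Support Vector Machine is omitted to keep the sketch short.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(message):
    """Split a message into runs of Unicode letters, digits, and underscores."""
    return re.findall(r"\w+", message.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, messages, authors):
        self.word_counts = defaultdict(Counter)  # author -> token frequencies
        self.author_counts = Counter(authors)    # author -> number of messages
        self.vocab = set()
        for message, author in zip(messages, authors):
            tokens = tokenize(message)
            self.word_counts[author].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, message):
        tokens = tokenize(message)
        total_messages = sum(self.author_counts.values())
        best_author, best_score = None, -math.inf
        for author in sorted(self.author_counts):
            counts = self.word_counts[author]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.author_counts[author] / total_messages)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_author, best_score = author, score
        return best_author


def cross_validate(messages, authors, k):
    """Mean accuracy over k folds; fold i holds out items i, i + k, i + 2k, ..."""
    pairs = list(zip(messages, authors))
    accuracies = []
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        held_out = [p for i, p in enumerate(pairs) if i % k == fold]
        model = NaiveBayes().fit(*zip(*train))
        correct = sum(model.predict(m) == a for m, a in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Invented placeholder messages (not data from the paper), alternating authors.
messages = [
    "hello my friend how are you",
    "yo bro whats up",
    "hello friend see you soon",
    "bro call me asap",
    "how are you my friend",
    "whats up bro call me",
]
authors = ["A", "B", "A", "B", "A", "B"]

mean_accuracy = cross_validate(messages, authors, k=3)
print(f"mean accuracy over 3 folds: {mean_accuracy:.2f}")
```

In a setting like the one described, a message whose predicted author differs from the account owner could be flagged as suspicious. A real system would need far more data per author and a tokenizer tuned to Arabic script, for example one that handles diacritics, which `\w` does not match.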
