
Development of a Model for Identifying Drug Organizations and Their Scale through Tweet Clustering

  • Jin-Gyeong Kim (Dept. of Computer Information Engineering, Graduate School, Daegu University)
  • Eun-Young Park (Dept. of Computer Engineering, Daegu University)
  • Da-Sol Kim (Dept. of Computer Engineering, Daegu University)
  • Cho-Won Kim (Dept. of Computer Engineering, Daegu University)
  • Jiyeon Kim (Dept. of Computer Engineering, Daegu University)
  • Received: 2024.09.09
  • Reviewed: 2024.09.30
  • Published: 2024.10.31

Abstract

This paper proposes a clustering model that identifies drug trafficking organizations and estimates their scale from drug-promotion tweets collected on the social media platform 'X'. The aim is to support investigations into drug crimes, which occur frequently among teenagers and young adults. Cybercrimes that exploit the anonymity of social media, including drug distribution, illegal gambling, and sex offenses, have been increasing. Drug trafficking organizations in particular operate as decentralized cell structures: each member receives anonymous instructions covering only their own role and has no direct connection to other members. To track this type of crime, we design experimental scenarios that combine text embedding models, such as BERT (Bidirectional Encoder Representations from Transformers) and GloVe (Global Vectors for Word Representation), with clustering algorithms, such as K-means Clustering and Spectral Clustering. The clustering results from each scenario are validated using Jaccard Similarity and an exhaustive review of the results. We then analyze the tweet clusters that every scenario identifies as the same drug organization, in order to identify the accounts that should receive the highest tracking priority in a cyber investigation.
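The abstract does not spell out how Jaccard Similarity is applied, so the following is only a minimal sketch of one plausible reading: clusters produced by two scenarios are compared as sets of tweet IDs, and cluster pairs whose overlap exceeds a threshold are treated as the same group. The tweet IDs, the scenario names, the `match_clusters` helper, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from itertools import product


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def clusters_from_labels(labels: dict) -> dict:
    """Group tweet IDs by their assigned cluster label."""
    groups = {}
    for tweet_id, label in labels.items():
        groups.setdefault(label, set()).add(tweet_id)
    return groups


def match_clusters(labels_a: dict, labels_b: dict, threshold: float = 0.5) -> list:
    """Return (label_a, label_b, score) for cluster pairs with score >= threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    clusters_a = clusters_from_labels(labels_a)
    clusters_b = clusters_from_labels(labels_b)
    matches = []
    for (la, sa), (lb, sb) in product(clusters_a.items(), clusters_b.items()):
        score = jaccard(sa, sb)
        if score >= threshold:
            matches.append((la, lb, score))
    # Highest-overlap pairs first.
    return sorted(matches, key=lambda m: -m[2])


# Hypothetical cluster assignments from two embedding/clustering scenarios.
scenario_bert_kmeans = {"t1": 0, "t2": 0, "t3": 0, "t4": 1, "t5": 1}
scenario_glove_spectral = {"t1": 1, "t2": 1, "t3": 0, "t4": 0, "t5": 0}

matches = match_clusters(scenario_bert_kmeans, scenario_glove_spectral)
for label_a, label_b, score in matches:
    print(f"cluster {label_a} (scenario A) ~ cluster {label_b} (scenario B): {score:.2f}")
```

Because cluster labels are arbitrary in each run, comparing membership sets rather than label values is what makes results from different scenarios comparable. Extending the comparison to more than two scenarios would keep only the groups matched in every pair.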


Funding Information

This work was supported by the 'Tech. Challenge for Future Program Policing' (www.kipot.or.kr), funded by the Ministry of Science and ICT (MSIT, Korea) and the Korean National Police Agency (KNPA, Korea) [Project Name: Development of Active Dark Web Information Collection, Analysis and Tracking Technology to Prevent Dark Web Crime; Project Number: RS-2023-00244362]. This work was also supported by a Korea Foundation for Women In Science, Engineering and Technology (WISET) grant, funded by the Ministry of Science and ICT (MSIT), under the team research program for female engineering students.
