Spam Email Detection Using Multinomial Naive Bayes with the Bag of Words Technique
DOI: https://doi.org/10.55681/sentri.v5i2.5650
Keywords: Spam Classification, Spam Detection, Email Spam, Multinomial Naive Bayes, Bag of Words
Abstract
Email is a means of exchanging information over internal networks and the internet, and it remains widely used because of its ease of use. As the volume of incoming email has grown, however, so has the problem of spam, and users need effective spam detection to manage their email efficiently and to avoid fraud and disruption. This study analyzes the thematic and linguistic patterns of email messages based on their content, using text classification with the Multinomial Naive Bayes algorithm, which is believed to have good accuracy in detecting spam emails. The research consists of collecting a dataset of Indonesian-language spam emails, preprocessing the data, training the model under two scenarios (with and without stemming), and evaluating the model. Features from the email text are converted into numerical representations using the Bag-of-Words method. Classification performance is evaluated using accuracy, precision, recall, F1-Score, and the confusion matrix. Experimental results show that the Multinomial Naive Bayes model without stemming achieved the highest performance, with an Accuracy of 92.5%, a Precision of 91.0%, and an F1-Score of 91.7%. These findings indicate that stemming short texts such as spam emails removes affixes, which are crucial semantic features characteristic of spam. The study contributes a pre-processing recommendation for Indonesian short-text classification.
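The pipeline described in the abstract (Bag-of-Words counts, a Multinomial Naive Bayes classifier, and evaluation from the confusion matrix) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the whitespace tokenizer, the Laplace smoothing parameter `alpha`, the `evaluate` helper, and the four toy messages are all assumptions made for illustration, and no stemming is applied, mirroring only the scenario the abstract reports as better performing.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; no stemming is applied (assumption)."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over Bag-of-Words counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value, not from the paper)

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        # Bag-of-Words: one word-count table per class.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        docs_per_class = Counter(labels)
        self.log_prior = {
            c: math.log(docs_per_class[c] / len(labels)) for c in self.classes
        }
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            denom = self.total[c] + self.alpha * len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][word] + self.alpha) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive="spam"):
    """Confusion-matrix counts and the metrics named in the abstract."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy data, invented for this sketch (not from the study's dataset).
train_texts = [
    "menang hadiah gratis klik sekarang",
    "promo gratis pulsa klik link",
    "rapat tim besok pagi",
    "laporan proyek sudah dikirim",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = MultinomialNB().fit(train_texts, train_labels)
test_texts = ["klik hadiah gratis", "rapat proyek besok"]
test_labels = ["spam", "ham"]
predictions = [model.predict(t) for t in test_texts]
print(predictions)
print(evaluate(test_labels, predictions))
```

To compare the two scenarios in the abstract, one could swap `tokenize` for a version that passes each token through an Indonesian stemmer and rerun the same evaluation. Because a stemmer would map affixed forms to a shared root, the per-class count tables would lose the distinction between those forms, which is the effect the abstract attributes to stemming.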
License
Copyright (c) 2026 Widya Mulyaningtyas, Kusrini Kusrini

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.





