Comparison of K-NN, SVM, and Random Forest Algorithm for Detecting Hoax on Indonesian Election 2024

Indra; Agus Umar Hamdani; Suci Setiawati; Zena Dwi Mentari; Mauridhy Hery Purnomo

doi:10.23887/janapati.v13i1.76079

Authors

Indra, Universitas Budi Luhur
Agus Umar Hamdani, Informatics Engineering, Faculty of Information Technology, Universitas Budi Luhur, Jakarta
Suci Setiawati, Informatics Engineering, Faculty of Information Technology, Universitas Budi Luhur, Jakarta
Zena Dwi Mentari, Informatics Engineering, Faculty of Information Technology, Universitas Budi Luhur, Jakarta
Mauridhy Hery Purnomo, Department of Computer Engineering, Institut Teknologi Sepuluh Nopember

Keywords:

Indonesian Election 2024, TF-IDF, K-NN, Tweet, Hoax Detection

Abstract

During 2022, the Indonesian National Police (POLRI) received 113 reports concerning the spread of hoax news about the 2024 Indonesian General Election (PEMILU). Relatively few hoax detection tools exist in Indonesia. This research builds a system that detects hoax news in Indonesian-language tweets about the 2024 Indonesian Election by comparing three methods: K-NN, SVM, and Random Forest. Tweets were labeled for model building by validating them against ground-truth data from the fact-checking sources cekfakta.tempo, cekfakta.kompas, and turnbackhoax.id. We also compare different distance measures when applying the K-NN algorithm. Features are extracted using TF-IDF. The experiments show that the highest accuracy, 86.36%, is achieved both by SVM and by K-NN with Euclidean distance. The best precision, 86.95%, is achieved by K-NN with Manhattan distance.
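The core of the pipeline described in the abstract (TF-IDF features fed to K-NN under either Euclidean or Manhattan distance, scored by accuracy and precision) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy tweets are hypothetical English stand-ins for the Indonesian data, the tokenizer is a naive lowercase whitespace split, TF-IDF is computed as (count / length) x log(N / df), k = 3 is an arbitrary choice, and all helper names (`fit_idf`, `tfidf`, `knn_predict`, `accuracy_precision`) are hypothetical. SVM and Random Forest are omitted to keep the example dependency-free.

```python
import math
from collections import Counter

# Hypothetical toy tweets (English stand-ins); NOT the paper's dataset.
TRAIN = [
    ("ballots already punched before election day", "hoax"),
    ("commission announces fake winner before voting", "hoax"),
    ("fake ballots leaked from the commission", "hoax"),
    ("commission releases official voter list", "fact"),
    ("official voting schedule from the commission", "fact"),
    ("commission announces official campaign schedule", "fact"),
]
TEST = [
    ("fake ballots punched before voting", "hoax"),
    ("official schedule from the commission", "fact"),
]


def tokenize(text):
    # Assumption: naive lowercase whitespace split, no stemming or stopwords.
    return text.lower().split()


def fit_idf(docs):
    """IDF = log(N / df) for every term seen in the training corpus."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(len(docs) / count) for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector; terms unseen during training are dropped."""
    tokens = tokenize(doc)
    counts = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in counts.items() if t in idf}


def euclidean(a, b):
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def manhattan(a, b):
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


def knn_predict(x, train_vecs, train_labels, k, distance):
    """Majority vote among the k training vectors closest to x."""
    ranked = sorted(zip(train_vecs, train_labels), key=lambda p: distance(x, p[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy_precision(y_true, y_pred, positive="hoax"):
    """Accuracy over all items; precision for the positive (hoax) class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    precision = tp / predicted_pos if predicted_pos else 0.0
    return correct / len(y_true), precision


def main():
    idf = fit_idf([text for text, _ in TRAIN])
    train_vecs = [tfidf(text, idf) for text, _ in TRAIN]
    train_labels = [label for _, label in TRAIN]
    y_true = [label for _, label in TEST]
    for name, dist in (("Euclidean", euclidean), ("Manhattan", manhattan)):
        y_pred = [knn_predict(tfidf(text, idf), train_vecs, train_labels, 3, dist)
                  for text, _ in TEST]
        acc, prec = accuracy_precision(y_true, y_pred)
        print(f"k-NN ({name}): accuracy={acc:.2%} precision={prec:.2%}")


if __name__ == "__main__":
    main()
```

Comparing distance measures only requires swapping the `distance` argument, which mirrors the abstract's K-NN comparison. In practice, scikit-learn's `TfidfVectorizer`, `KNeighborsClassifier(metric=...)`, `SVC`, and `RandomForestClassifier` would cover all three classifiers; note that `TfidfVectorizer` smooths IDF by default, so its weights would differ from this sketch.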
