CLASSIFICATION OF INDONESIAN-LANGUAGE NEWS HEADLINES USING SUPPORT VECTOR MACHINE AND MUTUAL INFORMATION FEATURE SELECTION

Authors

  • I Putu Gede Hendra Suputra, Universitas Udayana
  • Linawati, Faculty of Engineering, Universitas Udayana
  • I Gede Sukadarmika, Faculty of Engineering, Universitas Udayana
  • Nyoman Putra Sastra, Faculty of Engineering, Universitas Udayana

DOI:

https://doi.org/10.23887/jptkundiksha.v22i1.89158

Keywords:

text classification, news headlines, support vector machine, mutual information

Abstract

Advances in information and communication technology have changed how information is shared, and with it how people obtain and deliver news. News portals publish a growing volume of digital news every day, and a single news item often relates to more than one category, which poses a challenge. To address this problem, this study classifies online news headlines using a support vector machine (SVM) with mutual information (MI) feature selection. The dataset consists of headlines from detik.com in six categories (finance, travel, health, auto, food, and sport), with 2,000 headlines per category. The classification pipeline consists of text preprocessing, term weighting with TF-IDF, feature selection with mutual information, and finally classification with the SVM. Among the SVM kernels and MI thresholds tested, an MI threshold of 85% combined with an SVM using the RBF kernel and C = 10 gave the highest F1-score, 86.15%.
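The pipeline summarized above (preprocessing, TF-IDF weighting, mutual information feature selection, then SVM) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: preprocessing is reduced to lowercasing and tokenizing (no stemming or stopword removal), TF-IDF uses raw term counts with idf = ln(N/df), MI is computed between a term's presence and the category label, and the "85% threshold" is read as keeping the top 85% of terms ranked by MI. The helper names (`preprocess`, `tf_idf`, `mutual_information`, `select_top_percent`) and the four toy headlines are hypothetical.

```python
import math
import re
from collections import Counter


def preprocess(title):
    # Simplified stand-in (assumption): lowercase and keep alphanumeric runs.
    # The study's actual preprocessing steps are not reproduced here.
    return re.findall(r"[a-z0-9]+", title.lower())


def tf_idf(docs):
    # Raw term count times idf = ln(N / df); one dict of weights per document.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


def mutual_information(docs, labels, term):
    # I(T; Y) in bits, where T = 1 if `term` occurs in the document
    # and Y is the document's category.
    n = len(docs)
    presence = [term in doc for doc in docs]
    p_t = Counter(presence)
    p_y = Counter(labels)
    joint = Counter(zip(presence, labels))
    return sum((c / n) * math.log2((c / n) / ((p_t[t] / n) * (p_y[y] / n)))
               for (t, y), c in joint.items())


def select_top_percent(docs, labels, percent):
    # Hypothetical reading of the "MI threshold": keep the top `percent`% of
    # vocabulary terms ranked by MI (rounded to avoid float noise in ties,
    # ties broken alphabetically).
    vocab = sorted({t for doc in docs for t in doc})
    ranked = sorted(
        vocab,
        key=lambda t: (-round(mutual_information(docs, labels, t), 12), t),
    )
    k = max(1, math.ceil(len(vocab) * percent / 100))
    return set(ranked[:k])


# Made-up toy headlines; the real dataset has 2,000 detik.com headlines per category.
data = [
    ("Stock prices rise today", "finance"),
    ("Bank stock falls today", "finance"),
    ("Team wins league final", "sport"),
    ("League team signs striker", "sport"),
]
docs = [preprocess(title) for title, _ in data]
labels = [label for _, label in data]

vectors = tf_idf(docs)
keep = select_top_percent(docs, labels, 85)
# Drop the TF-IDF weights of terms that did not pass MI selection.
reduced = [{t: w for t, w in vec.items() if t in keep} for vec in vectors]
print(f"kept {len(keep)} terms")  # prints: kept 11 terms
```

With 12 distinct terms, keeping 85% retains 11; terms that occur only within one category in both of its headlines ("stock", "today", "team", "league") reach the maximum MI of 1 bit here. The reduced vectors would then be passed to an SVM; with scikit-learn, for example, `SVC(kernel="rbf", C=10)` matches the configuration reported in the abstract, and `f1_score(y_true, y_pred, average="macro")` is one way to aggregate F1 over the six categories (the averaging scheme is an assumption, since the abstract does not state it).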

Published

2025-01-30