CLASSIFICATION OF INDONESIAN-LANGUAGE NEWS TITLES USING SUPPORT VECTOR MACHINE AND MUTUAL INFORMATION FEATURE SELECTION
DOI: https://doi.org/10.23887/jptkundiksha.v22i1.89158
Keywords: text classification, news titles, support vector machine, mutual information
Abstract
Information and communication technology has changed how information is shared, and with it how people obtain and deliver news. The volume of digital news published by news portals grows every day, which makes categorization a challenge, particularly because a news item often relates to more than one category. To address this problem, this study classifies online news titles using a support vector machine (SVM) with mutual information (MI) feature selection. The dataset consists of news titles from detik.com in six categories (finance, travel, health, auto, food, and sport), with 2,000 titles per category. The classification process consists of text preprocessing, term weighting with TF-IDF, feature selection with mutual information, and classification with SVM. Across the SVM kernels and MI thresholds tested, the highest F1-score, 86.15%, was obtained with an MI threshold of 85%, the RBF kernel, and C = 10.
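The term weighting and feature selection steps named in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy headlines, the function names, the smoothed IDF formula, and the reading of the 85% threshold as "keep the top 85% of terms ranked by MI" are all hypothetical. Preprocessing is reduced to lowercasing and whitespace splitting, and the final SVM step is left out.

```python
import math
from collections import Counter

# Hypothetical toy headlines; NOT taken from the detik.com dataset.
headlines = [
    "bank raises interest rate",
    "stock market rate falls",
    "team wins football match",
    "football coach praises team",
]
labels = ["finance", "finance", "sport", "sport"]


def tokenize(text):
    # Stand-in for real preprocessing (case folding only, no stemming or stopwords).
    return text.lower().split()


def tf_idf(tokenized):
    # Raw term count times log(N / df) + 1; the "+ 1" smoothing is an assumption.
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def mutual_information(tokenized, labels):
    # MI (in nats) between "term is present in the title" and the class label.
    n_docs = len(tokenized)
    vocab = {term for tokens in tokenized for term in tokens}
    class_counts = Counter(labels)
    scores = {}
    for term in vocab:
        presence = [term in tokens for tokens in tokenized]
        joint_counts = Counter(zip(presence, labels))
        presence_counts = Counter(presence)
        mi = 0.0
        for (present, label), joint in joint_counts.items():
            p_xy = joint / n_docs
            p_x = presence_counts[present] / n_docs
            p_y = class_counts[label] / n_docs
            mi += p_xy * math.log(p_xy / (p_x * p_y))
        scores[term] = mi
    return scores


def select_top_fraction(scores, fraction):
    # Keep the highest-scoring fraction of terms (assumed meaning of the threshold).
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return set(ranked[:k])


tokenized = [tokenize(h) for h in headlines]
weights = tf_idf(tokenized)
scores = mutual_information(tokenized, labels)
selected = select_top_fraction(scores, 0.85)

# TF-IDF vectors restricted to the selected terms; these would feed the classifier.
reduced = [{t: w for t, w in doc.items() if t in selected} for doc in weights]
```

In a full pipeline the reduced vectors would then be passed to an SVM classifier. With scikit-learn, for example, `SVC(kernel="rbf", C=10)` would correspond to the best configuration reported in the abstract, although the abstract does not state which library the study used.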
License
Authors who publish with the JPTK agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work. (See The Effect of Open Access)