Alternative Text Pre-Processing using Chat GPT Open AI
DOI:
https://doi.org/10.23887/janapati.v12i1.59746
Keywords:
Algorithm, Chat GPT, K-Nearest Neighbour, Naïve Bayes, Text Pre-Processing
Abstract
Text Pre-Processing is the first step in Sentiment Analysis. Categorizing the sentiment in a dataset is part of the Text Pre-Processing stage and is needed to obtain optimal model accuracy. Chat GPT, which is built on the Generative Pre-trained Transformer (GPT), is a Machine Learning model that can automatically generate realistic and meaningful text. This study examines the capability of Chat GPT as an alternative in the Text Pre-Processing stage by applying Chat GPT 3 from the openai.com website to the collected tweet data. The data used in this research was obtained by crawling Twitter with the keyword "Chat GPT". Performance was measured using the K-Nearest Neighbor and Naïve Bayes Algorithms to find the best performance value, which was then compared with the Text Pre-Processing produced by Rapidminer. With Chat GPT Text Pre-Processing, the K-Nearest Neighbor Algorithm reached an accuracy of 73.57% using the Linear Sampling method. Text Pre-Processing with Rapidminer gave a better accuracy of 75.33%, a narrow difference of 1.76 percentage points from the Chat GPT Text Pre-Processing method. However, both fall in the same category, Fair Classification. The results of this research show that Chat GPT can be an alternative for Text Pre-Processing of datasets for sentiment analysis.
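The evaluation described in the abstract can be illustrated with a small sketch. The study itself used Rapidminer, so the Python below is not the authors' implementation. It only shows, under stated assumptions, how a K-Nearest Neighbor classifier can be scored on pre-processed tweets with a linear (unshuffled) split. The toy tweets, the labels, the whitespace tokenizer, the cosine similarity measure, k = 3, the 75% split ratio, and all function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Hypothetical, already pre-processed toy tweets (not data from the study).
TWEETS = [
    ("chat gpt is amazing and helpful", "positive"),
    ("chat gpt gives wrong answers", "negative"),
    ("i love how helpful chat gpt is", "positive"),
    ("chat gpt is useless and wrong", "negative"),
    ("amazing answers from chat gpt", "positive"),
    ("chat gpt is slow and useless", "negative"),
    ("chat gpt is helpful", "positive"),
    ("chat gpt is wrong", "negative"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real pre-processing)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    query = Counter(tokenize(text))
    ranked = sorted(
        train,
        key=lambda row: cosine(Counter(tokenize(row[0])), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def linear_split(rows, train_ratio=0.75):
    """Split rows in their original order, without shuffling."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


def accuracy(train, test, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, text, k) == label for text, label in test)
    return hits / len(test)


if __name__ == "__main__":
    train_rows, test_rows = linear_split(TWEETS)
    print("toy accuracy:", accuracy(train_rows, test_rows))
    # Gap between the two accuracies reported in the abstract, in percentage points.
    print("reported gap:", round(75.33 - 73.57, 2))
```

A toy set this small says nothing about real performance. The sketch only makes the evaluation steps concrete: split the data in order, classify each held-out tweet, and count correct predictions.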
License
Copyright (c) 2023 Indri Tri Julianto, Dede Kurniadi, Yosep Septiana, Ade Sutedi
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with Janapati agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work. (See The Effect of Open Access)