Alternative Text Pre-Processing using Chat GPT Open AI

Authors

  • Indri Tri Julianto Institut Teknologi Garut
  • Dede Kurniadi Institut Teknologi Garut
  • Yosep Septiana Institut Teknologi Garut
  • Ade Sutedi Institut Teknologi Garut

DOI:

https://doi.org/10.23887/janapati.v12i1.59746

Keywords:

Algorithm, Chat GPT, K-Nearest Neighbour, Naïve Bayes, Text Pre-Processing

Abstract

Text Pre-Processing is the first step in Sentiment Analysis. Categorizing the sentiment of each item in a dataset is part of the Text Pre-Processing stage and is needed to obtain the optimal model accuracy. Generative Pretrained Transformer, often known as Chat GPT, is a Machine Learning model that can automatically generate realistic and meaningful text. This study examines the capability of Chat GPT as an alternative in the Text Pre-Processing stage by applying Chat GPT 3 from the openai.com website to collected tweet data. The data were obtained by crawling Twitter with the keyword “Chat GPT”. Performance was measured using the K-Nearest Neighbor and Naïve Bayes algorithms to find the best performance value, which was then compared with the result of Text Pre-Processing carried out in RapidMiner. With Chat GPT Text Pre-Processing, the accuracy obtained using the K-Nearest Neighbor algorithm was 73.57% with the Linear Sampling method. Text Pre-Processing with RapidMiner gave a higher accuracy of 75.33%, a narrow difference of 1.76 percentage points from the Chat GPT method. Both results nevertheless fall in the same category, Fair Classification. These results indicate that Chat GPT can be an alternative for Text Pre-Processing of datasets for sentiment analysis.
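
The evaluation described in the abstract (pre-process tweets, split the data by linear sampling, classify, measure accuracy) can be illustrated with a minimal Python sketch. This is not the authors' RapidMiner workflow or their Chat GPT procedure: the function names, the rule-based cleaning steps, the 80/20 split ratio, and the use of Jaccard similarity for K-Nearest Neighbor are all assumptions made for illustration. Linear sampling is taken here to mean splitting the data in its original order, without shuffling.

```python
import re
from collections import Counter


def clean_tweet(text):
    """Hypothetical rule-based baseline: lowercase, then drop URLs,
    mentions, and every character that is not a letter or whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # drop '#', digits, punctuation
    return text.split()


def linear_split(rows, train_ratio=0.8):
    """Linear sampling (assumed meaning): keep the original order,
    the first part is used for training and the rest for testing."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


def knn_predict(train, tokens, k=3):
    """k-NN by majority vote; similarity is the Jaccard index of token
    sets (an illustrative choice, not taken from the paper)."""
    query = set(tokens)

    def similarity(item):
        other = set(item[0])
        union = query | other
        return len(query & other) / len(union) if union else 0.0

    nearest = sorted(train, key=similarity, reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test items whose predicted label matches the true label."""
    hits = sum(knn_predict(train, toks, k) == label for toks, label in test)
    return hits / len(test)
```

To compare two pre-processing methods, the same labeled tweets would be cleaned by each method, passed through `linear_split`, and scored with `accuracy`; the gap reported in the abstract corresponds to `round(75.33 - 73.57, 2)`.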

Published

2023-03-31

How to Cite

Tri Julianto, I., Kurniadi, D., Septiana, Y., & Sutedi, A. (2023). Alternative Text Pre-Processing using Chat GPT Open AI. Jurnal Nasional Pendidikan Teknik Informatika : JANAPATI, 12(1), 67–77. https://doi.org/10.23887/janapati.v12i1.59746

Section

Articles