Implementation of Text Mining for Sentiment Analysis on Twitter with the Support Vector Machine Algorithm

Authors

  • Aditiya Hermawan, Universitas Buddhi Dharma
  • Indrico Jowensen, Universitas Buddhi Dharma
  • Junaedi Junaedi, Universitas Buddhi Dharma
  • Edy, Universitas Buddhi Dharma

DOI:

https://doi.org/10.23887/jstundiksha.v12i1.52358

Keywords:

Sentiment Analysis, Support Vector Machine, Twitter

Abstract

Each year, the number of social media users grows along with the number of internet users. This growth is accompanied by an increase in the information available on the internet, and that information becomes valuable once it is analyzed. Text mining techniques can be used to analyze data at this scale: text mining extracts high-quality information from text, and it can also analyze properties such as the sentiment of a sentence very quickly. The information processed here comes from the text-based social media platform Twitter, with data retrieved through an Application Programming Interface using a keyword in the form of a word or a hashtag. The retrieved sentences are processed with text mining using the Support Vector Machine algorithm to classify the sentiment of each sentence as positive, neutral, or negative. The accuracy achieved by this process is 73% on the available sentiment data. Text mining accuracy is strongly influenced by the pre-processing stage, because many words require further processing.
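Because the abstract attributes much of the accuracy to pre-processing, a minimal sketch of what such a stage might look like for tweets is given here. The abstract does not list the actual steps, so the cleaning rules, the tiny stop-word list, and the function name preprocess are all assumptions made for illustration, not the authors' implementation.

```python
import re

# Hypothetical cleaning rules; the abstract does not specify the real ones.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")

# Tiny illustrative Indonesian stop-word list (an assumption, not the authors' list).
STOPWORDS = {"yang", "dan", "di", "ke", "ini", "itu"}


def preprocess(tweet: str) -> list:
    """Return a list of cleaned tokens for one tweet."""
    text = tweet.lower()                 # case folding
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    text = text.replace("#", " ")        # keep the hashtag word, drop the symbol
    text = NON_LETTER_RE.sub(" ", text)  # drop digits, punctuation, emoji
    tokens = text.split()                # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    sample = "Pelayanan di @TokoABC bagus sekali!!! #puas https://t.co/xyz"
    print(preprocess(sample))
```

The resulting tokens would then typically be turned into numeric features (for example term weights) before being passed to a Support Vector Machine classifier; that step is left out here because the abstract does not describe how features were built.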

Published

2023-04-18

Section

Articles