EFFECT OF WORD2VEC WEIGHTING WITH CNN-BILSTM MODEL ON EMOTION CLASSIFICATION

Emotion can influence human behavior, which in turn influences decisions. Detecting human emotion is useful in many areas, including the social environment and product quality. Evaluating and categorizing emotions expressed in text requires a classification method, and the CNN-BiLSTM model is used here to analyze the emotional content of text. Word2vec word weighting is applied to support the model. This study uses the CNN-BiLSTM model with pre-trained Word2vec embeddings to find the configuration with the highest accuracy. The data are split into training and testing sets and labeled with six emotions: surprise, sadness, anger, fear, love, and joy. The best emotion classification accuracy obtained by the CNN-BiLSTM model is 92.85%.


INTRODUCTION
Emotion is a factor that can influence human behavior, which can affect rational tasks such as making a decision or interacting with others [1]. Emotion detection is useful in a variety of decision-making fields, including the social environment and business [2]. Emotion recognition via text classification requires a method that can classify the emotion expressed in a text. Emotion analysis, which is based on computational studies of natural language expressions, seeks to identify emotions such as anger, fear, joy, sadness, and surprise [3].
Techniques for representing words as vectors are an active research topic in text processing and natural language processing because the field is constantly evolving. The way word weights are represented is important because it can affect the accuracy of the model's learning results [4]. However, because text is unstructured, feature extraction is one of the difficulties in text processing. Feature extraction is the step in text classification that converts unstructured data into structured data, for example a term-frequency matrix, so that it can be processed further [5], [6].
The "bag of words" (BOW) feature extraction approach, which includes term frequency (TF), term frequency-inverse document frequency (TF-IDF), and n-grams, was originally used for textual data. Bag of words, however, has the disadvantage of being unable to provide information about the meaning conveyed by the text, its structure, its word order, or the context surrounding the words in each document. When a count-based model such as BOW is used, each word is counted individually, and the semantic relationships between words are not captured. If the corpus vocabulary is very large and a document contains only a few of those words, or none, the resulting vector is mostly or entirely zeros, and BOW can lead to overfitting [4]. The word embedding technique was developed in response. Word embedding is a popular method that represents words as vectors in a continuous space, and it remains an active research topic. Word embedding converts each word into a vector that represents the word's position in vector space.
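The loss of word order described above can be illustrated with a minimal sketch. The two toy sentences and the helper name `bag_of_words` are assumptions made for illustration and are not taken from the study's data or code.

```python
from collections import Counter

def bag_of_words(document, vocabulary):
    """Return the term-frequency vector of `document` over `vocabulary`."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]

# Hypothetical toy documents, not taken from the emotion dataset.
doc_a = "the dog bites the man"
doc_b = "the man bites the dog"

vocabulary = sorted(set(doc_a.split()) | set(doc_b.split()))
vector_a = bag_of_words(doc_a, vocabulary)
vector_b = bag_of_words(doc_b, vocabulary)

print(vocabulary)            # shared vocabulary of both documents
print(vector_a, vector_b)    # the two count vectors
print(vector_a == vector_b)  # True: word order is lost
```

Both sentences receive the same count vector even though their meanings differ, which is the limitation that motivates word embeddings.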
Word2vec, a word embedding method introduced in 2013, consists of two models, skip-gram and continuous bag of words (CBOW) [7]. Word2vec, Mikolov's word weighting method, has been widely used as a pre-trained representation because it captures the semantic sense of text: words with similar meanings receive vectors with similar patterns [8], [9]. Several studies have used the word2vec method. For example, [10] shows that word2vec performs well on 19,997 news articles divided into 20 topic groups derived from UCI KDD data, and other studies likewise report good results with word2vec [11]-[14]. [15] achieved an F1 value of 79% using emotion data with eight emotion labels: anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. Using emotion data with the labels sadness, anger, fear, love, joy, and surprise, [16] reported an accuracy of 93.50%. That model employs LSTM with TF-IDF (Term Frequency-Inverse Document Frequency) word weighting and was tested over neuron counts, activation functions, and epochs. By comparison, the linear SVC model with the same weighting reached a test accuracy of 89%.
Deep learning methods based on neural networks, such as CNN and LSTM, are currently gaining popularity due to their impressive results [17], [18]. CNNs with convolution operations can extract local features (structural patterns in images) from global information (all image data), but they are not always successful in capturing long-distance dependencies [19]. The vanishing gradient is a weakness of RNN. LSTM is a variant of RNN that can handle long-term dependencies, but it cannot make full use of the information in the text. Although LSTM is well suited to text, a standard LSTM cannot effectively solve this problem. This study therefore employs a hybrid of two deep learning models, CNN and BiLSTM, to address the shortcomings of each model for emotion recognition in text [13], [20].
This study investigates how the word2vec algorithm can be used to classify emotional text. Word2vec is used as a pre-trained word weighting method prior to the CNN-BiLSTM modeling process. One advantage of word2vec is that it encodes features as dense vectors rather than the traditional sparse representation, which can help with the synonyms and homonyms that are common in natural language processing applications. Accordingly, this study proposes combining the word2vec method with the CNN-BiLSTM model.

METHOD
The first step of this study is to obtain the emotion data to be classified. The data are divided into training data, comprising 16,000 samples used to train the model, and testing data, comprising 2,000 samples used to measure the model's performance. The emotion data carry 6 labels (sadness, anger, fear, love, joy, and surprise).
Preprocessing prepares the data before they are passed to the weighting method and the model. It involves case folding (converting capital letters to lowercase), deleting any numbers present in the emotion data, and removing punctuation.
Stopword removal eliminates words that carry little meaning from the emotion data. Tokenization is the process of splitting sentences into words, after which each word is weighted using the word2vec algorithm. The combined CNN and BiLSTM model is then used as the classification technique, and its accuracy is determined. Figure 1 illustrates the methodology utilized in this study. After the model's classification accuracy is obtained, the next step is to calculate the measures derived from the confusion matrix, namely accuracy, precision, and recall, which indicate how accurate the model has been.
Figure 1 shows several stages, beginning with the emotion text data entering the pre-processing procedure to clean the data. Lowercasing, removing numbers, removing punctuation, removing stopwords, and tokenization are the five steps. Once the data have been preprocessed, word embedding is applied: the weighting in this pre-trained model is word2vec, which converts each word into a vector. The following phase is the modeling process, in which the data enter the convolution layer. The max pooling layer is then connected to the fully connected layer, and the Bi-LSTM process begins. The sigmoid activation function is used, and the final outcome is the accuracy of the emotion text classification.
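The confusion-matrix measures mentioned above (accuracy, precision, and recall) can be sketched as follows. This is a minimal illustration, assuming a matrix whose rows are true classes and whose columns are predicted classes; the function name and the 3-class toy matrix are hypothetical and are not results from this study.

```python
def metrics_from_confusion(matrix):
    """matrix[i][j] = number of samples of true class i predicted as class j."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Accuracy: correctly classified samples (the diagonal) over all samples.
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    precision, recall = [], []
    for k in range(n):
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision.append(matrix[k][k] / predicted_k if predicted_k else 0.0)
        recall.append(matrix[k][k] / actual_k if actual_k else 0.0)
    return accuracy, precision, recall

# Hypothetical toy confusion matrix with three classes.
toy = [[8, 2, 0],
       [1, 7, 2],
       [0, 1, 9]]
accuracy, precision, recall = metrics_from_confusion(toy)
print(accuracy, precision, recall)
```

Precision and recall are returned per class; for the six emotion labels they could be averaged across classes, although the study does not state which averaging it used.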

Dataset
The initial phase in this research is to gather emotion data for the classification process. The data are divided into two parts, testing data and training data, which helps guard against overfitting during training. The emotion dataset can be found at https://www.kaggle.com/datasets/praveengovi/emotions-dataset-for-nlp. Approximately 16,000 training samples and 2,000 testing samples are used, following [15]. The data were collected through the Twitter API and cover several emotions, including anger (described by words such as "angry" and "pissed"), fear ("fear" and "worried"), and pleasure ("fun" and "joy"), among others.

Preprocessing Data
Preprocessing prepares the data before they enter the weighting procedure and the model. In this study it consists of five stages:
a. Lowercasing: converts all letters in a word to lowercase [21].
b. Remove numbers: since the numbers contained in the emotion data have no effect on the classification results, deleting them reduces noise and improves efficiency [22].
c. Remove punctuation: punctuation refers to characters such as exclamation marks, commas, and question marks that are not necessary for data classification [23].
d. Remove stopwords: words such as "a," "an," and "the" carry little meaning in the text and can be removed [24], [25].
e. Tokenization: the separation of text into tokens, where a token may be a word, a phrase, or a paragraph [26].
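The five stages above can be sketched in a few lines. This is a minimal illustration only: the stopword list is a deliberately tiny, hypothetical one (the study does not state which list it used), tokenization is assumed to be a simple whitespace split, and the function name `preprocess` is an assumption.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"a", "an", "the", "i", "am", "is", "and", "to", "of"}

def preprocess(text):
    text = text.lower()                                               # a. lowercase
    text = re.sub(r"\d+", "", text)                                   # b. remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # c. remove punctuation
    tokens = text.split()                                             # e. tokenize on whitespace
    return [t for t in tokens if t not in STOPWORDS]                  # d. remove stopwords

tokens = preprocess("I am feeling SO happy today, the 2 of us won!!!")
print(tokens)
```

In this sketch the text is split into tokens before stopwords are filtered, because filtering needs individual words; the resulting token list is the same as removing stopwords first and tokenizing afterwards.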

Word2Vec
Word2Vec is one of the techniques for learning distributed word representations. The method comprises two architectures, CBOW and skip-gram, each of which learns a representation vector for every word. CBOW predicts the target word from its surrounding context words, whereas skip-gram predicts the surrounding context words from the target word. Both variants help the model learn that words appearing in similar contexts are similar [27].
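The difference between the two architectures lies in which side of each training example is the input. The sketch below only builds the training examples and does not train a network; the helper names, the toy sentence, and the window size of 1 are assumptions chosen for illustration.

```python
def context_windows(tokens, window):
    """Yield (target, context_words) for every position in the sentence."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield target, left + right

def cbow_examples(tokens, window):
    # CBOW: the input is the context, the output is the target word.
    return [(context, target) for target, context in context_windows(tokens, window)]

def skipgram_examples(tokens, window):
    # Skip-gram: the input is the target word, the output is each context word.
    return [(target, c)
            for target, context in context_windows(tokens, window)
            for c in context]

sentence = ["i", "feel", "very", "happy"]  # hypothetical toy sentence
cbow = cbow_examples(sentence, window=1)
skipgram = skipgram_examples(sentence, window=1)
print(cbow)
print(skipgram)
```

For the word "feel", CBOW produces one example with the context ["i", "very"] as input, while skip-gram produces two examples, one for each context word.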

Pretrained Word2Vec
This study employs pre-trained Wikipedia2Vec embeddings as the word weighting, with dimensions of 100 and 300. These enwiki embeddings were configured with window = 10, iteration = 10, and negative = 15.
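One common way to use pre-trained vectors is to copy them into an embedding matrix indexed by the model's vocabulary. The sketch below assumes the vectors are stored in the word2vec text format (a header line with the vocabulary size and dimension, then one word and its values per line); whether the study used this file format is not stated, and the toy 3-dimensional data, `word_index`, and both function names are hypothetical.

```python
import io

def load_text_vectors(handle, wanted):
    """Read word2vec-style text vectors, keeping only the words in `wanted`."""
    _count, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if word in wanted and len(values) == dim:
            vectors[word] = [float(v) for v in values]
    return dim, vectors

def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i.

    Row 0 is reserved for padding; words without a pre-trained vector stay zero.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, i in word_index.items():
        if word in vectors:
            matrix[i] = vectors[word]
    return matrix

# Toy 3-dimensional data standing in for a 100- or 300-dimensional file.
fake_file = io.StringIO(
    "3 3\nhappy 0.1 0.2 0.3\nsad -0.1 -0.2 -0.3\nangry 0.5 0.0 -0.5\n"
)
word_index = {"happy": 1, "sad": 2, "terrified": 3}  # hypothetical vocabulary
dim, vectors = load_text_vectors(fake_file, set(word_index))
embedding_matrix = build_embedding_matrix(word_index, vectors, dim)
print(embedding_matrix)
```

A matrix like this would typically initialize the embedding layer that feeds the convolution layer, with the dimension set to 100 or 300 as in the experiments.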

Convolutional Neural Network
CNN is a deep learning algorithm that typically takes images as input. It applies a convolution operation, in which a filter matrix extracts features from the input matrix. When CNN is used for natural language processing, the pixels of an image are replaced by the character data of a text or document, so that the text to be processed forms the input matrix.
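How a convolution filter extracts features from a text matrix can be sketched with plain lists. This is an illustrative sketch under stated assumptions, not the study's implementation: each row of `sentence_matrix` is assumed to be one word's embedding, the filter spans two consecutive words, ReLU is an assumed activation, and all values are made up.

```python
def conv1d(sentence_matrix, kernel, bias=0.0):
    """Slide `kernel` (k rows x d columns) over the word rows of an n x d matrix."""
    k = len(kernel)
    feature_map = []
    for start in range(len(sentence_matrix) - k + 1):
        window = sentence_matrix[start:start + k]
        total = bias
        for word_vec, kernel_row in zip(window, kernel):
            total += sum(x * w for x, w in zip(word_vec, kernel_row))
        feature_map.append(max(0.0, total))  # ReLU, an assumed activation
    return feature_map

def max_pool(feature_map, size):
    """Non-overlapping max pooling over the feature map."""
    return [max(feature_map[i:i + size]) for i in range(0, len(feature_map), size)]

# Hypothetical 5-word sentence with 2-dimensional word vectors.
sentence_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 1.0]]
kernel = [[1.0, -1.0], [0.5, 0.5]]  # one filter covering 2 consecutive words

features = conv1d(sentence_matrix, kernel)
pooled = max_pool(features, size=2)
print(features, pooled)
```

Each entry of `features` summarizes one pair of neighboring words, which is why convolution captures local patterns well, and pooling keeps the strongest response in each region.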

Bidirectional Long Short Term Memory
A recurrent neural network (RNN) processes text sequentially. An RNN can recall and store information from earlier text, but it cannot retain that information over long spans because of its limited storage capacity. LSTM, a development of the RNN, was created to overcome this flaw. The image above shows the Bi-LSTM architecture. This model is an evolution of LSTM with two layers that process the sequence in opposite directions. Because each word is evaluated sequentially, this model is particularly effective at spotting text patterns inside phrases. The forward layer processes from the first to the final word, whereas the backward layer processes from the last to the first word. With layers running in two opposing directions, the model can take both the preceding and the following words into account, enabling it to comprehend the text's context more thoroughly.
The model's embedding dimension and optimizer are varied in the final experiment. The table displays the results obtained with 512 neurons and the sigmoid function; the best accuracy with word2vec word weighting is 92.85%, with embedding size 300 and the RMSprop optimizer. Table 5 also reports tests without word weighting: the parameters tested are the same as in Table 4, namely the choice of optimizer, but Table 5 does not use word2vec word weighting. Without word2vec, the highest accuracy obtained is 86.80%. Tables 4 and 5 show that applying the word2vec word weighting method boosted accuracy, as seen in the Table 4 experiments, where the maximum accuracy reached 92.50%. This is due to word2vec's ability to capture semantic and syntactic information about words.
When Word2vec learns word representations, each word begins at a random point in the vector space. During training, each word is gradually moved closer to the words that are similar to it, as determined by their neighbors in the training data. By contrast, Table 5 shows that the greatest accuracy achieved on the test data without word weighting is 86.80%.
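Closeness between word vectors is commonly measured with cosine similarity. The following sketch uses made-up 3-dimensional vectors purely for illustration; they are not taken from the pre-trained embeddings used in this study.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors; real word2vec vectors have 100 or 300 dimensions.
toy_vectors = {
    "happy": [0.9, 0.8, 0.1],
    "joyful": [0.8, 0.9, 0.2],
    "afraid": [-0.7, 0.1, 0.9],
}

sim_close = cosine_similarity(toy_vectors["happy"], toy_vectors["joyful"])
sim_far = cosine_similarity(toy_vectors["happy"], toy_vectors["afraid"])
print(sim_close, sim_far)
```

Words that training has moved close together score near 1, while unrelated words score lower or negative, which is the kind of semantic information that count-based weighting does not provide.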

CONCLUSION
The CNN-BiLSTM combination model was evaluated on emotion data with word2vec word weighting through several tests: neuron testing, with a best value of 92.30%; activation function testing, resulting in 99.43% accuracy; and tests of the RMSprop and Adamax optimizers. With an embedding size of 100, the highest test accuracy is 92.50% using the RMSprop optimizer, while an embedding size of 300 gives the best accuracy of 92.85%, also with the RMSprop optimizer. The highest CNN-BiLSTM performance with Word2Vec weighting is therefore 92.85%, obtained with an embedding dimension of 300.