MACHINE LEARNING PREDICTION OF TIME SERIES COVID-19 DATA IN WEST JAVA, INDONESIA

The COVID-19 pandemic emerged in 2019, and many efforts have been made to curb the spread of the virus. West Java, Indonesia, has used social restrictions to prevent the spread of the disease; however, these restrictions severely damaged the local economy. According to the World Health Organization (WHO), one condition for relaxing social restrictions is that no cases are detected in the region. If the government lifts the restrictions, the decision must therefore also consider the potential for future confirmed cases. Machine learning makes it possible to forecast such data. This work used the following algorithms: linear regression (LR), locally weighted learning (LWL), multi-layer perceptron (MLP), radial basis function regression (RBF), and support vector machine (SVM). The study investigated daily new COVID-19 cases in West Java, Indonesia, from March 2, 2020, to October 15, 2020. RBF was the best-performing algorithm: its mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), relative absolute error (RAE), and root relative squared error (RRSE) were 48.85, 89.73, 88.67, 62.99, and 60.88, respectively. The RBF prediction model may be proposed to the government of West Java for assessing COVID-19 case data, particularly in managing social restrictions. West Java is projected to have at least 275 new cases per day over the 30 days beginning on October 16, 2020. Consequently, any easing of social restrictions requires careful consideration.


INTRODUCTION
Severe acute respiratory syndrome (SARS) was first identified in China in 2002 [1]. Some 18 years later, a related SARS coronavirus emerged: in December 2019, a novel coronavirus, SARS-CoV-2, appeared in China, and it has since affected countries throughout the world. The spread of COVID-19 has left many nations defenseless [2]. As in 2002, SARS-CoV-2 caused fear and panic [3]. A variety of initiatives and tactics have been used to fight the coronavirus; nevertheless, the pathogen continues to spread, and COVID-19 has claimed a substantial number of lives. Beyond its consequences for health, the COVID-19 epidemic has wrecked the economies of several nations. Since lockdown policies were implemented, the global economy has stalled [4], and the situation could result in a worldwide crisis. As a result, several nations have eased their lockdown restrictions and allowed companies to reopen during the COVID-19 outbreak. Indonesia announced its first cases, in West Java, on March 2, 2020 [5]. Indonesia uses large-scale social restrictions to prevent the spread of this disease. As elsewhere, these restrictions led to a decline in economic activity [6], and West Java consequently loosened its regulations. However, the World Health Organization (WHO) stipulates conditions under which a region may relax its restrictions [7]. One of these conditions is the absence of new cases in the region.
According to the data, West Java had zero new cases on several days in May but reported an increase in July 2020 [8]. If the government intends to loosen restrictions, it must consider the possibility of future cases. Machine learning, a branch of artificial intelligence [10], [11], makes it possible to forecast future data [9]: it can project the number of new COVID-19 cases from previously supplied input data. Linear regression is frequently used for time series prediction. Yip et al. employed it to detect energy theft [12], in which consumers deliberately tampered with their electricity meters. Linear regression captures the normal pattern of power usage, and any deviation from that pattern is flagged as theft. Similar research has been conducted to predict heat demand [13] and the stock market [14]. Locally weighted learning (LWL), by contrast, fits a linear regression in which nearby data points receive weight. It has been widely used for time series prediction, such as temperature prediction [15] and short-term forecasting of urban traffic flow [16]. In few real-world problems is the relationship between the independent and dependent variables evident, so linear regression may fail to capture it, resulting in large prediction errors. The MLP algorithm, an artificial neural network (ANN) [17], can be used to tackle this problem. Kumar [20]. The data were collected from January through March 2020, and predictions were made using MLP and an adaptive network-based fuzzy inference system (ANFIS). That study recommended comparing multiple algorithms for each country, because each country has a distinct epidemic pattern; no two regions are identical. Additionally, Wang et al. forecasted cumulative case counts for several countries, including Indonesia [21].
That study used data from January to June 2020, even though Indonesia reported its first cases only in March 2020, and made its predictions with a logistic model [21]. Ahmad et al. reached a different conclusion from Ardabili et al. [22]: they argued that the epidemic curve of one country could serve as a benchmark for other countries, even though the pandemic stages differ from country to country. The present research examines the conditions needed to loosen social restrictions. It analyzes the daily number of new cases rather than the cumulative number. Because Indonesia's social restriction policy is province-based rather than national, the study considers a single province: West Java, the first province in Indonesia to report COVID-19 cases. Several machine learning techniques are employed for prediction in this work: linear regression (LR), locally weighted learning (LWL), multi-layer perceptrons (MLP), radial basis function regression (RBF), and support vector machines (SVM).

was reached on July 9, 2020. On that date, there were 965 confirmed cases, a tenfold increase over the previous day. Figure 2 depicts the study's approach. Only two attributes were used: the date and the daily number of new cases. These two attributes were transformed using remapping, lag selection, or a combination of both, producing 21 attributes, namely:
• caseNew,

Figure 2. Methodology for predicting COVID-19 time series data in West Java
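The attribute construction described above (remapping and lag selection applied to the date and the daily new-case count) can be illustrated with a minimal sketch. The study's exact lag choices and the names of its 21 attributes are not listed here, so the lags (1, 2, and 7 days), every attribute name other than caseNew, and the day-of-week remapping below are hypothetical.

```python
from datetime import date, timedelta

def build_lag_features(dates, cases, lags=(1, 2, 7)):
    """Turn a daily series into rows of lagged attributes (hypothetical lags)."""
    rows = []
    # start at the first day for which every lag is available
    for t in range(max(lags), len(cases)):
        row = {
            "date": dates[t],
            "dayOfWeek": dates[t].weekday(),  # hypothetical remapping of the date
            "caseNew": cases[t],              # target: new cases on day t
        }
        for k in lags:
            row[f"caseNew_lag{k}"] = cases[t - k]  # new cases k days earlier
        rows.append(row)
    return rows

start = date(2020, 3, 2)
cases = [1, 0, 3, 2, 5, 4, 6, 8, 7, 9]          # toy series, not study data
dates = [start + timedelta(days=i) for i in range(len(cases))]
rows = build_lag_features(dates, cases)
print(len(rows))                                  # 3
print(rows[0]["caseNew"], rows[0]["caseNew_lag1"], rows[0]["caseNew_lag7"])  # 8 6 1
```

Each row pairs the target with its own history, which is what lets a regression model treat a time series as an ordinary supervised learning problem.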
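The data set was split into 80% training data and 20% test data. The training data were used to develop the models, which were then applied to the test data to obtain predictions. Models were built with the following algorithms: LR, LWL, MLP, RBF, and SVM; each algorithm is described below. Several statistical metrics were used to assess the performance of the machine learning models: mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), relative absolute error (RAE), and root relative squared error (RRSE). These five metrics were used to rank the algorithms and identify the best-performing one for this study (Moayed et al., 2019). Equations 1 through 5 define MAE, RMSE, MAPE, RAE, and RRSE for n observations:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - f(x_i)\right| \tag{1}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - f(x_i)\right)^{2}} \tag{2}$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{x_i - f(x_i)}{x_i}\right| \tag{3}$$
$$\mathrm{RAE} = \frac{\sum_{i=1}^{n}\left|x_i - f(x_i)\right|}{\sum_{i=1}^{n}\left|x_i - \bar{x}\right|} \times 100\% \tag{4}$$
$$\mathrm{RRSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - f(x_i)\right)^{2}}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}} \times 100\% \tag{5}$$
where $x_i$ represents the ith actual value, $f(x_i)$ represents the prediction of the ith value, and $\bar{x}$ is the mean of the actual values.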
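A minimal sketch of the five metrics, assuming their standard definitions, with RAE and RRSE expressed as percentages relative to always predicting the mean of the actual values. How the study handled days with zero new cases in MAPE is not stated; skipping those days here is an assumption.

```python
import math

def evaluate(actual, predicted):
    """MAE, RMSE, MAPE, RAE and RRSE for paired actual/predicted values."""
    n = len(actual)
    errors = [x - f for x, f in zip(actual, predicted)]
    mean_x = sum(actual) / n
    abs_err = sum(abs(e) for e in errors)
    sq_err = sum(e * e for e in errors)
    # MAPE is undefined when x_i = 0; this sketch skips such days (an assumption)
    nonzero = [(x, f) for x, f in zip(actual, predicted) if x != 0]
    mape = 100.0 * sum(abs((x - f) / x) for x, f in nonzero) / len(nonzero)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "MAPE": mape,
        # RAE and RRSE compare the model with a predictor that always outputs mean_x
        "RAE": 100.0 * abs_err / sum(abs(x - mean_x) for x in actual),
        "RRSE": 100.0 * math.sqrt(sq_err / sum((x - mean_x) ** 2 for x in actual)),
    }

print(evaluate([10, 20, 30], [12, 18, 33]))   # toy values, not study data
```

RAE and RRSE values below 100% mean the model beats the naive mean predictor, which is why they are useful alongside the scale-dependent MAE and RMSE.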

Linear Regression (LR)
Regression analysis studies the relationship between a dependent variable and one or more independent variables. Its goal is to predict the value of the dependent variable given the value of the independent variable. Linear regression represents the relationship between these two variables with a straight line [12], [15]. This requires a function that approximates the data points: given measurements $(x_i, y_i)$, a straight line is fitted to them so that the error is as small as possible. Equation 6 gives the linear regression model:
$$y = a + bx \tag{6}$$
where $y$, $x$, $a$, and $b$ are the dependent variable, the independent variable, the constant, and the regression coefficient, respectively.
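For a single independent variable, the least-squares estimates of a and b in y = a + bx have a closed form. The sketch below shows only this basic fit on toy numbers; the study's LR additionally performed attribute selection and used a small ridge term, as described in the results.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = covariance(x, y) / variance(x); a makes the line pass through the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

days = [0, 1, 2, 3, 4]            # toy day index, not study data
cases = [3, 5, 7, 9, 11]
a, b = fit_line(days, cases)
print(a, b)                        # 3.0 2.0
print(a + b * 5)                   # forecast for day 5: 13.0
```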

Locally Weighted Learning (LWL)
LWL predicts a data point from the training data nearest to it in the input space. It is an example of lazy learning [15]: no model is built in advance, and each forecast considers only the target point's nearest neighbors. LWL makes a prediction in the following steps:
1. Determine k, the number of nearest neighbors.
2. Calculate the Euclidean distance between the target point and every training point.
3. Weight the k training points with the shortest distances, using the specified kernel type.
4. Calculate the forecast using Equation 9.
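The four LWL steps can be sketched as locally weighted linear regression. The kernel form w = max(0, 1 - d/d_max), where d_max is the distance to the farthest of the k neighbors, is an assumed stand-in for the linear kernel, and the local model is fitted by weighted least squares; the study's Equation 9 may differ in detail.

```python
import numpy as np

def lwl_predict(X_train, y_train, x_query, k=None):
    """Locally weighted linear regression for a single query point (sketch)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Step 1: number of neighbours (all training points when k is None)
    k = len(y_train) if k is None else k
    # Step 2: Euclidean distance from the query to every training point
    d = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    idx = np.argsort(d)[:k]
    # Step 3: assumed linear kernel, weight falls from 1 to 0 at the k-th neighbour
    d_max = d[idx].max()
    w = np.maximum(0.0, 1.0 - d[idx] / d_max) if d_max > 0 else np.ones(k)
    # Step 4: weighted least-squares fit of a local line, then predict at the query
    A = np.column_stack([np.ones(k), X_train[idx]])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y_train[idx] * sw, rcond=None)
    return float(np.concatenate(([1.0], x_query)) @ beta)

# Toy usage: on a curved series the local fit follows the curve near the query
x = np.arange(21.0).reshape(-1, 1)
y = x.ravel() ** 2
print(lwl_predict(x, y, [10.0], k=5))   # close to 100.5 (true value 100)
```

A single global line through the same curved data would predict about 136.7 at x = 10, so the example shows why weighting only nearby points helps on nonlinear trends.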

Multi-Layer Perceptron (MLP)
Artificial neural networks (ANN) mimic the structure and function of the human brain. An ANN is made up of many neurons; each neuron may connect to other neurons, and each connection has a weight. An ANN learns by adjusting these connection weights. Backpropagation is the most widely used learning algorithm for training MLP models [17]: it uses the output error to adjust the weights in reverse order, from the output layer back toward the input. The error can only be computed once the feed-forward stage is complete. During forward propagation, neurons are activated with the sigmoid activation function. In general, back-propagation MLP training consists of the following steps:
1. Determine the number of input, hidden, and output layers.
2. Initialize all weights between the input and hidden layers and between the hidden and output layers with random values.
3. In the feed-forward pass, each input unit receives an input signal.
4. The signal is passed to all hidden-layer units. Each unit in the succeeding layer sums its weighted input signals, and the activation function computes its output signal.
5. Perform back-propagation.
6. Each unit in the output layer receives the target corresponding to the training input pattern, and the error is computed. Weight and bias adjustments are calculated from this error and used to update the weights layer by layer, back toward the input layer.
7. Repeat steps 3 to 6 until the error falls below a minimum or the maximum number of iterations is reached.
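The training steps above can be sketched in NumPy for a single hidden layer. The sigmoid hidden units, the linear output unit (a common choice for regression), the layer size, learning rate, and stopping thresholds are illustrative assumptions, not the study's settings, which used an approximate sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=4, lr=0.1, max_iter=2000, min_error=1e-4, seed=0):
    """Back-propagation for one sigmoid hidden layer and a linear output unit."""
    rng = np.random.default_rng(seed)
    y = y.reshape(-1, 1)
    # Steps 1-2: fix the layer sizes and draw random initial weights
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    history = []
    for _ in range(max_iter):
        # Steps 3-4: feed-forward pass through the hidden layer to the output
        h = sigmoid(X @ W1 + b1)
        out = h @ W2 + b2
        err = out - y
        mse = float(np.mean(err ** 2))
        history.append(mse)
        # Step 7: stop once the error is small enough (or max_iter is reached)
        if mse < min_error:
            break
        # Steps 5-6: propagate the output error backwards and update the weights
        g_out = 2.0 * err / len(X)
        g_W2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * h * (1.0 - h)   # chain rule through the sigmoid
        g_W1 = X.T @ g_h
        g_b1 = g_h.sum(axis=0)
        W2 -= lr * g_W2
        b2 -= lr * g_b2
        W1 -= lr * g_W1
        b1 -= lr * g_b1
    return (W1, b1, W2, b2), history

def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    return (sigmoid(X @ W1 + b1) @ W2 + b2).ravel()

X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)    # toy inputs, not study data
params, history = train_mlp(X, X.ravel() ** 2)
print(history[0], history[-1])                   # the error shrinks during training
```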

Radial Basis Function (RBF)
An RBF network approximates a multidimensional nonlinear function from the distances between the input vector and a set of center vectors [23]. It is well suited to modeling nonlinear data and is very useful for dealing with noisy input data. Like other neural networks, it has input, hidden, and output layers. MLP and RBF differ in three major ways:
1. The connections between the input layer and the hidden layer are unweighted.
2. Each hidden-layer node has a radially symmetric activation function.
3. RBF uses exactly one hidden layer, whereas an MLP may have several.
As a result, the RBF network is expected to learn faster than the MLP network.
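A minimal RBF network sketch that reflects the three differences listed above: inputs reach the hidden layer unweighted, each hidden unit applies a Gaussian (radially symmetric) function of the distance to its center, and there is a single hidden layer. Evenly placed centers, one shared width, and output weights fitted in closed form by ridge regression are simplifying assumptions; this is not the training procedure of the RBF regressor used in the study, which reported two basis functions and a ridge factor of 0.01.

```python
import numpy as np

def rbf_features(X, centres, width):
    # Gaussian basis: phi_j(x) = exp(-||x - c_j||^2 / (2 * width^2)); no input weights
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_rbf(X, y, centres, width, ridge=0.01):
    """Fit only the hidden-to-output weights (plus a bias) by ridge regression."""
    Phi = np.column_stack([np.ones(len(X)), rbf_features(X, centres, width)])
    penalty = ridge * np.eye(Phi.shape[1])
    penalty[0, 0] = 0.0                      # leave the bias unpenalised
    return np.linalg.solve(Phi.T @ Phi + penalty, Phi.T @ y)

def predict_rbf(X, centres, width, weights):
    Phi = np.column_stack([np.ones(len(X)), rbf_features(X, centres, width)])
    return Phi @ weights

X = np.linspace(0.0, 2 * np.pi, 40).reshape(-1, 1)   # toy inputs, not study data
y = np.sin(X).ravel()
centres = np.linspace(0.0, 2 * np.pi, 8).reshape(-1, 1)
weights = fit_rbf(X, y, centres, width=1.0)
rmse = np.sqrt(np.mean((predict_rbf(X, centres, 1.0, weights) - y) ** 2))
print(rmse)   # small compared with the spread of sin(x)
```

Because the hidden layer is fixed once the centers and width are chosen, the remaining problem is linear in the output weights, which is one reason RBF networks can train faster than back-propagated MLPs.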

Support Vector Machine (SVM)
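In time series prediction, the support vector machine (SVM) seeks a regression function in the form of a hyperplane [24]. The function should be as flat as possible while fitting every input point to within an error of $\varepsilon$. Given n data points $(x_i, y_i)$, $i = 1, \ldots, n$, SVM finds a function $f(x)$ that deviates by at most $\varepsilon$ from the actual targets for all training data; if every deviation is zero, the regression is perfect. Equation 10 gives the regression function and the optimization problem used to find it:
$$f(x) = \langle w, \varphi(x) \rangle + b$$
$$\min_{w,\,b,\,\xi,\,\xi^{*}} \ \frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \quad \text{s.t.} \quad y_i - f(x_i) \le \varepsilon + \xi_i,\ \ f(x_i) - y_i \le \varepsilon + \xi_i^{*},\ \ \xi_i, \xi_i^{*} \ge 0 \tag{10}$$
where $\varphi$ maps the input into the kernel feature space, and $\xi_i$ and $\xi_i^{*}$ are slack variables that make an otherwise infeasible constraint problem feasible. The constant $C > 0$ sets the trade-off between the flatness of $f$ and the amount by which deviations larger than $\varepsilon$ are tolerated; every such deviation is penalized in proportion to $C$.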
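The ε-SVR trade-off can be illustrated with a simplified sketch. It swaps in a linear kernel and plain subgradient descent on the primal objective, averaging the ε-insensitive loss over the n points (equivalent to rescaling C by 1/n); the study instead used a polynomial kernel and the optimizer of Shevade et al. The values of C, ε, the step size, and the iteration count are assumptions.

```python
import numpy as np

def eps_insensitive(residuals, eps):
    # zero inside the epsilon tube, growing linearly outside it
    return np.maximum(0.0, np.abs(residuals) - eps)

def fit_linear_svr(X, y, C=10.0, eps=0.1, iters=20000, lr0=0.05):
    """Subgradient descent on 0.5*||w||^2 + C * mean(eps-insensitive loss)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    best = (np.inf, w.copy(), b)          # keep the best iterate seen so far
    for t in range(iters):
        r = y - (X @ w + b)               # residuals y_i - f(x_i)
        obj = 0.5 * w @ w + C * eps_insensitive(r, eps).mean()
        if obj < best[0]:
            best = (obj, w.copy(), b)
        # subgradient of the loss term: -sign(r_i) for points outside the tube
        s = np.where(np.abs(r) > eps, -np.sign(r), 0.0)
        grad_w = w + C * (X.T @ s) / n
        grad_b = C * s.mean()
        lr = lr0 / np.sqrt(t + 1.0)       # diminishing step size
        w, b = w - lr * grad_w, b - lr * grad_b
    return best[1], best[2]

X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)    # toy inputs, not study data
y = 2.0 * X.ravel() + 1.0
w, b = fit_linear_svr(X, y)
print(w, b)   # the flatness term pulls the slope below 2
```

Points inside the ε tube contribute nothing to the subgradient, so only the points at or beyond the tube boundary shape the fitted line, mirroring the role of the slack variables in Equation 10.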

RESULT AND DISCUSSION
This study utilized five machine learning methods. Each algorithm's batch size was 100.
The LR algorithm performed attribute selection. Attributes were evaluated with the Akaike information criterion (AIC): the attribute with the smallest standardized coefficient was removed, and this step was repeated until no further improvement in the AIC was observed. The ridge parameter of LR was 1.0E-8. The LWL model used all training points as neighbors, ranked by their proximity to the target point. The neighbor search used brute force with the Euclidean distance, and the weighting used a linear kernel.
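The attribute selection described for LR can be sketched as backward elimination: repeatedly drop the attribute with the smallest standardized coefficient for as long as doing so lowers the AIC. The Gaussian AIC form n ln(RSS/n) + 2k, the unpenalized intercept, and the toy data are assumptions; only the 1.0E-8 ridge value comes from the study.

```python
import numpy as np

def fit_ridge(X, y, ridge=1e-8):
    """Least squares with an intercept and a tiny ridge term on the slopes."""
    A = np.column_stack([np.ones(len(y)), X])
    penalty = ridge * np.eye(A.shape[1])
    penalty[0, 0] = 0.0                       # do not penalise the intercept
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)

def aic(X, y, beta):
    A = np.column_stack([np.ones(len(y)), X])
    rss = float(np.sum((y - A @ beta) ** 2))
    return len(y) * np.log(rss / len(y)) + 2 * A.shape[1]

def backward_eliminate(X, y, ridge=1e-8):
    keep = list(range(X.shape[1]))
    beta = fit_ridge(X[:, keep], y, ridge)
    best = aic(X[:, keep], y, beta)
    while len(keep) > 1:
        # standardised coefficient: |b_j| * sd(x_j) / sd(y)
        std_coef = np.abs(beta[1:]) * X[:, keep].std(axis=0) / y.std()
        drop = keep[int(np.argmin(std_coef))]
        trial = [j for j in keep if j != drop]
        trial_beta = fit_ridge(X[:, trial], y, ridge)
        trial_aic = aic(X[:, trial], y, trial_beta)
        if trial_aic >= best:                 # no AIC improvement: stop
            break
        keep, beta, best = trial, trial_beta, trial_aic
    return keep, beta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                 # toy attributes; column 2 is irrelevant
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
keep, beta = backward_eliminate(X, y)
print(keep)   # the informative columns 0 and 1 are retained
```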
In this investigation, the MLP used an approximate sigmoid activation function with a squared-error loss function, and its architecture had one hidden layer. The RBF network used two basis functions. For both MLP and RBF, threads were managed in a pool, and both methods used a ridge penalty factor of 0.01 and a tolerance of 1.0E-6. The SVM used a polynomial kernel. Its optimizer used an epsilon parameter, with the tolerance set to 0.001, and was a purpose-built implementation of the work of Shevade et al. on SVM regression [25].
All five algorithms were run on a PC with an Intel Core i7-9700K CPU at 3.60 GHz, 8 GB of RAM, and 64-bit Windows 10 on an x64-based processor. Figures 3 through 7 show the performance of each algorithm. Using previous data, each algorithm forecasted data for the next 30 days. Each graph has two lines: the red line shows the actual data, and the blue line shows the predicted data. The closer the blue line is to the red line, the better the prediction; the larger the gap between them, the greater the difference between predicted and actual values. As shown in Figure 4, LWL produced the most inaccurate predictions of all the algorithms. Until the end of August, its blue line was flat and horizontal, whereas the red line was not. From early September to mid-October, its forecast errors increased. As seen in Figure 3, LR's blue line followed the red line more closely: it formed a slightly curved line until the end of August and continued to rise afterwards. Figure 7 shows that SVM's prediction errors were concentrated between mid-August and October, unlike those of MLP and RBF. Figure 5 shows that for MLP, most of the actual line was close to the predicted line. However, the data from mid-September to early October show a distinct change in the number of cases, and this is where RBF and MLP differ: Figure 6 shows that RBF predicted the data for this period more accurately. Table 1 presents the performance of the five algorithms on the five assessment metrics: MAE, RMSE, MAPE, RAE, and RRSE. The algorithms were ranked on each metric; the lower the error, the better the algorithm. RBF ranked first on every metric except MAPE, whereas LWL ranked last on every metric except RMSE.
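One common way to turn a one-step model into a 30-day forecast is recursive prediction, in which each forecast is fed back as a lag input for the following day. The study does not state its multi-step strategy, so this is an assumption; `predict_next` is a hypothetical stand-in for a trained model such as RBF, and the three-day moving average below is only a toy.

```python
def recursive_forecast(history, predict_next, horizon=30, n_lags=3):
    """Forecast `horizon` days ahead, feeding each prediction back as a lag."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        y_hat = predict_next(window)          # one-step-ahead prediction
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]         # slide the lag window by one day
    return forecasts

def moving_average(window):
    # hypothetical stand-in for a trained model
    return sum(window) / len(window)

print(recursive_forecast([300, 290, 310], moving_average, horizon=3))
```

Because later forecasts depend on earlier ones, errors can accumulate over the horizon, so a 30-day projection is best read as a trend rather than an exact daily count.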
Figures 3 to 7 show that none of the algorithms could forecast the surge in July. Such surges are quite likely in the real world, and many factors can cause them. Even RBF, the best algorithm in this study, could not anticipate this spike. This implies that the number of new cases in West Java is influenced by a variety of factors, not just the time series trend examined in this study. Therefore, this study should not be considered the primary basis for anticipating future COVID-19 data in West Java.
In contrast, RBF effectively modeled the data surges from August to October: its predicted line closely followed the actual line. This indicates that RBF could reproduce the time series for this period from historical data, so it can be recommended as a tool to help the government of West Java assess COVID-19 case data, particularly in managing social restrictions. Figure 8 and Table 2

CONCLUSION
The COVID-19 pandemic first emerged in 2019, and a number of initiatives have been taken to slow the spread of the infection. The province of West Java in Indonesia implemented large-scale social restrictions to halt the spread of the illness; nevertheless, this strategy severely damaged the local economy. According to the World Health Organization (WHO), social restrictions may be loosened when no cases are found in an area. If the government decides to lift the restrictions, the decision must take into account the possibility of further confirmed cases, and machine learning makes it possible to forecast such data. This work used linear regression (LR), locally weighted learning (LWL), multi-layer perceptron (MLP), radial basis function regression (RBF), and support vector machine (SVM). The study tracked daily new COVID-19 cases in West Java, Indonesia, from March 2, 2020, to October 15, 2020. The RBF algorithm performed best, with a mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), relative absolute error (RAE), and root relative squared error (RRSE) of 48.85, 89.73, 88.67, 62.99, and 60.88, respectively.
The RBF prediction model can be proposed to the government of West Java for evaluating COVID-19 case data, particularly for managing social restrictions. West Java is projected to have at least 275 new cases per day over the 30 days beginning on October 16, 2020. Consequently, any easing of social restrictions calls for careful analysis.