IMPLEMENTATION OF LSTM (LONG SHORT TERM MEMORY) ALGORITHM TO PREDICT WEATHER IN CENTRAL JAVA

Agro-industrial agricultural production such as red onions in Indonesia has a very important share in driving Indonesia's economic growth, especially in Central Java province, which contributed 28.15% of the total national red onion production in 2021. Weather conditions have a major influence on red onion cultivation, from planting until the red onions are ready to be harvested. In this study, the objective is to predict various types of weather such as rainfall, air temperature


INTRODUCTION
In Indonesia, agro-industrial agricultural production contributes 13.5% of GDP, an increase of 1% from 2019. Agro-industrial products from the agriculture, fisheries/marine, livestock, farm, and forestry sectors are therefore very important in driving Indonesia's economic growth [1] [2], not only as a source of daily food for the community [3] but also as a source of state income. The agricultural sector remains a mainstay in absorbing labor over time [4] due to the habitual nature of its activities and the constant need for agricultural products [5]. One of the high-value agricultural products in the Indonesian market is red onions [6]. Besides their high value, red onions (Allium cepa L.) [7] are among the most important horticultural vegetables in the world [8], because they are used in many aspects of life, ranging from health to seasoning in food preparation.
According to the Central Bureau of Statistics (BPS), red onion production in 2021 increased by 10.43% (189.15 kilotons) compared to 2020.
The household sector (Figure 1) contributed 790.63 kilotons of red onion consumption, an increase of 8.33% compared to 2020. Household consumption accounts for 94.16% of total red onion consumption.
In addition, according to BPS, red onion production in 2021 fluctuated. Production in August was the highest, reaching 218.74 kilotons (see Figure 2) with a harvest area of 18,070 hectares. The provinces with the highest red onion production are Central Java, East Java, and West Nusa Tenggara. Central Java Province contributes 28.15% of Indonesia's production, which reaches 564.26 kilotons from a harvest area of around 55.98 thousand hectares.
Alongside these achievements in red onion production and consumption, there are many challenges, one of which is the weather [9]. Weather conditions such as temperature, rainfall, and air  For this reason, this study surveyed seven districts in Central Java Province: Brebes, Temanggung, Demak, Boyolali, Kendal, Pati, and Tegal. The weather is analyzed using data from GEEMAP (Google Earth Engine MAP) and BMKG (Meteorology, Climatology, and Geophysical Agency) to determine the level of humidity, temperature, and rainfall, so that a prediction system can be built to anticipate the weather in the future.
From the existing data, the weather in the 7 districts can be predicted using the LSTM (long short-term memory) algorithm. LSTM compensates for the shortcomings of its predecessor, the RNN (Recurrent Neural Network): RNNs cannot predict data based on information stored over the long term, whereas storage duration is not an issue with LSTM [11]. Systems that implement LSTM can process, predict, and classify information based on time series data [12]. LSTM can retain old data and discard it when it is no longer needed [13], so information management becomes more complete and up-to-date [14]. In the study by Chenjia Hu and colleagues entitled "Prediction of ultra-short-term wind power based on CEEMDAN-LSTM-TCN", the LSTM algorithm has smaller MSE and RMSE errors than the other algorithms [15].

RESEARCH METHOD
This research follows the CRISP-DM method, which can be seen in Figure 3. The CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology, which uses a goal-oriented approach, was used to structure this research. It is a mature methodology that is consistently well accepted in data mining research using machine learning. It offers a lifecycle approach to research involving applied artificial intelligence, and is considered the best methodology for knowledge discovery in databases (KDD) [16]. A number of features of CRISP-DM make it suitable for evidence mining. In addition, it offers a general process model that summarizes the overall framework and aspects of the methodology, which can then be specialized for a pre-defined context. Figure 3 details the stages of the process and the outcomes used to validate the success of the prediction.

Business Understanding
Business Understanding is the step of understanding the purpose of the research and how the research can help in making business decisions. This step also involves identifying the hypotheses and target variables to be analyzed.

Data Understanding
Data Understanding is the step of collecting all the data required for the research. This can be internal company data or publicly available external data.

Data Preparation
Data preparation is the step of cleaning and preparing the data to make it ready for processing by the model. This can involve data grouping, removal of useless data, data measurement, and normalization. Python is used in this procedure since it has grown to be one of the most widely used platforms and the top open-source programming language for deep learning [17]. Google Colaboratory is an open, cloud-friendly notebook environment for Python. It lets users and their team members edit documents together and supports libraries that are often used for research, especially research related to machine learning [18].
Rhedy Irwan, et.al., IMPLEMENTATION OF LSTM (LONG SHORT TERM MEMORY) … 1349
Normalization is required before the data can be processed in machine learning. Normalization in this case uses a Python library, namely Scikit-learn. Scikit-learn is a Python package that integrates a wide range of state-of-the-art machine learning algorithms for supervised and unsupervised problems. The library focuses on bringing machine learning to non-specialists using a high-level, general-purpose language [19]. Normalization uses the MinMaxScaler function. The weather is considered as a time series of T observations, $X = [x_1, x_2, \ldots, x_T]$, and the normalization (1) maps each observation $x_t$ to a normalized value $x'_t$, where $x_t$ is the observation value at t and $x'_t$ is the data that has been normalized at t [20]. After normalization, the data is split into train data and test data. The train data amounts to 80% of the total data and the remaining 20% is used for test data. A new data series is then built from each part so that it can be processed in the modelling stage.
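For reference, the transform that scikit-learn documents for MinMaxScaler with a feature range of 0 to 1 can be written as below. The symbols $x_{\min}$ and $x_{\max}$ (the minimum and maximum of the series the scaler is fitted on) are notation assumed here for illustration and may not match the notation of Equation (1) in the original paper.

```latex
x'_t = \frac{x_t - x_{\min}}{x_{\max} - x_{\min}},
\qquad
x_{\min} = \min_{1 \le t \le T} x_t,
\quad
x_{\max} = \max_{1 \le t \le T} x_t
```

As a worked example with hypothetical values, an observation of 130 in a series that ranges from 10 to 310 is mapped to (130 - 10) / (310 - 10) = 0.4.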

Modeling
Modeling is the step of building a model that can be used to predict the target variable. This can involve selecting an appropriate modeling method and training the model on training data with deep learning. Deep learning is a subset of machine learning that is fueled by the massive rise in the processing power of computers. Deep learning algorithms are commonly used in pattern recognition systems due to their ability to extract abstract ideas from high-dimensional data [21].
TensorFlow, an open-source deep learning platform from Google, is used to build machine learning and deep learning applications and to conceive and investigate ideas in artificial intelligence. TensorFlow provides a Python interface, hence it is considered a simple framework [22].
LSTM is a standard variant of the RNN. The standard RNN is quite simple and robust. However, in practice it is difficult to train for problems with a long time lag between the target and the previous related event. Hence LSTM was introduced to overcome the problems faced by RNNs [20].
The RNN architecture is designed for temporal sequence models. LSTM can capture long-range dependencies, which makes it more accurate than conventional RNNs. Backpropagation in an RNN suffers from an error backflow problem. Unlike the RNN, LSTM contains specialized units called memory blocks in the recurrent hidden layer. Memory blocks contain memory cells with self-connections that store the temporal state of the network, together with special units called gates that control the flow of information. Each memory block in the original architecture contains three types of gates:
Input Gate: controls the flow of input activations into the memory cell.
Output Gate: controls the flow of cell activation outputs to the rest of the network.
Forget Gate: scales the internal state of the cell before it is added as an input to the memory cell through the cell's self-recurrent connection, thereby adaptively forgetting or resetting the memory cell [20].
In addition, modern LSTM architectures contain connections from their internal cells to the gates, which are used to learn the precise timing of the output. To simplify analysis, the LSTM architecture is often drawn along the t (time) dimension, as in the following diagram (see Figure 4). The weight matrices are the model parameters that are learned while the model is trained, σ (sigmoid) and tanh are the activation functions, and b is the bias. The LSTM layer can be used with the TensorFlow and Keras libraries in Python. Keras follows best practices to reduce cognitive load in a consistent and simple manner. Keras minimizes the number of user actions required for common use cases, and provides clear and actionable feedback on user errors. This makes Keras easy to learn and use. Ease of use does not reduce flexibility, as Keras can be integrated with low-level deep learning frameworks such as TensorFlow. This makes it possible to implement anything that can be created in that base framework [23].
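For orientation, a commonly used formulation of one LSTM time step is given below. This is the standard textbook form without cell-to-gate connections, not necessarily the exact equations of the original paper. The symbol names are assumptions chosen for illustration: $W_\ast$ and $U_\ast$ are input and recurrent weight matrices, $b_\ast$ are biases, $x_t$ is the input, $h_t$ the hidden state, $c_t$ the cell state, and $\odot$ the element-wise product.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

In this form the forget gate scales the previous cell state, the input gate scales the new candidate, and the output gate decides how much of the cell state is exposed to the rest of the network, which matches the roles of the three gates described in the text.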
The dropout technique is an effective regularization method to reduce overfitting. The core idea of dropout is to prevent the network from relying too much on any single neuron, thus reducing the co-adaptation between neurons. According to the mechanism, at each iteration of the training phase the neurons are multiplied by a random variable that follows a Bernoulli distribution with probability p. Figure 6 illustrates the difference in structure between models with and without dropout. The corresponding formula without dropout is as follows [24].
$$\tilde{y}_t = W h_t + b \quad (9)$$
With dropout:
$$r_t \sim \mathrm{Bernoulli}(p), \qquad \tilde{y}_t = W (r_t \odot h_t) + b$$
Where $\tilde{y}_t$ is the output of the model at time t before being processed by its activation function, $h_t$ represents the output vector of the hidden layer, and $W$ and $b$ represent the weight matrix and bias that connect the hidden layer and output layer [19]. The final output is
$$y_t = f(\tilde{y}_t)$$
Where $y_t$ is the output of the model and $f$ is the activation function of the output layer [19].
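The dropout mechanism described above, a Bernoulli random variable multiplied into the hidden-layer output before the output layer, can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the function name `dense_output` and the parameter `keep_prob` (the probability that a neuron is kept) are hypothetical names introduced here.

```python
import numpy as np


def dense_output(h, W, b, keep_prob=1.0, rng=None):
    """Pre-activation output W @ (r * h) + b, with an optional Bernoulli mask r.

    keep_prob is an assumed name for the probability that a neuron is kept;
    keep_prob=1.0 disables dropout and gives the plain W @ h + b.
    """
    h = np.asarray(h, dtype=float)
    if keep_prob < 1.0:
        rng = np.random.default_rng() if rng is None else rng
        # One independent Bernoulli(keep_prob) draw per hidden neuron.
        r = rng.binomial(1, keep_prob, size=h.shape)
        h = r * h
    return W @ h + b


h = np.ones(4)           # hypothetical hidden-layer output
W = np.ones((1, 4))      # hypothetical weights to a single output node
b = np.zeros(1)

without_dropout = dense_output(h, W, b)
with_dropout = dense_output(h, W, b, keep_prob=0.5, rng=np.random.default_rng(0))
```

Note that the Keras `Dropout(rate)` layer is parameterized by the drop probability rather than the keep probability, and during training it also rescales the kept units by 1 / (1 - rate), so a rate of 0.2 corresponds to a keep probability of 0.8.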

Evaluation
Evaluation is the step of assessing the ability of the model to predict the target variable. This can involve using test data to obtain the model error score and comparing it with other methods that may be used.
In the evaluation step, model quality can be judged from the error value or from whether the model fitting graph behaves well. To check this, a Python library such as Matplotlib can be used. Matplotlib is a data visualization package that is widely used in Python. It can easily draw a variety of high-quality 2D graphs as well as some fairly simple 3D graphs, and it provides a simple language, good drawing accuracy, and easy-to-understand code [25]. Alternatively, the seaborn library can be used. It has functions for dataset-oriented visualization, making it easier to translate questions about data into graphs. Seaborn is designed to be useful throughout the lifecycle of scientific research [26].
RMSE stands for Root Mean Squared Error. It is a measure of the difference between the value predicted by the model or estimator and the true value. RMSE is a popular measure of accuracy for continuous data, and is a commonly used metric in the field of machine learning. RMSE is calculated as the square root of the average squared difference between the predicted value and the true value. In mathematical notation, this is represented as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2}$$
Where n is the number of samples, $\hat{y}_i$ is the predicted value for the i-th sample, and $y_i$ is the true value for the i-th sample [27]. The smaller the RMSE value, the better the model predicts the true value.
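The RMSE definition above, together with the MSE from which it is derived, can be sketched in a few lines of NumPy. The function names `mse` and `rmse` and the sample values are illustrative, not taken from the paper.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return float(np.sqrt(mse(y_true, y_pred)))


# Hypothetical normalized rainfall values and predictions.
y_true = [0.10, 0.40, 0.80]
y_pred = [0.12, 0.35, 0.90]
print(mse(y_true, y_pred), rmse(y_true, y_pred))
```

Because RMSE is the square root of MSE, it is expressed in the same unit as the (normalized) target, which makes it easier to compare against the range of the data.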

Deployment
Deployment is the step of exporting the model into files that can be used on various platforms. Overall, CRISP-DM is a useful methodology for extracting information from data and making business decisions. It provides a clear structure to guide the process of extracting information from data and ensures that research can be completed effectively and efficiently.

Business Understanding
Ideal weather is needed for good red onion production. If the weather can be predicted, production is expected to increase and losses in the planting process to decrease. In particular, what red onion farmers most want to avoid is crop failure.
Prediction models for various weather factors, such as temperature, rainfall, land surface temperature, and others, are needed to solve these business problems, and the models are judged by how small an error they achieve in evaluation.

Data Understanding
The data come from two different sources, namely BMKG (Meteorology, Climatology, and Geophysical Agency) and GEE (Google Earth Engine). The BMKG data are monthly global rainfall data from 6 districts in Central Java, namely Brebes, Temanggung, Boyolali, Pati, Kendal, and Demak, from 2018 to 2022. It can be seen in Figure 7 that the data arrangement is not neat and contains a lot of graphic content, such as colors and logo images, that is not needed. Data processing in machine learning requires clean and tidy data. To tidy up the table, the melt function from the pandas library in Python can be used. Cells marked x, which denote empty data, can be replaced with the mean of the row in which the empty value is located. The GEEMAP data are daily global data for various weather variables such as rainfall, soil temperature, air temperature, and land surface temperature from 2018 to August 2022 for 7 districts in Central Java Province: Brebes, Tegal, Temanggung, Boyolali, Pati, Kendal, and Demak.
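The tidying step described above can be sketched with pandas as follows. This is a minimal sketch under stated assumptions: the table layout (one row per year, one column per month), the column names, and the values are hypothetical, and replacing an `x` cell with the mean of its row is one reading of the imputation rule in the text.

```python
import pandas as pd

# Hypothetical wide table in the style of a monthly report:
# one row per year, one column per month, "x" marking an empty cell.
wide = pd.DataFrame({
    "year": [2018, 2019],
    "Jan": [310.0, "x"],
    "Feb": [250.0, 280.0],
    "Mar": [190.0, 220.0],
})
month_cols = ["Jan", "Feb", "Mar"]

# Turn the "x" markers into NaN so the columns become numeric.
wide[month_cols] = wide[month_cols].apply(pd.to_numeric, errors="coerce")

# Fill each empty cell with the mean of the row it sits in (NaN is skipped).
row_means = wide[month_cols].mean(axis=1)
wide[month_cols] = wide[month_cols].apply(lambda col: col.fillna(row_means))

# Melt the wide table into a tidy one: one row per (year, month) observation.
tidy = wide.melt(
    id_vars="year",
    value_vars=month_cols,
    var_name="month",
    value_name="rainfall",
)
print(tidy)
```

The tidy layout has one observation per row, which is the shape expected by the later steps that select a single column and turn it into a time series.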

Data Preparation
In the data preparation step, the column to be processed is selected. In this case, we use the Demak Rain Monthly column. Visualized with the matplotlib library, it looks like Figure 8. Figure 8 shows that rainfall in the selected column, namely in Demak district, has a seasonal pattern.
Next, the data must be normalized before it can be processed in machine learning. Normalization here uses the MinMaxScaler function in the sklearn library with a feature range of 0 to 1, after the data is reshaped with reshape(-1, 1). After normalization, the data is split into train data and test data. The train data amounts to 80% of the total data and the remaining 20% is used for test data.
A new data series is then created from each part so that the time series can be processed by the LSTM algorithm. The function that creates this series has two parameters, namely dataset and step. Step here is the time_stamp variable, which is set to 3. It is the number of time steps in each input window.
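The normalization and split described above can be sketched with scikit-learn as follows. The rainfall values are hypothetical. As in the text, the scaler is fitted on the whole series before splitting. Fitting it on the training part only is a common alternative that avoids using test-set statistics, and is mentioned here as an option rather than as the paper's method.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical monthly rainfall values, not taken from the paper's data.
values = np.array([310.0, 250.0, 190.0, 120.0, 60.0,
                   20.0, 10.0, 15.0, 40.0, 130.0])

# MinMaxScaler expects a 2D array, hence reshape(-1, 1) (one column).
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values.reshape(-1, 1))

# Chronological 80/20 split: no shuffling, so the time order is preserved.
split = int(len(scaled) * 0.8)
train, test = scaled[:split], scaled[split:]

# Predictions made on the scaled data can be mapped back to original units.
restored = scaler.inverse_transform(scaled)
```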
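The windowing step described above can be sketched as a small function that turns a series into input windows of `time_stamp` values and the value that follows each window. The function name `create_dataset` is hypothetical, and the paper does not state whether its own helper drops additional rows at the end of the series.

```python
import numpy as np


def create_dataset(dataset, time_stamp=3):
    """Build (X, y) pairs: each X row holds time_stamp consecutive values,
    and y is the value that immediately follows that window."""
    dataset = np.asarray(dataset, dtype=float).ravel()
    X, y = [], []
    for i in range(len(dataset) - time_stamp):
        X.append(dataset[i:i + time_stamp])
        y.append(dataset[i + time_stamp])
    return np.array(X), np.array(y)


X, y = create_dataset([100, 110, 120, 130, 140], time_stamp=3)

# Keras LSTM layers expect input of shape (samples, time steps, features).
X_lstm = X.reshape(X.shape[0], X.shape[1], 1)
```

With the series above, the first window is [100, 110, 120] with target 130 and the second is [110, 120, 130] with target 140, so each target slides into the next window.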

Modeling
The model is built with the TensorFlow framework, using Long Short-Term Memory (LSTM) as its main architecture. The model has 10 layers: 4 LSTM layers with 32, 64, 128, and 256 nodes respectively, Dropout layers, which help the model fit smoothly so that overfitting or underfitting can be avoided, each with a rate of 0.2 (20%), and 1 Dense layer with 1 node (see Figure 9). Figure 10 shows the structure of an LSTM memory cell, which is the fundamental unit of the LSTM model. As previously stated, each memory cell has an input gate that learns to protect the memory cell's continuous error flow against irrelevant inputs. The output gate unit learns to protect other units from the memory cell's irrelevant memory contents. The forget gate unit learns to regulate how long a value remains in the memory cell. In this case, the input data consists of the weather variable that has been chosen for prediction, and the output is the predicted value of that variable.
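One possible reading of this architecture in Keras is sketched below. Several details are assumptions not stated in the text: the exact ordering of the layers (here each LSTM layer is followed by a Dropout layer), the input shape of 3 time steps with 1 feature, the Adam optimizer, and the function name `build_lstm_model`.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_lstm_model(time_stamp=3, n_features=1, dropout_rate=0.2):
    """Stacked LSTM regressor: 4 LSTM layers (32, 64, 128, 256 units),
    each followed by Dropout, and a Dense output layer with 1 node."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(time_stamp, n_features)),
        # Intermediate LSTM layers return the full sequence so that the
        # next LSTM layer receives one vector per time step.
        layers.LSTM(32, return_sequences=True),
        layers.Dropout(dropout_rate),
        layers.LSTM(64, return_sequences=True),
        layers.Dropout(dropout_rate),
        layers.LSTM(128, return_sequences=True),
        layers.Dropout(dropout_rate),
        # The last LSTM layer returns only its final hidden state.
        layers.LSTM(256),
        layers.Dropout(dropout_rate),
        layers.Dense(1),
    ])
    # Optimizer choice is an assumption; the loss and metric follow the
    # MSE and RMSE reported in the paper.
    model.compile(
        optimizer="adam",
        loss="mse",
        metrics=[tf.keras.metrics.RootMeanSquaredError()],
    )
    return model


model = build_lstm_model()
model.summary()
```

When such a model is fitted on windowed time series, passing `shuffle=False` to `model.fit` keeps the training samples in chronological order.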

Evaluation
Model performance can be evaluated by looking at a visualization of the fitting process during training, which can be produced with the matplotlib library. After fitting, the model can predict future weather. Table 3 shows a prediction of the next 5 days from daily weather data, where D-day means the last day of the existing data. In Figure 13, the orange line shows the predicted Monthly Demak Rainfall for the 5 months after the last observation, and the blue line shows the existing data. Table 14 lists the predictions for the 5 months after the last month of existing data. As seen in Figure 7, the last month is July 2022, which is denoted M-Month. The existing data combined with the predicted data for the next few months is visualized in Figure 14.
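The paper does not state how the 5-step-ahead values are produced. One common approach, assumed here for illustration, is recursive forecasting: predict one step, append the prediction to the input window, and repeat. The names `forecast_ahead` and `predict_fn` are hypothetical, and `predict_fn` stands in for a call to the trained model.

```python
import numpy as np


def forecast_ahead(predict_fn, history, n_ahead=5, time_stamp=3):
    """Recursive multi-step forecast.

    predict_fn takes a 1D array of time_stamp values and returns the next
    value; each prediction is fed back into the window for the next step.
    """
    window = [float(v) for v in history[-time_stamp:]]
    predictions = []
    for _ in range(n_ahead):
        next_value = float(predict_fn(np.array(window)))
        predictions.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward
    return predictions


# Toy stand-in for a trained model: continues a linear trend of +10 per step.
toy_predict = lambda window: window[-1] + 10
future = forecast_ahead(toy_predict, [100, 110, 120], n_ahead=5)
```

With a Keras model, `predict_fn` could wrap `model.predict` on the window reshaped to (1, time_stamp, 1), followed by the scaler's inverse transform to return to the original units. Because each step consumes earlier predictions, errors can accumulate as the horizon grows.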

Deployment
In the deployment process, the model is exported into files that can be used on various platforms, such as tflite for use in Android applications and json for website development.

DISCUSSION
Research using the LSTM algorithm is widespread. One example is the 2021 study by Manzhu Yu et al. entitled "Using long short-term memory (LSTM) and Internet of Things (IoT) for localized surface temperature forecasting in an urban environment". It shows that LSTM outperforms traditional time series forecasting techniques, with RMSE values for minimum, average, and maximum of 2.71/2.99/3.31 [28]. The 2018 study by Alfan Galih Salman and colleagues, "Single Layer & Multi-layer Long Short-Term Memory (LSTM) Model with Intermediate Variables for Weather Forecasting", compares single-layer and multi-layer LSTM models using weather datasets (temperature, pressure, humidity, and dew point). The best result was obtained for the weather variable pressure, with an RMSE value of 0.0775 [20]. Furthermore, there is the 2022 study "Deep learning model for daily rainfall prediction: case study of Jimma, Ethiopia" by Demeke Endalie et al. [29]. Rainfall prediction is an important task, especially in the agricultural sector. This research was conducted in Jimma, one of the regions in Ethiopia, using the LSTM algorithm, which produces the best performance, an RMSE of 0.01, compared to other algorithms such as kNN, SVM, and Decision Tree.
Based on the research above, the LSTM algorithm performs better than the other algorithms. For that reason, the authors are interested in implementing the LSTM algorithm to predict various weather variables in districts of Central Java using the latest dataset, from 2018 to 2022. For comparison with those results, the 39 models in this study have an average train MSE of 0.013 and train RMSE of 0.11, and an average test MSE of 0.02 and test RMSE of 0.12.
Each column requires different hyperparameter tuning in modeling in order to avoid overfitting or underfitting. Some district models are still underfitting or overfitting. For these models it is necessary to change the hyperparameters, such as the number of epochs and the number of nodes in each layer, to fill missing values using the mode, median, or kNN, to use callback classes to avoid overfitting, or to change the value of time_stamp used in forming the new time series data.

CONCLUSION
Based on the research conducted, out of 39 models in total there are 5 well-fitting models, 16 overfitting models, and 18 underfitting models. When training, it is recommended not to shuffle the data: time series data must remain sequential, so the shuffle option should be set to False rather than True. Across all models, the average train error is MSE 0.013 and RMSE 0.11, and the average test error is MSE 0.02 and RMSE 0.12. The models that fit well are Daily Demak Rainfall, Monthly Brebes Air Temperature, Monthly Brebes Air Humidity, Monthly Demak Rainfall, and Monthly Kendal Air Humidity.

Figure 1. Red Onion Consumption by Indonesian Households

Figure 2. Red onion production per month in 2021 in Indonesia

Figure 4. Architecture of Folded Long Short-Term Memory Model

In the diagram above (Figure 4), it can be seen that each LSTM block receives signals from: input signal (x), input gate signal (i), recurrent signal (h), forget gate signal (f), and produces output gate signal (o). The process flow in each LSTM memory block can be depicted in the diagram below (see Figure 5).

Figure 5. Architecture of Folded Long Short-Term Memory Model

Figure 7. Monthly weather in six regencies of Central Java

Figure 8. Visualization of Monthly Rainfall in Demak regency, 2018-2022

For example, there is a sequence of values in the dataset such as X[100, 110, 120] = Y[130], X[110, 120, 130] = Y[140]. X is the train or test data and Y is the data to be predicted. The values 100, 110, 120 are time_stamps that are worth 3 steps. Then the value 130 in Y[130] moves into X, namely X[110, 120, 130], for the next process, and so on until the last data.

1352 Jurnal Teknik Informatika (JUTIF), Vol. 4, No. 6, Desember 2023, pp. 1347-1357

Figure 11. Loss MSE for Monthly Rainfall Demak

Figure 11 shows that the loss, measured with MSE (Mean Squared Error), behaves well for both the train and test results. The two curves fit when the number of epochs is close to 500.

Figure 12. Loss RMSE for Monthly Rainfall Demak

The same holds for the performance metric RMSE (Root Mean Squared Error): fitting runs well and the curves meet at around 500 epochs, as can be seen in Figure 12. The columns selected for modeling come from two sources. The BMKG (Meteorology, Climatology, and Geophysical Agency) data cover the Demak, Boyolali, Pati, Temanggung, Brebes, and Kendal districts, with weather variables such as Rainfall, Air Temperature, and Air Humidity as monthly data from January 2018 to August 2022. The GEE (Google Earth Engine) data contain daily weather data such as rainfall, soil temperature, and air temperature from 2018 to August 2022 for 7 districts: Brebes, Tegal, Temanggung, Boyolali, Pati, Kendal, and Demak. Model error assessment uses MSE and RMSE with the help of the scikit-learn library. The results of the models can be checked in Table 1 and Table 2. The average values over all models are train MSE 0.013 and RMSE 0.11, and test MSE 0.02 and RMSE 0.12.

Figure 14. Combining existing data and future data from Monthly Demak Rainfall


Table 1. Results of Train MSE and Train RMSE for each model

Table 3. Prediction of daily weather 5 days ahead after D-day