TEXT CLASSIFICATION USING INDOBERT FINE-TUNING MODELING WITH CONVOLUTIONAL NEURAL NETWORK AND BI-LSTM

Goods delivery facilities have advanced year by year in tandem with the growth of online trade, which relies on delivery services to complete transactions between sellers and buyers. Since 2000, the Top Brand Award has regularly conducted official surveys that compare goods and services, including delivery services. However, survey rankings based on public opinion are less accurate because delivery service users and service companies are unaware of the specific success factors and weaknesses of the services. This research compares text mining approaches built on IndoBert, a transformer language model for Indonesian. Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) algorithms are used to measure classification performance. The method is applied to Twitter opinion data on the J&T Express delivery service, both with and without text preprocessing. The IndoBert learning rate is varied, and the tweets are classified into four categories: price, time, returns, and others. The research data consist of 2525 comments from Twitter users about the delivery service, spanning January 1, 2021, to March 31, 2023. Testing showed that Bi-LSTM with text preprocessing reached 79% accuracy at a learning rate of 1x10^-6, 2% higher than the 77% obtained without text preprocessing at the same learning rate. CNN with text preprocessing reached 83% at a learning rate of 1x10^-5, 3% higher than the 80% obtained without text preprocessing. The highest accuracy, 83%, was therefore obtained by CNN with text preprocessing at a learning rate of 1x10^-5, which outperformed Bi-LSTM.


INTRODUCTION
Customer ratings on social media represent opinions derived from individuals' experiences as consumers of services or products [1]. These assessments serve as valuable reviews for other consumers and help sellers of services or products improve their offerings. Twitter, a prevalent social media platform among Indonesians, facilitates internet-based communication, allowing users to post various photos and videos on their accounts [2]. J&T Express is a recently established company specializing in delivering goods, including documents and packages [3]. In 2023, J&T Express achieved the Top Brand Award in the Best Courier Service category, securing the highest score of 20 in a comparison of seven startups categorized as unicorns. Additionally, it claimed the highest delivery volume among five renowned shipping companies in Indonesia, the others being JNE, SiCepat, Ninja Express, and SAP [4].
Text mining involves analyzing substantial amounts of unstructured data to generate new insights from previously unclear information [5]. It is used to extract useful information from data sources by identifying interesting patterns and relationships. Text mining also belongs to the research field of data mining [6].
The learning rate, a crucial hyperparameter, governs the degree of model adjustment in response to the error estimate during weight updates [7]. Selecting an appropriate learning rate poses challenges since a small value can prolong the training process or lead to stagnation, while a large value may result in unstable training or suboptimal weight learning [8].
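The trade-off described above can be illustrated with a minimal sketch, assuming a toy objective f(w) = w^2 rather than the IndoBert loss; the function name and the three learning rates are chosen only for illustration and are not the values used in this study.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2.0 * w                 # derivative of w**2
        w = w - learning_rate * grad   # weight update scaled by the learning rate
    return w


# A very small rate barely moves, a moderate rate converges,
# and a rate that is too large makes the iterates grow instead of shrink.
for lr in (0.001, 0.1, 1.1):
    print(f"learning rate {lr}: final |w| = {abs(gradient_descent(lr)):.4g}")
```

On this toy problem each update multiplies w by (1 - 2 x learning rate), which is why the smallest rate stagnates and the largest diverges.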
Adam optimization is a stochastic gradient descent method based on adaptive estimation of first- and second-order moments [9]. The name "Adam" comes from "Adaptive Moment Estimation" because Adam uses estimates of the first and second moments of the gradient to adjust the learning rate for each weight in the neural network. The Adam optimizer combines the best properties of the AdaGrad and RMSProp algorithms to provide an algorithm that can handle sparse and noisy gradients [10]. Adam is relatively easy to configure, and its default configuration parameters work well for most problems.
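The moment estimates described above can be written out for a single scalar weight. The following is a minimal sketch under stated assumptions, not the optimizer implementation used in this study: `adam_minimize` and `grad` are hypothetical names, the objective (w - 3)^2 is a toy example, and the defaults for `beta1`, `beta2`, and `eps` are the commonly used ones.

```python
import math


def adam_minimize(grad_fn, w, learning_rate=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Run Adam on a single scalar weight and return the final value."""
    m = 0.0  # first-moment estimate (running mean of gradients)
    v = 0.0  # second-moment estimate (running mean of squared gradients)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w


def grad(w):
    """Gradient of the toy objective (w - 3)**2."""
    return 2.0 * (w - 3.0)


print(adam_minimize(grad, w=0.0, learning_rate=0.1, steps=2000))
```

Because the update divides the first moment by the square root of the second, the very first step has a size close to the learning rate regardless of the gradient's magnitude.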
Jurnal Teknik Informatika (JUTIF), Vol. 4, No. 6, December 2023, pp. 1605-1610
Batch size, representing the number of samples processed before updating the model, must range from one to the total number of samples in the training dataset [11]. Given that deep neural networks cannot process the entire dataset at once, batch size divides the dataset into manageable groups or segments.
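How a batch size partitions a training set can be sketched as follows. This is an illustrative example only: `iter_batches` is a hypothetical helper, and the training-set size of 2273 is an assumption (about 90% of the 2525 tweets), combined with the batch size of 128 reported for the models.

```python
def iter_batches(samples, batch_size):
    """Yield consecutive slices of `samples` containing at most `batch_size` items."""
    if not 1 <= batch_size <= len(samples):
        raise ValueError("batch size must be between 1 and the number of samples")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


n_train = 2273  # assumption: roughly 90% of the 2525 collected tweets
batches = list(iter_batches(list(range(n_train)), batch_size=128))
print(len(batches), len(batches[-1]))  # 18 batches, the last one holding 97 samples
```

Under these assumptions one epoch corresponds to 18 weight updates, so 4 epochs would correspond to 72 updates.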
An epoch refers to one round of training a neural network with all available training data [12]. The number of epochs dictates the number of times the deep learning algorithm passes over the training data. However, too few epochs can lead to underfitting, where the model fails to discern meaningful relationships between input and output data, while excessive epochs can lead to overfitting [13].
The dropout layer within neural networks prevents overfitting by excluding individual nodes during training according to a probability [14]. The excluded nodes are treated as if they were not part of the network architecture at all [15]. In general, a rate of 0.5 is used, which means removing 50% of the units. The dropped units are selected randomly and are not used in that training step, while the other units continue learning.
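The random exclusion of units can be sketched in a few lines. This is a minimal illustration assuming the "inverted dropout" convention, in which kept activations are rescaled by 1 / (1 - rate) during training; `dropout` is a hypothetical helper, not a function from the libraries used in this study.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=random):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)       # dropout is disabled at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)                 # fixed seed so the sketch is repeatable
dropped = dropout([1.0] * 10, rate=0.5, rng=rng)
print(dropped)                         # a mix of 0.0 (dropped) and 2.0 (kept, rescaled)
```

The rescaling keeps the expected sum of activations unchanged, so no adjustment is needed when dropout is switched off for evaluation.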
This research compares text mining approaches built on the Indonesian-language variant of Bidirectional Encoder Representations from Transformers (BERT), namely IndoBert. IndoBERT is a language model extending BERT that is used in various types of linguistic tasks, such as sentiment analysis, question answering, text prediction, text generation, and text summarization [16].
Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) algorithms are used to measure classification performance. The method is applied to Twitter opinion data on the J&T Express delivery service, both with and without text preprocessing, with the IndoBert learning rate varied between runs [17].
This study analyzes the comparison of text mining using the IndoBert transformation method, which is an extension of BERT specifically designed for the Indonesian language. It employs Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) algorithms to evaluate the impact of Twitter opinions on J&T Express delivery services. By varying IndoBert parameters concerning learning rates, the study aims to determine the optimal algorithm performance. Previous studies have classified tweet reviews into positive, neutral, and negative sentiments, achieving 75% accuracy using a learning rate parameter of 3x10^-4 with CNNs [18].
Bi-LSTM, a neural network with two LSTM layers, processes information bidirectionally to capture both past and future contexts [20]. This architecture amalgamates two opposing hidden layers into one output [21].
CNN, an integral deep learning architecture, obviates the need for manual feature extraction by integrating convolution into artificial neural networks [22]. It aims to recognize new objects or images based on detected features.
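The way convolution extracts features without manual engineering can be sketched on a one-dimensional sequence, which is the form a CNN sees when applied to text. This is a toy illustration with hand-picked numbers, not the model built in this study; as in most deep learning libraries, the "convolution" here is computed as a sliding dot product (cross-correlation).

```python
def conv1d(sequence, kernel):
    """Slide `kernel` over `sequence` and return the dot product at each position."""
    k = len(kernel)
    return [sum(sequence[i + j] * kernel[j] for j in range(k))
            for i in range(len(sequence) - k + 1)]


def relu(values):
    """Keep positive responses and set negative ones to zero."""
    return [max(0.0, v) for v in values]


def global_max_pool(values):
    """Summarize a feature map by its strongest response."""
    return max(values)


feature_map = conv1d([1, 3, 2, 0, 1], kernel=[1, -1])
print(feature_map)                          # [-2, 1, 2, -1]
print(global_max_pool(relu(feature_map)))   # 2
```

In a trained network the kernel weights are learned from data, so each filter comes to respond to a pattern that is useful for the classification task.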
This research compares the fine-tuning of the IndoBert base model on preprocessed and non-preprocessed datasets, aiming to ascertain the influence of the J&T Express delivery service dataset on the IndoBert base model using Bert Embedding. Two deep neural network architectures, Bidirectional LSTM and CNN, are assessed using training and test data, evaluating performance metrics such as accuracy, F1 score, precision, and recall. The goal is to determine the best-performing model after fine-tuning the IndoBert base model and to optimize the model by varying the learning rate to gauge its impact on model performance.

RESEARCH METHODS
The research methods in this study can be seen in Figure 1, and Figure 2 illustrates the framework used in this research. To enhance the data analysis process, a label encoder is applied to convert the category from a character data type to an integer type. Table 2 shows the label encoder output for the category labels. The 7 text preprocessing indicators are used to rectify abbreviated vocabulary, repeated letters, and incorrectly written words. The objective is to transform them into complete and appropriate words without altering their intended meanings, adhering to the KBBI Online guidelines. Additionally, mentions are removed to eliminate sentence ambiguities that could affect the sentence pattern relationships learned by the model.
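The cleaning and label encoding steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the preprocessing pipeline of this study: the `SLANG` dictionary, the example tweet, and the account handle are hypothetical, and the alphabetical integer assignment is an assumption that may differ from the mapping in Table 2.

```python
import re

# Hypothetical abbreviation dictionary; the actual normalization list is not given here.
SLANG = {"yg": "yang", "gak": "tidak", "brg": "barang"}


def clean_tweet(text):
    """Remove mentions, collapse repeated letters, and expand abbreviations."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)            # drop @mentions
    text = re.sub(r"(\w)\1{2,}", r"\1", text)    # "lamaaa" -> "lama"
    tokens = [SLANG.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


def fit_label_encoder(labels):
    """Map each distinct category to an integer, in alphabetical order (assumed)."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


print(clean_tweet("@kurir_contoh paketnyaaa lamaaa yg gak sampai"))
# paketnya lama yang tidak sampai
print(fit_label_encoder(["price", "time", "returns", "others"]))
# {'others': 0, 'price': 1, 'returns': 2, 'time': 3}
```

A dictionary lookup keeps the normalization auditable, since every replacement can be checked against a reference such as KBBI Online.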
For the IndoBert implementation, the IndoBert-lite-base-p2 model from indobenchmark is utilized for both tokenization and modeling. The modeling process covers the datasets both with and without preprocessing. The primary goal is to assess how the input data affect the IndoBERT base model. The model is optimized with Adam, using the learning rate values listed in Table 3. The research incorporates 2 learning rate hyperparameter values. The evaluation encompasses four performance metrics: accuracy, precision, recall, and F1 score. It compares the performance of the Bi-LSTM and CNN algorithms after IndoBert modeling at the two learning rates.
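The four evaluation metrics can be computed for a multi-class problem as in the sketch below. Macro averaging (an unweighted mean over classes) is an assumption, since the averaging scheme is not stated here; `classification_metrics` and the example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}


# Toy example with four classes encoded as 0..3 (not data from this study).
metrics = classification_metrics([0, 0, 1, 1, 2, 2, 3, 3],
                                 [0, 1, 1, 1, 2, 2, 3, 0])
print(metrics)
```

Macro averaging gives each category equal weight, which matters when some categories have far fewer tweets than others.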

RESULT AND DISCUSSION
The data utilized in this research, whether subjected to text preprocessing or not, consists of four categories with different data types. From the collected data, 90% is allocated as training data, while the remaining 10% is designated as test data. This distribution is maintained for both the preprocessed and non-preprocessed data sets. The deep neural network model is built with 2 layers, which are explained in Table 6. A comparison of the Bi-LSTM results with and without text preprocessing shows that higher accuracy was obtained from data that first underwent text preprocessing, with a learning rate of 1x10^-5.
Table 9 and Table 10 compare the CNN algorithm with and without text preprocessing. In this second comparison, CNN with text preprocessing at a learning rate of 1x10^-5 is 3% superior to CNN without text preprocessing.
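The 90/10 allocation can be sketched as follows. This is a minimal illustration, assuming a simple shuffled split with a fixed seed; whether the split in this study was shuffled or stratified by category is not stated, and `train_test_split` here is a hypothetical helper rather than a library function.

```python
import random


def train_test_split(rows, test_fraction=0.1, seed=42):
    """Shuffle the rows and hold out `test_fraction` of them as test data."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


# 2525 placeholder row indices standing in for the collected tweets.
train_rows, test_rows = train_test_split(range(2525))
print(len(train_rows), len(test_rows))  # 2273 252
```

Using the same seed for the preprocessed and non-preprocessed versions of the dataset would keep the two test sets aligned row for row.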

DISCUSSION
In the comparison of data processed using the CNN algorithm, the data subjected to text preprocessing showed higher accuracy than the data without preprocessing. The highest accuracy was observed at a learning rate of 1x10^-5, which outperformed the accuracy at a learning rate of 1x10^-6.
When comparing the two algorithms, IndoBert modeling with the CNN algorithm achieved better accuracy than the Bi-LSTM algorithm, with a 3% difference in accuracy. This research successfully combined a transformer language model with a deep learning algorithm. In previous research, the analysis of delivery service opinions employed Bi-GRU, resulting in 71% accuracy [23].
Bert Embedding, applied to this dataset, transforms complete input sentences into token embeddings. A special classification token is placed at the start of each sequence in the dataset. In the segment embeddings, when multiple sentences exist, they are combined into a single sequence and differentiated using special tokens. Additionally, positional encoding is utilized to preserve the word order when vectorizing the dataset.
Within the transformer encoder layers of Bert, the attention head mechanism enables accurate connections between words in sentences within the dataset. This allows for an accurate understanding of the contextual meaning of the same word in different contexts within a sentence.
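The three inputs described above (tokens with special markers, segment identifiers, and positions) can be sketched as follows. This is an illustration only: whitespace splitting stands in for the real subword tokenizer, `build_bert_inputs` is a hypothetical helper, and the example sentences are invented.

```python
def build_bert_inputs(sentence_a, sentence_b=None):
    """Return tokens, segment ids, and position ids for one or two sentences."""
    tokens = ["[CLS]"] + sentence_a.split() + ["[SEP]"]
    segment_ids = [0] * len(tokens)            # first sentence belongs to segment 0
    if sentence_b is not None:
        b_tokens = sentence_b.split() + ["[SEP]"]
        tokens += b_tokens
        segment_ids += [1] * len(b_tokens)     # second sentence belongs to segment 1
    position_ids = list(range(len(tokens)))    # preserves word order
    return tokens, segment_ids, position_ids


tokens, segment_ids, position_ids = build_bert_inputs("paket belum sampai",
                                                      "mohon dicek")
print(tokens)
# ['[CLS]', 'paket', 'belum', 'sampai', '[SEP]', 'mohon', 'dicek', '[SEP]']
print(segment_ids)    # [0, 0, 0, 0, 0, 1, 1, 1]
print(position_ids)   # [0, 1, 2, 3, 4, 5, 6, 7]
```

In the model, each of these three sequences is mapped to an embedding vector and the three are summed per position; the output at the `[CLS]` position is what a classification head typically consumes.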

CONCLUSION
The research has concluded that text preprocessing significantly influences the input dataset for Bert Embedding in the IndoBert Base model. Text preprocessing ensures that the dataset is

Figure 1. Research Flow
Alda Zevana Putri Widodo, et al., TEXT CLASSIFICATION USING INDOBERT …
This research utilizes Python version 3.10.0 and the Visual Studio Code application. The library packages employed include TensorFlow, Pandas, NumPy, and Keras for modeling using IndoBert with the Bi-LSTM and CNN algorithms, while Pyplot and Graphviz were used for visualizing plots. The dataset for this research comprises textual data in the form of tweets sourced from the official J&T Express Twitter account spanning from January 20, 2021, to March 31, 2023, totaling 2525 data entries. Experts from J&T Express categorized the dataset into four categories: price, time, returns, and others. The dataset is structured with three columns: date, tweet, and category. Data collection was executed in Python using Tweepy, and the collected data is stored in CSV format. The details of the dataset are outlined in Table 1, which provides the attributes, descriptions, and data types of the research data.

Table 2. Encoder Label Information

Table 3. Deep Neural Network Model Optimization Parameters
Table 4 presents examples of text data subjected to text preprocessing and data that underwent no text preprocessing. The ReLU activation function is applied in the hidden layer, while the softmax function is utilized in the output layer. The loss function is sparse categorical cross-entropy because there are more than two target categories. A dropout rate of 0.2 is used within the hidden layers to regularize the model, thereby reducing overfitting. Each model was trained with a batch size of 128 for 4 epochs. The parameters of the IndoBERT base model, taken from the Hugging Face indobenchmark repository developed by IndoNLU, are detailed in Table 5.