ENHANCING SENTIMENT ANALYSIS WITH CHATBOTS: A COMPARATIVE STUDY OF TEXT PRE-PROCESSING

Text pre-processing plays a crucial role in the Sentiment Analysis process. Chatbots built on large language models, such as ChatGPT-3.5 by OpenAI and Google Bard, offer an alternative method for text pre-processing. This study evaluates the capabilities of both Chatbots in the text pre-processing stage and assesses their performance on a dataset crawled from X (formerly Twitter). ChatGPT-3.5 and Google Bard are compared using the Decision Tree and Naïve Bayes algorithms. Validation employs K-Fold Cross Validation with K = 10, together with three sampling methods: Linear, Shuffled, and Stratified Sampling. The findings show that ChatGPT-3.5 performs best with the Decision Tree algorithm, 10-fold cross-validation, and Stratified Sampling, achieving an Accuracy of 90.68%, Precision of 90.63%, and Recall of 100%. Google Bard performs best with the Decision Tree algorithm, 10-fold cross-validation, and Shuffled Sampling, achieving an Accuracy of 74.00%, Precision of 72.73%, and Recall of 98.77%. The study concludes that ChatGPT-3.5 and Google Bard are viable alternatives for text pre-processing in Sentiment Analysis, with ChatGPT-3.5 outperforming Google Bard (Accuracy 90.68%, Precision 90.63%, Recall 100%). These results were validated against human annotations, which achieved an Accuracy of 85.20%, Precision of 85.71%, and Recall of 99.03% with the same configuration (Decision Tree, 10-fold cross-validation, Stratified Sampling). This suggests that ChatGPT-3.5's text pre-processing performance is on par with human annotation.


INTRODUCTION
Text Pre-Processing is an important step in Sentiment Analysis, as most text elements, such as characters, words, and sentences, pass through the entire Text Pre-Processing process [1]-[2]. This step transforms raw data into a more understandable format [3]. Sentiment Analysis is the process of analyzing a specific event, issue, or problem to determine the responses or opinions regarding it [3]-[5]. It is used to build systems that analyze, identify, and express opinions or sentiments, including Positive, Negative, and Neutral sentiments [6]. Sentiment Analysis is commonly performed using Natural Language Processing (NLP), whose main goal is to train machines to understand the meaning of human language and respond appropriately [7].
The Text Pre-Processing stage is commonly performed using tools such as RapidMiner and Python. With advances in technology, Chatbots have emerged: Artificial Intelligence (AI) products capable of holding automated conversations with humans in natural language, without limitations of space and time [8].
Chat Generative Pre-trained Transformer (ChatGPT), developed by OpenAI, is a well-known Chatbot that interacts well with humans and answers the questions it is given. It gained popularity in November 2022; it combines a Reinforcement Learning algorithm with human input and leverages over 150 billion parameters [9]. ChatGPT has since evolved to a fourth version, known as ChatGPT-4, which functions as a personal assistant capable of performing various tasks and answering inquiries. These include assisting in article writing, generating song lyrics, translating languages, and even producing automated replies in chat apps, among many other capabilities [10].
Google Bard is a competitor to ChatGPT and is built on the Language Model for Dialogue Applications (LaMDA) [11]. As the name suggests, this Chatbot is a Google product, intended to combine Google's extensive knowledge of the world with the power, intelligence, and creativity of its large language model. Google Bard assists in information retrieval and can even explain new discoveries from the James Webb Space Telescope to a 9-year-old child [12].
Several previous studies have addressed Text Pre-Processing, emphasizing its importance in Sentiment Analysis [13]-[15]. Further research has shown that techniques such as stopword removal, stemming, and feature selection can improve performance by 20.4% [16]. Another study applied cleaning, case folding, and stemming without stopword removal, achieving an accuracy of 94.24% [17]. Research on Sentiment Analysis using the Decision Tree algorithm has also reported good accuracy, such as 99.01% on an e-commerce application rating dataset [18] and 84.78% on a Twitter dataset [19]. Other studies that employ the Naïve Bayes algorithm for Sentiment Analysis report accuracies of 76.39% on a Twitter dataset [20] and 73.57% [10].
Previous research has predominantly used the RapidMiner application, as in studies [3], [4], [6], [21], [22], while others have used Python, as in studies [5], [23]-[26]. The average accuracy reported by three of these Python-based studies is 73.33%. The research gap between the current research and previous studies is summarized in Table 1.
This study uses three sampling methods. Stratified Sampling is employed when the population is divided into groups [27]; within each group, samples are selected randomly and systematically. Linear Sampling divides the sample set into partitions while preserving the original order [27], [28]. Shuffled Sampling is a random sampling technique that builds random subsets of the data examples [27], [29].
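The three sampling methods differ only in how records are assigned to the k folds. The Python sketch below illustrates the idea; it is not RapidMiner's implementation, and the function names, the fixed random seed, and the round-robin dealing used for stratification are assumptions made for illustration.

```python
import random
from collections import defaultdict


def linear_folds(n, k):
    """Linear sampling: split record indices 0..n-1 into k consecutive partitions, keeping the original order."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def shuffled_folds(n, k, seed=0):
    """Shuffled sampling: randomly permute the indices, then deal them into k partitions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def stratified_folds(labels, k, seed=0):
    """Stratified sampling: shuffle within each class, then deal indices round-robin
    so that every fold keeps roughly the overall class proportions."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for i in members:
            folds[position % k].append(i)
            position += 1
    return folds


# Illustrative labels in 40/19/41 Positive/Negative/Neutral proportions.
labels = ["Positive"] * 40 + ["Negative"] * 19 + ["Neutral"] * 41
folds = stratified_folds(labels, k=10)
print([len(f) for f in folds])  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

In this example every stratified fold holds 10 records, 4 of them Positive, so each fold mirrors the overall class balance; linear folds would instead inherit whatever ordering the crawl produced.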
The dataset consists of 1,000 records crawled from Twitter. The keyword used during crawling was "Artificial Intelligence," reflecting the growing popularity of Artificial Intelligence in recent years [30]; it was chosen to capture Twitter users' opinions on this topic.
The Decision Tree algorithm is used in this study because it can handle attributes with discrete or continuous data types, handle missing values, prune the tree, and select attributes using the gain ratio [31]. Its use is supported by a previous study [32], in which the Decision Tree algorithm achieved 91% accuracy.
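Attribute selection by gain ratio can be made concrete with a short computation. The Python sketch below computes the gain ratio of one discrete attribute as information gain divided by split information (the C4.5 definition); the function names and the toy "contains great" attribute are illustrative assumptions, not the configuration used in RapidMiner.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of one discrete attribute: information gain / split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus the weighted entropy after it.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes that break the data into many small groups.
    split_info = -sum((len(g) / total) * math.log2(len(g) / total) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical attribute: does the tweet contain the word "great"?
has_great = ["yes", "yes", "no", "no"]
sentiment = ["Positive", "Positive", "Negative", "Negative"]
print(gain_ratio(has_great, sentiment))  # 1.0
```

An attribute that separates the classes perfectly scores 1.0 here, while one that leaves each branch as mixed as the whole dataset scores 0.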
The Naïve Bayes algorithm is employed as a comparison to the Decision Tree algorithm. Naïve Bayes is chosen for its simplicity, low computational complexity, and fast, efficient processing [33], [34]. This choice is supported by a previous study [35], which reported an accuracy of 96.24% for Naïve Bayes.
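The low computational cost of Naïve Bayes comes from the fact that training is only counting. The Python sketch below is a minimal multinomial variant over token lists with add-one (Laplace) smoothing; RapidMiner's Naive Bayes operator may treat attributes differently, and the class name and toy data are hypothetical.

```python
import math
from collections import Counter


class SimpleNaiveBayes:
    """Minimal multinomial Naive Bayes over token lists, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        # Log prior: log P(class).
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        # Token counts per class.
        self.counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.vocab = {t for counter in self.counts.values() for t in counter}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_score(self, tokens, c):
        """log P(class) + sum of log P(token | class); tokens never seen in training are skipped."""
        v = len(self.vocab)
        score = self.log_prior[c]
        for t in tokens:
            if t in self.vocab:
                score += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_score(tokens, c))


docs = [["great", "tool"], ["love", "great"], ["scary", "job", "loss"], ["risk", "scary"], ["news", "today"]]
labels = ["Positive", "Positive", "Negative", "Negative", "Neutral"]
model = SimpleNaiveBayes().fit(docs, labels)
print(model.predict(["great", "love"]))  # Positive
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.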
This study aims to use ChatGPT-3.5 and Google Bard as alternative Text Pre-Processing methods in the Sentiment Analysis of Twitter datasets. This research is needed because there has been limited investigation of Text Pre-Processing using ChatGPT-3.5 and Google Bard. The performance of the Text Pre-Processing results from both Chatbots is measured using the Decision Tree and Naïve Bayes algorithms. Fig. 1 presents the complete workflow. First, the dataset is crawled from Twitter. Second, Text Pre-Processing is conducted using ChatGPT-3.5 and Google Bard. Third, the performance of the Text Pre-Processing stage is measured using K-Fold Cross Validation. Fourth, the accuracy results of ChatGPT-3.5 and Google Bard are compared to determine which performs better.

Data Collection
The dataset was collected by Crawling, a technique for extracting data from Twitter through its application programming interface (API) [36]. This research crawled 1,000 records from Twitter using the RapidMiner application with the keyword "Artificial Intelligence." The crawling process in RapidMiner is depicted in Figure 2, and the initial data are presented in Table 2. Table 2 contains five attributes: Created-At, Language, Source, Retweet-Count, and Text. Only the Text attribute is used in the Sentiment Analysis process, so the remaining four attributes are removed from the dataset. Because the crawled tweets carry no sentiment attribute, a sentiment must be assigned to every tweet, using the categories Positive, Negative, and Neutral. Labeling has commonly been done by manual annotation with the aid of linguistic specialists. In this study, manual annotation by humans serves as a comparison to the annotations generated by ChatGPT-3.5 and Google Bard. Manual annotation by humans was also used in a previous study [37], which reported an accuracy of 79.29%.

Text Pre-Processing
1422 Jurnal Teknik Informatika (JUTIF), Vol. 4, No. 6, Desember 2023, pp.1419-1430
Text Pre-Processing in Sentiment Analysis is a stage aimed at eliminating inconsistent data, duplicate data, and data that does not influence the polarity of the documents [38]. The Text Pre-Processing stage in this research is divided into two activities. The first activity is Text Pre-Processing using ChatGPT-3.5 and Google Bard, which perform the following tasks:

Text Cleaning
Text Cleaning aims to remove punctuation, standardize the use of uppercase and lowercase letters, eliminate duplicate tweets, and correct spelling [39]. In the Text attribute, Twitter-specific elements such as the characters RT, @, and #, as well as links, are removed during this process.
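In this study, cleaning was delegated to the Chatbots through a natural-language command. For comparison, a rule-based equivalent might look like the Python sketch below. It assumes English tweets, drops whole @mentions and links, keeps hashtag words without the # sign, and also lowercases the text (the Transform Cases step); spelling correction, listed above as part of Text Cleaning, is not attempted. The function and pattern names are illustrative.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")               # @username
RETWEET_RE = re.compile(r"\bRT\b")             # retweet marker
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")   # punctuation, '#', emoji, etc.
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Remove links, mentions, 'RT' and punctuation, then lowercase (Transform Cases)."""
    # Links are removed first so their punctuation is not left behind as fragments.
    for pattern in (URL_RE, MENTION_RE, RETWEET_RE, NON_ALNUM_RE):
        text = pattern.sub(" ", text)
    return SPACES_RE.sub(" ", text).strip().lower()


def clean_corpus(tweets):
    """Clean every tweet and drop empty results and exact duplicates, keeping the first occurrence."""
    seen, cleaned = set(), []
    for tweet in tweets:
        c = clean_tweet(tweet)
        if c and c not in seen:
            seen.add(c)
            cleaned.append(c)
    return cleaned


print(clean_tweet("RT @user: AI is changing everything! #ArtificialIntelligence https://t.co/abc"))
# ai is changing everything artificialintelligence
```

Unlike a Chatbot, such rules are deterministic and repeatable, but they cannot judge which characters are "unnecessary" in context.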

Transform Cases
This stage aims to convert all text in the Text attribute to either lowercase or uppercase [40]. In this research, lowercase is used for all text.

Labeling
The objective of this stage is to assign a sentiment to each record in the collected dataset. A new attribute, Sentiment, is added to the dataset, with the labeling categories Positive, Negative, and Neutral. Labeling can be done manually by humans or automatically by algorithms: manual labeling relies on humans reading and interpreting the text, while automatic labeling uses natural language processing and machine learning techniques.
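Because the Chatbots are driven by commands typed into a chat window, the tweets have to be pasted in and the labeled replies turned back into a table before further analysis. The Python sketch below shows one hypothetical way to number the tweets for a prompt and to parse a reply into CSV. The reply format ("<number>. <text> - <Label>") and all function names are assumptions: real Chatbot output varies and should be checked by hand.

```python
import csv
import io
import re


def build_prompt(instruction, tweets):
    """Put the instruction on the first line, followed by one numbered tweet per line."""
    lines = [instruction] + [f"{i}. {tweet}" for i, tweet in enumerate(tweets, start=1)]
    return "\n".join(lines)


# Hypothetical reply format: "<number>. <text> - <Label>" (a colon is also accepted).
REPLY_RE = re.compile(
    r"^\s*(\d+)\.\s*(.*?)\s*[-:]\s*(Positive|Negative|Neutral)\s*$", re.IGNORECASE
)


def parse_labels(reply):
    """Extract (number, text, label) rows; lines that do not match are ignored."""
    rows = []
    for line in reply.splitlines():
        match = REPLY_RE.match(line)
        if match:
            rows.append((int(match.group(1)), match.group(2), match.group(3).capitalize()))
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text that a spreadsheet or RapidMiner can import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["No", "Text", "Sentiment"])
    writer.writerows(rows)
    return buffer.getvalue()


prompt = build_prompt(
    "Assign sentiment for each sentence as Positive, Negative, or Neutral",
    ["ai will change education", "ai threatens jobs"],
)
reply = "1. ai will change education - Positive\n2. ai threatens jobs - negative"
print(to_csv(parse_labels(reply)))
```

Numbering the tweets lets each returned label be matched back to its source record even if the Chatbot rephrases or skips a line.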
The labeling was carried out not only by the Chatbots but also by human annotators, who examined the context of each tweet in the dataset. This annotation was performed by three researchers to avoid misinterpreting any sentence in the dataset. The human annotations serve as a comparison for the labels generated by the Chatbots.
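The text does not state how disagreements among the three researchers were resolved. One common option, assumed here purely for illustration, is a majority vote, after which the human labels can be compared item by item with the Chatbot labels. The Python sketch below implements both steps; the function names and example labels are hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Label chosen by at least two of the three annotators, or None when all three disagree."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else None


def agreement_rate(labels_a, labels_b):
    """Share of items on which two label lists agree, skipping items without a label."""
    pairs = [(a, b) for a, b in zip(labels_a, labels_b) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no comparable items")
    return sum(a == b for a, b in pairs) / len(pairs)


votes = [
    ("Positive", "Positive", "Neutral"),
    ("Negative", "Neutral", "Positive"),
    ("Neutral", "Neutral", "Neutral"),
]
human = [majority_label(v) for v in votes]  # ['Positive', None, 'Neutral']
chatbot = ["Positive", "Negative", "Positive"]
print(agreement_rate(human, chatbot))  # 0.5
```

Items where all three annotators disagree are returned as None so they can be discussed rather than silently assigned.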
All these stages are performed using ChatGPT-3.5 and Google Bard relatively quickly, depending on the commands entered in the chat window of each Chatbot. Each stage takes approximately 5 minutes on a device with the following specifications: 1. Intel Pentium P6000 processor (1.86 GHz, 3 MB L3 cache); 2. Intel HD Graphics; 3. 3 GB DDR3 memory; 4. 320 GB HDD.
On hardware with higher specifications, the work is likely to be considerably faster. The second activity is Text Pre-Processing using RapidMiner, which performs the following processes:

Tokenize
Tokenization is the process of splitting text into small pieces called tokens, aiming to facilitate computer processing of textual data [41]-[43].

Filter Stopwords
Stopword filtering is the process of eliminating frequently occurring words such as "a," "the," "of," "and," and "an," which do not contribute to the analysis process [44], [45]. This process also aims to reduce the dimensionality of the data, thereby speeding up computation time.

Stemming
Stemming aims to transform words into their base form by removing prefixes and suffixes from the words [46], [47].

Filter Token By Length
Filtering tokens by length removes words whose number of characters falls outside a specified range [48]. This process uses the RapidMiner application with the minimum character limit set to 4 and the maximum set to 25, so that every remaining word is at least 4 and at most 25 characters long.
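The four RapidMiner operators above can be approximated in a few lines of Python, which may help readers reproduce the steps outside RapidMiner. This is a sketch under stated assumptions: the stopword list is a short illustrative subset, the suffix-stripping stemmer is a deliberately naive stand-in for a real stemmer such as Porter's, and only the 4 to 25 character limits come from this study.

```python
import re

# Short illustrative stopword list; real English stopword lists are much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "are", "to", "in", "for", "on", "it", "this", "that"}
# Deliberately naive suffix list standing in for a real stemmer (e.g. Porter).
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def tokenize(text):
    """Split lowercased text on every run of non-letter characters."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def filter_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def filter_by_length(tokens, min_chars=4, max_chars=25):
    return [t for t in tokens if min_chars <= len(t) <= max_chars]


def preprocess(text):
    """Tokenize, filter stopwords, stem, then filter by length (the order used in this study)."""
    tokens = filter_stopwords(tokenize(text))
    return filter_by_length([naive_stem(t) for t in tokens])


print(preprocess("the future of artificial intelligence is changing jobs"))
# ['future', 'artificial', 'intelligence', 'chang']
```

The output shows the limits of such simple rules: "changing" is over-stemmed to "chang", and "jobs" becomes "job", which the 4-character minimum then removes.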

Validation
The Validation stage reports the accuracy of the constructed model. The algorithms used in this process are the Decision Tree and Naïve Bayes algorithms, combined with the K-Fold Cross-Validation (KCV) operator. Cross-validation is a technique used to assess how well the outcomes of a statistical analysis generalize to an independent dataset [49], that is, how well a model or analysis applies to new, unseen data. KCV divides the dataset into k parts and performs k iterations. In each iteration, one part is used as the testing data, while the remaining k - 1 parts are used for training. After the k iterations, the average error is calculated from the k test results [50].
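The k-fold procedure can be sketched in Python as follows. For simplicity the folds are consecutive slices (as in Linear Sampling), and `train_fn`/`predict_fn` are hypothetical placeholders for any classifier, such as a Decision Tree or Naïve Bayes; a trivial majority-class baseline is used in the example so that the code runs on its own.

```python
from collections import Counter


def k_fold_error(X, y, k, train_fn, predict_fn):
    """k-fold cross-validation with consecutive folds; returns the mean test error."""
    n = len(X)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of records")
    bounds = [i * n // k for i in range(k + 1)]
    errors = []
    for i in range(k):
        test_idx = range(bounds[i], bounds[i + 1])
        # The remaining k - 1 folds form the training set.
        train_idx = [j for j in range(n) if not bounds[i] <= j < bounds[i + 1]]
        model = train_fn([X[j] for j in train_idx], [y[j] for j in train_idx])
        wrong = sum(predict_fn(model, X[j]) != y[j] for j in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / k


# Trivial stand-in classifier: always predict the most frequent training label.
def train_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(10))
y = ["P"] * 7 + ["N"] * 3
print(k_fold_error(X, y, 5, train_majority, predict_majority))  # 0.3
```

Accuracy as reported in the results corresponds to one minus this mean error.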

Evaluation
The final stage is Evaluation, which assesses the outcomes of applying the model to determine whether it has achieved the research objectives. Based on this evaluation, a decision is made on how to use the modeling results [51]. In this stage, the accuracy of the constructed model is evaluated. Accuracy is the percentage of test records that are correctly classified [52].
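Precision and recall, reported in the results alongside accuracy, can be computed from the true and predicted labels as in the Python sketch below. The text does not specify which of the three sentiment classes is treated as the positive class for these metrics, so the `positive` argument is an assumption; per-class values for the other classes are obtained by changing it.

```python
def accuracy(y_true, y_pred):
    """Share of records whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive):
    """Precision = TP / (TP + FP) and recall = TP / (TP + FN) for one chosen class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = ["Positive", "Positive", "Negative", "Neutral", "Positive"]
y_pred = ["Positive", "Positive", "Positive", "Neutral", "Positive"]
print(accuracy(y_true, y_pred))                      # 0.8
print(precision_recall(y_true, y_pred, "Positive"))  # (0.75, 1.0)
```

The example shows one way a recall of 100% can coexist with lower precision: a classifier that over-predicts a class never misses a true member of it but also absorbs records from other classes.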
Figure 9 shows the differences in label distribution between ChatGPT-3.5 and Google Bard. Google Bard assigns a higher proportion of Positive sentiment to discussions of Artificial Intelligence (46%) than ChatGPT-3.5 (40%). ChatGPT-3.5 assigns a lower proportion of Negative sentiment (19%) than Google Bard (34%). For Neutral sentiment, ChatGPT-3.5 assigns a larger proportion (41%) than Google Bard (20%). This data is then visualized using a Word Cloud.

Validation
The Validation stage uses the RapidMiner application to measure the performance of the Text Pre-Processing produced by ChatGPT-3.5 and Google Bard. The Decision Tree and Naïve Bayes algorithms are each connected to the K-Fold Cross Validation operator. Several values of k are used (k = 2, 4, 6, 8, and 10), each combined with three sampling methods: Linear, Stratified, and Shuffled Sampling. The process of this stage is shown in Figure . Validation was conducted for both the ChatGPT-3.5 model process and the Google Bard model process. Its results are examined in the next subsection, Evaluation.
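The experimental design is a full grid: five values of k, three sampling methods, and two algorithms give 5 × 3 × 2 = 30 validation runs per Chatbot. The short Python enumeration below makes the count explicit; the string identifiers are illustrative and are not RapidMiner parameter names.

```python
from itertools import product

K_VALUES = [2, 4, 6, 8, 10]
SAMPLING_METHODS = ["linear", "stratified", "shuffled"]
ALGORITHMS = ["decision_tree", "naive_bayes"]
CHATBOTS = ["ChatGPT-3.5", "Google Bard"]

# One validation run per (algorithm, sampling method, k) combination.
configs = list(product(ALGORITHMS, SAMPLING_METHODS, K_VALUES))
print(len(configs))  # 30

# Repeating the grid for both Chatbots.
runs = list(product(CHATBOTS, ALGORITHMS, SAMPLING_METHODS, K_VALUES))
print(len(runs))  # 60
```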

Evaluation
The experimental results, combining the various values of k, the two algorithms, and the three sampling methods, are presented in Table 4. Table 4 shows the performance of the Text Pre-Processing carried out by ChatGPT-3.5 and Google Bard, measured with the Decision Tree and Naïve Bayes algorithms. For ChatGPT-3.5, the highest accuracy was achieved by the Decision Tree algorithm with Stratified Sampling: an accuracy of 90.68%, precision of 90.14%, and recall of 100%. For Google Bard, the best performance was achieved by the Decision Tree algorithm with Shuffled Sampling: an accuracy of 68.27%, precision of 72.72%, and recall of 98.77%.
When the Text Pre-Processing results of ChatGPT-3.5 and Google Bard were compared, ChatGPT-3.5 showed the best performance, with an accuracy of 90.68%, precision of 90.14%, and recall of 100%. However, these results could not be considered valid without a point of comparison. The next step therefore compared the Text Pre-Processing results of ChatGPT-3.5 with those of Text Pre-Processing using human annotation, as presented in Table 5. According to Table 5, the best performance of the human-annotation benchmark was achieved with the Decision Tree algorithm and Stratified Sampling, with an Accuracy of 85.20%, Precision of 85.71%, and Recall of 99.03%. The Text Pre-Processing generated by ChatGPT-3.5 therefore outperformed human annotation, achieving an Accuracy of 90.68%, Precision of 90.63%, and Recall of 100%. This study found that Chatbots such as ChatGPT and Google Bard can perform the Text Pre-Processing phase of Sentiment Analysis. The performance of the Chatbot output was measured through classification, yielding accuracy, precision, and recall, and was then validated by comparison with the results obtained from human annotation. The comparative results are shown as a graph in Fig. 11.
The best Text Pre-Processing results are achieved by ChatGPT-3.5, with an Accuracy of 90.68%, Precision of 90.63%, and Recall of 100%, using the Decision Tree algorithm, 10-Fold Cross Validation, and Stratified Sampling. Google Bard obtains lower scores, with an Accuracy of 68.27%, Precision of 72.73%, and Recall of 98.77%, using the Decision Tree algorithm, 10-Fold Cross Validation, and Shuffled Sampling. The ChatGPT-3.5 results surpass both Google Bard and the human-annotation benchmark (Accuracy 85.20%, Precision 85.71%, Recall 99.03%). One possible reason ChatGPT-3.5 outperforms Google Bard is that Google Bard was still in its testing phase and had not yet been launched, whereas ChatGPT had already been launched in 2022.
This research shows that Chatbots such as ChatGPT-3.5 and Google Bard can be considered alternative text pre-processing methods in Sentiment Analysis, alongside Python-based text pre-processing as in studies [5], [24]-[26], or reliance solely on RapidMiner as in studies [3], [4], [6], [21], [22]. This research differs from previous studies in that it combines a widely used Text Pre-Processing tool, RapidMiner, with Artificial Intelligence products, namely ChatGPT-3.5 and Google Bard. It also compares the performance achieved by these two Chatbots, with the objective of obtaining better results in Sentiment Analysis. Moreover, the performance achieved falls into the Fair Classification category, indicating a satisfactory level of accuracy.

CONCLUSION
This research contributes to Text Pre-Processing in Sentiment Analysis by using the Chatbots ChatGPT-3.5 and Google Bard for Text Cleaning, Transform Cases, and Labeling. The best performance was achieved by ChatGPT-3.5 with the Decision Tree algorithm, 10-Fold Cross Validation, and Stratified Sampling, resulting in an accuracy of 90.68%, precision of 90.63%, and recall of 100%. These findings demonstrate that Chatbots such as ChatGPT-3.5 and Google Bard are suitable as alternative Text Pre-Processing methods in the Sentiment Analysis process. This study has some limitations. First, it focuses on testing the capabilities of ChatGPT-3.5 and Google Bard in the Text Pre-Processing process, without improving the constructed process model. Second, the Decision Tree and Naïve Bayes algorithms are used solely to compare the performance of the Text Pre-Processing generated by the two Chatbots. Future research could improve the performance of the constructed model for Chatbot-based Text Pre-Processing, for example through Feature Selection or Backward Elimination. Other comparative algorithms could also be included to determine whether any outperform the Decision Tree.

Figure 1. Research Method. The proposed research framework aims to compare the performance results of ChatGPT-3.5 and Google Bard in the Text Pre-Processing stage.

Indri Tri Julianto, et.al., ENHANCING SENTIMENT ANALYSIS WITH … 1423
The Text Pre-Processing stages performed by ChatGPT-3.5 and Google Bard are Text Cleaning, Transform Cases, and Labeling. The results of the Text Cleaning stage are shown in Figure 3 and Figure 4.
Figure 3. (a) Instructions to clean text given to ChatGPT-3.5; (b) Cleaning result
Figure 4. (a) Instructions to clean text given to Google Bard; (b) Cleaning result
Figures 3 and 4 show that both ChatGPT-3.5 and Google Bard can clean text of unnecessary characters, producing clear and meaningful sentences. The commands given must be specific and aimed at the desired outcome. In this study, the command provided for Cleaning Text is "Please remove the following sentences from unnecessary characters like @# and preserve the Twitter attributes for each number without altering the original sentence." The next step performed by ChatGPT-3.5 and Google Bard is Transform Cases, whose outcomes are shown in Figure 5 and Figure 6.
Figure 5. (a) Transform Cases by ChatGPT-3.5; (b) Transform Cases result
Figure 6. (a) Transform Cases by Google Bard; (b) Transform Cases result
After the Cleaning Text process, each sentence is converted to lowercase in the Transform Cases step, ensuring a uniform text format across the dataset. Both ChatGPT-3.5 and Google Bard were given the same instruction: "Change each number of sentences into lowercase." As Fig. 5 and Fig. 6 show, both Chatbots carried out this instruction successfully. The final step performed by ChatGPT-3.5 and Google Bard is Labeling, whose outcomes are shown in Fig. 7 and Fig. 8.
Figure 7. (a) Labeling by ChatGPT-3.5; (b) Labeling result
Figure 8. (a) Labeling by Google Bard; (b) Labeling result
Both ChatGPT-3.5 and Google Bard performed the Labeling step successfully: given the same command, "Assign sentiment for each sentence as Positive, Negative, or Neutral," they returned a sentiment for each sentence in the dataset. This completes the Text Pre-Processing step using ChatGPT-3.5 and Google Bard. The output of this step is entered into Microsoft Excel and later imported into RapidMiner. The Text Pre-Processing conducted in RapidMiner consists of Tokenization, Stopword Filtering, Stemming, and Token Filtering (by length). The outcomes of this step are presented in Figure 9.
Figure 9. (a) Percentage rate, ChatGPT-3.5; (b) Percentage rate, Google Bard
(a) Model Process ChatGPT-3.5; (b) Model Process Google Bard

Table 3. Text Pre-Processing Result

Table 5. Benchmark Modelling Result