Spam SMS Detection for Turkish Language with Deep Text Analysis and Deep Learning Methods
Citation
Karasoy, O., Ballı, S. Spam SMS Detection for Turkish Language with Deep Text Analysis and Deep Learning Methods. Arab J Sci Eng (2021). https://doi.org/10.1007/s13369-021-06187-1
Abstract
As the number of mobile users grows day by day, the security of mobile phones has become an important issue. Because SMS is available as a standard service to all users, it is a preferred promotion channel for advertising agencies. Although SMS is not used extensively today, it is still one of the fastest and lowest-cost ways to reach mobile phone users. This leads institutions that want to advertise, inform, and promote their products to use SMS. However, messages sent without the permission of SMS users pose a serious security problem. In this study, content-based SMS classification is carried out using machine learning and deep learning methods to filter out unwanted messages in the Turkish language. The TurkishSMS data set was prepared by collecting messages received by people from different age groups and regions. Each message in the TurkishSMS data set is described by five structural features, two new features obtained with Word2Vec, and 45 features created from the word index values of the message. The resulting feature matrix, with 52 features in total, was evaluated with deep learning algorithms as well as traditional machine learning algorithms, and the results were compared. The convolutional neural network was the most successful algorithm, with a classification accuracy of 99.86%.
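To make the 52-column layout concrete, the following minimal Python sketch assembles one feature row per message as 5 structural + 2 Word2Vec-derived + 45 word-index values. Only those counts come from the abstract. The five structural features chosen here, the whitespace tokenization, the zero padding, the column order, and the names `build_vocab`, `word_index_features`, `structural_features`, and `feature_row` are all assumptions made for illustration. The abstract does not say how the two Word2Vec features are computed, so placeholder values stand in for them.

```python
# Hypothetical sketch: the abstract does not name the five structural
# features or say how the two Word2Vec features are computed.
SEQ_LEN = 45  # number of word-index features stated in the abstract


def build_vocab(messages):
    """Map each distinct lowercase token to a positive index (0 is reserved)."""
    vocab = {}
    for msg in messages:
        for tok in msg.lower().split():
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def word_index_features(msg, vocab, seq_len=SEQ_LEN):
    """Encode a message as word indices, truncated or zero-padded to seq_len.

    Tokens missing from the vocabulary are also encoded as 0.
    """
    ids = [vocab.get(tok, 0) for tok in msg.lower().split()][:seq_len]
    return ids + [0] * (seq_len - len(ids))


def structural_features(msg):
    """Five illustrative structural features (assumed, not from the paper)."""
    lowered = msg.lower()
    return [
        len(msg),                                    # character count
        len(msg.split()),                            # word count
        sum(ch.isdigit() for ch in msg),             # digit count
        sum(ch.isupper() for ch in msg),             # uppercase letter count
        int("http" in lowered or "www." in lowered), # looks like it has a URL
    ]


def feature_row(msg, vocab, w2v_features):
    """Concatenate 5 structural + 2 Word2Vec-derived + 45 word-index values."""
    if len(w2v_features) != 2:
        raise ValueError("expected exactly two Word2Vec-derived features")
    return (
        structural_features(msg)
        + list(w2v_features)
        + word_index_features(msg, vocab)
    )


# Made-up example messages, not taken from the TurkishSMS data set.
messages = [
    "Kampanya! 50 TL indirim icin www.example.com",
    "Aksam yemege geliyor musun",
]
vocab = build_vocab(messages)
# Placeholder values stand in for the two Word2Vec-derived features.
rows = [feature_row(m, vocab, (0.0, 0.0)) for m in messages]
print(len(rows[0]))  # 52
```

Each row could then be passed to a classifier. A fixed-length, zero-padded index sequence is a common input shape for a convolutional text model, which is why the sketch pads to 45 rather than keeping variable-length lists.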