Comparative analysis of NLP feature extraction methods for SMS spam classification

Authors

  • Kayode Sheriffdeen, Ladoke Akintola University of Technology, Ogbomoso

DOI:

https://doi.org/10.21590/ijtmh.2023090107

Abstract

Short Message Service (SMS) spam classification is a critical application of natural language processing (NLP) aimed at mitigating unsolicited and malicious communications. This study presents a comparative analysis of widely used NLP feature extraction methods for SMS spam detection, evaluating their effectiveness, efficiency, and robustness. Traditional approaches such as Bag-of-Words (BoW) and Term Frequency–Inverse Document Frequency (TF-IDF) are compared with distributed representations including Word2Vec, GloVe, and contextual embeddings derived from transformer-based models. Using standard benchmark SMS datasets, these feature extraction techniques are assessed in conjunction with common machine learning classifiers. Performance is evaluated using metrics such as accuracy, precision, recall, F1-score, and computational cost. The results highlight the strengths and limitations of each method, showing that while traditional features offer simplicity and efficiency, advanced embeddings provide superior contextual understanding and classification performance. The study offers practical insights to guide the selection of feature extraction methods for effective and scalable SMS spam classification systems.
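As a rough illustration of the two traditional representations named above and of the evaluation metrics, the following minimal sketch builds Bag-of-Words and TF-IDF vectors for a few toy messages and computes precision, recall, and F1. It is not the study's implementation: the whitespace tokenizer, the smoothed IDF variant, the helper names, and the example messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real pipeline may differ.
    return text.lower().split()


def bow_vectors(docs):
    """Bag-of-Words: raw term counts per message."""
    return [Counter(tokenize(d)) for d in docs]


def tfidf_vectors(docs):
    """TF-IDF: term counts scaled by a smoothed inverse document frequency."""
    counts = bow_vectors(docs)
    n = len(docs)
    # Document frequency: number of messages in which each term appears.
    df = Counter(term for c in counts for term in c)
    # Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Precision, recall, and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy messages, not drawn from any benchmark dataset.
messages = [
    "win a free prize now",
    "free entry win cash",
    "are we meeting for lunch",
    "call me when you are free",
]
bow = bow_vectors(messages)
tfidf = tfidf_vectors(messages)
```

In this sketch, a term such as "free" that occurs in most messages receives a lower TF-IDF weight than a rarer term such as "prize", whereas Bag-of-Words counts both equally. Either set of vectors could then be passed to a classifier of choice and scored with `precision_recall_f1`.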

Published

2023-03-10