Supervised and Unsupervised Learning Techniques Utilizing Malware Datasets

EasyChair Preprint 9667

7 pages. Date: February 4, 2023

Abstract

Malware continues to gain momentum as it becomes more sophisticated at evading detection. Monitoring tools and antivirus software cannot keep up with the constant changes in these malicious variants. As a result, machine learning has gained popularity for the classification and detection of malware-related data. In this study, two separate datasets, Malware-Exploratory and CIC-MalMem-2022, undergo a series of supervised and unsupervised learning procedures, first to gather information for observation. The model developed in this research uses three clustering algorithms for analysis: K-Means, DBSCAN, and GMM. It also uses seven classification algorithms to predict malware: Decision Tree, Random Forest, AdaBoost, KNeighbors, Stochastic Gradient Descent, Extra Trees, and Gaussian Naïve Bayes. Results show that the Malware-Exploratory dataset averaged an accuracy score of 90%, while the CIC-MalMem-2022 dataset averaged a score of 99%. Both datasets also showed consistency across all three clustering algorithms. In addition, the variables do not necessarily need to be highly correlated with one another for malware detection. Future studies will determine whether the results remain stable under feature selection and genetic algorithms.
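
As a rough illustration of the workflow described above, the following minimal sketch runs the three named clustering algorithms and the seven named classifiers with scikit-learn. It is not the authors' code: the data is synthetic (generated with `make_classification` as a stand-in for the Malware-Exploratory and CIC-MalMem-2022 datasets), and every hyperparameter shown (`n_clusters`, `eps`, `min_samples`, the train/test split, the random seeds) is an assumption chosen only so the example runs.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for a malware dataset (assumption).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Unsupervised step: the three clustering algorithms named in the abstract.
# DBSCAN labels noise points as -1; eps and min_samples are illustrative.
cluster_labels = {
    "K-Means": KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_train),
    "DBSCAN": DBSCAN(eps=1.5, min_samples=5).fit_predict(X_train),
    "GMM": GaussianMixture(n_components=2, random_state=0).fit_predict(X_train),
}

# Supervised step: the seven classifiers named in the abstract, default settings.
classifiers = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "KNeighbors": KNeighborsClassifier(),
    "Stochastic Gradient Descent": SGDClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
}

# Train each classifier and record its accuracy on the held-out split.
accuracies = {
    name: accuracy_score(y_test, clf.fit(X_train, y_train).predict(X_test))
    for name, clf in classifiers.items()
}

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

The accuracies printed here reflect only the synthetic data and say nothing about the 90% and 99% averages reported in the abstract.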

Keyphrases: Gaussian Mixture Model (GMM), Supervised Machine Learning, Unsupervised Machine Learning, area under the curve-receiver operating characteristics (AUC-ROC), density-based spatial clustering of applications with noise (DBSCAN), hierarchical density-based spatial clustering of applications with noise (HDBSCAN)

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:9667,
  author    = {Daryle Smith and Sajad Khorsandroo and Kaushik Roy},
  title     = {Supervised and Unsupervised Learning Techniques Utilizing Malware Datasets},
  howpublished = {EasyChair Preprint 9667},
  year      = {EasyChair, 2023}}