Download PDFOpen PDF in browserCyber Threat Discovery from Dark Web10 pages•Published: September 26, 2019AbstractIn the darknet, hackers are constantly sharing information with each other and learning from each other. These conversations in online forums for example can contain data that may help assist in the discovery of cyber threat intelligence. Cyber Threat Intelligence (CTI) is information or knowledge about threats that can help prevent security breaches in cyberspace. In addition, monitoring and analysis of this data manually is challenging because forum posts and other data on the darknet are high in volume and unstructured. This paper uses descriptive analytics and predicative analytics using machine learning on forum posts dataset from darknet to discover valuable cyber threat intelligence. The IBM Watson Analytics and WEKA machine learning tool were used. Watson Analytics showed trends and relationships in the data. WEKA provided machine learning models to classify the type of exploits targeted by hackers from the form posts. The results showed that Crypter, Password cracker and RATs (Remote Administration Tools), buffer overflow exploit tools, and Keylogger system exploits tools were the most common in the darknet and that there are influential authors who are frequent in the forums. In addition, machine learning helps build classifiers for exploit types. The Random Forest classifier provided a higher accuracy than the Random Tree and Naïve Bayes classifiers. Therefore, analyzing darknet forum posts can provide actionable information as well as machine learning is effective in building classifiers for prediction of exploit types. Predicting exploit types as well as knowing patterns and trends on hackers’ plan helps defend the cyberspace proactively.Keyphrases: bayesian algorithm, cyber analytic, cyber threat intelligence, deep web or darknet, exploit type, hacker forums, machine learning classification, tree based algorithm In: Frederick Harris, Sergiu Dascalu, Sharad Sharma and Rui Wu (editors). Proceedings of 28th International Conference on Software Engineering and Data Engineering, vol 64, pages 174-183.
|