
Detection of phishing websites using a novel twofold ensemble model

Kalyan Nagaraj (Department of Computer Science and Engineering, RV College of Engineering, Bangalore, India)
Biplab Bhattacharjee (School of Management Studies, National Institute of Technology, Calicut, India)
Amulyashree Sridhar (Department of Computer Science and Engineering, RV College of Engineering, Bangalore, India)
Sharvani GS (Department of Computer Science and Engineering, RV College of Engineering, Bangalore, India)

Journal of Systems and Information Technology

ISSN: 1328-7265

Article publication date: 18 October 2018

Issue publication date: 14 November 2018


Abstract

Purpose

Phishing is one of the major threats currently affecting businesses worldwide. Organizations and customers face the hazards of phishing attacks because of anonymous access to sensitive details. Such attacks often result in substantial financial losses. Thus, there is a need for effective intrusion detection techniques to identify and possibly nullify the effects of phishing. Classifying phishing and non-phishing web content is a critical task in information security protocols, and foolproof mechanisms have yet to be implemented in practice. The purpose of the current study is to present an ensemble machine learning model for classifying phishing websites.

Design/methodology/approach

A publicly available data set comprising 10,068 instances of phishing and legitimate websites was used to build the classifier model. Feature extraction was performed using a group of methods, and the relevant extracted features were used to build the model. A twofold ensemble learner was developed by feeding the results of a random forest (RF) classifier into a feedforward neural network (NN). The performance of the ensemble classifier was validated using k-fold cross-validation. The twofold ensemble learner was implemented as a user-friendly, interactive decision support system for classifying websites as phishing or legitimate.
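The two-stage idea described above (RF outputs fed into a feedforward NN, scored with k-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of scikit-learn, the synthetic data from `make_classification`, the function name `twofold_cv_accuracy`, and every hyperparameter (number of trees, hidden layer size, number of folds) are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neural_network import MLPClassifier


def twofold_cv_accuracy(X, y, n_splits=5, seed=0):
    """Per-fold accuracy of an RF -> feedforward NN stack (hypothetical helper)."""
    outer = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in outer.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]

        # Stage 1: random forest. Out-of-fold probabilities on the training
        # split keep the second stage from seeing predictions made on rows
        # the forest was fitted on.
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        rf_tr = cross_val_predict(rf, X_tr, y_tr, cv=3, method="predict_proba")
        rf.fit(X_tr, y_tr)
        rf_te = rf.predict_proba(X_te)

        # Stage 2: small feedforward network trained on the RF probabilities.
        nn = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=seed)
        nn.fit(rf_tr, y_tr)
        scores.append(nn.score(rf_te, y_te))
    return np.array(scores)


# Synthetic stand-in data; the study's data set is not reproduced here.
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           random_state=0)
scores = twofold_cv_accuracy(X, y)
print("mean accuracy across folds:", scores.mean())
```

Using out-of-fold RF predictions for the second stage is one common stacking choice; the abstract does not state how the RF results were passed to the NN, so other wirings are equally plausible.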

Findings

Experimental simulations were performed to assess and compare the performance of the ensemble classifiers. The statistical tests indicated that the RF_NN model gave superior performance, with an accuracy of 93.41 per cent and a minimal mean squared error of 0.000026.
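For readers unfamiliar with the two reported metrics, the following sketch shows how accuracy and mean squared error are commonly computed for a binary classifier. The toy labels and scores, the 0.5 decision threshold, and the choice to compute the error on predicted scores are assumptions; the abstract does not specify how the study computed its mean squared error.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mean_squared_error(y_true, y_score):
    """Average squared difference between true labels and predicted scores."""
    return sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / len(y_true)


# Toy example: 1 = phishing, 0 = legitimate (illustrative values only).
y_true = [1, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

acc = accuracy(y_true, y_pred)
mse = mean_squared_error(y_true, y_score)
print("accuracy:", acc)
print("mean squared error:", mse)
```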

Research limitations/implications

The research data set used in this study is publicly available and easy to analyze. Comparative analysis with other, more recent real-time data sets must be performed to ensure that the model generalizes to various security breaches. Different variants of phishing threats must also be detected, rather than focusing solely on phishing website detection.

Originality/value

To the best of the authors' knowledge, the twofold ensemble model has not been applied to the classification of phishing websites in any previous study.

Keywords

Citation

Nagaraj, K., Bhattacharjee, B., Sridhar, A. and GS, S. (2018), "Detection of phishing websites using a novel twofold ensemble model", Journal of Systems and Information Technology, Vol. 20 No. 3, pp. 321-357. https://doi.org/10.1108/JSIT-09-2017-0074

Publisher

Emerald Publishing Limited

Copyright © 2018, Emerald Publishing Limited
