Arabic script language identification using letter frequency neural networks
International Journal of Web Information Systems
ISSN: 1744-0084
Article publication date: 21 November 2008
Abstract
Purpose
With the rapid growth of the internet and the trend toward globalization, a tremendous number of textual documents written in different languages are accessible online on the World Wide Web. Managing these multilingual documents efficiently and effectively is important to organizations and individuals. The purpose of this paper is therefore to propose letter frequency neural networks to enhance the performance of language identification.
Design/methodology/approach
Initially, the paper analyzes the feasibility of using a windowing algorithm to find the best method for selecting features for language identification of Arabic script documents using backpropagation neural networks. The sliding window and non-sliding window algorithms used as feature selection methods in earlier experiments did not yield good results. Therefore, this paper proposes language identification of Arabic script documents based on letter frequency using a backpropagation neural network, using datasets of Arabic, Persian, Urdu and Pashto documents, all of which are written in Arabic script.
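To make the idea of a letter frequency feature concrete, the following is a minimal sketch of how a document might be turned into a fixed-length vector of relative letter frequencies before being passed to a classifier. It is an illustration under stated assumptions, not the procedure used in the paper: the function name `letter_frequencies`, the constant `ALPHABET`, and the small set of letters chosen are all hypothetical, and the abstract does not say which letters or normalization the authors used.

```python
from collections import Counter

# Hypothetical, illustrative subset of Arabic-script letters (an assumption,
# not the feature set from the paper). Written as escapes to avoid
# right-to-left rendering problems in source code.
ALPHABET = [
    "\u0627",  # ALEF
    "\u0628",  # BEH
    "\u067E",  # PEH (used in Persian, Urdu, Pashto)
    "\u062A",  # TEH
    "\u0679",  # TTEH (used in Urdu)
    "\u0643",  # KAF (Arabic form)
    "\u06A9",  # KEHEH (Persian/Urdu form)
    "\u064A",  # YEH (Arabic form)
    "\u06CC",  # FARSI YEH
]


def letter_frequencies(text, alphabet=ALPHABET):
    """Return the relative frequency of each letter in `alphabet`.

    Characters outside `alphabet` are ignored. If the text contains none of
    the listed letters, an all-zero vector is returned.
    """
    wanted = set(alphabet)
    counts = Counter(ch for ch in text if ch in wanted)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(alphabet)
    # One entry per alphabet letter, in a fixed order, so every document
    # maps to a vector of the same length.
    return [counts[ch] / total for ch in alphabet]
```

Because the vector has the same length and ordering for every document, it could serve as the input layer of a backpropagation network with one output unit per language. Letters that differ between the languages (for example KAF versus KEHEH) are the kind of signal such a feature would be expected to capture.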
Findings
The experiments show that the average root mean squared error of Arabic script document language identification based on the letter frequency feature selection algorithm is lower than that of the windowing algorithm.
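For reference, the root mean squared error is commonly defined as below. The abstract does not give the exact formula used, so the symbols here are assumptions: $N$ is the number of evaluated outputs, $t_i$ the target value and $o_i$ the network output for the $i$-th one. The paper may aggregate over documents and output units differently.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( t_i - o_i \right)^2}
```

Under this definition a lower value means the network outputs lie closer to their targets, which is the sense in which the letter frequency features are reported to outperform the windowing features.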
Originality/value
This paper highlights the fact that using neural networks with proper feature selection methods will increase the performance of language identification.
Citation
Selamat, A. and Ng, C. (2008), "Arabic script language identification using letter frequency neural networks", International Journal of Web Information Systems, Vol. 4 No. 4, pp. 484-500. https://doi.org/10.1108/17440080810919503
Publisher
Emerald Group Publishing Limited
Copyright © 2008, Emerald Group Publishing Limited