Farsi lexical analysis and stop word list
Abstract
Purpose
The purpose of this article is to present an aggregated methodology for constructing a stop word list for the Farsi language and to generate a generic Farsi stop word list.
Design/methodology/approach
The stop word list is extracted on the basis of syntactic classes, domain dependency, corpus statistics and expert judgment. Some of the main challenges that arise in automatic Farsi text processing are also outlined.
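The abstract names four sources of stop word evidence but does not say how the corpus statistic is computed or how the sources are combined. The Python sketch below illustrates one plausible approach under stated assumptions. It ranks terms by document frequency (ties broken by total frequency), then merges candidate lists with a simple voting rule. The function names `corpus_candidates` and `aggregate`, the `top_n` and `min_votes` parameters, the voting rule, and the sample word lists are all hypothetical. Whitespace tokenization is also a simplification, since Farsi segmentation is one of the challenges the article discusses.

```python
from collections import Counter


def corpus_candidates(documents, top_n):
    """Hypothetical corpus-statistic criterion: rank terms by document
    frequency, break ties by total term frequency, then by the term itself."""
    tf = Counter()  # total occurrences of each term
    df = Counter()  # number of documents containing each term
    for doc in documents:
        tokens = doc.split()  # assumption: naive whitespace tokenization
        tf.update(tokens)
        df.update(set(tokens))
    ranked = sorted(tf, key=lambda t: (-df[t], -tf[t], t))
    return set(ranked[:top_n])


def aggregate(candidate_lists, min_votes):
    """Hypothetical aggregation rule: keep a word if at least `min_votes`
    of the independent sources propose it."""
    votes = Counter()
    for words in candidate_lists:
        votes.update(set(words))
    return {w for w, v in votes.items() if v >= min_votes}


docs = [
    "من به کتابخانه رفتم",
    "او به مدرسه و خانه رفت",
    "کتاب و دفتر به من داد",
]
corpus = corpus_candidates(docs, top_n=3)
syntactic = {"به", "از", "و", "را"}  # hypothetical prepositions/conjunctions
expert = {"به", "و", "که"}           # hypothetical expert picks
stop_list = aggregate([corpus, syntactic, expert], min_votes=2)
print(sorted(stop_list))
```

Here the corpus statistic proposes به, من and و. Only به and و are proposed by at least two of the three sources, so they survive aggregation. A voting threshold is just one design choice. A plain union would favour recall, while a higher threshold would favour precision.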
Findings
The results of these techniques are aggregated to generate a general Farsi stop word list containing 927 words.
Practical implications
The resulting stop word list can affect the efficiency and effectiveness of the retrieval and indexing processes in Farsi information retrieval systems. Moreover, it can play an important role in Farsi text segmentation.
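To show where a stop word list enters the indexing process, here is a minimal Python sketch of an inverted index that skips stop words before posting terms. The `build_index` function, the three-word stop list and the sample documents are hypothetical illustrations, not the article's 927-word list. As in any sketch of this kind, whitespace tokenization stands in for proper Farsi segmentation.

```python
from collections import defaultdict


def build_index(documents, stop_words):
    """Build a term -> set-of-document-ids inverted index,
    dropping any token found in `stop_words` (hypothetical helper)."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(documents):
        for token in doc.split():  # assumption: naive whitespace tokenization
            if token not in stop_words:
                index[token].add(doc_id)
    return dict(index)


stop_words = {"به", "و", "از"}  # hypothetical tiny stop list
docs = ["من به کتابخانه رفتم", "او از خانه به مدرسه رفت"]
index = build_index(docs, stop_words)
print(index)
```

Because high-frequency function words such as به never receive postings, the index is smaller and queries are not diluted by matches on terms that carry little content.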
Originality/value
Our stop word extraction algorithm is a promising technique; it could be applied to other languages that present ambiguities in automatic text segmentation.
Citation
Davarpanah, M.R., Sanji, M. and Aramideh, M. (2009), "Farsi lexical analysis and stop word list", Library Hi Tech, Vol. 27 No. 3, pp. 435-449. https://doi.org/10.1108/07378830910988559
Publisher
Emerald Group Publishing Limited
Copyright © 2009, Emerald Group Publishing Limited