Proceedings of the International Conference on Advances in Computing, Electronics and Communication (ACEC 2013)
"ARABIC TEXT CATEGORIZATION USING ROCCHIO MODEL"
Abstract: “Automatic text categorization is an important application in natural language processing. It is the process of assigning a document to predefined categories based on its content. In this research, some well-known techniques developed for classifying English text are considered for application to Arabic. This work focuses on applying the well-known Rocchio (centroid-based) technique to Arabic documents. This technique uses centroids to define good class boundaries. The centroid of a class c is computed as the center of mass of its members. The Arabic language is highly inflectional and derivational, which makes text processing a complex task. In the proposed work, Arabic text is first preprocessed using tokenization and stemming techniques. Then, the Rocchio algorithm is adopted and adapted to classify Arabic documents. The implemented algorithm is evaluated using a corpus containing a set of actual documents. The results show that the adapted Rocchio algorithm is applicab”
Keywords: Rocchio algorithm, Centroid-based Algorithm, Text Mining, Machine Learning, Arabic Text Categorization, Arabic Text Classification.
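
The centroid-based idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the plain term-frequency weighting, the cosine similarity measure, the function names, and the toy English training documents are all assumptions made for illustration, and the Arabic stemming step mentioned in the abstract is omitted. Each class centroid is the mean (center of mass) of the length-normalized vectors of its member documents, and a new document is assigned to the class whose centroid is most similar.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Return an L2-normalized term-frequency vector as a dict.

    Assumption: whitespace tokenization only; no stemming is applied.
    """
    counts = Counter(text.split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()} if norm else {}


def train_centroids(labeled_docs):
    """Compute one centroid per class as the mean of its member vectors."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for text, label in labeled_docs:
        sizes[label] += 1
        for term, weight in vectorize(text).items():
            sums[label][term] += weight
    return {
        label: {t: w / sizes[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, centroids):
    """Assign the document to the class with the most similar centroid."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy corpus, used only to exercise the functions above.
TRAIN = [
    ("match goal team", "sports"),
    ("team player goal", "sports"),
    ("market bank price", "economy"),
    ("bank loan price", "economy"),
]
CENTROIDS = train_centroids(TRAIN)

if __name__ == "__main__":
    print(classify("goal team", CENTROIDS))
    print(classify("bank price", CENTROIDS))
```

In a setting like the one the abstract describes, the tokenization and stemming steps would run before `vectorize`, so that inflected forms of the same Arabic word contribute to the same vector component.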