  • Word Sense Disambiguation
    Prachi Rathore and Sonam Gupta, Banasthali University, India
    ABSTRACT

    This paper reviews various approaches to the Word Sense Disambiguation problem. Word Sense Disambiguation is an open and very common problem in Natural Language Processing: determining which sense of a word is intended in a particular sentence.

  • Pattern Recognition System based on Distributed Computing Architectures: Clusters, Peer to Peer, and Data grid
    Hamdi Hassen1,3 and Maher Khemakhem2,3, 1Taibah University, KSA, 2University of King Abdulaziz, KSA and 3University of Sfax, Tunisia
    ABSTRACT

    This work presents an offline handwriting recognition system built on distributed computing architectures. We describe a pattern recognition system for large volumes of documents containing isolated handwritten Arabic words. The system is a Distributed Optical Character Recognition (DOCR) application. The originality of our approach lies in how it handles the large number of Arabic manuscripts to be digitized. We introduce a new Arabic handwriting pattern recognition method designed to take advantage of distributed computing architectures, and we show that the approach provides a framework that can lead to powerful and fast handwriting pattern recognition systems. The experiments were conducted on the Omnivore platform (a Grid computing meta-scheduling system and P2P technologies) in the Department of Mathematics and Computer Science, University of Marburg, Germany, with a real large-scale dataset from the IFN/ENIT database. Experimental results confirm the validity of our approach for speeding up the pattern recognition process.

  • ARAD (Automatic Recognition of Arabic Digit)
    Hamadouche Maamar1, Mhania Guerti2 and Tebbi Hanane3, 1United States Disciplinary Barracks, Algeria, 2Ecole Nationale Polytechnique Alger, Algeria and 3LRIA, Algeria
    ABSTRACT

    In this work we present ARAD, our automatic speech recognition (ASR) system for Arabic digits, together with the methodology used for its design, which proceeds in two main stages: the creation of the acoustic database (corpus development) and the recognition of the read signal. We explain the basic modules that compose it, from signal acquisition to the final decision. We describe the algorithms used to implement the Dynamic Time Warping (DTW) method and validate them with the results obtained. The system contains a front-end module that converts the input sound into feature vectors based on Mel Frequency Cepstral Coefficients (MFCCs), which are then used by the rest of the system, in particular the recognition module, which uses DTW for the comparison task. The results show a recognition success rate of 96% on the three corpora, which we recorded in a noisy environment.
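
    The DTW comparison step described above can be illustrated with a minimal sketch. It assumes each utterance is already a sequence of feature vectors (in ARAD these would be MFCC frames; the toy one-dimensional values below are placeholders), uses Euclidean distance between frames, and keeps one reference template per digit. The names `dtw_distance`, `recognize`, and `templates` are hypothetical and not taken from the paper.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between frames (Euclidean is an assumption here).
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(sample, templates):
    """Return the label whose reference template is closest to the sample."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))


# Toy reference templates: placeholders, not real MFCC data.
templates = {
    "one": [(0.0,), (1.0,), (2.0,)],
    "two": [(5.0,), (5.0,), (4.0,)],
}
# A slower rendition of "one": the first frame is repeated.
sample = [(0.0,), (0.0,), (1.0,), (2.0,)]
print(recognize(sample, templates))
```

    Because DTW allows frames to be stretched or compressed in time, the slower sample still aligns with its template at zero cost, which is why the method suits utterances spoken at different speeds.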

  • Latent Semantic Word Sense Disambiguation Using Global Co-occurrence Information
    Minoru Sasaki, Ibaraki University, Japan
    ABSTRACT

    In this paper, we propose a novel word sense disambiguation method based on global co-occurrence information using non-negative matrix factorization (NMF). When the dependency relation matrix is calculated, the existing method tends to produce a very sparse co-occurrence matrix from a small training set, so the NMF algorithm sometimes does not converge to the desired solutions. To obtain a large number of co-occurrence relations, we propose to use the co-occurrence frequencies of dependency relations between word features in the whole training set. This enables us to address the data sparseness problem and to induce more effective latent features. To evaluate the method, we conduct experiments comparing it with two baseline methods. The results show that the method is effective for word sense disambiguation in comparison with both baselines. Moreover, analyzing the global co-occurrence information gives the proposed method a stable effect.
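
    To make the factorization step concrete, here is a minimal sketch of NMF on a small co-occurrence matrix. It uses the standard multiplicative update rules for the squared-error objective, which is an assumption: the abstract does not say which NMF variant or objective the author uses. The matrix `v`, the rank, and the function names are illustrative only.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def reconstruction_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


# Toy co-occurrence counts (rows: contexts, columns: word features); invented data.
v = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
]
w, h = nmf(v, rank=2)
print(round(reconstruction_error(v, w, h), 3))
```

    Each row of `w` is the latent feature representation of one context. With a very sparse `v`, many numerators in the updates are zero, which gives some intuition for why a denser, corpus-wide co-occurrence matrix can help the factorization.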

  • Opinion Mining for Colloquial Varieties of Arabic Language
    1Ahmed Y. Al-Obaidi and 2Venus Samawi, 1Arab Academy for Banking and Financial Sciences, Jordan and 2Al Albayt University, Jordan
    ABSTRACT

    In this paper, we tackle the problem of sentiment analysis for colloquial varieties of the Arabic language. To our knowledge, this is the first attempt to address this problem. We present a supervised system that classifies reviews as positive or negative. As part of this study, we built a corpus of reviews written in Jordanian and Saudi dialects. The corpus contains 28,576 reviews, representing the opinions of 5,422 different reviewers and covering 27 different categories. On social networks and review sites, Arabic users usually express their opinions in their colloquial dialects rather than in standard Arabic. Colloquial varieties of Arabic therefore deserve more attention in research. Classifying reviews written in Arabic dialects is a challenging problem: the same word may have different meanings, and different synonyms, in different dialects. Stop words vary from one dialect to another and also differ from those of Modern Standard Arabic (MSA). Consequently, we propose a list of stop words that suits all dialects as well as MSA. A modified light stemmer is developed and used in this work, and we propose suitable preprocessing steps to apply before feature extraction. Different feature sets are used (bag of words and word N-grams). Finally, Naive Bayes, SVM, and Maximum Entropy classifiers are applied and evaluated using the F1 measure. The test results indicate that the Maximum Entropy classifier outperforms the other algorithms, with an F1 measure of 86.75%.
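
    The feature extraction and evaluation steps mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: tokens are assumed to be already stemmed, the stop-word list and the English example tokens are placeholders, and F1 is computed for a single positive class (the abstract does not say whether the reported F1 is per class or averaged). The function names are hypothetical.

```python
from collections import Counter


def ngram_features(tokens, n_max=2, stop_words=frozenset()):
    """Count word n-grams for n = 1..n_max after dropping stop words.

    With n_max=1 this is a plain bag of words.
    """
    kept = [t for t in tokens if t not in stop_words]
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(kept) - n + 1):
            feats[" ".join(kept[i:i + n])] += 1
    return feats


def f1_score(gold, predicted, positive="positive"):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0  # Precision or recall is zero (or undefined), so F1 is 0.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder tokens and labels, not drawn from the paper's corpus.
features = ngram_features(["the", "food", "was", "great"], n_max=2, stop_words={"the"})
gold = ["positive", "positive", "negative", "negative"]
predicted = ["positive", "negative", "positive", "negative"]
print(sorted(features), f1_score(gold, predicted))
```

    The resulting counts would serve as input to any of the classifiers named in the abstract; the classifier itself is left out of this sketch.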

  • Modeling of speech Synthesis of Standard Arabic using an Expert System
    Tebbi Hanane and Azzoune Hamid, University of Science and Technology Houari Boumediene, Algeria
    ABSTRACT

    In this work we present our expert system for speech synthesis from text written in Standard Arabic. The work is carried out in two main stages: the creation of the sound database, and the transformation of the written text into speech (Text To Speech, TTS). This transformation is done first by a Phonetic Orthographical Transcription (POT) of any written Standard Arabic text, which converts it into its corresponding phonetic sequence, and second by the generation of the voice signal that corresponds to the transcribed chain. We lay out the different steps in the design of the system, as well as the results obtained, compared with other approaches studied for realizing TTS for Standard Arabic.

Copyright (c) NLP 2014