  Publication date: 1393/2/6   Published in: 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden   Pages: 5
A Probabilistic Approach to Persian Ezafe Recognition

Abstract

In this paper, we investigate the problem of Ezafe recognition in Persian. Ezafe is an unstressed vowel that is usually not written but is readily recognized and pronounced by human readers. The Ezafe marker occurs in noun phrases, adjective phrases, and some prepositional phrases, where it links the head to its modifiers. Ezafe recognition in Persian is in effect a homograph disambiguation problem, and it is useful for Persian language applications such as text-to-speech (TTS). We use part-of-speech tags augmented with an Ezafe marker (POSE) to train a probabilistic model for Ezafe recognition, using a ten-million-word tagged corpus as training data. Three approaches were used to build the model: a Maximum Entropy POSE tagger, a Conditional Random Fields (CRF) POSE tagger, and a statistical machine translation approach based on a parallel corpus. Compared to previous work, the CRF POSE tagger is shown to achieve outstanding results.
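
The idea of augmenting part-of-speech tags with an Ezafe marker can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `+EZ` suffix, the tag names `N` and `ADJ`, and the helper names `to_pose` and `from_pose` are hypothetical and are not taken from the paper. The sketch shows how a tagged corpus might be turned into POSE labels for a sequence tagger, and how Ezafe decisions could be read back from predicted POSE labels.

```python
from typing import List, Tuple

# Hypothetical notation: the abstract does not specify how POSE tags are written.
EZAFE_SUFFIX = "+EZ"


def to_pose(pos_tags: List[str], has_ezafe: List[bool]) -> List[str]:
    """Merge each POS tag with its Ezafe flag into a single POSE label."""
    if len(pos_tags) != len(has_ezafe):
        raise ValueError("pos_tags and has_ezafe must have the same length")
    return [tag + EZAFE_SUFFIX if ez else tag
            for tag, ez in zip(pos_tags, has_ezafe)]


def from_pose(pose_tags: List[str]) -> List[Tuple[str, bool]]:
    """Split POSE labels back into (POS tag, Ezafe flag) pairs."""
    pairs = []
    for tag in pose_tags:
        if tag.endswith(EZAFE_SUFFIX):
            pairs.append((tag[:-len(EZAFE_SUFFIX)], True))
        else:
            pairs.append((tag, False))
    return pairs


# "ketab-e bozorg" (big book): the head noun carries Ezafe, the adjective does not.
tokens = ["ketab", "bozorg"]
pose = to_pose(["N", "ADJ"], [True, False])
decoded = from_pose(pose)
```

With labels of this kind, Ezafe recognition reduces to ordinary sequence tagging over a larger tag set, which is the setting in which taggers such as Maximum Entropy or CRF models are applied.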

Authors: Habibollah Asghari, Jalal Maleki, Heshaam Faili