Publication date: 1396/9/14 (Solar Hijri) | Venue: The 21st International Conference on Asian Language Processing, Singapore | Pages: 4

Compiling a Text Re-Use Detection Corpus from Scientific Papers with Semi-Real Cases of Plagiarism

Abstract

Automatic plagiarism detection deals with retrieving
reused text fragments in a document and finding their
source documents. Because many plagiarism detection methods
have been developed, large-scale plagiarism corpora are
needed to evaluate them. Despite their importance,
few plagiarism detection corpora have been developed in recent
years, especially for low-resource languages. Because of legal
issues, releasing a collection of real cases of plagiarism for
evaluation purposes is not ethical. Given these limitations,
simulated and artificial methods are the two main approaches
to compiling a plagiarism corpus. These approaches
try to imitate real cases of plagiarism from different points
of view. However, there are still fundamental differences
between simulated corpora and real cases of plagiarism. In
this paper, a semi-real approach is proposed to create a
collection of plagiarism cases as a corpus. The approach
is based on removing correct references from scientific
papers so that the affected passages read as plagiarized passages. Unlike
corpora built with simulated and artificial approaches, the proposed
corpus can correctly simulate real cases of text re-use. The
evaluation results show high accuracy of the proposed corpus
with respect to n-gram similarity for different ranges of N.
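
The two ideas in the abstract, removing references and comparing passages by n-gram similarity, can be illustrated with a minimal Python sketch. The abstract does not specify the citation format, the tokenization, or the exact similarity measure, so everything here is an assumption made for illustration: `strip_citations` handles only bracketed numeric markers such as `[3]` or `[3, 4]`, and `ngram_containment` uses word-level n-gram containment on whitespace tokens. The function names and sample sentences are hypothetical and are not taken from the paper.

```python
import re


def strip_citations(text):
    """Remove bracketed numeric citation markers such as [3] or [3, 4].

    Assumption: citations are numeric and bracketed; other styles
    (author-year, footnotes) are not handled by this sketch.
    """
    return re.sub(r"\s*\[\d+(?:\s*[,–-]\s*\d+)*\]", "", text)


def word_ngrams(text, n):
    """Return the set of word-level n-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_containment(suspicious, source, n):
    """Fraction of the suspicious passage's n-grams that also occur in the source.

    Assumption: containment is used as the similarity measure; the paper's
    actual measure may differ.
    """
    susp = word_ngrams(suspicious, n)
    if not susp:
        return 0.0
    return len(susp & word_ngrams(source, n)) / len(susp)


# Hypothetical example sentences, invented for this sketch.
source = "the proposed method removes citations from scientific papers"
reused = strip_citations(
    "the proposed method removes citations from research articles [3, 4]"
)

for n in (1, 2, 3):
    print(n, round(ngram_containment(reused, source, n), 3))
```

Containment is normalized by the suspicious passage rather than by the union of both n-gram sets, which keeps the score meaningful when a short reused passage is compared against a much longer source document.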

Authors: Salar Mohtaj, Habibollah Asghari, Vahid Zarrabi
