
Entropy-Based Approach to Efficient Cleaning of Big Data in Hierarchical Databases

  • Eugene Levner
  • Boris Kriheli
  • Arriel Benis
  • Alexander Ptuskin
  • Amir Elalouf
  • Sharon Hovav
  • Shai Ashkenazi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

1 citation (Scopus)

Abstract

When databases are at risk of containing erroneous, redundant, or obsolete data, a cleaning procedure is used to detect, correct, or remove such undesirable records. We propose a methodology for improving data-cleaning efficiency in a large hierarchical database. The methodology relies on Shannon's information entropy to measure the amount of information stored in the database. This approach builds on previously gathered statistical data on the prevalence of errors in the database. It enables the decision maker to determine which components of the database are likely to have undergone greater information loss, and thus to prioritize those components for cleaning. In particular, when the cleaning process is iterative (proceeding from the root node downward), the entropic approach yields a scientifically motivated stopping rule. This rule determines the optimal (i.e., minimally required) number of tiers of the hierarchical database that need to be examined. The stopping rule defines a more streamlined representation of the database, in which less informative tiers are eliminated.
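The abstract describes its quantities only in general terms. As a rough illustration, the following Python sketch computes Shannon entropy, H = -Σ p_i log2 p_i, for per-component probability distributions. It then ranks components by entropy and applies a threshold-based, root-down stopping rule. The function names are hypothetical assumptions made for illustration, and so are the ranking by entropy, the fixed entropy threshold, and the example probabilities. The abstract does not specify the paper's actual entropy model or stopping criterion, so this is a sketch and not the authors' method.

```python
import math
from typing import Dict, List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits: H = -sum(p * log2(p)); terms with p == 0 contribute 0."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must be non-negative and sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


def rank_components(components: Dict[str, Sequence[float]]) -> List[str]:
    """Order hypothetical database components from highest to lowest entropy.

    ASSUMPTION (illustrative only): higher entropy is treated as higher cleaning priority.
    """
    return sorted(components, key=lambda name: shannon_entropy(components[name]), reverse=True)


def tiers_to_examine(tiers: Sequence[Sequence[float]], threshold: float) -> int:
    """Hypothetical root-down stopping rule (not the paper's rule).

    Tiers are ordered from the root downward. Examination stops at the first tier whose
    entropy falls below `threshold`; the count of tiers before that point is returned.
    """
    examined = 0
    for dist in tiers:
        if shannon_entropy(dist) < threshold:
            break
        examined += 1
    return examined


if __name__ == "__main__":
    # Hypothetical per-tier distributions, e.g. [P(record correct), P(record erroneous)].
    tiers = [[0.5, 0.5], [0.9, 0.1], [0.99, 0.01]]
    for depth, dist in enumerate(tiers):
        print(f"tier {depth}: H = {shannon_entropy(dist):.3f} bits")
    print("tiers to examine:", tiers_to_examine(tiers, threshold=0.2))
```

With these made-up values, the three tiers have entropies of about 1.000, 0.469 and 0.081 bits, so a threshold of 0.2 retains the first two tiers. The threshold is a free parameter here. A real stopping rule would need to be derived from the error statistics the abstract mentions.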

Original language: English
Host publication title: Big Data – BigData 2020 - 9th International Conference, Held as Part of the Services Conference Federation, SCF 2020, Proceedings
Editors: Surya Nepal, Wenqi Cao, Aziz Nasridinov, MD Zakirul Alam Bhuiyan, Xuan Guo, Liang-Jie Zhang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 3-12
Number of pages: 10
ISBN (Print): 9783030596118
DOIs
Publication status: Published - 2020
Event: 9th International Conference on Big Data, BigData 2020, held as part of the Services Conference Federation, SCF 2020 - Honolulu, United States
Duration: 18 Sep 2020 to 20 Sep 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12402 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Conference on Big Data, BigData 2020, held as part of the Services Conference Federation, SCF 2020
Country/Territory: United States
City: Honolulu
Period: 18/09/20 to 20/09/20
