
GreedyMini: Generating low-density DNA minimizers

  • Shay Golan
  • Ido Tziony
  • Matan Kraus
  • Yaron Orenstein
  • Arseny Shur

Research output: Contribution to journal, Article, peer-reviewed

1 citation (Scopus)

Abstract

Motivation: Minimizers are the most popular k-mer selection scheme in algorithms and data structures analyzing high-throughput sequencing (HTS) data. In a minimizer scheme, the smallest k-mer by some predefined order is selected as the representative of a sequence window containing w consecutive k-mers, which results in overlapping windows often selecting the same k-mer. Minimizers that achieve the lowest frequency of selected k-mers over a random DNA sequence, termed the expected density, are desired for improved performance of HTS analyses. Yet, no method to date exists to generate minimizers that achieve minimum expected density. Moreover, for k and w values used by common HTS algorithms and data structures, there is a gap between densities achieved by existing selection schemes and the theoretical lower bound.

Results: We developed GreedyMini, a toolkit of methods to generate minimizers with low expected or particular density, to improve minimizers, to extend minimizers to larger alphabets, k, and w, and to measure the expected density of a given minimizer efficiently. We demonstrate over various combinations of k and w values, including those of popular HTS methods, that GreedyMini can generate DNA minimizers that achieve expected densities very close to the lower bound, and both expected and particular densities much lower compared to existing selection schemes. Moreover, we show that GreedyMini's k-mer rank-retrieval time is comparable to common k-mer hash functions. We expect GreedyMini to improve the performance of many HTS algorithms and data structures and advance the research of k-mer selection schemes.

Availability and implementation: The toolkit, its source code, and precomputed minimizers for a variety of (k,w) pairs are available via https://github.com/OrensteinLab/GreedyMini.
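To make the selection rule in the abstract concrete, here is a minimal sketch of a minimizer scheme and of the density it produces on one specific sequence. It is not GreedyMini's code and does not use the orders GreedyMini generates. The function names (minimizer_positions, particular_density) are hypothetical, the default k-mer order is plain lexicographic order chosen only for illustration, ties are broken by taking the leftmost k-mer, and density is computed as the fraction of k-mer positions that are selected by at least one window.

```python
import random


def minimizer_positions(seq, k, w, rank=None):
    """Return sorted start positions of the k-mers selected by a minimizer scheme.

    Each window holds w consecutive k-mers; the k-mer with the smallest rank
    is selected, with ties broken by the leftmost position (an assumption).
    """
    if rank is None:
        # Illustrative assumption: lexicographic order on the k-mer string.
        rank = lambda kmer: kmer
    n_kmers = len(seq) - k + 1
    selected = set()
    for start in range(n_kmers - w + 1):
        # Compare (rank, position) pairs so that ties go to the leftmost k-mer.
        best = min(range(start, start + w),
                   key=lambda i: (rank(seq[i:i + k]), i))
        selected.add(best)
    return sorted(selected)


def particular_density(seq, k, w, rank=None):
    """Fraction of k-mer positions in seq that are selected as minimizers."""
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        raise ValueError("sequence is too short to contain one full window")
    return len(minimizer_positions(seq, k, w, rank)) / n_kmers


if __name__ == "__main__":
    # Estimate the density of the lexicographic order on a random DNA sequence.
    rng = random.Random(0)
    dna = "".join(rng.choice("ACGT") for _ in range(10_000))
    print(particular_density(dna, k=5, w=4))
```

Because adjacent windows share w - 1 k-mers, they often select the same position, which is why the selected set is much smaller than the number of windows. Passing a different rank function changes which k-mers win and therefore the density; averaging over long random sequences approximates the expected density that the abstract refers to.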

Original language: English
Pages (from-to): i275-i284
Journal: Bioinformatics
Volume: 41
Issue number: Supplement_1
Digital Object Identifiers (DOIs)
Publication status: Published - 1 Jul 2025
Externally published: Yes
