Expected Density of Random Minimizers

Shay Golan, Arseny M. Shur

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Minimizer schemes, or just minimizers, are a very important computational primitive in sampling and sketching biological strings. Assuming a fixed alphabet of size $\sigma$, a minimizer is defined by two integers $k, w \ge 2$ and a total order $\rho$ on strings of length $k$ (also called $k$-mers). A string is processed by a sliding window algorithm that chooses, in each window of length $w+k-1$, its minimal $k$-mer with respect to $\rho$. A key characteristic of the minimizer is the expected density of chosen $k$-mers among all $k$-mers in a random infinite $\sigma$-ary string. Random minimizers, in which the order $\rho$ is chosen uniformly at random, are often used in applications. However, little is known about their expected density $DR_\sigma(k,w)$ besides the fact that it is close to $\frac{2}{w+1}$ unless $w \gg k$. We first show that $DR_\sigma(k,w)$ can be computed in $O(k\sigma^{k+w})$ time. Then we attend to the case $w \le k$ and present a formula that allows one to compute $DR_\sigma(k,w)$ in just $O(w \log w)$ time. Further, we describe the behaviour of $DR_\sigma(k,w)$ in this case, establishing the connection between $DR_\sigma(k,w)$, $DR_\sigma(k+1,w)$, and $DR_\sigma(k,w+1)$. In particular, we show that $DR_\sigma(k,w) < \frac{2}{w+1}$ (by a tiny margin) unless $w$ is small. We conclude with some partial results and conjectures for the case $w > k$.
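The sliding-window scheme described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's algorithm or its formula: it samples one random order lazily (by giving each k-mer an independent random key), runs the window scheme on one finite random string, and reports the fraction of chosen k-mers. The function names, the leftmost tie-breaking rule, and the use of a single sampled order on a finite string are all assumptions made for illustration, so the result is only a rough empirical estimate of the expected density.

```python
import random


def minimizer_positions(s, k, w, rank):
    """Return the set of start positions of k-mers chosen by the scheme.

    Each window covers w consecutive k-mers (w + k - 1 characters). The
    k-mer of smallest rank is chosen; ties are broken towards the leftmost
    position (a convention assumed here, not taken from the paper).
    """
    n_kmers = len(s) - k + 1
    ranks = [rank(s[i:i + k]) for i in range(n_kmers)]
    chosen = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        chosen.add(min(window, key=lambda i: (ranks[i], i)))
    return chosen


def make_random_rank(rng):
    """Lazily sample a random order: each k-mer gets an independent key."""
    table = {}

    def rank(kmer):
        if kmer not in table:
            table[kmer] = rng.random()
        return table[kmer]

    return rank


def estimate_density(sigma, k, w, length, seed=0):
    """Empirical density on one random string of the given length."""
    rng = random.Random(seed)
    s = tuple(rng.randrange(sigma) for _ in range(length))
    chosen = minimizer_positions(s, k, w, make_random_rank(rng))
    return len(chosen) / (length - k + 1)
```

Under these assumptions, a call such as `estimate_density(4, 8, 4, 20000)` would be expected to land near 2/(w+1) = 0.4, with some sampling noise; averaging over several seeds would give a steadier estimate.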

Original language: English
Title of host publication: SOFSEM 2025
Subtitle of host publication: Theory and Practice of Computer Science - 50th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2025, Proceedings
Editors: Rastislav Královič, Věra Kůrková
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 347-360
Number of pages: 14
ISBN (Print): 9783031826696
DOIs
State: Published - 2025
Externally published: Yes
Event: 50th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2025 - Bratislava, Slovakia
Duration: 20 Jan 2025 to 23 Jan 2025

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15538 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 50th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2025
Country/Territory: Slovakia
City: Bratislava
Period: 20/01/25 to 23/01/25

Keywords

  • Expected Density
  • Minimizer
  • Random Minimizer
