Efficient $\{0,1\}$-String Searching Based on Pre-clustering

O. Aichholzer

Abstract:

In this paper we consider the $\{0,1\}$-string searching problem. For a given set $S$ of binary strings of fixed length $d$ and a query string $q$ one asks for the most similar string in $S$. Thereby the dissimilarity of two given strings is the number of disagreeing bits, that is, their Hamming distance. We present an efficient $\{0,1\}$-string searching algorithm based on hierarchical pre-clustering. To this end we give several useful observations on the inter- and intra-cluster distances.
The presented algorithms are easy to implement and we give exhaustive experimental results for uniformly distributed sets as well as for specially chosen strings. These experiments indicate that our algorithms work well in practice.



Reference: O. Aichholzer. Efficient $\{0,1\}$-string searching based on pre-clustering. In Proc. $14^{th}$ European Workshop on Computational Geometry CG '98, pages 11-13, Barcelona, Spain, 1998. [SFB Report F003-94, TU Graz, Austria, 1996].

www-data, 2020-09-10