Clustering the Hypercube

O. Aichholzer

Abstract:

In this paper we consider various clustering methods for objects represented as binary strings of fixed length $d$. The dissimilarity of two given objects is the number of disagreeing bits, that is, their Hamming distance. Clustering these objects can be seen as clustering a subset of the vertices of a $d$-dimensional hypercube, and thus is a geometric problem in d dimensions. We give algorithms for various agglomerative hierarchical methods (including single linkage and complete linkage) as well as for two-clusterings and divisive methods.
We only present linear space algorithms since for most practical applications the number of objects to be clustered is usually to large for non-linear space solutions to be practicable. All algorithms are easy to implement and the constants in their asymptotic runtime are small. We give experimental results for all cluster methods considered, and for uniformly distributed hypercube vertices as well as for specially chosen sets. These experiments indicate that our algorithms work well in practice.



Reference: O. Aichholzer. Clustering the hypercube. SFB-Report F003-93, SFB 'Optimierung und Kontrolle', TU Graz, Austria, 1996.

www-data, 2020-09-10