Exploit Every Bit: Effective Caching for High-Dimensional Nearest Neighbor Search
High-dimensional k-nearest-neighbor (kNN) search has a wide range of applications in multimedia information retrieval. Existing disk-based kNN search methods incur significant I/O costs in the candidate refinement phase. We propose to cache compact approximate representations of data points in main memory in order to reduce the candidate refinement time during kNN search. This problem raises two challenging issues: (i) which encoding scheme for data points is most effective for supporting kNN search, and (ii) what is the optimal number of bits for encoding a data point? For (i), we formulate and solve a novel histogram optimization problem that decides the most effective encoding scheme. For (ii), we develop a cost model that automatically tunes the number of bits used to encode each point. In addition, our approach is generic and applicable to both exact and approximate kNN search methods.
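To make the caching idea concrete, the following is a minimal sketch of how in-memory approximations can cut disk reads during candidate refinement. It is an illustration under stated assumptions, not the method described above: it uses plain uniform quantization per dimension as a stand-in for the histogram-optimized encoding, it assumes Euclidean distance, and the names `encode`, `distance_bounds`, `refine`, and `fetch_point` are hypothetical. Each cached code identifies a cell that contains the true point, which yields a lower and an upper bound on its distance to the query; a candidate whose lower bound exceeds the k-th smallest upper bound cannot be among the k nearest and is never fetched.

```python
import numpy as np

def encode(points, bits, lo, hi):
    """Uniformly quantize each coordinate into 2**bits cells.

    Assumption: uniform cells over [lo, hi]; the source instead optimizes
    a histogram to choose the encoding.
    """
    levels = 2 ** bits
    width = (hi - lo) / levels
    codes = np.floor((points - lo) / width).astype(np.int64)
    return np.clip(codes, 0, levels - 1), width

def distance_bounds(query, codes, lo, width):
    """Lower and upper Euclidean distance bounds from query to each coded point."""
    cell_lo = lo + codes * width
    cell_hi = cell_lo + width
    # Per-dimension gap to the cell (zero when the query falls inside it).
    near = np.maximum(np.maximum(cell_lo - query, query - cell_hi), 0.0)
    # Per-dimension distance to the farther cell boundary.
    far = np.maximum(np.abs(query - cell_lo), np.abs(query - cell_hi))
    lower = np.sqrt((near ** 2).sum(axis=1))
    upper = np.sqrt((far ** 2).sum(axis=1))
    return lower, upper

def refine(query, candidate_ids, codes, lo, width, fetch_point, k):
    """Return the k nearest candidates and the number of points fetched.

    fetch_point is a hypothetical callback standing in for a disk read.
    Requires len(candidate_ids) >= k.
    """
    candidate_ids = np.asarray(candidate_ids)
    lower, upper = distance_bounds(query, codes[candidate_ids], lo, width)
    # At least k candidates lie within the k-th smallest upper bound,
    # so anything whose lower bound exceeds it can be skipped safely.
    kth_upper = np.partition(upper, k - 1)[k - 1]
    survivors = candidate_ids[lower <= kth_upper]
    exact = sorted(
        (float(np.linalg.norm(fetch_point(i) - query)), int(i)) for i in survivors
    )
    return [i for _, i in exact[:k]], len(survivors)
```

In this sketch, more bits per coordinate give tighter bounds and fewer fetches, but fewer points fit in a cache of fixed size; that trade-off is what the cost model for issue (ii) is meant to settle.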