Binary Descriptors

Binary image descriptors encode patch appearance as a compact binary string. The Hamming distance in this space is designed to follow a desired image similarity measure, typically one that is invariant to changes in scene illumination and viewpoint.
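As a minimal sketch of why matching is cheap, the Hamming distance between two descriptors is just the number of bits set in their XOR. The function name `hamming_distance` and the two 16-bit example descriptors are illustrative assumptions, not part of any of the descriptors discussed here.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 wherever the two descriptors disagree; count those bits.
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(x).count("1")


# Two hypothetical 16-bit descriptors that differ in two bit positions.
d1 = bytes([0b10110010, 0b00001111])
d2 = bytes([0b10010011, 0b00001111])
print(hamming_distance(d1, d2))  # 2
```

On real hardware the same operation typically maps to an XOR followed by a population count, which is what makes binary descriptors fast to compare.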

With the advent of increasingly large image databases that must be handled by relatively low-power mobile devices, modern vision systems must be not only accurate but also computationally efficient. To this end, binary image descriptors provide an attractive alternative to widely used floating-point descriptors such as SIFT and SURF, as they offer similar recognition performance at much lower storage and computational cost.

Our group has been active in this area for many years. BRIEF demonstrated the effectiveness of simple intensity comparisons for describing image patch appearance, resulting in an invariant descriptor that is also extremely efficient to compute, store, and match. D-BRIEF was later developed as an alternative to BRIEF; it uses efficient image filters to approximate the dimensions of a learned linear projection, resulting in further improvements in both speed and accuracy.
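The idea of describing a patch by simple intensity comparisons can be sketched as follows. This is a simplified illustration rather than the published BRIEF implementation: the uniform random sampling of pixel pairs, the absence of patch smoothing, and the names `make_test_pairs` and `brief_like_descriptor` are all assumptions made for the example.

```python
import random


def make_test_pairs(patch_size, n_bits, seed=0):
    """Draw n_bits random pairs of pixel coordinates inside the patch."""
    rng = random.Random(seed)

    def point():
        return rng.randrange(patch_size), rng.randrange(patch_size)

    return [(point(), point()) for _ in range(n_bits)]


def brief_like_descriptor(patch, pairs):
    """Pack one bit per comparison: 1 if the first pixel is darker."""
    bits = 0
    for (r1, c1), (r2, c2) in pairs:
        bits = (bits << 1) | int(patch[r1][c1] < patch[r2][c2])
    return bits


# A synthetic 16x16 grayscale patch stands in for a real image patch.
rng = random.Random(1)
size = 16
patch = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
pairs = make_test_pairs(size, n_bits=64)

d = brief_like_descriptor(patch, pairs)
# A strictly increasing intensity change (gain and offset) keeps every comparison.
brighter = [[2 * v + 10 for v in row] for row in patch]
print(d == brief_like_descriptor(brighter, pairs))  # True
```

Because each bit depends only on the ordering of two intensities, any strictly increasing change in brightness leaves the descriptor unchanged, which is one simple form of illumination invariance.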

BRIEF has been applied most successfully to small-scale image matching problems such as those common in Augmented Reality. For larger-scale problems, we showed that using training data to learn the descriptors can be beneficial. This resulted in the LDAHash descriptor, which learns a linear transformation of SIFT and has been shown to improve accuracy in large-scale applications while remaining fast to match and efficient to store.
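To illustrate how a linear transformation of a floating-point descriptor can yield a binary code, the sketch below thresholds each projected coordinate. It assumes the common form b = sign(Px + t). The projection `P` and thresholds `t` here are random placeholders, whereas in a learned descriptor they would be estimated from training data, and the function name `binarize_projection` is hypothetical.

```python
import random


def binarize_projection(x, P, t):
    """Return one bit per row of P: 1 if (P x + t)_i > 0, else 0."""
    bits = []
    for row, t_i in zip(P, t):
        s = sum(p * v for p, v in zip(row, x)) + t_i
        bits.append(1 if s > 0 else 0)
    return bits


rng = random.Random(0)
dim, n_bits = 128, 64  # 128-D input (as in SIFT), 64-bit code (assumed size)

# Placeholder parameters: random, NOT learned.
P = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
t = [0.0] * n_bits

x = [rng.random() for _ in range(dim)]  # stand-in for a SIFT vector
code = binarize_projection(x, P, t)
print(len(code))  # 64
```

The resulting bit vector can be packed into bytes and compared with the Hamming distance, so the cost of the projection is paid once per descriptor while matching stays cheap.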

More recently, we have developed BinBoost, a boosted binary descriptor. It jointly optimizes the pooling configuration and the weighting of the set of non-linear gradient responses that make up the descriptor. On the challenging Brown and Winder benchmark, BinBoost achieved a significant improvement over both state-of-the-art binary descriptors and SIFT, and accuracy similar to that of state-of-the-art learning-based floating-point descriptors, but at a fraction of their matching and storage cost.

Results

The plot below compares BinBoost, LDAHash, and BRIEF on the Brown and Winder patch matching dataset against many state-of-the-art binary and floating-point keypoint descriptors. For each method, the 95% error rate (the percentage of incorrect matches obtained when 95% of the true matches are found) and the matching time are displayed. BinBoost achieves the best tradeoff between matching time and accuracy. It significantly outperforms SIFT and achieves accuracy comparable to that of state-of-the-art floating-point descriptors at a fraction of their matching time and storage cost. See the paper for more details.
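The 95% error rate can be computed from descriptor distances roughly as sketched below, assuming it is defined as the fraction of non-matching pairs accepted at the distance threshold that accepts 95% of the matching pairs. The function name and the toy distances are illustrative only and are not results from the benchmark.

```python
import math


def error_rate_at_95_recall(match_dists, nonmatch_dists, recall=0.95):
    """False positive rate at the threshold that accepts `recall` of matches."""
    if not match_dists or not nonmatch_dists:
        raise ValueError("both distance lists must be non-empty")
    ranked = sorted(match_dists)
    # Smallest number of matching pairs that reaches the target recall
    # (the small epsilon guards against floating-point round-up).
    k = max(1, math.ceil(recall * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    # With tied distances, slightly more than `recall` of matches may pass.
    false_pos = sum(1 for d in nonmatch_dists if d <= threshold)
    return false_pos / len(nonmatch_dists)


# Toy data: 20 matching pairs and 10 non-matching pairs.
matches = list(range(1, 21))
nonmatches = [5, 18, 19, 25, 30, 40, 50, 60, 70, 80]
print(error_rate_at_95_recall(matches, nonmatches))  # 0.3
```

Here the threshold that accepts 19 of the 20 matching pairs is 19, and three of the ten non-matching pairs fall at or below it.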

References

M. Calonder; V. Lepetit; M. Özuysal; T. Trzcinski; C. Strecha et al. : BRIEF: Computing a Local Binary Descriptor Very Fast; IEEE Transactions on Pattern Analysis and Machine Intelligence. 2012. DOI : 10.1109/TPAMI.2011.222.
T. Trzcinski; V. Lepetit : Efficient Discriminative Projections for Compact Binary Descriptors. 2012. European Conference on Computer Vision, Florence, Italy.
C. Strecha; A. M. Bronstein; M. M. Bronstein; P. Fua : LDAHash: Improved Matching with Smaller Descriptors; IEEE Transactions on Pattern Analysis and Machine Intelligence. 2012. DOI : 10.1109/TPAMI.2011.103.
T. Trzcinski; C. M. Christoudias; P. Fua; V. Lepetit : Boosting Binary Keypoint Descriptors. 2013. Computer Vision and Pattern Recognition (CVPR), Portland, USA. DOI : 10.1109/CVPR.2013.370.