Our framework mixes three types of edge-counting features. Every row shows an example feature of one type along with its extractions for three samples: an open hand, the same hand with the thumb moved, and a rotated version of the latter. The example features are shown in the left column: the solid box shows the support of the feature, while the solid line within it shows the extracted edge orientation. The dashed box shows the area of the image from which the pose estimate is computed, here the dominant edge orientation. This area is also highlighted in every sample by the bold outline of the hand. Note how the first feature effectively tracks the back of the hand, the second feature tracks the thumb, and the third tracks the forefinger. Complete freedom is given to the learning procedure to select a pose estimator and a pose-indexed feature.
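The idea described above can be illustrated with a minimal sketch, which is not the implementation used in our framework. It assumes that edges are quantized into a hypothetical number of orientation bins (N_BINS), that the pose estimate is the dominant edge orientation inside a box, and that the feature counts, inside its support box, the edges whose orientation lies a fixed number of bins away from that dominant orientation. All function names, the gradient-based edge detector and the threshold are assumptions made for illustration. As a simplification, only the edge orientation is indexed by the pose here, and the support box stays fixed.

```python
import numpy as np

N_BINS = 8  # hypothetical number of quantized edge orientations


def edge_orientation_bins(image, threshold=0.1):
    """Per-pixel quantized edge orientation in [0, N_BINS), or -1 where there is no edge."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Orientation modulo pi: opposite contrast polarities fall in the same bin.
    angle = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.floor(angle / np.pi * N_BINS).astype(int) % N_BINS
    bins[magnitude < threshold] = -1
    return bins


def dominant_orientation(bins, box):
    """Pose estimator: most frequent edge orientation in box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    patch = bins[top:bottom, left:right]
    counts = np.bincount(patch[patch >= 0], minlength=N_BINS)
    return int(np.argmax(counts))


def pose_indexed_edge_count(bins, pose_box, support_box, relative_bin):
    """Count edges in support_box whose orientation is relative_bin steps from the pose."""
    pose = dominant_orientation(bins, pose_box)
    target = (pose + relative_bin) % N_BINS
    top, left, bottom, right = support_box
    return int(np.sum(bins[top:bottom, left:right] == target))
```

Because the counted orientation is expressed relative to the estimated pose, the feature returns the same count for an image containing a vertical step edge and for its transpose, even though the absolute edge orientations differ.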

Hand Video Sequences
Typical results obtained with our framework for hand detection.
In these sequences, green squares indicate a correct detection whereas red squares indicate a false alarm.
Note how our method is robust to strong changes in the appearance of the hand despite the fact that these changes were not annotated in the training sequence.
Detection here proceeds frame by frame, independently, without background subtraction.
We expect that enforcing temporal coherence across frames would significantly improve results.
Aerial Images (Google Earth) of Cars
Typical results obtained with our framework for car detection.
In these images, obtained from Google Earth over Geneva, green squares indicate a correct detection whereas red squares indicate a false alarm.
The training data also came from Google Earth images, taken over a different city.
Note how our method is able to detect cars in any orientation despite the fact that the training data was not annotated for orientation: our framework is able to learn the pose variations present in the training data, adapt to them and provide reliable detections.
We compared our framework with the state of the art in object detection. Both methods have access to the same ground truth for training, namely data that is annotated only for location and not for additional pose parameters such as deformations (hands), in-plane rotations (cars) and rigid rotations (faces). The receiver operating characteristic (ROC) curves are shown below.
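For readers unfamiliar with such curves, the following minimal sketch shows how the points of an ROC curve can be derived from detector scores and ground-truth labels. It assumes the standard definition (true-positive rate against false-positive rate as the decision threshold decreases), which may differ from the exact axes used in our plots. The function name roc_points is hypothetical, and tied scores are not merged: each sample is treated as its own threshold.

```python
import numpy as np


def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate), one point per threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Sort samples by decreasing score, so each prefix is "everything above a threshold".
    order = np.argsort(-scores, kind="stable")
    sorted_labels = labels[order]
    true_pos = np.cumsum(sorted_labels)
    false_pos = np.cumsum(~sorted_labels)
    # Guard against division by zero when one class is absent.
    n_pos = max(int(labels.sum()), 1)
    n_neg = max(int((~labels).sum()), 1)
    tpr = np.concatenate(([0.0], true_pos / n_pos))
    fpr = np.concatenate(([0.0], false_pos / n_neg))
    return fpr, tpr
```

Calling roc_points on the scores of all candidate windows, with labels marking which windows overlap an annotated object, gives one curve per method, and the curves can then be compared at equal false-positive rates.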