Here you can download our dataset for evaluating pedestrian detection and tracking in depth images.
There are two scenarios. The first one (EPFL-LAB) contains around 1000 RGB-D frames with around 3000 annotated people instances. There are at most 4 people, mostly facing the camera, which is presumably the scenario for which the Kinect software was fine-tuned. The second one (EPFL-CORRIDOR) was recorded in a more realistic environment, a corridor in a university building. It contains around 3000 frames with up to 8 individuals, split into multiple sequences.
Each sequence consists of depth images and their corresponding RGB images, as well as manually annotated ground truth. Additionally, we provide empty frames to enable background subtraction, as well as the pose estimation output of the latest (as of 2015) Kinect. More details on the data format are available in the README files inside the archives.
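As an illustration of how the empty frames might be used, here is a minimal sketch of depth-based background subtraction with NumPy. It runs on synthetic arrays and does not read the dataset files. The function name `foreground_mask`, the threshold `min_diff`, the 16-bit depth values in millimetres and the use of 0 for missing measurements are all assumptions made for this example; the README files describe the actual data format.

```python
import numpy as np


def foreground_mask(depth, background, min_diff=100, invalid=0):
    """Mark pixels that are clearly closer to the camera than the empty scene.

    depth, background: 2-D arrays of depth values (assumed here to be in mm).
    min_diff: hypothetical threshold on the depth difference, same units.
    invalid: value assumed to mark pixels with no depth measurement.
    """
    # Use a signed type so the subtraction of unsigned depths cannot wrap around.
    depth = depth.astype(np.int32)
    background = background.astype(np.int32)

    # Ignore pixels where either image has no measurement.
    valid = (depth != invalid) & (background != invalid)

    # Foreground: valid pixels at least min_diff closer than the background.
    return valid & ((background - depth) > min_diff)


# Synthetic stand-ins for an empty frame and a frame with one object in it.
background = np.full((4, 6), 3000, dtype=np.uint16)  # flat wall at 3 m
frame = background.copy()
frame[1:3, 2:4] = 1500  # a 2x2 "person" at 1.5 m
frame[0, 0] = 0         # one pixel with missing depth

mask = foreground_mask(frame, background)
print(mask.astype(int))
```

With real data, a per-pixel median over several empty frames would likely give a more stable background than a single frame.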
You can find the link to the archives with the data as well as some sample results below.