 Ecole Polytechnique Fédérale de Lausanne

Incident Detection in a Multi-Camera Environment for Visual Surveillance Applications

To extend the capabilities of existing visual surveillance systems, we are developing a 3D framework dedicated to incident detection based on a multi-camera setup. Our goal in this project is two-fold:

  1. Processing the output of several cameras in order to handle occlusions among people and their environment and provide us with more robust people detection and tracking strategies;
  2. Capturing the motion of a person or a group of people in order to make the interpretation of abnormal behaviors much easier.

We aim to design a system that combines the video flows from several cameras with overlapping views in order to generate a 3D representation of the scene under surveillance, potentially with the help of planimetric information when available.

Based on this representation, we will detect individuals and groups of individuals in the scene, represent their relative positions in 3D space, and analyze their behaviors and interactions.

Our project has been progressing so far through the following steps:

  1. Multi-people detection on a single time frame using a probabilistic occupancy map (POM)

  2. Multi-people tracking using dynamic programming

  3. Detection-by-classification from multiple views

  4. Anomaly detection using behavioral maps

Multi-People Detection on a Single Time Frame using a Probabilistic Occupancy Map

From the original video streams, a segmentation algorithm generates streams of binary images by estimating which parts of each picture differ from an estimated background picture.

[original video streams] [streams of binary images]
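As a rough illustration of this segmentation step, the sketch below thresholds the absolute difference between a grayscale frame and a background picture. The function name `subtract_background`, the fixed `threshold` value and the toy images are assumptions made for this example; the segmentation algorithm actually used in the project is not specified here and is likely more elaborate.

```python
def subtract_background(frame, background, threshold=30):
    """Return a binary image: 1 where the frame differs from the background.

    `frame` and `background` are lists of rows of grayscale values (0-255).
    The per-pixel threshold is a hypothetical choice for this sketch.
    """
    return [
        [1 if abs(pixel - bg_pixel) > threshold else 0
         for pixel, bg_pixel in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Toy 3x4 images: a bright blob appears in front of a dark background.
background = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
frame = [[10, 12, 10, 10],
         [10, 200, 190, 10],
         [10, 210, 10, 11]]

mask = subtract_background(frame, background)
```

Small intensity changes (10 to 12) stay below the threshold and are treated as background, while the blob pixels are marked as foreground.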

From those binary streams, our algorithm iteratively estimates, for each location in the room, the probability that an individual is present. In a nutshell, the algorithm optimizes those probabilities so that the average images computed from them match the images provided by the segmentation algorithm. The convergence process can be displayed by showing, in succession, the average image computed from the current estimates and the original images.

[Convergence on frame #106]
[Convergence on frame #112]
[Convergence on frame #156]
[Convergence on frame #175]
[Convergence on frame #206]
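To make the notion of an "average image" concrete, here is a minimal sketch on a one-dimensional image. It assumes that individuals are present independently at each location and that each location projects to a known set of pixels, so the expected foreground value of a pixel is one minus the probability that no covering location is occupied. The names `average_image`, `mismatch` and the toy silhouettes are hypothetical, and the absolute-difference mismatch is only a stand-in for whatever criterion the actual optimization uses.

```python
def average_image(probs, silhouettes, n_pixels):
    """Expected foreground value of each pixel.

    probs[k] is the occupancy probability of location k and silhouettes[k]
    is the set of pixels that a person standing at k would cover.
    Assumes independent presence across locations.
    """
    image = []
    for pixel in range(n_pixels):
        p_background = 1.0
        for q, covered in zip(probs, silhouettes):
            if pixel in covered:
                p_background *= 1.0 - q
        image.append(1.0 - p_background)
    return image


def mismatch(average, binary):
    """Sum of absolute differences (an illustrative distance only)."""
    return sum(abs(a - b) for a, b in zip(average, binary))


# Three candidate locations on an 8-pixel image line.
silhouettes = [{0, 1, 2}, {2, 3, 4}, {5, 6, 7}]
binary = [1, 1, 1, 0, 0, 0, 0, 0]  # segmentation output: blob on pixels 0-2

good = average_image([0.9, 0.05, 0.05], silhouettes, 8)
bad = average_image([0.05, 0.9, 0.05], silhouettes, 8)
```

Placing most of the probability on the first location yields an average image much closer to the binary image than placing it on the second one, which is the kind of agreement the optimization seeks.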

Finally, detection over the complete sequence is achieved by running this algorithm on each frame individually and keeping the local maxima as the likely locations of individuals.

[Result with two persons]
[Result with three persons]
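The local-maxima step can be sketched as follows on a small occupancy grid. The function name `local_maxima`, the 8-neighborhood and the `min_prob` cutoff are assumptions of this example and are not taken from the released code.

```python
def local_maxima(occupancy, min_prob=0.5):
    """Return (row, col) cells that reach min_prob and are not exceeded
    by any of their (up to 8) neighbors.

    Cells on a flat plateau would all be reported; a real system would
    need an extra tie-breaking rule.
    """
    rows, cols = len(occupancy), len(occupancy[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = occupancy[r][c]
            if value < min_prob:
                continue
            neighbors = [
                occupancy[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value >= n for n in neighbors):
                peaks.append((r, c))
    return peaks


# Toy occupancy map with two likely individuals.
occupancy = [
    [0.01, 0.02, 0.01, 0.01],
    [0.02, 0.95, 0.30, 0.01],
    [0.01, 0.20, 0.02, 0.70],
    [0.01, 0.01, 0.10, 0.40],
]
peaks = local_maxima(occupancy)
```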

Multi-People Tracking using Dynamic Programming

Using the occupancy probability maps computed by our detection algorithm, we apply dynamic programming to add tracking capability to our framework. We design an HMM that takes into account the occupancy probabilities as well as an appearance model of the people and a simple isotropic pedestrian motion model. Such a model allows us to use the Viterbi algorithm to retrieve the most probable trajectories over a batch of frames.
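The following sketch shows the Viterbi recursion for a single trajectory under strong simplifying assumptions: locations lie on a one-dimensional row, the appearance model is ignored, and the motion model is reduced to a hard limit of `max_step` cells per frame. The function name `viterbi_track` and the toy data are hypothetical and only illustrate the dynamic programming idea.

```python
import math


def viterbi_track(occupancy, max_step=1):
    """Most probable sequence of locations over a batch of frames.

    occupancy[t][k] is the occupancy probability of location k at frame t.
    Motion is an unnormalized hard constraint: at most max_step cells
    per frame (an assumption of this sketch).
    """
    n_locs = len(occupancy[0])
    eps = 1e-9  # avoids log(0)
    score = [math.log(p + eps) for p in occupancy[0]]
    backpointers = []
    for frame in occupancy[1:]:
        new_score, pointers = [], []
        for k in range(n_locs):
            reachable = range(max(0, k - max_step),
                              min(n_locs, k + max_step + 1))
            best_prev = max(reachable, key=lambda j: score[j])
            new_score.append(score[best_prev] + math.log(frame[k] + eps))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final location.
    path = [max(range(n_locs), key=lambda k: score[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# The third frame contains a strong spurious detection at location 0.
occupancy = [
    [0.90, 0.05, 0.05, 0.05],
    [0.10, 0.80, 0.05, 0.05],
    [0.85, 0.10, 0.60, 0.05],
    [0.05, 0.05, 0.10, 0.90],
]
path = viterbi_track(occupancy)
per_frame_best = [max(range(4), key=lambda k: frame[k]) for frame in occupancy]
```

Picking the best location in each frame independently would jump back to location 0 in the third frame, whereas the optimized trajectory stays physically plausible because it is scored over the whole batch.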

To deal with the complexity of optimizing multiple trajectories simultaneously, we treat them as independent and optimize them one at a time. We choose the order in which trajectories are computed based on a reliability score, in order to avoid confusing one trajectory with another.

As shown in the videos below, our algorithm is capable of following up to six people in a small room for several minutes without any tracking error. The results also show that the choice of detection box size does not have a strong influence on the accuracy of our algorithm.

Indoor result with four people
[ WMV (4MB AVI file) ] - [ DivX (38MB AVI file) ]
Indoor sequence with kids
[ WMV (15MB AVI file) ]
Indoor result with six people
[ WMV (11MB AVI file) ] - [ DivX (29MB AVI file) ]
Outdoor result
[ WMV (4MB AVI file) ] - [ DivX (19MB AVI file) ]
Outdoor result (with ground trajectories)
[ WMV (3MB AVI file) ] - [ DivX (16MB AVI file) ]

Detection-by-Classification from Multiple Views

In order to avoid performing background subtraction, which can be very sensitive to image quality and lacks discriminative power, we perform people detection directly in the image plane.



Overview of our detection-by-classification method.

We train a decision tree to correctly classify windows containing a pedestrian. As illustrated in the figure above, we then apply the classifier in each camera view independently at every possible position of the ground plane. We thus obtain as many score maps as there are cameras, which we then merge using our 3D knowledge of the scene as well as a model of the classifier response. We finally obtain an occupancy map similar to the one derived with our people detection algorithm, but without the need for background subtraction.
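One simple way to picture the merging of per-camera score maps is a naive Bayes fusion in log-odds space. This is only an illustrative stand-in, not the principled model described in the references: it assumes that each camera's score has already been converted to a probability for each ground location, that the cameras are conditionally independent, and that all locations share the same `prior`. The function name `fuse_score_maps` is hypothetical.

```python
import math


def fuse_score_maps(score_maps, prior=0.5):
    """Fuse per-camera occupancy probabilities for each ground location.

    score_maps[c][k] is camera c's probability that location k is occupied.
    Assumes conditionally independent cameras and a shared prior.
    """
    prior_logit = math.log(prior / (1.0 - prior))
    fused = []
    for k in range(len(score_maps[0])):
        logit = prior_logit
        for camera_map in score_maps:
            p = min(max(camera_map[k], 1e-6), 1.0 - 1e-6)  # keep log finite
            logit += math.log(p / (1.0 - p)) - prior_logit
        fused.append(1.0 / (1.0 + math.exp(-logit)))
    return fused


# Two cameras, three ground locations.
score_maps = [
    [0.8, 0.5, 0.2],
    [0.7, 0.5, 0.4],
]
fused = fuse_score_maps(score_maps)
```

Agreeing cameras reinforce each other (the first location ends up more certain than either view alone), while uninformative scores of 0.5 leave the fused value unchanged.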


Anomaly Detection using Behavioral Maps

We extend our framework in order to detect abnormal behaviors using behavioral maps. The key idea is to represent standard movements in a scene with a set of behavioral maps. A behavioral map encodes, for every position of a top view, the probability of movement on the ground as well as the probability of switching to another behavioral map. The possibility for a tracked person to switch between maps allows us to model complex situations, with crossings or intersections for example.

To learn the behavioral maps corresponding to a given situation, we process video streams of the scene with our people detection algorithm and obtain a set of probabilistic occupancy maps. We then perform Expectation Maximization on this data to generate a number of behavioral maps. The ideal number of maps for a situation is assessed by cross-validation. The figure below shows a set of behavioral maps extracted from the test scenario depicted in the left-most image. The two right-most images represent the probability of staying in the same map, with dark colors representing high probability.

[Figure panels, left to right: scenario, 1st movement map, 2nd movement map, 1st transition map, 2nd transition map]

Once computed, the behavioral maps are used in two ways. First, we can use this knowledge about how people move in the scene to improve the quality of our people tracking algorithm. We replace the simple isotropic motion model with the richer motion model provided by the behavioral maps. This was shown to improve tracking accuracy by reducing the number of mixed-up trajectories in ambiguous situations.

Second, we can evaluate the likelihood of the trajectories retrieved by tracking with the help of the behavioral maps. This way, we can detect behaviors that are clearly different from those observed during training. The videos below show an example of atypical motion detection. They correspond to the scenario illustrated by the figure above.

[video 1] [video 2] [video 3] [video 4]
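The likelihood test can be sketched as below for the simplest case of a single behavioral map, ignoring the switching between maps. The dictionary layout of `move_probs`, the probability `floor` for unseen moves, the decision `threshold` and the function names are all assumptions made for this example.

```python
import math


def trajectory_log_likelihood(path, move_probs, floor=1e-6):
    """Average per-step log-probability of a trajectory.

    move_probs[cell][next_cell] is the learned probability of moving from
    cell to next_cell; moves never seen in training get `floor`.
    """
    if len(path) < 2:
        raise ValueError("a trajectory needs at least two positions")
    total = 0.0
    for here, there in zip(path, path[1:]):
        total += math.log(move_probs.get(here, {}).get(there, floor))
    return total / (len(path) - 1)


def is_atypical(path, move_probs, threshold=math.log(0.05)):
    """Flag trajectories whose average step is less likely than threshold."""
    return trajectory_log_likelihood(path, move_probs) < threshold


# Toy corridor (cells 0..3) where training data only showed left-to-right motion.
move_probs = {
    0: {0: 0.2, 1: 0.8},
    1: {1: 0.2, 2: 0.8},
    2: {2: 0.2, 3: 0.8},
    3: {3: 1.0},
}

typical = is_atypical([0, 1, 2, 3], move_probs)   # walks with the flow
backward = is_atypical([3, 2, 1, 0], move_probs)  # walks against the flow
```

Averaging over steps keeps the score comparable between short and long trajectories; a full model would also account for the probability of switching between maps.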

Source Code

The source code that we wrote for the people detection part of this project has been released under a GPL license. You can download it from the Software page of our web site.


Data Sets

Some of the multi-camera video sequences that we acquired for this project are available for download from the Data page of our web site.


References

J. Berclaz, F. Fleuret and P. Fua, Multi-Camera Tracking and Atypical Motion Detection with Behavioral Maps, European Conference on Computer Vision, October 2008.
J. Berclaz, F. Fleuret and P. Fua, Principled Detection-by-Classification from Multiple Views, Proceedings of the Third International Conference on Computer Vision Theory and Applications, Vol. 2, pp. 375 - 382, January 2008.
F. Fleuret, J. Berclaz, R. Lengagne and P. Fua, Multi-Camera People Tracking with a Probabilistic Occupancy Map, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 2, pp. 267 - 282, February 2008.
J. Berclaz, F. Fleuret, and P. Fua, Robust People Tracking with Global Trajectory Optimization, Conference on Computer Vision and Pattern Recognition, 2006.
F. Fleuret, R. Lengagne and P. Fua, Fixed Point Probability Field for Complex Occlusion Handling, International Conference on Computer Vision, October 2005.

Contact

J. Berclaz [jerome.berclaz@epfl.ch],
F. Fleuret [francois.fleuret@idiap.ch]



Last update : 18 July 2008 15:52:26