WHAT IS COMPUTER VISION?

The goal of computer vision is to process images acquired with cameras in order to produce a representation of objects in the world.

A number of working systems already perform parts of this task in specialized domains. For example, a map of a city or a mountain range can be produced semiautomatically from a set of aerial images. A robot can use the several image frames per second produced by one or two video cameras to build a map of its surroundings for path planning and obstacle avoidance. A printed circuit inspection system may take one picture per board on a conveyor belt and produce a binary image flagging possibly faulty soldering points on the board.

A zip code reader takes a single snapshot of each envelope and translates the handwritten number into an ASCII string. A security system can match one or a few pictures of a face against a database of known employees for recognition.

However, the generic "Vision Problem" is far from being solved. No existing system comes close to emulating the capabilities of a human. Systems such as the ones described above are fundamentally brittle: as soon as the input deviates even slightly from the intended format, the output almost invariably becomes meaningless. If we did not have an existence proof of a very powerful, general, and flexible system in our own retinas and visual cortices, the research of the past quarter of a century would seem to indicate that the task of building robust vision systems is hopeless.

Vision is therefore one of the problems of computer science most worthy of investigation: we know that it can be solved, yet we do not know how to solve it well. In fact, to solve the "general vision problem," we will have to come up with answers to deep and fundamental questions about representation and computation at the core of human intelligence.

