Springer, 1992. — 205 p.
In simple terms, vision is the process of seeing, the inference of world properties from the patterns of light sensed by an observer. At first it seems to be a coherent and effortless process, yet upon investigation, vision appears to be a collection of intricate processes that interact in subtle ways to provide a stable, cohesive model of the visual world. The perception of motion, the dynamic relationship between the camera and objects in the scene, is one of our most important visual faculties. It facilitates, among other things, the inference of our movement through the world, the detection and recognition of other moving creatures, the inference of the depth and the surface structure of visible objects, and hand-eye coordination.
Visual motion perception concerns the inference of world properties from sequences of images. The projection of the 3-d motion of objects onto the image plane is called the 2-d motion field. It is a purely geometric quantity that relates 3-d velocities and positions of points to 2-d velocities and locations in the image plane. From it, with knowledge of the projection, it is possible to infer the 3-d movement of the camera and the local structure of the visible surfaces. The determination of relative 3-d velocity and surface structure has been an active, yet controversial area of research, producing few practical results. One of the main difficulties has been the sensitivity of these methods to the relatively inaccurate estimates of the 2-d motion field that are obtained from image sequences, referred to as optical flow or image velocity.
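This geometric relationship can be made concrete with a short sketch (in Python with NumPy; the function name and focal-length convention are illustrative, not from the monograph) that projects the instantaneous 3-d velocity of a point onto the image plane:

```python
import numpy as np

def motion_field(P, V, f=1.0):
    """2-d motion field induced by a 3-d point P = (X, Y, Z) moving with
    velocity V = (dX, dY, dZ), under perspective projection with focal
    length f.  Differentiating x = f X / Z and y = f Y / Z with respect
    to time gives the image velocity below.  (Illustrative helper.)"""
    X, Y, Z = P
    dX, dY, dZ = V
    u = f * (dX * Z - X * dZ) / Z ** 2
    v = f * (dY * Z - Y * dZ) / Z ** 2
    return np.array([u, v])

# A point 2 units deep, translating parallel to the image plane,
# appears to move at half its 3-d speed:
uv = motion_field((0.0, 0.0, 2.0), (1.0, 0.0, 0.0))   # -> [0.5, 0.0]
```

Note how the same image velocity can arise from a near, slow point or a far, fast one; this depth-speed ambiguity is one reason recovering 3-d motion and structure from the 2-d motion field is delicate.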
The goal in measuring image velocity is to extract an approximation to the 2-d motion field from a sequence of images. For this, a definition of image velocity in terms of spatiotemporal patterns of image intensity and a suitable measurement technique are required. The difficulty is that image intensity depends on several factors in addition to the camera and scene geometry, such as the sources of illumination and surface reflectance properties. Until recently, it has been common to assume that, to a good approximation, patches of intensity simply translate between consecutive frames of an image sequence. The monograph argues that this assumption is overly restrictive. A robust method for measuring image velocity should allow for higher-order geometric deformations (e.g. dilation and rotation), photometric effects (e.g. shading), and the existence of more than one legitimate velocity in a local neighbourhood (e.g. because of specular reflections, shadows, transparency, or occlusion).
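Under the translation assumption, intensity satisfies I(x, t) = I(x − vt), which leads to the familiar gradient constraint. The following one-dimensional sketch (the signal, step size, and gradient threshold are illustrative; this is precisely the kind of local estimator whose underlying assumption the monograph argues is too restrictive) recovers speed from spatiotemporal derivatives:

```python
import numpy as np

# Under pure translation, I(x, t) = I(x - s t), so I_t + s I_x = 0 and
# speed can be recovered as s = -I_t / I_x wherever I_x is not too small.
x = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
s = 0.1                        # true translation per frame (illustrative)
I0 = np.sin(x)                 # frame at t = 0
I1 = np.sin(x - s)             # frame at t = 1, translated by s
I_x = np.gradient(I0, x)       # spatial derivative (finite differences)
I_t = I1 - I0                  # two-frame temporal derivative
mask = np.abs(I_x) > 0.5       # avoid dividing by small gradients
speed = np.median(-I_t[mask] / I_x[mask])
# speed is close to 0.1 here, but the estimate degrades as soon as the
# intensity pattern deforms or brightens rather than simply translating
```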
Towards this end, the monograph presents a new approach to the measurement of image velocity that consists of three (conceptual) stages of processing. The first constructs a velocity-scale specific representation of the input based on the complex-valued outputs of a family of velocity-tuned filters. Such preprocessing separates image structure according to scale, orientation and speed in order to isolate manifestations of potentially independent scene properties. For the second stage of processing, component image velocity is defined, for each filter output, as the component of velocity normal to level contours of constant phase. Except for isolated regions in which phase-based measurements are unreliable, this definition is shown to yield a dense, accurate set of velocity measurements. In a third stage, these component velocity measurements are combined. However, this last stage involves a host of complex issues relating to scene interpretation, and is examined only briefly in this monograph.
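A minimal one-dimensional sketch of the second stage (assuming a translating sinusoid, and substituting an FFT-based analytic signal for the monograph's family of velocity-tuned filters) recovers speed as the ratio of temporal to spatial phase derivatives, v = −ϕ_t / ϕ_x:

```python
import numpy as np

def analytic_phase(sig):
    """Phase of the analytic signal, via an FFT-based Hilbert transform."""
    n = sig.size
    spec = np.fft.fft(sig)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:n // 2] = 2.0
    h[n // 2] = 1.0            # n assumed even here
    return np.angle(np.fft.ifft(spec * h))

x = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
s, k = 0.3, 4                  # true speed and (integer) spatial frequency
I0 = np.cos(k * x)             # frame at t = 0
I1 = np.cos(k * (x - s))      # frame at t = 1
dphi = np.angle(np.exp(1j * (analytic_phase(I1) - analytic_phase(I0))))
phi_x = k                      # spatial phase gradient of cos(k x) is k
speed = -np.median(dphi) / phi_x   # recovers 0.3 up to floating point
```

The attraction of phase, as the monograph argues, is that level phase contours tend to track the motion of underlying image structure even when amplitude varies, for example under smooth changes in illumination.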
The use of phase information is justified in terms of its robustness with respect to the deviations from image translation that commonly occur in perspective projections of 3-d scenes. The occasional instability of phase information is shown to be related to the existence of phase singularities. The neighbourhoods of instability about these singularities can be detected reliably with little extra computational effort; this detection is viewed as an essential component of the technique. Finally, a series of experimental results is reported in which an implementation of the technique is applied to both real and synthetic image sequences.
Part I Background
Introduction
Time-Varying Image Formation
Image Velocity and Frequency Analysis
Velocity-Specific Representation
Review of Existing Techniques
Part II Phase-Based Velocity Measurement
Image Velocity as Local Phase Behaviour
Experimental Results
Computing 2-D Velocity
Part III On Phase Properties of Band-Pass Signals
Scale-Space Phase Stability
Scale-Space Phase Singularities
Application to Natural Images
Part IV Conclusions
Summary and Discussion
Appendices
A: Reflectance Model
B: Proof of an n-D Uncertainty Relation
C: Numerical Interpolation and Differentiation of R(x,t)
E: Approximations to E[Δϕ] and E[|Δϕ-E[Δϕ]|]
F: Derivations of z₁
G: Density Functions for ϕₓ(x) and ρₓ(x)/ρ(x)