Deformable Object Recognition with Articulations and Occlusions
11:00 a.m., Thursday, July 31, 1997
12th floor conference room, 719 Broadway
The subject of this thesis is deformable object recognition. We concentrate on the issues of articulations and occlusions.
To find a target object undergoing articulations in an image, we use the following procedures: (i) extracting key features in the image, (ii) detecting key points in the model, (iii) efficiently searching through possible image segmentations, and (iv) comparing and grouping shapes. Together, these steps reconstruct the target object in the image. A Bayesian rationale is presented to justify this strategy.
Our main focus in this thesis is on (iii) and (iv). More precisely, we are interested in shape representation, shape similarity, and combining shape similarity with image segmentation.
We consider two possible shape representations for an object. The first is given by its shape contour (SC), or silhouette; the other is described by the structure of its symmetry axis (SA), or skeleton, which has a unique free tree structure. For shape similarity, we review a string matching method based on the SC representation and then develop a tree matching scheme using the SA-tree representation. The advantage of this approach is that it becomes extremely simple to account for articulations and occlusions. A novel aspect is that the SA is obtained via a shape comparison between an SC and its mirror version. Finally, we study how to integrate the shape module, for both shape representations (SC and SA), with an active contour tracker to yield an image segmentation.
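As a rough illustration of string matching on contours, the sketch below aligns two sequences of contour descriptors by dynamic programming, which returns a globally optimal alignment cost. It is not the method of the thesis: the function name `contour_edit_distance`, the use of one scalar descriptor per contour point (for example a turning angle), and the constant `gap_cost` are all assumptions made for illustration. It also ignores the choice of starting point on a closed contour.

```python
def contour_edit_distance(a, b, gap_cost=1.0):
    """Minimum-cost alignment of two descriptor sequences (hypothetical helper).

    a, b     : lists of numbers, one descriptor per contour point (assumed).
    gap_cost : assumed constant penalty for leaving a point unmatched.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the optimal cost of aligning a[:i] with b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_cost
    for j in range(1, m + 1):
        dp[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),  # match two points
                dp[i - 1][j] + gap_cost,                      # skip a point of a
                dp[i][j - 1] + gap_cost,                      # skip a point of b
            )
    return dp[n][m]
```

Skipped points are one simple way to model a part of the contour that is missing in one shape, which is why gap penalties are a natural place to account for occlusions in this kind of scheme.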
Throughout these issues, our aim has been to provide methods that are guaranteed to find optimal solutions.
We also address the topic of occluded object recognition, but from a different viewpoint. We treat it as a function approximation problem with an over-complete basis (a library of image templates), while also accounting for occlusions, where the basis superposition principle is no longer valid. Since the basis is over-complete, there are infinitely many ways to decompose the image. This motivates us to select a sparse, compact representation of the image that accounts for occlusions and noise.
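To make the idea of a sparse decomposition over an over-complete basis concrete, the following sketch uses greedy matching pursuit, a standard technique swapped in here for illustration and not the method of the thesis. It assumes plain linear superposition of unit-norm templates, so it covers only the occlusion-free case; the names `matching_pursuit`, `templates`, `n_terms` and `tol` are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def matching_pursuit(signal, templates, n_terms=2, tol=1e-9):
    """Greedy sparse decomposition of `signal` over unit-norm `templates`.

    Returns a list of (template index, coefficient) pairs and the residual.
    Assumes linear superposition, so occlusions are NOT modeled here.
    """
    residual = list(signal)
    chosen = []
    for _ in range(n_terms):
        # Pick the template most correlated with what is still unexplained.
        scores = [dot(residual, t) for t in templates]
        k = max(range(len(templates)), key=lambda i: abs(scores[i]))
        c = scores[k]
        if abs(c) < tol:
            break  # nothing left to explain
        chosen.append((k, c))
        residual = [r - c * x for r, x in zip(residual, templates[k])]
    return chosen, residual

# Three templates in a 2-D space: more templates than dimensions.
s = 1 / math.sqrt(2)
templates = [[1.0, 0.0], [0.0, 1.0], [s, s]]
chosen, residual = matching_pursuit([2.0, 2.0], templates)
```

In this toy case the signal could be written with the first two templates, but the greedy selection explains it with the single diagonal template, which is the kind of compact representation the over-complete setting makes possible.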