PhD Defense: "Segmentation of 3D visual data and its applications"

Kuo-Chin Lien

August 8th (Monday), 12:00pm
Building 406, Rm 216

3D imaging technologies have improved significantly in recent years. 3D displays that actively or passively provide 3D illusions to the two eyes have already stepped from movie theaters into living rooms and even become portable/wearable; 3D sensing technologies have enabled the acquisition of scene depth in real time on a mobile device, and the geometry of a room can be estimated in only minutes. Targeting intuitive manipulation of such ubiquitous 3D data, this dissertation focuses on advancing traditional 2D image segmentation, a core technique in image processing and computer vision, to 3D.

Traditionally, 2D image segmentation has aimed at partitioning an image into non-overlapping regions, each of which satisfies a certain property, such as containing only the pixels belonging to the foreground object. In 3D segmentation, the goal is instead to partition the 3D space into multiple entities. When the input 3D data are captured as multi-view imagery (e.g., stereoscopic 3D), this requires an additional important property: view consistency. To maintain a consistent 3D interpretation, corresponding segments in different views should share the same property (e.g., all belong to the foreground object), since they are observations of the same entity in the 3D world.
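As a toy illustration (not taken from the dissertation), view consistency can be measured by checking whether corresponding pixels in two views carry the same foreground/background label. The `view_consistency` function and the fixed one-pixel horizontal disparity below are purely hypothetical simplifications:

```python
import numpy as np

# Toy 4x4 binary label maps for two views (1 = foreground, 0 = background).
labels_view1 = np.array([[0, 0, 1, 1],
                         [0, 1, 1, 1],
                         [0, 1, 1, 0],
                         [0, 0, 0, 0]])
labels_view2 = np.array([[0, 1, 1, 0],
                         [1, 1, 1, 0],
                         [1, 1, 0, 0],
                         [0, 0, 0, 0]])

def view_consistency(l1, l2, disparity=1):
    """Fraction of corresponding pixels with matching labels, assuming a
    purely horizontal disparity: view-1 pixel (r, c) maps to (r, c - d)."""
    rows, cols = l1.shape
    matches = total = 0
    for r in range(rows):
        for c in range(disparity, cols):
            total += 1
            matches += int(l1[r, c] == l2[r, c - disparity])
    return matches / total

print(view_consistency(labels_view1, labels_view2))  # prints 1.0
```

A score of 1.0 means every corresponding pixel pair agrees, i.e., the two label maps describe the same 3D foreground object consistently.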

There are many challenges in view-consistent 3D segmentation: an object can be visible from one viewpoint but occluded in the others; user guidance, if any, is typically given in only one view; and, most importantly, the coarse and noisy depth information obtained by modern depth estimation techniques is not sufficient to precisely determine the 3D position of each 2D pixel, or equivalently its cross-view correspondence. All of these limit the performance of prior 3D segmentation algorithms that attempt to group pixels into consistent segments across views. In some cases, an explicitly reconstructed 3D geometric model is available, so that its segmentation result can be projected to different observation viewpoints, naturally guaranteeing view consistency. However, these geometric models are typically of low quality and/or low resolution, which makes direct segmentation on the models very challenging.
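The dependence of cross-view correspondence on depth can be sketched with the standard pinhole camera model: a pixel is back-projected at its estimated depth and reprojected into the second view. The intrinsics `K`, the 10 cm baseline, and the pixel values below are illustrative assumptions, not values from the dissertation; the point is that a wrong depth shifts the correspondence:

```python
import numpy as np

K = np.array([[500.0, 0.0, 320.0],   # assumed shared camera intrinsics
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                        # second camera: pure translation
t = np.array([-0.1, 0.0, 0.0])       # assumed 10 cm horizontal baseline

def reproject(u, v, depth):
    """Back-project pixel (u, v) at the given depth, then project the
    resulting 3D point into the second view; returns (u', v')."""
    X = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])  # 3D point
    x = K @ (R @ X + t)                                   # into view 2
    return float(x[0] / x[2]), float(x[1] / x[2])

print(reproject(320, 240, 2.0))  # true depth:    (295.0, 240.0)
print(reproject(320, 240, 1.0))  # depth off 2x:  (270.0, 240.0)
```

Halving the depth estimate moves the predicted correspondence by 25 pixels here, which is exactly why noisy depth alone cannot pin down per-pixel cross-view matches.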

To address these issues, this dissertation proposes to integrate the typically low-quality 3D information with the high-quality 2D images in a global optimization that takes advantage of both. The insight is that powerful 3D geometric constraints and rich 2D image context can complement each other. We show that, by following this principle, the proposed algorithms achieve state-of-the-art performance in several applications of 3D image editing and object extraction.
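One hypothetical way to read "global optimization over both cues" is an energy with a per-pixel 2D appearance cost plus a penalty whenever corresponding pixels across views disagree in label; the weight balances image context against the geometric constraint. The `joint_energy` function and all values below are an illustrative sketch, not the dissertation's actual formulation:

```python
import numpy as np

def joint_energy(labels1, labels2, unary1, unary2, corr, lam=1.0):
    """labels*: 0/1 arrays; unary*: cost of labeling each pixel foreground;
    corr: list of ((r1, c1), (r2, c2)) cross-view correspondences."""
    # 2D image term: pay the appearance cost wherever a pixel is foreground.
    e = float(np.sum(unary1 * labels1) + np.sum(unary2 * labels2))
    # 3D term: penalize disagreeing labels across corresponding pixels.
    for (r1, c1), (r2, c2) in corr:
        e += lam * (labels1[r1, c1] != labels2[r2, c2])
    return e

labels1 = np.array([[1, 0], [0, 0]])
labels2 = np.array([[1, 0], [0, 0]])
unary = np.array([[0.2, 0.8], [0.8, 0.8]])
corr = [((0, 0), (0, 0))]

print(joint_energy(labels1, labels2, unary, unary, corr))   # consistent: 0.4
print(joint_energy(labels1, np.zeros((2, 2), dtype=int),
                   unary, unary, corr))                     # inconsistent: 1.2
```

Minimizing such an energy jointly over all views prefers labelings that both fit each image's appearance and agree across views, which is the complementarity the abstract describes.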

Hosted by: Professor Jerry Gibson and Professor Matthew Turk