Novel theoretical tools for universal perception: nonrigid scenes, effective solutions, and sensor defects

A perception problem aims to infer useful information about the underlying environment to support higher-level tasks. In a broader sense, a perception system is expected to play the role of both our eyes and our brain, using visual cues to conduct logical inference. Perception has been a key research topic in robotics, computer vision, and graphics, where experts from different fields approach the problem from different perspectives. Over years of research, many important questions have been answered, yet more continue to emerge.

  • For instance, conventional perception methods assume a rigid scene, but in practice many scenes are essentially nonrigid. This is particularly the case in surgical and medical applications, where the perception system aims to reconstruct the interior of the human body, which is subject to deformations.
  • A perception system solves a maximum likelihood estimation (MLE) problem, which often boils down to certain optimization instances (see the sketch after this list). Solving these instances effectively is another challenge, as some are hard to solve due to the lack of theoretical tools, while for others the solution is simply too expensive to compute.
  • The sensor is the eye of a perception system, and the most commonly used sensor is the camera. Critically, most commercial cameras, e.g., the ones used on our cellphones, are manufactured with a rolling-shutter mechanism, where each pixel line (termed a scanline) is recorded sequentially. As a consequence of this rolling-shutter effect, the captured image is distorted, which requires extra care.
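
To make the optimization point concrete, here is a minimal sketch under the standard assumption of independent Gaussian measurement noise; the symbols x, z_i, h_i, and \Sigma_i are generic placeholders rather than notation from a specific system. The MLE then reduces to a nonlinear least-squares problem,

\[
\hat{x} \;=\; \arg\max_{x} \prod_{i} p(z_i \mid x) \;=\; \arg\min_{x} \sum_{i} \big\| z_i - h_i(x) \big\|_{\Sigma_i}^{2},
\]

where z_i are the measurements, h_i the measurement models, and \Sigma_i the noise covariances. Such instances become hard when the cost is nonconvex, the variables live on manifolds (e.g., rotations), or the problem is simply too large.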

My research project targets novel theories that can advance perception in non-conventional scenarios. Specifically, the project involves fundamental research that:

  • invents novel theories for perception in nonrigid scenes;
  • develops novel optimization techniques to solve MLE problems with complex structures;
  • devises geometric methods that handle sensor defects, e.g., the rolling-shutter effect.

My research has resulted in several outputs in the above-mentioned directions:

  • For perception in nonrigid scenes, we introduced the DefGPA (short for generalized Procrustes analysis with deformations) and KernelGPA (short for generalized Procrustes analysis with kernel-based transformations) methods. These solutions are globally optimal, easy to compute, and applicable to visceral deformations such as those occurring in medical and surgical data (a schematic formulation follows this list).
  • For optimization, we developed the cycle-based pose graph optimization (PGO) and the proxy step-size techniques. PGO is the backbone of many perception systems; by exploiting the graph sparsity, our cycle-based approach is shown to be both more robust and more computationally efficient. The proposed proxy step-size is a novel concept in optimization that handles composite cost functions (with one or more convex but possibly non-differentiable regularization terms) on the unit sphere, a category into which many optimization instances in geometric vision fall. Our solution is both exact and remarkably fast (the problem forms are sketched after this list).
  • For the rolling-shutter camera, where the camera center moves across image scanlines, we propose the concept of scanline homography. Different from existing works, we do not assume a parametric motion model, but instead seek to build a “bridge” between the camera motion and the image distortions. This bridge is the scanline homography, which, as a set, explains the image distortions and, as an individual element, carries the camera's motion information at a particular scanline (see the sketch after this list).
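
To make the nonrigid item concrete, the following is a schematic formulation of generalized Procrustes analysis with deformations; the notation (D_i for the i-th point cloud, \mathcal{T}_i for its transformation, S for the reference shape) is illustrative, and the exact constraints and warp parameterizations are those defined in the DefGPA and KernelGPA papers:

\[
\min_{\mathcal{T}_1,\dots,\mathcal{T}_n,\; S} \;\; \sum_{i=1}^{n} \big\| \mathcal{T}_i(D_i) - S \big\|_F^2
\quad \text{subject to a normalization on } S \text{ that excludes the trivial solution,}
\]

where the transformations \mathcal{T}_i are deformable (e.g., kernel-based) rather than rigid.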
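
For the optimization item, the standard PGO objective reads, in its textbook form, with T_i the unknown absolute poses, \tilde{T}_{ij} the measured relative poses, and \mathcal{E} the edge set of the pose graph:

\[
\min_{T_1,\dots,T_n \in SE(3)} \;\; \sum_{(i,j) \in \mathcal{E}} \big\| \log\big( \tilde{T}_{ij}^{-1} \, T_i^{-1} T_j \big) \big\|_{\Sigma_{ij}}^{2},
\]

while the sphere-constrained composite problems targeted by the proxy step-size have the schematic form

\[
\min_{\| x \|_2 = 1} \;\; f(x) + \sum_{k} g_k(x),
\]

with f smooth and each g_k convex but possibly non-differentiable.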
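
For the rolling-shutter item, the idea can be sketched as follows (an illustrative reading rather than the exact formulation of the paper): indexing scanlines by y, the distortion of scanline y is explained by a homography H(y), so that a point p in an undistorted reference view maps into the rolling-shutter image as

\[
\tilde{p}(y) \;\propto\; H(y)\, p .
\]

The set \{H(y)\} thus explains the observed image distortions, while each individual H(y) encodes the camera motion at the moment scanline y is exposed.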

My research has resulted in a sequence of first-author publications in top-ranked journals in robotics, vision, and artificial intelligence. Representative works include T-RO 2021, T-PAMI 2022, IJCV 2022, and IJRR 2023.