Research Interests

Dr. Gong’s research interests span visual computing, including computer graphics, computer vision, image processing, and pattern recognition. To date, he has published over 160 refereed technical papers, including 25 in ACM/IEEE Transactions. He has received the 2025 SIGGRAPH Test-of-Time Award as well as multiple best-paper awards with his students.

His publications are indexed by Google Scholar, DBLP, ResearchGate, ACM Portal, and IEEE Xplore. For a complete publication list, please refer to his Curriculum Vitae.

Selected Research Topics

Here are several key research topics and related publications from Dr. Gong’s work.

Computer Vision

Crowd Counting
Learning-Based Crowd Counting

We design deep architectures to accurately count people in crowded scenes, addressing scale variation, density imbalance, and background clutter. Our latest models achieve high accuracy using only weak supervisory signals without pixel-level annotations.

Papers: PR 2023, TMM 2023
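
As a point of reference for how such counters are commonly formulated (this is not the specific architecture or the weakly supervised losses from the papers above; the network and sizes are illustrative assumptions), a model can regress a per-pixel density map whose integral gives the crowd count:

    import torch
    import torch.nn as nn

    class TinyDensityNet(nn.Module):
        """Illustrative density-map regressor; the count is the map's integral."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, 1),            # single-channel density map
            )

        def forward(self, x):
            density = torch.relu(self.features(x))
            return density, density.sum(dim=(1, 2, 3))   # per-image count

    model = TinyDensityNet()
    density, count = model(torch.rand(1, 3, 128, 128))   # random image, for illustration only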

Distilled Image Collection
Distilled Collections from Textual Image Queries

We propose an unsupervised method that distills large, noisy image search results into clean, coherent collections by simultaneously clustering the images and segmenting the consistent object regions they share.

Papers: Eurographics 2015

Hierarchical Image Segmentation
Hierarchical Color Image Segmentation

We present an unsupervised multilevel segmentation algorithm based on fuzzy partition entropy. The method automatically determines optimal region numbers and handles both grayscale and color images efficiently.

Papers: PR 2017, PR 2014
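
To make the entropy criterion concrete, here is a simplified single-threshold sketch of fuzzy-entropy thresholding; the membership ramp and band width are illustrative assumptions, and the published method extends this idea to multilevel partitions with automatic selection of the number of regions.

    import numpy as np

    def fuzzy_entropy_threshold(hist, band=16):
        """Pick the gray level whose fuzzy two-class partition has maximal entropy."""
        p = hist / hist.sum()
        levels = np.arange(len(hist))
        best_t, best_h = None, -np.inf
        for t in range(band, len(hist) - band):
            # linear membership ramp of width 2*band centered at t
            mu_bright = np.clip((levels - (t - band)) / (2.0 * band), 0.0, 1.0)
            p_dark = (p * (1.0 - mu_bright)).sum()
            p_bright = (p * mu_bright).sum()
            h = -sum(q * np.log(q) for q in (p_dark, p_bright) if q > 0)
            if h > best_h:
                best_t, best_h = t, h
        return best_t

    gray = np.concatenate([np.random.normal(60, 10, 5000),
                           np.random.normal(180, 15, 5000)])
    hist = np.bincount(gray.astype(int).clip(0, 255), minlength=256).astype(float)
    print(fuzzy_entropy_threshold(hist))   # threshold between the two modes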

Transparent Object Modeling
Transparent Object Modeling

We introduce an automatic 3D reconstruction technique for transparent objects by enforcing refraction consistency, silhouette constraints, and smoothness priors.

Papers: SIGGRAPH 2018

Water Surface Reconstruction
Water Surface Reconstruction

We reconstruct dynamic water surfaces and underwater scenes using multi-camera flow analysis, capturing both above- and below-water structures.

Papers: ECCV 2018, CVPR 2017

3D Reconstruction of Transparent Objects
3D Reconstruction of Transparent Objects

We jointly reconstruct 3D positions and normals of transparent surfaces by modeling double refraction paths for both synthetic and real data.

Papers: CVPR 2016

Frequency-based Environment Matting
Frequency-Based Environment Matting

Using compressive sensing, we greatly simplify acquisition for environment matting and achieve superior quality at reduced processing cost.

Papers: ICCV 2015

Underwater Stereo
Underwater Stereo and Imaging

We extend refractive calibration models by accounting for light dispersion, improving accuracy in underwater 3D reconstruction.

Papers: CVPR 2013

Sparse RGBD Avatar Modeling
Avatar Modeling from Sparse RGBD Images

We reconstruct realistic 3D human avatars from sparse RGBD frames captured by a single moving camera, fusing views despite occlusions and pose changes.

Papers: ECCV 2022, TMM 2021

Human Motion Generation
Conditioned Generation of 3D Human Motions

We design a temporal variational autoencoder to generate diverse, natural 3D motion sequences conditioned on actions, balancing realism and variability.

Papers: IJCV 2022, ACM MM 2020
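
Purely as an illustration of action-conditioned decoding (the temporal VAE in the papers is more elaborate; all dimensions and module names below are assumptions), a latent code sampled from the prior can be combined with an action label and unrolled into a pose sequence:

    import torch
    import torch.nn as nn

    class ActionConditionedDecoder(nn.Module):
        """Illustrative decoder: unrolls a motion from a latent code and an action label."""
        def __init__(self, n_actions=12, latent=64, pose_dim=72, hidden=128):
            super().__init__()
            self.gru = nn.GRU(latent + n_actions, hidden, batch_first=True)
            self.out = nn.Linear(hidden, pose_dim)

        def forward(self, z, action_onehot, n_frames=60):
            cond = torch.cat([z, action_onehot], dim=-1)    # condition on latent + action
            seq = cond.unsqueeze(1).repeat(1, n_frames, 1)  # repeat at every time step
            h, _ = self.gru(seq)
            return self.out(h)                              # (batch, n_frames, pose_dim)

    dec = ActionConditionedDecoder()
    z = torch.randn(1, 64)                                  # sample from the VAE prior
    action = torch.zeros(1, 12); action[0, 3] = 1.0         # one-hot action label
    motion = dec(z, action)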

Deformable Shape Modeling
Modeling of Deformable Human Shapes

We reconstruct deformable human surfaces over time using a single depth camera, achieving temporally consistent 3D motion capture results.

Papers: ICCV 2009

Foreground Segmentation
Foreground Segmentation for Live Videos

We present a robust, real-time foreground segmentation method that handles dynamic backgrounds, fuzzy boundaries, camera motion, and topology changes, making it well suited for video conferencing and background replacement.

Papers: TIP 2015, CVPR 2011

Video Matting
Real-Time Video Matting

We present the first real-time matting algorithm for live videos against natural backgrounds. Based on a Poisson formulation over color and depth, it achieves near-offline quality at real-time speed.

Papers: IJCV 2012, GI 2010
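
The gradient-domain idea can be sketched as follows. This Jacobi-style solver only illustrates recovering alpha from an approximate gradient field under trimap constraints; the published algorithm couples color and depth and runs on the GPU in real time.

    import numpy as np

    def solve_poisson_alpha(div, alpha0, known, iters=500):
        """Iteratively solve laplacian(alpha) = div; keep alpha fixed where `known` is True."""
        a = alpha0.astype(float).copy()
        for _ in range(iters):
            neighbors = (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
                         np.roll(a, 1, 1) + np.roll(a, -1, 1))
            a = np.where(known, alpha0, (neighbors - div) / 4.0)   # Jacobi update
        return np.clip(a, 0.0, 1.0)

    h, w = 64, 64
    alpha0 = np.zeros((h, w)); alpha0[:, 40:] = 1.0          # trimap: background / foreground
    known = np.zeros((h, w), bool); known[:, :20] = known[:, 44:] = True
    matte = solve_poisson_alpha(np.zeros((h, w)), alpha0, known)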

Background Subtraction
Background Subtraction from Dynamic Scenes

We model backgrounds whose textures vary over space and time, with GPU acceleration delivering real-time performance at high accuracy.

Papers: TIP 2011, ICCV 2009

Source Code: ZIP

Subpixel Stereo
Subpixel Stereo with Slanted Surface Modeling

We estimate local surface orientation at each pixel using a two-pass procedure guided by disparity planes, yielding smooth, subpixel-accurate disparity maps.

Papers: PR 2011

GPU Stereo
Real-Time Stereo Matching on GPUs

We perform dense stereo reconstruction of dynamic scenes on programmable GPUs, enabling applications in robotics and image-based modeling.

Papers: TIP 2007, CVPR 2005

Cost Aggregation Evaluation
Evaluation of Cost Aggregation Approaches

We systematically compare six GPU-optimized cost aggregation algorithms for real-time stereo, analyzing their speed and accuracy trade-offs.

Papers: IJCV 2007

Source Code: ZIP
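
For reference, the simplest member of this family of strategies, a square-window (box-filter) average of absolute-difference costs followed by winner-takes-all selection, can be sketched as below; the six GPU-optimized variants in the evaluation are considerably more involved.

    import numpy as np

    def box_aggregated_disparity(left, right, max_disp, radius=4):
        """Average absolute-difference costs over a square window, then pick winners."""
        h, w = left.shape
        kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
        volume = np.empty((max_disp, h, w))
        for d in range(max_disp):
            cost = np.full((h, w), 1.0)                    # high cost where no match exists
            cost[:, d:] = np.abs(left[:, d:] - right[:, :w - d])
            # separable box filter: rows, then columns
            cost = np.apply_along_axis(lambda r: np.convolve(r, kernel, 'same'), 1, cost)
            cost = np.apply_along_axis(lambda c: np.convolve(c, kernel, 'same'), 0, cost)
            volume[d] = cost
        return np.argmin(volume, axis=0)                   # winner-takes-all disparity

    left = np.random.rand(60, 80); right = np.roll(left, -3, axis=1)   # toy pair, disparity 3
    disp = box_aggregated_disparity(left, right, max_disp=8)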

Unambiguous Stereo
Unambiguous Matching with Reliability-Based DP

We derive a reliability measure from global cost differences and use it within dynamic programming to commit only to unambiguous matches, improving accuracy near occlusions.

Papers: TPAMI 2005, ICCV 2003
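
A much-simplified sketch of the selection step (the actual reliability measure comes from dynamic-programming cost differences along scanlines): a disparity is committed only when its cost beats the runner-up by a margin, and ambiguous pixels are left unmatched.

    import numpy as np

    def reliable_disparities(cost_volume, margin=0.05):
        """cost_volume: (max_disp, h, w). Returns disparities, with -1 where ambiguous."""
        ordered = np.sort(cost_volume, axis=0)
        best, second = ordered[0], ordered[1]
        disp = np.argmin(cost_volume, axis=0)
        return np.where(second - best > margin, disp, -1)

    volume = np.random.rand(8, 60, 80)          # stand-in cost volume
    disp = reliable_disparities(volume)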

Disparity Flow
Joint Disparity and Disparity Flow Estimation

We enforce temporal consistency across stereo sequences to jointly estimate disparity and disparity flow, improving 3D motion prediction using efficient image-space operations.

Papers: CVIU 2008, ECCV 2006

Large Motion Estimation
Large Motion Estimation Using Reliability-DP

We adapt reliability-based dynamic programming to handle large displacements in fast-motion video, producing dense and accurate optical flow.

Papers: IJCV 2006

Computer Graphics and Visualization

Deep Points Consolidation
Deep Points Consolidation

We present a consolidation framework based on a novel representation of 3D point sets. Each surface point is augmented with an internal “deep point” on the meso-skeleton, enabling effective denoising and completion of noisy, incomplete scans.

Papers: SIGGRAPH Asia 2015

Morfit Reconstruction
Reconstruction from Incomplete Point Clouds

We introduce an interactive reconstruction approach that alternates between user guidance and morphological fitting (“Morfit”) to reconstruct sharp-featured surfaces from partial scans with substantial missing regions.

Papers: SIGGRAPH Asia 2014

L1-Medial Skeleton
L1-Medial Skeleton of Point Clouds

We develop an algorithm for constructing L1-medial skeletons directly from raw, unstructured point scans, robustly handling noise, outliers, and missing data to extract curve- and sheet-like skeletal structures.

Papers: SIGGRAPH 2013
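
At the heart of the construction is the L1 median (geometric median) of local point neighborhoods; a minimal Weiszfeld-style sketch is shown below, whereas the full algorithm adds locally supported weights and a regularization term to shape the skeleton.

    import numpy as np

    def l1_median(points, iters=50, eps=1e-8):
        """Weiszfeld iteration toward the point minimizing summed distances to the samples."""
        x = points.mean(axis=0)                     # start from the centroid
        for _ in range(iters):
            d = np.linalg.norm(points - x, axis=1)
            w = 1.0 / np.maximum(d, eps)            # inverse-distance weights
            x = (points * w[:, None]).sum(axis=0) / w.sum()
        return x

    pts = np.random.rand(200, 3)                    # stand-in for a local point neighborhood
    print(l1_median(pts))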

Edge-aware Point Resampling
Edge-Aware Point Set Resampling

We propose a progressive resampling technique that consolidates noisy point clouds while preserving sharp geometric edges. The method yields clean, edge-aligned normals and reliable point distributions for downstream reconstruction.

Papers: TPAMI 2017, TOG 2013

Plant Acquisition
Intrusive Plant Acquisition

We present approaches for plant acquisition by capturing disjoint parts that can be scanned offline. A global-to-local nonrigid registration framework preserves fine geometric details, enabling faithful reconstruction of plants with varied morphology and structure.

Papers: CGF 2017, CGF 2016

Flower Modeling
Flower Modeling from a Single Photo

This method reconstructs flower models from a single photograph by exploiting the regularity and similarity of petal structures. It enables users to rapidly create realistic 3D flowers and animate them using reconstructed geometry.

Papers: Eurographics 2014

Field-guided Registration
Field-Guided Registration for Shape Composition

We propose a field-guided registration framework that aligns shape parts with non-overlapping regions by extending one part’s surface field into the ambient space, establishing natural correspondences for seamless fusion.

Papers: SIGGRAPH Asia 2012

Cylinder Decomposition
Generalized Cylinder Decomposition

We define a quantitative measure of cylindricity and develop an optimization framework for decomposing complex shapes into generalized cylindrical parts, achieving globally optimal, semantically meaningful segmentations.

Papers: SIGGRAPH Asia 2015

Mobility Trees
Mobility Trees for Indoor Scene Manipulation

We introduce the mobility-tree construct for high-level functional representation of indoor scenes. Repetitive objects and motions are analyzed to infer mobility groups, enabling semantic editing and functional manipulation of 3D environments.

Papers: CGF 2013

Shape Segmentation
Projective Analysis for 3D Shape Segmentation

We introduce projective analysis for semantic labeling of 3D shapes, treating each shape as a collection of 2D projections. Supervised learning on 2D data guides the segmentation of 3D models, enabling effective analysis of imperfect geometry.

Papers: SIGGRAPH Asia 2013

Guided Texture Synthesis
Controlled Synthesis of Inhomogeneous Textures

We introduce a texture synthesis method that models local progression and dominant orientation through scalar and directional guidance maps, allowing users to precisely control spatial variation and structure in generated textures.

Papers: Eurographics 2017

Face Stylization
Face Photo Stylization

We propose a unified framework for fully automatic face stylization using a single style exemplar. Our patch-based model adapts samples while maintaining identity consistency and produces visually compelling results.

Papers: TVCJ 2017

Structure-Driven Completion
Structure-Driven Image Completion

We combine salient curve extraction and tele-registration alignment to jointly close gaps between image fragments. Structure-driven completion ensures geometric coherence before traditional inpainting refinement.

Papers: SIGGRAPH Asia 2013

Video Stereolization
Video Stereolization

We introduce a semiautomatic method to convert monocular videos into stereoscopic ones by combining motion analysis with qualitative depth constraints and quadratic programming to produce dense depth maps.

Papers: TVCG 2012

Stereoscopic Inpainting
Stereoscopic Inpainting

We present a joint color and depth inpainting algorithm for stereo imagery, filling occluded regions consistently across both channels to maintain geometric coherence and realism.

Papers: CVPR 2008

Layer-based Morphing
Layer-Based Morphing

We propose a morphing technique that separates scene elements into layers to prevent ghosting artifacts. Each layer is warped independently, supporting complex visibility changes and object-specific control.

Papers: Graphical Models 2001

Data Organization
Organizing Data into Structured Layouts

We study layout algorithms that spatially arrange data items so that their proximity reflects similarity. The resulting visual structures enhance users’ ability to explore relationships among data.

Papers: TMM 2014, GI 2011

Concept-based Search
Concept-Based Web Image Search

We expand short, ambiguous image queries using Wikipedia-based concepts to diversify search results. The returned images are then organized by conceptual and visual similarity to aid user navigation.

Papers: IP&M 2013, JAIHC 2013, JETWI 2012

Image Browsing
Similarity-Based Image Browsing

We propose an image browsing interface that organizes photos on a virtual 2D canvas based on visual similarity. Users can pan, zoom, and dynamically explore related photos in a collage-based interface.

Papers: IVC 2011, CIVR 2009

Rayset Taxonomy
Rayset: A Taxonomy for Image-Based Rendering

We introduce the rayset representation—a parametric formulation that unifies scene representations in image-based rendering—and present a taxonomy classifying reconstruction and rendering techniques.

Papers: IJIG 2006, GI 2001

Camera Field Rendering
Camera Field Rendering for Dynamic Scenes

We render dynamic scenes from a few sample views and noisy disparity maps using backward mapping. The process parallelizes naturally, achieving interactive performance on GPUs.

Papers: GI 2007, GM 2005

Ray-NURBS Intersection
Fast Ray–NURBS Intersection Calculation

We propose a fast intersection algorithm using adaptive subdivision and extrapolated Newton iteration. The method outperforms existing techniques while maintaining precision for complex NURBS surfaces.

Papers: C&G 1997
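
The Newton step at the core of such intersection tests solves S(u, v) - (o + t d) = 0 for (u, v, t). In the hedged illustration below, a simple bilinear patch stands in for a NURBS surface, and the adaptive subdivision and extrapolation that supply good starting points are omitted.

    import numpy as np

    def newton_intersect(S, dS_du, dS_dv, o, d, x0, iters=20, tol=1e-10):
        """Newton iteration for ray / parametric-surface intersection."""
        u, v, t = x0
        for _ in range(iters):
            F = S(u, v) - (o + t * d)
            if np.linalg.norm(F) < tol:
                break
            J = np.column_stack([dS_du(u, v), dS_dv(u, v), -d])   # 3x3 Jacobian
            u, v, t = np.array([u, v, t]) - np.linalg.solve(J, F)
        return u, v, t

    # Bilinear patch z = u*v over [0,1]^2 as a stand-in surface
    S = lambda u, v: np.array([u, v, u * v])
    dS_du = lambda u, v: np.array([1.0, 0.0, v])
    dS_dv = lambda u, v: np.array([0.0, 1.0, u])
    o, d = np.array([0.3, 0.4, 1.0]), np.array([0.0, 0.0, -1.0])
    print(newton_intersect(S, dS_du, dS_dv, o, d, (0.5, 0.5, 0.5)))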

Robotics and Artificial Intelligence

Neural Packing
Robotic 3D Packing

We present a novel learning framework to solve the 3D object packing problem. It constitutes a complete solution pipeline from partial RGBD observations to compact box placement via robotic motion planning. At its core, a neural network trained through reinforcement learning (RL) addresses this NP-hard combinatorial optimization task.

Papers: SIGGRAPH Asia 2023, SIGGRAPH Asia 2020

Aerial Path Planning
Aerial Path Planning for 3D Reconstruction

We propose an adaptive aerial path planning algorithm that operates before site visits. Using only a 2D map and a satellite image, our method builds a coarse 2.5D model of the area and employs a Max–Min optimization strategy to select a minimal set of viewpoints maximizing reconstructability.

Papers: SIGGRAPH Asia 2021, SIGGRAPH Asia 2020

Drone Videography
Drone Videography

We develop a tool that allows novice users to capture compelling aerial videos. Given starting and ending viewpoints and selected landmarks, our system generates smooth, collision-free, and shape-adaptive trajectories to capture cinematic footage automatically.

Papers: SIGGRAPH 2018, CGF 2016

Autoscanning
Quality-Driven Autoscanning

We introduce a quality-driven autonomous scanning framework that iteratively selects next-best views to ensure completeness and fidelity. Based on Poisson field analysis, this method has been implemented on both PR2 and industrial robotic platforms.

Papers: SIGGRAPH Asia 2014

Style Transfer
Neural Networks for Arbitrary Style Transfer

We propose a self-correcting model that iteratively refines stylized images through an Error Transition Network (ETNet), which predicts and corrects residual errors across spatial and scale domains for improved style-content consistency.

Papers: AAAI 2020, NeurIPS 2019

Dual GAN
Dual Learning for Image-to-Image Translation

We develop a dual-GAN framework where two networks translate between image domains in opposite directions, enforcing consistency via a closed translation loop. This unsupervised approach achieves robust results without paired data.

Papers: ICCV 2017
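
The closed loop can be expressed as a reconstruction loss over both translation directions. The sketch below assumes two generator networks are given and omits the adversarial terms and the training loop; the stand-in generators are illustrative only.

    import torch
    import torch.nn.functional as F

    def loop_reconstruction_loss(G_AB, G_BA, real_A, real_B, lam=10.0):
        """Translate each domain across and back; penalize the round-trip error."""
        rec_A = G_BA(G_AB(real_A))                  # A -> B -> A
        rec_B = G_AB(G_BA(real_B))                  # B -> A -> B
        return lam * (F.l1_loss(rec_A, real_A) + F.l1_loss(rec_B, real_B))

    # e.g. with untrained stand-in generators
    G_AB = G_BA = torch.nn.Conv2d(3, 3, 1)
    loss = loop_reconstruction_loss(G_AB, G_BA,
                                    torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))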

Multi-Bee Colony Algorithm
Artificial Multi-Bee Colony Algorithm

We introduce the Artificial Multi-Bee Colony (AMBC) algorithm for solving k-nearest-neighbor fields. Independent bee colonies communicate locally, yielding higher-quality matches than PatchMatch.

Papers: GECCO 2016

Multiresolution GA
Multiresolution Genetic Algorithms

We enhance genetic algorithms with a multiresolution scheme that encodes image solutions via quadtrees, preserving spatial locality. The approach efficiently solves vision problems such as segmentation and stereo matching.

Papers: PR 2004, IJCV 2002
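
A minimal sketch of the coarse-to-fine hand-off (the quadtree chromosome encoding and the genetic operators themselves are not shown; names are illustrative): a labelling evolved on a coarse grid seeds the population at the next resolution by splitting each cell into four.

    import numpy as np

    def refine_labels(coarse):
        """Quadtree-style split: each coarse cell becomes a 2x2 block at the finer level."""
        return np.kron(coarse, np.ones((2, 2), dtype=coarse.dtype))

    coarse = np.random.randint(0, 2, (8, 8))   # a coarse segmentation chromosome
    fine_seed = refine_labels(coarse)          # 16x16 starting solution for the next level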

Genetic Triangulation
Genetic Algorithm for Minimum-Weight Triangulation

We present a genetic algorithm for minimum-weight triangulation using adaptive crossover and mutation operators. The method consistently produces better triangulations than greedy algorithms.

Papers: ICEC 1997