Research Interests
Dr. Gong’s research interests span visual computing, including computer graphics, computer vision, image processing, and pattern recognition. To date, he has published over 160 refereed technical papers, including 25 in ACM/IEEE Transactions. He has received the 2025 SIGGRAPH Test-of-Time Award as well as multiple best-paper awards with his students.
His publications are indexed by Google Scholar, DBLP, ResearchGate, ACM Portal, and IEEE Xplore. For a complete publication list, please refer to his Curriculum Vitae.
Selected Research Topics
Here are several key research topics and related publications from Dr. Gong’s work.
Computer Vision
Learning-Based Crowd Counting
We design deep architectures to accurately count people in crowded scenes, addressing scale variation, density imbalance, and background clutter. Our latest models achieve high accuracy using only weak supervisory signals without pixel-level annotations.
Distilled Collections from Textual Image Queries
We propose an unsupervised method to distill large, noisy image search results into clean, coherent sets by clustering and segmenting consistent object regions simultaneously.
Papers: Eurographics 2015
Transparent Object Modeling
We introduce an automatic 3D reconstruction technique for transparent objects by enforcing refraction consistency, silhouette constraints, and smoothness priors.
Papers: SIGGRAPH 2018
3D Reconstruction of Transparent Objects
We jointly reconstruct the 3D positions and normals of transparent surfaces by modeling double refraction paths. The method applies to both synthetic and real data.
Papers: CVPR 2016
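The double refraction path mentioned above can be illustrated with the vector form of Snell's law. The following is a minimal sketch under stated assumptions, not the reconstruction method from the paper: the function names `refract` and `through_slab`, the flat-slab geometry, and the refractive index of 1.5 are all hypothetical choices made for illustration.

```python
import math

def refract(incident, normal, eta):
    """Refract a unit direction at a surface using the vector form of Snell's law.

    `normal` is a unit vector pointing against the incoming ray and
    `eta` is the ratio n1 / n2 of the refractive indices.
    Returns the refracted unit direction, or None on total internal reflection.
    """
    cos_i = -sum(i * n for i, n in zip(incident, normal))
    k = 1.0 - eta * eta * (1.0 - cos_i * cos_i)
    if k < 0.0:
        return None  # total internal reflection, no transmitted ray
    scale = eta * cos_i - math.sqrt(k)
    return tuple(eta * i + scale * n for i, n in zip(incident, normal))

def through_slab(direction, index=1.5):
    """Follow a downward ray through a flat slab: air -> glass -> air."""
    facing_normal = (0.0, 0.0, 1.0)  # opposes the downward ray at both faces
    inside = refract(direction, facing_normal, 1.0 / index)  # first refraction
    outside = refract(inside, facing_normal, index)          # second refraction
    return inside, outside

if __name__ == "__main__":
    theta = math.radians(30.0)
    ray = (math.sin(theta), 0.0, -math.cos(theta))
    print(through_slab(ray))
```

For a slab with parallel faces, the exiting direction equals the entering direction, which makes this a convenient sanity check; a curved transparent object bends the ray differently at each of its two interfaces.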
Frequency-Based Environment Matting
Using compressive sensing, we greatly simplify acquisition for environment matting and achieve superior quality at reduced processing cost.
Papers: ICCV 2015
Underwater Stereo and Imaging
We extend refractive calibration models by accounting for light dispersion, improving accuracy in underwater 3D reconstruction.
Papers: CVPR 2013
Conditioned Generation of 3D Human Motions
We design a temporal variational autoencoder to generate diverse, natural 3D motion sequences conditioned on actions, balancing realism and variability.
Papers: IJCV 2022, ACM MM 2020
Modeling of Deformable Human Shapes
We reconstruct deformable human surfaces over time using a single depth camera, achieving temporally consistent 3D motion capture results.
Papers: ICCV 2009
Foreground Segmentation for Live Videos
We develop robust, real-time foreground segmentation that handles dynamic backgrounds, fuzzy boundaries, camera motion, and topology changes, making it well suited to video conferencing and background replacement.
Real-Time Video Matting
We present the first real-time matting algorithm for live videos with natural backgrounds. Based on a Poisson formulation for color and depth, it achieves near-offline quality at real-time speed.
Subpixel Stereo with Slanted Surface Modeling
We estimate the local surface orientation at each pixel with a two-pass procedure guided by disparity planes, yielding smooth, subpixel-accurate disparity maps.
Papers: PR 2011
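To illustrate why a disparity plane gives subpixel values, a slanted surface can be modeled locally as d(x, y) = a*x + b*y + c rather than as a single constant disparity. The sketch below fits such a plane to disparity samples by ordinary least squares. It is an assumed illustration only: the helper names `det3`, `fit_disparity_plane`, and `plane_disparity` are hypothetical, and the two-pass estimation described above is not reproduced.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_disparity_plane(samples):
    """Least-squares fit of d = a*x + b*y + c to (x, y, d) samples."""
    sxx = sxy = sx = syy = sy = n = 0.0
    sxd = syd = sd = 0.0
    for x, y, d in samples:
        sxx += x * x
        sxy += x * y
        sx += x
        syy += y * y
        sy += y
        n += 1.0
        sxd += x * d
        syd += y * d
        sd += d
    # Normal equations of the least-squares problem.
    normal = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxd, syd, sd]
    det = det3(normal)
    if abs(det) < 1e-12:
        raise ValueError("samples are degenerate (for example, collinear)")
    coeffs = []
    for col in range(3):  # Cramer's rule: replace one column at a time
        replaced = [row[:] for row in normal]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / det)
    return tuple(coeffs)

def plane_disparity(plane, x, y):
    """Evaluate the fitted plane, giving a subpixel disparity at (x, y)."""
    a, b, c = plane
    return a * x + b * y + c

if __name__ == "__main__":
    # Integer-quantized disparities of a slanted surface in a 5x5 window.
    window = [(x, y, round(0.25 * x + 0.1 * y + 3.0))
              for x in range(-2, 3) for y in range(-2, 3)]
    plane = fit_disparity_plane(window)
    print(plane, plane_disparity(plane, 0.0, 0.0))
```

Fitting a plane to integer disparities in a window recovers fractional values between the integer steps, which is the intuition behind replacing the fronto-parallel assumption with a slanted one.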
Unambiguous Matching with Reliability-Based DP
We derive a reliability metric from global cost differences and use it within dynamic programming to select only unambiguous matches, improving accuracy under occlusion.
Papers: TPAMI 2005, ICCV 2003
Large Motion Estimation Using Reliability-DP
We adapt reliability-based dynamic programming to the large displacements found in fast-motion video, producing dense and accurate optical flow.
Papers: IJCV 2006
Computer Graphics and Visualization
Deep Points Consolidation
We present a consolidation framework based on a novel representation of 3D point sets. Each surface point is augmented with an internal “deep point” on the meso-skeleton, enabling effective denoising and completion of noisy, incomplete scans.
Papers: SIGGRAPH Asia 2015
Reconstruction from Incomplete Point Clouds
We introduce an interactive reconstruction approach that alternates between user guidance and morphological fitting (“Morfit”) to reconstruct sharp-featured surfaces from partial scans with substantial missing regions.
Papers: SIGGRAPH Asia 2014
L1-Medial Skeleton of Point Clouds
We develop an algorithm for constructing L1-medial skeletons directly from raw, unstructured point scans, robustly handling noise, outliers, and missing data to extract curve- and sheet-like skeletal structures.
Papers: SIGGRAPH 2013
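The L1-median that gives the skeleton its name is the point minimizing the sum of Euclidean distances to a set of samples, which makes it far less sensitive to outliers than the mean. A minimal sketch of the classic Weiszfeld iteration for the global L1-median is shown below. The function name `l1_median` and its parameters are hypothetical, and the sketch covers only a single global median of one point set, not the skeleton extraction itself.

```python
import math

def l1_median(points, iterations=200, eps=1e-9):
    """Approximate the L1-median (geometric median) by Weiszfeld iterations."""
    dim = len(points[0])
    # Start from the centroid, which is easy to compute but outlier-sensitive.
    estimate = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iterations):
        weight_sum = 0.0
        weighted = [0.0] * dim
        for p in points:
            # Inverse-distance weights; eps guards against division by zero.
            w = 1.0 / max(math.dist(p, estimate), eps)
            weight_sum += w
            for k in range(dim):
                weighted[k] += w * p[k]
        new_estimate = [value / weight_sum for value in weighted]
        if math.dist(new_estimate, estimate) < eps:
            return new_estimate
        estimate = new_estimate
    return estimate

if __name__ == "__main__":
    # Four corners of a square plus one distant outlier.
    samples = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0), (100.0, 0.0)]
    print(l1_median(samples))
```

In this example the centroid is pulled to x = 20 by the single outlier, while the L1-median stays inside the square, which is the robustness property the description above relies on.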
Edge-Aware Point Set Resampling
We propose a progressive resampling technique that consolidates noisy point clouds while preserving sharp geometric edges. The method yields clean, edge-aligned normals and reliable point distributions for downstream reconstruction.
Papers: TPAMI 2017, TOG 2013
Intrusive Plant Acquisition
We present approaches for plant acquisition by capturing disjoint parts that can be scanned offline. A global-to-local nonrigid registration framework preserves fine geometric details, enabling faithful reconstruction of plants with varied morphology and structure.
Flower Modeling from a Single Photo
This method reconstructs flower models from a single photograph by exploiting the regularity and similarity of petal structures. It enables users to rapidly create realistic 3D flowers and animate them using reconstructed geometry.
Papers: Eurographics 2014
Field-Guided Registration for Shape Composition
We propose a field-guided registration framework that aligns shape parts with non-overlapping regions by extending one part’s surface field into the ambient space, establishing natural correspondences for seamless fusion.
Papers: SIGGRAPH Asia 2012
Generalized Cylinder Decomposition
We define a quantitative measure of cylindricity and develop an optimization framework for decomposing complex shapes into generalized cylindrical parts, achieving globally optimal, semantically meaningful segmentations.
Papers: SIGGRAPH Asia 2015
Mobility Trees for Indoor Scene Manipulation
We introduce the mobility-tree construct for high-level functional representation of indoor scenes. Repetitive objects and motions are analyzed to infer mobility groups, enabling semantic editing and functional manipulation of 3D environments.
Papers: CGF 2013
Projective Analysis for 3D Shape Segmentation
We introduce projective analysis for semantic labeling of 3D shapes, treating each shape as a collection of 2D projections. Supervised learning on 2D data guides the segmentation of 3D models, enabling effective analysis of imperfect geometry.
Papers: SIGGRAPH Asia 2013
Controlled Synthesis of Inhomogeneous Textures
We introduce a texture synthesis method that models local progression and dominant orientation through scalar and directional guidance maps, allowing users to precisely control spatial variation and structure in generated textures.
Papers: Eurographics 2017
Face Photo Stylization
We propose a unified framework for fully automatic face stylization using a single style exemplar. Our patch-based model adapts samples while maintaining identity consistency and produces visually compelling results.
Papers: TVCJ 2017
Structure-Driven Image Completion
We combine salient curve extraction and tele-registration alignment to jointly close gaps between image fragments. Structure-driven completion ensures geometric coherence before traditional inpainting refinement.
Papers: SIGGRAPH Asia 2013
Video Stereolization
We introduce a semiautomatic method to convert monocular videos into stereoscopic ones by combining motion analysis with qualitative depth constraints and quadratic programming to produce dense depth maps.
Papers: TVCG 2012
Stereoscopic Inpainting
We present a joint color and depth inpainting algorithm for stereo imagery, filling occluded regions consistently across both channels to maintain geometric coherence and realism.
Papers: CVPR 2008
Layer-Based Morphing
We propose a morphing technique that separates scene elements into layers to prevent ghosting artifacts. Each layer is warped independently, supporting complex visibility changes and object-specific control.
Papers: Graphical Models 2001
Organizing Data into Structured Layouts
We study layout algorithms that spatially arrange data items so that their proximity reflects similarity. The resulting visual structures enhance users’ ability to explore relationships among data.
Concept-Based Web Image Search
We expand short, ambiguous image queries using Wikipedia-based concepts to diversify search results. The returned images are then organized by conceptual and visual similarity to aid user navigation.
Papers: IP&M 2013, JAIHC 2013, JETWI 2012
Fast Ray–NURBS Intersection Calculation
We propose a fast intersection algorithm using adaptive subdivision and extrapolated Newton iteration. The method outperforms existing techniques while maintaining precision for complex NURBS surfaces.
Papers: C&G 1997
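The Newton step at the heart of ray-surface intersection can be sketched on a much simpler case. In the example below the ray is written as the intersection of two planes, and a 2D Newton iteration solves for the surface parameters (u, v) at which the surface point lies on both planes. Everything here is an assumption for illustration: the analytic paraboloid stands in for a NURBS surface, the iteration is plain Newton rather than the extrapolated variant, adaptive subdivision is omitted, and the names `surface`, `partials`, and `intersect` are hypothetical.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def surface(u, v):
    """Stand-in parametric surface: a shallow paraboloid."""
    return (u, v, 0.25 * (u * u + v * v))

def partials(u, v):
    """Partial derivatives of `surface` with respect to u and v."""
    return (1.0, 0.0, 0.5 * u), (0.0, 1.0, 0.5 * v)

def intersect(origin, direction, u, v, iterations=20, tol=1e-12):
    """Newton iteration for the (u, v) where the ray meets the surface.

    `direction` is assumed to have unit length; (u, v) is the initial guess.
    Returns (u, v) on convergence, otherwise None.
    """
    # Represent the ray as the intersection of two planes that contain it.
    axis = (1.0, 0.0, 0.0) if abs(direction[0]) < 0.9 else (0.0, 1.0, 0.0)
    n1 = cross(direction, axis)
    n2 = cross(direction, n1)
    d1, d2 = -dot(n1, origin), -dot(n2, origin)
    for _ in range(iterations):
        p = surface(u, v)
        f1, f2 = dot(n1, p) + d1, dot(n2, p) + d2  # signed plane distances
        if abs(f1) + abs(f2) < tol:
            return u, v
        su, sv = partials(u, v)
        j11, j12 = dot(n1, su), dot(n1, sv)
        j21, j22 = dot(n2, su), dot(n2, sv)
        det = j11 * j22 - j12 * j21
        if abs(det) < 1e-15:
            return None  # singular Jacobian, Newton step undefined
        u -= (j22 * f1 - j12 * f2) / det
        v -= (j11 * f2 - j21 * f1) / det
    return None

if __name__ == "__main__":
    s = 1.0 / math.sqrt(2.0)
    print(intersect((-1.0, 0.2, 2.0), (s, 0.0, -s), 0.0, 0.0))
```

Newton iteration converges quickly only from a good initial guess, which is why practical methods pair it with a subdivision or bounding step that supplies the starting (u, v).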
Robotics and Artificial Intelligence
Robotic 3D Packing
We present a novel learning framework to solve the 3D object packing problem. It constitutes a complete solution pipeline from partial RGBD observations to compact box placement via robotic motion planning. At its core, a neural network trained through reinforcement learning (RL) addresses this NP-hard combinatorial optimization task.
Papers: SIGGRAPH Asia 2023, SIGGRAPH Asia 2020
Aerial Path Planning for 3D Reconstruction
We propose an adaptive aerial path planning algorithm that operates before site visits. Using only a 2D map and a satellite image, our method builds a coarse 2.5D model of the area and employs a Max–Min optimization strategy to select a minimal set of viewpoints maximizing reconstructability.
Papers: SIGGRAPH Asia 2021, SIGGRAPH Asia 2020
Drone Videography
We developed a tool that allows novice users to capture compelling aerial videos. Given starting and ending viewpoints and selected landmarks, our system generates smooth, collision-free, and shape-adaptive trajectories to capture cinematic footage automatically.
Papers: SIGGRAPH 2018, CGF 2016
Quality-Driven Autoscanning
We introduce a quality-driven autonomous scanning framework that iteratively selects next-best views to ensure completeness and fidelity. Based on Poisson field analysis, this method has been implemented on both PR2 and industrial robotic platforms.
Papers: SIGGRAPH Asia 2014
Neural Networks for Arbitrary Style Transfer
We propose a self-correcting model that iteratively refines stylized images through an Error Transition Network (ETNet), which predicts and corrects residual errors across spatial and scale domains for improved style-content consistency.
Papers: AAAI 2020, NeurIPS 2019
Dual Learning for Image-to-Image Translation
We develop a dual-GAN framework where two networks translate between image domains in opposite directions, enforcing consistency via a closed translation loop. This unsupervised approach achieves robust results without paired data.
Papers: ICCV 2017
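The closed translation loop can be illustrated with a toy reconstruction loss: an image translated to the other domain and back should match the original. The sketch below is an assumption-laden illustration rather than the published model. Images are flat lists of numbers, the translators are simple arithmetic functions instead of networks, the L1 distance is an assumed choice of error measure, the adversarial terms are omitted, and the names `reconstruction_loss` and `closed_loop_loss` are hypothetical.

```python
def reconstruction_loss(samples, forward, backward):
    """Mean absolute error between each sample and its round-trip translation."""
    total = 0.0
    count = 0
    for sample in samples:
        round_trip = backward(forward(sample))
        total += sum(abs(a - b) for a, b in zip(sample, round_trip))
        count += len(sample)
    return total / count

def closed_loop_loss(domain_u, domain_v, u_to_v, v_to_u):
    """Sum of the two round-trip errors: U -> V -> U and V -> U -> V."""
    return (reconstruction_loss(domain_u, u_to_v, v_to_u)
            + reconstruction_loss(domain_v, v_to_u, u_to_v))

if __name__ == "__main__":
    images_u = [[0.0, 0.5, 1.0]]
    images_v = [[1.0, 2.0, 3.0]]

    def to_v(img):
        return [2.0 * p + 1.0 for p in img]

    def to_u(img):  # exact inverse of to_v
        return [(p - 1.0) / 2.0 for p in img]

    def to_u_bad(img):  # forgets the offset, so the loop does not close
        return [p / 2.0 for p in img]

    print(closed_loop_loss(images_u, images_v, to_v, to_u))
    print(closed_loop_loss(images_u, images_v, to_v, to_u_bad))
```

A pair of translators that invert each other drives this loss to zero, so minimizing it pushes the two networks toward mutually consistent mappings even without paired training images.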
Artificial Multi-Bee Colony Algorithm
We introduce the Artificial Multi-Bee Colony (AMBC) algorithm for computing k-nearest-neighbor fields. Independent bee colonies communicate locally and find better matches than PatchMatch.
Papers: GECCO 2016
Genetic Algorithm for Minimum-Weight Triangulation
We present a genetic algorithm for minimum-weight triangulation using adaptive crossover and mutation operators. The method consistently produces better triangulations than greedy algorithms.
Papers: ICEC 1997