Xuejian Rong

I am a researcher on the Computational Photography team at Meta Reality Labs, working at the intersection of Computer Vision, Computational Photography, and Machine Learning. My research interests cover 3D vision, neural rendering, low-level vision, and visual-linguistic understanding.

Email  /  Google Scholar  /  LinkedIn



Boosting View Synthesis with Residual Transfer
Xuejian Rong, Jia-Bin Huang, Ayush Saraf, Changil Kim, Johannes Kopf
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022

We present a simple but effective technique to boost the rendering quality, which can be easily integrated with most volumetric view synthesis methods. The core idea is to transfer color residuals (the difference between the input images and their reconstruction) from training views to novel views.


Robust Consistent Video Depth Estimation
Johannes Kopf, Xuejian Rong, Jia-Bin Huang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021   (Oral)

We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.


Burst Denoising via Temporally Shifted Wavelet Transforms
Xuejian Rong, Denis Demandolx, Kevin Matzen, Priyam Chatterjee, Yingli Tian
European Conference on Computer Vision (ECCV), 2020

Proposes an end-to-end trainable burst denoising pipeline that jointly captures high-resolution and high-frequency deep features derived from wavelet transforms.


Unambiguous Text Localization, Retrieval, and Recognition for Cluttered Scenes
Xuejian Rong, Chucai Yi, Yingli Tian
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Accepted.

Extends our previous CVPR paper into an end-to-end pipeline covering scene text detection, retrieval, and recognition.


Incremental Scene Synthesis
Benjamin Planche, Xuejian Rong, Ziyan Wu, Srikrishna Karanam, Harald Kosch, Yingli Tian, and Jan Ernst
Thirty-third Conference on Neural Information Processing Systems (NeurIPS), 2019

A method that incrementally generates complete and consistent 2D or 3D scenes using learned scene priors: real observations of an actual scene can be incorporated, while unobserved parts of the scene are hallucinated.

Applications include autonomous agent exploration and few-shot learning.


Towards Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point Clouds of Wild Scenes
Haiyan Wang, Xuejian Rong, Liang Yang, Yingli Tian
British Machine Vision Conference (BMVC), 2019   (Oral)

Presents a method for 3D point cloud segmentation using 2D supervision. A graph-based pyramid feature network is proposed to capture global and local point features. A perspective rendering and semantic fusion module is also introduced to provide refined 2D supervision.


Towards Accurate Instance-level Text Spotting With Guided Attention
Haiyan Wang, Xuejian Rong, Yingli Tian
IEEE International Conference on Multimedia and Expo (ICME), 2019

Presents an effective end-to-end framework for detecting multilingual scene text in arbitrary orientations by integrating a text attention model and a global enhancement block with the PixelLink method, without using pretrained weights or extra synthetic datasets.


Unambiguous Scene Text Segmentation with Referring Expression Comprehension
Xuejian Rong, Chucai Yi, Yingli Tian
IEEE Transactions on Image Processing (TIP), Accepted.

Combines instance-level scene text segmentation with visual phrase grounding.


Unambiguous Text Localization and Retrieval for Cluttered Scenes
Xuejian Rong, Chucai Yi, Yingli Tian
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017   (Spotlight)

To use text instances for understanding natural scenes, we propose a framework that combines image-based text localization with language-based context descriptions of text instances.

Specifically, we explore the task of unambiguous text localization and retrieval: accurately localizing a specific target text instance in a cluttered image, given a natural language description that refers to it.


Evaluation of Low-Level Features for Real-World Surveillance Event Detection
Yang Xian, Xuejian Rong, Xiaodong Yang, Yingli Tian
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2017

We evaluate several of the most commonly used low-level features for real-world surveillance event detection tasks.


Assistive Indoor Navigation for the Visually Impaired in Multi-Floor Environments
J. Pablo Munoz, Bing Li, Xuejian Rong, Jizhong Xiao, Yingli Tian, and Aris Arditi
IEEE International Conference on Cyber Technology (CYBER), 2017   (Best Paper Award)

Our system allows blind users to explore multi-floor environments with a wearable Tango device.


Adaptive Shrinkage Cascades for Blind Image Deconvolution
Xuejian Rong and Yingli Tian
IEEE International Conference on Digital Signal Processing (DSP), 2016   (Oral)

A framework is proposed for blind image deconvolution using patch-wise priors and adaptive shrinkage cascades.


Region Trajectories for Video Semantic Concept Detection
Yuancheng Ye, Xuejian Rong, and Yingli Tian
ACM International Conference on Multimedia Retrieval (ICMR), 2016

We introduce an algorithm based on region trajectories to establish the connections between object localization in individual frames and video sequences.


ISANA: Wearable Context-Aware Indoor Assistive Navigation with Obstacle Avoidance for the Blind
Bing Li, J. Pablo Munoz, Xuejian Rong, Jizhong Xiao, Yingli Tian, and Aris Arditi
ECCV Workshop on Assistive Computer Vision and Robotics (ACVR), 2016

We present a novel mobile, wearable, context-aware indoor mapping and navigation system with obstacle avoidance for blind users.


Assisting Blind People to Avoid Obstacles: An Wearable Obstacle Stereo Feedback System based on 3D Detection
Bing Li, Xiaochen Zhang, J. Pablo Munoz, Jizhong Xiao, Xuejian Rong, and Yingli Tian
IEEE International Conference on Robotics and Biomimetics (ROBIO), 2015

A wearable Obstacle Stereo Feedback (OSF) system based on 3D obstacle detection is presented to assist blind people with navigation.


Scene Text Recognition in Multiple Frames based on Text Tracking
Xuejian Rong, Chucai Yi, Xiaodong Yang, and Yingli Tian
IEEE International Conference on Multimedia and Expo (ICME), 2014

We propose a multi-frame scene text recognition method that tracks text regions in video captured by a moving camera.

Template from Jon Barron