Xuejian Rong

I am a researcher on the Computational Photography team at Meta Reality Labs, working at the intersection of Computer Vision, Computational Photography, and Machine Learning. My research interests cover 3D vision, neural rendering, low-level vision, and visual-linguistic understanding.

Email / Google Scholar / LinkedIn

Research

	Boosting View Synthesis with Residual Transfer Xuejian Rong, Jia-Bin Huang, Ayush Saraf, Changil Kim, Johannes Kopf IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 We present a simple but effective technique for boosting rendering quality that can be easily integrated with most volumetric view synthesis methods. The core idea is to transfer color residuals (the difference between the input images and their reconstruction) from training views to novel views.
	Robust Consistent Video Depth Estimation Johannes Kopf, Xuejian Rong, Jia-Bin Huang IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021 (Oral) We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
	Burst Denoising via Temporally Shifted Wavelet Transforms Xuejian Rong, Denis Demandolx, Kevin Matzen, Priyam Chatterjee, Yingli Tian European Conference on Computer Vision (ECCV), 2020 Proposed an end-to-end trainable burst denoising pipeline which jointly captures high-resolution and high-frequency deep features derived from wavelet transforms.
	Unambiguous Text Localization, Retrieval, and Recognition for Cluttered Scenes Xuejian Rong, Chucai Yi, Yingli Tian IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Accepted. Extends our previous CVPR paper into an end-to-end pipeline, from scene text detection and retrieval to recognition.
	Incremental Scene Synthesis Benjamin Planche, Xuejian Rong, Ziyan Wu, Srikrishna Karanam, Harald Kosch, Yingli Tian, and Jan Ernst Thirty-third Conference on Neural Information Processing Systems (NeurIPS), 2019 Incrementally generates complete and consistent 2D or 3D scenes with learned scene priors: real observations of an actual scene can be incorporated, and unobserved parts of the scene can be hallucinated. Applications include autonomous agent exploration and few-shot learning.
	Towards Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point Clouds of Wild Scenes Haiyan Wang, Xuejian Rong, Liang Yang, Yingli Tian British Machine Vision Conference (BMVC), 2019 (Oral) Presents a method for 3D point cloud segmentation using 2D supervision. A graph-based pyramid feature network is proposed to capture global and local features of points. A perspective rendering and semantic fusion module is also introduced to offer refined 2D supervision.
	Towards Accurate Instance-level Text Spotting With Guided Attention Haiyan Wang, Xuejian Rong, Yingli Tian IEEE International Conference on Multimedia and Expo (ICME), 2019 Presents an effective end-to-end framework for detecting multi-lingual scene text in arbitrary orientations. It integrates a text attention model and a global enhancement block with the pixel-link method, without adopting pretrained weights or extra synthetic datasets.
	Unambiguous Scene Text Segmentation with Referring Expression Comprehension Xuejian Rong, Chucai Yi, Yingli Tian IEEE Transactions on Image Processing (TIP), Accepted. Combines instance-level scene text segmentation with visual phrase grounding.
	Unambiguous Text Localization and Retrieval for Cluttered Scenes Xuejian Rong, Chucai Yi, Yingli Tian IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017 (Spotlight) To utilize text instances for understanding natural scenes, we have proposed a framework that combines image-based text localization with language-based context description for text instances. Specifically, we explore the task of unambiguous text localization and retrieval, to accurately localize a specific targeted text instance in a cluttered image given a natural language description that refers to it.
	Evaluation of Low-Level Features for Real-World Surveillance Event Detection Yang Xian, Xuejian Rong, Xiaodong Yang, Yingli Tian IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2017 We evaluate several of the most commonly used low-level features for real-world surveillance event detection tasks.
	Assistive Indoor Navigation for the Visually Impaired in Multi-Floor Environments J. Pablo Munoz, Bing Li, Xuejian Rong, Jizhong Xiao, Yingli Tian, and Aris Arditi IEEE International Conference on Cyber Technology (CYBER), 2017 (Best Paper Award) Our system allows blind users to explore multi-floor environments with a wearable Tango device.
	Adaptive Shrinkage Cascades for Blind Image Deconvolution. Xuejian Rong and Yingli Tian IEEE International Conference on Digital Signal Processing (DSP), 2016 (Oral) We propose a framework for blind image deconvolution that uses a patch-wise prior and adaptive shrinkage cascades.
	Region Trajectories for Video Semantic Concept Detection Yuancheng Ye, Xuejian Rong, and Yingli Tian ACM International Conference on Multimedia Retrieval (ICMR), 2016 We introduce an algorithm based on region trajectories to establish the connections between object localization in individual frames and video sequences.
	ISANA: Wearable Context-Aware Indoor Assistive Navigation with Obstacle Avoidance for the Blind Bing Li, J. Pablo Munoz, Xuejian Rong, Jizhong Xiao, Yingli Tian, and Aris Arditi ECCV Workshop on Assistive Computer Vision and Robotics (ACVR) 2016 We present a novel mobile, wearable, context-aware indoor mapping and navigation system with obstacle avoidance for the blind.
	Assisting Blind People to Avoid Obstacles: An Wearable Obstacle Stereo Feedback System based on 3D Detection Bing Li, Xiaochen Zhang, J. Pablo Munoz, Jizhong Xiao, Xuejian Rong, and Yingli Tian IEEE International Conference on Robotics and Biomimetics (ROBIO) 2015 We present a wearable Obstacle Stereo Feedback (OSF) system, based on 3D obstacle detection, that assists blind people with navigation.
	Scene Text Recognition in Multiple Frames based on Text Tracking Xuejian Rong, Chucai Yi, Xiaodong Yang, and Yingli Tian IEEE International Conference on Multimedia and Expo (ICME) 2014 We proposed a multi-frame based scene text recognition method by tracking text regions in a video captured by a moving camera.