Feng Gao

Research Scientist, ByteDance. Ph.D. from UCLA.


Feng Gao received his Ph.D. from UCLA in 2022, co-advised by Ying Nian Wu and Mark Handcock. From 2017 to 2021, he was advised by Song-Chun Zhu.

He is currently a Research Scientist at ByteDance, where he works on LLM post-training, building multimodal LLM agents.

Before that, he was a researcher at Amazon, where he:

  • 🐢 Built Rufus [News1], [News2], Amazon’s LLM-powered Shopping Assistant.
  • 🚀 Launched multimodal Rufus (Rufus-MM).

Feel free to contact me: fenggao [dot] pub [at] gmail [dot] com.

news

Jan 20, 2026 Model and technical report released. Multimodal LLMs for video understanding: ViDi2.5 🔍; video editing/creation: ViDi-Edit 🎬. [demo] [paper]
Feb 26, 2025 One paper about Relightable 3D Generation 🪑 [paper] [demo] is accepted to CVPR2025.
Feb 25, 2025 One paper about efficient video-LLM 🎞️ [paper] is accepted to CVPR2025.
Oct 14, 2024 Our work on Embodied AI, Planning as In-Painting 🤖, is accepted to NeurIPS2024 OWA. 🚀 [paper]
Sep 25, 2024 Two papers accepted to NeurIPS2024. They are about physically constrained Text-to-3D 🤸🏻‍♀️ [paper] [demo] and a flow matching generative model [paper].

selected publications

  1. Tech Report
    Vidi2.5: Large Multimodal Models for Video Understanding and Creation
    Vidi Team
    ByteDance Technical Report, 2026
  2. CVPR
    M-LLM Based Video Frame Selection for Efficient Video Understanding
    Kai Hu, Feng Gao, Xiaohan Nie, Peng Zhou, Son Tran, Tal Neiman, Lingyun Wang, Mubarak Shah, and 3 more authors
    CVPR, 2025
  3. NeurIPS
    Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication
    Yunuo Chen, Tianyi Xie, Zeshun Zong, Xuan Li, Feng Gao, Yin Yang, Ying Nian Wu, and Chenfanfu Jiang
    NeurIPS, 2024
  4. ECCV
    Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
    Yingshan Chang, Yasi Zhang, Zhiyuan Fang, Yingnian Wu, Yonatan Bisk, and Feng Gao
    ECCV, 2024
  5. NeurIPS
    Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty
    Cheng-Fu Yang, Haoyang Xu, Te-Lin Wu, Xiaofeng Gao, Kai-Wei Chang, and Feng Gao
    NeurIPS OWA, 2024
  6. NeurIPS
    Learning non-Markovian Decision-Making from State-only Sequences
    Aoyang Qin, Feng Gao, Qing Li, Song-Chun Zhu, and Sirui Xie
    NeurIPS, 2023
  7. CVPR
    GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
    Da Yin, Feng Gao, Govind Thattai, Michael Johnston, and Kai-Wei Chang
    CVPR, 2023
  8. arXiv
    TPA-Net: Generate A Dataset for Text to Physics-based Animation
    Yuxing Qiu, Feng Gao, Minchen Li, Govind Thattai, Yin Yang, and Chenfanfu Jiang
    arXiv preprint arXiv:2211.13887, 2022
  9. CVPR
    Transform-Retrieve-Generate: Natural Language-centric Outside-Knowledge Visual Question Answering
    Feng Gao, Qing Ping, Govind Thattai, Aishwarya Reganti, Ying Nian Wu, and Prem Natarajan
    CVPR, 2022
  10. Science Robotics
    A Tale of Two Explanations: Enhancing Human Trust by Explaining Robot Behavior
    Mark Edmonds*, Feng Gao*, Hangxin Liu*, Xu Xie*, Siyuan Qi, Brandon Rothrock, Yixin Zhu, Ying Nian Wu, and 2 more authors
    Science Robotics, 2019
    (* co-first author)
  11. NeurIPS
    Learning Perceptual Inference by Contrasting
    Chi Zhang, Baoxiong Jia, Feng Gao, Yixin Zhu, Hongjing Lu, and Song-Chun Zhu
    NeurIPS, 2019
  12. CVPR
    RAVEN: A Dataset for Relational and Analogical Visual Reasoning
    Chi Zhang*, Feng Gao*, Baoxiong Jia, Yixin Zhu, and Song-Chun Zhu
    CVPR, 2019
    (* co-first author)
  13. ICRA
    Unsupervised Learning of Hierarchical Models for Hand-object Interactions
    Xu Xie, Hangxin Liu, Mark Edmonds, Feng Gao, Siyuan Qi, Yixin Zhu, Brandon Rothrock, and Song-Chun Zhu
    ICRA, 2018
  14. IROS
    A Glove-based System for Studying Hand-object Manipulation via Joint Pose and Force Sensing
    Hangxin Liu, Xu Xie, Matt Millar, Mark Edmonds, Feng Gao, Yixin Zhu, Veronica J Santos, Brandon Rothrock, and 1 more author
    IROS, 2017
  15. IROS
    Feeling the Force: Integrating Force and Pose for Fluent Discovery through Imitation Learning to Open Medicine Bottles
    Mark Edmonds*, Feng Gao*, Xu Xie, Hangxin Liu, Siyuan Qi, Yixin Zhu, Brandon Rothrock, and Song-Chun Zhu
    IROS, 2017
    (* co-first author)