publications

2024

  1. LLaVA-NeXT: Improved reasoning, OCR, and world knowledge
    Blog, Jan 2024
  2. Improved Baselines with Visual Instruction Tuning (LLaVA-1.5)
    Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee
    CVPR, 2024
  3. Making large multimodal models understand arbitrary visual prompts
    Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P Meyer, Yuning Chai, Dennis Park, and Yong Jae Lee
    CVPR, 2024
  4. Edit One for All: Interactive Batch Image Editing
    Thao Nguyen, Utkarsh Ojha, Yuheng Li, Haotian Liu, and Yong Jae Lee
    CVPR, 2024
  5. Computer Vision on the Edge: Individual Cattle Identification in Real-Time With ReadMyCow System
    Moniek Smink, Haotian Liu, Dörte Döpfer, and Yong Jae Lee
    WACV, 2024

2023

  1. LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
    Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, and Chunyuan Li
    arXiv, 2023
  2. Aligning Large Multimodal Models with Factually Augmented RLHF
    Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, and Trevor Darrell
    arXiv, 2023
  3. An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
    Yadong Lu, Chunyuan Li, Haotian Liu, Jianwei Yang, Jianfeng Gao, and Yelong Shen
    NeurIPS, Workshop on Instruction Tuning and Instruction Following, 2023
  4. LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
    Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, and Jianfeng Gao
    NeurIPS, Datasets and Benchmarks Track, 2023 (Spotlight)
  5. Visual Instruction Tuning (LLaVA)
    Haotian Liu*, Chunyuan Li*, Qingyang Wu, and Yong Jae Lee
    NeurIPS, 2023 (Oral, top 0.5%)
  6. Benchmarking and Analyzing Generative Data for Visual Recognition
    Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, and Ziwei Liu
    arXiv, 2023
  7. Generate Anything Anywhere in Any Scene
    Yuheng Li, Haotian Liu, Yangming Wen, and Yong Jae Lee
    arXiv, 2023
  8. Learning Customized Visual Models with Retrieval-Augmented Knowledge
    Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee*, and Chunyuan Li*
    CVPR, 2023 (Highlight, top 2.5%)
  9. GLIGEN: Open-Set Grounded Text-to-Image Generation
    Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, and Yong Jae Lee
    CVPR, 2023

2022

  1. ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
    Chunyuan Li*, Haotian Liu*, Liunian Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, and Jianfeng Gao
    NeurIPS, Datasets and Benchmarks Track, 2022
  2. Masked Discrimination for Self-Supervised Learning on Point Clouds
    Haotian Liu, Mu Cai, and Yong Jae Lee
    ECCV, 2022
  3. End-to-End Instance Edge Detection
    Xueyan Zou, Haotian Liu*, and Yong Jae Lee
    arXiv, 2022

2021

  1. YolactEdge: Real-time Instance Segmentation on the Edge
    Haotian Liu*, Rafael A Rivera Soto*, Fanyi Xiao, and Yong Jae Lee
    ICRA, 2021

2019

  1. Identity from here, Pose from there: Self-supervised Disentanglement and Generation of Objects using Unlabeled Videos
    Fanyi Xiao, Haotian Liu, and Yong Jae Lee
    ICCV, 2019

2018

  1. Operation strategy of public building: Implications from trade-off between carbon emission and occupant satisfaction
    Yimeng Chen, Haotian Liu, and Lei Shi
    Journal of Cleaner Production, 2018