About Me

I am a first-year PhD student in Computer Vision at Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), under the supervision of Prof. Ian Reid. Previously, I was a Research Assistant at the Singapore University of Technology and Design (SUTD), advised by Prof. Ngai-Man Cheung.

I was also an AI Research Resident at FPT Software AI Center under the supervision of Prof. Thieu N. Vo, Prof. Baoru Huang, and Prof. Anh Nguyen.

I have several years of industry experience in AI, computer vision, and robotics at well-known technology companies in Vietnam, including VinRobotics, MoMo (a tech unicorn), VinBrain (acquired by NVIDIA), and FPT Software.

Before that, I received my B.Sc. in Computer Science from the University of Information Technology, VNU-HCM, where I graduated from the talented honors program and was supervised by Prof. Le Dinh Duy.

Research

My research interests lie at the intersection of robotics, computer vision, and multimodal learning, with a particular focus on embodied AI, vision-language-action models, multimodal reasoning, world models, robotic manipulation, and open-vocabulary perception.

More broadly, I hope to build intelligent agents that can perceive, reason, and act robustly in the physical world.

I am also actively seeking research internship opportunities in embodied AI, embodied spatial reasoning, spatial intelligence, vision-language-action and vision-language models, multimodal reasoning, world models, and robot planning and perception, where I hope to contribute my experience in both research and production-scale AI systems. Please contact me if you know of a suitable opportunity.

Publications

Full list available on Google Scholar.

CVPRW 2026
Counting to Four is still a Chore for VLMs
Duy Le Dinh Anh, Patrick Amadeus Irawan, Tuan Van Vo.
CVPR 2026 Workshop on Multimodal Foundation Models.
arXiv
World2Act: Latent Action Post-Training via Skill-Compositional World Models
An Dinh Vuong*, Tuan Van Vo*, Abdullah Sohail, Haoran Ding, Liang Ma, Xiaodan Liang, Anqing Duan, Ivan Laptev, Ian Reid.
arXiv preprint.
(*) Co-first authors, equal contribution.
arXiv
CathAction: A Benchmark for Endovascular Intervention Understanding
Baoru Huang*, Tuan Van Vo*, Chayun Kongtongvattana, Anh Nguyen.
arXiv preprint.
(*) Co-first authors, equal contribution.
arXiv
ReFineVLA: Multimodal Reasoning-Aware Generalist Robotic Policies via Teacher-Guided Fine-Tuning
Tuan Van Vo, Quang Tan Nguyen, Khang Nguyen, Tran Xuan Nhat, Duy Minh Ho Nguyen, An Thai Le, Vien Anh Ngo, Minh Nhat Vu.
arXiv preprint.
NeurIPS 2024
Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights
Sy-Tuyen Ho*, Tuan Van Vo*, Somayeh Ebrahimkhani*, Ngai-Man Cheung.
NeurIPS 2024.
(*) Co-first authors, equal contribution.
ICRA 2024
Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation
Tuan Van Vo, Minh Nhat Vu, Baoru Huang, Toan Nguyen, Ngan Le, Thieu Vo, Anh Nguyen.
ICRA 2024, Oral Presentation.
ICRA 2024
Language-Conditioned Affordance-Pose Detection in 3D Point Clouds
Toan Nguyen, Minh Nhat Vu, Baoru Huang, Tuan Van Vo, Thuy Tuong Vy Truong, Ngan Le, Thieu Vo, Hoai Bac Le, Anh Nguyen.
ICRA 2024, Oral Presentation.
IROS 2024
Language-driven Grasp Detection with Mask-guided Attention
Tuan Van Vo, Minh Nhat Vu, Baoru Huang, An Vuong, Ngan Le, Thieu Vo, Anh Nguyen.
IROS 2024, Oral Presentation.
BMVC 2022
Dual consistency assisted multi-confident learning for the hepatic vessel segmentation using noisy labels
Nam Phuong Nguyen*, Tuan Van Vo*, Soan T. M. Duong, Chanh D. Tr. Nguyen, Trung Bui, Steven Q. H. Truong.
BMVC 2022.
(*) Co-first authors, equal contribution.

Education

Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
PhD in Computer Vision, supervised by Prof. Ian Reid
2025 - Present · Abu Dhabi, UAE
University of Information Technology, Vietnam National University Ho Chi Minh City
B.Sc. in Computer Science, Talented Honors Program
Aug. 2017 - Jun. 2021
Supervised by Prof. Le Dinh Duy. The honors program offered a special curriculum, scholarships, and additional academic privileges for outstanding undergraduate students.

Research Experience

Research Assistant
Singapore University of Technology and Design (SUTD)
Sep. 2023 - Mar. 2025 · Singapore
Worked on zero-shot neural architecture search and vision transformer architecture search for out-of-distribution generalization, resulting in a NeurIPS 2024 publication.
AI Research Residency
FPT Software AI Center
Jul. 2022 - Jun. 2024 · Vietnam
Focused on multimodal robot learning, leading to two papers at ICRA 2024 and one paper at IROS 2024.
Research Assistant
MMLab, University of Information Technology (UIT) - VNU-HCM
Oct. 2019 - May 2021 · Vietnam
Completed my undergraduate research thesis on violence detection in video surveillance using deep multiple instance learning.

Industry Experience

Principal Research Engineer, Founding Team Member
VinRobotics, Vingroup Corporation
Apr. 2025 - Aug. 2025 · Vietnam
Worked on foundation models for vision-language-action robotics.
Computer Vision Research Engineer
MoMo (M-Service JSC) (a tech unicorn)
Jun. 2022 - Aug. 2023 · Vietnam
Built production computer vision systems for eKYC, including face anti-spoofing, OCR, object detection, document understanding, fraud detection, and deepfake detection. The systems achieved 99% accuracy at low latency while serving nearly 40 million users at about 45 requests per second (RPS), and they are trusted and used by the Apple Online Store in Vietnam.
Applied Scientist
VinBrain (acquired by NVIDIA)
May 2021 - May 2022 · Vietnam
Conducted research and product development for medical imaging, smart city, and smart home applications, and contributed to the BMVC 2022 publication on noisy-label liver vessel segmentation.
AI Engineer, Technical Lead for Healthcare Projects
FPT Software
May 2020 - May 2021 · Vietnam
Designed AI solutions for the Japanese healthcare market, including semantic segmentation for dental X-rays and implant classification systems.

Academic Service

Reviewer Service
Reviewer for TMM 2024, ICRA 2025, IROS 2025, CoRL 2025, and CoRL 2026.

Honors and Activities

  • Full scholarship and stipend for PhD study at MBZUAI.
  • Rosen Scholarship, 2020 and 2021.
  • Study Scholarships, 2018 and 2019.
  • Participated in the ICPC 2019 Vietnam National Programming Contest.
  • Presented two oral papers at ICRA 2024 in Yokohama, Japan.
