ACM Multimedia 2026  ·  Workshop

The 1st Workshop on
Agentic Multimodal
Intelligence

Models, Benchmarks, and Applications

TBD 2026 · Rio de Janeiro, Brazil · Hybrid Format
Latest News
Mar 2026
The AMI Workshop has been officially accepted at ACM Multimedia 2026 in Rio de Janeiro!
Coming Soon
Paper submission portal will open shortly. Stay tuned for the official Call for Papers.
Coming Soon
Keynote speaker lineup confirmed — 4 world-leading researchers in multimodal AI and agentic systems.
Coming Soon
Important dates and paper submission deadlines will be announced.
Paper Submission 16 July 2026
Notification 6 Aug 2026
Camera Ready 20 Aug 2026
Workshop Date TBD

About the Workshop

Recent advances in multimodal foundation models—including large vision-language models and multimodal generative models—have significantly reshaped artificial intelligence by enabling unified perception, understanding, and generation across text, images, videos, audio, and other sensory modalities. Beyond perception and generation, a growing body of research indicates that the next frontier of intelligence lies in agentic AI: systems that can autonomously reason, plan, act, and adapt over time in interactive environments. This shift marks a fundamental transition from passive multimodal models to Agentic Multimodal Intelligence (AMI).

This workshop brings together researchers and practitioners across multimedia, vision, natural language processing, robotics, and human-computer interaction to advance agentic intelligence—fostering interdisciplinary discussion on models, data, benchmarks, and evaluation for next-generation intelligent agents. We particularly encourage work that bridges multimodal learning, reinforcement learning, embodied AI, and human-in-the-loop systems toward building scalable, robust, and trustworthy agentic systems.

Topics of Interest

Agentic Multimodal Intelligence & Autonomous Decision-Making
Vision-Language-Action Models
Long-horizon Planning & World Models
Multimodal Reasoning, Planning, and Control
Learning from Interaction & Human Feedback
Data, Benchmarks, and Evaluation Protocols
Embodied AI & Simulated Environments
Trustworthy & Responsible Multimodal Agents
Continual & Lifelong Learning for Agents

Organizers

Qi Chen

Adelaide University, Australia

qi.chen04@adelaide.edu.au

Postdoctoral research fellow at AIML, Adelaide University, working with Prof. Anton van den Hengel and A/Prof. Qi Wu. Research focuses on multimodal generative AI and large multimodal models. 30+ publications in IEEE TPAMI, CVPR, NeurIPS, and ICCV.

Junyan Wang

Adelaide University, Australia

junyan.wang@adelaide.edu.au

Postdoctoral research fellow at AIML, Adelaide University. Ph.D. from UNSW (2024). Research spans computer vision, video understanding, and generative AI, with publications in CVPR, ICLR, and NeurIPS.

Yuankai Qi

Macquarie University, Australia

qykshr@gmail.com

Lecturer at Macquarie University and Adjunct Lecturer at AIML, Adelaide University. Ph.D. from Harbin Institute of Technology (2018). Research spans vision-language navigation, video captioning, image generation, and music generation.

Jiajun Deng

USTC, China

dengjj@ustc.edu.cn

Professor at USTC. Research interests in multimodal understanding, spatial intelligence, and embodied AI. 50+ papers in IEEE TPAMI, NeurIPS, CVPR, ICCV, ECCV. Area Chair at ACM Multimedia 2024 & 2025.

Hao Li

Fudan University, China

lihao_lh@fudan.edu.cn

Professor at Fudan University. Ph.D. from the University of Chinese Academy of Sciences. 100+ peer-reviewed articles and 20+ patents. Research interests include foundation models and video generation.

Mingkui Tan

SCUT, China

mingkuitan@scut.edu.cn

Professor at South China University of Technology. Ph.D. from Nanyang Technological University (2014). Research in machine learning, sparse analysis, deep learning, and large-scale optimization.

Houqiang Li

USTC, China

lihq@ustc.edu.cn

Professor at USTC. IEEE Fellow (2021). 200+ papers in top-tier journals and conferences. Research interests include image/video coding, computer vision, and reinforcement learning. Associate Editor of IEEE TMM.

Xiaojun Chang

USTC, China

xjchang@ustc.edu.cn

Professor at USTC and Visiting Professor at MBZUAI. ARC DECRA Fellow (2019–2021). Previously at CMU, Monash University, and RMIT. Research on machine learning for multimedia analysis and computer vision.

Qi Wu

Adelaide University, Australia

qi.wu01@adelaide.edu.au

Associate Professor at Adelaide University. Ph.D. from the University of Bath (2015). Research interests in image captioning, visual question answering, and vision-to-language. Achieved state-of-the-art results in the MS COCO Captioning Challenge.

Anton van den Hengel

Adelaide University, Australia

anton.vandenhengel@adelaide.edu.au

Chief Scientist at AIML and Director of the Centre for Augmented Reasoning, Adelaide University. Fellow of the Australian Academy of Technology and Engineering. Former Director of Applied Science at Amazon.

Call for Papers

We invite submissions of original, unpublished research on topics related to agentic multimodal intelligence. All accepted papers will be published in the ACM Digital Library as part of the ACM MM 2026 workshop proceedings.

Paper Format

  • Use the official ACM MM 2026 two-column template
  • Minimum 4 pages, maximum 8 pages
  • Up to 2 additional pages for references only
  • PDF format, submitted in English

Review Process

  • Double-blind peer review — submissions must be anonymized
  • Remove author names, affiliations, and acknowledgements
  • Avoid self-identifying citations where possible
  • Each paper reviewed by at least 3 program committee members

Submission Policy

  • Work must be original and not under review elsewhere
  • Previously published work is not eligible
  • At least one author must register and present at the workshop
  • Accepted papers will appear in the ACM Digital Library

Submit Your Paper

Submissions are managed through OpenReview. Use the link below to submit your paper.

Submit via OpenReview

Submission deadline: 16 July 2026

Keynote Speakers

Speaker lineup to be announced. Stay tuned for updates.


Workshop Schedule

The workshop will be held in a hybrid format, accommodating both on-site and online participation.

Topic | Duration | Speaker | Affiliation

Morning Session
Opening of the Workshop | 5 min | Workshop Chairs |
Keynote Talk 1 | 30 min | TBD | TBD
Keynote Talk 2 | 30 min | TBD | TBD
Coffee Break | 10 min | |
Round Table Discussion | 30 min | Workshop Host |
Keynote Talk 3 | 30 min | TBD | TBD
Keynote Talk 4 | 30 min | TBD | TBD

Afternoon Session
Paper Presentations (Accepted Papers) | 20 min each | TBD |