Models, Benchmarks, and Applications
Recent advances in multimodal foundation models—including large vision-language models and multimodal generative models—have significantly reshaped artificial intelligence by enabling unified perception, understanding, and generation across text, images, videos, audio, and other sensory modalities. Beyond perception and generation, a growing body of research indicates that the next frontier of intelligence lies in agentic AI: systems that can autonomously reason, plan, act, and adapt over time in interactive environments. This shift marks a fundamental transition from passive multimodal models to Agentic Multimodal Intelligence (AMI).
This workshop brings together researchers and practitioners across multimedia, vision, natural language processing, robotics, and human-computer interaction to advance agentic intelligence—fostering interdisciplinary discussion on models, data, benchmarks, and evaluation for next-generation intelligent agents. We particularly encourage work that bridges multimodal learning, reinforcement learning, embodied AI, and human-in-the-loop systems toward building scalable, robust, and trustworthy agentic systems.

Adelaide University, Australia
qi.chen04@adelaide.edu.au
Postdoctoral research fellow at AIML, Adelaide University, working with Prof. Anton van den Hengel and A/Prof. Qi Wu. Research focuses on multimodal generative AI and large multimodal models. 30+ publications in IEEE TPAMI, CVPR, NeurIPS, and ICCV.

Adelaide University, Australia
junyan.wang@adelaide.edu.au
Postdoctoral research fellow at AIML, Adelaide University. Ph.D. from UNSW (2024). Research spans computer vision, video understanding, and generative AI. Publications include CVPR, ICLR, and NeurIPS.

Macquarie University, Australia
qykshr@gmail.com
Lecturer at Macquarie University and Adjunct Lecturer at AIML, Adelaide University. Ph.D. from Harbin Institute of Technology (2018). Research spans vision-language navigation, video captioning, image generation, and music generation.

USTC, China
dengjj@ustc.edu.cn
Professor at USTC. Research interests in multimodal understanding, spatial intelligence, and embodied AI. 50+ papers in IEEE TPAMI, NeurIPS, CVPR, ICCV, and ECCV. Area Chair at ACM Multimedia 2024 & 2025.

Fudan University, China
lihao_lh@fudan.edu.cn
Professor at Fudan University. Ph.D. from the University of Chinese Academy of Sciences. 100+ peer-reviewed articles and 20+ patents. Research interests include foundation models and video generation.

SCUT, China
mingkuitan@scut.edu.cn
Professor at South China University of Technology. Ph.D. from Nanyang Technological University (2014). Research in machine learning, sparse analysis, deep learning, and large-scale optimization.

USTC, China
lihq@ustc.edu.cn
Professor at USTC. IEEE Fellow (2021). 200+ papers in top-tier journals and conferences. Research interests include image/video coding, computer vision, and reinforcement learning. Associate Editor of IEEE TMM.

USTC, China
xjchang@ustc.edu.cn
Professor at USTC and Visiting Professor at MBZUAI. ARC DECRA Fellow (2019–2021). Previously at CMU, Monash University, and RMIT. Research on machine learning for multimedia analysis and computer vision.

Adelaide University, Australia
qi.wu01@adelaide.edu.au
Associate Professor at Adelaide University. Ph.D. from the University of Bath (2015). Research interests in image captioning, visual question answering, and vision-to-language. Achieved state-of-the-art results in the MS COCO Captioning Challenge.

Adelaide University, Australia
anton.vandenhengel@adelaide.edu.au
Chief Scientist at AIML and Director of the Centre for Augmented Reasoning, Adelaide University. Fellow of the Australian Academy of Technology and Engineering. Former Director of Applied Science at Amazon.
We invite submissions of original, unpublished research on topics related to agentic multimodal intelligence. All accepted papers will be published in the ACM Digital Library as part of the ACM MM 2026 workshop proceedings.
Submissions are managed through OpenReview. Click the button below to submit your paper.
Submit via OpenReview
Submission deadline: 16 July 2026
Speaker lineup to be announced. Stay tuned for updates.
Affiliation TBD
Talk title TBD
Affiliation TBD
Talk title TBD
Affiliation TBD
Talk title TBD
Affiliation TBD
Talk title TBD
The workshop will be held in a hybrid format, accommodating both on-site and online participation.
| Topic | Duration | Speaker | Affiliation |
|---|---|---|---|
| **Morning Session** | | | |
| Opening of the Workshop | 5 min | Workshop Chairs | — |
| Keynote Talk 1 | 30 min | TBD | TBD |
| Keynote Talk 2 | 30 min | TBD | TBD |
| Coffee Break | 10 min | — | — |
| Round Table Discussion | 30 min | Workshop Host | — |
| Keynote Talk 3 | 30 min | TBD | TBD |
| Keynote Talk 4 | 30 min | TBD | TBD |
| **Afternoon Session** | | | |
| Paper Presentations (Accepted Papers) | 20 min each | TBD | — |