Multimodal AI Summit 2025

Vision, speech & code models

May 15, 2025, 5:00 AM – 1:30 PM • Boston Convention & Exhibition Center, 415 Summer St, Boston, MA 02210, US • USD 299.00 – 899.00 • Capacity 2,200

Event Details

Summary: Large multimodal models in production.

Topics: cross-modal retrieval, latency budgets, vector databases, and evaluation.

When: May 15, 2025, 5:00 AM – 1:30 PM (America/New_York)

Where: Boston Convention & Exhibition Center, 415 Summer St, Boston, MA 02210, US

Accessibility: Assistive listening devices

Organizer: REGKT Events • events@regkt.org • +1-617-000-0000

Participants

Dr. Hannah Lee 🇺🇸

VisionAudio Lab

Director, Multimodal Research

Session Chair: General Chair • Track: Program • Room: Main Hall • Featured

Leads teams building vision-language-audio foundation models.

Prof. Daniel Okoye 🇳🇬

University of Lagos

Professor of Machine Perception

Keynote: Opening Keynote • Track: Keynotes • Room: Main Hall • Nov 1, 2025, 5:00 AM – 5:45 AM • Featured

Keynote: Grounded Multimodality for Embodied Agents

Linking perception, language, and action with guarantees.

Researches grounded VLMs and embodied multimodal learning.

Dr. Sofia Alvarez 🇺🇸

Aurora Systems

Principal Scientist, Generative AI

Keynote: Day 2 Keynote • Track: Keynotes • Room: Main Hall • Nov 2, 2025, 4:00 AM – 4:45 AM • Featured

Keynote: Video Generation Meets World Models

Latent video diffusion with causal priors.

Builds high-fidelity video and audio generation models.

Ritika Sharma 🇮🇳

VisionWorks

Research Manager

Session Chair: Track A Chair • Track: VLMs • Room: Room A • Featured

Vision-language alignment and retrieval-augmented VLMs.

Jonah Miller 🇨🇦

AudioGraph

Lead Scientist, Speech

Session Chair: Track B Chair • Track: Audio • Room: Room B • Featured

Self-supervised audio representation learning.

Jing Wu 🇨🇳

AlignLab

Workshop Instructor

Workshop Instructor: Hands-on Lab • Track: Workshops • Room: Lab A • Nov 2, 2025, 8:30 AM – 10:00 AM • Featured

Workshop: RLHF for Multimodal Models

Dataset prep, reward models, and safety filters.

Alignment strategies for VLMs and audio-LMs.

Elena Petrova 🇷🇺

SenseSpan

Senior Researcher

Speaker • Track: VLMs • Room: Room A • Nov 1, 2025, 7:00 AM – 7:30 AM

Talk: Temporal Grounding for VLMs

Segment-level grounding with contrastive objectives.

Temporal grounding for video-language models.

Mohamed Idris 🇪🇬

ClipHub

Applied Scientist

Speaker • Track: VLMs • Room: Room A • Nov 1, 2025, 7:35 AM – 8:05 AM

Talk: Scaling Video-Text Retrieval

Training curricula and hard negative mining.

Video-text retrieval with large encoders.

Giulia Romano 🇮🇹

AudioFlow

Research Engineer

Speaker • Track: Audio • Room: Room B • Nov 1, 2025, 10:00 AM – 10:30 AM

Talk: Prosody-Aware Speech Translation

Style tokens and controllable TTS.

Speech-to-speech translation and prosody control.

Takumi Sato 🇯🇵

VidSynth

Scientist

Speaker • Track: Generation • Room: Room G • Nov 1, 2025, 11:00 AM – 11:30 AM

Talk: Long-Context Video Diffusion

Temporal attention and motion coherence.

Long-context video diffusion and conditioning.

Rhea Kapoor 🇮🇳

MedSight AI

Senior ML Engineer

Speaker • Track: Healthcare • Room: Room H • Nov 2, 2025, 5:30 AM – 6:00 AM

Talk: Radiology VLMs with Clinical Text

Fusion blocks with uncertainty estimation.

Cross-modal fusion for clinical imaging and reports.

Carlos Mendes 🇧🇷

DataSense

Panelist

Panelist • Track: Evaluation • Room: Room E • Nov 1, 2025, 12:00 PM – 12:45 PM

Panel: Evaluating Multimodal Systems

Human preference data and multi-criteria scoring.

Model evaluation and human preference alignment.

Amira El-Sayed 🇪🇬

Independent

Moderator

Moderator: Panel Moderator • Track: Evaluation • Room: Room E • Nov 1, 2025, 12:00 PM – 12:45 PM

Panel: Evaluating Multimodal Systems

Moderation and structured Q&A.

Moderates panels on evaluation and human factors.

Lina Kovacs 🇭🇺

DataCanvas

Workshop Instructor

Workshop Instructor: Hands-on Lab • Track: Workshops • Room: Lab B • Nov 1, 2025, 9:30 AM – 11:00 AM

Workshop: Curation Pipelines for VLMs

Deduplication, NSFW filtering, and metadata QA.

Data curation and filtering for multimodal corpora.

Nora Haddad 🇱🇧

CivicAI Forum

Policy Researcher

Panelist • Track: Ethics • Room: Room X • Nov 2, 2025, 6:15 AM – 7:00 AM

Panel: Deepfakes, Attribution, and Consent

Content provenance, watermarking, and disclosure.

AI policy and media integrity in multimodal systems.

Owen McCarthy 🇮🇪

Independent

Moderator

Moderator: Panel Moderator • Track: Ethics • Room: Room X • Nov 2, 2025, 6:15 AM – 7:00 AM

Panel: Deepfakes, Attribution, and Consent

Moderated discussion and audience Q&A.

Moderates panels on AI governance and policy.

Patricia Gomez 🇺🇸

REGKT Events

Organizer

Organizer: Program Operations

Program operations and speaker success.

Keith Brown 🇺🇸

REGKT Events

Organizer

Organizer: Logistics

Venue logistics and A/V.

Dr. Greta Schultz 🇩🇪

OpenEval

Judge

Judge: Poster Awards Judge • Nov 2, 2025, 10:00 AM – 11:00 AM

External judge for poster awards.

Aaliyah Johnson 🇺🇸

SynthWorks

Product Lead, Multimodal

Speaker • Track: Production • Room: Room P • Nov 2, 2025, 10:15 AM – 10:45 AM

Talk: Shipping Multimodal Assistants Safely

Telemetry, safety gates, and iteration loops.

Builds production pipelines for multimodal assistants.

Professional Impact & Significance

This event is a venue where leaders, researchers, and practitioners exchange advances that shape technical, scientific, and organizational practice. The talks and sessions presented here reflect recognized expertise and active contributions to the field.

Participants are encouraged to archive discussions, slide decks, and session recordings to document their contributions and outcomes.
