Multimodal AI Summit 2025

Vision, speech & code models

May 15, 2025, 5:00 AM – 1:30 PM • Boston Convention & Exhibition Center, 415 Summer St, Boston, MA 02210, US • USD 299.00 – 899.00 • Capacity 2,200

Event Details

Summary: Large multimodal models in production.

Topics: cross-modal retrieval, latency budgets, vector databases, and evaluation.

When: May 15, 2025, 5:00 AM – 1:30 PM (America/New_York)

Where: Boston Convention & Exhibition Center, 415 Summer St, Boston, MA 02210, US

Accessibility: Assistive listening devices

Organizer: REGKT Events • events@regkt.org • +1-617-000-0000

Participants

Dr. Hannah Lee 🇺🇸

VisionAudio Lab

Director, Multimodal Research

Session Chair: General Chair • Track: Program • Room: Main Hall • Featured

Leads teams building vision-language-audio foundation models.

Prof. Daniel Okoye 🇳🇬

University of Lagos

Professor of Machine Perception

Keynote: Opening Keynote • Track: Keynotes • Room: Main Hall • Nov 1, 2025, 5:00 AM – 5:45 AM • Featured

Keynote: Grounded Multimodality for Embodied Agents

Linking perception, language, and action with guarantees.

Researches grounded VLMs and embodied multimodal learning.

Dr. Sofia Alvarez 🇺🇸

Aurora Systems

Principal Scientist, Generative AI

Keynote: Day 2 Keynote • Track: Keynotes • Room: Main Hall • Nov 2, 2025, 4:00 AM – 4:45 AM • Featured

Keynote: Video Generation Meets World Models

Latent video diffusion with causal priors.

Builds high-fidelity video and audio generation models.

Ritika Sharma 🇮🇳

VisionWorks

Research Manager

Session Chair: Track A Chair • Track: VLMs • Room: Room A • Featured

Vision-language alignment and retrieval-augmented VLMs.

Jonah Miller 🇨🇦

AudioGraph

Lead Scientist, Speech

Session Chair: Track B Chair • Track: Audio • Room: Room B • Featured

Self-supervised audio representation learning.

Jing Wu 🇨🇳

AlignLab

Workshop Instructor

Workshop Instructor: Hands-on Lab • Track: Workshops • Room: Lab A • Nov 2, 2025, 8:30 AM – 10:00 AM • Featured

Workshop: RLHF for Multimodal Models

Dataset prep, reward models, and safety filters.

Alignment strategies for VLMs and audio-LMs.

Elena Petrova 🇷🇺

SenseSpan

Senior Researcher

Speaker • Track: VLMs • Room: Room A • Nov 1, 2025, 7:00 AM – 7:30 AM

Talk: Temporal Grounding for VLMs

Segment-level grounding with contrastive objectives.

Temporal grounding for video-language models.

Mohamed Idris 🇪🇬

ClipHub

Applied Scientist

Speaker • Track: VLMs • Room: Room A • Nov 1, 2025, 7:35 AM – 8:05 AM

Talk: Scaling Video-Text Retrieval

Training curricula and hard negative mining.

Video-text retrieval with large encoders.

Giulia Romano 🇮🇹

AudioFlow

Research Engineer

Speaker • Track: Audio • Room: Room B • Nov 1, 2025, 10:00 AM – 10:30 AM

Talk: Prosody-Aware Speech Translation

Style tokens and controllable TTS.

Speech-to-speech translation and prosody control.

Takumi Sato 🇯🇵

VidSynth

Scientist

Speaker • Track: Generation • Room: Room G • Nov 1, 2025, 11:00 AM – 11:30 AM

Talk: Long-Context Video Diffusion

Temporal attention and motion coherence.

Long-context video diffusion and conditioning.

Rhea Kapoor 🇮🇳

MedSight AI

Senior ML Engineer

Speaker • Track: Healthcare • Room: Room H • Nov 2, 2025, 5:30 AM – 6:00 AM

Talk: Radiology VLMs with Clinical Text

Fusion blocks with uncertainty estimation.

Cross-modal fusion for clinical imaging and reports.

Carlos Mendes 🇧🇷

DataSense

Panelist

Panelist • Track: Evaluation • Room: Room E • Nov 1, 2025, 12:00 PM – 12:45 PM

Panel: Evaluating Multimodal Systems

Human preference data and multi-criteria scoring.

Model evaluation and human preference alignment.

Amira El-Sayed 🇪🇬

Independent

Moderator

Moderator: Panel Moderator • Track: Evaluation • Room: Room E • Nov 1, 2025, 12:00 PM – 12:45 PM

Panel: Evaluating Multimodal Systems

Moderation and structured Q&A.

Moderates panels on evaluation and human factors.

Lina Kovacs 🇭🇺

DataCanvas

Workshop Instructor

Workshop Instructor: Hands-on Lab • Track: Workshops • Room: Lab B • Nov 1, 2025, 9:30 AM – 11:00 AM

Workshop: Curation Pipelines for VLMs

Deduplication, NSFW filtering, and metadata QA.

Data curation and filtering for multimodal corpora.

Nora Haddad 🇱🇧

CivicAI Forum

Policy Researcher

Panelist • Track: Ethics • Room: Room X • Nov 2, 2025, 6:15 AM – 7:00 AM

Panel: Deepfakes, Attribution, and Consent

Content provenance, watermarking, and disclosure.

AI policy and media integrity in multimodal systems.

Owen McCarthy 🇮🇪

Independent

Moderator

Moderator: Panel Moderator • Track: Ethics • Room: Room X • Nov 2, 2025, 6:15 AM – 7:00 AM

Panel: Deepfakes, Attribution, and Consent

Moderated discussion and audience Q&A.

Moderates panels on AI governance and policy.

Patricia Gomez 🇺🇸

REGKT Events

Organizer

Organizer: Program Operations

Program operations and speaker success.

Keith Brown 🇺🇸

REGKT Events

Organizer

Organizer: Logistics

Venue logistics and A/V.

Dr. Greta Schultz 🇩🇪

OpenEval

Judge

Judge: Poster Awards Judge • Nov 2, 2025, 10:00 AM – 11:00 AM

External judge for poster awards.

Aaliyah Johnson 🇺🇸

SynthWorks

Product Lead, Multimodal

Speaker • Track: Production • Room: Room P • Nov 2, 2025, 10:15 AM – 10:45 AM

Talk: Shipping Multimodal Assistants Safely

Telemetry, safety gates, and iteration loops.

Builds production pipelines for multimodal assistants.

Professional Impact & Significance

This event is a venue where leaders, researchers, and practitioners exchange advances that shape technical, scientific, and organizational practice. The talks and sessions presented here reflect recognized expertise and active contributions to the field.

Participants are encouraged to archive discussions, slide decks, and session recordings to document their contributions and outcomes.
