Vision, speech & code models
Summary: Large multimodal models in production.
Cross-modal retrieval, latency budgets, vector DBs, and eval.
When: May 15, 2025 5:00 AM — 1:30 PM (America/New_York)
Where: Boston Convention & Exhibition Center, 415 Summer St • Boston, MA 02210, US
Accessibility: Assistive listening devices
Organizer: REGKT Events • events@regkt.org • +1-617-000-0000
Linking perception, language, and action with guarantees.
Latent video diffusion with causal priors.
Dataset prep, reward models, and safety filters.
Segment-level grounding with contrastive objectives.
Training curricula and hard negative mining.
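The contrastive grounding and hard-negative-mining topics above can be sketched as a minimal InfoNCE-style loss; the batch construction, similarity-based mining, and temperature value here are illustrative assumptions for the example, not any presenter's implementation.

```python
import numpy as np

def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE-style contrastive loss for a single query embedding.

    query:     (d,) anchor segment embedding
    positive:  (d,) embedding of the matched segment
    negatives: (k, d) embeddings of mined hard negatives
    """
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q, p, n = norm(query), norm(positive), norm(negatives)
    pos_sim = q @ p / temperature            # scalar similarity to the positive
    neg_sim = n @ q / temperature            # (k,) similarities to negatives
    logits = np.concatenate([[pos_sim], neg_sim])
    # Softmax cross-entropy with the positive at index 0.
    return -logits[0] + np.log(np.exp(logits).sum())

def mine_hard_negatives(query, pool, k=5):
    """Pick the k pool embeddings most similar to the query as hard
    negatives (a real pipeline would also filter out true positives)."""
    sims = pool @ query / (np.linalg.norm(pool, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)
    return pool[order[:k]]
```

Mining negatives by similarity to the anchor is what makes them "hard": the loss then focuses on examples the model currently confuses with the positive.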
Style tokens and controllable TTS.
Temporal attention and motion coherence.
Fusion blocks with uncertainty estimation.
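One common way to combine modality heads with uncertainty estimates, as in the fusion session above, is inverse-variance weighting; this minimal sketch assumes each head emits a mean prediction and a variance, which is an illustrative setup rather than the speakers' architecture.

```python
import numpy as np

def fuse(preds, variances):
    """Inverse-variance weighted fusion of per-modality predictions.

    preds:     (m, d) mean predictions from m modality heads
    variances: (m, d) their predicted variances (uncertainty estimates)
    """
    w = 1.0 / np.asarray(variances)              # confident heads get larger weight
    fused = (w * preds).sum(axis=0) / w.sum(axis=0)
    fused_var = 1.0 / w.sum(axis=0)              # combined uncertainty shrinks
    return fused, fused_var
```

With equal variances this reduces to a plain average; a head that reports high variance is effectively down-weighted out of the fused estimate.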
Human preference data and multi-criteria scoring.
Moderation and structured Q&A.
Deduplication, NSFW filtering, and metadata QA.
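The deduplication step named above is often handled with near-duplicate hashing; a minimal MinHash-style sketch over character shingles follows, with the shingle size and number of hash functions chosen here purely for illustration.

```python
import hashlib

def shingles(text, n=3):
    """Character n-gram shingles of a whitespace-normalized string."""
    t = " ".join(text.lower().split())
    return {t[i:i + n] for i in range(max(1, len(t) - n + 1))}

def minhash(sh, num_hashes=64):
    """MinHash signature: for each seeded hash function, keep the
    minimum hash value over the shingle set."""
    sig = []
    for seed in range(num_hashes):
        m = min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(),
                "big",
            )
            for s in sh
        )
        sig.append(m)
    return sig

def est_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Signatures are cheap to compare, so near-duplicate captions or documents can be flagged by thresholding the estimated Jaccard similarity instead of comparing raw text pairwise.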
Content provenance, watermarking, and disclosure.
Moderated discussion and audience Q&A.
Telemetry, safety gates, and iteration loops.
This event is a forum where leaders, researchers, and practitioners exchange advances that shape real-world technical, scientific, and organizational practice.
Participants are encouraged to archive discussions, slide decks, and session recordings to document their contributions and the outcomes of the sessions.