This page is a growing collection of short research contributions from researchers participating in or connected to the MMI community. Each entry highlights an important open problem, challenge, or future research direction in Multimedia Intelligence, Multimodal AI, Information Retrieval, Trustworthy AI, Human-Centered AI, Intelligent Systems, or related areas.
Each entry captures what a researcher believes is worth working on next: a problem they keep encountering, a gap they wish more students would explore, or a direction that remains underinvestigated. The goal is to give students a map of the research landscape, written by researchers who are actively shaping it.
Unlike surveys or tutorials, these contributions focus on unanswered questions and emerging opportunities. They are intended to help students identify impactful research directions, understand why they matter, and discover where they can contribute.
Long-form video understanding requires maintaining semantic coherence across thousands of frames while simultaneously integrating speech transcripts, visual events, and temporal context. Current vision-language models excel at short clips but struggle to reason about relationships between events separated by minutes or hours within a single recording.
The open problem is: How can a multimodal system maintain cross-modal alignment and temporal coherence across an hour-long video without losing contextual meaning or introducing factual inconsistencies?
Video has become a dominant medium for information in education, medicine, science communication, and professional training. A system that can answer complex temporal queries over long recordings, for example by connecting what was said at minute 12 to what appeared visually at minute 47, would transform how knowledge is retrieved from video archives.
This problem sits at the intersection of multimedia retrieval, multimodal reasoning, and memory-efficient AI, making it one of the most practically important open challenges in the field.
If there's a problem you'd like to add to this collection, we'd love to include it. Send your contribution using the checklist below as a guide. One email with everything included is ideal.
A complete submission includes:
Consider yourself challenged. Once your entry is live, pass it on: introduce us to 3 researchers you think should contribute next. Name them, connect us, and help the collection grow one problem at a time.