About this Edition

The inaugural MMI workshop brought together students and international researchers for a half-day of invited talks on vision-language pretrained models, a decade of lifelog access and retrieval, and knowledge graphs for multimedia data. Three speakers from NII (Japan) and DCU (Ireland) joined NJIT for this first edition of the series.

Workshop Chair

Prof. Vincent Oria

Role: Opening Remarks · Workshop Chair

Prof. Vincent Oria is a Professor of Computer Science and Chair of the Computer Science Department at NJIT. He received a diplôme d'ingénieur from the Institut National Polytechnique Houphouët-Boigny in Yamoussoukro, Côte d'Ivoire, in 1989, and a Ph.D. in Computer Science from the École Nationale Supérieure des Télécommunications (Telecom-ParisTech), Paris, France, in 1994.

His research interests include multimedia databases, spatiotemporal databases, search in high-dimensional spaces, and recommender systems. He served as General Co-Chair of the 2022 ACM International Conference on Multimedia Retrieval (ICMR) and Program Committee Co-Chair of ACM Multimedia 2022, and is the recipient of the 2015 ACM SIGMOD Test-of-Time Award.

Google Scholar · Talk: 2014 Research Projects · CBS News

Invited Speakers

Prof. Shin'ichi Satoh

Talk: Possibilities and Limitations of Vision-Language Pretrained Models

Abstract: Vision-language pretrained models, such as CLIP and its variants, are widely used by the computer vision, natural language, and multimedia communities. They are very powerful, especially for semantic analysis of visual content. However, recent research has pointed out several drawbacks. This talk first briefly explains vision-language pretrained models and then discusses their limitations, with particular attention to fine-grained image recognition and image similarity retrieval.

Prof. Shin'ichi Satoh is a Professor at the National Institute of Informatics (NII) in Tokyo, Japan. He received his BE degree in Electronics Engineering in 1987, and his ME and PhD degrees in Information Engineering in 1989 and 1992, all from the University of Tokyo. He joined NII in 2004 as a full professor.

His research interests include image processing, video content analysis, and multimedia databases. He leads the video processing project at NII, addressing video analysis, indexing, retrieval, and mining for broadcast video archives. He was a visiting scientist at the Robotics Institute, Carnegie Mellon University, from 1995 to 1997.

Google Scholar · Talk: ACM Multimedia

Prof. Cathal Gurrin

Talk: Learnings from A Decade of Lifelog Access & Retrieval

Abstract: Over the past decade, lifelogging has evolved from a niche research topic into a vibrant interdisciplinary field at the intersection of computer vision, information retrieval, and human-computer interaction. This talk reflects on ten years of research into interactive lifelog retrieval, drawing insights from the ACM Lifelog Search Challenge (LSC) and other major initiatives that have shaped the community. Key milestones in multimodal lifelog datasets, advances in semantic indexing and search, and the emergence of novel user interfaces for interactive retrieval are discussed, along with a forward-looking perspective on the opportunities ahead.

Prof. Cathal Gurrin is a Full Professor in the School of Computing at Dublin City University (DCU) and Deputy Director of the ADAPT Centre.

His research focuses on lifelogging, personal analytics, and multimedia information retrieval, with an emphasis on using wearable sensors and AI-driven data analysis to build assistive technologies that enhance human memory, health, and productivity.

Prof. Gurrin is widely recognized as a pioneer in lifelogging. He has continuously captured a personal digital record of his daily life since 2006 using wearable devices, which likely makes him the longest continuous wearer of such a device in the world. His personal archive has grown to over 18 million images, generating roughly a terabyte of personal data per year. Using information retrieval algorithms, his team segments this archive into life events such as eating, driving, and social interactions, with new events recognized daily through machine learning. As he describes it: "If I need to remember where I left my keys, or where I parked my car, or what wine I drank at an event two years ago... the answers should all be there."

He has led and contributed to numerous international research initiatives and is the founder and co-organizer of major benchmarking efforts such as the Lifelog Search Challenge, helping advance global research in multimedia retrieval and human-centered AI.

Google Scholar · Talk: Introduction to Lifelogging · Talk: Lifelogging Research

Prof. Luca Rossetto

Talk: Knowledge Graphs and Multimedia — Things we can do, things we can't, and how to change the latter

Abstract: Knowledge Graphs are an effective mechanism for representing knowledge as a structure of interconnected facts. They work especially well for information that can be captured using a short textual label or a literal value. For information best represented in other modalities, such as visual or aural, these graphs have several limitations. Multimodal Knowledge Graphs commonly incorporate multimodal information by linking to external documents, which are opaque to a query engine and hence of only limited use in complex graph queries. This talk presents an overview of the current state of multimodal knowledge graphs and introduces the MediaGraph concept, which aims to make multimodal information a first-class citizen in knowledge graphs.

Prof. Luca Rossetto is an Assistant Professor in the School of Computing at Dublin City University. His research focuses on managing, analyzing, and retrieving multimodal data. He is one of the core developers of the open-source multimedia retrieval engine vitrivr and co-creator of the Distributed Retrieval Evaluation Server used for interactive multimedia evaluations.

More recently, his research has focused on the intersection of Knowledge Graphs and Multimedia Data, with the aim of seamlessly integrating multimodal information into graph structures.

Google Scholar

Schedule

09:00 – 10:00	Welcome & Coffee · Refreshments and informal gathering
10:00 – 11:00	Invited Talk · Prof. Shin'ichi Satoh (NII, Japan) · Possibilities and Limitations of Vision-Language Pretrained Models
11:00 – 12:00	Invited Talk · Prof. Cathal Gurrin (DCU, Ireland) · Learnings from A Decade of Lifelog Access & Retrieval
12:00 – 13:00	Invited Talk · Prof. Luca Rossetto (DCU, Ireland) · Knowledge Graphs and Multimedia
13:00 – 14:00	Lunch

Venue

NJIT Campus, GITC 4402, Newark, NJ

Organizing Committee

Vincent Oria (NJIT, USA)
Shin'ichi Satoh (NII, Japan)
Mohammad Dindoost (NJIT, USA)

MMI 2025