Maaike de Boer (TNO) (2015-06-16 13:45 - 14:30 in ZI-2042)
A common approach in content-based video information retrieval is to perform automatic shot annotation with semantic labels using pre-trained classifiers. The visual vocabulary of state-of-the-art automatic annotation systems is limited to a few thousand concepts, which creates a semantic gap between the semantic labels and the natural language query.
This talk focuses on the semantic gap in the field of complex video event retrieval. Typical events are social events (‘birthday party’) and procedural events (‘attempting a bike trick’). The TRECVID Multimedia Event Detection 2014 dataset is used for evaluation. In this research, we assume that no example videos are available for an event.
I will briefly discuss two ways to bridge the gap between the semantic labels of the pre-trained classifiers and the event: