So far, the localization and description of these patterns has been limited to micro-studies due to the extremely high manual annotation effort involved.
We therefore pursue two main objectives: 1) the creation of a standardized annotation vocabulary to be used for semantic annotation, and 2) the semi-automatic classification of audio-visual patterns by training models on manually assembled ground-truth annotation data. An annotation vocabulary for empirical film studies, together with semantic annotations of audio-visual material based on Linked Open Data principles, enables the publication, reuse, retrieval, and visualization of results obtained with film-analytical methods. Furthermore, automatic analysis of video streams speeds up the extraction of audio-visual patterns.
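To illustrate what a fine-grained, Linked-Open-Data-based video annotation could look like, the following minimal sketch uses Python and rdflib to express a single annotation as a W3C Web Annotation targeting a temporal media fragment of a video. The choice of the Web Annotation model, the namespaces, the example concept SlowZoomIn, and all URIs are illustrative assumptions, not the project's actual vocabulary.

```python
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDF, DCTERMS

# W3C Web Annotation vocabulary (assumed here as the annotation model).
OA = Namespace("http://www.w3.org/ns/oa#")
# Hypothetical namespace standing in for the film-analytical vocabulary.
AVP = Namespace("https://example.org/avpattern/")

g = Graph()
g.bind("oa", OA)
g.bind("avp", AVP)

annotation = URIRef("https://example.org/annotations/1")
# Media Fragments URI pinpointing a temporal segment of the video.
segment = URIRef("https://example.org/videos/film42.mp4#t=120.0,135.5")

g.add((annotation, RDF.type, OA.Annotation))
# The body links the segment to a concept from the annotation vocabulary,
# e.g. a hypothetical "slow zoom in" camera-movement pattern.
g.add((annotation, OA.hasBody, AVP.SlowZoomIn))
g.add((annotation, OA.hasTarget, segment))
g.add((annotation, DCTERMS.creator, Literal("annotator-01")))

print(g.serialize(format="turtle"))
```

Expressing annotations as RDF in this way is what makes them publishable, retrievable, and reusable across tools, since any Linked-Open-Data-aware application can query the resulting graph.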
This paper focuses on the semantic data management of the project and the vocabulary developed for fine-grained semantic video annotation. Furthermore, we give a short outlook on how we aim to integrate machine learning into the automatic detection of audio-visual patterns.