Multimedia Processing | Athena Research Center

The proliferation of user-generated content in the Web 2.0 era and the convergence of media delivery channels (Web, TV, mobile) shape the current multimedia landscape and pose new challenges to the research community. Today’s devices offer advanced functionality and give users virtually unlimited access to a wealth of shared content. This content combines a rich set of media: text, graphics, animation, sound, speech, image and video. Multimedia technology now allows users to manipulate content in ways that were not possible in the past. The combination of PCs, mobile devices and networks allows individuals to create, edit, transmit, share, aggregate, personalize and interact with multimedia content in increasingly flexible ways.

Multimedia content needs to be organised and analysed in a structured manner in order to be a valuable asset to enterprises, governmental services, and community-based services. Particular attention is given to representational issues, as well as to modelling and to multimedia analysis and indexing processes.

Research at ILSP focuses on the development of methods and tools for content-based analysis and organization of multimedia collections. Key areas of concern are the major components in a typical multimedia architecture:
Multimedia harvesting interface
Analysis and indexing interface (audio, video and image analysis tools)
Multimedia access interface (searching, browsing and filtering tools)
Personalization and content delivery subsystem (summarization, visualization tools)
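
The four components above can be read as stages of a pipeline. The following is a minimal illustrative sketch only, not ILSP's actual software: every name in it (MediaItem, harvest, analyse, build_index, search, summarise) is hypothetical, and the analysis stage is a placeholder that derives keywords from the item identifier where a real system would run audio, video or image analysis tools.

```python
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    """A harvested item plus whatever the analysis stage attaches to it."""
    item_id: str
    media_type: str  # e.g. "audio", "video", "image"
    annotations: dict = field(default_factory=dict)


def harvest(sources):
    """Harvesting stage: turn (id, media_type) pairs into MediaItem objects."""
    return [MediaItem(item_id, media_type) for item_id, media_type in sources]


def analyse(item):
    """Analysis stage (placeholder): a real tool would inspect the signal itself."""
    item.annotations["keywords"] = [item.media_type, item.item_id.split("_")[0]]
    return item


def build_index(items):
    """Indexing stage: build an inverted index from keyword to item ids."""
    index = {}
    for item in items:
        for keyword in item.annotations.get("keywords", []):
            index.setdefault(keyword, []).append(item.item_id)
    return index


def search(index, keyword):
    """Access stage: look up the item ids indexed under a keyword."""
    return index.get(keyword, [])


def summarise(item_ids, limit=2):
    """Delivery stage (placeholder): return at most `limit` results."""
    return item_ids[:limit]


# Hypothetical sample collection, invented for illustration.
sources = [("news_001", "video"), ("news_002", "audio"), ("music_001", "audio")]
items = [analyse(item) for item in harvest(sources)]
index = build_index(items)
```

In this sketch, search(index, "audio") returns the identifiers of the two audio items in harvest order, and summarise trims that list before delivery. Keeping each stage behind its own function mirrors the separation of interfaces in the list above, so one stage could be replaced without touching the others.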

Accomplishments so far:

award-winning algorithms for the segmentation of handwritten document images into text lines and words (ICDAR07, ICDAR09, Handwriting segmentation contest)
novel techniques for speaker diarization that improve performance rates
tempo extraction and tracking algorithms for transcribing music signals
VideoTextReader, a key application for recognition of overlay text in videos
prototype for recognition of on-line handwritten mathematical expressions
multimedia summarization prototype for different genres
SpeakerTracer, a key application for speaker diarization and identification using the audio signal in broadcast news

Current R&D focus:

recognition of scrolling text
localization and enhancement of scene text in video frames
construction of a parallel corpus of on-line and off-line Greek handwriting data
recognition of cursive handwriting with hybrid techniques
reconstructing mathematical expressions with topology context and expression grammars
handwriting recognition intended for digital paper, whiteboards and tablet PCs
indexing of handwriting information in ancient and Byzantine Greek scripts in historical documents
automatic creation of speaker models and incorporation of channel information to increase recognition rates
speaker indexing with fusion techniques
extraction of tonality and melody features for music similarity and music information retrieval
exploration of new time-frequency/scale representations and transformations of musical signals
mobile multimedia retrieval applications
audio mining applications

Institutes:

Institute for Language and Speech Processing