Multimedia Processing

The proliferation of user-generated content in the Web 2.0 era and the convergence of media delivery channels (Web, TV, mobile) are shaping the multimedia field and posing new challenges to the research community. Today’s devices offer advanced capabilities that give users broad access to a wealth of shared content. This content combines a rich set of media: text, graphics, animation, sound, speech, image and video. Multimedia technology now allows users to manipulate content in ways not possible in the past. The combination of PCs, mobile devices and networks allows individuals to create, edit, transmit, share, aggregate, personalize and interact with multimedia content in increasingly flexible ways.
Multimedia content needs to be organised and analysed in a structured manner in order to become a valuable asset to enterprises, governmental services, and community-based services. Particular attention is given to representation issues, as well as to modelling and to multimedia analysis and indexing processes.
Research at ILSP focuses on the development of methods and tools for content-based analysis and organization of multimedia collections. Key areas of concern are the major components in a typical multimedia architecture:
  • Multimedia harvesting interface
  • Analysis and Indexing interface (audio, video and image analysis tools)
  • Multimedia Access interface (searching, browsing and filtering tools)
  • Personalization and content delivery subsystem (summarization, visualization tools)
Accomplishments so far:
  • award-winning algorithms for the segmentation of handwritten document images into text lines and words (ICDAR07, ICDAR09, Handwriting segmentation contest)
  • novel techniques for the speaker diarization task that improve performance rates
  • tempo extraction and tracking algorithms for transcribing music signals
  • VideoTextReader, a key application for recognition of overlay text in videos
  • prototype for recognition of on-line handwritten mathematical expressions
  • multimedia summarization prototype for different genres
  • SpeakerTracer, a key application for speaker diarization and identification using the audio signal in broadcast news
Current R&D focus:
  • recognition of scrolling text
  • localization and enhancement of scene text in video frames
  • construction of a parallel corpus of on-line and off-line Greek handwriting data
  • recognition of cursive handwriting with hybrid techniques
  • reconstructing mathematical expressions with topology context and expression grammars
  • handwriting recognition intended for digital paper, whiteboards and tablet PCs
  • indexing of handwriting information in ancient and Byzantine Greek scripts in historical documents
  • automatic creation of speaker models and incorporation of channel information to increase recognition rates
  • speaker indexing with fusion techniques
  • extraction of tonality and melody features for music similarity and music information retrieval
  • exploration of new time-frequency/scale representations and transformations of musical signals
  • mobile multimedia retrieval applications
  • audio mining applications
Institute for Language and Speech Processing