Speech and Music Technology

Spoken language processing is considered one of the most important areas of research and development in human language technologies and signal processing. The recognition, processing and production of the speech signal is a challenging field of research and a major factor in human-machine interaction, offering many new and significant applications. The main fields of research in this domain are speech recognition, speech synthesis and speech coding. In addition, robust analysis and representation of the speech signal is a continuous pursuit and is regarded internationally as a major technological aim, supporting and feeding research and development in many domains. In recent years, ILSP has actively contributed to the development of methods, systems, resources and tools in the areas of speech synthesis and speech recognition.
In this context, research on music technology is of equal importance. ILSP’s research agenda in this field includes music recognition, extraction of high-level music features, music representation and symbolic processing. The ongoing work in the above fields, together with complementary technologies such as speaker indexing and diarization as well as audio mining, shapes the contemporary landscape of multimedia and multimodal human-machine interaction, which offers many potential applications in areas such as electronic publishing, electronic education, multimedia, the internet, virtual reality and games. ILSP therefore continuously plans and adapts its research and development activities so as to respond effectively to this rapid technological evolution.
Accomplishments so far:
  • The very first top quality Text-to-Speech (TTS) system for the Greek language, based on unit selection speech synthesis technology
  • The very first parametric Text-to-Speech (TTS) system for the Greek language, based on Hidden Markov Modelling (HMM) speech synthesis technology
  • Top quality Text-to-Speech (TTS) system for the Bulgarian language, based on unit selection speech synthesis technology
  • Text-to-Speech (TTS) system for the Greek language using diphone-based speech synthesis technology, which led to the EKFONITIS+ product
  • Formant Rule-based Text-to-Speech (TTS) system for the Greek language combined with computational intelligence approaches
  • Robust Speech Recognition engine for the Greek language
  • Speech recognition for persons with dyslexia combined with a virtual tutor
  • Audio mining and automatic subtitling
  • Music recognition system for monophonic instruments
  • Score matching/following system
  • System for real-time detection and visualization of poor musical performances
  • Innovative applications and systems in the context of speech-enabled web content, unit selection Text-to-Speech synthesis for mobile phones, Greeklish-to-Greek transliteration technology, home-based telecare systems, monitoring of the broadcast sector, etc.
Current R&D focus:
  • spontaneous and emotional speech recognition
  • spontaneous speech search in dialogues
  • automatic subtitling
  • robust statistical parametric as well as hybrid concatenative-parametric speech synthesis
  • expressive and emotional speech synthesis
  • multimodal speech synthesis
  • voice adaptation, transformation and conversion
  • speaker indexing/diarization
  • analysis and extraction of emotional and expressive features
  • speech and voice quality assessment and categorization
  • spectral analysis
  • robust speech signal representation and coding
  • intelligent human-machine interaction
  • design for all, accessibility with ICT
  • music recognition, extraction of high-level music features, music representation and symbolic processing
  • singing speech synthesis
  • score matching/following
  • instrument recognition
  • development of resources and tools in all the above fields
Institute for Language and Speech Processing