Speech and Music Technology | Athena Research Center

Spoken language processing is one of the most important areas of research and development in human language technologies and signal processing. The recognition, processing and production of speech signals is a challenging field of research and a major factor in human-machine interaction, enabling many new and significant applications. The main research fields in this domain are speech recognition, speech synthesis and speech coding. In addition, robust analysis and representation of the speech signal is a continuous pursuit and is regarded internationally as a major technological aim, supporting and feeding research and development in many domains. In recent years, the Institute for Language and Speech Processing (ILSP) has actively contributed to the development of methods, systems, resources and tools in the areas of speech synthesis and speech recognition.

Research on music technology is of equal importance. ILSP’s research agenda in this field includes music recognition, extraction of high-level music features, music representation and symbolic processing.

Ongoing work in the above fields, together with complementary technologies such as speaker indexing and diarization as well as audio mining, shapes the contemporary landscape of multimedia and multimodal human-machine interaction, which offers many potential applications in areas such as electronic publishing, electronic education, multimedia, the internet, virtual reality and games. ILSP continuously plans and adapts its research and development activities so as to respond effectively to this rapid technological evolution.

Accomplishments so far:

The very first top-quality Text-to-Speech (TTS) system for the Greek language, based on unit selection speech synthesis technology
The very first parametric Text-to-Speech (TTS) system for the Greek language, based on Hidden Markov Model (HMM) speech synthesis technology
Top-quality Text-to-Speech (TTS) system for the Bulgarian language, based on unit selection speech synthesis technology
Text-to-Speech (TTS) system for the Greek language using diphone-based speech synthesis technology, which led to the EKFONITIS+ product
Formant Rule-based Text-to-Speech (TTS) system for the Greek language combined with computational intelligence approaches
Robust Speech Recognition engine for the Greek language
Speech recognition for persons with dyslexia combined with a virtual tutor
Audio mining and automatic subtitling
Music recognition system for monophonic instruments
Score matching/following system
System for real-time detection and visualization of poor musical performances
Innovative applications and systems for speech-enabled web content, unit selection Text-to-Speech synthesis on mobile phones, Greeklish-to-Greek transliteration, home-based telecare and broadcast monitoring, among others

Current R&D focus:

spontaneous and emotional speech recognition
spontaneous speech search in dialogues
automatic subtitling
robust statistical parametric as well as hybrid concatenative-parametric speech synthesis
expressive and emotional speech synthesis
multimodal speech synthesis
voice adaptation, transformation and conversion
speaker indexing/diarization
analysis and extraction of emotional and expressive features
speech and voice quality assessment and categorization
spectral analysis
robust speech signal representation and coding
intelligent human-machine interaction
design for all, accessibility with ICT
music recognition, extraction of high-level music features, music representation and symbolic processing
singing voice synthesis
score matching/following
instrument recognition
development of resources and tools in all the above fields

Institutes:

Institute for Language and Speech Processing