IVONA

From Wikipedia, the free encyclopedia


Developed by	IVO Software
Initial release	2005
Written in	C/C++
OS	Cross-platform
Available in	English / Polish / Romanian / more coming soon
Genre	Text-To-Speech
License	Commercial
Website	www.ivona.com

IVONA is a multilingual speech synthesis system developed at IVO Software. It offers a full text-to-speech system with various APIs.

Inside IVONA

The IVONA text-to-speech system was described at Blizzard Challenge 2006^[1] and, in a special version prepared for the challenge, at Blizzard Challenge 2007.^[2] It is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols such as numbers and abbreviations into the equivalent written-out words. This process is often called text normalization, pre-processing, or tokenization. Second, it assigns a phonetic transcription to each word and divides and marks the text into prosodic units such as phrases, clauses, and sentences. The phonetic transcriptions and prosody information together make up the symbolic linguistic representation that the front-end outputs. The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound.
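The text normalization step described above can be illustrated with a minimal sketch. This is not IVONA's front-end: the abbreviation table, the function names `number_to_words` and `normalize`, and the restriction to integers from 0 to 999 are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation table; a real front-end would use a far larger lexicon.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "No.": "Number"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Replace known abbreviations and short digit strings with written-out words."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"\d{1,3}", token):
            out.append(number_to_words(int(token)))
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    print(normalize("Dr. Smith lives at 42 Elm St."))
```

A production front-end would also have to resolve ambiguity (for example, "St." as "Street" or "Saint") from context, which this whitespace-token lookup does not attempt.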

Unit selection synthesis

IVONA uses Unit Selection with Limited Time-scale Modification (USLTM).^[3] Unit selection synthesis uses large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, syllables, morphemes, words, phrases, and sentences. The division into segments is done using a specially modified speech recognizer.^[4] An index of the units in the speech database is then created based on the segmentation and on acoustic parameters such as the fundamental frequency (pitch), duration, position in the syllable, and neighboring phones. At runtime, the desired target utterance is created by determining the best chain of candidate units from the database (unit selection).
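Finding the "best chain of candidate units" is commonly framed as a shortest-path search over candidates, solved with dynamic programming. The sketch below shows that general idea only; it is not the USLTM algorithm. The `Unit` fields, the two cost functions, and their weights are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str       # phone label, e.g. "a"
    pitch: float     # fundamental frequency in Hz
    duration: float  # length in seconds


def target_cost(target: Unit, cand: Unit) -> float:
    """How poorly a candidate matches the requested pitch and duration (assumed weights)."""
    return abs(target.pitch - cand.pitch) / 100.0 + abs(target.duration - cand.duration) * 10.0


def join_cost(prev: Unit, nxt: Unit) -> float:
    """How audible the seam between two consecutive units is likely to be (assumed: pitch jump)."""
    return abs(prev.pitch - nxt.pitch) / 100.0


def select_units(targets, database):
    """Return the chain of database units with the lowest total target + join cost."""
    if not targets:
        return []
    candidates = [[u for u in database if u.phone == t.phone] for t in targets]
    if any(not c for c in candidates):
        raise ValueError("no candidate unit for at least one target phone")

    # best[i][j] = (lowest cost of a chain ending in candidates[i][j], index of predecessor)
    best = [[(target_cost(targets[0], c), -1) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, c), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], c), back))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    chain = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(candidates[i][j])
        j = best[i][j][1]
    chain.reverse()
    return chain
```

The join cost is what distinguishes this from picking the closest unit for each phone independently: a slightly worse individual match can win if it connects more smoothly to its neighbors.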

Unit selection can yield highly natural output because it applies digital signal processing (DSP) to the recorded speech only at concatenation points. DSP often makes recorded speech sound less natural, so IVONA limits it to those points. In the best cases, the output of IVONA TTS is difficult to distinguish from a real human voice.

Generated speech quality

The IVONA Text To Speech System received the highest Mean Opinion Score (MOS) at Blizzard Challenge 2007, a scientific contest held in Bonn, Germany. The sentences read out by IVONA were evaluated by experts, a group of British and American students, and volunteers recruited via the Internet. IVONA's average MOS of 3.9 points was the highest of all participating speech synthesizers; a recording of a real person scored 4.7.^[5]

IVONA was also evaluated at Blizzard Challenge 2006 in Pittsburgh, USA, where it received the best Mean Opinion Score (MOS) from speech experts and undergraduates in the full-database results.^[6]
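A Mean Opinion Score is the arithmetic mean of listeners' individual ratings. The sketch below assumes the conventional scale from 1 (bad) to 5 (excellent); the function name and the sample ratings are hypothetical and are not Blizzard Challenge data.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of listener ratings, each assumed to lie on a 1..5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


if __name__ == "__main__":
    # Hypothetical ratings from five listeners for one synthesized sentence.
    print(mean_opinion_score([4, 4, 3, 5, 4]))
```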

Voices and languages

IVONA speaks in three languages (US-English, Romanian and Polish) and with four voices (Jennifer, Carmen, Ewa, Jacek).

System compatibility

IVONA is compatible with Windows and Unix-based systems.

See also

References

External links

IVONA TTS on-line.
See IVONA TTS in action.
Expressivo Text Reader application voiced by IVONA TTS.
Free web service say.expressivo.com - send and publish prompts spoken by IVONA TTS voices.
Festvox - advancing speech synthesis project.
