The Concept
VocaTalk generates personal podcasts that sound like an audio documentary
with sound effects and music in the background.
You copy and paste any content, select music and effects, generate an episode,
and download it to your iPod, Zune or any other mp3 player.
If the content is too long,
VocaTalk will automatically split it into episodes of 50-60 minutes.
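The page does not say how VocaTalk decides where to split. As a minimal sketch only, assuming a speaking rate of about 150 words per minute (an assumed figure, not a documented VocaTalk setting) and that episodes are cut only at paragraph boundaries, whole paragraphs can be packed greedily into episodes of at most 60 minutes. The function names are hypothetical.

```python
import re

# Assumption: typical TTS speaking rate; not a documented VocaTalk value.
WORDS_PER_MINUTE = 150
# Upper bound of the 50-60 minute episode length described above.
MAX_MINUTES = 60


def estimate_minutes(paragraph: str) -> float:
    """Estimate how long a paragraph takes to read aloud (hypothetical helper)."""
    return len(paragraph.split()) / WORDS_PER_MINUTE


def split_into_episodes(text: str, max_minutes: float = MAX_MINUTES) -> list[list[str]]:
    """Greedily pack whole paragraphs into episodes no longer than max_minutes.

    Paragraphs are separated by blank lines. A single paragraph longer than
    max_minutes still becomes its own (over-long) episode.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    episodes: list[list[str]] = []
    current: list[str] = []
    current_minutes = 0.0
    for para in paragraphs:
        minutes = estimate_minutes(para)
        # Close the current episode if this paragraph would push it past the limit.
        if current and current_minutes + minutes > max_minutes:
            episodes.append(current)
            current, current_minutes = [], 0.0
        current.append(para)
        current_minutes += minutes
    if current:
        episodes.append(current)
    return episodes
```

For example, five paragraphs of 3,000 words each (about 20 minutes apiece at the assumed rate) would be packed into one 60-minute episode of three paragraphs and a second episode of two.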
VocaTalk uses text-to-speech (TTS) technology to read the text aloud and digital signal processing (DSP)
technology to massage the generated speech.
Listening to a VocaTalk episode is much more comfortable and fun than
listening to a raw TTS voice.
VocaTalk is designed to make reading and learning fun and efficient for people who read and learn a lot, or for people who
want to but have no time. Students, engineers, teachers, doctors, or anyone who wants to stay up
to date with information in a fun way can use VocaTalk.
What does it sound like?
A VocaTalk episode is whatever you choose it to be. You just give the content to VocaTalk and choose music and effects.
VocaTalk reads the text while the music plays in the background. It randomly uses all the voices
installed on your system, so you don't get bored listening to the same voice, especially during long listening sessions. It's
like an audio documentary narrated by multiple speakers. VocaTalk leaves periods of silence between paragraphs
to make the experience closer to a documentary, and the background music also makes it fun and engaging. You can get pretty creative here:
try different genres and see which one suits which type of reading. To make listening even more comfortable, keep you focused
and hold your attention, VocaTalk applies sound effects and enhancements to the generated mp3. There are a number of
different effects that you can turn on and off. For example, the positional audio effect smoothly moves the position of the
speakers and changes the depth, echo and pitch of the sound, so you're always immersed in an environment that
sounds like a big theatre.
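The page does not describe how the positional audio effect is implemented. One common way to move a voice smoothly across the stereo field is constant-power panning driven by a slow low-frequency oscillator (LFO); the sketch below assumes that technique. The parameter values (a 0.05 Hz sweep, 60% depth) and function names are hypothetical, and only left/right movement is shown, not the depth, echo or pitch changes.

```python
import math


def constant_power_pan(sample: float, pan: float) -> tuple[float, float]:
    """Place a mono sample in the stereo field.

    pan ranges from -1.0 (hard left) to +1.0 (hard right). The cos/sin gains
    keep left^2 + right^2 constant, so loudness stays steady as the voice moves.
    """
    angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return sample * math.cos(angle), sample * math.sin(angle)


def wander(mono: list[float], sample_rate: int = 44100,
           rate_hz: float = 0.05, depth: float = 0.6) -> list[tuple[float, float]]:
    """Slowly sweep a mono voice left and right with a sine LFO.

    Hypothetical parameters: rate_hz is how fast the voice drifts, depth is
    how far from the centre it travels (1.0 would reach the hard edges).
    """
    out = []
    for n, s in enumerate(mono):
        pan = depth * math.sin(2.0 * math.pi * rate_hz * n / sample_rate)
        out.append(constant_power_pan(s, pan))
    return out
```

Constant-power gains are used instead of simple linear gains because linear panning makes a voice sound quieter as it passes through the centre.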
What makes VocaTalk unique?
- Most TTS applications let you choose a single voice for an entire text. VocaTalk lets you choose multiple
voices for the same text so you don't get bored.
- Most TTS applications just generate an mp3 file. VocaTalk not only generates mp3 files but also publishes them as podcasts,
so it's much easier to keep track of the generated episodes. You can also queue episodes without generating them
and generate them when you actually want to listen.
- No other TTS application can add background music. VocaTalk can, which creates a whole new experience.
- Ordinary TTS output is 16 to 22 kHz mono. VocaTalk's output is always 44.1 kHz stereo (CD quality). This makes it
possible to move the voice position and add stereo music and other effects that are only achievable with stereo sound.
- No other TTS application supports brainwave entrainment technology. VocaTalk supports binaural beats and crossfeed modulation
to improve focus and learning, or to promote relaxation.
- Most TTS applications save the generated audio directly into an mp3 file. VocaTalk massages the generated output and adds
cool effects like echo, reverberation, positional audio, frequency modulation, and more. This makes it even more fun and engaging.
- Most TTS applications do not save the original text to a file for your future reference. VocaTalk
saves the content in a rich format with images and font styles, lets you reopen or regenerate the episode,
and embeds the text into the mp3, so you can enjoy it on your player.
- Some TTS applications are server-based. VocaTalk runs on your computer and can use its full power.
If you have a multicore system, VocaTalk will make use of multiple cores. You don't have to rely on your
internet connection speed; everything is local and private to you.
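The page does not explain how VocaTalk produces its binaural beats. As a hedged illustration of the general technique only: a binaural beat is made by playing a pure tone of one frequency in the left ear and a slightly different frequency in the right ear, and a listener on headphones perceives a beat at the difference frequency. This is also why stereo output matters for the effect, since each ear needs its own channel. The 200 Hz carrier, 10 Hz offset and 0.1 amplitude below are arbitrary example values, `binaural_beat` is a hypothetical helper, and crossfeed modulation is not covered.

```python
import math


def binaural_beat(duration_s: float, carrier_hz: float = 200.0, beat_hz: float = 10.0,
                  sample_rate: int = 44100, amplitude: float = 0.1) -> list[tuple[float, float]]:
    """Return stereo (left, right) samples forming a binaural beat.

    The left channel carries carrier_hz and the right channel carries
    carrier_hz + beat_hz; the perceived beat is the difference, beat_hz.
    Example values only; not VocaTalk's actual settings.
    """
    frames = int(duration_s * sample_rate)
    out = []
    for n in range(frames):
        t = n / sample_rate  # time of this sample in seconds
        left = amplitude * math.sin(2.0 * math.pi * carrier_hz * t)
        right = amplitude * math.sin(2.0 * math.pi * (carrier_hz + beat_hz) * t)
        out.append((left, right))
    return out
```

A low amplitude is used so that the tones can sit quietly underneath the narration and background music.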
More features are being planned and will be published on this site soon, all for one purpose: to make listening and learning
fun and enjoyable!
Compare VocaTalk episodes to regular text-to-speech
The following demonstration shows the difference between regular text-to-speech output
and VocaTalk's. The text is an extract from a technology article by I. Benian that was originally published
on CodeProject.
Ordinary Text-to-speech
This is regular text-to-speech output generated by ordinary text-to-speech applications. A single voice
reads the whole text continuously.
Download mp3
Mono 16 kHz, Single speaker
VocaTalk episode sample 1
And this is a VocaTalk episode generated with background music and other enhancements. Multiple voices
read the text, the voice position smoothly shifts and the echo effect adds realism.
Download mp3
Stereo 44.1 kHz CD Quality, Multiple speakers, Movie Score and Ambient music, Echoes, Wandering Voices
VocaTalk episode sample 2
This is another VocaTalk episode, generated with techno music in the background and an additional voice modulator effect.
Download mp3
Stereo 44.1 kHz CD Quality, Multiple speakers, Techno and Electronic music, Echoes, Wandering Voices, Voice Modulator
Have you noticed the periods of silence in the VocaTalk episodes? Just like in a documentary, these periods make listening much more comfortable
and give you a break to digest the content while enjoying the music.
See the full original article 'A Simple Object Collaboration Framework' on CodeProject.