anatomy · of · distance

Text to speech

* * *
This would probably go in my blog, only that's temporarily nonexistent, so I'll post it here:

I just looked at AT&T's new text-to-speech demo site. They've now got two non-American English voices, and there's no prize for guessing that they're British (in the artificial-BBC-accent sense). Both voices have plummy Received Pronunciation accents, though the male one ("Charles") breaks up a bit in places and also speaks with a subtly deranged lilt, as if he had, at some time in the past, eaten some BSE-contaminated beef. The female one ("Audrey") sounds a bit better, and not unlike a recorded announcement.

It'd impress me more if they had some more natural British accents alongside the RP ones; perhaps Estuary English, for example, or Mancunian, or even Glaswegian Scots. Or, indeed, non-British accents, such as Irish, South African or New Zealand. Though this is a good start. (Apparently they also have an Indian English accent in the commercial version of the product; I suspect that's because India is a large enough market.)

I can imagine what they'd do for Australian accents: "Norm" (a broad 'Ocker' voice that sounds like Paul Hogan or someone: "Yewbewdymate!") and "Noelene", the female of the species.

One thing that all these new voices miss is an inflectionless, mid-90s-speech-synth sound; all their voices sound vibrantly human, which is good if you're writing phone-based commerce apps or something, though not good if you want a machine-like voice for aesthetic reasons. The best one of those I heard (not too mechanical, yet oddly cold and detached) was on a program which ran on SGI workstations around the mid-90s. (That's the voice I used on "Dear Robot", incidentally.)

On a tangent: I'm not fond of Apple's MacOS text-to-speech engine. For one thing, the designers overemphasised the way the voice tone goes up and down, whilst leaving the voices themselves sounding rather rough; thus, it still falls into some speech-synthesis analogue of Mori's Uncanny Valley. More fatally, it's inflexible. There's (AFAIK) no way of getting the output of MacinTalk to go anywhere other than the audio output of your Mac. You can't render speech to an AIFF file, let alone to a buffer in a VST plugin or what have you.

On another tangent: I wouldn't mind taking a look at Yamaha's Vocaloid sometime; it's a speech synth geared towards synthesising sung vocals (in English or Japanese), and apparently sounds quite good.
Current Music: Saint Etienne - The Chemicals
* * *
On September 30th, 2003 10:15 am (UTC), tony_laetrile commented:
I have been playing with this all day long. Thank you.
* * *
On October 2nd, 2003 12:36 pm (UTC), addedentry commented:
Accents beyond RP, beyond the Commonwealth, beyond English: the speech accent archive, with Creative Commons licence and four synthesized voices from Mac System 8.5 to boot.
On October 2nd, 2003 11:05 pm (UTC), kineticfactory replied:
That's still just audio recordings, not actual data that can be used to synthesise speech in the accent in question. What the world needs is a way of abstracting speech synth voices from accents, perhaps storing the accent data in an XML document or something, and allowing voices to be used with different accents. (A good actor can use her voice to speak with different accents; why shouldn't a TTS program be able to do the same?)
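
As a rough illustration of that idea (not anything an existing TTS engine is known to do), here is a minimal Python sketch. It assumes a made-up XML format in which an accent is a list of phoneme rewrite rules, kept separate from whatever voice eventually renders the phonemes. The element names, the attribute names and the two rules are all hypothetical, and the phoneme symbols are only loosely ARPAbet-style.

```python
import xml.etree.ElementTree as ET

# Hypothetical accent document: the element and attribute names are invented
# for illustration and do not follow any real TTS standard.
ACCENT_XML = """
<accent name="example-non-rhotic">
  <map from="AA R" to="AA"/>
  <map from="ER" to="AH"/>
</accent>
"""

def load_accent(xml_text):
    """Return (accent name, list of (source phonemes, target phonemes)) rules."""
    root = ET.fromstring(xml_text)
    rules = [(tuple(m.get("from").split()), tuple(m.get("to").split()))
             for m in root.findall("map")]
    return root.get("name"), rules

def apply_accent(phonemes, rules):
    """Rewrite a phoneme sequence, trying each rule in order at each position."""
    out, i = [], 0
    while i < len(phonemes):
        for src, dst in rules:
            if tuple(phonemes[i:i + len(src)]) == src:
                out.extend(dst)
                i += len(src)
                break
        else:
            # No rule matched here, so the phoneme passes through unchanged.
            out.append(phonemes[i])
            i += 1
    return out

name, rules = load_accent(ACCENT_XML)
# "car" as a phoneme sequence; any voice could then render the rewritten result.
print(name, apply_accent(["K", "AA", "R"], rules))
```

A real accent would need far more than phoneme substitution (prosody, vowel quality, timing), so this only shows the separation of accent data from the voice, not a workable model of an accent.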

* * *
