this is not your sawtooth wave (kineticfactory) wrote,

Text to speech

This would probably go in my blog, only that's temporarily nonexistent, so I'll post it here:

I just looked at AT&T's new text-to-speech demo site. They've now got two non-American English voices, and there's no prize for guessing that they're British (in the artificial-BBC-accent sense). Both voices have plummy Received Pronunciation accents, though the male one ("Charles") breaks up a bit in places and also speaks with a subtly deranged lilt, as if he had, at some time in the past, eaten some BSE-contaminated beef. The female one ("Audrey") sounds a bit better, and not unlike a recorded announcement.

It'd impress me more if they had some more natural British accents alongside the RP ones; perhaps Estuary English, for example, or Mancunian, or even Glaswegian Scots. Or, indeed, non-British accents, such as Irish, South African or New Zealand. Though this is a good start. (Apparently they also have an Indian English accent in the commercial version of the product; I suspect that's because India is a large enough market.)

I can imagine what they'd do for Australian accents: "Norm" (a broad 'Ocker' voice that sounds like Paul Hogan or someone: "Yewbewdymate!") and "Noelene", the female of the species.

One thing that all these new voices miss is an inflectionless, mid-90s-speech-synth sound; all their voices sound vibrantly human, which is good if you're writing phone-based commerce apps or something, though not good if you want a machine-like voice for aesthetic reasons. The best one of those I heard (not too mechanical, yet oddly cold and detached) was on a program which ran on SGI workstations around the mid-90s. (That's the voice I used on "Dear Robot", incidentally.)

On a tangent: I'm not fond of Apple's MacOS text-to-speech engine. For one, the designers overemphasised the way the voice tone goes up and down, whilst leaving the voices themselves sounding rather rough; thus, it still falls into some speech-synthesis analogue of Mori's Uncanny Valley. More fatally, it's inflexible. There's (AFAIK) no way of getting the output of MacInTalk to go anywhere other than the audio output of your Mac. You can't render speech to an AIFF file, let alone to a buffer in a VST plugin or what have you.

On another tangent: I wouldn't mind taking a look at Yamaha's Vocaloid sometime; it's a speech synth geared towards synthesising sung vocals (in English or Japanese), and apparently sounds quite good.
