this is not your sawtooth wave (kineticfactory) wrote,


Text to speech

This would probably go in my blog, only that's temporarily nonexistent, so I'll post it here:

I just looked at AT&T's new text-to-speech demo site. They've now got two non-American English voices, and there's no prize for guessing that they're British (in the artificial-BBC-accent sense). Both voices have plummy Received Pronunciation accents, though the male one ("Charles") breaks up a bit in places and also speaks with a subtly deranged lilt, as if he had, at some time in the past, eaten some BSE-contaminated beef. The female one ("Audrey") sounds a bit better, and not unlike a recorded announcement.

It'd impress me more if they had some more natural British accents alongside the RP ones; perhaps Estuary English, for example, or Mancunian, or even Glaswegian Scots. Or, indeed, non-British accents, such as Irish, South African or New Zealand. Though this is a good start. (Apparently they also have an Indian English accent in the commercial version of the product; I suspect that's because India is a large enough market.)

I can imagine what they'd do for Australian accents: "Norm" (a broad 'Ocker' voice that sounds like Paul Hogan or someone: "Yewbewdymate!") and "Noelene", the female of the species.

One thing that all these new voices miss is an inflectionless, mid-90s-speech-synth sound; all their voices sound vibrantly human, which is good if you're writing phone-based commerce apps or something, though not good if you want a machine-like voice for aesthetic reasons. The best one of those I heard (not too mechanical, yet oddly cold and detached) was on a program which ran on SGI workstations around the mid-90s. (That's the voice I used on "Dear Robot", incidentally.)

On a tangent: I'm not fond of Apple's MacOS text-to-speech engine. For one thing, the designers overemphasised the way the voice tone goes up and down, whilst leaving the voices themselves sounding rather rough; thus, it still falls into some speech-synthesis analogue of Mori's Uncanny Valley. More fatally, it's inflexible. There's (AFAIK) no way of getting the output of MacinTalk to go anywhere other than the audio output of your Mac. You can't render speech to an AIFF file, let alone to a buffer in a VST plugin or what have you.

On another tangent: I wouldn't mind taking a look at Yamaha's Vocaloid sometime; it's a speech synth geared towards synthesising sung vocals (in English or Japanese), and apparently sounds quite good.
