Early in the millennium, when voice-recognition technology was still immature, my employer (in TV post-production) aptitude-tested two dozen volunteers to produce texts from TV — a bit like simultaneous translation, but speaking all the parts in a drama or debate, including punctuation, and editing slightly as you go — all whilst listening to the next part. That’s the tricky bit, like rubbing your tum while patting your head.
This was just a new production method for a job we were all experienced at.
When managers emailed us the result of the aptitude testing — the names of six selected for training — I was out of the room. I returned to find a whole chain of comments between a clique of colleagues: as well as adding their own group to the address line, someone had hit reply-all anyway, and everyone on shift at that moment got it. Several more emails from friends went “Quick! Archive your emails now, before you read anything!”
The clique were the same people who had managed to leave several desktop computers in our hot-desk workplace permanently signed in to their messaging app, so it wasn’t a surprising blunder, and nor was the indignant “OMG!!!” response to my being selected much of a shock: I knew they disliked me as a person. But the blatancy of their snobbery was an eye-opener.
They were outraged that someone with speech as “common” and “gutteral” (pun intended) as mine could have been considered adequate at all to use voice-recognition software, and stated that they were stunned that I’d been put forward ahead of most of them. They literally could not believe that I could’ve passed at all, especially since the test had been administered and scored by people who came up from London. The clique were anticipating a great denouement, when the voice-recognition software would melt down on contact with my “terrible” accent.
I have a bog-standard west-of-Scotland accent — a range of registers, like most people in the surrounding conurbation, but defaulting to working-class rather than the “well-spoken” or “telephone voice” that schoolteachers used to try to scold us all into. I speak like most of the population here, and like most of the clique’s own parents and many of their friends. But in our workplace, as in most big organisations in Scotland, it’s the accent of the majority of security guards, canteen staff, cleaners, maintenance crews…but few of the people they maintain.
We had all been briefed on the nature of the output software that those who “went forward” would be using — you would build up a corpus of your own speech through example and correction to make a “model” for the voice software to recognise your own voice. All that matters is that your diction is acoustically distinct.
Handily, the standard working-class west-of-Scotland accent has fully sounded (guttural, if you like) consonants and more vowels than Standard English. Wales and whales are quite distinct; so are eyes/ice, boot/boat, four/for, and or/oar/awe. “Girls” is a diphthong and can’t be misheard as gulls, gills or gales. And for homophones, the Scottish teacher’s-bane childhood accent easily distinguishes they’re from their (thur) and there (thai-ur).
The weird thing is that, of all people, my colleagues were educationally and occupationally selected to know all this better than just about anyone who doesn’t actually work as a linguist: but even months later, they still seemed mystified that “the machine” discriminates acoustically, not socially.
When two of the clique started their training a few weeks after me (yes, one was the instigator of the OMG slag-fest), both had more problems than almost anyone with getting consistent voice recognition. When you are under pressure, or tired, or unselfconscious, your speech tends to revert to what is most ingrained: the speech you first learned. Both individuals had been through years of elocution lessons in their teenage years. Each would teach the machine a beautifully modulated exemplar in their acquired pronunciation, and as the situation got more demanding (eventually live on-air) it would broaden into speech that most people from outside Scotland wouldn’t be able to tell from my own “dreadful” accent. But they hadn’t taught that to their voice-recognition software. They weren’t hearing it themselves.
My revenge was to be as helpful as possible. (I never mentioned that email, though some other colleague surely must have at some time.) Several people, including me, did try to explain that you have to be “honest” with the software, but these two continued to have a hard time with it.
More distinctly enunciated consonants. No intrusive R or missing R. At least two sets of vowel distinctions not in Standard British English. Plenty of acoustically distinct sounds for the software to latch onto: that’s my accent. From what I hear, over a decade later, Siri and Alexa still can’t cope with un-anglicised Scottish or Irish speech. That can only be because we’re literally not worth hearing.