That Thing About Accents

TypeOlogy member Howard David Ingham tackles one of our most frequently asked questions…

If you’ve been through the TypeOlogy site and read our FAQ (and obviously you’ve read our FAQ), you will have seen some of the variables that affect the pricing of the transcriptions we do. A large number of speakers, for example, or a poor-quality sound file will influence the cost of the work, as you might expect. The one that has inspired the most conversation among our clients, freelancers and members, however, is the presence of speakers with accents that are difficult to parse.

That’s not surprising! It’s the one that needs the most explanation. And it’s honestly the one that we’ve had to think the hardest about.

OK, it’s a simple enough principle to begin with: if the typist struggles with the accent of the person talking, they’re going to find it harder to finish the transcript and it’s going to take them longer. And they need to be compensated fairly for their time.

But wait. Isn’t that a bit… discriminatory?

This is the part where I say, “Yeah, maybe?”

It’s a sticky question, because we have to admit that accents aren’t just a neutral characteristic of a person’s speech; they’re also markers of nationality, social class, ethnic identity and other things. Generally, people who are told that they have impenetrable accents tend to be poor and/or non-white and/or not first-language English speakers. Sometimes those accents place them in a community and supply them with an identity.

Accents are often used as a way to exclude people, and accusing a person of impenetrability can be a bad-faith way to make their lives more difficult. I’m reminded of my own experience here, of how, as a teenager, I expunged my regional accent to hide the fact that I was from a deprived social background, and to reduce the flak I kept getting from my significantly more privileged peers. Years later I would return to my hometown and find that local people I met were unable to believe that I was born there. In losing my accent, I lost my identity. In some ways, I deprived myself of a home.

Another way that people with strong local/ethnic/working-class accents get punished is by being gatekept out of things, because people who speak with the “default” accent don’t understand them, aren’t willing to make the extra effort to understand them, or penalise them for the effort that understanding them takes.

And the thing about marginalised identities – and one of the lovely things about working for TypeOlogy is how we primarily employ people with marginalised identities – is that having one doesn’t stop you being prejudiced against other people. You don’t get points for your own experience of marginalisation (no matter what the DWP might think).

As an organisation committed to those co-operative principles of equity, equality, honesty, openness and social responsibility, we have to grapple with this question. And in the last couple of months, we’ve talked about it quite a bit.

The accents people give a pass to (and why we don’t)

One really important thing to note is that quite a few people we’ve transcribed who our typists have found a bit hard to parse don’t have the sort of accent that is normally counted as marginalised. It’s just that no one notices when someone is hard to understand and they’re wealthy. Upper-class British accents are a classic example of this. Hardly anyone ever calls out members of the aristocracy for being really hard to understand. This is because they’re rich and powerful. 

But does the size of a person’s hedge fund really make them easier to transcribe? No. No, it does not. We’ve had transcripts like that, and we have charged the increased rate for them, because they’re still harder to listen to.

(Yes, if you’re old enough to remember The Fast Show, that means you’re as old as me, and I’m sorry.)

Wait a minute, though, who’s actually getting discriminated against?  

The thing about a discriminatory act is that it’s directional, an action performed against a person. But most of the time, if not all of the time, we’re not hired by the interviewees. We’re being paid by the interviewers (or the institutions they work for) and often they’re not the ones with the difficult-to-parse accents. We’re simply given a file to transcribe, which we do, and then we invoice the client or the client’s institution, and that invoice depends on the difficulty of the work done. We’re not communicating with the interviewees most of the time. 

This brings up the point that our commitment to equity also translates into a commitment to compensating our typists for their work. Our typists, both members and freelancers, really do put in the effort to understand, and to provide the best and most accurate transcripts a professional can produce.

Occasionally, we all receive a file that for whatever reason we have to work harder to interpret. And the key is working harder. Our typists have a commitment to come to the work we do with no presuppositions as to who’s talking, but if it’s taking longer, we’re going to pay them more.

And then there’s experience

Now a lot of this is down to the typist. Some of our typists find some accents a lot easier to handle than others, simply due to personal experience. 

For instance, I’ve spent years living with international students, so I don’t particularly struggle with the accents of speakers from South and East Asia or central Africa. Unlike everyone else in the co-operative, however, I’ve never lived north of Swansea, so I struggle a bit with English accents from anywhere north of Birmingham, while most of my colleagues hear those pretty much every day. Other members of the co-op find accents from Wales a bit trickier (not a problem for me, obviously), and have much less trouble than I do with, for example, American or European accents.

This is important for us to know and to share with each other. One outcome of our conversations is that we now keep a record of who finds which accents easiest to transcribe, so we can assign work more effectively and efficiently, according to everyone’s strengths.

Doing better

We will carry on trying to do better, and it’s a conversation we’re keen to continue. We’re always ready to talk about our rates, and we are willing to negotiate them, particularly if a project aligns with our ethical stance. For the time being, though, we charge more for difficult-to-understand accents for the simple reason that our typists work harder to transcribe them. They do everything they can to make sure our clients’ interviewees are fairly represented in text. We return the favour by paying them fairly for their work.

Why Automated Transcripts Aren’t Less Work (Yet)

In the first of a new series, co-op member Howard David Ingham explains why we aren’t quite ready to submit to our robot overlords.

Are we worried that the growing availability of cheap AI speech-to-text software might put us out of a job? 

You won’t be surprised that we’ve heard that one a lot recently.

Engineers have been working for a whopping 70 years to get a computer to write down what a person is saying without anyone else having to listen. It’d be interesting to get into why this is a thing and who’s paying for it (a big shout out to everyone listening at GCHQ, by the way). But let’s stick to the subject: AI transcription utilities, powered by machine learning, really are getting better and better. I can see a time in the near future when an AI transcription service might be as good as the “real thing.” 

We’re in a world where you can even do it on your phone. In fact, I dictated that last paragraph on my phone while walking the dog, because I was curious to see how good it was. So are we old school transcribers going to go the way of the triceratops and the typesetter? The simple answer is: not for a while yet, and probably not entirely. 

OK, then, so AI isn’t going to put us out of a job. So it’s going to make our jobs easier, right? Well, no, that’s not really true either. 

Let’s look more closely at both of those questions.  

Let’s talk about Intelligent Verbatim

Like a lot of transcription services, the folks at TypeOlogy use the Intelligent Verbatim method. When you transcribe using Intelligent Verbatim, you do a bit of cleaning up as you go. You skip the ums and ahs and the occasional “y’know” or “I mean”, cutting the words we inevitably repeat while we’re getting our thoughts in order. Then you fix the punctuation, because people don’t actually punctuate their speech. And all of this depends upon interpreting the speech we’re transcribing in a way that’s sensitive and appropriate. Right now, AI just isn’t very good at that. It’s starting to get better at it – for example, you can set up some online services to remove ums and ahs automatically – but it’s still hit and miss, and when people are talking at length, especially when it’s an unscripted conversation, the result is often a bit of a mess. 

To demonstrate, below is the raw transcript of the preceding paragraph, read out clearly and at a normal speaking pace, and put through one of the better online AI speech-to-text services (one I do actually use from time to time): 

OK, that isn’t terrible, but it is still going to take a bit of cleaning up. Now imagine what that’d look like with the stops, starts, and interjections of unrehearsed, unread speech. 

The proof is in the word salad   

We’ve noticed an increasing number of clients coming to us with automated transcripts, asking if we’ll edit them to make them fit for purpose. Surely a half-decent transcript is better than no transcript at all? Doesn’t that make things a lot easier?

Well, that depends on who’s talking. Audio from an experienced, trained orator (a barrister or politician, perhaps) will come out better than audio from someone with a regional accent describing personal experiences in an informal conversation, for example. This is another problem with AI transcription: it remains less accessible to people who haven’t had a certain kind of education or don’t carry certain class markers. But even your very best case is probably going to look something like the text that my example generated.

Isn’t that still better than not having it? Crucially, that depends on your skill set. The fact is, working on an automated transcription requires a different set of skills.

An experienced transcriber with the right equipment can do a full transcription without really stopping typing, just occasionally taking their foot off the pedal while they catch up with the audio. 

But a pre-produced transcription is different. No matter how perfect the transcription is, there isn’t currently an AI that can do Intelligent Verbatim. You’ve got to stop and start the audio while you separate out the bits where the AI got the speakers mixed up. You need to fix the punctuation. You have to remove the “yeahs” and the “OKs” and all the other phatic utterances people unconsciously make while they’re listening. You’ll have to go through and sort out the proper names that the AI just didn’t know, and occasionally streamline some phrasing. And from time to time some background noise, or two people talking at the same time, or just a strong regional accent will generate a nice tasty word salad. 

No matter how accurate the AI output, the task of correcting it stops being a transcription job and becomes much more of an editorial and proofing job. And on top of that, it’s one that needs you to be listening to the audio at the same time. As a result, the time a skilled editor winds up spending on it works out roughly the same as the time a transcriber would take to type it up from scratch – sometimes even longer.

This is why, as members of an ethical co-operative that believes in compensating its workers fairly, we don’t currently offer a discount for editing automated transcripts. As AI transcriptions improve, this might change, but for the moment, the technology still has a way to go. 

This doesn’t mean that AI services aren’t a valuable tool. They will definitely change how we work in years to come, but they’re still not at the stage where they make our lives easier, let alone put us out of a job. They’re just another tool, and they create different challenges.