Why Human Transcriptionists Remain Vital

One of the most frequent enquiries I receive in my practice is for transcription services: digital, microtape and standard tape (yes, still tape!). These enquiries come not only from potential clients but also from new virtual assistants (VAs) wanting to ‘get in on’ what looks like a burgeoning niche in the Virtual Assistant industry.

Most of these enquiries come from the medical and legal industries. Authors also need help with their book manuscripts, and researchers require transcription of interviews or focus groups.

Here are a few tips and facts to help clients understand what drives the cost of transcription services. I won’t discuss rates here because VAs charge differently: some quote a project rate, while others charge per audio minute, and per-minute rates vary widely.

Check with your VA how they charge for the work so you understand exactly what’s involved. And VAs, understand that being able to type doesn’t necessarily make you a transcriptionist. To be successful, you need a particular set of skills.

A: Transcriptionist -v- Typist

Not all transcriptionists are alike. Typing speeds vary, as does expertise with both word-processing software and computer-based playback software, and as a result, so do charge-out rates.

If you are a client who wants a quality transcript without having to do much checking and editing afterwards, you need a professional service.

The skills of a transcriptionist differ from those of a typist. According to the Industry Production Standards (IPS) Guide:

“Tape transcription is a specialised service, very different from general text keyboarding [which] relies on visual processing and can be measured as words or characters per minute; then corrected for accuracy. Transcriptionists however, must rely on aural processing, and the rhythm of the work depends on the person doing the original recording. The keyboarding portion of the tape transcription process includes a certain amount of editing ‘on the fly’ by the transcriptionist – ie paragraphing, insertion of punctuation, capitalisation, correction of grammar (in non-verbatim transcripts) and sometimes aural identification of speakers.”

IPS Guide

Skill Set to Look For

So what should you look for when assessing candidates? According to the IPS Guide, a ‘model operator’ for transcription purposes should possess the following skills:

1. Someone with at least 2-3 years of business, office or secretarial experience;
2. Keyboarding speed of around 70 words per minute (wpm);
3. Good language/grammar skills;
4. The software skills to handle the project;
5. Someone with a minimum of 2-3 years basic transcription experience;
6. Mastery of advanced language skills, including grammar, punctuation, spelling and sentence structure;
7. Exceptional level of accuracy;
8. Excellent independent judgment and decision-making skills;
9. Superior on-screen proofreading and editing abilities;
10. Ability to recognise errors and inconsistencies in dictated material while transcribing;
11. Proficiency in clarification of dictation without altering meaning or style;
12. Hearing acuity and language discrimination skills, including familiarity with and understanding of accents and dialects, and recognition of voice inflections within a document.

I can imagine a few of you shaking your heads in disbelief at this list! Yet these are the identified requirements of a professional transcriptionist.

If you don’t work with an operator who has this skill set, you can be sure that transcription will take longer, and the proofing and editing needed afterwards will most likely defeat the purpose of outsourcing the job in the first place.

B: Transcription Time Determinants

It’s important to understand that transcription time depends largely on the quality of the audio. Background noise, accents, multiple speakers, poor recording quality or a poorly positioned recording device will all increase transcription time.

For this reason, audio files are classified on a scale from Class 1 (the easiest) to Class 5 (the most difficult) based on these factors.

Also, an hour of audio is NOT going to take an hour to transcribe, even for someone like me who types at 116 wpm or more. Conversational English runs at around 200 to 250 wpm, so typing speed alone falls well short of speaking speed. Relistening to identify speakers in multi-speaker recordings, or to make out words over background noise, slows things down further.
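To see why an hour of audio can’t be typed in an hour, here is a rough sketch of the arithmetic using the figures above: speech at 200 to 250 wpm against a typing speed of 116 wpm. It is a simplified illustration only; the function name `keyboarding_minutes` is hypothetical, and it assumes the transcriptionist types continuously at a constant speed, never pausing, rewinding or relistening. Treat the result as a floor, not an estimate.

```python
# A minimal sketch: how long it takes just to key in an hour of speech,
# using the speaking and typing rates quoted in this article.
# Assumes continuous typing at a constant speed with no pauses,
# rewinding or relistening, so the result is a lower bound.

def keyboarding_minutes(audio_minutes: float, speech_wpm: float, typing_wpm: float) -> float:
    """Minutes needed only to type the words spoken during the audio."""
    words_spoken = audio_minutes * speech_wpm
    return words_spoken / typing_wpm


TYPING_WPM = 116  # the typing speed quoted above

for speech_wpm in (200, 250):  # quoted range for conversational English
    minutes = keyboarding_minutes(60, speech_wpm, TYPING_WPM)
    print(f"{speech_wpm} wpm speech: about {minutes:.0f} minutes of typing per hour of audio")
```

This prints about 103 minutes for 200 wpm speech and about 129 minutes for 250 wpm speech, roughly 1.7 to 2.2 times the length of the recording, before any of the relistening, punctuating or speaker identification described above.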

The IPS Guide puts transcription time for a clear, single-speaker recording (think of someone dictating a letter), classified as Class 1, at a ratio of 1:3 to 1:5. This means every minute of recorded audio takes a ‘model operator’ approximately 3 to 5 minutes to transcribe, so an hour of audio will take approximately 3 to 5 hours.

The ratio changes based on factors like:

  • the complexity of the recording;
  • whether it contains jargon or technical language;
  • whether the speaker has an accent; and
  • whether the transcriptionist needs to look up addresses, do internet research and so on.

For an hour of audio classified as Class 5, the estimate rises to approximately 4.8 to 8 hours.
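The turnaround arithmetic above can be sketched in a few lines. This is an illustration under stated assumptions, not the IPS Guide’s own method: it reads the Class 1 ratio as 3 to 5 minutes of work per audio minute and the Class 5 figure as 4.8 to 8 hours per hour of audio, and it deliberately leaves out Classes 2 to 4 because their ranges aren’t given here. The names `IPS_RATIOS` and `estimate_hours` are hypothetical.

```python
# A minimal sketch of the turnaround estimate described above.
# Ratios are minutes of work per minute of audio. Only Class 1 and
# Class 5 figures appear in this article, so Classes 2 to 4 are left
# out rather than guessed. All names here are hypothetical.

IPS_RATIOS = {
    1: (3.0, 5.0),  # clear, single speaker, e.g. a dictated letter
    5: (4.8, 8.0),  # the most difficult audio classification
}


def estimate_hours(audio_minutes: float, audio_class: int) -> tuple[float, float]:
    """Return the (low, high) estimated transcription time in hours."""
    if audio_class not in IPS_RATIOS:
        raise ValueError(f"no ratio available for Class {audio_class}")
    low_ratio, high_ratio = IPS_RATIOS[audio_class]
    return audio_minutes * low_ratio / 60, audio_minutes * high_ratio / 60


for audio_class in (1, 5):
    low, high = estimate_hours(60, audio_class)
    print(f"One hour of Class {audio_class} audio: {low:g} to {high:g} hours")
```

For one hour of audio this prints 3 to 5 hours for Class 1 and 4.8 to 8 hours for Class 5. The sketch only multiplies by the length of the recording; the factors listed above (jargon, accents, research) are what move a particular file within or beyond its range.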

By recording their audio under the best possible conditions, clients can reduce their final invoice.

Better quality audio = less time to transcribe.

C: What About Speech Recognition Software, Apps and AI?

With speech recognition software such as Dragon, countless speech-to-text apps across every platform, and now AI, many clients (and, for that matter, many VAs) wonder whether the transcription industry is dead. I don’t think it is.

Apart from the fact that software like Dragon is quite expensive, you also need to train it to recognise your accent, inflections and other speech characteristics. The cost will presumably come down over time, and some clients won’t mind training the software if they believe it will save them paying someone else to transcribe in the long run. But I have always argued that technology cannot replace humans when it comes to picking up small idiosyncrasies, and it cannot edit on the fly the way transcriptionists do, especially when the author makes mistakes while dictating.

I have trialled AI voice-to-text tools myself; there’s no point advising clients on them if I don’t know what they can do. In my experience, you would never send out the document it produced as is. A skilled human transcriptionist gives you a far richer and more accurate result. It may not take 30 seconds, but the result should require little, if any, post-production.

Benefits of a VA over Tech

A VA who is skilled at transcription can correct spelling and grammar errors the author may not have noticed. They can also correct dictation errors in, say, dates or names, and fix inconsistencies. Tech won’t do this. They can also format the text so the final product looks the way the client wants. Currently, tech can perform only basic formatting, and the author must specify the required formatting while dictating.

Don’t Forget Post-Production

One of my clients was happy using Dragon to dictate his book manuscripts. However, he couldn’t spare the time to format each manuscript correctly before it went to his publisher, and he didn’t want to proofread the work himself, given that the software made quite a few transcription errors. So he needed me to proofread, edit and format the text.

Even if you prefer to use technology to convert your speech to text, consider a VA to help with the post-production work. This increases your productivity: while they finalise one document, you can be busy producing the next.

The Final Word …

On whether software will one day make us redundant, I’ll leave the final word to Dr Robert Fox, a faculty member at Ohio State University, a linguist, an expert in acoustic phonetics and a forensic audio analyst. He said:

“Speech recognition is a science, we put a lot of money into it but nothing is as good as the human ear and the human perceptual system in understanding speech, in part because human listeners are also human speakers and there’s a very close connection between those two.”

Dr Robert Fox

So if you’re considering speech recognition or AI as your solution, why not also give a VA transcriptionist a go and see the quality of work they can produce?

© Lyn Prowse-Bishop, www.execstress.com