Authoring Tools, Technology & Devices

The Text-to-Speech Conundrum


As an instructional designer, I try to give my clients the best of all worlds when developing online learning modules. One of the features I rely on most is audio.

Audio has always been a way to enhance the learner experience, improve retention and ensure your courses are accessible to more people. But one thing that has not changed is the bias against text-to-speech (TTS), or computer-generated, voices.

If you are a Star Trek fan, you will remember the computer voice that answered any and all questions the crew asked. It was a very robotic voice that provided information but no nuance.

Today, many organizations still require a real person to record the audio portion of eLearning modules, even though TTS has come a very long way.

Real-person audio has some benefits but also some drawbacks. Let’s look at these:

  • It sounds realistic, with all the inflection that a person can provide
  • It takes time, money and multiple recordings to get it right
  • If a change is required, then you have to reschedule the same person to complete the revisions
  • A professional voice-over can cost $250 per hour or more, plus studio or recording space
  • If you have used an employee to record the audio and they leave the organization, you may have to re-record ALL of a course because of a minor change

Text-to-speech has improved dramatically, and SAPI5 voices are now available that provide near-lifelike audio with some inflection. So why would we want to use TTS instead of a real person?

  • Lower costs once software and voices are purchased
  • No scheduling, no recording studio time and no cost for a voice-over professional
  • Changes, however minor, can be made quickly and easily
  • No worry if an employee leaves your organization, unless of course it's the person who develops your courses
  • If you purchase about four voices (two female and two male), you can reduce audio fatigue. Audio fatigue sets in when one individual in your organization becomes the de facto voice and listeners tire of hearing that same voice over and over

So I encourage you to listen to the new generation of computer-generated voices and hear for yourself how they can be used to create engaging audio for your eLearning modules.

I introduce every new client to the possibility of text-to-speech so they can see, or better yet hear, for themselves the value it offers.


In my next post, I will discuss the kinds of software and voices available, and why I use the companies I use.
