Well, that remains to be seen. The question is: can I publish an audiobook using AI?
Good question. I’ve written a few books and published them in both paperback and ebook format, but I never gave much thought to publishing them as audiobooks. While I’ve always been an avid reader of printed books, and more recently also of ebooks, audiobooks never really became my thing. I tried one a long time ago but never managed to finish it. Mind you, that was in the age when CDs had just become fashionable.
Recently, a friend of mine, JP from Podverkstan, reported on social media about his experience of making an audiobook with an AI voice. He had previously produced a few books that were read by a human narrator, but this was his first attempt at using an artificial voice. In the discussion that followed, he proposed trying it with one of my books. He promptly confirmed that AI voices in American English were available, and since we both fancy an interesting experiment, a date was quickly set.
The process of converting an existing text to audio can be split up into a number of steps:
- Select a voice suitable for the text
- Prepare the text for the AI voice
- Create the audiobook
- Check the audiobook for errors
- Correct any errors
- Publish the audiobook
Well, you may say, that sure sounds simple! And indeed, most complex things sound much simpler when split up into their basic elements. Just look at spaceflight: that’s essentially nothing more than a bunch of aluminum parts and some electronics… and a couple of rocket engines, of course!
For this experiment, my experience in handling and fiddling with sound files is virtually zero. To make matters worse, I’ve never done anything remotely audiobookish. Luckily, my friend JP turned out to be extremely handy with both.
The book:
As for the book, that’ll be my contribution. One of my books, ‘The Big Day’, is a short story of just under 80 pages, so it should give us a manageable workload for the day.
The voice:
JP and I spent an hour checking out all the AI voices available for American English at Narakeet. For reading the book, we settled on a male voice. For the introduction and final word, we chose a female voice. We did try a female voice for the book itself, but it did not give the sound I was looking for. So, Ronald’s AI voice it would be.
Some of the voices at Narakeet were very good; others were OK but did not seem to fit the book, according to JP and me of course. Some sounded great for a non-fiction book, which mine is not. This is a personal choice, so you might well choose a different voice than we did.
The tool:
The tool we used for our experiment, Narakeet, can be used for reading small and large texts. You can type or paste in a short text, press the button, and the AI voice starts talking. Alternatively, you can upload a suitable document and ask the AI to read it for you. Note that listening to the AI voice is free, but you have to pay when you want to download the audio file.
Text preparation:
We did some experimenting to find out which words might be troublesome for the AI to pronounce. My book uses some Swedish and Norwegian words, as well as a bunch of names of cities in Europe. To top it off, a word in the Sami language is used. All of this was handled without trouble. Of course, the AI does not pronounce Malmö as a Swede would, and it doesn’t pronounce the Sami word correctly. Instead, the words came out exactly as you would expect from a native speaker of American English.
Abbreviations turned out to be a bit of a challenge. The AI assumed that MB referred to megabyte, which really did not suit the context of the book, not even close. No problem: we just rewrote it with a space in between, as M B, and the AI handled it perfectly, pronouncing it as em-bee.
We entered short samples of text in the Narakeet interface to hear how they came out. Then we adjusted the text to whatever we found worked best. Small pauses between paragraphs and chapters were accomplished by inserting an extra empty line of text. Really predictable and easy to implement.
Creating the audio file:
When we were happy with the above, it was time to upload the manuscript and then press the button to download the audiobook. You get to choose the format you prefer: .mp3, .m4a, .wav, etc. At this point, the site requires you to pay. You can buy time in half-hour chunks, or in larger chunks of several hours each. With half an hour costing about €6, I found the pricing not bad at all.
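To get a feel for the budget beforehand, the pricing above can be turned into a rough calculation. This is only a sketch: the price of about €6 per half hour comes from our experience, while the assumption that you pay per started half hour, the speaking rate of 150 words per minute, and the function names are my own guesses for illustration. Larger chunks may well be priced differently.

```python
import math

# "About €6" per half hour, as we experienced it; treat it as approximate.
PRICE_PER_HALF_HOUR_EUR = 6.0


def estimate_minutes(word_count, words_per_minute=150):
    """Rough narration length. The speaking rate is an assumption, not a Narakeet figure."""
    return word_count / words_per_minute


def estimate_cost(audio_minutes):
    """Cost when buying half-hour chunks, assuming every started half hour is paid in full."""
    chunks = math.ceil(audio_minutes / 30)
    return chunks * PRICE_PER_HALF_HOUR_EUR


minutes = estimate_minutes(15_000)   # a hypothetical 15,000-word story
print(f"{minutes:.0f} minutes, about €{estimate_cost(minutes):.0f}")
```

The word count in the example is made up; plug in the number your word processor reports for your own manuscript.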
Intro and outro:
While you have a lot of space in an ebook or paperback to add narrative text, you probably want to limit that to the absolute essentials in your audiobook. We decided to have a short intro and a short outro, both read by another AI voice, female this time (sorry, I forgot her name…).
Conclusion:
The results of the experiment surprised me in the most positive way possible. Shortly after lunch – we started around 10 am – the audiobook had been generated and downloaded for the review stage.
JP demonstrated great expertise in all aspects of the process of producing the audiobook with the procedure and tools described above. In order to streamline the process, knowledge of an audio editing tool (such as Audacity or similar) can be very handy, something JP showed he could handle very well too.
The result:
As you understand from the above, the job was now almost done. I’ll publish a short bit of the result here when we’re ready. Details on how and where to get the complete audiobook will also be published here shortly.
…a few days later…
While JP reported he had listened to the complete book to get an impression of it, I already knew the book well enough to do my listening while reading the manuscript. I picked up quite a few things that I’d like to improve on, mostly abbreviations and numbers, but also the name of one of the main characters.
By listening carefully to how AI Ronald pronounced the abbreviations and numbers, it was quite easy to figure out how to modify them to get what I wanted. ‘B&Bs’ came out as ‘Bee N Bee S’, so I simply changed it to ‘B n Bees’, which worked splendidly. Ronald pronounced the gun caliber ‘9mm’ as ‘nine millimeters’, which is one ‘s’ too many, so I changed ‘9mm’ to ‘9 millimeter’ and that fixed it.
The name of the character was easily adjusted by changing its spelling so that Ronald’s pronunciation in American English would come much closer to what I had in mind. It worked.
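If you have many such corrections, you could keep them in a small table and apply them to the manuscript in one go instead of editing by hand. The following is a minimal sketch of that idea, not something we actually used. The three spellings are the ones that worked for us with Ronald, while the function name and the whole-word matching rule are my own choices.

```python
import re

# Respellings that worked for us; extend the table for your own manuscript.
REPLACEMENTS = {
    "MB": "M B",
    "B&Bs": "B n Bees",
    "9mm": "9 millimeter",
}


def prepare_for_voice(text, replacements=REPLACEMENTS):
    """Replace each listed abbreviation, but only where it stands alone as a word."""
    # Longest keys first, so a longer abbreviation wins over a shorter one inside it.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        "|".join(rf"(?<!\w){re.escape(key)}(?!\w)" for key in keys)
    )
    return pattern.sub(lambda match: replacements[match.group(0)], text)


print(prepare_for_voice("MB carried a 9mm past the B&Bs."))
```

Because of the whole-word rule, a word such as ‘MBA’ is left untouched. Other voices may need other respellings, so listen to a short sample before converting the whole book.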
With every chapter starting with a title, we wanted Ronald to pause briefly after reading the title before continuing with the content of the chapter. Fortunately, Narakeet has thought of this. If you simply add (pause: X) in the text where you’d like a small break, X being the number of seconds to pause, Ronald stays silent for the indicated time.
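Adding these pause markers by hand is fine for a short story, but for a longer book it could be automated. Here is a small sketch, under a few assumptions of my own: the manuscript is a list of text lines, every chapter title starts with the word ‘Chapter’, and the marker goes on its own line right after the title. Only the (pause: X) notation itself comes from Narakeet.

```python
def add_title_pauses(lines, seconds=2, title_prefix="Chapter "):
    """Return a copy of the lines with a pause marker after every chapter title."""
    result = []
    for line in lines:
        result.append(line)
        # Assumption: a chapter title is any line starting with the given prefix.
        if line.startswith(title_prefix):
            result.append(f"(pause: {seconds})")
    return result


manuscript = ["Chapter 1", "It was still dark when the alarm went off."]
for line in add_title_pauses(manuscript):
    print(line)
```

If your titles look different, change the prefix or replace the check with whatever recognizes them in your manuscript.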
Creating the final audio file:
If you kept track, you’ll notice that the time had now come to create the final audio file. So I sent my corrected manuscript to JP and he made a new audio file, adding a <pling> prior to each chapter.
We listened to these final results once more and they turned out fine. So it was time for the final step: publication.
Publication:
We chose to upload our audiobook to Digibook. Their site allows you to upload a .wav file, but they also accept other formats, which they’ll convert to .wav for you at an extra fee. Uploading your audiobook and your homemade cover file costs around 300 SEK.
Digibook provides extra services at a fee: the conversion to .wav mentioned earlier, but also creating a cover for your audiobook. I simply repurposed the cover file I already had for the ebook and paperback of the same title.
Similar to publishing an ebook, you fill out their form, declare you’re not a robot, pay the bill and you’re in business, or published in our case. A small extra hurdle for us was the file size. Apparently, there’s an upload limit of 300 MB, while our file was considerably larger. But not to worry, Digibook sent us a Dropbox link to use for uploading the file.
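That a .wav file runs into such a limit is no surprise, because the format stores uncompressed audio and its size follows directly from the length of the recording. The sketch below shows the arithmetic. The sample rate, bit depth and channel count are common CD-quality defaults that I am assuming here, not the settings of our actual file, and I am also assuming that the limit counts 1 MB as one million bytes.

```python
UPLOAD_LIMIT_MB = 300  # the limit we ran into


def wav_size_mb(seconds, sample_rate=44_100, bits_per_sample=16, channels=2):
    """Approximate size of uncompressed PCM audio, ignoring the small file header."""
    total_bytes = seconds * sample_rate * (bits_per_sample // 8) * channels
    return total_bytes / 1_000_000


def fits_upload_limit(seconds, **audio_format):
    """True if a recording of this length stays within the upload limit."""
    return wav_size_mb(seconds, **audio_format) <= UPLOAD_LIMIT_MB


one_hour = 60 * 60
print(f"{wav_size_mb(one_hour, channels=1):.0f} MB for one hour in mono")
print(fits_upload_limit(one_hour, channels=1))
```

With these assumed settings, even a single hour of mono audio is already over the limit, so an audiobook of any real length will need another route for uploading.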
With the file uploaded, I now await further distribution to Adlibris, Bokon, Bokus, Nextory, BookBeat and Storytel.
Final conclusion:
The process of getting from a written text to an audiobook seems quite doable. There are excellent tools online that help you get where you want to go, and at reasonable cost. Of course, there are some hurdles to clear, such as handling the .mp3 and .wav files, and getting the audio file mixed and ready for upload. If you’re like me, better at writing than juggling audio files, you just get some help with that bit, e.g. from JP at Podverkstan.
Here’s a sample of my audiobook ‘The Big Day’.
And, if you listened to or read this book, or one of my other books, let me know what you think of it!
Paul
March 2023
Sources:
- https://www.podverkstan.se/produktion-av-ai-ljudbok/ (Swedish)
- https://www.podverkstan.se/ (Swedish)
- https://www.narakeet.com/
- https://www.digibook.se/