RingCentral Game Changers: Transcription with Rev.ai

Today we learn how to automatically transcribe incoming audio messages into text using Rev.ai. This will allow us to leverage text-based categorization and sentiment analysis tools!

This article is part of a month-long series aimed at learning and exercising the RingCentral APIs in Python as part of their new Game Changers challenge. Feel free to follow along, leave a comment, or even participate in the challenge yourself!

Yesterday, we talked a great deal about the MonkeyLearn API and how it can be used for topic extraction or sentiment analysis of text-based messages. Unfortunately, this text limitation means we can’t directly leverage MonkeyLearn to categorize and filter incoming voice messages. We have to convert the audio into text first!

Once upon a time, I used Rev.com to translate some personal documents from English into another language so I could file them with the government of another country. It was fast, easy, and relatively inexpensive to schedule the translation job on their platform. Recently, the team has expanded their services and now provides audio-to-text “translation” through the Rev.ai platform.

Setup

Once again, the very first step is to create a free account on Rev.ai. According to the platform’s pricing, the first 5 hours of transcription are free and no credit card is required to sign up, so let’s get started. Once you sign up, the very first thing you can do is upload a file to try a direct transcription.

Rev.ai Dashboard

To get a better idea of how the system works, let’s try it with the audio from a recent YouTube video I posted about CoderCruise. It’s only a 3-minute video, and the extracted audio file is about 3 MB. It’s a great quick test, though I truly hope my customers never leave voice messages this long. Uploading the file takes you immediately to the Recent Jobs screen, where you can see the job processing. After a few minutes it completes, and the transcript is available in either text or JSON format.

Transcript of my CoderCruise testimonial video.

It’s not perfect, but the speech-to-text output is close enough that we can pipe the transcript through our AI classifier and gain some insight into the contents without needing to listen to the audio itself.

Wiring the API

Just like before, we’ll use Python for our API interactions. The first step is to install the Rev.ai Python SDK:

pip install rev_ai

Once the SDK is installed, we can submit a job programmatically, just like we did through the web console:[ref]There is a capability to submit hosted files directly. I’m not standing up a web server to host this audio for now, but moving forward we can have Rev.ai download our voice messages directly from RingCentral and avoid storing the raw audio anywhere else.[/ref]

from rev_ai import apiclient
access_token = 'your_access_token'

# Create client with your access token
client = apiclient.RevAiAPIClient(access_token)

# Submit a local audio file; the call returns a job handle right away
file_job = client.submit_job_local_file(
    filename="/home/ericmann/codercruise.m4a",
    metadata="CoderCruise 2019 testimonial",
    skip_diarization=False,
)

The output of this operation is, again, a job handle with an in-progress status. We can use the API to query the job status, polling until it completes, or supply a callback URL so the API will tell us when it’s done.[ref]Our full voicemail assistant will leverage a callback URL so we can directly link the transcript output from Rev.ai to our classifier with MonkeyLearn.[/ref]
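The polling option can be sketched as a small loop. This is a minimal sketch rather than anything the SDK ships: `wait_for_job` and its parameters are hypothetical names, and it assumes the status is reported as one of the strings `in_progress`, `transcribed`, or `failed`. The status lookup is passed in as a function so the loop stays independent of any particular client.

```python
import time


def wait_for_job(get_status, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call get_status() until the job leaves 'in_progress' or the timeout passes.

    get_status is any zero-argument function returning the job status as a
    string. The status names used here are assumptions about what the
    service reports.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status != "in_progress":
            return status  # e.g. "transcribed" or "failed"
        if waited >= timeout:
            raise TimeoutError("job still in progress after %.0f seconds" % waited)
        sleep(poll_interval)
        waited += poll_interval
```

With the client from the snippet above, the lookup could be something like `lambda: client.get_job_details(file_job.id).status.name.lower()`, assuming the job status is exposed as an enum whose names match those strings.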

For the moment, it’s enough to refresh the Recent Jobs web console and see the new job queued up for processing. Once it’s done, we can retrieve either the text or JSON transcript of the audio with another API call:

from rev_ai import apiclient
access_token = 'your_access_token'
job_id = 'your_job_id'

# Create client with your access token
client = apiclient.RevAiAPIClient(access_token)

# Get transcript as text
transcript_text = client.get_transcript_text(job_id)
print(transcript_text)
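The JSON form of the transcript is the more useful one when it matters who said what. As a rough sketch, the function below flattens a transcript into one line per speaker turn. It assumes the JSON layout is a list of `monologues`, each with a `speaker` number and a list of `elements` carrying a `value`; `transcript_to_lines` and the sample data are made up for illustration, so check a real transcript before relying on it.

```python
def transcript_to_lines(transcript):
    """Turn a transcript dict into 'Speaker N: text' lines, one per monologue."""
    lines = []
    for monologue in transcript.get("monologues", []):
        # Each element is a word or a piece of punctuation/spacing; joining
        # the values in order rebuilds the spoken sentence.
        text = "".join(element["value"] for element in monologue.get("elements", []))
        lines.append("Speaker %s: %s" % (monologue.get("speaker"), text.strip()))
    return lines


# Illustrative sample only, not real output from the service.
sample = {
    "monologues": [
        {
            "speaker": 0,
            "elements": [
                {"type": "text", "value": "Hello"},
                {"type": "punct", "value": ", "},
                {"type": "text", "value": "world"},
                {"type": "punct", "value": "."},
            ],
        }
    ]
}

for line in transcript_to_lines(sample):
    print(line)
```

In practice the dict would come from the SDK rather than a literal, for example `client.get_transcript_json(job_id)`, and the resulting lines could be handed to the classifier one turn at a time.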

Next Steps

Now the various building blocks of our virtual voicemail assistant are starting to take shape. The assistant can receive voice messages from customers. We can convert those messages to text. An AI platform can automatically categorize those messages for us. The next step in the process will be to act on those categorizations as they come in!