This post walks through the steps I took to set up text-to-speech (TTS) in a Rails 4 app. To learn about other options for implementing text-to-speech in web apps generally, see Text-To-Speech Options for Web Apps.
The Twilio blog's tutorial Integrating Twilio With Your Rails 4 App is also worth reading, but it doesn’t cover how to set up Twilio.Device in Rails 4. Twilio.Device is the crucial “conduit” for TTS and for a lot of other Twilio API goodness.
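Before we get started, note that Twilio has to be able to reach your app over the internet: when Twilio.Device connects, Twilio’s servers send a request to your app’s voice endpoint. So you can’t test Twilio.Device against an app that is only running on localhost. Instead, you’ll need to expose your localhost to the internet with a utility such as ngrok. (See step 8 below.)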
In order to set up Twilio TTS in a Rails 4 app, we need to complete eight steps:
- Sign up for Twilio
- Install the twilio-ruby gem
- Add endpoint for Twilio
- Disable CSRF Protection on Twilio endpoint
- Create a TwiML App
- Set up Capability Tokens
- Put Capability Token and text in an HTML form
- Set up ngrok to test
For this walkthrough, I’ll refer to code examples from PoemToday, a Rails 4 app that generates random poems from user input and reads them aloud. Let’s begin.
1) Sign up for Twilio
Easy enough. You’ll need an Account SID (short for “string identifier”) and an Auth Token. Store these keys in secrets.yml rather than hard-coding them, and keep their real values out of any public repo you push to.
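One common way to keep secrets.yml safe to commit is to have it read the keys from environment variables. Rails 4.1 and later run secrets.yml through ERB before parsing the YAML, then expose the section for the current environment as Rails.application.secrets. The sketch below mimics that loading step with Ruby’s standard library so you can see what happens; the key names twilio_account_sid and twilio_auth_token, the environment variable names, and the load_secrets helper are all my own assumptions, not something Twilio or Rails requires.

```ruby
require "erb"
require "yaml"

# Hypothetical secrets.yml: the real keys come from environment variables,
# so the file itself never contains them. Key and variable names are assumptions.
SECRETS_YML = <<~YAML
  development:
    twilio_account_sid: "<%= ENV['TWILIO_ACCOUNT_SID'] %>"
    twilio_auth_token: "<%= ENV['TWILIO_AUTH_TOKEN'] %>"
YAML

# Hypothetical helper mimicking what Rails 4.1+ does with config/secrets.yml:
# evaluate the ERB tags, parse the resulting YAML, then pick the section for
# the requested environment.
def load_secrets(yml, env)
  YAML.safe_load(ERB.new(yml).result).fetch(env)
end
```

With the key names assumed here, the app itself would read the values as Rails.application.secrets.twilio_account_sid and Rails.application.secrets.twilio_auth_token.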
2) Install the twilio-ruby gem
Twilio provides an official Ruby wrapper: add gem 'twilio-ruby' to your Gemfile and run bundle install. We’ll use the gem’s DSL to create TwiML, or Twilio XML, and to create Capability Tokens.
3) Add endpoints for Twilio
We’re going to create a POST endpoint that Twilio can call to tap into our app and receive the text we want read aloud. To do this in PoemToday, I added a voice action to my poems controller.
Then, within PoemsController, I set up the voice action to build a Twilio Response Object with the text we want spoken (sent as params). The method uses Twilio’s Say verb and renders the response as TwiML.
The voice action relies on the render_twiml helper from the Twilio Rails 4 tutorial mentioned above, which we can put into a Webhookable module in app/controllers/concerns.
One way to think about this setup is that we’re creating an endpoint that serves TwiML, Twilio’s own variant of XML, for Twilio to come and read. In effect, we’re creating a private API just for Twilio.
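To make that XML concrete, here is a standard-library sketch of the kind of document the voice endpoint hands back: a Response element wrapping a single Say verb. In the app, the twilio-ruby gem builds this for you (in the gem versions contemporary with Rails 4, via Twilio::TwiML::Response); the xml_escape and say_twiml helpers below are hypothetical stand-ins so the output is easy to inspect. Whatever builds it, the endpoint should be served with an XML content type such as text/xml, and the text must be XML-escaped, since poems can easily contain characters like & or <.

```ruby
# Characters that must be escaped inside XML text and attribute values.
XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&apos;"
}.freeze

# Hypothetical helper: escape user-supplied text before embedding it in TwiML.
def xml_escape(str)
  str.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: the TwiML a voice endpoint returns when it wants
# Twilio to read `text` aloud with the Say verb.
def say_twiml(text)
  %(<?xml version="1.0" encoding="UTF-8"?>) +
    "<Response><Say>#{xml_escape(text)}</Say></Response>"
end

puts say_twiml("Roses & violets <3")
# prints <?xml version="1.0" encoding="UTF-8"?><Response><Say>Roses &amp; violets &lt;3</Say></Response>
```

In PoemToday, the poem text arriving in params from Twilio’s POST would take the place of the literal string here.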
4) Disable CSRF Protection on Twilio endpoint
By default, Rails 4 rejects POST requests that don’t carry a valid authenticity token, to prevent cross-site request forgery (CSRF) attacks. To accomplish this, Rails generates a random token when a form is rendered and then checks the token when the form is submitted. Twilio’s POST doesn’t come from one of our forms, so it can’t include that token; we’ll disable CSRF protection for the voice action only.
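The token check is easier to see in miniature. The CsrfGuard class below is a simplified, hypothetical model of the idea, not Rails’ actual implementation: a random token is stored in the session when a form is rendered, and a POST is accepted only if it echoes that token back. Twilio’s request never passes through a form we rendered, so it can never carry the token. In a Rails 4 controller, skipping the check for one action is typically done with skip_before_action :verify_authenticity_token, only: :voice.

```ruby
require "securerandom"

# Simplified, hypothetical model of CSRF protection (not Rails' actual code).
class CsrfGuard
  def initialize(session)
    @session = session # stands in for the user's session
  end

  # When rendering a form: create (once) and return the token to embed in it.
  def form_token
    @session[:csrf_token] ||= SecureRandom.base64(32)
  end

  # When handling a POST: accept only if the submitted token matches.
  def valid?(submitted)
    expected = @session[:csrf_token]
    return false if expected.nil? || submitted.nil?
    secure_compare(expected, submitted)
  end

  private

  # Constant-time comparison, so response timing doesn't reveal how much matched.
  def secure_compare(a, b)
    return false unless a.bytesize == b.bytesize
    a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
  end
end

guard = CsrfGuard.new({})
token = guard.form_token
guard.valid?(token) # => true: our own form echoes the token back
guard.valid?(nil)   # => false: Twilio's POST carries no token
```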
5) Create a TwiML App
Now that we have a permitted endpoint, we can create a new TwiML App, which is just a set of URLs that tells Twilio what to do when it receives a call via telephone or Twilio.Device.
Create a new TwiML App under Dev Tools in your account and enter the POST endpoint (e.g., http://poemtoday/poems/voice) as the Voice Request URL. After saving, note your new TwiML App’s SID, which we’ll need to create Capability Tokens.
6) Set up Capability Tokens
In order to invoke Twilio.Device, users need a valid Capability Token. Tokens are short-lived (twilio-ruby generates them with a one-hour lifetime by default, and they can be valid for at most 24 hours), and for security reasons it’s better to give each of your users their own token. I actually create a new one on each poem page load.
Tokens can have Incoming or Outgoing Connection capabilities, or both. Since the browser initiates the connection (and Twilio then POSTs to our app), we’ll configure these Capability Tokens with an Outgoing Connection and pass in our TwiML App’s SID, which lets Twilio know where to find our voice endpoint.
Note that we’re assigning the token to @token, an instance variable on the controller, so that it’s available in the view.
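Under the hood, a Capability Token is a JSON Web Token (JWT) signed with your Auth Token; in the twilio-ruby versions from the Rails 4 era, Twilio::Util::Capability (allow_client_outgoing, then generate) builds it for you. The stdlib sketch below shows roughly what that involves, which is also why the Auth Token must stay secret: anyone holding it could mint tokens for your account. Treat the claim layout as an approximation of Twilio’s format rather than a spec, and the capability_token helper as hypothetical; in the app, use the gem.

```ruby
require "json"
require "openssl"

# Base64url without padding, as JWTs use.
def base64url_encode(bytes)
  [bytes].pack("m0").tr("+/", "-_").delete("=")
end

# Hypothetical helper approximating a Twilio Capability Token with an
# Outgoing Connection capability: an HS256 JWT whose "scope" claim names the
# TwiML App SID, issued by the Account SID and signed with the Auth Token.
def capability_token(account_sid:, auth_token:, app_sid:, ttl: 3600, now: Time.now.to_i)
  header  = { "typ" => "JWT", "alg" => "HS256" }
  payload = {
    "scope" => "scope:client:outgoing?appSid=#{app_sid}",
    "iss"   => account_sid,
    "exp"   => now + ttl # Unix time after which the token stops working
  }
  signing_input = [header, payload].map { |part| base64url_encode(JSON.generate(part)) }.join(".")
  signature = OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), auth_token, signing_input)
  "#{signing_input}.#{base64url_encode(signature)}"
end
```

Generating a fresh token per page load, as PoemToday does, keeps each token’s lifetime short without any extra bookkeeping.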
7) Put Capability Token and text in an HTML form
As we saw above in PoemsController, our Twilio Response Object accepts the text to be spoken as params, which we can send via form submission. We can use HTML data attributes to pass the Capability Token and the text.
PoemToday actually uses a hidden form, with the volume-up icon from Font Awesome as the submit button.
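Here is a minimal, hypothetical version of that markup, rendered with Ruby’s standard-library ERB so it runs outside Rails. The form carries the token and the poem text in data-token and data-text attributes (my names, not necessarily PoemToday’s), and the submit button uses Font Awesome 4’s fa fa-volume-up icon. Both values go through html_escape, which matters because poem text can contain quotes that would otherwise break out of the attribute. In a real Rails view you’d more likely use tag helpers with a data: hash, which escape for you.

```ruby
require "erb"

# Hypothetical markup: the Capability Token and the text to read ride along
# in data- attributes for the page's JavaScript to pick up on submit.
SPEAK_FORM = ERB.new(<<~HTML)
  <form class="speak-form" data-token="<%= ERB::Util.html_escape(token) %>" data-text="<%= ERB::Util.html_escape(text) %>">
    <button type="submit"><i class="fa fa-volume-up"></i></button>
  </form>
HTML

# Renders the template with `token` and `text` taken from this method's binding.
def speak_form(token, text)
  SPEAK_FORM.result(binding)
end

puts speak_form("eyJ0eXAi.example.sig", %q(She said "hello" & left))
```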
Twilio’s TTS works through Twilio.Device, an object provided by the twilio.js library. Twilio.Device serves as the main entry point for connecting with Twilio. For TTS, a connection can be understood as both a telephone call and an API call: Twilio.Device connects, sends the relevant text to Twilio, keeps the connection open while it receives an audio reading of the text, and then disconnects (hangs up).
Twilio.Device is only available on pages that have twilio.js. PoemToday is mostly poem pages, so I added a
#poem-content). The script also disables the button while the connection is active.
8) Set up ngrok to test
As mentioned above, Twilio has to reach your app over the internet. To test, you’ll need to make localhost accessible via the internet with a utility such as ngrok. Twilio provides a good tutorial on how to set up ngrok.
Be sure to create a second TwiML App for testing whose Voice Request URL uses your currently running ngrok address as its root domain, and use that app’s SID when generating Capability Tokens while you test.
That’s it! If you’ve completed all the steps above, you should be good to go with adding text-to-speech to your Rails 4 app.
If you run into any trouble or have any questions, feel free to ask me for help in the comments below or message me on Twitter.