This post walks through the steps I took to set up text-to-speech in a Rails 4 app. To learn about other options for implementing text-to-speech in web apps generally, see Text-To-Speech Options for Web Apps.
Also, the Twilio blog has a tutorial worth reading, Integrating Twilio With Your Rails 4 App, but it does not cover setting up the crucial Twilio.Device, the “conduit” for TTS and a lot of other Twilio API goodness, in Rails 4.
Before we get started, note that Twilio.Device only works when connected to the internet, and Twilio has to be able to reach your app's endpoint. So you can’t test Twilio.Device from localhost alone. Instead, you’ll need to expose your localhost to the internet with a utility called ngrok. (See step 9 below.)
In order to set up Twilio TTS in a Rails 4 app, we need to complete 10 steps.
- Sign up for Twilio
- Install the twilio-ruby gem
- Add endpoint for Twilio
- Disable CSRF Protection on Twilio endpoint
- Create a TwiML App
- Set up Capability Tokens
- Put Capability Token and text in an HTML form
- Add Twilio.Device javascript
- Set up ngrok to test
- Deploy
For this walkthrough, I’ll refer to code examples from PoemToday, a Rails 4 app that generates random poems from user input and reads them aloud. Let’s begin.
1) Sign up for Twilio
Easy enough. You’ll need an Account SID (short for “string identifier”) and an Auth Token. Remember to hide these keys in secrets.yml if you’re pushing up to a public repo.
2) Install the twilio-ruby gem
Twilio provides an official Ruby wrapper. We’ll use the DSL from the gem to create TwiML (the Twilio Markup Language, an XML dialect) and to create Capability Tokens.
3) Add endpoint for Twilio
We’re going to create a POST endpoint for Twilio to tap into our app and receive the text we want read aloud. To do this in PoemToday, I added a voice action on my poems controller.
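A route for this endpoint might look like the sketch below. The poems/voice path is inferred from the Voice Request URL used in step 5; the exact routing style used in PoemToday is an assumption.

```ruby
# config/routes.rb (sketch; assumes a PoemsController with a `voice` action)
Rails.application.routes.draw do
  # Twilio will POST to /poems/voice when a Twilio.Device connection opens.
  post "poems/voice", to: "poems#voice"
end
```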
Then, within the PoemsController, I set up the voice action to build a Twilio Response Object with the text we want spoken (sent as params). The method uses Twilio’s Say verb and renders the response as TwiML.
This code makes use of the set_header and render_twiml helper methods from the Twilio Rails 4 tutorial, which we can put into a Webhookable module in Concerns.
One way to think about this code is that we’re creating an endpoint in a special variant of XML for Twilio to come and read. We’re creating a private API for Twilio.
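To make that "special variant of XML" concrete, here is a minimal plain-Ruby sketch of the kind of document the voice action would hand back: a Response element wrapping a Say verb. In the app itself the twilio-ruby gem builds this for you; the say_twiml helper name below is hypothetical and only stands in for the gem so the output is easy to see.

```ruby
# Builds the TwiML document that a `voice` action would render.
# Plain-Ruby stand-in for the twilio-ruby builder, for illustration only;
# `say_twiml` is a hypothetical helper, not part of any Twilio library.
def say_twiml(text)
  # Escape &, < and > so user-supplied poem text cannot break the XML.
  escaped_text = text.to_s.encode(xml: :text)

  %(<?xml version="1.0" encoding="UTF-8"?>) +
    "<Response><Say>#{escaped_text}</Say></Response>"
end

# In the controller, this body would be served with a text/xml Content-Type.
puts say_twiml("Shall I compare thee to a summer's day?")
```

Whatever text arrives in params ends up inside the Say element, which is why escaping it matters.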
4) Disable CSRF Protection on Twilio endpoint
By default, Rails 4 blocks third parties from POSTing, so as to prevent CSRF attacks. To accomplish this, Rails generates a random token when a form is created and then checks the token when the form is submitted. We want to accept a POST from Twilio, so we’ll disable CSRF protection for the voice controller action.
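In a Rails 4 controller, the opt-out is typically a one-liner such as skip_before_action :verify_authenticity_token, only: [:voice]. To show what that exemption actually changes, here is a simplified plain-Ruby model of the token check described above. The CsrfGuard class and its method names are hypothetical and this is not Rails' real implementation, just the idea in miniature.

```ruby
require "securerandom"

# Simplified model of token-based CSRF protection; not Rails' actual code.
class CsrfGuard
  def initialize(skip_actions: [])
    @skip_actions = skip_actions
  end

  # Called when a form is rendered: remember a random token in the session
  # and return it so it can be embedded in the form as a hidden field.
  def issue_token(session)
    session[:csrf_token] ||= SecureRandom.base64(32)
  end

  # Called on every POST: allow it if the action opted out, or if the
  # submitted token matches the one stored in the session.
  def allow_post?(action, session, submitted_token)
    return true if @skip_actions.include?(action)

    expected = session[:csrf_token]
    # Real implementations compare in constant time; == keeps the sketch short.
    !expected.nil? && submitted_token == expected
  end
end

guard   = CsrfGuard.new(skip_actions: [:voice])
session = {}
token   = guard.issue_token(session)

guard.allow_post?(:create, session, token) # our own form, token matches
guard.allow_post?(:create, {}, nil)        # third party without a token
guard.allow_post?(:voice, {}, nil)         # Twilio's POST to the exempt action
```

Twilio never sees our forms, so it has no token to send back; exempting only the voice action keeps the check in place everywhere else.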
5) Create a TwiML App
Now that we have a permitted endpoint, we can create a new TwiML App, which is just a set of URLs that tells Twilio what to do when it receives a call via telephone or Twilio.Device.
Create a new TwiML App under Dev Tools in your account and enter the POST endpoint (i.e., http://poemtoday/poems/voice) as the Voice Request URL. After saving, note your new TwiML App’s SID, which we’ll need to create Capability Tokens.
6) Set up Capability Tokens
In order to invoke Twilio.Device, users need to have a valid Capability Token. Tokens expire (Twilio allows a lifetime of up to 24 hours), and it’s better for security reasons to give each of your users their own token. I actually create one on each poem page load.
Tokens can have Incoming or Outgoing Connection capabilities, or both. Since we want Twilio to POST to our app, we’ll configure these Capability Tokens with an Outgoing Connection and pass in our TwiML App’s SID, which lets Twilio know where to find our voice endpoint.
Note that we’re creating @token, an instance variable on the controller, so that we can pass it to the view.
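For a sense of what ends up in that token, the sketch below hand-builds something shaped like a Capability Token using only Ruby's standard library: a JWT signed with the Auth Token whose scope names the TwiML App's SID. The claim names and scope string reflect how twilio-ruby appears to build these tokens and should be treated as assumptions; the function name and the placeholder SIDs are hypothetical. In the app you would let the gem generate the token rather than rolling your own.

```ruby
require "json"
require "openssl"

# URL-safe Base64 without padding, as used by JWTs.
def base64url(bytes)
  [bytes].pack("m0").tr("+/", "-_").delete("=")
end

# Hypothetical helper: builds an HS256-signed JWT granting an outgoing
# connection to one TwiML App. Claim layout is an assumption (see above).
def capability_token_sketch(account_sid:, auth_token:, app_sid:, ttl: 3600, now: Time.now)
  header  = { "typ" => "JWT", "alg" => "HS256" }
  payload = {
    "iss"   => account_sid,                               # who issued the token
    "exp"   => now.to_i + ttl,                            # expiry, seconds since epoch
    "scope" => "scope:client:outgoing?appSid=#{app_sid}"  # which TwiML App to call
  }

  signing_input = [header, payload].map { |part| base64url(JSON.generate(part)) }.join(".")
  signature = OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), auth_token, signing_input)

  "#{signing_input}.#{base64url(signature)}"
end

# Placeholder credentials; real values come from your Twilio account.
token = capability_token_sketch(
  account_sid: "ACxxxxxxxx",
  auth_token:  "not-a-real-auth-token",
  app_sid:     "APxxxxxxxx"
)
puts token
```

Because the signature depends on the Auth Token, the token has to be generated server-side; the browser only ever receives the finished string.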
7) Put Capability Token and text in an HTML form
As we saw above in PoemsController, our Twilio Response Object accepts the text to be spoken as params, which we can send via form submission. We can use an HTML data attribute to pass the @token to JavaScript.
PoemToday actually uses a hidden form and the volume-up icon from Font Awesome for a submit button.
8) Add Twilio.Device javascript
Twilio’s TTS works through Twilio.Device, an API object that serves as the main entry point for connecting with Twilio. For TTS, a connection can be understood as both a telephone call and an API call. Twilio.Device connects, sends the relevant text to Twilio, holds open a port which receives an audio reading of the text, and then disconnects (hangs up).
Twilio.Device is only available on pages that have twilio.js. PoemToday is mostly poem pages, so I added a javascript_include_tag to my layout.
Finally, we’re ready to add the JavaScript that will invoke Twilio.Device with the user’s token and pass the text we want read aloud (i.e., #poem-content). The code below disables the button while the connection is active.
9) Set up ngrok to test
As mentioned above, Twilio.Device only works when connected to the internet. To test, you’ll need to make localhost accessible via the internet with a utility such as ngrok. Twilio provides a good tutorial on how to set up ngrok.
Be sure to create a second TwiML App with your currently running ngrok address as your root domain. It should look something like http://3eb3c9ba.ngrok.com/poems/voice.
10) Deploy
That’s it! If you’ve completed all the steps above, you should be good to go with adding text-to-speech to your Rails 4 app.
If you run into any trouble or have any questions, feel free to ask me for help in the comments below or message me on Twitter.