Today I’m introducing PoemToday, a random poem generator that’s powered by user behavior. Here is the story of how it works and how I made it.
PoemToday (also on GitHub) is a Rails web app that statistically generates a poem based on each user’s profile and behavior. Every word in each of the site’s 6,000 poems is wrapped in a link. When a user clicks a word, the site searches its database for the best-matching poem and redirects the user to the top result, along with the top image the Flickr API returns for that word.
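As a rough illustration of the word-linking idea (a minimal sketch, not the app's actual view code), the snippet below wraps each word of a line in an anchor tag. The `link_words` helper and the `/search` route with its `word` parameter are hypothetical names chosen for this example.

```ruby
require "cgi"

# Wrap every word in a line of poetry in a link to a search route.
# NOTE: link_words, "/search" and the "word" parameter are illustrative
# assumptions, not names taken from PoemToday itself.
def link_words(line, search_path: "/search")
  # Split into alternating runs of word characters and everything else.
  line.scan(/[[:alpha:]']+|[^[:alpha:]']+/).map do |token|
    if token.match?(/\A[[:alpha:]]/)
      href = "#{search_path}?word=#{CGI.escape(token.downcase)}"
      %(<a href="#{href}">#{CGI.escapeHTML(token)}</a>)
    else
      # Punctuation and whitespace pass through, HTML-escaped.
      CGI.escapeHTML(token)
    end
  end.join
end

puts link_words("Hope, is the thing with feathers")
```

In a Rails view the same idea would more likely be expressed with `link_to`, but the plain-Ruby version keeps the example self-contained.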
Behind the scenes, PoemToday stores the words the user has clicked and the poems the user has visited in a temporary session. When the session has enough data, the app generates a completely unique “ephemeral” poem using a statistical process known as a Markov chain.
simplenlg vs. Markov chains
Initially I tried to implement a linguistics strategy for randomly generating poems. If a user had clicked three nouns, I figured, how difficult would it be to find a verb and then create a semi-sensical sentence? Using the Wordnik API and a ruleset I could fill in the missing parts of speech.
I spent a while building a Natural Language Generator without realizing that was what I was doing. Then I stumbled on simplenlg, both the Java API and the Ruby gem, which opened up a whole new tangent of linguistic programming to me. I really love language (why else would I make a random poetry app?? ;) and this intersection of code and linguistics was an especially exciting turn in the project.
With user-selected keywords, simplenlg, and the Wordnik API’s random words endpoint, which features random word generation by part of speech as well as frequency, I was able to weave together some fairly compelling random poems.
Then, in my research, I came across Markov chains, which in the first few command-line tests quickly proved much more compelling than my patchwork NLG solution. The Markov chains were uncanny and required users to click far fewer keywords to produce better-sounding random poetry. It was a much lower bar to “minimum emotional product”.
The most fascinating aspect of Markov chains to me is that they know nothing about parts of speech or linguistics. They are just pure mathematics. And yet they sound so human.
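To make the "pure mathematics" point concrete, here is a minimal sketch of a first-order, word-level Markov chain. It is an illustration under my own assumptions (the `build_chain` and `generate` names, the sample corpus, and the choice of order one are all hypothetical), not PoemToday's actual generator.

```ruby
# Build a first-order Markov chain: each word maps to the list of words
# that followed it somewhere in the source text. Duplicates are kept, so
# frequent successors are proportionally more likely to be sampled.
def build_chain(text)
  chain = Hash.new { |hash, key| hash[key] = [] }
  text.split.each_cons(2) { |current, following| chain[current] << following }
  chain
end

# Walk the chain from a seed word, sampling one successor at each step.
# Stops early if the current word never had a successor in the source.
def generate(chain, seed, max_words: 12, rng: Random.new)
  words = [seed]
  while words.length < max_words
    successors = chain.fetch(words.last, [])
    break if successors.empty?
    words << successors.sample(random: rng)
  end
  words.join(" ")
end

# Hypothetical corpus standing in for the poems a user has visited.
corpus = "the sea is calm tonight the tide is full the moon lies fair"
chain = build_chain(corpus)
puts generate(chain, "the")
```

Nothing here knows what a noun or a verb is. The chain only records which word followed which, and every generated pair of adjacent words is one that really occurred in the source text.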
PoemToday also features a daily email option. The app will email you a poem every morning, along with information about why that poem was matched to you on that day.
The app connects to APIs from Twitter, The New York Times, and Forecast.io as sources of user profile data. It also uses the Wordnik API to score the keywords from each data source by frequency. Wordnik’s frequency data goes back to 1800, but I limited it to 2000 onward to reflect contemporary word usage.
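For illustration, a frequency lookup restricted to recent years might build its request like this. This is a sketch under assumptions: it relies on my understanding that Wordnik's v4 API exposes a per-word `frequency` endpoint that accepts a `startYear` parameter, and the `wordnik_frequency_uri` helper is a hypothetical name. Check the current Wordnik documentation before relying on it.

```ruby
require "erb"
require "uri"

# Build the URL for a Wordnik word-frequency request limited to recent years.
# ASSUMPTION: the endpoint path and the startYear/api_key parameters reflect
# my reading of the Wordnik v4 API; the helper name is illustrative only.
def wordnik_frequency_uri(word, api_key:, start_year: 2000)
  query = URI.encode_www_form(startYear: start_year, api_key: api_key)
  encoded_word = ERB::Util.url_encode(word)
  URI("https://api.wordnik.com/v4/word.json/#{encoded_word}/frequency?#{query}")
end

puts wordnik_frequency_uri("feathers", api_key: "YOUR_API_KEY")
```

The sketch only constructs the URL. Fetching it (for example with `Net::HTTP.get`) and parsing the JSON response are left out to avoid guessing at the response shape.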
I treated keyword frequency like golf scores – the lower, the better, so that rare words percolate to the top of poem matches and users’ inboxes. You can get a good sense of how the user-to-poem daily matching algorithm works from this Gist.
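The golf-score idea can be sketched in a few lines. This is a simplified illustration and not the code from the Gist: the `rank_keywords` and `match_poem` helpers, the frequency counts, and the poems are all made up for the example.

```ruby
# Order keywords golf-style: lowest frequency count first,
# with ties broken alphabetically so the result is deterministic.
def rank_keywords(frequencies)
  frequencies.sort_by { |word, count| [count, word] }.map(&:first)
end

# Try keywords from rarest to most common and return the first poem that
# contains one, along with the keyword so the email can explain the match.
def match_poem(poems, frequencies)
  rank_keywords(frequencies).each do |keyword|
    poem = poems.find do |candidate|
      candidate[:body].downcase.split(/[^[:alpha:]']+/).include?(keyword)
    end
    return { poem: poem, keyword: keyword } if poem
  end
  nil
end

# Hypothetical data: keyword => frequency count (lower means rarer).
frequencies = { "rain" => 5400, "gossamer" => 12, "city" => 9800 }
poems = [
  { title: "Commute", body: "The city hums in rain" },
  { title: "Web",     body: "A gossamer thread at dawn" }
]

match = match_poem(poems, frequencies)
puts "#{match[:poem][:title]} (matched on '#{match[:keyword]}')"
```

Because the rarest keyword is tried first, a poem containing an unusual word wins over one that only shares common words with the user's data.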
I also used a Postgres search gem, pg_search, to quickly search through all of the poems and their content. This was a lightweight option versus implementing a platform such as Solr or EC2.
PoemToday was one of my most technically ambitious projects to date. Judging from the fascinated reactions of a few dozen beta users, I’d say PoemToday is on a path to success, and I look forward to continuing to build it.