Our dodgy twitterbot

July 23, 2009

As a bit of fun, and to get some experience with Python and Google’s App Engine, I decided to write a twitterbot. The requirements were pretty straightforward: don’t be an annoying spambot; be useful.

With those in mind, I came up with a bot that listens for questions asked of it and replies with an answer. It won’t pay any attention to anything else unless you talk directly to it.

Being a Python newbie, I resisted the temptation to dive right in and start hacking away. That way lies madness. Or nasty code, anyway. I downloaded GAEUnit and resolved to TDD the whole thing.

As usual, the design that emerged from the tests was completely different to the one I had in my head to start with. Much more flexible and testable.

I used the Python mock library for a lot of the tests, mocking Google’s urlfetch library, for instance, to test the code that calls Twitter’s API.
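Here is a minimal sketch of that style of test, written with unittest.mock from today's standard library (the successor to the standalone mock library). The function fetch_mentions, its fetch parameter, and the URL are hypothetical names for illustration, not the bot's actual code. Passing the fetch function in as an argument lets the test run without App Engine installed.

```python
import json
from unittest import mock

# Hypothetical endpoint, for illustration only.
MENTIONS_URL = "https://twitter.example/statuses/mentions.json"


def fetch_mentions(fetch):
    """Call the API through the supplied fetch function and decode the JSON body."""
    response = fetch(MENTIONS_URL)
    if response.status_code != 200:
        return []
    return json.loads(response.content)


def test_fetch_mentions():
    # Stand-in for urlfetch.fetch: returns an object exposing
    # status_code and content, like a urlfetch response does.
    fake_fetch = mock.Mock()
    fake_fetch.return_value = mock.Mock(
        status_code=200,
        content='[{"id": 1, "text": "@csausbot where can I get coffee?"}]',
    )

    mentions = fetch_mentions(fake_fetch)

    # The code under test should hit the API exactly once, at the expected URL.
    fake_fetch.assert_called_once_with(MENTIONS_URL)
    assert mentions[0]["text"].startswith("@csausbot")
```

In the real application the fetch argument would presumably be urlfetch.fetch; the test never touches the network.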

The final design consists of a ListenerHandler, which gets called by a cron job every few minutes (currently five minutes, but feedback has been that this is not responsive enough). The listener asks Twitter for a list of all the mentions for a particular user, filters out ones without text or that mention the user without directing a question at it (“check out @csausbot it is oarsum!”), writes them to the datastore, and finally adds a task for each question to the default queue.
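The filtering step might look something like the sketch below. It assumes that "directing a question at it" means the tweet starts with the bot's @name; the actual rule in the bot may differ, and extract_question is a made-up name.

```python
BOT_NAME = "csausbot"  # the bot's screen name, as in the example tweet above


def extract_question(text, bot_name=BOT_NAME):
    """Return the question if the tweet is addressed to the bot, else None."""
    if not text:
        return None  # mentions without any text are dropped
    prefix = "@" + bot_name.lower()
    stripped = text.strip()
    if not stripped.lower().startswith(prefix):
        return None  # the bot is mentioned in passing, not addressed
    rest = stripped[len(prefix):]
    if rest and (rest[0].isalnum() or rest[0] == "_"):
        return None  # a different user whose name merely starts the same way
    question = rest.strip()
    return question or None
```

With this rule, "@csausbot where can I get coffee?" yields a question, while "check out @csausbot it is oarsum!" is ignored.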

Task queues were only introduced into the API a few weeks ago, and are labelled “experimental” by Google. Adding a task to a queue involves specifying a URL to call and a POST payload that defines the task. The API will then call that URL at some point in the future. If the URL returns an HTTP status of 200, the task is taken off the queue.
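To make the contract concrete, here is a toy in-memory model of the behaviour described above. It is not the App Engine API, just a sketch of the idea; ToyTaskQueue, the "/answer" URL and the question_id payload key are all invented for illustration.

```python
from collections import deque


class ToyTaskQueue:
    """In-memory stand-in that mimics the keep-until-200 behaviour."""

    def __init__(self):
        self._tasks = deque()

    def add(self, url, params):
        """Queue a task: a URL to call plus the payload to POST to it."""
        self._tasks.append((url, params))

    def run_once(self, dispatch):
        """Deliver each pending task once; keep those that did not return 200.

        dispatch(url, params) plays the role of the HTTP POST and must
        return a status code. Returns the number of tasks still queued.
        """
        still_pending = deque()
        while self._tasks:
            url, params = self._tasks.popleft()
            if dispatch(url, params) != 200:
                still_pending.append((url, params))
        self._tasks = still_pending
        return len(self._tasks)


queue = ToyTaskQueue()
queue.add("/answer", {"question_id": "42"})  # hypothetical URL and payload
```

A handler that fails simply leaves its task on the queue to be tried again later, which is why the consumer only needs to return 200 once the work is really done.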

I defined an AnswerHandler to be the queue consumer, passing the question ID in the payload. The Answerer loads the question, asks for an answer from an oracle (just a class that has an “answer” function) and replies via the Twitter API.

The Answerer is written so that you can plug in different oracles without having to touch the rest of the code. Our Citysearch oracle simply takes the question and uses it to generate a keyword search URL which it sends to the user, but there’s no reason why you couldn’t do something fancier. You have access to previous questions asked by the user, so you could tailor responses based on past questions or even carry on a conversation.
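A rough sketch of the plug-in idea follows. The only thing taken from the real design is that an oracle is a class with an "answer" function; the class names, the send_reply callback, the search URL and the keyword parameter are all assumptions made for the example.

```python
from urllib.parse import urlencode


class KeywordSearchOracle:
    """Turns a question into a keyword search URL (hypothetical base URL)."""

    def __init__(self, base_url="https://search.example/results"):
        self.base_url = base_url

    def answer(self, question):
        return self.base_url + "?" + urlencode({"keyword": question})


class Answerer:
    """Asks whichever oracle it was given and hands the reply to send_reply."""

    def __init__(self, oracle, send_reply):
        self.oracle = oracle          # anything with an answer(question) method
        self.send_reply = send_reply  # stand-in for the call to the Twitter API

    def handle(self, user, question):
        reply = "@%s %s" % (user, self.oracle.answer(question))
        self.send_reply(reply)
        return reply
```

Swapping in a fancier oracle means writing another class with an answer method and passing it to Answerer; nothing else changes.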

The source code is on GitHub, with all the tests and the App Engine config files. Improvements I’d like to make include using OAuth to access Twitter (currently it uses basic authentication, storing the Twitter credentials in the datastore), perhaps making use of the location information provided with each tweet, and providing mobile-friendly versions of the search results.