Part One – Getting to Grips with GNIP
Okay, so now I suppose I’d better start doing some geek stuff, as it’s been far too long. What I’m going to try to accomplish is to build a simple Twitter client using PHP and GNIP, with as little help from the Twitter API as possible. Sure, this may fail badly, but we’ll learn something along the way. Why Twitter? Well, I do lots of stuff with Twitter data, so why not. But I also want to demonstrate the power of GNIP to you, as well as showing that there are ways to be friendly to Twitter & other providers without having to hammer their APIs.
A brief bit about GNIP. GNIP is a really cool service that allows publishers to push their content to the GNIP network. From there we, as developers or consumers of that data, can access it and do some pretty clever things like create filters on it, or have it pushed to a script somewhere on the internet. This means there has to be very little polling done on our part.
The really cool thing here is that we are not hammering the Twitter API. Sure, we’ll still use it a little, but now we’re not exclusively tied to it. Really, the guys at GNIP are a fantastic & friendly bunch and can describe their service much better than I can, so pop along and check them out.
There are a few things you are going to need in order to build this client. These are:
1. A valid GNIP account
2. A PHP 5 enabled web server accessible over the internet.
3. MySQL DB server (any version is fine, but I’ll be using 5.1RC as it’s my new plaything at present)
4. A valid Twitter account
Theory of Operation
Twitter provide GNIP with a direct feed of their XMPP traffic. From there GNIP perform some XML transforms on the data to make it easier for us to consume. As we have a GNIP account, we’re going to create a filter that will send all the tweets our friends make to a script sitting on our web server. Once we get the data, we’ll create an XML object in PHP to manipulate it and hopefully store the contents in a small DB. We can then do what we like with the data we’ve just stored. Perhaps we wish to count it, or maybe send it via email, SMS, Jabber etc.
Getting GNIP Sorted
Signing up for GNIP is easy to do. Pop along to http://gnipcentral.com/ and click the “Create Account” link at the top of the page. It’s a nice small form that does not ask for too many details, so fill it in. At this point I have to mention their very cool auth system in the background. Although I can’t confirm it, I think Amazon’s SimpleDB is somewhere in there. Pretty cool I think; I’ve been playing on the quiet with something in that space not too long ago.
Play Time
Okay, so with all of the above done, it’s time we had some fun with code. We’re not going to build the client just yet. We’re just going to play a little with the data that GNIP are sending to us and log it to a file so you get an idea of the workings.
The first thing we’ll need is a PHP script on our server that GNIP can access and send us the data we need. This is going to be pretty simple and just log the data.
<?php
// Define a log file where we can write our
// twitter data to.
define('LOGFILE', '../var/logs/tweet.log');

// Get the raw post from GNIP, this should be raw XML.
// php://input is used because $HTTP_RAW_POST_DATA depends on
// ini settings, is deprecated in PHP 5.6 and removed in PHP 7.
$data = file_get_contents('php://input');

// Open a log file to write data to
$logFile = fopen(LOGFILE, 'a+');
if ($logFile === false)
{
    // Without a log file there is nothing useful we can do
    exit(1);
}

// We'll just put this all in a try catch block to catch
// any exceptions SimpleXML may throw at us.
try
{
    // Create a new SimpleXML object that represents
    // the data GNIP have sent us.
    $xmlObj = new SimpleXMLElement($data);

    // Get the twitter user, source, status url, tweet etc
    // (this only looks at the first activity in the document)
    $twitterUser      = $xmlObj->activity->attributes()->actor;
    $twitterTime      = $xmlObj->activity->attributes()->at;
    $twitterSource    = $xmlObj->activity->attributes()->source;
    $twitterStatusUrl = $xmlObj->activity->attributes()->url;
    $twitterTweet     = $xmlObj->activity->payload->body;
    $twitterRaw       = $xmlObj->activity->payload->raw;

    // Create a log message that represents the tweet
    $logMsg = date('Y-m-d') . " - $twitterUser said $twitterTweet at $twitterTime using $twitterSource. View it at $twitterStatusUrl \n";

    // And write the message to the log.
    fwrite($logFile, $logMsg);
}
catch (Exception $e)
{
    // Log the caught exception
    fwrite($logFile, "Fail Whale \n");
    fwrite($logFile, $e->getTraceAsString() . "\n");
}

// Close the log file
fclose($logFile);
?>
Yes, for the moment there is no filtering or security checking on this script. It’s just so you get an idea of what is going on. You should be able to follow the above easily enough; let me know if you get stuck anywhere. GNIP send us the data as raw XML that looks similar to the below:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<activities publisher="twitter">
<activity source="web" url="http://twitter.com/statuses/show/963150866.xml" action="notice" actor="ajmckee" at="2008-10-17T00:37:02.000Z">
<payload>
<body></body>
<raw></raw>
</payload>
</activity>
</activities>
The activity element has many attributes. But don’t let that put you off, in fact they are really handy to have. They translate to the following;
1. Source = The twitter client that made the request
2. URL = The URL Twitter have assigned the status update in XML format
3. Action = The Twitter action of tweeting. This is called notice, following the DiSo standard.
4. Actor = The Twitter username of the person who made the tweet
5. At = The time the tweet was made
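To make that mapping a bit more concrete, here is a minimal sketch that pulls those attributes out of a document shaped like the sample shown earlier. The function name activitiesToArrays and the tweet text in the sample are made up for illustration, and looping over every activity is my own assumption that a single POST might carry more than one.

```php
<?php
// Hypothetical helper: turn a GNIP-style <activities> document into plain arrays.
function activitiesToArrays($xml)
{
    $doc  = new SimpleXMLElement($xml);
    $rows = array();
    // Assumption: one POST may contain several <activity> elements.
    foreach ($doc->activity as $activity) {
        $rows[] = array(
            'actor'  => (string) $activity['actor'],
            'at'     => (string) $activity['at'],
            'source' => (string) $activity['source'],
            'url'    => (string) $activity['url'],
            'action' => (string) $activity['action'],
            'body'   => (string) $activity->payload->body,
        );
    }
    return $rows;
}

// Sample data; the body text is invented for this example.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<activities publisher="twitter">
<activity source="web" url="http://twitter.com/statuses/show/963150866.xml" action="notice" actor="ajmckee" at="2008-10-17T00:37:02.000Z">
<payload>
<body>Made-up tweet text</body>
<raw></raw>
</payload>
</activity>
</activities>
XML;

$rows = activitiesToArrays($sample);
echo $rows[0]['actor'] . ' via ' . $rows[0]['source'] . "\n"; // prints "ajmckee via web"
```

Casting each value to a string means you end up with ordinary PHP arrays rather than SimpleXML objects, which tends to be easier to pass around or store later.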
There is also the child element payload, which itself has two further child elements. The first, body, represents the contents of the tweet itself. The second, and a very important one later on, is raw. Raw represents the raw data GNIP got from Twitter themselves. It’s even more XML that has been gzip’d and base64’d. We won’t worry about this just yet, however, as we just want to play a little here to understand what is going on.
Once you have the script uploaded to your server, make sure the log file is writable by your server software, otherwise you won’t see anything.
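For the curious, unwrapping raw would probably look something like the sketch below. I’m assuming the data really is in gzip format (a zlib or raw deflate stream would need gzuncompress() or gzinflate() instead), the helper name decodeRawPayload is made up, and the test data is invented rather than anything Twitter actually sent.

```php
<?php
// Hypothetical helper: reverse the base64 + gzip wrapping described for <raw>.
// Assumption: the data is in gzip format. Needs the zlib extension
// and PHP 5.4 or later for gzdecode().
function decodeRawPayload($raw)
{
    // Strict mode makes base64_decode() return false on invalid characters.
    $binary = base64_decode(trim($raw), true);
    if ($binary === false) {
        return false;
    }
    // gzdecode() returns false (with a warning) if the data is not valid gzip.
    return @gzdecode($binary);
}

// Round trip with made-up data standing in for a real payload.
$original = '<status><text>Made-up tweet text</text></status>';
$wrapped  = base64_encode(gzencode($original));

var_dump(decodeRawPayload($wrapped) === $original); // bool(true)
```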
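If you are not sure about the permissions, a quick check along these lines can save some head scratching. It’s only a sketch: the helper name logIsWritable and the path are placeholders, and it only tells you something useful when it runs as the same user as your web server.

```php
<?php
// Hypothetical helper: report whether a log file can be appended to.
function logIsWritable($path)
{
    if (file_exists($path)) {
        return is_writable($path);
    }
    // The file does not exist yet, so the directory must allow creating it.
    return is_writable(dirname($path));
}

// Placeholder path for illustration; substitute your real log location.
$path = sys_get_temp_dir() . '/tweet.log';
echo logIsWritable($path) ? "log is writable\n" : "log is NOT writable\n";
```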
Create a GNIP Filter.
Log into the GNIP interface and you’ll be presented with a list of documents you will need to read after this. Scroll down to the publishers list at the bottom of the screen and there you will see Twitter listed (This would work for any publisher you wish to integrate with). Click on the “Create Filter” link.
This is where you’re going to tell GNIP what data you want captured and where you wish to send it. Give your filter an easy to remember name. Something like myfiltertest is fine. Then you need to define your post URL. This should be the location of the above PHP script on your web server. As we don’t have any security checks in place, I recommend you keep that URL very well hidden and don’t tell anyone about it except GNIP. It’s not a sure-fire defence against abuse though, so use it at your own risk.
Finally you need to define some actors you wish to get messages from. Now normally for testing purposes, you would add your Twitter followers. However, as it’s just for a test, I usually pop along to http://tweetrush.com, get the daily top ten Tweeters and use them. That way we’re pretty much guaranteed to capture some data. You must specify the actors separated with a comma (,).
You can ignore the rest of the filters for the moment. They allow you to further filter the data that can be sent to you based on source, to, etc.
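If a hidden URL alone makes you nervous, one modest step up is a shared secret in the query string of the post URL. The sketch below assumes GNIP will post to whatever URL you give it, query string included, which I haven’t confirmed. The constant GNIP_SECRET, the token parameter and the function name are all made up for illustration.

```php
<?php
// Hypothetical shared secret; pick your own long random value.
define('GNIP_SECRET', 'replace-with-a-long-random-string');

// Returns true only when the supplied token matches the secret.
function tokenIsValid($token)
{
    if (!is_string($token)) {
        return false;
    }
    // hash_equals() (PHP 5.6+) compares in constant time.
    return hash_equals(GNIP_SECRET, $token);
}

// In the receiving script the token would come from the query string, e.g.
// $token = isset($_GET['token']) ? $_GET['token'] : null;
$token = 'wrong-guess';
if (!tokenIsValid($token)) {
    // In the real script: send a 403 header and exit before logging anything.
    echo "rejected\n";
}
```

This is still not proper authentication, since anyone who learns the full URL learns the token too, but it does mean a guessed script path alone is not enough.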
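If you keep your actors in a PHP array, something like this sketch will build the comma-separated string for the form. The function name actorList is made up, and stripping a leading @ is just my guess at a common copy and paste slip.

```php
<?php
// Hypothetical helper: build the comma-separated actor string for the filter form.
function actorList(array $names)
{
    $clean = array();
    foreach ($names as $name) {
        // Drop surrounding whitespace and a leading "@".
        $name = ltrim(trim($name), '@');
        // Skip blanks and exact duplicates.
        if ($name !== '' && !in_array($name, $clean, true)) {
            $clean[] = $name;
        }
    }
    return implode(',', $clean);
}

echo actorList(array('ajmckee', ' @ItIsNow ', 'ajmckee', '')) . "\n"; // prints "ajmckee,ItIsNow"
```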
Once you are happy that your form is correct, hit submit and you should be done. GNIP is pretty fast, so you should start getting data within moments of you creating the filter and someone on your actors list tweeting.
If you’ve done everything right up to here, you should be able to tail -f your log file. As soon as someone tweets, you will begin to see an entry similar to this:
2008-10-18 - ItIsNow said ItIsNow: 1:33 pm, Saturday 18th of October 2008 at 2008-10-18T12:33:00.000Z using web. View it at http://twitter.com/statuses/show/965069427.xml
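Once a few entries have built up, you can already do simple things with them, such as counting tweets per user. Here is a rough sketch that assumes the exact log format written by the script above. The function name countTweetsByUser is made up, and lines that don’t match (exception traces, or the tail end of a tweet containing a newline) are simply skipped.

```php
<?php
// Hypothetical helper: count log entries per Twitter user.
function countTweetsByUser(array $lines)
{
    $counts = array();
    foreach ($lines as $line) {
        // Matches the "YYYY-MM-DD - user said ..." prefix written by the logging script.
        if (preg_match('/^\d{4}-\d{2}-\d{2} - (\S+) said /', $line, $m)) {
            $user = $m[1];
            $counts[$user] = isset($counts[$user]) ? $counts[$user] + 1 : 1;
        }
    }
    // Busiest tweeters first.
    arsort($counts);
    return $counts;
}

// In real use the lines would come from the log, e.g.
// $lines = file('../var/logs/tweet.log', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$lines = array(
    '2008-10-18 - ItIsNow said ItIsNow: 1:33 pm, Saturday 18th of October 2008 at 2008-10-18T12:33:00.000Z using web. View it at http://twitter.com/statuses/show/965069427.xml',
    'Fail Whale',
);
print_r(countTweetsByUser($lines));
```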
Finally
That should be enough to keep you playing for a little while. Remember to delete your filter when finished so that your log file does not fill up. A couple of very active tweeters generate a lot of messages.
You should now have a basic Twitter client that is capable of getting tweets from the people you tell it to watch and logging them to a file. Not too useful yet, but don’t worry, I’ll get to all that soon. At this point, go and read the GNIP documentation so that you know what you can and can’t currently do. Next time I’ll walk you through creating a small client interface and a DB to hold your preferences, then creating filters programmatically using the GNIP PHP library. For the moment have fun and explore, and feel free to contact me via comments with your questions.
