
Twitter Data Collection tools released

I have released the Java programs that I’ve been using to collect data from Twitter for my PhD since November 2009.

It is a suite of programs that use the Twitter Search API and Twitter Streaming API to collect tweets and store them in a MySQL database. They have been tested on Mac OS X and Debian Linux.

The Java source is split into 9 Eclipse projects using Maven to bring in the external library dependencies.

For people who just want to use the tools, the Example directory includes the runnable Java files for the main modules, the shell scripts, and a suggested cron file to run them. The README.MD file in the Example directory explains how to create the MySQL databases and configuration files.

The tStreamingArchiver project code is available on GitHub: https://github.com/brendam/tStreamingArchiver.

Please tell me if you find it useful.

Twitter changed to SSL only for streaming API today

This morning my Twitter data collection program suddenly started failing to connect. I’m using the excellent twitter4j library for connecting to Twitter.

The error was “Connection Refused” with this response:

TwitterException{exceptionCode=[b5e7486f-24943238 b5e7486f-2494320e], statusCode=-1, retryAfter=-1, rateLimitStatus=null, featureSpecificRateLimitStatus=null, version=2.2.4}

I found out that, as of today, Twitter only accepts SSL connections to its streams (https://dev.twitter.com/blog/streaming-api-turning-ssl-only-september-29th).

I tried setting builder.setUseSSL(true) in twitter4j, but that didn’t fix the problem. A new snapshot build of twitter4j (2.2.5-SNAPSHOT) does fix it. It is available for download from http://twitter4j.org.

I’m using Eclipse Helios and Maven and had some trouble working out how to get the SNAPSHOT. With the configuration I had, Maven picked up twitter4j-stream-2.2.5-SNAPSHOT.jar but not twitter4j-core-2.2.5-SNAPSHOT.jar. I tried a few different things to make it get the core snapshot, none of which worked, but then found that disabling releases in the repository definition in the POM did the trick.
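A minimal sketch of a repository definition with releases disabled is below. The repository id, name, and URL are assumptions for illustration, not values taken from the working POM, so check http://twitter4j.org for the correct ones.

```xml
<!-- Sketch only: the id, name, and url below are assumptions -->
<repositories>
  <repository>
    <id>twitter4j.org</id>
    <name>twitter4j.org Repository</name>
    <url>http://twitter4j.org/maven2</url>
    <!-- Releases disabled: Maven will not look for release versions here -->
    <releases>
      <enabled>false</enabled>
    </releases>
    <!-- Snapshots enabled so that 2.2.5-SNAPSHOT can be resolved -->
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
```

With a definition like this, the twitter4j-core and twitter4j-stream dependencies would both need to request version 2.2.5-SNAPSHOT explicitly, since only snapshot versions are resolved from this repository.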

I’m not sure why, before I changed the POM, the twitter4j-stream snapshot was downloaded but the twitter4j-core one was not. With this change, my data collection is working again, although I’ve missed a few hours of data.