
What's been cooking?

It's been a while since my last update, so I figured it was about time I gave a short overview of what's new.

Unique Tracking

Tracking of unique values was disabled when I switched to counters in Cassandra. Reading the old value to check whether an update was the first one didn't scale well. The whole point of using counters in Cassandra was to skip that read and let Illustrend scale horizontally.

To work around this limitation, I've introduced Bloom filters instead, and created an Erlang gen_server which handles all of the unique tracking.
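A Bloom filter answers "have I probably seen this value before?" from a bit array held in memory, so the first-update check no longer needs a read from Cassandra. The trade-off is occasional false positives: a genuinely new value is sometimes reported as seen, so unique counts can come out slightly low, but never high. Below is a minimal Python sketch of the data structure, not the Erlang gen_server itself. The class and method names are hypothetical, and the sizing uses the standard formulas for a target false-positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter (hypothetical helper, not Illustrend's code).

    False positives are possible (an unseen value may be reported as seen);
    false negatives are not. expected_items must be positive.
    """

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
        )
        self.hash_count = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.size

    def add_if_new(self, item: str) -> bool:
        """Set the item's bits; return True if it was definitely not seen before."""
        is_new = False
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                is_new = True  # at least one bit was unset, so the item is new
                self.bits[byte] |= 1 << bit
        return is_new
```

In use, the counter for unique visitors would only be incremented when `add_if_new(visitor_id)` returns `True`, so every update stays a blind counter increment.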

I still need to make this new gen_server scale, but that's quite doable by hashing and distributing the load across the cluster.
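The post doesn't say which hashing scheme is planned. One common option is consistent hashing: each node (for example, one gen_server per machine, each holding its own Bloom filters) owns a set of points on a hash ring, and a key goes to the first node point at or after the key's hash. Adding a node then moves only the keys that land on the new node's points. The sketch below is a hypothetical Python illustration with made-up node names, not Illustrend's design.

```python
import bisect
import hashlib
from typing import Iterable


def _hash(value: str) -> int:
    # Stable across processes, unlike Python's built-in hash() for strings.
    return int.from_bytes(hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest(), "big")


class HashRing:
    """Consistent hashing ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes: Iterable[str], replicas: int = 100):
        # Each node gets `replicas` points so keys spread evenly between nodes.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point after the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

For example, `HashRing(["node-a", "node-b", "node-c"]).node_for("user-42")` always returns the same node for the same key, so every check for a given value lands on the gen_server that holds its filter.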

Monitoring

We've tried all kinds of things to monitor our traffic at work and get notifications whenever something abnormal happens, but nothing has worked satisfactorily.

Now that I have the full history after moving from values to counters, I figured I'd have a go, and it actually seems to work.

The approach is to keep a running estimate of how much traffic to expect in a given 5-minute time slice, based on previous traffic in that same time slice.

The estimates are calculated with a variation of a Kalman filter, which took a while to get working correctly, but I'm really happy with the results so far.
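The post doesn't describe the specific variation used, so the sketch below is a plain textbook scalar Kalman filter, shown only to illustrate the idea. It assumes one filter per 5-minute slot (288 per day), updated with that slot's observed count each time the slot comes around again, for example once a day. It also assumes the true traffic level follows a random walk. The noise variances `q` and `r` are hypothetical tuning values, not figures from Illustrend.

```python
from dataclasses import dataclass

SLOTS_PER_DAY = 24 * 60 // 5  # 288 five-minute slots


def slot_index(seconds_since_midnight: int) -> int:
    """Map a time of day to its 5-minute slot (0 .. 287)."""
    return seconds_since_midnight // 300


@dataclass
class SlotEstimate:
    """Scalar Kalman filter tracking expected traffic for one slot.

    Assumed model: the true level drifts as a random walk with variance q
    per step, and each observed count is that level plus noise with variance r.
    """
    estimate: float = 0.0
    variance: float = 1e6   # large initial uncertainty, so the first observation dominates
    q: float = 100.0        # process noise variance (hypothetical tuning)
    r: float = 2500.0       # measurement noise variance (hypothetical tuning)

    def update(self, observed: float) -> float:
        # Predict: the level may have drifted since this slot was last seen.
        predicted_variance = self.variance + self.q
        # Update: the gain weighs the observation against the prediction.
        gain = predicted_variance / (predicted_variance + self.r)
        self.estimate += gain * (observed - self.estimate)
        self.variance = (1 - gain) * predicted_variance
        return self.estimate
```

A full day of estimates is then `[SlotEstimate() for _ in range(SLOTS_PER_DAY)]`, indexed with `slot_index(...)`. A larger `q` makes the estimate follow recent changes faster, and a larger `r` makes it less sensitive to one unusual day.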

I then average the estimates for the previous, current, and next 5-minute slots.

If your current traffic is very different from the averaged estimate, you'll get notified.
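The smoothing and the alert check could look roughly like the Python sketch below. The post doesn't define "very different", so the relative `tolerance` is a hypothetical threshold. Wrapping around midnight, so that slot 0 borrows from the last slot of the day, is also an assumption.

```python
from typing import Sequence


def smoothed_estimate(estimates: Sequence[float], slot: int) -> float:
    """Average the estimates for the previous, current and next slot.

    Assumption: slots wrap around midnight, so slot 0's neighbours are the
    last slot of the day and slot 1.
    """
    n = len(estimates)
    return (estimates[(slot - 1) % n] + estimates[slot] + estimates[(slot + 1) % n]) / 3


def is_abnormal(observed: float, expected: float, tolerance: float = 0.5) -> bool:
    """Flag traffic deviating from the expectation by more than `tolerance`.

    `tolerance` is a hypothetical relative threshold: 0.5 means +/-50%.
    """
    if expected <= 0:
        return observed > 0  # any traffic where none is expected is unusual
    return abs(observed - expected) / expected > tolerance
```

Averaging neighbouring slots damps noise in any single slot's estimate, so one odd slot on its own is less likely to trigger a notification.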

New Inputs

I've added a few more input scripts to generate some load on my development installation. Writing one is easy: you can use any language you want, as long as it can make an HTTP GET request or push data to a socket.
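As an illustration of how small an HTTP input can be, here is a Python sketch using only the standard library. The post doesn't document Illustrend's API, so the `/track` path and the `key`/`value` parameter names are hypothetical placeholders. Substitute whatever your installation actually expects.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_increment_url(base_url: str, counter: str, amount: int = 1) -> str:
    """Build a GET URL asking the server to increment `counter` by `amount`.

    The "/track" path and "key"/"value" parameters are hypothetical.
    """
    return f"{base_url.rstrip('/')}/track?" + urlencode({"key": counter, "value": amount})


def send_increment(base_url: str, counter: str, amount: int = 1, timeout: float = 5.0) -> int:
    """Send one increment and return the HTTP status code."""
    with urlopen(build_increment_url(base_url, counter, amount), timeout=timeout) as response:
        return response.status
```

An input script is then just a loop that collects numbers from somewhere and calls `send_increment` for each one.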

PostgreSQL

Chart and track table and index usage, including sequential scans, index scans, rows read, blocks read from disk, and just about any other statistic in PostgreSQL.
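PostgreSQL's statistics views (`pg_stat_user_tables`, `pg_statio_user_tables`) report cumulative totals, while Illustrend stores counters. A natural fit is therefore to poll the views periodically and send only the difference since the last poll. The sketch below shows a query against those views and the delta step in Python. How the rows are fetched (with any PostgreSQL driver) is left out, and the `postgresql.<table>.<column>` counter naming is hypothetical.

```python
from typing import Dict, Mapping

# Cumulative per-table statistics; idx_scan/idx_tup_fetch are NULL for tables without indexes.
STATS_QUERY = """
SELECT s.relname,
       s.seq_scan, s.seq_tup_read,
       coalesce(s.idx_scan, 0) AS idx_scan,
       coalesce(s.idx_tup_fetch, 0) AS idx_tup_fetch,
       io.heap_blks_read
FROM pg_stat_user_tables s
JOIN pg_statio_user_tables io USING (relid)
"""

Snapshot = Mapping[str, Mapping[str, int]]  # table -> column -> cumulative value


def counter_deltas(previous: Snapshot, current: Snapshot) -> Dict[str, int]:
    """Turn two cumulative snapshots into per-counter increments.

    A value that went backwards (e.g. after pg_stat_reset()) is treated as
    having restarted from zero, so its new value is the increment.
    """
    deltas: Dict[str, int] = {}
    for table, columns in current.items():
        before = previous.get(table, {})
        for column, value in columns.items():
            old = before.get(column, 0)
            diff = value - old if value >= old else value
            if diff:
                deltas[f"postgresql.{table}.{column}"] = diff  # hypothetical key naming
    return deltas
```

Sending deltas keeps the input stateless on the Illustrend side: each poll is just a batch of counter increments.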

Twitter

Twitter is a great place to get data for systems you're trying to scale or test.

I've inserted all hashtags from the public Twitter gardenhose (about 10% of Twitter's traffic), which gives me a chart for each hashtag as well as a real-time view of trending hashtags.
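Turning tweets into per-hashtag counts is mostly extraction plus counting. The Python sketch below uses a simplified regular expression (a `#` followed by word characters), which is an approximation and not Twitter's own hashtag rules. It lower-cases tags so `#Erlang` and `#erlang` share a counter, a choice the post doesn't specify.

```python
import re
from collections import Counter
from typing import Iterable, List

# Simplification: "#" followed by word characters; Twitter's real rules differ in edge cases.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text: str) -> List[str]:
    """Return the hashtags in one tweet, lower-cased so variants share a counter."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def count_hashtags(tweets: Iterable[str]) -> Counter:
    """Count hashtag occurrences across a batch of tweet texts."""
    counts: Counter = Counter()
    for text in tweets:
        counts.update(extract_hashtags(text))
    return counts
```

Each extracted tag maps naturally onto one counter increment, and `count_hashtags(recent_tweets).most_common(10)` gives a crude "top right now" list for a recent window.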

I know there are lots of services doing this on the net, but it's fun to roll your own and it's a great way to see how flexible Illustrend is.

Wikipedia

Another fun source of lots of data is the Wikipedia recent-changes IRC channel. I've written a small Ruby bot that connects to the channel and counts changes per page as well as changes per user.
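The bot itself is in Ruby, but the parsing step is easy to sketch in Python. Messages on the feed are decorated with mIRC colour codes, so the sketch strips those first. It then assumes the remaining text looks roughly like `[[Page title]] FLAGS URL * User * (+123) summary`. That layout is an assumption about the feed, and the IRC connection itself is omitted. `ChangeCounter` is a hypothetical name.

```python
import re
from collections import Counter

# mIRC formatting: colour codes (\x03 plus optional fg[,bg] digits) and
# bold/reset/reverse/italic/underline control characters.
IRC_FORMATTING = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")

# Assumed layout once formatting is stripped:
#   [[Page title]] FLAGS URL * User * (+123) edit summary
CHANGE_LINE = re.compile(r"^\[\[(?P<page>[^\]]+)\]\].*?\s\*\s(?P<user>.+?)\s\*\s")


class ChangeCounter:
    """Counts changes per page and per user (hypothetical helper)."""

    def __init__(self) -> None:
        self.pages: Counter = Counter()
        self.users: Counter = Counter()

    def handle(self, raw_line: str) -> bool:
        """Count one feed message; return False if it doesn't match the assumed layout."""
        match = CHANGE_LINE.match(IRC_FORMATTING.sub("", raw_line))
        if not match:
            return False
        self.pages[match["page"]] += 1
        self.users[match["user"]] += 1
        return True
```

Returning `False` for unrecognised lines lets the bot skip joins, notices, and any messages in a different format without crashing.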

The Future

Next on the agenda is adding logins and some form of access control to the user interface, as this has been requested by pretty much everyone I've talked to so far.