Building a GeoIP server with ZeroMQ

I discovered ZeroMQ while researching Mongrel2 and thought it was worth deeper investigation.  At my day job, we have lots of components that don't scale well inside our front end stack, so we expose each one as a service that every front end consumes.  Some examples: search, memcache, geoip resolution, geo targeting, and compatibility between users. Basically, anything that relies on state that is expensive to duplicate on each front end becomes a service.

Some of these services are common 'off the shelf' components (search, memcache); others require custom logic (geoip resolution, geo targeting, user compatibility). For our own services, we didn't want to write and debug a bunch of socket code, so we shoehorned them into a custom memcached build with some new verbs.  This way, memcached handled all the socket connection management, and we could just focus on writing our service.

This Frankenstein memcached approach worked well, but as I learned more about ZeroMQ, it became clear that it is a more natural fit. ZeroMQ handles all connection management, makes concurrency fairly simple, and has bindings in just about every language.  As a quick weekend project, I implemented a clone of our GeoIP service using ZeroMQ.  This was a 'learning ZeroMQ' project for me, so if you notice any bugs, please send them my way.

Introducing geoipmq - src at http://github.com/bohlander/geoipmq

The geoip service performs a simple task: given an IP address, where is it located geographically?  Luckily, MaxMind has a great open source library that makes it easy to answer this question.  The free version is 79% accurate at the city level, but you can pay for a more accurate version of the database. Implementing this service is basically just a matter of wrapping the MaxMind library in a ZeroMQ communication layer.  All of the code snippets below are based on this excellent ZeroMQ guide.

ZeroMQ has several different socket types, such as request/reply, pub/sub, and push/pull. The best fit for this project is the request/reply pattern, with the server holding the 'reply' end. Creating a reply socket to handle incoming requests is very easy:

Once the socket is created and bound to a transport, it's just a matter of responding to messages that are received on the socket. You don't have to worry about accepting or managing incoming connections -- you just respond to messages, and ZeroMQ handles the rest. In the code below, the zmq_recv() call blocks until a message is received:

It's worth noting that this code treats any message larger than MAX_BUFFER as an error -- this isn't a ZeroMQ limitation, just one I instituted to defensively deal with long messages. Another important note is that ZeroMQ messages are not null terminated, so the guide recommends adding the terminator yourself if you're using the C library.
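Both points can be folded into one small helper. This is a sketch in plain C with no ZeroMQ dependency; `message_to_cstring` is a hypothetical name and the value of `MAX_BUFFER` is assumed:

```c
#include <stddef.h>
#include <string.h>

#define MAX_BUFFER 256  /* assumed cap on request size */

/* Copy a length-delimited message into a NUL-terminated C string.
 * `dst` must have room for MAX_BUFFER + 1 bytes.
 * Returns 0 on success, or -1 if the message exceeds MAX_BUFFER. */
int message_to_cstring(const void *data, size_t size, char *dst)
{
    if (size > MAX_BUFFER)
        return -1;          /* defensive limit, not a ZeroMQ one */
    memcpy(dst, data, size);
    dst[size] = '\0';       /* ZeroMQ does not add this for us */
    return 0;
}
```

The extra byte in the destination buffer is what makes it safe to terminate a message that is exactly `MAX_BUFFER` bytes long.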

Finally, we process and respond to the geoip request if it's formatted correctly.  We expect requests to be of the form "geoip ipaddr".  We could represent this on the wire more efficiently, but let's keep it simple for now:
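One way to check that format, sketched in plain C. `parse_geoip_request` is a hypothetical helper; it only validates the shape of the request and does not verify that the argument is a well-formed IP address:

```c
#include <stddef.h>
#include <string.h>

/* Parse a NUL-terminated request of the form "geoip <ipaddr>".
 * On success, copies the address into `ip` (capacity `ip_cap`) and
 * returns 0. Returns -1 if the verb is wrong, the address is missing,
 * anything follows the address, or the address does not fit in `ip`. */
int parse_geoip_request(const char *request, char *ip, size_t ip_cap)
{
    static const char verb[] = "geoip ";
    const size_t verb_len = sizeof verb - 1;

    if (strncmp(request, verb, verb_len) != 0)
        return -1;

    const char *addr = request + verb_len;
    size_t len = strcspn(addr, " \t\r\n");  /* address ends at whitespace */
    if (len == 0 || len >= ip_cap || addr[len] != '\0')
        return -1;

    memcpy(ip, addr, len);
    ip[len] = '\0';
    return 0;
}
```

A 46-byte buffer is enough for the textual form of any IPv4 or IPv6 address plus the terminator.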

That's more or less it -- the resolve_ip() function just calls the MaxMind library to look up the geographic information for the IP, and the result is returned to the client as a tab-delimited string.
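To illustrate the tab-delimited reply, here is a sketch of the formatting step. The `geo_record` struct, its fields, and their order are assumptions made for this example; they are not the MaxMind record type or the exact fields geoipmq returns:

```c
#include <stdio.h>

/* Hypothetical lookup result; the real MaxMind record has more fields. */
struct geo_record {
    const char *country;
    const char *region;
    const char *city;
    double latitude;
    double longitude;
};

/* Lookups can leave string fields unset, so print those as empty. */
static const char *or_empty(const char *s)
{
    return s ? s : "";
}

/* Write the record into `out` as one tab-delimited line.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the result would not fit in `cap` bytes. */
int format_geo_reply(const struct geo_record *rec, char *out, size_t cap)
{
    int n = snprintf(out, cap, "%s\t%s\t%s\t%.4f\t%.4f",
                     or_empty(rec->country), or_empty(rec->region),
                     or_empty(rec->city), rec->latitude, rec->longitude);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Keeping empty fields in place, rather than dropping them, means a client can always split the reply on tabs and rely on the column positions.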

Clients

How do you communicate with this service?  Just create a 'request' socket, and connect it to the server.  In Ruby:


What did we just build? 

This very simple server is actually quite functional.  ZeroMQ automatically handles all connection management and socket IO in a separate thread, leaving our main thread with the sole responsibility of responding to geoip requests.  In a completely unscientific test on my laptop I was able to get roughly 13k requests/sec using random IP addresses and 20 concurrent clients.

A possible improvement would be to make our message responder multithreaded.  I'll explore this, as well as more detailed perf testing, in a later post.