Three Ways to Scale StatsD

May 22, 2015

StatsD is a simple yet powerful aggregation tool that is usually a good fit for code monitoring. Being written in Node.js makes it fast, but since Node.js is single threaded, StatsD is a bit challenging to scale up.

Why Scale StatsD?

On commodity hardware, StatsD can easily achieve rates of 40,000 messages per second. That’s a lot, but in some cases it might not be enough. Perhaps you are monitoring 100 metrics per pageview (and why not?). If so, above 400 pageviews per second StatsD will start dropping metrics.

In this article we will discuss three ways of scaling up StatsD:

I – StatsD Cluster Proxy
II – Hashring in Code
III – One StatsD Proxy Per Host

Requirements

Install node.js

On Debian / Ubuntu:
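A typical command (package names can differ between releases):

```bash
# Debian / Ubuntu packages for node.js and npm
sudo apt-get install nodejs npm
```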

On CentOS / Fedora:
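A typical command (on CentOS the EPEL repository must be enabled first):

```bash
# CentOS / Fedora packages for node.js and npm
sudo yum install nodejs npm
```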

On Mac OS X with brew:
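For example:

```bash
# installs the latest node.js (npm is bundled with it)
brew install node
```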

For Windows and Mac OS X, you can also download the node.js installer from https://nodejs.org/

I – StatsD Cluster Proxy

* didn’t really work out

StatsD Cluster Proxy sits in front of several StatsD workers and distributes the load between them. Using a hashring, it always routes the same metric name to the same StatsD process.
If an instance goes down, the hashring is recalculated so (almost) no metrics get lost.

The main problem: the proxy itself is still single threaded. It becomes the new bottleneck and, at least in the current version, handles fewer messages per second than a single StatsD instance.

First, install and configure StatsD.
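If StatsD isn’t installed yet, one common way to get it is to clone the project from GitHub and install its dependencies:

```bash
git clone https://github.com/etsy/statsd.git
cd statsd
npm install
```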
Then create a proxyConf.js file:
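A minimal sketch, following the format of StatsD’s exampleProxyConfig.js (the ports and node list below are assumptions that match the four workers configured next):

```js
// proxyConf.js - adjust hosts and ports to your own setup
{
  nodes: [
    { host: '127.0.0.1', port: 8010, adminport: 8011 },
    { host: '127.0.0.1', port: 8020, adminport: 8021 },
    { host: '127.0.0.1', port: 8030, adminport: 8031 },
    { host: '127.0.0.1', port: 8040, adminport: 8041 }
  ],
  host: '0.0.0.0',       // address the proxy listens on
  port: 8125,            // clients send their metrics here
  udp_version: 'udp4',
  checkInterval: 1000,   // health-check interval (ms) for the nodes
  cacheSize: 10000
}
```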

[Figure: StatsD proxy setup]

Create four StatsD configuration files: node1.js, node2.js, node3.js and node4.js. They are identical except for their port numbers; a sketch of node1.js is shown below.
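A minimal sketch of node1.js, using StatsD’s standard configuration keys (the graphite backend and the port choices are assumptions; node2.js, node3.js and node4.js use ports 8020/8021, 8030/8031 and 8040/8041):

```js
// node1.js - one StatsD worker (node2-4 differ only in port / mgmt_port)
{
  port: 8010,                           // UDP port this worker listens on
  mgmt_port: 8011,                      // TCP management (admin) port
  backends: [ "./backends/graphite" ],  // use "./backends/console" for testing
  graphiteHost: "localhost",
  graphitePort: 2003
}
```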

Run the StatsD Cluster Proxy:
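With StatsD installed in the current directory, the proxy is started by pointing node at proxy.js and the config file created above:

```bash
node proxy.js proxyConf.js
```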

Run the 4 StatsD processes:
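Each worker is a normal StatsD instance started with its own configuration file:

```bash
node stats.js node1.js &
node stats.js node2.js &
node stats.js node3.js &
node stats.js node4.js &
```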

Results:

Doing some load testing yielded disappointing results. The StatsD Cluster Proxy is single threaded, so it becomes the new bottleneck: I was able to achieve “only” 30k events per second.

The proxy reached 100% CPU and started dropping events, while the StatsD workers behind it remained mostly idle.

[Figure: load on the StatsD proxy cluster]

The StatsD Cluster Proxy isn’t ready to scale StatsD up by itself yet. Still, it’s a good tool that adds a lot of flexibility.

II – Hashring in Code

StatsD Cluster Proxy uses the clever idea of a hashring. Since the proxy process is the bottleneck, why not move this logic into our own code?

Almost all programming languages have a hashring library. If you’re developing in node.js, you can use node-hashring.

The hashring tells us where the metric “my.own.metric.name” should be sent:
> this metric should be sent to 127.0.0.1:8040

Fully working Hashring code:
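A minimal sketch of such code, assuming the hashring npm package (npm install hashring) and the four workers from Part I; the selected worker receives the metric over plain UDP in the StatsD line format:

```js
var dgram = require('dgram');
var HashRing = require('hashring'); // npm install hashring

// the four StatsD workers started in Part I (ports are assumptions)
var ring = new HashRing([
  '127.0.0.1:8010',
  '127.0.0.1:8020',
  '127.0.0.1:8030',
  '127.0.0.1:8040'
]);

var socket = dgram.createSocket('udp4');

// send one metric to the StatsD worker the hashring picks for its name
function send(name, value, type) {
  var target = ring.get(name).split(':');           // e.g. "127.0.0.1:8040"
  var message = new Buffer(name + ':' + value + '|' + type);
  socket.send(message, 0, message.length, parseInt(target[1], 10), target[0]);
}

// the same metric name always goes to the same StatsD process
send('my.own.metric.name', 1, 'c');                 // counter increment
```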

You will have to create the four StatsD configuration files and start the four processes.
(see explanation in Part I above)

III – One StatsD Proxy Per Host

If you need to scale StatsD, chances are that you are running a cluster. In a clustered environment, multiple servers run in parallel, so you can install one StatsD Cluster Proxy per host. Applications send their metrics to the proxy on localhost, and every proxy routes the same metric names to the same StatsD process.

Rule of thumb under heavy load: number of StatsD processes = number of StatsD proxies.

[Figure: StatsD with one proxy per host]
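On each host, the proxy configuration is the same as in Part I, except that the nodes list points at the shared StatsD workers instead of localhost (the worker hostnames below are placeholders):

```js
// proxyConf.js, deployed on every application host:
// applications send their metrics to localhost:8125, and every proxy
// uses the same hashring, so a given metric name always reaches the
// same StatsD worker no matter which host emitted it
{
  nodes: [
    { host: 'statsd-worker-1', port: 8010, adminport: 8011 },
    { host: 'statsd-worker-2', port: 8010, adminport: 8011 },
    { host: 'statsd-worker-3', port: 8010, adminport: 8011 },
    { host: 'statsd-worker-4', port: 8010, adminport: 8011 }
  ],
  host: '127.0.0.1',   // only local applications talk to this proxy
  port: 8125,
  udp_version: 'udp4',
  checkInterval: 1000,
  cacheSize: 10000
}
```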

Results:

Using this configuration on an EC2 instance with 36 cores (c4.8xlarge), we achieved 440k events per second. A very good result, and one that scales linearly: to go even further, simply start more proxies and worker nodes.

[Figure: load with multiple StatsD proxies]
