Building Polymur: a tool for global Graphite-compatible metrics ingestion

[big shoutout to Dixon for keeping Graphite great and giving Polymur a glance ahead of this post (along with adding it to the Graphite tools page)] About two years ago, I wrote about my initial endeavors with Graphite (in hindsight, there's a little bit of I-have-no-idea-what-I'm-doing sprinkled throughout that…

Field notes - ElasticSearch at petabyte scale on AWS

I manage a somewhat sizable fleet of ElasticSearch clusters. How large? Well, "large" is relative these days. Strictly in ElasticSearch data nodes, it's currently operating on the order of: several petabytes of provisioned data-node storage; thousands of Xeon E5 v3 cores; 10s of terabytes of memory; indexing many billions of…

Concurrent communication performance in Go (and 288 cores of dessert)

This is, in a way, a continuation of my last post on Go performance. In particular, it's a discussion around the performance cost of communication at high rates: something fairly easy to experience, usually easy to solve, and often ignored. For kicks, this post evolves from somewhat useful anecdotes into…

Shared counter performance in Go using batched updates

I've long known that my tool Sangrenel (used for load testing Kafka) had some inefficiencies. Namely, the global counter. Sangrenel effectively fires up many workers that generate random messages. A global counter is used to periodically dump the rate of message generation in addition to controlling a global rate limiter…

Load testing Apache Kafka on AWS

[update: this was written before EC2 d2 instances were released, which I'm currently a huge fan of; I would definitely choose them over r3s] Notice my careful usage of the phrase "load testing" vs "benchmarking". Why is that? I think we've all learned by now that benchmark tests are often…

The architecture of clustering Graphite

[Note: It's not quite 'clustering' by my definition, but this post is linked to enough that it's too late to change the title. I based this on the Graphite config naming conventions for consistency.] [Note #2 - April, 2015: the purpose of this post was originally to describe the logical architecture…

Scaling PHP apps via dedicated PHP-FPM nodes

I'm just going to go out there and say it: I hate PHP apps. Actually, I'm just being mean. But now that that's out of the way ;) Having previously spent a few years working at a web host, I put in a fair amount of time on scaling PHP applications, so…