1 Million Writes on 60 nodes with Cassandra and AWS EBS Service

The videos are now online from the 2015 Amazon Re:Invent Conference where I had the privilege to speak on behalf of CrowdStrike on how we were able to achieve 1 million writes per second on 60 nodes with Amazon EBS.

We put a good 3+ months into testing and benchmarking Cassandra with EBS and getting the current EBS story with their team  to understand what improvements were made in regards to availability and performance since the outages of the 2011/2012 timeframe.

I’m happy to say that because of our work with EBS and DataStax, that DataStax has now given the green-light to run EBS GP2 in production with Cassandra. Get ready to save some $$!

Below is a link to the presentation

Cassandra’s DateTieredCompaction Strategy does not work as billed

UPDATE: While at first DTCS seemed like a workable solution, after running it under real world workloads, while needing to scale up the cluster, DTCS broke down completely. I recommend looking at TWCS by Jeff Jirsa of CrowdStrike. It’s what DTCS should have been. Reference:


I’ve been looking at throwing Cassandra at a use case I had come up. Storing billions of items a day but only needing to keep that data for a couple weeks on a rolling window. I was obviously nervous about having to TTL out that large of a data volume on a rolling window. Basically write(2TB), delete(2TB) in the same day. I started on the hunt on the latest C* docs and came across a gem that was recently released in the 2.0.11 release contributed by Spotify: DateTieredCompactionStrategy.  This strategy is great if you’re looking to have a table that is just on a rolling time window. Use cases like time series and analytics where every write has the same ttl and it comes in a forward only manner, meaning you’re not backfilling data later. The feature that really interested me was that it can look at an SSTable to determine if the whole table is out of expiry vs having to spend CPU cycles on merging and dropping data. rm *foo-Data.db. which should make large scale TTL’ing much less I/O and CPU intensive. I wanted to see if it lived up to the hype so I set up a cluster of 3 m3.2xlarge machines and created brand new keyspaces and a new table defined later in the article. I fired up another loader machine to act as a writer and tuned it to write 10,000 events per second to the new cluster on a continuous stream for 12-24 hours. Here’s what I set up

  • 3 node dse 4.60 test cluster with Cassandra 2.0.11
  • created a new table with the DateTieredCompactionStrategy
  • inserter in golang using gocql writing 10,000 inserts per second causing a load of 3 on an 8 core box
  • each insert had a 30 minute TTL with “USING TTL 1800”

I started the test hoping to see a new saw tooth pattern of SSTables on disk and disk space consumed by that column family. Instead to my shock it just kept going up and up all day. I scoured the docs but came up empty. Datastax was kind enough to point out it may be related to the “tombstone_compaction_interval” which defaults to 1 day. That basically means an SSTable won’t be a candidate for deletion until after that date. Once I changed that setting to 1 everything seemed to work like a champ. This was before I set the interval for testing

Screenshot 2015-01-30 16.01.30


Screenshot 2015-01-30 16.02.13

You can also see with DTCS it results in a nice sawtooth on the SSTable count.

Screenshot 2015-01-30 16.04.02


For a comparison, below is the exact same test with LeveledCompaction instead of DateTiered. You can see the data volumes just continue to grow.

Screenshot 2015-02-03 08.05.18

Screenshot 2015-02-03 08.08.26

After 24 hours the leveledcompaction table has 10x the data on disk on a single node. Sadness.

Screenshot 2015-02-03 22.13.32



30 hours later the cluster running the LeveledCompaction 30 minute TTL inserts died a horrible death. It filled up all available disk space on a couple nodes and started crashing the Cassandra process. </endofcluster>

Screenshot 2015-02-04 09.05.01

If you want to come work on high-scale distributed systems, we’re hiring!


here is a create table script if you’re interested in a test on your own hardware.

	id text,
	metatype text,
	event_time timestamp,
	rawbytes blob,
	PRIMARY KEY ((id, metatype), event_time)
  gc_grace_seconds = 0 AND
  compaction={'class': 'DateTieredCompactionStrategy','tombstone_compaction_interval': '1', 'tombstone_threshold': '.01', 'timestamp_resolution':'MICROSECONDS', 'base_time_seconds':'3600', 'max_sstable_age_days':'365'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
session.Query(`INSERT INTO ttl_leveled (id, metatype, event_time, rawbytes) VALUES (?, ?, ?, ?) USING TTL 1800`,
		randomId, metaType, eventTime, rawBytes).Exec()

Leveled Compaction Table Create:

CREATE TABLE ttl_leveled (
  id text,
  metatype text,
  event_time timestamp,
  rawbytes blob,
  PRIMARY KEY ((id, metatype), event_time)
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=0 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'tombstone_threshold': '.01', 'tombstone_compaction_interval': '1', 'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

© 2017 Jim Plush: Blog

Theme by Anders NorenUp ↑