Cassandra’s DateTieredCompactionStrategy does not work as billed

UPDATE: While DTCS seemed like a workable solution at first, it broke down completely once I ran it under real-world workloads and needed to scale up the cluster. I recommend looking at TWCS (TimeWindowCompactionStrategy) by Jeff Jirsa of CrowdStrike instead; it’s what DTCS should have been. Reference: https://issues.apache.org/jira/browse/CASSANDRA-9666

 

I’ve been looking at throwing Cassandra at a use case I came up with: storing billions of items a day but only keeping that data for a couple of weeks on a rolling window. I was understandably nervous about having to TTL out that large a data volume on a rolling window, essentially writing 2TB and deleting 2TB in the same day. I went hunting through the latest C* docs and came across a gem that shipped in the 2.0.11 release, contributed by Spotify: DateTieredCompactionStrategy (DTCS).

This strategy is great if you have a table that lives on a rolling time window: use cases like time series and analytics where every write has the same TTL and data arrives forward-only, meaning you’re not backfilling older data later. The feature that really interested me is that DTCS can look at an SSTable and determine that the entire file is past expiry, so it can simply delete it (effectively rm *foo-Data.db) instead of spending CPU cycles merging SSTables and dropping individual rows. That should make large-scale TTL’ing much less I/O- and CPU-intensive.

I wanted to see if it lived up to the hype, so I set up a cluster of 3 m3.2xlarge machines, created a brand new keyspace, and added the table defined later in this article. I then fired up another machine as a loader and tuned it to write 10,000 events per second to the new cluster in a continuous stream for 12-24 hours. Here’s what I set up:

  • 3-node DSE 4.6.0 test cluster running Cassandra 2.0.11
  • a new table created with DateTieredCompactionStrategy
  • an inserter written in Go using gocql, pushing 10,000 inserts per second (roughly a load of 3 on an 8-core box); a rough sketch of this kind of loader follows the list
  • every insert carried a 30-minute TTL via “USING TTL 1800”
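
For reference, here’s a rough sketch of what a loader like that can look like with gocql. The host addresses, keyspace name, payload size, and pacing below are placeholders rather than the actual loader; only the INSERT statement mirrors the one shown at the end of the post.

// Rough sketch of a gocql loader: ~10,000 inserts/sec, each with a 30-minute TTL.
// Host addresses, keyspace, and payload size are illustrative placeholders.
package main

import (
	"log"
	"math/rand"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("10.0.0.1", "10.0.0.2", "10.0.0.3") // placeholder node IPs
	cluster.Keyspace = "ttl_demo"                                   // placeholder keyspace
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// 10 workers, each pacing roughly 1,000 writes/sec, for ~10,000 inserts/sec total.
	for w := 0; w < 10; w++ {
		go func() {
			ticker := time.NewTicker(time.Millisecond)
			defer ticker.Stop()
			payload := make([]byte, 512) // arbitrary payload size
			for range ticker.C {
				rand.Read(payload)
				if err := session.Query(
					`INSERT INTO ttl_test (id, metatype, event_time, rawbytes)
					 VALUES (?, ?, ?, ?) USING TTL 1800`,
					gocql.TimeUUID().String(), "test_event", time.Now(), payload,
				).Exec(); err != nil {
					log.Println("insert failed:", err)
				}
			}
		}()
	}
	select {} // run until killed
}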

I started the test hoping to see a sawtooth pattern in the SSTable count and the disk space consumed by that column family. Instead, to my shock, both just kept going up and up all day. I scoured the docs but came up empty. DataStax was kind enough to point out that it might be related to “tombstone_compaction_interval”, which defaults to one day; that basically means an SSTable won’t be a candidate for tombstone compaction (or deletion) until it is at least that old. Once I changed that setting to 1, everything worked like a champ.
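
For reference, here’s a rough sketch of applying that change through gocql (the package and helper names are illustrative, and the equivalent ALTER TABLE from cqlsh works just as well). One gotcha: ALTER TABLE ... WITH compaction replaces the entire compaction map, so the other options have to be restated.

// Illustrative helper; assumes an open *gocql.Session like the one in the loader sketch above.
package ttlfix

import "github.com/gocql/gocql"

// LowerTombstoneInterval drops tombstone_compaction_interval from its one-day
// default down to 1 so freshly expired SSTables become compaction (and drop)
// candidates almost immediately. The rest of the compaction options are
// restated because ALTER TABLE ... WITH compaction replaces the whole map.
func LowerTombstoneInterval(session *gocql.Session) error {
	return session.Query(`ALTER TABLE ttl_test WITH compaction = {
		'class': 'DateTieredCompactionStrategy',
		'tombstone_compaction_interval': '1',
		'tombstone_threshold': '.01',
		'timestamp_resolution': 'MICROSECONDS',
		'base_time_seconds': '3600',
		'max_sstable_age_days': '365'}`).Exec()
}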

[Screenshot: disk space consumed by the DTCS table climbing continuously before the tombstone_compaction_interval change]

after!

[Screenshot: disk space consumed by the DTCS table after the change]

You can also see that DTCS produces a nice sawtooth in the SSTable count.

[Screenshot: SSTable count under DTCS showing a sawtooth pattern]

 

For comparison, below is the exact same test with LeveledCompaction instead of DateTiered. You can see the data volumes just continue to grow.

[Screenshots: the same test with LeveledCompaction, data volumes growing continuously]

After 24 hours the LeveledCompaction table has 10x the data on disk on a single node. Sadness.

[Screenshot: on-disk data size after 24 hours, LeveledCompaction at roughly 10x the DTCS table]

 

UPDATE:

30 hours later, the cluster running the LeveledCompaction 30-minute-TTL inserts died a horrible death: it filled up all available disk space on a couple of nodes and started crashing the Cassandra process. </endofcluster>

[Screenshot: nodes with disk space exhausted and the Cassandra process crashing]

If you want to come work on high-scale distributed systems, we’re hiring!

http://www.crowdstrike.com/senior-software-engineer/

 

Here is the create table script if you’re interested in running this test on your own hardware.

CREATE TABLE IF NOT EXISTS ttl_test (
  id text,
  metatype text,
  event_time timestamp,
  rawbytes blob,
  PRIMARY KEY ((id, metatype), event_time)
) WITH
  CLUSTERING ORDER BY (event_time DESC) AND
  gc_grace_seconds = 0 AND
  compaction = {
    'class': 'DateTieredCompactionStrategy',
    'tombstone_compaction_interval': '1',
    'tombstone_threshold': '.01',
    'timestamp_resolution': 'MICROSECONDS',
    'base_time_seconds': '3600',
    'max_sstable_age_days': '365'} AND
  compression = {'sstable_compression': 'LZ4Compressor'};

The insert statement issued by the Go loader (this one targets the ttl_leveled table defined below):

if err := session.Query(`INSERT INTO ttl_leveled (id, metatype, event_time, rawbytes) VALUES (?, ?, ?, ?) USING TTL 1800`,
	randomId, metaType, eventTime, rawBytes).Exec(); err != nil {
	log.Println("insert failed:", err)
}

Leveled Compaction Table Create:

CREATE TABLE ttl_leveled (
  id text,
  metatype text,
  event_time timestamp,
  rawbytes blob,
  PRIMARY KEY ((id, metatype), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC) AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=0 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'tombstone_threshold': '.01', 'tombstone_compaction_interval': '1', 'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};


1 Comment

  1. Make sure to experiment with max_sstable_age_days, and instead of paying the penalty of USING TTL 1800… set a default_time_to_live on the table.

    Also, lots of fixes landed in 4.7, so make sure to run the latest to leverage the improvements around DTCS.

    Good luck
    O
