Taking a complete backup of Cassandra DB

cqlsh has no single command that exports every table in every keyspace to CSV, so this post walks through a way to script it.

Out of the box, cqlsh can export an individual table to a CSV file with the following command.

COPY table_name to 'table_name.csv' WITH HEADER=TRUE
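The same statement can also be run non-interactively with cqlsh -e, using a keyspace-qualified table name. The helper below is only a sketch: the function name copy_statement and the names my_keyspace and users are hypothetical, and the host is the one used later in this post.

```shell
#!/usr/bin/env bash
# Build the cqlsh COPY statement for one table.
# Arguments: keyspace name, table name (hypothetical values in the example below).
copy_statement() {
  local keyspace="$1" table="$2"
  printf "COPY %s.%s TO '%s.%s.csv' WITH HEADER=TRUE" \
    "$keyspace" "$table" "$keyspace" "$table"
}

# Example, assuming a node is reachable at this address:
# cqlsh 192.168.134.132 -e "$(copy_statement my_keyspace users)"
```

Qualifying the table with its keyspace avoids having to issue a USE statement first.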

Doing this by hand for an entire Cassandra cluster with multiple keyspaces and many tables is tedious. The following set of commands makes it easier.

First, create a folder to store the backup CSV files

mkdir cassandrabkp

cd into the folder

cd cassandrabkp

Assuming that you are using bash and have cqlsh and sed installed, the following command lists the tables in the schema and exports them one by one, creating a backup CSV file for each table. Replace 192.168.134.132 with the address of one of your Cassandra nodes.

for i in $(cqlsh 192.168.134.132 -e "DESCRIBE SCHEMA" | grep "CREATE TABLE" | sed 's/CREATE TABLE //g' | sed 's/ (//g') ; do echo "cqlsh 192.168.134.132 -e \"COPY $i TO '$i.csv' WITH HEADER=TRUE\"";done | bash
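For readability, the one-liner can also be split into two small functions. This is a sketch under the same assumptions (bash, cqlsh, sed, and DESCRIBE SCHEMA printing each table as "CREATE TABLE keyspace.table ("). The function names extract_tables and backup_all_tables are made up for illustration.

```shell
#!/usr/bin/env bash
# Read "DESCRIBE SCHEMA" output on stdin and print one keyspace.table name per line.
extract_tables() {
  grep "CREATE TABLE" | sed -e 's/.*CREATE TABLE //' -e 's/ (.*$//'
}

# Export every table found on the given host into the current directory.
backup_all_tables() {
  local host="$1" table
  cqlsh "$host" -e "DESCRIBE SCHEMA" | extract_tables | while read -r table; do
    echo "Exporting $table"
    # stdin is redirected so cqlsh cannot consume the table list feeding the loop.
    cqlsh "$host" -e "COPY $table TO '$table.csv' WITH HEADER=TRUE" < /dev/null
  done
}

# Usage, from inside the backup folder:
# backup_all_tables 192.168.134.132
```

Reading the names with while read instead of generating commands and piping them to bash keeps each table name on its own line and makes the progress easier to follow.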

This will take some time, depending on the size of the data.

Now you can cd out of the folder and compress it to save space. Since the CSV files are plain text, they compress well: in my experience, a 13GB backup folder compressed to 2.5GB.

cd ..

tar -cvzf cassandrabkp.tar.gz cassandrabkp/
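Before deleting the uncompressed folder, it may be worth confirming that the archive is readable. The following sketch wraps the tar step in a hypothetical archive_backup function that also lists the archive and prints both sizes.

```shell
#!/usr/bin/env bash
# Compress a backup directory and confirm the archive can be read back.
# Argument: directory name (e.g. cassandrabkp); the archive is written as <dir>.tar.gz.
archive_backup() {
  local dir="$1"
  tar -czf "$dir.tar.gz" "$dir"/ || return 1
  # Listing the archive makes tar read it from start to end.
  tar -tzf "$dir.tar.gz" > /dev/null || return 1
  # Show the size of the folder and of the archive for comparison.
  du -sh "$dir" "$dir.tar.gz"
}

# Usage, from the parent of the backup folder:
# archive_backup cassandrabkp
```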

You can extract it later with the following command

tar -xvzf cassandrabkp.tar.gz
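Once extracted, each CSV can be loaded back with the matching COPY ... FROM statement. The sketch below assumes the files are named keyspace.table.csv as produced above, and that the keyspaces and tables already exist on the target cluster, since the CSV files contain only rows and not the schema. The function name restore_statement is hypothetical.

```shell
#!/usr/bin/env bash
# Build the cqlsh COPY ... FROM statement for one backup file.
# Argument: path to a CSV named <keyspace>.<table>.csv.
restore_statement() {
  local file="$1" table
  # Strip the directory and the .csv suffix to recover keyspace.table.
  table=$(basename "$file" .csv)
  printf "COPY %s FROM '%s' WITH HEADER=TRUE" "$table" "$file"
}

# Usage, assuming a node is reachable at this address:
# for f in cassandrabkp/*.csv; do
#   cqlsh 192.168.134.132 -e "$(restore_statement "$f")"
# done
```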