This is a summary of System Insights of Cassandra [YouTube]
Apache Cassandra is a NoSQL, wide-column database designed for high write and read throughput. It was originally created at Facebook and modelled after Amazon's Dynamo (for its distributed design) and Google's Bigtable (for its data model and storage engine).
For example, we store protobuf messages in Cassandra.
Key Features:
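- Highly available: Cassandra is masterless (peer-to-peer), so there is no single point of failure.
- Query-driven data modelling: the queries you need to serve drive the design of the tables, because Cassandra cannot do joins across nodes. You have to store (denormalize) the data so that everything a query needs can be read from a single partition, i.e. from one node.
- Column store: rows in a table can populate a varying subset of its columns (columns are sparse), but every row must have a primary key, which includes the partition key.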
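- Inner workings: Cassandra uses a log-structured merge-tree (LSM tree) storage engine built on SSTables. Writes first go to an in-memory sorted structure, the memtable, which you can think of as a balanced binary search tree. When the memtable fills up, it is flushed to disk as an SSTable (Sorted String Table), which you can think of as an immutable, sorted, point-in-time snapshot of that tree (its in-order traversal).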
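- Partitioning: rows are distributed across nodes by their partition key using consistent hashing, so adding or removing a node moves only a small fraction of the data (minimal rebalancing).
- Replication: each partition is copied to several nodes (controlled by the replication factor), which keeps the data available when individual nodes fail.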
- Data store: each row has a primary key composed of a partition key (one or more columns) and zero or more clustering columns. For example, in PRIMARY KEY ((app, env), city), (app, env) is a composite partition key and city is a clustering column. The data is…
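Because there are no cross-node joins, a common pattern is to keep one table per query and write the same record into each of them. The minimal sketch below models two such tables as in-memory dictionaries keyed by their partition key. `OrderStore`, the table names, and the order fields are hypothetical and only illustrate the idea; real tables would be defined in CQL.

```python
from collections import defaultdict


class OrderStore:
    """Hypothetical query-driven 'tables': one per access pattern, keyed by partition key."""

    def __init__(self):
        self.orders_by_customer = defaultdict(list)  # serves "all orders for a customer"
        self.orders_by_product = defaultdict(list)   # serves "all orders for a product"

    def record_order(self, order):
        # No joins are available, so the same order is written (denormalized) into both tables.
        self.orders_by_customer[order["customer_id"]].append(order)
        self.orders_by_product[order["product_id"]].append(order)

    def orders_for_customer(self, customer_id):
        # Each query reads exactly one partition of one table.
        return list(self.orders_by_customer.get(customer_id, []))

    def orders_for_product(self, product_id):
        return list(self.orders_by_product.get(product_id, []))


store = OrderStore()
store.record_order({"order_id": 1, "customer_id": "c1", "product_id": "p9"})
store.record_order({"order_id": 2, "customer_id": "c1", "product_id": "p7"})
store.record_order({"order_id": 3, "customer_id": "c2", "product_id": "p9"})
print([o["order_id"] for o in store.orders_for_customer("c1")])
```

The cost of this design is extra writes and storage; the benefit is that every read is a single-partition lookup.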
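To make the memtable/SSTable flow from the storage-engine bullet concrete, here is a minimal Python sketch of an LSM-style store. It is a toy under stated assumptions, not Cassandra's implementation: a plain `dict` stands in for the balanced tree (sorting at flush time yields the same in-order snapshot), the class name `TinyLSM` and the flush threshold `memtable_limit` are hypothetical, and durability and compaction are left out.

```python
import bisect


class TinyLSM:
    """Toy LSM-style store: an in-memory memtable plus immutable sorted runs ("SSTables")."""

    def __init__(self, memtable_limit=3):
        self.memtable = {}    # stand-in for the in-memory balanced tree
        self.sstables = []    # each entry is a key-sorted list of (key, value); newest last
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (number of keys)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as an immutable, key-sorted run (an in-order snapshot).
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Newest data first: the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            keys = [k for k, _ in table]
            i = bisect.bisect_left(keys, key)  # binary search works because the run is sorted
            if i < len(keys) and keys[i] == key:
                return table[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable is full, so it is flushed as the first SSTable
db.put("a", 10)  # newer value for "a" lives in the memtable
print(db.get("a"), db.get("b"))
```

Reads check the memtable first and then SSTables from newest to oldest, which is why a later write to the same key wins.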
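The consistent-hashing idea from the partitioning bullet can be sketched as a hash ring. Assumptions: MD5 is used only because it is in the standard library (Cassandra's default partitioner uses Murmur3), each node gets a single token (Cassandra normally assigns many virtual-node tokens per node), and `HashRing` and the node names are hypothetical.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    # MD5 for illustration only; not the hash Cassandra uses by default.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with one token per node."""

    def __init__(self, nodes=()):
        self._tokens = []  # sorted token positions on the ring
        self._owner = {}   # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        token = ring_hash(node)
        bisect.insort(self._tokens, token)
        self._owner[token] = node

    def remove_node(self, node):
        token = ring_hash(node)
        self._tokens.remove(token)
        del self._owner[token]

    def node_for(self, partition_key):
        # Walk clockwise: the first node token >= the key's hash owns it, wrapping at the end.
        h = ring_hash(partition_key)
        i = bisect.bisect_left(self._tokens, h) % len(self._tokens)
        return self._owner[self._tokens[i]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Adding a node only takes over the keys that hash just before its token; every other key keeps its owner, which is the "minimal rebalancing" property.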
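Replication can be layered on the same kind of ring. A simple placement rule, similar in spirit to Cassandra's SimpleStrategy, stores a partition on the node that owns its token plus the next nodes clockwise until the replication factor is reached. This is a hedged sketch: the MD5 hash, the one-token-per-node ring, and the `replicas_for` helper are illustrative assumptions, and rack- or datacenter-aware placement is ignored.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    # MD5 for illustration only; Cassandra's default partitioner uses Murmur3.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def replicas_for(partition_key, nodes, replication_factor):
    """Hypothetical helper: the owning node plus the next RF-1 distinct nodes clockwise."""
    ring = sorted((ring_hash(n), n) for n in nodes)  # one token per node
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, ring_hash(partition_key))
    count = min(replication_factor, len(ring))  # cannot place more copies than there are nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(replicas_for("user-42", nodes, replication_factor=3))
```

With a replication factor of 3, losing any single node still leaves two copies of every partition.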
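The primary-key example from the data-store bullet can be simulated in plain Python: the partition key (app, env) decides which partition (and therefore which node) a row lands in, and the clustering column city keeps rows sorted within that partition. The rows, the latency_ms column, and the `build_partitions` helper are hypothetical, chosen only to illustrate the roles of the two key parts.

```python
from collections import defaultdict

# Hypothetical rows for a table declared with PRIMARY KEY ((app, env), city).
ROWS = [
    {"app": "checkout", "env": "prod", "city": "Paris", "latency_ms": 41},
    {"app": "checkout", "env": "prod", "city": "Austin", "latency_ms": 35},
    {"app": "checkout", "env": "dev", "city": "Paris", "latency_ms": 88},
    {"app": "search", "env": "prod", "city": "Berlin", "latency_ms": 22},
]


def build_partitions(rows):
    """Group rows by partition key and sort each partition by its clustering column."""
    partitions = defaultdict(dict)
    for row in rows:
        partition_key = (row["app"], row["env"])  # composite partition key: picks the partition
        clustering_key = row["city"]              # clustering column: orders rows in a partition
        # Same full primary key means same row, so a later write replaces it (an upsert).
        partitions[partition_key][clustering_key] = row
    return {
        pk: [by_city[city] for city in sorted(by_city)]
        for pk, by_city in partitions.items()
    }


for pk, part in build_partitions(ROWS).items():
    print(pk, [r["city"] for r in part])
```

Reading one (app, env) partition returns its rows already ordered by city, without touching any other partition.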