Cassandra Administration

Big Data & Analytics

Course Description

Get a Deep Understanding of Apache Cassandra Administration

Cassandra, an open source distributed database management system, is designed to provide high availability with no single point of failure while handling massive volumes of data across many commodity servers.

Duration: 24 Hours

The Cassandra training course is designed to build a deep understanding of Apache Cassandra for processing very large volumes of data that stream in at high speed and for retrieving valuable insights from that data.

Prerequisites: Basic knowledge of Linux

Introduction to Big Data / NoSQL

  • A brief intro to NoSQL
  • CAP theorem
  • When to use NoSQL
  • Columnar storage
  • NoSQL ecosystem

Cassandra Basics

  • Architecture and Design
  • Cassandra nodes, clusters, datacenters
  • Keyspaces, tables, rows and columns
  • Partitioning, replication, tokens
  • Quorum and consistency levels
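
As a small illustration of the quorum topic above, the sketch below computes how many replicas must respond at QUORUM for a given replication factor, and checks whether a read level and a write level are guaranteed to overlap on at least one replica. The function names `quorum` and `overlaps` are hypothetical helpers written for this outline, not part of any Cassandra driver or tool.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    # A quorum is a strict majority of the replicas.
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when any read set must share at least one replica with any write set."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print(f"RF={rf}: QUORUM needs {q} replicas")
    print("QUORUM reads + QUORUM writes overlap:", overlaps(q, q, rf))
    print("ONE reads + ONE writes overlap:", overlaps(1, 1, rf))
```

With a replication factor of 3, QUORUM requires 2 replicas, so QUORUM reads combined with QUORUM writes always touch a common replica, while ONE combined with ONE does not.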

Data Modeling: Basic to Advanced

  • A brief intro to CQL
  • CQL data types
  • Creating keyspaces & tables
  • Choosing columns and types
  • Choosing primary keys
  • Data layout for rows and columns
  • Time to live (TTL)
  • Querying with CQL
  • CQL updates
  • Collections (list / map / set)
  • Creating and using secondary indexes
  • Composite keys (partition keys and clustering keys)
  • Time series data
  • Best practices for time series data
  • Counters
  • Lightweight transactions (LWT)
  • Labs: creating and using indexes; modeling time series data

Cassandra Internals

  • Deep dive into the Cassandra design
  • SSTables, memtables, commit log

Cassandra Administration

  • Hardware selection
  • Cassandra distributions
  • Communication between Cassandra nodes
  • Writing data to and reading data from the storage engine
  • Data directories
  • Anti-entropy operations
  • Cassandra compaction
  • Choosing and implementing compaction strategies
  • Cassandra best practices for garbage collection, compaction, etc.
  • Troubleshooting tools and tips