Cassandra Interview Questions
1. What is Cassandra?
Apache Cassandra is an open-source, distributed NoSQL database. It was originally developed at Facebook to power inbox search and is designed to store and manage large amounts of data across many commodity servers.
2. What is Cassandra used for, and why use it?
Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure. The main reasons for choosing Cassandra are:
It is fault-tolerant and offers tunable consistency
It scales from gigabytes to petabytes
It is a wide-column (partitioned row) store
It has no single point of failure
It needs no separate caching layer
It has a flexible schema design
It offers flexible data storage, easy data distribution, and fast writes
It provides atomicity and isolation at the row (partition) level and durable writes, but not full ACID (Atomicity, Consistency, Isolation, and Durability) transactions across rows or tables
It is multi-data-center and cloud capable
It supports data compression
3. What is a composite type in Cassandra?
In Cassandra, a composite type lets you define a key or a column name as a concatenation of values of different types. A composite type can be used in two places:
Row key
Column name
4. How does Cassandra store data?
All data is stored as bytes. When you specify a validator, Cassandra ensures that those bytes are encoded as required. A comparator then orders the columns according to the ordering specific to that encoding. A composite is just a byte array with a specific encoding: for each component, it stores a two-byte length, followed by the byte-encoded component, followed by a one-byte end-of-component marker.
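The composite layout described above can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's own code: the function names are hypothetical, the length is assumed to be big-endian, and the terminator is written as a single zero byte.

```python
import struct

def encode_composite(components):
    """Encode byte strings as: 2-byte length, component bytes, 1-byte terminator."""
    out = bytearray()
    for comp in components:
        out += struct.pack(">H", len(comp))  # two-byte big-endian length (assumed byte order)
        out += comp                          # the byte-encoded component itself
        out += b"\x00"                       # end-of-component marker (assumed to be zero)
    return bytes(out)

def decode_composite(data):
    """Reverse encode_composite and return the list of components."""
    components = []
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        components.append(data[pos:pos + length])
        pos += length + 1  # skip the component and its terminator byte
    return components

# Illustrative two-part column name: a user id followed by a date.
encoded = encode_composite([b"user42", b"2024-01-01"])
decoded = decode_composite(encoded)
```

Because every component carries its own length, components of different types can be concatenated and still be split apart unambiguously.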
5. What are the main components of the Cassandra data model?
The main components of the Cassandra data model are:
Cluster
Keyspace
Column family
Column
6. What is a column family in Cassandra?
A column family in Cassandra is a collection of rows, each identified by a row key. In CQL, a column family is called a table.
7. What is a cluster in Cassandra?
A cluster is a container for keyspaces. A Cassandra database is distributed over several machines that operate together. The cluster is the outermost container: it arranges the nodes in a ring and assigns data to them. Data is replicated across nodes according to each keyspace's replication settings, so a replica can take over if a node fails.
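To make the ring idea concrete, here is a rough Python sketch of how a key could be mapped to nodes on a ring. It assumes one token per node and uses MD5 as a stand-in hash; the class name, node names, and key are hypothetical, and real clusters use their configured partitioner and replication strategy.

```python
import bisect
import hashlib

class Ring:
    def __init__(self, nodes):
        # Assumption: one token per node, derived from the node name.
        self.tokens = sorted((self._token(n), n) for n in nodes)

    @staticmethod
    def _token(key):
        # MD5 is only a stand-in for the cluster's partitioner.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def replicas(self, partition_key, replication_factor):
        """Return the owner of the key plus the next nodes clockwise on the ring."""
        positions = [t for t, _ in self.tokens]
        start = bisect.bisect_left(positions, self._token(partition_key)) % len(self.tokens)
        count = min(replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:42", 3)  # three distinct nodes hold this key
```

If the first node in `owners` is down, the remaining replicas can still serve the key, which is the behavior the answer above describes.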
8. What are the other components of Cassandra?
The other components of Cassandra are:
Node
Data Center
Cluster
Commit log
Mem-table
SSTable
Bloom Filter
9. What is a keyspace in Cassandra?
In Cassandra, a keyspace is a namespace that groups tables and determines how their data is replicated across nodes. A cluster typically contains one keyspace per application.
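10. What is the syntax to create a keyspace in Cassandra?
The syntax for creating a keyspace in Cassandra is:
CREATE KEYSPACE <identifier> WITH <properties>
For example, the following statement creates a keyspace (the name demo_ks is illustrative) whose data is kept on three replicas:
CREATE KEYSPACE demo_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};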
11. What values are stored in a Cassandra column?
A Cassandra column holds three values:
Column name
Value
Timestamp
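The timestamp is what lets Cassandra decide which version of a column is current when several writes exist. A minimal Python sketch of that "last write wins" rule follows; the `Column` class and `reconcile` function are hypothetical names, and ties are resolved naively here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    name: str
    value: bytes
    timestamp: int  # write time supplied with the write, e.g. microseconds since epoch

def reconcile(a, b):
    """Return the newer of two versions of the same column (last write wins)."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    # Simplification: on equal timestamps this sketch just keeps the first argument.
    return a if a.timestamp >= b.timestamp else b

old = Column("email", b"old@example.com", timestamp=100)
new = Column("email", b"new@example.com", timestamp=200)
current = reconcile(old, new)  # the version written at timestamp 200
```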
12. When can you use ALTER KEYSPACE?
ALTER KEYSPACE can be used to change keyspace properties such as the replication settings (for example, the number of replicas) and durable_writes.
13. What is cqlsh?
cqlsh is a command-line shell for interacting with Cassandra through CQL (Cassandra Query Language). Using cqlsh, you can:
Define a schema
Insert data
Execute a query
14. What do the shell commands CAPTURE and CONSISTENCY do?
cqlsh provides various shell commands. CAPTURE captures the output of commands and appends it to a file, while CONSISTENCY displays the current consistency level or sets a new one.
15. What is mandatory when creating a table in Cassandra?
A primary key is mandatory when creating a table. It is made up of one or more columns of the table.
16. What do you need to take care of when adding a column?
When adding a column, make sure that:
The column name does not conflict with an existing column name
The table is not defined with the compact storage option
17. What are CQL collections in Cassandra?
CQL collections let you store multiple values in a single column. Cassandra provides three collection types:
List: used when the order of the elements must be maintained and a value may be stored multiple times (duplicates are allowed)
Set: used to store a group of unique elements, which are returned in sorted order
Map: used to store key-value pairs
18. How does Cassandra write data?
A write in Cassandra passes through three components:
Commit log
Memtable
SSTable
Cassandra first appends the write to the commit log and then applies it to an in-memory structure called the memtable. When the memtable is full, its contents are flushed to disk as an SSTable.
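The write path above can be modelled with a toy in-memory store. This is only a sketch under simplifying assumptions: `TinyStore` and its `flush_threshold` parameter are hypothetical, "disk" is a Python list, and the commit log is cleared on flush rather than managed in segments.

```python
class TinyStore:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only log used for crash recovery
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable, key-sorted snapshots standing in for disk files
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. append to the commit log
        self.memtable[key] = value            # 2. apply to the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. flush a full memtable to an SSTable

    def flush(self):
        # Write the memtable out sorted by key, then start with an empty one.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need to be replayed

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            for k, v in table:
                if k == key:
                    return v
        return None

store = TinyStore(flush_threshold=2)
store.write("a", 1)
store.write("b", 2)   # second write fills the memtable and triggers a flush
store.write("a", 3)   # newer value for "a" lives in the fresh memtable
```

Note that the SSTable is never modified after the flush; the newer value for `"a"` simply shadows the older one at read time.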
19. What is a memtable in Cassandra?
Cassandra writes data to an in-memory structure known as the memtable
It acts as an in-memory write-back cache, with content stored as key/column pairs
Memtable data is sorted by key
There is a separate memtable for each column family, and column data is retrieved from it by key
20. What does an SSTable consist of?
An SSTable consists mainly of two files:
Index file (Bloom filter and key/offset pairs)
Data file (the actual column data)
21. What is a Bloom filter used for in Cassandra?
A Bloom filter is a space-efficient probabilistic data structure that tests whether an element is a member of a set. It can return false positives but never false negatives. In Cassandra, it is used to determine whether an SSTable might contain data for a particular row, which saves disk I/O during a key lookup.
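A small Python sketch shows how such a filter can answer "definitely not here" without touching the data file. The sizes, the number of hash functions, and the use of SHA-256 are assumptions chosen for readability, not Cassandra's actual parameters.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity rather than compactness

    def _positions(self, key):
        # Derive several bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

sstable_filter = BloomFilter()
sstable_filter.add("row-1")
needs_disk_read = sstable_filter.might_contain("row-1")  # an added key is always reported
```

A read would consult the filter of each SSTable first and skip any SSTable whose filter returns False for the requested key.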
22. How does Cassandra write changed data to the commit log?
Cassandra appends changed data to the commit log
The commit log acts as a crash-recovery log for data
A write operation is not considered successful until the changed data has been appended to the commit log
Data will not be lost once the commit log has been flushed to disk
23. How does Cassandra delete data?
SSTables are immutable, so a row cannot be removed from an SSTable in place. When a row needs to be deleted, Cassandra writes a special marker called a tombstone. When the data is read, anything covered by a tombstone is treated as deleted. The tombstoned data is physically removed later, during compaction.
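The following Python sketch illustrates delete-by-tombstone under simplifying assumptions: each SSTable is a dict of key to (value, timestamp), and `TOMBSTONE`, `read`, `compact`, and the `gc_grace` parameter are illustrative names rather than Cassandra internals.

```python
TOMBSTONE = object()  # special marker meaning "this value was deleted"

def read(key, sstables):
    """Return the newest value for key across SSTables, or None if absent or deleted."""
    newest = None
    for table in sstables:
        if key in table:
            cell = table[key]  # (value, timestamp)
            if newest is None or cell[1] > newest[1]:
                newest = cell
    if newest is None or newest[0] is TOMBSTONE:
        return None
    return newest[0]

def compact(sstables, now, gc_grace):
    """Merge SSTables, keep the newest cell per key, and drop expired tombstones."""
    merged = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell[1] > merged[key][1]:
                merged[key] = cell
    return {
        key: cell
        for key, cell in merged.items()
        if not (cell[0] is TOMBSTONE and now - cell[1] >= gc_grace)
    }

older = {"k": ("v1", 1)}          # original write at time 1
newer = {"k": (TOMBSTONE, 5)}     # delete recorded at time 5 in a later SSTable
visible = read("k", [older, newer])                    # None: the tombstone hides "v1"
compacted = compact([older, newer], now=20, gc_grace=10)  # tombstone and old value are gone
```

Neither input SSTable is modified; the delete only takes physical effect when compaction writes a new, merged SSTable.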
24. What is cqlsh, and why is it used?
cqlsh is the command-line shell for Cassandra. It is used to run CQL (Cassandra Query Language) statements against a cluster. Using cqlsh, you can:
Define a schema
Insert data
Execute a query
25. What is a YAML file in Cassandra?
The cassandra.yaml file is the main configuration file for Cassandra. After changing properties in the cassandra.yaml file, you must restart the node for the changes to take effect.
26. What are durable writes?
The durable_writes keyspace option tells Cassandra whether to use the commit log for updates to that keyspace.
This option is not mandatory. Its default value is true.
27. Differentiate between static and dynamic CQL tables.
A static table uses a relatively static set of column names and is similar to a relational database table.
A dynamic table lets you pre-compute result sets and store them in a single row for efficient data retrieval.
28. Differentiate between DROP and TRUNCATE in cqlsh.
The DROP TABLE command removes the specified table, including all of its data, from the keyspace.
The TRUNCATE command permanently deletes all rows from a table but keeps the table definition.
29. What is the gossip protocol?
The gossip protocol in Cassandra is a peer-to-peer communication protocol in which nodes periodically exchange state information with a few other nodes in the cluster. The nodes exchange information about themselves and about the other nodes they have gossiped with, so all nodes quickly learn about all other nodes in the cluster.
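A toy simulation in Python shows why gossip spreads state quickly. It is a sketch under assumptions: each node contacts exactly one random peer per round, state is reduced to a single version number per node, and the function names and `max_rounds` guard are hypothetical.

```python
import random

def gossip_round(views, rng):
    """Each node exchanges its view with one random peer; both keep the newest entries."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = dict(views[node])
        for name, version in views[peer].items():
            if version > merged.get(name, -1):
                merged[name] = version
        views[node] = dict(merged)
        views[peer] = dict(merged)

def rounds_until_converged(node_count, seed=0, max_rounds=1000):
    """Count rounds until every node knows about every other node."""
    rng = random.Random(seed)
    names = [f"node-{i}" for i in range(node_count)]
    views = {name: {name: 1} for name in names}  # initially each node knows only itself
    rounds = 0
    while rounds < max_rounds and any(len(view) < node_count for view in views.values()):
        gossip_round(views, rng)
        rounds += 1
    return rounds, views

rounds, views = rounds_until_converged(8, seed=1)
```

Because each exchange passes along everything a node has heard so far, knowledge spreads far faster than it would if every node had to contact every other node directly.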