Cassandra Interview Questions



1. What is Cassandra?

Cassandra is an open-source distributed data storage system, originally developed at Facebook for inbox search, designed to store and manage large amounts of data across commodity servers.


2. What is Cassandra used for, and why use it?

Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure. The main reasons for using Cassandra are:

It is fault-tolerant, with tunable consistency

It scales from gigabytes to petabytes

It is a column-family (wide-column) database

No single point of failure

No need for a separate caching layer

Flexible schema design

It has flexible data storage, easy data distribution, and fast writes

It offers atomic, isolated, and durable writes at the partition level with tunable consistency, rather than full ACID (Atomicity, Consistency, Isolation, and Durability) transactions

Multi-data center and cloud capable

Data compression


3. What is a composite type in Cassandra?

In Cassandra, a composite type lets you define a key or a column name as a concatenation of values of different types. A composite type can be used in two places:

Row key

Column name


4. How does Cassandra store data?

All data is stored as bytes. When you specify a validator, Cassandra ensures those bytes are encoded as required, and the comparator then orders the columns according to the ordering specific to that encoding. A composite is just a byte array with a specific encoding: for each component, it stores a two-byte length, followed by the byte-encoded component, followed by a one-byte end-of-component marker.
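The composite layout described above can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's own code: the function names are hypothetical, and the marker value `0x00` is an assumption for an ordinary (non-range) component.

```python
import struct

END_OF_COMPONENT = b"\x00"  # assumed marker value for a plain component


def encode_composite(components):
    """Encode byte-string components as <2-byte length><bytes><end marker>."""
    out = bytearray()
    for comp in components:
        out += struct.pack(">H", len(comp))  # big-endian unsigned 16-bit length
        out += comp
        out += END_OF_COMPONENT
    return bytes(out)


def decode_composite(data):
    """Inverse of encode_composite; returns the list of components."""
    components, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        components.append(data[pos:pos + length])
        pos += length + 1  # skip the component bytes and the end marker
    return components


encoded = encode_composite([b"user42", b"2024"])
print(encoded.hex())
print(decode_composite(encoded))
```

Because each component carries its own length, components of different types can be concatenated and still be split apart unambiguously.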


5. What are the main components of the Cassandra data model?

The main components of the Cassandra data model are:

Cluster

Keyspace

Column family

Column


6. What is a column family in Cassandra?

A column family in Cassandra is a collection of rows, each identified by a row key. It plays a role similar to a table in a relational database.


7. What is a cluster in Cassandra?

A cluster is a container for keyspaces. A Cassandra database is distributed over several machines that operate together. The cluster is the outermost container; it arranges the nodes in a ring and assigns data to them. Data is replicated across nodes, so a replica can take over if a node fails.


8. What are the other components of Cassandra?

The other components of Cassandra are:

Node

Data Center

Cluster

Commit log

Mem-table

SSTable

Bloom Filter


9. What is a keyspace in Cassandra?

In Cassandra, a keyspace is a namespace that determines how data is replicated on nodes. A cluster typically contains one keyspace per application.


10. What is the syntax to create a keyspace in Cassandra?

The syntax for creating a keyspace in Cassandra is:

CREATE KEYSPACE <identifier> WITH <properties>
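
To make the placeholders concrete, here is a small Python helper that fills in the template. It is only a sketch: the helper name and the keyspace name `demo_ks` are hypothetical, and it assumes the `SimpleStrategy` replication class, which suits a single data center.

```python
def create_keyspace_cql(name, replication_factor=3, durable_writes=True):
    """Return a CREATE KEYSPACE statement using SimpleStrategy replication."""
    replication = (
        "{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}}"
    )
    return (
        f"CREATE KEYSPACE {name} "
        f"WITH replication = {replication} "
        f"AND durable_writes = {str(durable_writes).lower()};"
    )


# 'demo_ks' is a made-up keyspace name used for illustration.
print(create_keyspace_cql("demo_ks"))
```

The printed statement can be pasted into cqlsh. Here `<identifier>` is the keyspace name and `<properties>` are the `replication` map and the optional `durable_writes` flag.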


11. What values are stored in a Cassandra column?

A Cassandra column holds three values:

Column name

Value

Timestamp


12. When can you use ALTER KEYSPACE?

ALTER KEYSPACE can be used to change properties of a keyspace, such as the number of replicas and the durable_writes setting.


13. What is Cassandra cqlsh?

cqlsh is a command-line shell for interacting with Cassandra through CQL (Cassandra Query Language). Using cqlsh, you can do the following:

Define a schema

Insert data

Execute a query


14. What do the shell commands CAPTURE and CONSISTENCY do?

cqlsh provides various shell commands. CAPTURE captures the output of a command and appends it to a file, while CONSISTENCY displays the current consistency level or sets a new one.


15. What is mandatory when creating a table in Cassandra?

When creating a table, a primary key is mandatory. It is made up of one or more columns of the table.


16. What do you need to take care of when adding a column?

When adding a column, make sure that:

The column name does not conflict with an existing column name

The table is not defined with the compact storage option


17. What are Cassandra CQL collections?

CQL collections let you store multiple values in a single column. Cassandra provides the following collection types:

List: used when the order of the elements needs to be maintained and a value may be stored multiple times (holds an ordered list of elements, duplicates allowed)

Set: used to store a group of elements that are returned in sorted order (holds unique elements)

Map: a data type used to store key-value pairs
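
As a rough analogy only, the three collection types behave much like Python's built-in list, set, and dict. The snippet below is not CQL and the sample values are made up; it just shows the difference in ordering and duplicate handling.

```python
# Python stand-ins for the three CQL collection types (analogy only).
emails = ["a@example.com", "b@example.com", "a@example.com"]  # list: ordered, duplicates kept
tags = {"nosql", "cassandra", "nosql"}                        # set: duplicates collapse
phones = {"home": "555-0100", "work": "555-0101"}             # map: key -> value

# A CQL set is returned in sorted order; Python sets are unordered, so sort explicitly.
sorted_tags = sorted(tags)

print(emails)
print(sorted_tags)
print(phones["work"])
```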


18. How does Cassandra write data?

Cassandra writes data in three stages:

Commit log write

Memtable write

SSTable write

Cassandra first writes data to the commit log, then to an in-memory table structure called the memtable, and finally to an SSTable on disk when the memtable is flushed.
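
The three stages above can be sketched as a toy Python model. This is a simplification under stated assumptions: the class name `TinyNode`, the `memtable_limit` threshold, and the in-memory lists standing in for on-disk files are all hypothetical, and real Cassandra flushes based on memory usage rather than a row count.

```python
class TinyNode:
    """Toy model of the write path: commit log -> memtable -> SSTable."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []      # append-only record of every write
        self.memtable = {}        # in-memory key -> value
        self.sstables = []        # immutable, sorted snapshots of flushed memtables
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. log first, for crash recovery
        self.memtable[key] = value              # 2. then update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                        # 3. flush to an SSTable when full

    def flush(self):
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()                 # flushed data no longer needs the log

    def read(self, key):
        if key in self.memtable:                # newest data first
            return self.memtable[key]
        for table in reversed(self.sstables):   # then newest SSTable to oldest
            for k, v in table:
                if k == key:
                    return v
        return None


node = TinyNode(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # second row reaches the limit and triggers a flush
node.write("a", 3)   # newer value lives in the memtable and shadows the SSTable
print(node.read("a"), node.read("b"), len(node.sstables))
```

Writes never modify an existing SSTable; a newer value simply shadows the older one at read time.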


19. What is a memtable in Cassandra?

Cassandra writes data to an in-memory structure known as a memtable

It is an in-memory cache with content stored as key/column

Memtable data is sorted by key

There is a separate memtable for each column family, and it retrieves column data by key


20. What does an SSTable consist of?

An SSTable consists mainly of two files:

Index file (Bloom filter and key-offset pairs)

Data file (the actual column data)
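
The relationship between the two files can be illustrated with a short Python sketch. It is not the real SSTable format: the length-prefixed layout and the function names are assumptions chosen to show how key-offset pairs let a reader jump straight to a row.

```python
import io


def build_sstable(rows):
    """Write rows sorted by key into a data buffer; return (data, index)."""
    data = io.BytesIO()
    index = {}                                 # key -> byte offset in the data buffer
    for key, value in sorted(rows.items()):
        index[key] = data.tell()
        payload = value.encode("utf-8")
        data.write(len(payload).to_bytes(4, "big"))  # 4-byte length prefix
        data.write(payload)
    return data.getvalue(), index


def lookup(data, index, key):
    """Seek straight to the row using the index instead of scanning the data."""
    offset = index.get(key)
    if offset is None:
        return None
    length = int.from_bytes(data[offset:offset + 4], "big")
    return data[offset + 4:offset + 4 + length].decode("utf-8")


data, index = build_sstable({"k2": "beta", "k1": "alpha"})
print(index)
print(lookup(data, index, "k2"))
```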


21. What is a Bloom filter used for in Cassandra?

A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set: it can return false positives but never false negatives. In Cassandra, it is used to determine whether an SSTable might contain data for a particular row, which saves I/O when performing a key lookup.
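
A minimal Bloom filter can be sketched in Python as follows. The sizes, the number of hash functions, and the use of salted SHA-256 are illustrative assumptions, not Cassandra's actual parameters or hash function.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
bf.add("row-1")
print(bf.might_contain("row-1"))
```

When `might_contain` returns False, the SSTable can be skipped without touching disk; when it returns True, the index still has to be consulted because the answer may be a false positive.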


22. How does Cassandra write changed data to the commit log?

Cassandra appends changed data to the commit log

The commit log acts as a crash-recovery log for data

A write operation is never considered successful until the changed data has been appended to the commit log

Data will not be lost once the commit log has been flushed to disk
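
The crash-recovery role of the commit log can be sketched in Python. The JSON-lines record format and the function names here are hypothetical; the point is only that replaying an append-only log in order rebuilds the in-memory state.

```python
import json


def append(log, key, value):
    """Append one mutation to the log as a JSON line."""
    log.append(json.dumps({"key": key, "value": value}))


def replay(log):
    """Rebuild the in-memory table by re-applying every logged mutation in order."""
    memtable = {}
    for line in log:
        record = json.loads(line)
        memtable[record["key"]] = record["value"]
    return memtable


commit_log = []
append(commit_log, "user:1", "alice")
append(commit_log, "user:2", "bob")
append(commit_log, "user:1", "alicia")   # later write wins on replay

recovered = replay(commit_log)            # simulate a restart after a crash
print(recovered)
```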


23. How does Cassandra delete data?

SSTables are immutable, so a row cannot be removed from an SSTable in place. When a row needs to be deleted, Cassandra writes a special marker called a tombstone. When the data is read, values covered by a tombstone are treated as deleted; the data is physically removed later, during compaction.
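
A toy Python version of tombstone-based deletion might look like this. The dictionaries standing in for the memtable and SSTables and the function names are assumptions; real Cassandra also keeps tombstones for a grace period (`gc_grace_seconds`) before compaction drops them, which this sketch ignores.

```python
TOMBSTONE = object()  # sentinel standing in for Cassandra's tombstone marker


def delete(memtable, key):
    """A delete is just another write: record a tombstone for the key."""
    memtable[key] = TOMBSTONE


def read(key, memtable, sstables):
    """Check newest data first; a tombstone hides any older value."""
    for source in [memtable, *reversed(sstables)]:
        if key in source:
            value = source[key]
            return None if value is TOMBSTONE else value
    return None


def compact(sstables):
    """Merge tables oldest to newest and drop tombstoned keys for good."""
    merged = {}
    for table in sstables:
        merged.update(table)
    return {k: v for k, v in merged.items() if v is not TOMBSTONE}


sstables = [{"a": 1, "b": 2}]   # older, immutable data
memtable = {}
delete(memtable, "a")
print(read("a", memtable, sstables), read("b", memtable, sstables))
print(compact(sstables + [memtable]))
```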


24. What is cqlsh, and why is it used?

cqlsh is the command-line shell for Cassandra. It is used to run CQL (Cassandra Query Language) statements against the database, for example to:

Define a schema

Insert data

Execute a query


25. What is a YAML file in Cassandra?

The cassandra.yaml file is the main configuration file for Cassandra. After changing properties in the cassandra.yaml file, you must restart the node for the changes to take effect.


26. What are durable writes?

The durable_writes option tells Cassandra whether to use the commit log for updates on the current keyspace.

This option is not mandatory. Its default value is true.


27. Differentiate between Static and Dynamic CQL Tables.

A Static Table uses a relatively static set of column names and is similar to Relational Database Table.

A dynamic table lets you precompute result sets and store them in a single row for efficient data retrieval.


28. Differentiate between DROP and TRUNCATE in cqlsh.

The DROP TABLE command removes the specified table, including its definition and all of its data, from the keyspace.

The TRUNCATE command permanently deletes all the rows of a table but keeps the table definition.


29. What is Gossip Protocol?

Gossip in Cassandra is a peer-to-peer communication protocol in which nodes periodically exchange state information with peers of their choosing. The nodes exchange information about themselves and about the other nodes that they have gossiped about, so all nodes quickly learn about all other nodes in the cluster.
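
The way state spreads can be illustrated with a small Python simulation. It is a sketch under simplifying assumptions: every node gossips with exactly one random peer per round, state is a plain version number per node, and the function names, node count, and seed are arbitrary.

```python
import random


def gossip_round(states, rng):
    """Each node picks one random peer and the two merge what they know."""
    nodes = list(states)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = dict(states[node])
        for name, version in states[peer].items():
            merged[name] = max(version, merged.get(name, -1))
        states[node] = dict(merged)
        states[peer] = dict(merged)


def run_until_converged(num_nodes=8, seed=1, max_rounds=50):
    rng = random.Random(seed)
    # Initially every node knows only its own heartbeat version.
    states = {f"n{i}": {f"n{i}": 1} for i in range(num_nodes)}
    for round_no in range(1, max_rounds + 1):
        gossip_round(states, rng)
        if all(len(view) == num_nodes for view in states.values()):
            return round_no, states
    return None, states


rounds, states = run_until_converged()
print("rounds needed:", rounds)
```

Because each exchange passes on second-hand information as well, the number of rounds needed grows slowly compared with the number of nodes.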

