Mphasis Hadoop Interview Questions

1. What are the three modes in which Hadoop can run?
The three modes in which Hadoop can run are:
1) Standalone (local) mode: This is the default mode. It uses the local FileSystem and a single Java process to run all the Hadoop services.
2) Pseudo-distributed mode: All the Hadoop daemons run on a single node, but each daemon runs in its own Java process.
3) Fully-distributed mode: This uses separate nodes to run the Hadoop master and slave services.
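
A quick way to check which FileSystem a given installation will use (a minimal sketch, assuming the Hadoop client libraries are on the classpath): fs.defaultFS defaults to file:///, i.e. the local FileSystem of standalone mode, while pseudo- and fully-distributed setups override it with an hdfs:// URI in core-site.xml.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ModeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // loads core-site.xml from the classpath, if present
        // file:/// => standalone; hdfs://localhost:... => pseudo-distributed;
        // hdfs://<namenode-host>:... => fully-distributed.
        System.out.println(conf.get("fs.defaultFS", "file:///"));
        System.out.println(FileSystem.get(conf).getUri());
    }
}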

2. What are the differences between regular FileSystem and HDFS?
1) Regular FileSystem: In a regular FileSystem, data is maintained on a single machine. If the machine crashes, data recovery is difficult because fault tolerance is low. Seek time is also higher, so it takes more time to process the data.
2) HDFS: Data is distributed and maintained across multiple machines. If a DataNode crashes, the data can still be recovered from its replicas on other nodes in the cluster. The trade-off is that reading can take comparatively longer, since data has to be read from the local discs of several machines and coordinated across the cluster.

3. What happens when two clients try to access the same file in the HDFS?
HDFS supports exclusive write only. When the first client contacts the “NameNode” to open the file for writing, the “NameNode” grants a lease to the client to create this file. When the second client tries to open the same file for writing, the “NameNode” will notice that the lease for the file is already granted to another client, and will reject the open request for the second client.
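
A minimal sketch of this behaviour, assuming a reachable HDFS cluster and an illustrative file path: FileSystem.newInstance() creates two independent clients (two lease holders), and the second create() is rejected while the first client’s lease is held.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LeaseDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path p = new Path("/shared/report.txt"); // illustrative path

        FileSystem client1 = FileSystem.newInstance(conf); // first client
        FileSystem client2 = FileSystem.newInstance(conf); // second, independent client

        FSDataOutputStream out = client1.create(p, true); // NameNode grants the lease to client1
        try {
            client2.create(p, true); // lease already held, so the NameNode rejects this
        } catch (IOException e) {
            // Typically surfaces as an AlreadyBeingCreatedException.
            System.out.println("Second writer rejected: " + e.getMessage());
        } finally {
            out.close(); // closing the stream releases the lease
        }
    }
}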

4. What is a checkpoint?
In brief, “Checkpointing” is a process that takes an FsImage and edit log and compacts them into a new FsImage. Thus, instead of replaying a potentially long edit log, the NameNode can load the final in-memory state directly from the FsImage. This is a far more efficient operation and reduces NameNode startup time. Checkpointing is performed by the Secondary NameNode.
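
Checkpoint frequency is tunable. A hedged sketch of the two properties that control it (the values shown are the usual defaults; in practice they are set in hdfs-site.xml rather than in code):

import org.apache.hadoop.conf.Configuration;

public class CheckpointTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("dfs.namenode.checkpoint.period", "3600");  // checkpoint at least once per hour (seconds)
        conf.set("dfs.namenode.checkpoint.txns", "1000000"); // ...or sooner, after this many edit-log transactions
    }
}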

5. How is HDFS fault tolerant? 
When data is stored on HDFS, the NameNode replicates it to several DataNodes. The default replication factor is 3, and you can change it in the configuration as per your need. If a DataNode goes down, the NameNode automatically copies the data from its replicas to another node and keeps it available. This is what makes HDFS fault tolerant.
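
A short sketch of both knobs, assuming a running cluster and an illustrative file path: dfs.replication sets the default for newly created files, and FileSystem.setReplication overrides it per file.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3"); // default replication factor for new files

        FileSystem fs = FileSystem.get(conf);
        fs.setReplication(new Path("/data/events.log"), (short) 2); // per-file override (illustrative path)
    }
}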

6. Can NameNode and DataNode be commodity hardware?
The smart answer to this question would be: DataNodes can be commodity hardware, like personal computers and laptops, because they store data and are required in large numbers. But from your experience, you can add that the NameNode is the master node and stores metadata about all the blocks in HDFS. It requires a lot of memory (RAM), so the NameNode needs to be a high-end machine with plenty of memory.

7. What does the ‘jps’ command do?
The ‘jps’ command helps us check whether the Hadoop daemons are running or not. It lists all the Hadoop daemons running on the machine, i.e. NameNode, DataNode, ResourceManager, NodeManager, etc.
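
On a healthy single-node installation the output looks something like this (jps prints each JVM’s process ID and main class name; the process IDs below are illustrative):

4821 NameNode
4937 DataNode
5120 SecondaryNameNode
5298 ResourceManager
5410 NodeManager
5736 Jps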

8. How do you define “Rack Awareness” in Hadoop?
Rack Awareness is the algorithm by which the “NameNode” decides how blocks and their replicas are placed, based on rack definitions, so that traffic between “DataNodes” stays within the same rack wherever possible and cross-rack network traffic is minimized. With the default replication factor of 3, the policy is: “for every block of data, two copies will exist in one rack, and the third copy in a different rack”. This rule is known as the “Replica Placement Policy”.

9. What is the purpose of “RecordReader” in Hadoop?
The “InputSplit” defines a slice of work, but does not describe how to access it. The “RecordReader” class loads the data from its source and converts it into (key, value) pairs suitable for reading by the “Mapper” task. The “RecordReader” instance is defined by the “InputFormat”.
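
A minimal sketch of that wiring, assuming the standard Hadoop MapReduce API and a hypothetical LineInputFormat class: the InputFormat plugs in the stock LineRecordReader, which turns each InputSplit into (byte offset, line text) pairs for the Mapper — essentially what the built-in TextInputFormat does.

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class LineInputFormat extends FileInputFormat<LongWritable, Text> {
    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) {
        // The RecordReader decides how the split's bytes become (key, value) pairs:
        // here, (byte offset in the file, text of one line).
        return new LineRecordReader();
    }
}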

10. What is a “Combiner”? 
A “Combiner” is a mini “reducer” that performs the local “reduce” task. It receives the input from the “mapper” on a particular “node” and sends the output to the “reducer”. “Combiners” help enhance the efficiency of “MapReduce” by reducing the amount of data that has to be sent to the “reducers”.
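
A hedged sketch of how a combiner is wired into a job driver, assuming TokenizerMapper and IntSumReducer classes like those in the stock word-count example: the reducer class doubles as the combiner because summing partial counts is associative and commutative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "wordcount");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);   // assumed mapper: emits (word, 1)
        job.setCombinerClass(IntSumReducer.class);   // local "mini reduce" on each mapper node
        job.setReducerClass(IntSumReducer.class);    // assumed reducer: sums the counts
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}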

11. What do you know about “SequenceFileInputFormat”?
“SequenceFileInputFormat” is an input format for reading sequence files: a specific compressed binary file format optimized for passing data from the output of one “MapReduce” job to the input of another. Sequence files are typically generated as the output of other MapReduce tasks and serve as an efficient intermediate representation for data flowing between jobs.
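
A sketch of that hand-off, assuming two already-configured jobs and an illustrative intermediate path: the first job writes sequence files that the second job reads back directly, with key and value types intact.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ChainedJobs {
    // job1 and job2 are assumed to be configured elsewhere.
    static void chain(Job job1, Job job2) throws Exception {
        Path intermediate = new Path("/tmp/intermediate"); // illustrative path

        job1.setOutputFormatClass(SequenceFileOutputFormat.class); // job1 writes sequence files...
        SequenceFileOutputFormat.setOutputPath(job1, intermediate);

        job2.setInputFormatClass(SequenceFileInputFormat.class);   // ...which job2 reads back directly
        SequenceFileInputFormat.addInputPath(job2, intermediate);
    }
}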

