Mphasis Hadoop Interview Questions

1. What are the three modes in which Hadoop can run?

The three modes in which Hadoop can run are:

1) Standalone mode: This is the default mode. It uses the local file system and runs everything in a single Java process; no Hadoop daemons are started.

2) Pseudo-distributed mode: All Hadoop daemons run on a single node, each in its own Java process, so one machine simulates a cluster.

3) Fully-distributed mode: The Hadoop master and slave (worker) services run on separate nodes of a real cluster.

2. What are the differences between regular FileSystem and HDFS?

1) Regular FileSystem: Data is kept on a single system. If that machine crashes, recovering the data is difficult because fault tolerance is low. Seek time is also higher, so processing the data takes longer.

2) HDFS: Data is distributed and maintained across multiple systems. If a DataNode crashes, the data can still be recovered from other nodes in the cluster. Reading data takes comparatively longer, because each read involves local disk access plus coordinating data from multiple systems.

3. What happens when two clients try to access the same file in the HDFS?

HDFS supports exclusive writes only. When the first client contacts the “NameNode” to open the file for writing, the “NameNode” grants that client a lease to create the file. When the second client tries to open the same file for writing, the “NameNode” sees that the lease for the file has already been granted to another client and rejects the open request from the second client.
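The lease check described above can be sketched as a toy model. This is only an illustration, not HDFS code: the class name `NameNodeLeases`, its methods, and the use of `PermissionError` are all assumptions made for the example.

```python
class NameNodeLeases:
    """Toy model of NameNode write leases: one writer per file path."""

    def __init__(self):
        self._leases = {}  # path -> id of the client holding the lease

    def open_for_write(self, path, client):
        holder = self._leases.get(path)
        if holder is not None and holder != client:
            # Another client already holds the lease, so the open is rejected.
            raise PermissionError(f"{path} is already leased to {holder}")
        self._leases[path] = client
        return True

    def close(self, path, client):
        # Closing the file releases the lease so another writer can proceed.
        if self._leases.get(path) == client:
            del self._leases[path]


leases = NameNodeLeases()
leases.open_for_write("/data/log.txt", "client-1")      # granted
try:
    leases.open_for_write("/data/log.txt", "client-2")  # rejected
except PermissionError as err:
    print(err)
```

Once the first client closes the file, the same call from the second client would succeed in this model.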

4. What is a checkpoint?

In brief, “checkpointing” is a process that takes an FsImage and an edit log and compacts them into a new FsImage. Thus, instead of replaying a long edit log, the NameNode can load its final in-memory state directly from the FsImage. This is far more efficient and reduces NameNode startup time. Checkpointing is performed by the Secondary NameNode (or by the Standby NameNode in a high-availability setup).
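As a rough sketch of the idea, the merge can be modelled with a set of paths standing in for the FsImage and a list of operations standing in for the edit log. The function names and the two operation types (`create`, `delete`) are assumptions for illustration; the real FsImage and edit log are far richer.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (a set of paths)."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Merge a snapshot and its edit log into a new snapshot and an empty log."""
    namespace = set(fsimage)
    for edit in edit_log:
        apply_edit(namespace, edit)
    return frozenset(namespace), []


fsimage = frozenset({"/a", "/b"})
edits = [("create", "/c"), ("delete", "/a")]
new_fsimage, new_edits = checkpoint(fsimage, edits)
print(sorted(new_fsimage))  # ['/b', '/c']
```

After the checkpoint the log is empty, so a restart only has to load the new snapshot rather than replay every edit.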

5. How is HDFS fault tolerant?

When data is stored in HDFS, each block is replicated to several DataNodes under the NameNode’s direction. The default replication factor is 3, and you can change the replication factor as per your need (the dfs.replication property). If a DataNode goes down, the NameNode automatically has the affected blocks copied from the surviving replicas to another node, so the data stays available. This is how HDFS provides fault tolerance.
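The recovery step can be illustrated with a small simulation. It is a sketch under simplifying assumptions: `re_replicate`, the block ids and the node names are hypothetical, and target nodes are chosen alphabetically rather than by any real placement policy.

```python
REPLICATION_FACTOR = 3  # HDFS default, per the answer above


def re_replicate(block_locations, live_nodes, factor=REPLICATION_FACTOR):
    """Return a new block -> nodes map with lost replicas copied to live nodes."""
    repaired = {}
    for block, nodes in block_locations.items():
        survivors = [n for n in nodes if n in live_nodes]
        if not survivors:
            # No replica is left to copy from, so this block is lost.
            repaired[block] = []
            continue
        # Candidate targets are live nodes that do not hold this block yet.
        candidates = [n for n in sorted(live_nodes) if n not in survivors]
        missing = max(0, factor - len(survivors))
        repaired[block] = survivors + candidates[:missing]
    return repaired


blocks = {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn2", "dn3", "dn4"]}
live = {"dn1", "dn3", "dn4", "dn5"}  # dn2 has failed
print(re_replicate(blocks, live))
```

Each block that lost a replica on dn2 ends up with three replicas again, all on live nodes.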

6. Can NameNode and DataNode be a commodity hardware?

The smart answer to this question would be that DataNodes can be commodity hardware, like personal computers and laptops, because they only store data and are needed in large numbers. But from your experience you can add that the NameNode is the master node and stores metadata about all the blocks in HDFS. That metadata is held in memory (RAM), so the NameNode needs to be a high-end machine with plenty of memory.

7. What does ‘jps’ command do?

The ‘jps’ command (JVM Process Status) lets us check whether the Hadoop daemons are running. It lists the Java processes running on the machine, which include Hadoop daemons such as NameNode, DataNode, ResourceManager and NodeManager.
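A small helper can turn that check into a script. This is a sketch, assuming ‘jps’ prints one "<pid> <main class>" pair per line (its default output) and that the four daemons named above are the ones expected on the machine; the function names are made up for the example.

```python
import shutil
import subprocess

EXPECTED_DAEMONS = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}


def running_daemons(jps_output):
    """Parse jps output ("<pid> <main class>" per line) into a set of names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            names.add(parts[1].strip())
    return names


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemons that do not appear in the jps output."""
    return set(expected) - running_daemons(jps_output)


# Only call jps when it is actually installed on this machine.
if shutil.which("jps"):
    output = subprocess.run(["jps"], capture_output=True, text=True).stdout
    print("Not running:", sorted(missing_daemons(output)))
```

On a worker node you would pass a smaller expected set, for example only DataNode and NodeManager.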

8. How do you define “Rack Awareness” in Hadoop?

Rack Awareness is the algorithm by which the “NameNode” decides how blocks and their replicas are placed, based on rack definitions, so that traffic between “DataNodes” stays within the same rack where possible and traffic across racks is minimized. With the default replication factor of 3, the policy is that “for every block of data, two copies will exist in one rack, third copy in a different rack”. This rule is known as the “Replica Placement Policy”.
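One way to picture the rule is the sketch below, which keeps the first copy on the writer’s node and puts the other two on a single remote rack. The `place_replicas` function and the rack and node names are hypothetical, and the choice of the first suitable remote rack is a simplification; the real policy also weighs node load and free space.

```python
def place_replicas(topology, writer_node):
    """Pick three nodes: the writer's node plus two nodes on one other rack.

    topology maps rack name -> list of node names (hypothetical layout).
    """
    local_rack = next(
        (rack for rack, nodes in topology.items() if writer_node in nodes), None
    )
    if local_rack is None:
        raise ValueError(f"{writer_node} is not in the topology")
    # Choose a remote rack that can host two replicas on distinct nodes.
    remote_rack = next(
        (
            rack
            for rack in sorted(topology)
            if rack != local_rack and len(topology[rack]) >= 2
        ),
        None,
    )
    if remote_rack is None:
        raise ValueError("no remote rack with two nodes available")
    second, third = topology[remote_rack][:2]
    return [writer_node, second, third]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
print(place_replicas(topology, "dn1"))  # ['dn1', 'dn3', 'dn4']
```

Two of the three copies share a rack, so only one cross-rack transfer is needed per block, while losing an entire rack still leaves at least one copy.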

9. What is the purpose of “RecordReader” in Hadoop?

The “InputSplit” defines a slice of work, but does not describe how to access it. The “RecordReader” class loads the data from its source and converts it into (key, value) pairs suitable for reading by the “Mapper” task. The “RecordReader” instance is defined by the “InputFormat”.
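To make the (key, value) conversion concrete, here is a minimal sketch of a line-oriented reader in the style of the one used for plain text input, where the key is the byte offset of the line and the value is its text. The function name is made up, and the sketch assumes the split starts on a line boundary, which a real reader cannot take for granted.

```python
import io


def line_record_reader(stream, start, length):
    """Yield (byte_offset, line) pairs for the split [start, start + length)."""
    stream.seek(start)
    offset = start
    end = start + length
    while offset < end:
        line = stream.readline()
        if not line:
            break  # reached the end of the underlying data
        yield offset, line.rstrip(b"\n").decode("utf-8")
        offset += len(line)


data = io.BytesIO(b"alpha\nbeta\ngamma\n")
for key, value in line_record_reader(data, 0, 17):
    print(key, value)
```

Each pair produced here is what a mapper would receive as its input key and value.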

10. What is a “Combiner”?

A “Combiner” is a mini “reducer” that performs the local “reduce” task. It receives the input from the “mapper” on a particular “node” and sends the output to the “reducer”. “Combiners” improve the efficiency of “MapReduce” by reducing the amount of data that has to be sent to the “reducers”.
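A word count makes the saving easy to see. The sketch below is plain Python rather than the MapReduce API; the function names and the two sample lines are assumptions for illustration.

```python
from collections import Counter


def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum counts locally, before anything crosses the network."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def reduce_counts(all_pairs):
    """Reducer: sum the (possibly pre-summed) counts from every node."""
    totals = Counter()
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)


node_a = map_words("to be or not to be")
node_b = map_words("to do or not")
shuffled = combine(node_a) + combine(node_b)
print(len(node_a) + len(node_b), "pairs without a combiner")
print(len(shuffled), "pairs with a combiner")
print(reduce_counts(shuffled))
```

The final counts are the same with or without the combiner because addition is associative and commutative; a combiner is only safe for operations with that property.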

11. What do you know about “SequenceFileInputFormat”?

“SequenceFileInputFormat” is an input format for reading sequence files. A sequence file is a binary file format of key-value pairs, optionally compressed, which is optimized for passing data from the output of one “MapReduce” job to the input of some other “MapReduce” job.

Sequence files can be generated as the output of other MapReduce tasks and are an efficient intermediate representation for data that passes from one MapReduce job to another.
