
Knowledge Byte: What Is Hadoop and How Has It Been Used?


Paulo Guimarães


If you have even a remote interest in expanding your big data knowledge, you have probably encountered the word Hadoop before. What is it, precisely?

● It is an open-source framework for large-scale data processing.

● It allows large data sets to be processed on commodity hardware, as opposed to high-end servers.

● A typical Hadoop installation can grow as necessary and does not require a large up-front investment.

● It achieves this by splitting the data processing task amongst the available machines and coordinating their actions.

Usage Example

An analogy will help illustrate how Hadoop works. Imagine that we need to count a big bag of beans; let's say it contains 100,000 of them. We could hire an expert bean counter at $250/hour to do the job for us: we give the bag to the counter, and after some time they come back with the final result. Alternatively, we could ask a group of ten first graders to do the same job. In that case, we split the bag into ten piles of roughly equal size. Each pupil counts their pile and gives the result to their teacher, who totals the results and provides us with the final number. If one of the pupils needs to leave partway through, the teacher splits his or her pile amongst the remaining students.
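The bean-counting analogy maps directly onto the split/count/combine pattern Hadoop uses. Here is a minimal Python sketch of that pattern (a toy simulation, not actual Hadoop code; all names are made up for illustration):

```python
def split_into_piles(beans, n_piles):
    """Split the bag into n roughly equal piles (Hadoop's input splits)."""
    return [beans[i::n_piles] for i in range(n_piles)]

def count_pile(pile):
    """Each 'first grader' counts their own pile (the map step)."""
    return len(pile)

def total(counts):
    """The 'teacher' adds up the partial counts (the reduce step)."""
    return sum(counts)

bag = ["bean"] * 100_000
piles = split_into_piles(bag, 10)
partial_counts = [count_pile(p) for p in piles]
print(total(partial_counts))  # 100000
```

In a real Hadoop cluster, each pile would be a block of data on a different machine and each count would run in parallel; here the structure of the computation is the same, only sequential.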

Hadoop works by splitting a large data set into smaller parts, distributing each part to be processed on a different machine, and then assembling the output from each processing node into a single set of results. In the process, Hadoop manages the fault tolerance of the cluster and coordinates the actions of individual nodes. For example, if a node fails, the tasks assigned to it are redistributed by Hadoop amongst the remaining nodes.
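The fault-tolerance behavior described above can be sketched with a toy scheduler: tasks are assigned across nodes, and when a node fails, its tasks are reassigned to the survivors so that every task still runs exactly once. This is a simplified illustration of the idea, not Hadoop's actual scheduling logic:

```python
def run_with_failures(tasks, nodes, failed_nodes):
    """Assign tasks round-robin, then reassign tasks from failed nodes."""
    assignments = {node: [] for node in nodes}
    for i, task in enumerate(tasks):
        assignments[nodes[i % len(nodes)]].append(task)
    # A node fails: redistribute its tasks amongst the survivors.
    survivors = [n for n in nodes if n not in failed_nodes]
    for node in failed_nodes:
        for i, task in enumerate(assignments.pop(node)):
            assignments[survivors[i % len(survivors)]].append(task)
    # Every task is still processed exactly once.
    return {t: f"done by {n}" for n, ts in assignments.items() for t in ts}

results = run_with_failures(
    tasks=[f"split-{i}" for i in range(8)],
    nodes=["node-a", "node-b", "node-c"],
    failed_nodes=["node-b"],
)
assert len(results) == 8  # no task is lost despite the failure
```

The key property, which Hadoop provides for real clusters, is that the failure of a worker changes where tasks run but not whether they run.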


Use Cases

Two example deployments illustrate the scale at which Hadoop is used:


Deployment 1:

● 532-node cluster (8 × 532 cores, 5.3 PB)

● Heavy usage of Java MapReduce, Apache Pig, Apache Hive, and Apache HBase

● Used for search optimization and research


Deployment 2:

● 100 nodes

● Dual quad-core Xeon L5520 @ 2.13 GHz, 24 GB RAM, 8 TB storage

● Used for charts calculation, royalty reporting, log analysis, A/B testing, and large-scale audio feature analysis over millions of tracks

