Big Data Challenges
The major challenges associated with big data are as follows:
- Capturing data
- Curation
- Storage
- Searching
- Sharing
- Transfer
- Analysis
- Presentation
To address these challenges, organizations normally rely on enterprise servers.
Hadoop - Big Data Solutions
Traditional Enterprise Approach
In this approach, an enterprise has a single computer to store and process big data. For storage, programmers rely on the database vendor of their choice, such as Oracle or IBM. The user interacts with the application, which in turn handles data storage and analysis.
Limitation
This approach works fine for applications that process modest volumes of data, amounts that a standard database server can accommodate, or that stay within the limits of the processor handling the data. But when it comes to huge, ever-growing volumes of data, funneling everything through a single database server becomes a bottleneck.
Google’s Solution
Google solved this problem using an algorithm called MapReduce. This algorithm divides the task into small parts, assigns those parts to many computers, and collects the results from them; when integrated, these results form the final dataset.
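A minimal, single-machine sketch of that divide-assign-collect idea is shown below, using Java threads as stand-ins for separate computers; the class and variable names are purely illustrative and not part of Google's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.*;

// Illustrative sketch only: counts words by splitting the input into chunks,
// "mapping" each chunk on its own worker thread, then merging the partial results.
public class DivideAndCollect {
    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList(
                "big data big clusters", "data data clusters", "big clusters");

        ExecutorService workers = Executors.newFixedThreadPool(3); // stand-ins for machines
        List<Future<ConcurrentHashMap<String, Integer>>> partials = new ArrayList<>();

        // Divide: assign one line (a "small part" of the task) to each worker.
        for (String line : lines) {
            partials.add(workers.submit(() -> {
                ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
                for (String word : line.split("\\s+")) {
                    counts.merge(word, 1, Integer::sum);
                }
                return counts;
            }));
        }

        // Collect: integrate the partial results into the final dataset.
        ConcurrentHashMap<String, Integer> result = new ConcurrentHashMap<>();
        for (Future<ConcurrentHashMap<String, Integer>> partial : partials) {
            partial.get().forEach((word, count) -> result.merge(word, count, Integer::sum));
        }
        workers.shutdown();
        System.out.println(result); // e.g. {big=3, data=3, clusters=3}
    }
}
```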
Hadoop
Using the solution provided by Google, Doug Cutting and his team developed an open-source project called Hadoop.
Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel across the nodes of a cluster. In short, Hadoop is used to develop applications that can perform complete statistical analysis on huge amounts of data.
Hadoop - Introduction to Hadoop
Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models. Applications built on the Hadoop framework run in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
Hadoop Architecture
At its core, Hadoop has two major layers:
- Processing/Computation layer (MapReduce)
- Storage layer (Hadoop Distributed File System)
MapReduce
MapReduce is a parallel programming model for writing distributed applications, devised at Google for efficient processing of large amounts of data (multi-terabyte datasets) on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. MapReduce programs run on Hadoop, which is an Apache open-source framework.
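As a concrete illustration, the classic word-count job written against the org.apache.hadoop.mapreduce API looks roughly like the sketch below; the input and output paths are taken from the command line and are assumptions of this example.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every word in the input split.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: sum the counts collected for each word.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a JAR, such a job is typically submitted with the hadoop jar command; Hadoop then splits the input, runs mappers close to the data, and feeds the grouped intermediate pairs to the reducers.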
Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is based on the Google File System (GFS) and provides a distributed file system that is designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. It is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high throughput access to application data and is suitable for applications having large datasets.
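A short sketch of how an application might read a file stored in HDFS through the org.apache.hadoop.fs.FileSystem API follows; the NameNode URI and the file path are placeholders, not part of any real deployment.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: streams a text file stored in HDFS; assumes a NameNode at hdfs://localhost:9000
// and an existing file /user/hadoop/input/sample.txt (both placeholders).
public class HdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
        Path file = new Path("/user/hadoop/input/sample.txt");

        try (FSDataInputStream in = fs.open(file);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // each block is fetched from the DataNodes holding it
            }
        }
        fs.close();
    }
}
```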
Apart from the above-mentioned two core components, Hadoop framework also includes the following two modules:
- Hadoop Common: Java libraries and utilities required by other Hadoop modules.
- Hadoop YARN: A framework for job scheduling and cluster resource management (see the sketch below).
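To give a feel for the YARN side, here is a minimal sketch that uses the org.apache.hadoop.yarn.client.api.YarnClient API to list the applications known to the ResourceManager; it assumes the cluster configuration (yarn-site.xml) is available on the classpath.

```java
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch: connects to the YARN ResourceManager and prints the applications it knows about.
public class ListYarnApplications {
    public static void main(String[] args) throws Exception {
        YarnClient client = YarnClient.createYarnClient();
        client.init(new YarnConfiguration()); // picks up the ResourceManager address from config
        client.start();

        List<ApplicationReport> applications = client.getApplications();
        for (ApplicationReport app : applications) {
            System.out.println(app.getApplicationId() + "\t"
                    + app.getName() + "\t"
                    + app.getYarnApplicationState());
        }
        client.stop();
    }
}
```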