
What will you get?

15+ hours of high-quality video content

One-on-one sessions with a personal mentor

75+ hours of labs and projects

Certification and job assistance

No constraints of time or place


  • Introduction to Big Data and Hadoop
      Introduction to Big Data
    • What is Big Data?
    • What is Big Data problem?
    • Why shouldn't we ignore Big Data?
    • 4 V's
      Introduction to Hadoop
    • What is Hadoop?
    • History of Hadoop
    • Basic Block Diagram
    • Hadoop 1.0 - Architecture, Characteristics
    • Problems with Hadoop 1.0
    • Hadoop 2.x - Architecture, Characteristics
    • Difference between Hadoop 1.0 and Hadoop 2.x
  • Introduction to HDFS
    • Features
    • Architecture
    • Data Storage Unit - Block
    • Using HDFS practically
    • HDFS Fault Tolerance (Hadoop 1.0)
    • HDFS High Availability (Hadoop 2.x)
    • Scaling HDFS horizontally - HDFS Federation (Hadoop 2.x)
    • Case Study - HDFS in industry
  • Hadoop File Systems
    • Various implementations using URIs and other tools
    • Filesystem in Userspace (FUSE)
    • Data Flows
    • Parallel Copying
    • Keeping an HDFS cluster balanced
    • Hadoop Archives
  • Hadoop I/O and Avro
    • Why is Hadoop I/O needed?
    • Data Integrity
    • Compression in HDFS
    • Serialization
    • Serialization Frameworks
    • Apache Avro
  • Introduction to MapReduce (MR)
    • MR Architecture and terminology (MRv1)
    • MR Architecture and terminology (MRv2)
    • Difference between MRv1 and MRv2
    • Writing simple MR programs
    • Understanding keys and values through examples
    • Data Flow in MR
    • Executing MR Jobs
    • HDFS + MR (Interaction between MR and HDFS)
    • Different MR setups
    • Case Study
  • Deep Dive into MR
    • Detailed study of MapReduce 1.0 (Classic)
    • Problems with Classic MapReduce
    • Detailed study of YARN (MapReduce 2)
    • Handling Failures in MRv1 and MRv2
    • Different schedulers - FIFO Scheduler (MR1), Fair Scheduler (MR2), Capacity Scheduler (MR2)
    • Task Execution - Task Execution Environment, Speculative Execution, Output Committers, Task JVM Reuse, Skipping Bad Records
    • MapReduce Types and Formats - Input Formats, Output Formats
    • MapReduce Features - Counters, Sorting, Joins
  • Features of MR and Oozie
    • What is workflow?
    • Fundamentals of workflow management
    • Introduction to Apache Oozie
    • Writing and understanding workflows
    • Case Study
  • Apache Flume
    • Introduction to Flume
    • Flume Architecture
    • Flume Events
    • Flume Models
    • Data Flow
    • Flume Goals
    • Case Study
  • Apache Sqoop
    • Background
    • What is structured data?
    • Fundamentals of Apache Sqoop
    • Sqoop Import and Export
    • Installing and using Sqoop in practice
    • Sqoop 1 vs Sqoop 2
    • Case Study
  • Apache Hive
    • Background
    • Origins of Hive
    • What is Hive?
    • Hive: System Architecture and Components
    • Hive Data Models
    • Introduction to Hive QL
    • Installation and practical hands-on with Hive
    • Hive limitations
    • Case Study
  • Apache Pig
    • Need for Pig
    • Pig vs MR
    • What is Pig?
    • Use cases of Pig
    • Where NOT to use Pig?
    • Architecture of Pig
    • Data models of Pig
    • Pig Commands
    • Installation and Use of Pig Commands
    • UDF in Pig
    • Pig vs SQL, Pig vs Hive
    • Committers of Pig
    • Case Study
    • Performance Tradeoff
  • Apache HBase and NoSQL
    • Background
    • Fundamentals of NoSQL
    • Four emerging NoSQL Categories
    • Introduction to HBase
    • HBase vs RDBMS
    • HBase Architecture
    • HBase Data Models
    • Installing and using HBase
    • HBase Web UI
    • Case Study
  • Apache Zookeeper
  • 3 Projects on MR
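The MR modules above revolve around keys and values. As a rough illustration only (plain Python, not Hadoop's Java API or Hadoop Streaming), here is a local simulation of the map, shuffle/sort and reduce phases for the classic word-count job. The function names `mapper`, `reducer` and `run_job` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map phase: emit a (word, 1) key/value pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all the counts emitted for one key."""
    return word, sum(counts)


def run_job(lines):
    # Map: apply the mapper to every input record.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort: bring all values for the same key together,
    # mimicking what the framework does between the two phases.
    pairs.sort(key=itemgetter(0))
    # Reduce: call the reducer once per distinct key.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    print(run_job(["big data big ideas", "data flows"]))
```

In a real cluster, the sort-and-group step is performed by the framework across machines, and only the mapper and reducer are written by the programmer.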

Sample Content

  • Module 1: Big Data and Hadoop

  • Module 2: Hadoop Distributed File System

  • Module 3: File systems in Hadoop


Analyse NBA game records for detailed insights. Industry / Area - Sports

About NBA statistics - In professional basketball, the most commonly used statistical benchmark for comparing the overall value of players is called efficiency. It is a composite basketball statistic derived from basic individual statistics: points, rebounds, assists, steals, blocks, turnovers and shot attempts. In theory, the efficiency stat accounts for both a player's offensive contributions (points, assists) and defensive contributions (steals, blocks), but efficiency ratings are generally thought to favor offense-oriented players over defensive specialists, because defense is difficult to quantify with currently tabulated statistics. NBA statistics are largely about measuring the efficiency of teams, players, coaches, etc.
Data - The sample data provides details about coaches, players, matches, goals, teams, seasons, etc. The terms used in this data can be understood using the following wiki link.
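A commonly cited form of the NBA efficiency rating is (PTS + REB + AST + STL + BLK - missed field goals - missed free throws - turnovers) divided by games played. The sketch below assumes that form; the `SeasonLine` record and its field names are hypothetical and would need to be mapped to the columns of the sample data.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    # Hypothetical per-player season totals; map these to the dataset's real columns.
    games: int
    points: int
    rebounds: int
    assists: int
    steals: int
    blocks: int
    fg_attempted: int
    fg_made: int
    ft_attempted: int
    ft_made: int
    turnovers: int


def efficiency(s: SeasonLine) -> float:
    """Per-game efficiency: (PTS + REB + AST + STL + BLK - missed FG - missed FT - TO) / GP."""
    if s.games <= 0:
        raise ValueError("games played must be positive")
    missed_fg = s.fg_attempted - s.fg_made
    missed_ft = s.ft_attempted - s.ft_made
    total = (s.points + s.rebounds + s.assists + s.steals + s.blocks
             - missed_fg - missed_ft - s.turnovers)
    return total / s.games
```

Because missed shots and turnovers are the only negative terms, the formula rewards volume scorers and says little about on-ball defense, which is the bias the paragraph above describes.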

Purchasing Power Parity data analysis. Industry / Area - Economics

About PPP - PPP is an economic theory that estimates how much the exchange rate between two countries' currencies must be adjusted for the exchange to be equivalent to each currency's purchasing power.
Data - The sample data provides PPP details for every country over 25 years. The file PPP_data.dat contains 4 fields: Country Name specifies the name of the country; Country ID gives a unique ID for every country; Year specifies the year of recording; PPP gives the rate for that year.
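A first step in this project is usually to parse the four fields and group values by country, which mirrors the map (emit country ID as the key) and reduce (aggregate per key) pattern. The sketch below assumes the fields appear in the order listed above and are tab-separated; the delimiter and the function names are assumptions, not taken from the data set.

```python
from collections import defaultdict
from statistics import mean


def parse_record(line, sep="\t"):
    """Split one PPP_data.dat line into (country_name, country_id, year, ppp).

    Assumes the four fields are in the documented order and separated by
    `sep` (tab by default; adjust if the file uses another delimiter).
    """
    name, country_id, year, ppp = line.rstrip("\n").split(sep)
    return name, country_id, int(year), float(ppp)


def average_ppp_by_country(lines, sep="\t"):
    # Group PPP values by country ID (the key), then average each group.
    rates = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            _, country_id, _, ppp = parse_record(line, sep)
            rates[country_id].append(ppp)
    return {cid: mean(values) for cid, values in rates.items()}
```

Usage, assuming the file is local: `with open("PPP_data.dat") as f: print(average_ppp_by_country(f))`.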

City population by sex, city and city type. Industry / Area - Demographic, Govt and Social

About project - The United Nations Statistics Division collects, compiles and disseminates official demographic and social statistics on a wide range of topics. Data have been collected since 1948 through a set of questionnaires dispatched annually to over 230 national statistical offices and have been published in the Demographic Yearbook collection. The Demographic Yearbook disseminates statistics on population size and composition, births, deaths, marriage and divorce, as well as respective rates, on an annual basis. The Demographic Yearbook census datasets cover a wide range of additional topics including economic activity, educational attainment, household characteristics, housing characteristics, ethnicity, language, foreign-born and foreign population.
Data - The file unsd-citypopulation-year-fm.csv contains city population data for various countries over a wide period. The file gives details for both sexes. There are 11 fields in the file; the important ones are Country, Year, Area, Sex, City, City Type, Source Year and Value.
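A typical aggregation here is total city population per country, year and sex. The sketch below assumes the CSV header uses the column names listed above ('Country', 'Year', 'Sex', 'City Type', 'Value'), which should be checked against the real file. It filters on a single City Type (for example, a value such as "City proper", assumed here) so that the same city is not counted twice under different definitions.

```python
import csv
from collections import defaultdict


def population_by_sex(lines, city_type):
    """Sum city populations per (Country, Year, Sex) for one City Type.

    `lines` is any iterable of CSV text lines with a header row, such as an
    open file. Column names are assumed from the field list in the text.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        if row["City Type"] != city_type:
            continue  # avoid mixing different city definitions
        key = (row["Country"], row["Year"], row["Sex"])
        totals[key] += float(row["Value"])
    return dict(totals)
```

Usage, assuming the file is local: `with open("unsd-citypopulation-year-fm.csv", newline="", encoding="utf-8") as f: print(population_by_sex(f, "City proper"))`.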

Trade and Transport data analysis. Industry / Area - Govt and Transportation

About project - UN/LOCODE, the United Nations Code for Trade and Transport Locations, is a geographic coding scheme developed and maintained by the United Nations Economic Commission for Europe (UNECE). UN/LOCODE assigns codes to locations used in trade and transport with functions such as seaports, rail and road terminals, airports, postal exchange offices and border crossing points.
Data - The sample data used in this project comes from UNECE. There are five files in this data set: code-list, country-codes, function-classifiers, subdivision-codes and status-indicators. Please refer to the given link to understand the entries in the data sets:
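The location functions listed above are stored compactly in UN/LOCODE. The sketch below assumes the code list keeps them in an 8-character Function field where each position holds either a function code or "-", and it hardcodes only the codes for the functions named in the text (1 port, 2 rail terminal, 3 road terminal, 4 airport, 5 postal exchange office, B border crossing). In the project, the mapping would more reasonably be loaded from the function-classifiers file; `decode_functions` is a hypothetical helper.

```python
# Assumed subset of UN/LOCODE function codes; other codes are ignored.
FUNCTION_CODES = {
    "1": "port",
    "2": "rail terminal",
    "3": "road terminal",
    "4": "airport",
    "5": "postal exchange office",
    "B": "border crossing",
}


def decode_functions(function_field):
    """Return the named functions present in a Function string such as '1-3-----'."""
    return [FUNCTION_CODES[ch] for ch in function_field if ch in FUNCTION_CODES]
```

Decoding this field is what lets queries like "all locations that are both a port and a rail terminal" be expressed as simple filters.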

Standard and Poor’s 500 stock market index analysis. Industry / Area - Finance

About S&P - The S&P 500 stock market index, maintained by S&P Dow Jones Indices, comprises 502 common stocks issued by 500 large-cap companies and traded on American stock exchanges, and covers about 75 percent of the American equity market by capitalization. The index is weighted by free-float market capitalization, so more valuable companies account for relatively more of the index. The index constituents and the constituent weights are updated regularly using rules published by S&P Dow Jones Indices. Although the index is called the S&P "500", it contains 502 stocks because S&P decided to include both classes of shares created by the stock splits of Google in March 2014 and Discovery Communications in July 2014, and it will contain 505 stocks in September 2015.
Data - Detailed information on the S&P 500 can be obtained from its official webpage on the Standard and Poor's website - it's free but registration is required. There are two files in this dataset - 1. constituents list and 2. constituents financials.
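Free-float weighting means each constituent's weight is its float-adjusted market capitalization divided by the total across all constituents. The sketch below assumes the float-adjusted caps are already computed (roughly price times shares outstanding times a float factor) and ignores S&P's index divisor and other methodology details; the tickers in any input are hypothetical.

```python
def index_weights(float_caps):
    """Weight each constituent by its share of total free-float market capitalization.

    `float_caps` maps a ticker to its float-adjusted market cap (assumed precomputed).
    """
    total = sum(float_caps.values())
    if total <= 0:
        raise ValueError("total free-float market cap must be positive")
    return {ticker: cap / total for ticker, cap in float_caps.items()}
```

With the constituents financials file, this is enough to check which companies dominate the index, since the weights always sum to 1.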

Research Team

You could say there are 3 different kinds of contributors to this team. The first are computer science researchers from across the country. Meenal, who is pursuing her PhD in High Performance Computing, has led a team of more than 5 research faculty members from across the country and created the core content of the course. The second are experienced Hadoop trainers, spearheaded by Venkat, one of the early Hadoop experts in India. Venkat has reviewed, tested and improved the research team's work over the months by working closely with them, and is now a senior faculty member for live sessions and project guidance. The third and most important contributors are a network of more than 50 industry professionals from across the globe who have reviewed and contributed to this course through case studies and practical problem solving. It is because of this network that our course offers the most industry-relevant projects.

How do you want to learn?

This is the most comprehensive course that covers all the topics as per the latest syllabus.
  • Self-Paced Course 6000.0

    You will get the following content for the Hadoop course:

    • 15+ hours of Video lectures (online & DVD)
    • Study material in e-books for 13 modules
    • Complete doubt solving and faculty guidance via email


  • Guided Course 15000.0

    This package includes the complete Hadoop course with the added advantage of job assistance.

    • 120+ hours of learning experience
    • 15+ hours of video lectures (online & DVD)
    • Study Material in e-books for 13 modules
    • Personal mentor faculty to guide you through the lab and project work
    • 13 hours of one-on-one interaction time with faculty
    • 35 Labs and Mini Projects
    • 3 Major Projects on real industry data
    • Completely flexible scheduling of faculty interaction
    • Job assistance and Interview preparation


Why should you learn from uFaber?

Other classes vs the uFaber course

  • Video content - Other classes: nothing, or screen-capture videos. uFaber: high production quality videos with graphics.
  • Faculty-student ratio - Other classes: at least 1:15. uFaber: 1-to-1 learning.
  • Interaction in live class - Other classes: 5% at most. uFaber: 100% interactive classes.
  • Pace of learning - Other classes: very rigid system, scheduled classes. uFaber: completely customized to your needs.
  • Quality of projects - Other classes: simple, standard textbook data sets. uFaber: industry data sets, continuously updated.
  • Faculty answerability - Other classes: limited, freelance faculty. uFaber: dedicated full-time researchers.
  • Mentoring - Other classes: not possible at all. uFaber: continuous mentoring at every stage of the course.
  • Ultimate course offering - Other classes: a certificate. uFaber: job readiness.
  • Placement assistance - Other classes: negligible. uFaber: from resume to interview preparation.


  • Raghava Chandra
    Associate Technical Consultant at 42Hertz

    The schedule and the process of learning is fairly planned based on my timings. There is quick and good response on clarification of my doubts in my assignments. Overall, I'm happy with the course.
  • Aniket Mazumder
    Technical Lead at Ericsson

    uFaber course content is very organised, and each and every topic in Hadoop is nicely explained. Instructor (Meenal Borkar) is very supportive and she has good in-depth knowledge of Hadoop concepts.

Frequently Asked Questions

  • Is the hype around Big Data and Hadoop justified?
    Hadoop is no longer just hype; it is a reality. It is touted to become a 50 billion market in a few years, which would be the fastest any industry has ever grown. Every industry that handles data in any part of its operations will sooner or later move to Big Data capabilities.
  • How is Hadoop currently used in the industry?
    Companies currently adopt Hadoop in two broad kinds of scenarios.
    a) Where Big Data is a problem: These are companies where the volume of data itself is huge, or the data is highly unstructured, so the associated costs have always been a problem. Here, Hadoop allows such companies either to scale up their data capability cost-effectively or to crunch and clean the data so that only relevant data is stored.
    b) Where Big Data brings opportunities: These are companies that deal with massive data and now want to use that data to benefit their business. They could use Hadoop installations for customer analytics, predictive modeling, recommendation engines, process optimization, etc.
  • What kind of industries and sectors are using Hadoop?
    • Finance: Customer Risk Analysis; Fraud Detection; Market Risk Modeling; Trade Performance Analytics
    • Energy and Sciences: Genome Sequencing; Weather Analysis and Prediction; Utilities and Power Grid; Biodiversity Indexing; Network Failures; Seismic Data Analysis
    • Retail and Manufacturing: Customer Churn; Brand and Sentiment Analysis; Point of Sales; Pricing Models; Customer Loyalty; Targeted Offers
    • Web, E-commerce and Social Networking: Online Media; Mobile; Online Gaming; Search Quality
  • Is this an online video-conferencing course?
    Live sessions over video conferencing are one of several methods used in this course.
  • What other methods are used in this course? How does the course work?
    This course has 4 kinds of learning resources: high-quality concept videos, live online lectures, lab sessions and project work.
  • Are the live lectures one on one?
    Yes, live lectures are one-on-one interactions between you and the faculty.
  • What is the frequency of live lectures?
    Live lectures are scheduled according to the student's needs. They can be scheduled after a theory video, to guide you through a lab session, or to help you whenever you get stuck on a project.
  • Can the videos be referred at a later stage?
    Yes! All lectures, interactions and lab sessions are available for later viewing.
  • How are lab sessions handled in this course?
    A lab session is when you run code on your own Hadoop cluster. You will always be given recorded video guides for lab assignments. You will then be given a new data set to work on in your own Hadoop cluster. During the lab session, your faculty/mentor will hold discussions with you whenever needed.
  • How is Project work handled?
    For a project, you shall be given a real data set, and project-problem statement. You would also be told the various stages of the project. While you are working on the project, you will submit outputs at every stage for evaluation. Wherever needed, the faculty will hold a live session for a discussion.
  • What is different about the Projects in this course?
    This course has taken industry case studies and very practical problems as the project statements. Most of them have been contributed by industry experts from across the world. While you are reading this, our faculty team is continuously working to find better case studies and projects for you from professional networks.
  • How is this course better in terms of practical experience?
    The projects you will do with us are never simple bookish problems. These data sets are very recent, collated from professional networks and continuously updated. We give you the most challenging and relevant exposure to Hadoop, just as your recruiter would want.
  • What kinds of profiles can take this course?
    CSE graduates, Senior Architects, Data Warehousing professionals and Java Developers.
  • Does this course need any prior knowledge of Java?
    This course can be done with equal ease by people who know Java or Python. The MapReduce layer of Hadoop follows a different programming paradigm, inspired by functional programming, that is quite unlike regular object-oriented programming in Java, so it is a level playing field for everybody. If extra reading is required, we will provide additional material.
  • Can a person with no prior knowledge about NoSQL go for this course?
    Yes. Our experts will cover the essential parts of NoSQL and other database types when necessary.
  • Can this course be taken by someone who wants to pursue purely the admin domain?
    Yes! This is a complete professional Hadoop Engineer course. It covers all the fundamentals essential for both Administrator and Developer profiles. Some modules will be more important for administrators than others, and vice versa for developers.
  • Is there a certification provided in this course?
    Yes. You will be given a certificate of achievement by uFaber after successfully completing the course. However, our focus is more on making you job-ready and strengthening your experience through our projects, which we have curated from across the world.
  • What kinds of jobs are available in Big Data Hadoop?
    Currently, industries are looking for the following profiles: Hadoop Admin, Hadoop Architect, MapReduce Developer and Data Scientist.
  • Is there any job assistance provided after this course?
    uFaber provides complete job support through CV review and feedback, job alerts and openings, and interview preparation.
  • What are the companies in India that have Hadoop teams and departments?
    Software companies with offices in India have all developed functional Hadoop departments and are expanding rapidly. In addition, big data-intensive companies in the telecom, banking and retail sectors have started doing the same. Examples include Infosys, Wipro, IBM, TCS, Tech Mahindra, Microsoft, Google and Amazon; Airtel, Vodafone and Reliance Telecom; Future Group, Tata Retail and Reliance Retail; as well as banks, analytics companies, and finance and insurance companies.