Take Control of Your Big Data with HUE in Cloudera CDH

Working with Big Data is no small task. Jumpstart your Hadoop skills by loading, visualizing, analyzing, and searching your data using Cloudera HUE, the Hadoop User Experience. Take control of your Big Data!
Course info
Rating
(16)
Level
Beginner
Updated
Apr 24, 2017
Duration
2h 54m
Table of contents
Course Overview
Why HUE? The Case for Hadoop User Experience
Getting Your Data Ready: Loading, Processing & Browsing with HUE
Yet More Data Loading & Processing with HUE
Exploring Your Data with Query Editors
Understanding Hidden Insights Using HUE's Interactive Search Dashboards
Automation & Scheduling with Apache Oozie & HUE
Administering, Securing, and Extending Cloudera HUE
Final Takeaway
Description

Hadoop is a very complex ecosystem with a potentially pretty steep learning curve to get started from scratch. To make adoption easier, several distributions have been created to integrate all key projects and give a turn-key approach, one of the most popular and complete being Cloudera CDH. In this course, Take Control of Your Big Data with HUE in Cloudera CDH, you'll learn how to leverage Hadoop using a relatable data source. First, you'll explore how to work with the major components of a cluster. Next, you'll discover how to load data into your cluster and how to analyze it with query editors. Finally, you'll go one level beyond with interactive dashboards using HUE. By the end of this course, you'll be able to load, process, and analyze your big data using HUE in Cloudera CDH.

About the author

Xavier is very passionate about teaching, helping others understand search and Big Data. He is also an entrepreneur, project manager, technical author, trainer, and holds a few certifications with Cloudera, Microsoft, and the Scrum Alliance, along with being a Microsoft MVP.

Section Introduction Transcripts

Course Overview
Hi everyone, my name is Xavier Morera, and welcome to my course, Take Control of Your Big Data with HUE in Cloudera CDH. Have you ever noticed how different it is to work with many of the Hadoop top-level projects? For example, using Hive versus Pig, or Solr, or even HDFS? Well, HUE provides a view on top of Hadoop that gives you an easy-to-use and familiar UI to increase your productivity. In this course, we're going to learn how to leverage Hadoop using a relatable data source: Stack Overflow's data, which will help us understand programming history. Some of the major topics that we will cover include working with the major components of a cluster, like HDFS, jobs, and scheduling, and using several platforms like Pig, Hive, Impala, Solr, HBase, and some Spark. We will learn how to load data into our cluster, analyze it with query editors, and go one level beyond with interactive dashboards. By the end of this course, you will be able to load, process, and analyze your big data using HUE in Cloudera CDH. Before beginning this course, you should be familiar with how to set up a cluster, or have a pseudo-cluster available, namely the QuickStart VM. I hope you'll join me on this journey to learn how to take control of your big data with HUE in Cloudera CDH, here at Pluralsight. Let's get started.

Yet More Data Loading & Processing with HUE
Yet more data loading and processing with Hue. In the previous module we started with one of the basic tasks of working with big data, namely loading data into Hadoop with a couple of different mechanisms. We saw how Hue makes working with Hadoop feel like a unified user experience. We learned about a couple of available apps, namely the File Browser, the Document Model, and the Pig Query Editor. And while we can see that Apache Pig is good, there is something that we need to take into account, which is that you need to learn a new syntax. It is not really out of the ordinary, and it's not entirely new, but just so you know, there's a certain barrier to entry. And, yes, I know, probably no one has died from a learning curve, but what if there was a different approach to big data, one where instead of getting people to learn something new they could use what they know right now, something like SQL, which is highly popular? And that is when some smart people, mainly from Facebook, got together and created Apache Hive. And how do they compare? Well, let's see it at a very high level. They both generate MapReduce jobs from a high-level language, and you don't need to know the nitty-gritty details about MapReduce. But from a language perspective, Pig has a procedural dataflow language, remember, Pig Latin, while Hive is declarative; it feels very familiar, and it's easier if you know SQL. And something worth mentioning is that it's not that one was better than the other, and you were not forced to choose either Pig or Hive. What happened at some point is that Pig used to be more popular among programmers and researchers, while Hive was more popular among analysts who prefer SQL as their weapon of choice.
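To make the procedural-versus-declarative contrast a bit more concrete, here is a minimal sketch in plain Python. It is not Pig Latin or HiveQL, and it does not run on Hadoop; it only imitates the two styles. The sample rows, the `posts` table, and both function names are assumptions made up for illustration. The first function spells out each step of the dataflow, the way a Pig script would; the second hands a single SQL statement to an in-memory SQLite database, the way you would hand a query to Hive.

```python
import sqlite3
from collections import Counter

# Hypothetical sample rows, loosely inspired by Stack Overflow posts: (post_id, tag)
posts = [(1, "python"), (2, "java"), (3, "python"), (4, "sql")]


def procedural_counts(rows):
    """Dataflow style: each step transforms the output of the previous one."""
    tags = [tag for _, tag in rows]   # step 1: project the tag column
    grouped = Counter(tags)           # step 2: group by tag and count
    return sorted(grouped.items())    # step 3: order by tag


def declarative_counts(rows):
    """Declarative style: state the result you want and let the engine plan it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER, tag TEXT)")
    conn.executemany("INSERT INTO posts VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT tag, COUNT(*) FROM posts GROUP BY tag ORDER BY tag"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(procedural_counts(posts))
    print(declarative_counts(posts))
```

Both functions return the same tag counts; the difference is only in who decides the steps, you or the query engine.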

Exploring Your Data with Query Editors
Exploring Your Data with Query Editors. Relational databases have been incredibly successful for many years, with millions of developers and analysts working with data on a daily basis. And let me add that the proliferation of new web technologies, many of them powered by JavaScript, allowed the creation of rich and interactive web applications that provide access to an even broader audience, one that only needs a modern browser to work with these applications. And so we have the HUE Query Editors, a set of apps that let you manage, wrangle, query, explore, understand, and in general work with your big data from a very friendly environment. So it's time to pick up where we left off and see where we can go with the HUE Query Editors.

Understanding Hidden Insights Using HUE's Interactive Search Dashboards
Understanding Hidden Insights in Your Data Using HUE's Interactive Search Dashboards. There is a phrase that I use from time to time, when words are not enough for an idea to be communicated properly, and that phrase is "a picture is worth a thousand words." Just imagine looking at a very long file, maybe in Excel, maybe just a TSV, and you're trying to figure out what all this data means. Well, it probably is pretty hard, as analyzing data can be a difficult task. You might be presented with a whole bunch of numbers and fail to find a correlation, missing the opportunity to discover a trend hidden within your data. But visualizing your data can make a huge difference. Analyzing a graph can make it easier to visualize and correlate, especially if it's an interactive dashboard where you can drill down, filter, group, and in general refine your data. In this module, I will start by covering visualizations with parameters, and make my way into one of the nicest features of HUE: interactive Search dashboards powered by Solr, which is what we also know as Cloudera Search.

Automation & Scheduling with Apache Oozie & HUE
Automation and Scheduling with Apache Oozie and HUE. A job is a job, it gets things done, and even though you can write a program that can create several Hadoop jobs, there are times when you need more than this. You might need a sequence of actions done. For example, a job that sends an email after it completes, or a Pig script that extracts some data that you then want to process with Hive. Also, you might want certain actions to be taken based on output conditions, and beyond that, you may want them to be triggered based on a schedule. For all I know, you may even want several workflows to work together as a group. In this module, I will introduce you to the world of Oozie, Hadoop's workflow scheduler. If you have ever worked with Oozie, you might know that it can be a little complex. However, life is different now with HUE, as it brings all this complexity down to earth by providing a great user experience with editors and dashboards. Let's see.

Administering, Securing, and Extending Cloudera HUE
Administering, securing, and extending Cloudera HUE. So far our main focus has been how you can leverage Hue's apps to work with the many projects from the Hadoop ecosystem, making it easy and enjoyable to work with your big data. You need to work with Hive? There's a Hue app for that. Check the tables in the Metastore? Hue's got you covered. Want an interactive dashboard? Hue also has an app for that, the Solr dashboard. Now we will spend some time on some other topics that are important when you're working with Hue. So, we will have an overview of user administration, security, configuration, UI customization, logs, scalability, and finally, extensibility. Let's get started.