Hortonworks Developer Course - Hive and Pig

the_Grinch
Tomorrow is my last day in the Hive and Pig Developer Course from Hortonworks. This is the third course I have taken from Hortonworks (work has been paying) and I have to say this is by far my favorite one. I have completed the Administration course and Data Science course. In this class you learn how to ingest and process data on the Hortonworks Data Platform. No prior Hadoop knowledge is assumed, but I will say having a background in Hadoop Administration is a huge help.

The course is spread over four days and they pack a lot in, but I didn't feel that it was force-fed to me. It is 50% lecture and 50% hands-on labs. Your data sets are small (the largest is 138 MB), but you don't need huge data sets to get what is going on. Day One consisted of the standard Hadoop introduction and the basics of MapReduce. Having completed the Administration course, I figured this would be a review, but they delve deeper into the inner workings of Hadoop and that definitely opened my eyes to a number of things. You also receive a brief (very brief) introduction to Flume and cover a fair amount of Sqoop.
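For anyone who hasn't seen the map/shuffle/reduce flow before, here is a rough sketch of the idea in plain Python. It is only an illustration, not material from the course and not how Hadoop itself is implemented; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["pig and hive", "hive and sqoop"]))
```

On a real cluster the map and reduce steps run in parallel across many nodes and the shuffle moves data over the network; the single-process version above only shows the shape of the computation.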

Day Two is major coverage of Pig. Pig is mainly meant for ETL: it takes (basically) any data you have and formats it for ingestion someplace else (usually Hive, but you could send it elsewhere). You can perform analysis with Pig, but there is a bit of a learning curve (not much), since you use Pig Latin to perform any ETL or analysis. I dealt with Pig in the Data Science course, but this was the true introduction. We covered all the major facets of Pig and performed a number of operations.
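To give a feel for the kind of ETL step described here, below is a small Python sketch of a load, filter, and group-and-sum flow, which loosely mirrors Pig Latin's LOAD, FILTER, GROUP, and FOREACH operators. It is not Pig Latin and not from the course labs; the sample data and the function names (`load`, `clean`, `total_ms_by_user`) are assumptions made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input with one malformed row (not course data).
RAW = """user,page,ms
alice,/home,120
bob,/home,not_a_number
alice,/cart,300
"""


def load(text):
    """Parse CSV text into a list of dicts (roughly Pig's LOAD)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cast 'ms' to int and drop rows that fail (roughly FILTER plus a cast)."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"user": row["user"], "page": row["page"], "ms": int(row["ms"])})
        except ValueError:
            continue  # skip malformed rows instead of failing the whole job
    return cleaned


def total_ms_by_user(rows):
    """Sum 'ms' per user (roughly GROUP BY user, then FOREACH ... SUM)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["user"]] += row["ms"]
    return dict(totals)


if __name__ == "__main__":
    print(total_ms_by_user(clean(load(RAW))))
```

The cleaned, typed rows are the sort of output you would then hand to a downstream store such as Hive.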

Day Three is when you dive into Hive. My class literally had only one other person in it (besides the instructor), and I was interested in Oozie, so we did that in the morning. Hive is your "data warehouse," and you use an SQL-like language to perform analysis of data. You can use Pig to get data into Hive, but in most cases you are dealing with data from a database, so you can Sqoop it directly into Hive. I've picked up a lot of Hive on my own, but I am extremely happy I took this course because I found that I have been doing things in a fashion that is not optimized for my data.

Day Four is some continued Hive along with more Oozie and a few other items.

If you are looking to get into Hadoop and will be utilizing Hortonworks, I highly suggest taking their courses. I'd suggest the following if you are able to go down the track:

HDP Operations Course
HDP Security Course - this hasn't been officially released, but will be in the first quarter of 2016
HDP Hive and Pig Course
HDP Data Science - definitely a great overview of Data Science, but I wouldn't say you'd have to take it

My aim is to take their Spark with Python course, which is supposed to be released in January. Feel free to ask any questions!