Hortonworks Training

the_Grinch Member Posts: 4,165 ■■■■■■■■■■
Today I completed day one of the Hortonworks Operations Training. It started out pretty standard: introduce yourself, say what you do, and say what you plan to do with Hadoop. Once that was done we went over the basics of Hadoop, big data, and some use cases. Near the end we finally deployed a cluster of four nodes, and that was definitely enjoyable. So far I feel the training has been really good and useful. The class does actually have a cert related to it, and since work has picked up the tab for the training I figure I will take the cert. I'll keep everyone informed of my progress; let me know if you have any questions.
WIP:
PHP
Kotlin
Intro to Discrete Math
Programming Languages
Work stuff

Comments

  • brownwrap Member Posts: 549
    We have a small Hadoop Cluster here that I have access to, but I don't have a clue as to how to use it.
  • NightShade03 Member Posts: 1,383 ■■■■■■■□□□
    Great feedback! Definitely keep us posted on how the course progresses. It's funny you posted about building a four-node cluster... I'm working on an article about that very task using Ambari within AWS.
  • the_Grinch Member Posts: 4,165 ■■■■■■■■■■
    Funny you say that, as Hortonworks used to provide access to AWS for their training course. But students complained that they couldn't take anything home to practice with, so they switched to VMs. I did learn something not Hadoop related: they used Docker to set up the four nodes within one VM, which was pretty cool since it was pretty responsive, all things considered.
  • NightShade03 Member Posts: 1,383 ■■■■■■■□□□
    See, now there is an organization that clearly doesn't know how to leverage the cloud correctly...

    AWS is a great resource for training and allows you/them to spin up awesome demos for real hands-on training during the course. If they wanted to allow students to continue working at home, they should have everyone sign up for an AWS account in class (if they don't have one) and then add each student's user account as an IAM role in Hortonworks' cloud. Then just inform each student that they have XX credit for practice at home and that once they exceed it their access will be cut off. This would be an awesome approach instead of having to prep and manage VMs constantly.

    </rant>

    The Docker thing doesn't surprise me at all. Docker is an awesome technology (hence their recent $40 million investment). It makes everyone's environment "the same," and you are right, it is definitely responsive when done correctly. The downside to Docker is that there are currently few to no operational tools for managing it in a production environment. There are two new orgs that just entered beta to fix this, though.
  • the_Grinch Member Posts: 4,165 ■■■■■■■■■■
    Seems they must have read your rant, Nightshade! My instructor sent them some emails complaining about the issue, and they appear to have decided to move back to AWS for the class along with sending the students home with the VMs. As for Docker, I definitely like it, but it has caused some issues with the environment. Overall I am definitely still a fan and will keep my eyes on it (along with future developments).

    Day Two of the class was just as good as Day One (if not better, due to the information covered). I will say that it is a lot of information to cover in four days. Today we covered setting up queues and how much processing power each of them gets. We also covered moving data from one cluster to another, along with what to do if the cluster you're moving from is on version 1.0 and the one you're moving to is on 2.0. The great thing has been being able to get into the troubleshooting aspect of Hadoop. In the beginning I didn't run into the issues others in the class did, but eventually I did run into one, and the information from the morning proved very useful. I do have to say you'd better have some decent Linux skills if you plan to take the course.
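
    As a rough illustration of the queue setup described above: in YARN's Capacity Scheduler, each queue is given a percentage of its parent's capacity, and sibling percentages are expected to add up to 100. The sketch below is not from the course material; the queue names and the cluster size are made up for the example.

```python
def queue_shares(capacities, cluster_memory_mb):
    """Return the memory (in MB) guaranteed to each sibling queue.

    capacities: dict mapping queue name -> percent of the parent queue.
    The queue names and sizes used below are hypothetical.
    """
    total = sum(capacities.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"sibling queue capacities must sum to 100, got {total}")
    return {name: cluster_memory_mb * pct / 100 for name, pct in capacities.items()}


# Example: three made-up queues sharing a 24 GB cluster.
shares = queue_shares({"default": 50, "etl": 30, "adhoc": 20}, 24576)
for name, mb in shares.items():
    print(f"{name}: {mb:.0f} MB")
```

    The check mirrors the rule that sibling capacities total 100 percent; a real deployment would set these values in the scheduler's configuration rather than in code.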

    We also executed a number of jobs via YARN, which was one of the most interesting things I have seen in a while. We analyzed the US Constitution to do a word count on the document. I found the best part was that, thanks to YARN, you can program in any language the Linux box understands. Thus, while the program we used was written in Java, I could just as easily write one in Python (yay!). Also, the APIs are amazing and I could easily write a web client to do basically everything.
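
    For anyone curious what a Python version of that word count might look like, here is a minimal sketch, assuming the job would be run through Hadoop Streaming (which passes records to a script on stdin and reads tab-separated key/value pairs back from stdout). This is not the Java program used in class, and the function names are made up for the example.

```python
from itertools import groupby


def mapper(lines):
    """Emit (word, 1) for every whitespace-separated word, lowercased."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(pairs):
    """Sum the counts per word. Expects pairs sorted by word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


# Local dry run on a sample line; sorted() stands in for the
# shuffle/sort step that Hadoop performs between map and reduce.
sample = ["We the People of the United States"]
for word, total in reducer(sorted(mapper(sample))):
    print(f"{word}\t{total}")
```

    In a real streaming job the mapper and reducer would be separate scripts reading stdin; keeping them as plain functions here makes the logic easy to try on a laptop first.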

    As far as training materials go, besides the VM they also provide a 500-page manual that contains not only the slides but also a much deeper explanation of the material the slides cover. I will say this is the first time they have done the class with people physically there. Previously it has all been remote and on AWS. Personally I enjoy the hands-on, in-person format, but we have five students attending remotely and it is just about as engaging.

    More to come tomorrow!
  • NightShade03 Member Posts: 1,383 ■■■■■■■□□□
    This is really interesting, thanks for the feedback! I'm glad that I'm getting through to them subconsciously lol

    It's really interesting that their training seems so put together given they have no director of training at the moment ;)
  • the_Grinch Member Posts: 4,165 ■■■■■■■■■■
    They are very put together with their training, and I attribute that to really wanting the product, along with Hadoop in general, to succeed. My instructor said that they flew an engineer in to help him when he started, and they seem to be in communication with the training provider regarding hang-ups.
  • NightShade03 Member Posts: 1,383 ■■■■■■■□□□
    the_Grinch wrote: »
    They are very put together with their training, and I attribute that to really wanting the product, along with Hadoop in general, to succeed. My instructor said that they flew an engineer in to help him when he started, and they seem to be in communication with the training provider regarding hang-ups.

    That's amazing. Definitely good to see them making an effort to really see their product succeed.
  • the_Grinch Member Posts: 4,165 ■■■■■■■■■■
    Day Three has been completed. Today we focused on getting data into and out of the cluster in all different manners. It's amazing how quickly (even on a VM with four nodes and six gigs of RAM) data is parsed and stored in Hadoop. I've also never dealt with SQL and was surprised how simple it really was. I could see a number of aspects that will directly apply to what we'll be doing at my job. We're also using Elasticsearch (which is a partner of Hortonworks), and our instructor was able to reach out to them about hosting our Elasticsearch nodes on the same datanodes running within our Hadoop cluster. It seems that this isn't the first time they have fielded this request, and they provided some basic stats we should follow. We covered Oozie, Pig, and Tez, which are some amazing pieces of technology. Lastly, we covered how to monitor the cluster.

    All I will say is if you have a chance to learn this technology it's time to jump on board.
  • NightShade03 Member Posts: 1,383 ■■■■■■■□□□
    lol sounds like you've been bitten by the big data bug :)
  • the_Grinch Member Posts: 4,165 ■■■■■■■■■■
    I definitely have! I will say that my only issue is my inability to see the big picture (no pun intended). From an operations side I have no issue: I know I can get the cluster up and make sure it works properly. That being said, no one in my Division (beyond my fellow team members) will know what to do with it. My counterpart is the architect of the solution, but we really need to map out what people need and then use our cluster to get it to them.
  • NightShade03 Member Posts: 1,383 ■■■■■■■□□□
    The biggest flaw in DS is not asking the right question. If you don't pose the correct question from the start, then your results will be either incorrect or useless. One of the benefits born from the rise of big data tools is the ability to pose questions, test your hypothesis, and fail fast if you are going in the wrong direction (this lends itself directly to the whole agile movement). That being said, the most successful big data implementations are the ones where the developers/ops/BI folks understand the business and its needs. Without that fundamental understanding you can write M/R jobs till you are blue in the face, but your insights will never get you or the business anywhere.

    Definitely not a dig on you directly, but from experience most businesses are getting to a point similar to yours, where the cluster is running and some data sources are connected, but no one knows what questions to ask. It's kind of ironic, and a little funny if you ask me, that everyone has all of this "data" and doesn't really have a clue what to do with it. The larger the business (which usually implies a larger departmental disconnect), the worse this problem becomes.
  • the_Grinch Member Posts: 4,165 ■■■■■■■■■■
    I definitely see your point and we're definitely working on it. In our meetings I've already pointed out (and management has agreed) that we will need to sit with the stakeholders to see exactly what "business" questions they have. We do have a few questions right now that we can work on to get started. A large part of it is showing success with these items and then kicking it up another notch. It will take time, no doubt. The other part is that another member of my team is a database guy by trade. He is very used to doing data analysis and gathering requirements. Our plan is to send him, along with another guy who has a decent knowledge of databases, to the Hortonworks Data Analytics training. This makes sure that it isn't just us up to the task of getting this going and places the people with the right skills in the right place. Thanks for all the advice thus far!
  • NightShade03 Member Posts: 1,383 ■■■■■■■□□□
    No worries :)

    Enjoying the daily feedback from their course too!
  • the_Grinch Member Posts: 4,165 ■■■■■■■■■■
    Sorry about not posting about the last day; I got home and had to deal with having been away for a week. On the last day we covered backing up Hadoop. This mainly dealt with the metadata found on the Namenode. Next we covered setting up Rack Awareness within Hadoop. By default it sees everything as one rack, but if you have more than one rack, network speeds need to be taken into account. Following that we covered setting up high availability on the Namenode. This allowed us to test by taking one Namenode down and having the standby Namenode take its place just about immediately. Finally we covered setting up authentication via Kerberos. My instructor forewarned us that this lab was almost never completed by a class. I think much of this was due to the Docker setup that they used to build the VMs.
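
    To make the rack awareness step a bit more concrete: Hadoop can be pointed at a topology script (through the net.topology.script.file.name property) that receives host names or IP addresses as arguments and prints a rack path for each one. Below is a minimal sketch of such a lookup, not the script from the course. The host-to-rack table is invented for the example, and unknown hosts fall back to /default-rack.

```python
import sys

# Hypothetical host-to-rack table; a real script would likely load this
# from a file maintained alongside the cluster inventory.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes one or more hosts as arguments and reads the
    # whitespace-separated rack paths from stdout.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

    Falling back to a default rack keeps the lookup from failing when a node is added before the table is updated.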

    I'd highly recommend the training if you are preparing to deploy your first cluster. Obviously you could probably figure most of this out on your own, but it was great to get a step-by-step guide and bypass some of the common pitfalls one would encounter.