Deployed Hadoop (HDP 2.2) Today!

the_Grinch Member Posts: 4,165
In January I made a solid attempt to deploy Hadoop, and while I was ultimately successful, I ran into a number of issues that just wouldn't do. I finished up another project and spent the past few days prepping my environment. As I have said before, deploying Hadoop will show you every infrastructure mistake you have made and then some. This time I did two things:

1. Wrote a Python script - with this script I checked that NTP was in use, forward and reverse DNS were working, SELinux was disabled, and iptables was off, and made sure all those settings jibed together (it's a crude script, but I'd be happy to share it if people want it).

2. Ran the bash script written by Hortonworks - this tests a lot of stuff and pointed out a couple of items that I needed to fix. It is on GitHub and I highly recommend using it.
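
For anyone who wants a starting point for the DNS part of item 1, here is a minimal Python sketch. It is not the original script from this post. The function name check_dns and its forward/reverse parameters are assumptions made for illustration. It only checks that a hostname resolves to an IP and that the IP maps back to the same name. Note that the system resolver may also consult /etc/hosts, so this tests what the OS resolves rather than DNS alone.

```python
import socket


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems found for one host (empty list means OK).

    forward and reverse are hypothetical injection points so the check can
    be exercised without a live resolver; by default they call the system
    resolver through the standard socket module.
    """
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return [f"{hostname}: forward lookup failed ({exc})"]

    try:
        reverse_name = reverse(ip)[0]
    except OSError as exc:  # socket.herror is a subclass of OSError
        return [f"{hostname}: reverse lookup of {ip} failed ({exc})"]

    # Compare case-insensitively and ignore a trailing dot on either name.
    if reverse_name.lower().rstrip(".") != hostname.lower().rstrip("."):
        return [f"{hostname}: resolves to {ip}, but {ip} maps back to {reverse_name}"]
    return []
```

To use it, call check_dns with the FQDN of each node in the cluster and print whatever comes back; an empty list means forward and reverse lookups agree for that host.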

Ultimately, of the 10 servers I am using, 8 came up with little or no issue. 2 of them (ironically the last two I had set up) are having issues, which I tracked down to a directory being read-only. Unfortunately I don't see a way to fix it and I think I will have to format them. This isn't bad per se, because I need to show other members of my team how it all works, so that will be a good thing.

The only other issue I have is that the HBase master will not start. It seems from the logs that it cannot find any live DataNodes and thus can't copy a configuration file to at least one of them. The odd thing is I have 6 and they are reporting as OK (except for the two I need to redo), but in Ambari they are not showing as Live or Dead. I've posted on the Hortonworks forums and I am hoping to have an answer when I return to work on Monday.

Brave new world ahead of us!
WIP:
PHP
Kotlin
Intro to Discrete Math
Programming Languages
Work stuff

Comments

  • hurricane1091 Member Posts: 919
    Hadoop is interesting. Not up my alley and I have zero exposure to it, but am sure it takes a great deal of skill to get involved with it. Good work.
  • Mutata Member Posts: 176
    I've installed and configured Hadoop several times.

    The question still is: what does it do?
  • the_Grinch Member Posts: 4,165
    Well, I will give an example of what we are planning on doing with it. We have 16 providers who give us reports in a required format. This requires that they perform ETL to get it to us. The problem is we don't have the ability to independently verify that the calculations are being done properly. With Hadoop we will use Hive to connect to the databases and pull the data. From there we can apply a schema on read and perform the calculations ourselves. With the raw data in hand, and by working with the provider, we'll know that the data is correct. Plus we have a few other things in mind as well, but that is our current objective.
  • the_Grinch Member Posts: 4,165
    When I deployed the cluster I had three main issues:

    1. No Datanodes showing as live
    2. HBase Master would not start
    3. Two of my nodes would not start HDFS

    Problem 3 is a permissions issue on the folder it is trying to use and I am unable to fix the problem. I'm just going to wipe them and start fresh, which is actually good because I need to update my documentation on deploying a server anyhow and need to document how to add a new server into the cluster.

    I started to troubleshoot problem 2 and in the logs I found that it was unable to replicate its config to at least one of the Datanodes (which is required), thus it would not start. This means, in theory, if I fix problem 1 then problem 2 will go away.

    I reviewed the logs for problem 1 and found that for some reason the NameNode could not resolve the IPs to hostnames. Now, the documentation stated that if you set up DNS there is no need to edit the /etc/hosts files. But in doing some research I found out that Hadoop does not always use DNS; it might just go to /etc/hosts and that's it. So I edited /etc/hosts with the IPs and FQDNs for all the servers in the cluster. Bam, problem 1 is solved! Went to start the HBase master and bam, problem 2 is gone! (Also, I will note that the Datanodes could not resolve the hostname for the NameNode either.)

    Figured I'd update everyone in case anyone runs into the same issue.
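
The /etc/hosts fix described in the last comment (an entry with the IP and FQDN for every server in the cluster) could be generated rather than typed by hand. Below is a rough Python sketch under that assumption. The function name hosts_lines and the sample addresses and names are hypothetical, not taken from the actual cluster.

```python
def hosts_lines(nodes):
    """Build /etc/hosts lines of the form 'IP<TAB>FQDN'.

    nodes maps each FQDN to its IP address. Lines are sorted by FQDN so
    the output is identical on every server it is generated on.
    """
    return [f"{ip}\t{fqdn}" for fqdn, ip in sorted(nodes.items())]


# Hypothetical addresses and names, for illustration only.
cluster = {
    "namenode.example.com": "192.168.1.10",
    "datanode1.example.com": "192.168.1.11",
    "datanode2.example.com": "192.168.1.12",
}

print("\n".join(hosts_lines(cluster)))
```

The printed lines would then be appended to /etc/hosts on every node, including the NameNode, since the comment above notes that resolution failed in both directions.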