Environment Setup Based on Cloudera CDH5
1. Configure the hosts mapping file on each of the three machines and disable iptables. Configure as shown in the figure: the machine ending in 129 is master, and 128 and 131 are slave1 and slave2; comment out any IPv6 entries that may be present. Change the hostname of each machine to master, slave1, or slave2 accordingly (command: vi /etc/hostname). Run service iptables status to confirm that iptables on all three machines reports "unrecognized service".
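A minimal sketch of the host mapping, assuming the machines sit on a 192.168.1.x subnet (only the last octets 129/128/131 are given above; adjust the prefix to your network):

# append to /etc/hosts on all three machines (IP prefix is an assumption)
192.168.1.129   master
192.168.1.128   slave1
192.168.1.131   slave2
# comment out IPv6 lines if present, e.g.
# ::1   localhost ip6-localhost ip6-loopback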

2. Install the JDK, preferably version 1.7 or later, because Maven 3.0.5 has problems when run on a JDK older than 1.7.
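A quick verification sketch; the JDK path shown is an assumption taken from the hbase-env.sh settings later in this guide, so adjust it if your JDK lives elsewhere:

# confirm the JDK and Maven versions before building anything
java -version        # expect 1.7.x or newer
mvn -version         # this guide assumes Maven 3.0.5
# example /etc/profile entries (install path is an assumption)
export JAVA_HOME=/usr/java/jdk1.7.0_51
export PATH=$JAVA_HOME/bin:$PATH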
3. Configure the SSH service on the three machines so that passwordless login works.
1) Add a group and a user for running and accessing Hadoop:
root@ubuntu:~# sudo addgroup hadoop
root@ubuntu:~# sudo adduser --ingroup hadoop hadoop
Note: it is recommended to add the hadoop user to the sudoers list: vi /etc/sudoers (hadoop ALL=(ALL) ALL)
2) Set up the keys:
Ⅰ). Log in to master as hadoop, cd to the home directory (e.g. /home/hadoop/), and run ssh-keygen -t rsa (just press Enter three times).
Ⅱ). Copy the public key to the other machines via scp:
scp ~/.ssh/id_rsa.pub hadoop@slave1:~/temp_key
scp ~/.ssh/id_rsa.pub hadoop@slave2:~/temp_key
Ⅲ). Log in to each server, create the .ssh directory and set its permissions: chmod 700 ~/.ssh
Ⅳ). Append the key and set the file permissions:
cat ~/temp_key >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Ⅴ). Verify: from master, ssh to slave1 and slave2. If you can log in directly without being asked for a password, the configuration succeeded.
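The same key distribution can be done in one pass from master; a minimal sketch assuming the hadoop user and host names above:

# run on master as the hadoop user
ssh-keygen -t rsa                     # accept the defaults, empty passphrase
for h in slave1 slave2; do
    scp ~/.ssh/id_rsa.pub hadoop@$h:~/temp_key
    ssh hadoop@$h 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && \
        cat ~/temp_key >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'
done
ssh slave1 hostname                   # should print "slave1" without a password prompt
ssh slave2 hostname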
4. Configure Hadoop. Perform the following steps on master.
1) Extract hadoop-2.3.0-cdh5.0.0-src.tar.gz to /usr/cdh and set up the HADOOP_HOME environment. Edit /etc/profile, add export HADOOP_HOME=/usr/cdh/hadoop-2.3.0-cdh5.0.0, and append $HADOOP_HOME/bin:$HADOOP_HOME/sbin to the export PATH entry.
2) Modify the four configuration files under $HADOOP_HOME/etc/hadoop: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.

core-site.xml
<configuration>
  <property>
    <name>io.native.lib.available</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
    <description>The name of the default file system. Either the literal string "local" or a host:port for NDFS.</description>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/hadoop-hadoop</value>
  </property>
</configuration>

hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/cdh/hadoop/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.</description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/cdh/hadoop/dfs/data</value>
    <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- the original used "dfs.permission"; the correct Hadoop 2.x key is dfs.permissions.enabled -->
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>

mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.job.tracker</name>
    <value>hdfs://master:9001</value>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024M</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1024M</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>50</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/tmp/hadoop/mapred/system</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/tmp/hadoop/mapred/local</value>
    <final>true</final>
  </property>
</configuration>

yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8080</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8081</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8082</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

3) Preparation for the first run:
a) Under $HADOOP_HOME/bin, run hdfs namenode -format
b) Under $HADOOP_HOME/sbin, run ./start-all.sh. If the namenode or datanode fails to start, check the logs to see whether stale namenode/datanode cache data is the cause; the location of those cache files is configured in $HADOOP_HOME/etc/hadoop/hdfs-site.xml. If everything runs normally on this machine and jps shows NameNode, SecondaryNameNode, ResourceManager, NodeManager, and DataNode, the setup is working and configured correctly. Then continue with the following steps.
1) Edit the $HADOOP_HOME/etc/hadoop/slaves file so that its content is:
slave1
slave2
2) Copy the installation to slave1 and slave2. First create the /usr/cdh directory on slave1 and slave2, then run:
scp -r /usr/cdh/hadoop-2.3.0-cdh5.0.0 hadoop@slave1:/usr/cdh
scp -r /usr/cdh/hadoop-2.3.0-cdh5.0.0 hadoop@slave2:/usr/cdh
3) On slave1 and slave2, change the owner of /usr/cdh to the hadoop user and set its permissions to 700.
Note: when running hadoop fs -ls to browse the file system you may see the warning
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
The cause is a missing libhadoop.so. Rebuild it in the source directory or in the hadoop-common subproject with:
mvn package -DskipTests -Pdist,native,docs -Dtar
You may then hit the error
[ERROR] class file for org.mortbay.component.AbstractLifeCycle not found
This is a known bug; following the official note at https://issues.apache.org/jira/browse/HADOOP-10110, add the following dependency to hadoop-common-project/hadoop-auth/pom.xml:
<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <scope>test</scope>
</dependency>
Compiling again may fail with
Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-common:
which means zlib1g-dev is not installed; install it with apt-get.
Finally, copy all of the generated .so files into the lib/native/ directory and run hadoop fs -ls again; the warning should be gone.
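A condensed sketch of that native-library rebuild, assuming an Ubuntu host and the usual layout of a Hadoop source build (the hadoop-dist/target output path and the extra build packages beyond zlib1g-dev are assumptions, not spelled out above):

# prerequisites for the native build (full package list is an assumption)
sudo apt-get install build-essential cmake zlib1g-dev
# rebuild from the top of the source tree, as described above
mvn package -DskipTests -Pdist,native,docs -Dtar
# copy the generated .so files into the running installation
# (hadoop-dist/target/... is the usual output location; adjust if your build differs)
cp hadoop-dist/target/hadoop-2.3.0-cdh5.0.0/lib/native/* $HADOOP_HOME/lib/native/
hadoop fs -ls /        # the NativeCodeLoader warning should no longer appear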
5. Configure ZooKeeper. Perform all of the following steps on master.
1) Extract zookeeper-3.4.5.tar.gz to /usr/cdh. Edit the configuration file $ZOOKEEPER_HOME/conf/zoo.cfg so that it contains:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/cdh/zookeeper-3.4.5/data
dataLogDir=/usr/cdh/zookeeper-3.4.5/log
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=master:2881:3881
server.2=slave1:2882:3882
server.3=slave2:2883:3883

Create the dataDir and dataLogDir directories, and inside the directory given by dataDir create a file named myid whose content is the number after server.1/2/3 (1, 2, or 3). On master, create myid under $ZOOKEEPER_HOME/data with the value 1 (correspondingly, slave1 gets 2 and slave2 gets 3).
2) Copy the installation to slave1 and slave2. First create the /usr/cdh directory on slave1 and slave2, then run:
scp -r /usr/cdh/zookeeper-3.4.5 hadoop@slave1:/usr/cdh
scp -r /usr/cdh/zookeeper-3.4.5 hadoop@slave2:/usr/cdh
3) On slave1 and slave2, change the owner of /usr/cdh to the hadoop user and set its permissions to 700.
Run $ZOOKEEPER_HOME/bin/zkServer.sh start on master, slave1, and slave2 to start ZooKeeper. If jps shows QuorumPeerMain on all three machines, ZooKeeper is running correctly.
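A short sketch of the myid setup and a quorum check, assuming the directory layout above (the zkServer.sh status check is an extra verification step, not mentioned above):

# on master
mkdir -p /usr/cdh/zookeeper-3.4.5/data /usr/cdh/zookeeper-3.4.5/log
echo 1 > /usr/cdh/zookeeper-3.4.5/data/myid
# after copying the tree to the slaves, fix each slave's myid
ssh hadoop@slave1 'echo 2 > /usr/cdh/zookeeper-3.4.5/data/myid'
ssh hadoop@slave2 'echo 3 > /usr/cdh/zookeeper-3.4.5/data/myid'
# start on every node, then verify
$ZOOKEEPER_HOME/bin/zkServer.sh start
$ZOOKEEPER_HOME/bin/zkServer.sh status   # expect one "leader" and two "follower"s
jps                                      # should list QuorumPeerMain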
6. Configure HBase. Perform all of the following steps on master.
1) Set up the HBase environment. Edit /etc/profile, add export HBASE_HOME=/usr/cdh/hbase-0.96.1.1-cdh5.0.0, and append $HBASE_HOME/bin after export PATH=.
2) Modify $HBASE_HOME/conf/hbase-site.xml as follows:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
    <description>The directory shared by RegionServers.</description>
  </property>
  <property>
    <name>hbase.master</name>
    <value>master:60000</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/cdh/zookeeper-3.4.5/data</value>
  </property>
</configuration>

3) Modify $HBASE_HOME/conf/hbase-env.sh:
export JAVA_HOME=/usr/java/jdk1.7.0_51
export HBASE_CLASSPATH=/usr/cdh/hadoop-2.3.0-cdh5.0.0/etc/hadoop
export HBASE_MANAGES_ZK=false
4) Modify $HBASE_HOME/conf/regionservers so that it contains:
master
slave1
slave2
5) Copy the installation to slave1 and slave2. First create the /usr/cdh directory on slave1 and slave2, then run:
scp -r /usr/cdh/hbase-0.96.1.1-cdh5.0.0 hadoop@slave1:/usr/cdh
scp -r /usr/cdh/hbase-0.96.1.1-cdh5.0.0 hadoop@slave2:/usr/cdh
6) On slave1 and slave2, change the owner of /usr/cdh to the hadoop user and set its permissions to 700.
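For reference, a consolidated sketch of the /etc/profile additions accumulated so far; the ZOOKEEPER_HOME export is an assumption (the guide uses the variable but never shows its definition), and the JDK path is taken from hbase-env.sh above:

# environment variables used throughout this guide
export JAVA_HOME=/usr/java/jdk1.7.0_51
export HADOOP_HOME=/usr/cdh/hadoop-2.3.0-cdh5.0.0
export ZOOKEEPER_HOME=/usr/cdh/zookeeper-3.4.5       # assumption: not defined explicitly above
export HBASE_HOME=/usr/cdh/hbase-0.96.1.1-cdh5.0.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin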
7. If everything above is configured correctly, the next step is to test the whole cluster. The startup order is hadoop > zookeeper > hbase.
Start hadoop: run start-all.sh on master.
Start zookeeper: run /usr/cdh/zookeeper-3.4.5/bin/zkServer.sh start on each of the three machines.
Start hbase: run start-hbase.sh on master.
To avoid a single point of failure for the HBase master, you can additionally run /usr/cdh/hbase-0.96.1.1-cdh5.0.0/bin/hbase-daemon.sh start master on slave1 or slave2.
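A minimal end-to-end sketch of that startup plus a few sanity checks, run from master as the hadoop user (the jps expectations and the dfsadmin/hbase shell checks are assumptions added for verification):

# 1. Hadoop
start-all.sh
# 2. ZooKeeper (locally on master, remotely on the slaves)
/usr/cdh/zookeeper-3.4.5/bin/zkServer.sh start
ssh slave1 '/usr/cdh/zookeeper-3.4.5/bin/zkServer.sh start'
ssh slave2 '/usr/cdh/zookeeper-3.4.5/bin/zkServer.sh start'
# 3. HBase, plus an optional backup HMaster on slave1
start-hbase.sh
ssh slave1 '/usr/cdh/hbase-0.96.1.1-cdh5.0.0/bin/hbase-daemon.sh start master'
# sanity checks
jps                                  # master: NameNode, ResourceManager, QuorumPeerMain, HMaster, HRegionServer, ...
hdfs dfsadmin -report                # both DataNodes should be listed as live
echo "status" | hbase shell          # reports the number of live region servers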