[Setting up hadoop-2.6.0-cdh5.4.7 in pseudo-distributed mode](https://blog.liyang.io/108.html)

[CDH download page](http://archive.cloudera.com/cdh5/cdh/5/)

First, let's look at what CDH is and why the CDH build of Hadoop is worth choosing.

## CDH

CDH is one of the Hadoop distributions. The main Hadoop distributions are:

- Apache Hadoop
- Cloudera's Distribution Including Apache Hadoop (CDH)
- Hortonworks Data Platform (HDP)
- MapR
- EMR

The CDH distribution has the following advantages:

- Clear versioning scheme
- Fast release cadence
- Support for Kerberos authentication
- Clear documentation
- Multiple installation methods (including Cloudera Manager)

## Installing hadoop-2.6.0-cdh5.4.0

First download the package from the [CDH download page](http://archive.cloudera.com/cdh5/cdh/5/), then extract it.

### Configuring the pseudo-cluster

- 1. Change into hadoop-2.6.0-cdh5.4.0/etc/hadoop.
- 2. Edit hadoop-env.sh:

```
vi hadoop-env.sh
```

- 3. Set JAVA_HOME to your JDK path:

```
export JAVA_HOME=/opt/tools/jdk1.8.0_131
```

- 4. Edit core-site.xml and add the following configuration:

```
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node2:9000</value>
    </property>
</configuration>
```

About node2: if name resolution is not set up in hosts, replace node2 with the machine's IP address (a minimal /etc/hosts sketch follows these steps). Save and exit with :wq.

- 5. Edit hdfs-site.xml and add the following configuration:

```
<configuration>
    <property>
        <!-- enable WebHDFS -->
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/cdh/hadoop/name</value>
        <description>Local directory where the NameNode stores the name table (fsimage); change as needed</description>
    </property>
    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>${dfs.namenode.name.dir}</value>
        <description>Local directory where the NameNode stores the transaction files (edits); change as needed</description>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/cdh/hadoop/data</value>
        <description>Local directory where the DataNode stores blocks; change as needed</description>
    </property>
</configuration>
```

With the configuration above in place, you still need to create the directories it points to:

```
mkdir -p /opt/cdh/hadoop/name
mkdir -p /opt/cdh/hadoop/data
```

- 6. Configure mapred-site.xml:

```
cp mapred-site.xml.template mapred-site.xml
```

Then add the following configuration:

```
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
```

- 7. Edit yarn-site.xml:

```
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
```

At this point, all configuration is complete.
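As noted in step 4, core-site.xml refers to the NameNode host as node2. If that name does not resolve, the daemons and clients will fail to connect. A minimal /etc/hosts sketch; the address 192.168.1.3 matches the one that appears in the format log below, but substitute your machine's actual IP and hostname:

```
# /etc/hosts — hypothetical entry mapping the hostname used in core-site.xml
192.168.1.3   node2
```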
### Formatting HDFS

```
bin/hdfs namenode -format
```

You should see output like the following:

```
************************************************************/
15/09/22 14:59:46 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/09/22 14:59:46 INFO namenode.NameNode: createNameNode [-format]
15/09/22 14:59:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/22 14:59:49 WARN common.Util: Path /opt/cdh/hadoop/name should be specified as a URI in configuration files. Please update hdfs configuration.
15/09/22 14:59:49 WARN common.Util: Path /opt/cdh/hadoop/name should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-41ea6672-a32e-4b16-b704-962381ed409a
15/09/22 14:59:49 INFO namenode.FSNamesystem: No KeyProvider found.
15/09/22 14:59:49 INFO namenode.FSNamesystem: fsLock is fair:true
15/09/22 14:59:49 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/09/22 14:59:49 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/09/22 14:59:49 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/09/22 14:59:49 INFO blockmanagement.BlockManager: The block deletion will start around 2015 九月 22 14:59:49
15/09/22 14:59:49 INFO util.GSet: Computing capacity for map BlocksMap
15/09/22 14:59:49 INFO util.GSet: VM type       = 64-bit
15/09/22 14:59:49 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
15/09/22 14:59:49 INFO util.GSet: capacity      = 2^21 = 2097152 entries
15/09/22 14:59:50 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/09/22 14:59:50 INFO blockmanagement.BlockManager: defaultReplication         = 1
15/09/22 14:59:50 INFO blockmanagement.BlockManager: maxReplication             = 512
15/09/22 14:59:50 INFO blockmanagement.BlockManager: minReplication             = 1
15/09/22 14:59:50 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
15/09/22 14:59:50 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
15/09/22 14:59:50 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/09/22 14:59:50 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
15/09/22 14:59:50 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
15/09/22 14:59:50 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
15/09/22 14:59:50 INFO namenode.FSNamesystem: supergroup          = supergroup
15/09/22 14:59:50 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/09/22 14:59:50 INFO namenode.FSNamesystem: HA Enabled: false
15/09/22 14:59:50 INFO namenode.FSNamesystem: Append Enabled: true
15/09/22 14:59:50 INFO util.GSet: Computing capacity for map INodeMap
15/09/22 14:59:50 INFO util.GSet: VM type       = 64-bit
15/09/22 14:59:50 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
15/09/22 14:59:50 INFO util.GSet: capacity      = 2^20 = 1048576 entries
15/09/22 14:59:50 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/09/22 14:59:50 INFO util.GSet: Computing capacity for map cachedBlocks
15/09/22 14:59:50 INFO util.GSet: VM type       = 64-bit
15/09/22 14:59:50 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
15/09/22 14:59:50 INFO util.GSet: capacity      = 2^18 = 262144 entries
15/09/22 14:59:50 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/09/22 14:59:50 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/09/22 14:59:50 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
15/09/22 14:59:50 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
15/09/22 14:59:50 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
15/09/22 14:59:50 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
15/09/22 14:59:50 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/09/22 14:59:50 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/09/22 14:59:50 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/09/22 14:59:50 INFO util.GSet: VM type       = 64-bit
15/09/22 14:59:50 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
15/09/22 14:59:50 INFO util.GSet: capacity      = 2^15 = 32768 entries
15/09/22 14:59:50 INFO namenode.NNConf: ACLs enabled? false
15/09/22 14:59:50 INFO namenode.NNConf: XAttrs enabled? true
15/09/22 14:59:50 INFO namenode.NNConf: Maximum size of an xattr: 16384
15/09/22 14:59:51 INFO namenode.FSImage: Allocated new BlockPoolId: BP-314159059-192.168.1.3-1442905191056
15/09/22 14:59:51 INFO common.Storage: Storage directory /opt/cdh/hadoop/name has been successfully formatted.
15/09/22 14:59:51 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/09/22 14:59:51 INFO util.ExitUtil: Exiting with status 0
15/09/22 14:59:51 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node2/192.168.1.3
************************************************************/
```

If no errors are reported, the format succeeded.
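One prerequisite the steps above do not cover: start-dfs.sh and start-yarn.sh launch the daemons over ssh, so a pseudo-distributed setup usually needs passwordless SSH to localhost (and to the hostname used in core-site.xml). A minimal sketch, assuming an RSA key for the current user; skip it if `ssh localhost` already works without a password prompt:

```
# Generate a key with an empty passphrase and authorize it for the local machine.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```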
Then start HDFS and YARN:

```
sbin/start-dfs.sh
sbin/start-yarn.sh
```

If the startup finishes without errors, the daemons are running.

### Verification

- Use `jps` to check the relevant processes. The output looks like this:

```
nova@ubuntu208:~$ jps
7667 Jps
28532 DataNode
28742 SecondaryNameNode
29319 NodeManager
28376 NameNode
29018 ResourceManager
```

- Web UIs:

  YARN: [http://txy.quartz.ren:8088/cluster](http://txy.quartz.ren:8088/cluster)

  HDFS status: [http://txy.quartz.ren:50070/dfshealth.html#tab-overview](http://txy.quartz.ren:50070/dfshealth.html#tab-overview)
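Beyond `jps` and the web UIs, a quick end-to-end check is to write something into HDFS and run one of the bundled example jobs on YARN. This is a sketch run from the extracted hadoop-2.6.0-cdh5.4.0 directory; the examples jar path and version string are assumptions, so adjust them to whatever ships with your build:

```
# Put a small file into HDFS and list it back.
bin/hdfs dfs -mkdir -p /tmp/smoke
bin/hdfs dfs -put etc/hadoop/core-site.xml /tmp/smoke/
bin/hdfs dfs -ls /tmp/smoke

# Run the bundled pi estimator as a YARN job (jar name assumed; check share/hadoop/mapreduce/).
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.0.jar pi 2 10
```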
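Since dfs.webhdfs.enabled was set to true in hdfs-site.xml, the NameNode's REST interface can also serve as a check once HDFS is up. A minimal example, assuming the NameNode web port 50070 shown in the HDFS status URL above and the node2 hostname from core-site.xml:

```
# List the HDFS root directory over WebHDFS; expect a JSON FileStatuses response.
curl -i "http://node2:50070/webhdfs/v1/?op=LISTSTATUS"
```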