IT虾米网

Installing Hadoop 3.2.1 in Pseudo-Distributed Mode on macOS

developer · November 3, 2021 · Big Data

Part of a series of big data notes.

1. Installation

Download the Hadoop package from the official mirror: http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz

# Extract; the resulting path is /Users/zheng/hadoop/hadoop-3.2.1
$ tar -zxvf hadoop-3.2.1.tar.gz

# Set environment variables
$ vim /etc/profile
# Add the following lines
export HADOOP_HOME=/Users/zheng/hadoop/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin

# Apply the changes
$ source /etc/profile

# Check that the environment variables took effect
$ hadoop version
# Output like the following means it worked
Hadoop 3.2.1 
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842 
Compiled by rohithsharmaks on 2019-09-10T15:56Z 
Compiled with protoc 2.5.0 
From source with checksum 776eaf9eee9c0ffc370bcbc1888737 
This command was run using /Users/zheng/hadoop/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar 
 

2. Configuration

core-site.xml: cluster-wide parameters, i.e. system-level settings such as the HDFS URL and Hadoop's temporary directory.

# Edit /hadoop-3.2.1/etc/hadoop/core-site.xml 
<configuration> 
	<!--File system host and port--> 
	<property> 
	     <name>fs.defaultFS</name> 
	     <value>hdfs://localhost:9000</value> 
	</property> 
	 
	<!--Directory for files Hadoop generates at runtime; no need to create it in advance, it is generated automatically--> 
	<property> 
          <name>hadoop.tmp.dir</name> 
          <value>file:/Users/zheng/hadoop/tmp</value> 
    </property> 
</configuration> 
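A mis-pasted value in this file (for example, a file: path ending up in fs.defaultFS) will prevent the NameNode from starting, so it is worth sanity-checking before moving on. Below is a minimal sketch using only Python's standard library; the sample XML string stands in for your real core-site.xml, which you would read from disk:

```python
import xml.etree.ElementTree as ET

# Stand-in for the real core-site.xml contents
SAMPLE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>file:/Users/zheng/hadoop/tmp</value></property>
</configuration>"""

def read_props(xml_text):
    """Parse Hadoop *-site.xml text into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

def check_core_site(props):
    """Return a list of problems; an empty list means the basics look sane."""
    problems = []
    fs = props.get("fs.defaultFS", "")
    if not fs.startswith("hdfs://"):
        problems.append("fs.defaultFS should be an hdfs:// URI, got: %r" % fs)
    if "hadoop.tmp.dir" not in props:
        problems.append("hadoop.tmp.dir is not set")
    return problems

print(check_core_site(read_props(SAMPLE)))  # [] when the file is consistent
```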

hdfs-site.xml: NameNode and DataNode storage locations, number of file replicas, file access permissions, etc.

# Edit /hadoop-3.2.1/etc/hadoop/hdfs-site.xml 
<configuration> 
	<property> 
	     <name>dfs.replication</name> 
	     <value>1</value> 
	</property> 
	<property> 
	     <name>dfs.permissions</name> 
	     <value>false</value> 
	</property> 
	<!--No need to create in advance; generated automatically--> 
	<property> 
	     <name>dfs.namenode.name.dir</name> 
	     <value>file:/Users/zheng/hadoop/dfs/name</value> 
	</property> 
	<!--No need to create in advance; generated automatically--> 
	<property> 
	     <name>dfs.datanode.data.dir</name> 
	     <value>file:/Users/zheng/hadoop/dfs/data</value> 
	</property> 
</configuration> 

mapred-site.xml: MapReduce parameters

# Edit /hadoop-3.2.1/etc/hadoop/mapred-site.xml 
<configuration> 
	<property> 
	      <name>mapreduce.framework.name</name> 
	      <value>yarn</value> 
	</property> 
</configuration> 

yarn-site.xml: cluster resource-management parameters, such as the ResourceManager/NodeManager communication ports and the web monitoring ports.

# Edit /hadoop-3.2.1/etc/hadoop/yarn-site.xml 
<configuration> 
	<property> 
	      <name>yarn.nodemanager.aux-services</name> 
	      <value>mapreduce_shuffle</value> 
	</property> 
</configuration> 

3. Initialize HDFS

# Format HDFS. You can run "hdfs namenode -format" directly; if the command is not on your PATH, run it from the bin directory:
cd /Users/zheng/hadoop/hadoop-3.2.1/bin 
./hdfs namenode -format 
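Formatting initializes the directory set in dfs.namenode.name.dir and wipes any existing metadata, so it should only be run against a fresh (or deliberately reset) directory. A formatted name directory contains a current/VERSION marker file; here is a small sketch that checks for it, demonstrated on a throwaway directory:

```python
import os
import tempfile

def is_formatted(name_dir):
    """A formatted NameNode metadata dir contains current/VERSION."""
    return os.path.isfile(os.path.join(name_dir, "current", "VERSION"))

# Demo on a throwaway directory (in practice, pass your dfs.namenode.name.dir)
d = tempfile.mkdtemp()
print(is_formatted(d))  # False: fresh dir, safe to format
os.makedirs(os.path.join(d, "current"))
open(os.path.join(d, "current", "VERSION"), "w").close()
print(is_formatted(d))  # True: re-formatting would destroy existing metadata
```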

4. Start Hadoop

cd /Users/zheng/hadoop/hadoop-3.2.1/sbin 
# If you are not sure which daemons you need, start-all.sh starts everything 
./start-all.sh 

# Alternatively, start the daemons individually (NameNode first) 
./hadoop-daemon.sh start namenode 
./hadoop-daemon.sh start datanode 
./hadoop-daemon.sh start secondarynamenode 
./yarn-daemon.sh start resourcemanager 
./yarn-daemon.sh start nodemanager 
./mr-jobhistory-daemon.sh start historyserver 

Starting produced the following errors: 
WARNING: Attempting to start all Apache Hadoop daemons as zheng in 10 seconds. 
WARNING: This is not a recommended production deployment configuration. 
WARNING: Use CTRL-C to abort. 
Starting namenodes on [zheng-2.local] 
zheng-2.local: ERROR: Cannot set priority of namenode process 3282 
Starting datanodes 
Starting secondary namenodes [account.jetbrains.com] 
account.jetbrains.com: ERROR: Cannot set priority of secondarynamenode process 3524 
2020-04-09 12:01:03,083 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
Starting resourcemanager 
Starting nodemanagers 
 

Troubleshooting: the startup logs are under /Users/zheng/hadoop/hadoop-3.2.1/logs.
Here the NameNode failed to start, and its log contained the following error:

java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/Users/zheng/hadoop/tmp has no authority. 
        at org.apache.hadoop.hdfs.DFSUtilClient.getNNAddress(DFSUtilClient.java:780) 
        at org.apache.hadoop.hdfs.DFSUtilClient.getNNAddressCheckLogical(DFSUtilClient.java:809) 
        at org.apache.hadoop.hdfs.DFSUtilClient.getNNAddress(DFSUtilClient.java:771) 
        at org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:545) 
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:676) 
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:696) 
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:953) 
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:926) 
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1692) 
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1759) 
 
Fix: 
At first this looked like a permissions problem on the tmp directory, but granting access with chmod 777 did not help. The real cause was a copy-paste mistake in the earlier core-site.xml: as the exception shows, the file:/Users/zheng/hadoop/tmp value had ended up in fs.defaultFS. The hadoop.tmp.dir property should read: 
<property> 
          <name>hadoop.tmp.dir</name> 
          <value>file:/Users/zheng/hadoop/tmp</value> 
</property>  
After correcting the parameter, re-run hdfs namenode -format and then restart Hadoop. 

Verify startup

# After starting, run jps to confirm the daemons are up 
$ jps 
17521 SecondaryNameNode 
17717 ResourceManager 
17369 DataNode 
17820 NodeManager 
17262 NameNode 
17886 Jps 
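Rather than eyeballing the jps list, you can diff it against the expected daemon set. A small sketch (the pasted output above is embedded here for illustration; in practice you would feed it the live output of jps):

```python
# jps output pasted from the step above (PIDs are illustrative)
JPS_OUTPUT = """17521 SecondaryNameNode
17717 ResourceManager
17369 DataNode
17820 NodeManager
17262 NameNode
17886 Jps"""

EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager"}

def missing_daemons(jps_text, expected=EXPECTED):
    """Return the set of expected daemons absent from jps output."""
    running = {line.split(None, 1)[1]
               for line in jps_text.splitlines() if line.strip()}
    return expected - running

print(missing_daemons(JPS_OUTPUT))  # set() -> everything is up
```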

5. Web UIs

Hadoop (NameNode) web UI, default address: http://localhost:9870/
YARN web UI, default address: http://localhost:8088
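Both UIs also expose a /jmx endpoint that returns JSON, which is convenient for scripted health checks. The sketch below parses a trimmed, illustrative NameNodeStatus bean; in practice you would fetch http://localhost:9870/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus with urllib (the exact fields are an assumption and may vary by version):

```python
import json

# Illustrative /jmx response shape (fields trimmed; an assumption, not captured output)
SAMPLE_JMX = """{
  "beans": [{
    "name": "Hadoop:service=NameNode,name=NameNodeStatus",
    "State": "active",
    "HostAndPort": "localhost:9000"
  }]
}"""

def namenode_state(jmx_json):
    """Pull the NameNode State field out of a /jmx response, or None."""
    for bean in json.loads(jmx_json).get("beans", []):
        if bean.get("name", "").endswith("NameNodeStatus"):
            return bean.get("State")
    return None

print(namenode_state(SAMPLE_JMX))  # active
```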

6. Notes

1. The JDK must be installed in advance; see any standard guide for instructions.
2. ssh localhost

# Test that you can ssh into localhost; without prior setup this typically fails with a permission/authentication error 
ssh localhost 
 
# If it fails, run the following to set up passwordless key-based login 
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa 
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 
chmod 0600 ~/.ssh/authorized_keys 
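The chmod 0600 step matters because sshd refuses keys in an authorized_keys file that other users can write to. A small sketch (the strict "at most 0600" check mirrors the chmod above; sshd's actual StrictModes rules are slightly looser) to verify the file, demonstrated on a throwaway file:

```python
import os
import stat
import tempfile

def key_file_ok(path):
    """Check the file exists and its permissions are at most 0600,
    matching the chmod in the setup commands above."""
    if not os.path.isfile(path):
        return False
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0  # no group/other bits set

# Demo on a throwaway file (in practice, pass ~/.ssh/authorized_keys)
f = tempfile.NamedTemporaryFile(delete=False)
f.close()
os.chmod(f.name, 0o600)
print(key_file_ok(f.name))  # True
```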
