欢迎来到Hadoop

What Is Apache Hadoop? Hadoop是一个可靠的、可扩展的、分布式计算的开源软件。 Hadoop是一个分布式处理大数据的框架。它被设计成从一台到上千台不等的服务器,每个服务器都提供本地计算和存储的能力。它并非依赖于...

Hadoop2.9.0安装

参考 https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html 1、下载并解压 2、设置环境变量...

hadoop fs命令

本文参考链接:https://www.cnblogs.com/cjsblog/p/8094146.html...

hadoop wordcount

本文参考链接:https://www.cnblogs.com/cjsblog/p/8097766.html...

HDFS Architecture

http://hadoop.apache.org/docs/r2.9.0/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html Introduction Hadoop分布式文件系统被设计运行在普...

MapReduce

http://hadoop.apache.org/docs/r2.9.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html Overvi...

MapReduce Tutorial(划重点)

Mapper Mapper的maps阶段将输入键值对经过计算得到中间结果键值对,框架会将中间结果按照key进行分组,然后传递给reducer以决定最终的输出。用户可以通过Job.setGroupingComparatorCla...

HDFS Federation

http://hadoop.apache.org/docs/r2.9.0/hadoop-project-dist/hadoop-hdfs/Federation.html Background HDFS有两个主要的层: Nam...

HDFS High Availability Using the Quorum Journal Manager

http://hadoop.apache.org/docs/r2.9.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html 背景 在Hadoop 2.0.0...

Hive安装

https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-SimpleExampleUseCases 解压,并配置环境变量 在con...