1.1 Big Data from 0 to 1: Setting Up a Pseudo-Distributed Hadoop Environment (hadoop-3.2.1)

Tags: Big Data

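Table of contents

1. Virtual machine installation
2. Obtaining virtual machine information
3. Operating system
4. SSH tools
5. Check rpm -qa | grep pdsh: not installed
6. Many packages are missing from yum; install the EPEL rpm package first
7. Install Java or check java -version
8. Switch Java to the Linux JDK build for more stable support
9. Create the hadoop user and group, and grant sudo privileges
10. Create a hadoop directory and give ownership to the hadoop user
11. Extract the archive as the hadoop user
12. From here on, just follow the official documentation
13. Installation successful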

1. Virtual machine installation

VMware-workstation-full-15.5.0-14665864 (cloud-drive download and key)

2. Obtaining virtual machine information

Get the gateway address.

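3. Operating system

Host H1 (192.168.188.100) runs 64-bit CentOS 7, installed by following online tutorials. I installed the edition with a desktop.
http://mirrors.aliyun.com/centos/
Pay attention to the network configuration. First configure it on the host side, outside the virtual machine.

Then, inside the virtual machine, set the network adapter, route, and gateway to match the values obtained earlier.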

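4. SSH tools

Inside the virtual machine, check whether SSH is installed: rpm -qa | grep ssh
On Windows, install Xshell to connect to the Linux machine.

On Windows, install Xftp to transfer packages to the Linux machine through a graphical interface.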

5. Check rpm -qa | grep pdsh: not installed

yum install pdsh fails.

6. Many packages are missing from yum; install the EPEL rpm package first

yum -y install epel-release
yum clean all && yum makecache
Try again: yum install pdsh now succeeds.
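
The check-then-install sequence from steps 5 and 6 can be wrapped in a small helper. This is only a minimal sketch, assuming a CentOS 7 host with rpm and yum and a root shell. The function name ensure_pkg is hypothetical and not part of rpm or yum.

```shell
#!/usr/bin/env bash
# ensure_pkg is a hypothetical helper name, not part of rpm or yum.
# It installs a package with yum only when rpm reports it as missing.
ensure_pkg() {
    local pkg="$1"
    if rpm -q "$pkg" >/dev/null 2>&1; then
        echo "$pkg already installed"
    else
        echo "installing $pkg"
        yum -y install "$pkg"
    fi
}

# Usage (run as root): EPEL first, because yum could not find pdsh before it.
# ensure_pkg epel-release && yum clean all && yum makecache && ensure_pkg pdsh
```

Because the function checks with rpm -q before calling yum, it is safe to run more than once.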

7. Install Java or check java -version

8. Switch Java to the Linux JDK build for more stable support

rpm -qa | grep java
rpm -e --nodeps java-1.7.0-openjdk-headless-1.7.0.221-2.6.18.1.el7.x86_64
rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.222.b03-1.el7.x86_64
rpm -e --nodeps java-1.7.0-openjdk-1.7.0.221-2.6.18.1.el7.x86_64
rpm -e --nodeps java-1.8.0-openjdk-1.8.0.222.b03-1.el7.x86_64
mkdir -p /usr/local/java
tar -zxvf jdk-8u231-linux-x64.tar.gz  -C /usr/local/java
vim /etc/profile

Append the following at the bottom of the file:

export JAVA_HOME=/usr/local/java/jdk1.8.0_231
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
After saving the file, reload it in the current shell (run this command; do not add it to the file):
source /etc/profile
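
After reloading the profile, it is worth confirming that JAVA_HOME really points at a JDK before moving on. The following is a minimal sketch. The function name check_java_home is hypothetical, and it only checks that an executable bin/java exists under the directory.

```shell
#!/usr/bin/env bash
# check_java_home is a hypothetical helper: it succeeds only when
# the given directory (default: $JAVA_HOME) contains an executable bin/java.
check_java_home() {
    local home="${1:-$JAVA_HOME}"
    if [ -z "$home" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$home/bin/java" ]; then
        echo "no executable java under $home/bin" >&2
        return 1
    fi
    echo "JAVA_HOME OK: $home"
}

# Usage: check_java_home && "$JAVA_HOME/bin/java" -version
```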

9. Create the hadoop user and group, and grant sudo privileges

sudo usermod -a -G hadoop hadoop
sudo nano /etc/sudoers
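Below the line  root  ALL=(ALL)  ALL  add the following (copy the existing line from the file and edit it, otherwise it is easy to make a mistake):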
hadoop ALL=(ALL) ALL
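
The usermod command above only works if the hadoop user and group already exist. The sketch below creates them first when they are missing. It assumes a root shell on a Linux host with getent, groupadd, useradd, and usermod available. The function name ensure_hadoop_user is hypothetical.

```shell
#!/usr/bin/env bash
# ensure_hadoop_user is a hypothetical helper; run it as root.
# It creates the group and the user only if they do not exist yet,
# then adds the user to the group as in the usermod command above.
ensure_hadoop_user() {
    local user="${1:-hadoop}" group="${2:-hadoop}"
    getent group "$group" >/dev/null || groupadd "$group"
    getent passwd "$user" >/dev/null || useradd -m -g "$group" "$user"
    usermod -a -G "$group" "$user"
}

# Usage: ensure_hadoop_user && passwd hadoop
```

After editing /etc/sudoers by hand, running visudo -c is one way to check the file for syntax errors.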

10. Create a hadoop directory and give ownership to the hadoop user

mkdir -p /usr/local/hadoop
chown -R hadoop:hadoop /usr/local/hadoop

11. Extract the archive as the hadoop user

su hadoop
tar -zxvf hadoop-3.2.1.tar.gz -C /usr/local/hadoop

12. From here on, just follow the official documentation:

Prepare to Start the Hadoop Cluster
Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

 # set to the root of your Java installation
 export JAVA_HOME=/usr/java/latest
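
The hadoop-env.sh edit can also be scripted. This is a minimal sketch, not the official procedure. The function name set_hadoop_java_home is hypothetical, and the paths in the usage comment are the ones used earlier in this article.

```shell
#!/usr/bin/env bash
# set_hadoop_java_home is a hypothetical helper. It appends an
# "export JAVA_HOME=..." line to the given hadoop-env.sh unless an
# uncommented JAVA_HOME export is already present.
set_hadoop_java_home() {
    local env_file="$1" java_home="$2"
    if grep -q '^[[:space:]]*export JAVA_HOME=' "$env_file"; then
        echo "JAVA_HOME already set in $env_file"
    else
        printf 'export JAVA_HOME=%s\n' "$java_home" >> "$env_file"
        echo "JAVA_HOME appended to $env_file"
    fi
}

# Usage, with the paths used in this article:
# set_hadoop_java_home /usr/local/hadoop/hadoop-3.2.1/etc/hadoop/hadoop-env.sh /usr/local/java/jdk1.8.0_231
```

Commented-out lines such as "# export JAVA_HOME=" are ignored by the check, so the helper still appends a real export in that case.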

Try the following command:

  $ bin/hadoop

Standalone Operation
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.

  $ mkdir input
  $ cp etc/hadoop/*.xml input
  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
  $ cat output/*

Pseudo-Distributed Operation
Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
Configuration
Use the following:
etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
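
The two configuration files above can be written from a script instead of an editor. The following is a sketch under stated assumptions: the function name write_pseudo_dist_conf is hypothetical, and it overwrites both files, so back up the originals first if they contain anything you want to keep.

```shell
#!/usr/bin/env bash
# write_pseudo_dist_conf is a hypothetical helper. It overwrites
# core-site.xml and hdfs-site.xml in the given directory with the
# single-node values shown above.
write_pseudo_dist_conf() {
    local conf_dir="$1"
    cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF
}

# Usage, with the install path used in this article:
# write_pseudo_dist_conf /usr/local/hadoop/hadoop-3.2.1/etc/hadoop
```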

Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:

  $ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  $ chmod 0600 ~/.ssh/authorized_keys
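
To test the passphraseless login from a script rather than interactively, something like the following can be used. It is a minimal sketch: the function name can_ssh_without_passphrase is hypothetical. With BatchMode=yes, ssh fails instead of prompting, so the check also fails if the host key for localhost has not been accepted yet.

```shell
#!/usr/bin/env bash
# can_ssh_without_passphrase is a hypothetical helper. BatchMode=yes
# makes ssh fail instead of prompting, so the exit status tells us
# whether non-interactive login to the host works.
can_ssh_without_passphrase() {
    local host="${1:-localhost}"
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}

# Usage:
# if can_ssh_without_passphrase; then echo "ssh OK"; else echo "set up keys first"; fi
```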

Execution
The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node.

Format the filesystem:

  $ bin/hdfs namenode -format

Start NameNode daemon and DataNode daemon:

  $ sbin/start-dfs.sh

The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
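
Besides reading the logs, a quick way to confirm that the daemons came up is to look at the Java process list. The sketch below assumes the JDK's jps tool is on PATH (it is in $JAVA_HOME/bin). The function name missing_daemons is hypothetical.

```shell
#!/usr/bin/env bash
# missing_daemons is a hypothetical helper. It prints every daemon name
# from its arguments that does not appear in the output of jps
# (the JDK tool that lists running Java processes as "<pid> <name>").
missing_daemons() {
    local running name
    running="$(jps)"
    for name in "$@"; do
        if ! printf '%s\n' "$running" | awk '{print $2}' | grep -qx "$name"; then
            echo "$name"
        fi
    done
}

# Usage after sbin/start-dfs.sh:
# missing_daemons NameNode DataNode SecondaryNameNode
```

An empty result means every listed daemon is running. The same function can be reused after sbin/start-yarn.sh with ResourceManager and NodeManager.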

Browse the web interface for the NameNode; by default it is available at:
NameNode - http://localhost:9870/ (open it in a browser inside the virtual machine)

Make the HDFS directories required to execute MapReduce jobs:

  $ bin/hdfs dfs -mkdir /user
  $ bin/hdfs dfs -mkdir /user/<username>

Copy the input files into the distributed filesystem:

  $ bin/hdfs dfs -mkdir input
  $ bin/hdfs dfs -put etc/hadoop/*.xml input

Run some of the examples provided:

  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'

Examine the output files: Copy the output files from the distributed filesystem to the local filesystem and examine them:

  $ bin/hdfs dfs -get output output
  $ cat output/*

or
View the output files on the distributed filesystem:

 $ bin/hdfs dfs -cat output/*

When you’re done, stop the daemons with:

  $ sbin/stop-dfs.sh

  On this installation: /usr/local/hadoop/hadoop-3.2.1/sbin/./stop-dfs.sh

YARN on a Single Node
You can run a MapReduce job on YARN in a pseudo-distributed mode by setting a few parameters and running ResourceManager daemon and NodeManager daemon in addition.
The following instructions assume that steps 1 to 4 of the above instructions (formatting the filesystem through creating the HDFS directories) have already been executed.

Configure parameters as follows. On this installation the files are under /usr/local/hadoop/hadoop-3.2.1/etc/hadoop/:
cd /usr/local/hadoop/hadoop-3.2.1/etc/hadoop/

etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

Start ResourceManager daemon and NodeManager daemon:
$ sbin/start-yarn.sh

On this installation: /usr/local/hadoop/hadoop-3.2.1/sbin/./start-yarn.sh

To stop and restart all daemons (HDFS and YARN) at once:

/usr/local/hadoop/hadoop-3.2.1/sbin/./stop-all.sh
/usr/local/hadoop/hadoop-3.2.1/sbin/./start-all.sh

Browse the web interface for the ResourceManager; by default it is available at:
ResourceManager - http://localhost:8088/

Run a MapReduce job.

When you’re done, stop the daemons with:

  $ sbin/stop-yarn.sh

13. Installation successful

Article link: https://blog.csdn.net/tanxiang21/article/details/103937441
