Spark standalone deployment guide

1. Environment preparation

Upload the following packages to the server:

jdk-8u192-linux-x64.tar.gz
spark-2.4.1-bin-hadoop2.7.tar.gz

Extract both archives into /data/projects/common (the directory must already exist):

tar xvf jdk-8u192-linux-x64.tar.gz -C /data/projects/common
tar xvf spark-2.4.1-bin-hadoop2.7.tar.gz -C /data/projects/common

Add the following lines to /etc/profile:

export JAVA_HOME=/data/projects/common/jdk1.8.0_192
export PATH=$JAVA_HOME/bin:$PATH
export SPARK_HOME=/data/projects/common/spark-2.4.1-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
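
After editing /etc/profile and running source /etc/profile, it can be useful to confirm that both variables point at real installations before starting any service. The following is a minimal sketch rather than part of the official procedure. The function name check_spark_env is hypothetical, and it assumes the usual layout where bin/java sits under JAVA_HOME and bin/spark-submit sits under SPARK_HOME.

```python
import os


def check_spark_env(env=None):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: assumes bin/java exists under JAVA_HOME and
    bin/spark-submit exists under SPARK_HOME.
    """
    env = os.environ if env is None else env
    # Executable expected under each home directory (an assumption about layout).
    expected = {
        "JAVA_HOME": os.path.join("bin", "java"),
        "SPARK_HOME": os.path.join("bin", "spark-submit"),
    }
    problems = []
    for var, rel_path in expected.items():
        home = env.get(var)
        if not home:
            problems.append(f"{var} is not set")
        elif not os.path.isfile(os.path.join(home, rel_path)):
            problems.append(f"{var}={home} does not contain {rel_path}")
    return problems
```

Calling check_spark_env() with no arguments inspects the current process environment, so it should be run from a shell that has already sourced /etc/profile. Printing the returned list shows which variable, if any, needs attention.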

2. Run the Spark service

Start the master:

source /etc/profile
cd /data/projects/common/spark-2.4.1-bin-hadoop2.7 && ./sbin/start-master.sh

Start the Spark worker (node) service:
```
cd /data/projects/common/spark-2.4.1-bin-hadoop2.7 && ./sbin/start-slave.sh spark://node:7077
```
Note: node is the hostname of the machine running the master; replace it with the actual hostname.
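
If a worker fails to register, one thing worth checking is whether the master port is reachable from the worker machine. The sketch below is an illustration under assumptions, not a step from this guide. The names master_url and master_reachable are hypothetical, and 7077 is taken from the spark://node:7077 URL used above.

```python
import socket


def master_url(host, port=7077):
    """Build a spark:// URL in the form used by start-slave.sh above."""
    return f"spark://{host}:{port}"


def master_reachable(host, port=7077, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: this only shows that something is listening on the
    port; it does not prove that the listener is a Spark master.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, master_reachable("node") run on the worker machine returns False when the hostname does not resolve or a firewall blocks the port, which narrows down where to look next.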

3. Spark test

cd /data/projects/common/spark-2.4.1-bin-hadoop2.7/bin
./pyspark --master local[2]
# Inside the PySpark shell: load the file as an RDD of lines
distFile = sc.textFile("/etc/profile")
# Count the number of lines
distFile.count()

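If the line count of the file is returned, Spark is installed correctly. Note that --master local[2] runs Spark inside the shell process with two threads and does not contact the master started in step 2; to exercise the standalone cluster, pass --master spark://node:7077 instead.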
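
The same check can be made less dependent on eyeballing the number by comparing Spark's count against a plain Python count of the same file. This is a minimal sketch under assumptions. The names plain_line_count, spark_line_count and verify_line_count are hypothetical, and the comparison assumes the file uses "\n" line endings and is readable at the same local path by Spark.

```python
def plain_line_count(path):
    """Count lines by reading the file directly, without Spark."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def spark_line_count(path, sc=None):
    """Count lines through Spark, mirroring sc.textFile(path).count().

    If no SparkContext is passed in, one is obtained lazily with
    SparkContext.getOrCreate(), so pyspark is only imported when needed.
    """
    if sc is None:
        from pyspark import SparkContext  # requires a PySpark environment
        sc = SparkContext.getOrCreate()
    return sc.textFile(path).count()


def verify_line_count(path, sc=None):
    """Return True when Spark and plain Python agree on the line count."""
    return spark_line_count(path, sc=sc) == plain_line_count(path)
```

Inside the pyspark shell, where sc already exists, verify_line_count("/etc/profile", sc=sc) is one way to run it. Passing the context explicitly also makes the helper easy to try with a stand-in object when Spark is not available.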

Last update: 2022-01-27