This article is about 2,245 characters; reading it takes roughly 7 minutes.
Installing the JDK

Installing the Java Development Kit (JDK) is the foundation of big-data development. Install it with apt (on Ubuntu the package is default-jdk):

sudo apt-get install default-jdk

Then append the following to ~/.bashrc (adjust JAVA_HOME to wherever your JDK actually landed):

export JAVA_HOME=/usr/lib/jvm/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

Save the file, then run:
source ~/.bashrc
java -version
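The exports above can be sanity-checked with a short script. The default paths are the ones assumed in this guide, not something apt guarantees, so adjust them if your JDK installed elsewhere:

```shell
# Sanity-check the JDK environment configured in ~/.bashrc.
# /usr/lib/jvm/java is the path assumed in this guide; adjust as needed.
JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java}"
JRE_HOME="${JRE_HOME:-${JAVA_HOME}/jre}"
CLASSPATH=".:${JAVA_HOME}/lib:${JRE_HOME}/lib"
echo "JAVA_HOME=${JAVA_HOME}"
echo "CLASSPATH=${CLASSPATH}"
# Warn, rather than fail, when no JDK is actually installed at that path.
[ -x "${JAVA_HOME}/bin/java" ] || echo "warning: no java binary under ${JAVA_HOME}" >&2
```

If the warning fires, check `ls /usr/lib/jvm/` for the directory apt actually created and point JAVA_HOME there.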
Configuring passwordless SSH login

Passwordless SSH login is a common requirement during development (Hadoop's start scripts rely on it). Configuration steps:

sudo apt-get install openssh-server
ssh-keygen -t rsa
cd ~/.ssh
cat ./id_rsa.pub >> ./authorized_keys
ssh localhost
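One pitfall worth guarding against: sshd ignores authorized_keys when ~/.ssh or the file itself is too permissive, so passwordless login silently falls back to a password prompt. A small sketch that enforces the expected modes (SSH_DIR is overridable so the snippet can be tried in a scratch directory first):

```shell
# Ensure ~/.ssh and authorized_keys have permissions sshd will accept.
# SSH_DIR defaults to ~/.ssh; override it to experiment in a scratch directory.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"                      # only the owner may enter the directory
touch "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"      # only the owner may read/write the keys
stat -c '%a %n' "$SSH_DIR" "$SSH_DIR/authorized_keys"
```

If `ssh localhost` still asks for a password after this, `ssh -v localhost` usually shows which key or permission check failed.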
Installing Hadoop

Hadoop is the core framework for big-data processing. Manual installation steps (assuming hadoop-2.6.5.tar.gz has been downloaded):

sudo tar -zxvf hadoop-2.6.5.tar.gz -C /usr/local
sudo mv /usr/local/hadoop-2.6.5 /usr/local/hadoop

Append to ~/.bashrc:

export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

After editing the configuration files shown below, reload the shell, format the NameNode, and start HDFS:

source ~/.bashrc
cd /usr/local/hadoop
./bin/hdfs namenode -format
./sbin/start-dfs.sh
jps
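Once start-dfs.sh has run, jps should list a NameNode, a DataNode, and a SecondaryNameNode. A small helper makes that check scriptable; it is sketched here with hard-coded sample jps output so it can be read on its own:

```shell
# check_hdfs_daemons: given jps output, report which HDFS daemons are missing.
check_hdfs_daemons() {
    missing=""
    for daemon in NameNode DataNode SecondaryNameNode; do
        printf '%s\n' "$1" | grep -qw "$daemon" || missing="$missing $daemon"
    done
    if [ -z "$missing" ]; then
        echo "HDFS daemons are up"
    else
        echo "missing:$missing"
    fi
}

# Example with the kind of output jps prints on a healthy single-node setup:
check_hdfs_daemons "12001 NameNode
12202 DataNode
12403 SecondaryNameNode
12600 Jps"
# prints: HDFS daemons are up
```

On a real node, call it as `check_hdfs_daemons "$(jps)"` after start-dfs.sh.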
In /usr/local/hadoop/etc/hadoop/hadoop-env.sh, set JAVA_HOME explicitly (Hadoop does not reliably pick it up from the login environment):

export JAVA_HOME=/usr/lib/jvm/java
In core-site.xml, set hadoop.tmp.dir and fs.defaultFS:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
In hdfs-site.xml, set the replication factor and the NameNode/DataNode storage directories:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
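The core-site.xml and hdfs-site.xml settings above can also be written by a script, which is handy when repeating the setup on several machines. This sketch writes to a temporary directory by default so it can be previewed safely; set HADOOP_CONF to /usr/local/hadoop/etc/hadoop on a real node:

```shell
# Write core-site.xml and hdfs-site.xml with the values used in this guide.
# Defaults to a scratch directory; set HADOOP_CONF to the real config dir to install.
HADOOP_CONF="${HADOOP_CONF:-$(mktemp -d)}"
mkdir -p "$HADOOP_CONF"

cat > "$HADOOP_CONF/core-site.xml" <<'EOF'
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF

cat > "$HADOOP_CONF/hdfs-site.xml" <<'EOF'
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
EOF

echo "wrote core-site.xml and hdfs-site.xml to $HADOOP_CONF"
```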
Installing Scala

Scala is the high-level language of the big-data ecosystem (Spark itself is written in it). Install it with apt:

sudo apt-get install scala

Then add SCALA_HOME to ~/.bashrc:

export SCALA_HOME=/usr/share/scala-2.11
scala -version
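A quick way to confirm that SCALA_HOME points at a usable installation is to look for the launcher under its bin/ directory. A minimal sketch (the /usr/share/scala-2.11 default matches the path assumed above):

```shell
# Report whether a candidate SCALA_HOME contains the scala launcher.
scala_home_ok() {
    if [ -x "$1/bin/scala" ]; then
        echo "scala found under $1"
    else
        echo "no scala launcher under $1"
    fi
}

scala_home_ok "${SCALA_HOME:-/usr/share/scala-2.11}"
```

If the launcher is missing, `dpkg -L scala | grep bin/scala` shows where the package actually put it.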
Installing Spark

Spark is a general-purpose framework for big-data processing. Manual installation steps (assuming spark-2.3.1-bin-hadoop2.7.tgz has been downloaded):

sudo tar -zxvf spark-2.3.1-bin-hadoop2.7.tgz -C /usr/local
sudo mv /usr/local/spark-2.3.1-bin-hadoop2.7 /usr/local/spark

Add to ~/.bashrc:

export SPARK_HOME=/usr/local/spark
Launch the PySpark shell:

cd /usr/local/spark/bin
./pyspark
Run a quick smoke test (inside the pyspark shell a SparkContext named sc already exists, so skip the first two lines there; they are needed only in a standalone script):

from pyspark import SparkContext
sc = SparkContext()
lines = sc.textFile("/usr/local/spark/README.md")
lines.count()
lines.first()

That completes the installation guide for a big-data development environment on Ubuntu. From the JDK through Hadoop, Scala, and Spark, each step has been walked through so you can get a working environment up quickly.
Reprinted from: http://uzwh.baihongyu.com/