Installation itself is fairly easy; the hard part is the Map/Reduce programming that comes afterwards.
Before starting, it is worth reading the HDFS Architecture document to understand the roles involved in the configuration.
This install is on Fedora 12, using the latest stable release, hadoop-0.20.2.
Planned layout: master node f180; slave nodes f172 and f173.
Step 1. Download and configure the package
Download the tarball (run on all nodes)
# mkdir /usr/src/hadoop/
# cd /usr/src/hadoop/
# wget http://ftp.twaren.net/Unix/Web/apache/hadoop/core/stable/hadoop-0.20.2.tar.gz
# tar xvf hadoop-0.20.2.tar.gz
# cd hadoop-0.20.2
Environment settings (run on all nodes)
Point JAVA_HOME at the right directory (required!)
# cat conf/hadoop-env.sh (Fedora 12 uses openjdk)
-# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
+export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk
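The JVM path differs between distributions and JDK builds, so it is worth checking what is actually installed before editing hadoop-env.sh. A small sketch; the generic /usr/lib/jvm/java fallback is an assumption, not something this setup uses:

```shell
# Use the Fedora 12 openjdk path if present; otherwise fall back to the
# common /usr/lib/jvm/java symlink (an assumption; verify on your distro).
JAVA_HOME=$(ls -d /usr/lib/jvm/jre-1.6.0-openjdk 2>/dev/null || echo /usr/lib/jvm/java)
echo "JAVA_HOME=$JAVA_HOME"
```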
Specify node information
# cat conf/core-site.xml
# cat conf/hdfs-site.xml
# cat conf/mapred-site.xml
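The contents of these three files are not shown above. For a cluster like this one (f180 as NameNode and JobTracker, data under /mnt/btrfs/hadoop as the format log later confirms, and a replication factor of 2 as seen in the dfs -ls listings), a minimal sketch might look as follows; the port numbers 9000 and 9001 are conventional assumptions, not values taken from this setup:

```xml
<!-- conf/core-site.xml: default filesystem and working area -->
<configuration>
  <property><name>fs.default.name</name><value>hdfs://f180:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>/mnt/btrfs/hadoop/hadoop-${user.name}</value></property>
</configuration>

<!-- conf/hdfs-site.xml: two copies of each block
     (matches the replication column "2" in the dfs -ls output below) -->
<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>

<!-- conf/mapred-site.xml: JobTracker address -->
<configuration>
  <property><name>mapred.job.tracker</name><value>f180:9001</value></property>
</configuration>
```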
Configure the master node (master only)
[root@f180 hadoop-0.20.2]# cat conf/masters
f180
Configure the slave nodes (master only)
[root@f180 hadoop-0.20.2]# cat conf/slaves
f172
f173
Create the scratch space for HDFS data (device names differ per node)
[root@f180 hadoop-0.20.2]# mkfs.btrfs /dev/cciss/c0d1p1
[root@f180 hadoop-0.20.2]# mkdir -p /mnt/btrfs/hadoop/
[root@f180 hadoop-0.20.2]# mount -t btrfs /dev/cciss/c0d1p1 /mnt/btrfs/hadoop/
[root@f172 hadoop-0.20.2]# mkfs.btrfs /dev/cciss/c0d4p1
[root@f172 hadoop-0.20.2]# mkdir -p /mnt/btrfs/hadoop/
[root@f172 hadoop-0.20.2]# mount -t btrfs /dev/cciss/c0d4p1 /mnt/btrfs/hadoop/
[root@f173 hadoop-0.20.2]# mkfs.btrfs /dev/cciss/c0d4p1
[root@f173 hadoop-0.20.2]# mkdir -p /mnt/btrfs/hadoop/
[root@f173 hadoop-0.20.2]# mount -t btrfs /dev/cciss/c0d4p1 /mnt/btrfs/hadoop/
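These mounts do not persist across reboots; if the cluster is more than a throwaway test, an /etc/fstab entry on each node keeps the data disk attached (use that node's device: c0d1p1 on f180, c0d4p1 on f172/f173):

```
/dev/cciss/c0d1p1  /mnt/btrfs/hadoop  btrfs  defaults  0 0
```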
Step 2. Format the HDFS filesystem
[root@f180 hadoop-0.20.2]# bin/hadoop namenode -format
10/05/14 17:17:43 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = f180.twaren.net/211.79.x.180
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/14 17:17:43 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
10/05/14 17:17:43 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/14 17:17:43 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/14 17:17:43 INFO common.Storage: Image file of size 94 saved in 0 seconds.
10/05/14 17:17:44 INFO common.Storage: Storage directory /mnt/btrfs/hadoop/hadoop-root/dfs/name has been successfully formatted.
10/05/14 17:17:44 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at f180.twaren.net/211.79.x.180
************************************************************/
Step 3. Start the DFS daemons
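start-dfs.sh (and start-mapred.sh in the next step) starts the remote daemons by logging in to every host in conf/slaves over ssh, so the master needs password-less login to each slave. A sketch, assuming root is used on all nodes as in the prompts here:

```shell
# Create a key pair on the master if one does not exist yet, then push the
# public key to each slave; ssh-copy-id asks for each slave's password once.
mkdir -p ~/.ssh
test -f ~/.ssh/id_rsa || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for host in f172 f173; do
  ssh-copy-id "root@$host" || echo "could not reach $host; copy ~/.ssh/id_rsa.pub over manually"
done
```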
[root@f180 hadoop-0.20.2]# bin/start-dfs.sh
starting namenode, logging to /usr/src/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-f180.twaren.net.out
f173.twaren.net: starting datanode, logging to /usr/src/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-f173.twaren.net.out
f172.twaren.net: starting datanode, logging to /usr/src/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-f172.twaren.net.out
f180.twaren.net: starting secondarynamenode, logging to /usr/src/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-f180.twaren.net.out
Step 4. Start the Map/Reduce daemons
[root@f180 hadoop-0.20.2]# bin/start-mapred.sh
starting jobtracker, logging to /usr/src/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-f180.twaren.net.out
f173.twaren.net: starting tasktracker, logging to /usr/src/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-f173.twaren.net.out
f172.twaren.net: starting tasktracker, logging to /usr/src/hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-f172.twaren.net.out
Step 5. Run a test computation!
The setup is essentially complete after Step 4; check the files under the logs directory to confirm no errors were produced.
Hadoop ships with a word count example program; some e-books from gutenberg.org make good input for it.
Six documents were downloaded:
[root@f180 hadoop-0.20.2]# ls -al /tmp/gutenberg/
total 6196
drwxr-xr-x 2 root root 4096 2010-05-14 17:29 .
drwxrwxrwt. 8 root root 4096 2010-05-14 17:20 ..
-rw-r--r-- 1 root root 343694 2007-12-03 23:28 132.txt
-rw-r--r-- 1 root root 1945731 2007-04-14 04:34 19699.txt
-rw-r--r-- 1 root root 674762 2007-01-22 18:56 20417.txt
-rw-r--r-- 1 root root 1573044 2008-08-01 20:31 4300.txt
-rw-r--r-- 1 root root 1391706 2009-08-14 07:19 7ldvc10.txt
-rw-r--r-- 1 root root 393995 2009-03-18 19:51 972.txt
Before computing, the test files must be loaded into HDFS
[root@f180 hadoop-0.20.2]# bin/hadoop dfs -copyFromLocal /tmp/gutenberg gutenberg
[root@f180 hadoop-0.20.2]# bin/hadoop dfs -ls
Found 1 items
drwxr-xr-x - root supergroup 0 2010-05-14 17:29 /user/root/gutenberg
[root@f180 hadoop-0.20.2]# bin/hadoop dfs -ls gutenberg
Found 6 items
-rw-r--r-- 2 root supergroup 343694 2010-05-14 17:29 /user/root/gutenberg/132.txt
-rw-r--r-- 2 root supergroup 1945731 2010-05-14 17:29 /user/root/gutenberg/19699.txt
-rw-r--r-- 2 root supergroup 674762 2010-05-14 17:29 /user/root/gutenberg/20417.txt
-rw-r--r-- 2 root supergroup 1573044 2010-05-14 17:29 /user/root/gutenberg/4300.txt
-rw-r--r-- 2 root supergroup 1391706 2010-05-14 17:29 /user/root/gutenberg/7ldvc10.txt
-rw-r--r-- 2 root supergroup 393995 2010-05-14 17:29 /user/root/gutenberg/972.txt
Run the Map/Reduce job!
[root@f180 hadoop-0.20.2]# bin/hadoop jar hadoop-0.20.2-examples.jar wordcount gutenberg gutenberg-output
10/05/14 17:33:51 INFO input.FileInputFormat: Total input paths to process : 6
10/05/14 17:33:52 INFO mapred.JobClient: Running job: job_201005141720_0001
10/05/14 17:33:53 INFO mapred.JobClient: map 0% reduce 0%
10/05/14 17:34:05 INFO mapred.JobClient: map 33% reduce 0%
10/05/14 17:34:08 INFO mapred.JobClient: map 66% reduce 0%
10/05/14 17:34:11 INFO mapred.JobClient: map 100% reduce 0%
10/05/14 17:34:14 INFO mapred.JobClient: map 100% reduce 33%
10/05/14 17:34:20 INFO mapred.JobClient: map 100% reduce 100%
10/05/14 17:34:22 INFO mapred.JobClient: Job complete: job_201005141720_0001
10/05/14 17:34:22 INFO mapred.JobClient: Counters: 17
10/05/14 17:34:22 INFO mapred.JobClient: Job Counters
10/05/14 17:34:22 INFO mapred.JobClient: Launched reduce tasks=1
10/05/14 17:34:22 INFO mapred.JobClient: Launched map tasks=6
10/05/14 17:34:22 INFO mapred.JobClient: Data-local map tasks=6
10/05/14 17:34:22 INFO mapred.JobClient: FileSystemCounters
10/05/14 17:34:22 INFO mapred.JobClient: FILE_BYTES_READ=4241310
10/05/14 17:34:22 INFO mapred.JobClient: HDFS_BYTES_READ=6322932
10/05/14 17:34:22 INFO mapred.JobClient: FILE_BYTES_WRITTEN=6936977
10/05/14 17:34:22 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1353587
10/05/14 17:34:22 INFO mapred.JobClient: Map-Reduce Framework
10/05/14 17:34:22 INFO mapred.JobClient: Reduce input groups=123471
10/05/14 17:34:22 INFO mapred.JobClient: Combine output records=185701
10/05/14 17:34:22 INFO mapred.JobClient: Map input records=124099
10/05/14 17:34:22 INFO mapred.JobClient: Reduce shuffle bytes=2695475
10/05/14 17:34:22 INFO mapred.JobClient: Reduce output records=123471
10/05/14 17:34:22 INFO mapred.JobClient: Spilled Records=477165
10/05/14 17:34:22 INFO mapred.JobClient: Map output bytes=10427755
10/05/14 17:34:22 INFO mapred.JobClient: Combine input records=1067656
10/05/14 17:34:22 INFO mapred.JobClient: Map output records=1067656
10/05/14 17:34:22 INFO mapred.JobClient: Reduce input records=185701
The results are written to gutenberg-output
[root@f180 hadoop-0.20.2]# bin/hadoop dfs -ls
Found 2 items
drwxr-xr-x - root supergroup 0 2010-05-14 17:29 /user/root/gutenberg
drwxr-xr-x - root supergroup 0 2010-05-14 17:34 /user/root/gutenberg-output
[root@f180 hadoop-0.20.2]# bin/hadoop dfs -ls gutenberg-output
Found 2 items
drwxr-xr-x - root supergroup 0 2010-05-14 17:33 /user/root/gutenberg-output/_logs
-rw-r--r-- 2 root supergroup 1353587 2010-05-14 17:34 /user/root/gutenberg-output/part-r-00000
Retrieve the results from HDFS
[root@f180 hadoop-0.20.2]# mkdir /tmp/gutenberg-output
[root@f180 hadoop-0.20.2]# bin/hadoop dfs -getmerge gutenberg-output /tmp/gutenberg-output
[root@f180 hadoop-0.20.2]# head /tmp/gutenberg-output/gutenberg-output
" 34
"'Course 1
"'Spells 1
"'Tis 1
"'Twas 1
"'Twere 1
"'army' 1
"(1) 1
"(Lo)cra" 1
"13 4
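head shows the output sorted by key (word), not by frequency. To rank words by how often they occur, sort the merged file numerically on its second column. The snippet below demonstrates the sort on three sample lines lifted from the listing above; on the cluster, feed sort the file /tmp/gutenberg-output/gutenberg-output instead:

```shell
# wordcount emits tab-separated "word<TAB>count" lines; sort on field 2,
# numeric (-n) and descending (-r), so the most frequent words come first.
# Sample lines taken from the head output above; the count-34 line sorts first.
printf '"13\t4\n"\t34\n"(1)\t1\n' \
  | sort -t "$(printf '\t')" -k2,2nr
```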
Finally, Hadoop provides several web UIs worth bookmarking:
http://f180:50030/ - web UI for MapReduce job tracker(s)
http://f180:50060/ - web UI for task tracker(s)
http://f180:50070/ - web UI for HDFS name node(s)