Tuesday, June 2, 2009

GFS install on Gentoo Linux

GFS is a shared disk file system that started out under the GPL, went commercial, and then became GPL again after Red Hat acquired it.

I used to think GFS itself cost money, but after being corrected by a certain senior (z1x), I learned that the paid product is actually RHCS (Red Hat Cluster Suite), which ships all the related binaries pre-built; you just click through a GUI to create the cluster environment you want (it seems that senior needs this too). The interface itself is built with Conga.

I originally expected GFS to work like the Gluster, PVFS, and Lustre setups covered in earlier posts, but it turned out to be quite different, and I spent a considerable amount of time here.

Test environment:
kernel: 2.6.29-gentoo-r5
gfs: 2.03.09
openais: 0.80.3
Both cluster-2 and cluster-3 require a certain minimum Linux kernel version; see the official GFS website.
Otherwise you may get an error message like the following:
cluster-3.0.0.rc2 # ./configure
Configuring Makefiles for your system...
Checking tree: nothing to do
Checking kernel:
 Current kernel version: 2.6.28
 Minimum kernel version: 2.6.29
 FAILED!


Step1.
Because GFS sits on top of the OpenAIS framework, it is, well, mandatory.
You can build it from source yourself, but it depends on Corosync and even the nss (Network Security Services) library, plus header files for ldap, slang, and so on. I was lazy, so I installed it straight from Gentoo portage :D
$ emerge -uD sys-cluster/openais


Step2.
Build the kernel; remember to build multicast, nbd (Network block device support), gfs2, lock_dlm, and dlm as modules.

After building, reboot into the new kernel.
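For reference, the kernel options involved look roughly like this in .config (a sketch; these are the usual symbol names around 2.6.29, so double-check them in menuconfig for your tree):

```
# Networking: multicast support for the cluster heartbeat
CONFIG_IP_MULTICAST=y
# configfs is needed by dlm (loaded explicitly in Step4 below)
CONFIG_CONFIGFS_FS=m
# Distributed lock manager and GFS2 with DLM locking
CONFIG_DLM=m
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=m
# Network block device (used for the disk sharing in Step9)
CONFIG_BLK_DEV_NBD=m
```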


Step3.
Install the related userspace packages.
$ emerge -uD sys-libs/slang (needed by rgmanager)
$ emerge -uD sys-cluster/rgmanager
$ emerge -uD sys-fs/gfs2

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild  N    ] sys-cluster/cman-lib-2.03.09  1,743 kB
[ebuild  N    ] dev-python/pexpect-2.3  USE="-doc -examples" 148 kB
[ebuild  N    ] sys-cluster/ccs-2.03.09  0 kB
[ebuild  N    ] sys-cluster/openais-0.80.3-r1  USE="-debug" 468 kB
[ebuild  N    ] sys-cluster/dlm-lib-2.03.09  0 kB
[ebuild  N    ] sys-cluster/dlm-2.03.09  0 kB
[ebuild  N    ] sys-cluster/cman-2.03.09-r1  0 kB
[ebuild  N    ] perl-core/libnet-1.22  USE="-sasl" 67 kB
[ebuild  N    ] dev-perl/Net-SSLeay-1.35  130 kB
[ebuild  N    ] virtual/perl-libnet-1.22  0 kB
[ebuild  N    ] dev-perl/Net-Telnet-3.03-r1  35 kB
[ebuild  N    ] sys-cluster/fence-2.03.09-r1  0 kB
[ebuild  N    ] sys-fs/gfs2-2.03.09  USE="-doc" 0 kB

Total: 13 packages (13 new), Size of downloads: 2,588 kB

$ emerge -uD gnbd-kernel
$ emerge -uD sys-cluster/gnbd (if gnbd-kernel builds for you, prefer gnbd Orz...)
or
$ emerge -uD sys-block/nbd (if gnbd-kernel fails to build, use nbd = =")


Step4.
Load the kernel modules.
$ depmod -a
$ modprobe gfs2
$ modprobe configfs
$ modprobe dlm
$ modprobe lock_dlm
$ modprobe nbd
$ lsmod
Module                  Size  Used by
nbd                    10084  0
lock_dlm               14116  0
dlm                   112656  10 lock_dlm
configfs               22668  2 dlm
gfs2                  332196  1 lock_dlm
$ dmesg
GFS2 (built Jun  1 2009 17:46:14) installed
DLM (built Jun  1 2009 17:45:59) installed
Lock_DLM (built Jun  1 2009 17:46:25) installed
nbd: registered device at major 43


Step5.
Configuration.
$ cat /etc/cluster/cluster.conf (only needs to exist on one of the nodes; it is copied to the other nodes automatically when the cluster starts)
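The contents of cluster.conf were omitted here. A minimal sketch for this setup, reconstructed from the cluster name, node names, and node IDs that appear in the cman_tool output in Step7 (fencing is left empty for brevity; a production cluster needs real fence devices):

```xml
<?xml version="1.0"?>
<cluster name="mycluster" config_version="1">
  <clusternodes>
    <clusternode name="node26" nodeid="2">
      <fence/>
    </clusternode>
    <clusternode name="node27" nodeid="3">
      <fence/>
    </clusternode>
    <clusternode name="node28" nodeid="4">
      <fence/>
    </clusternode>
  </clusternodes>
  <fencedevices/>
</cluster>
```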

$ cat /etc/ais/openais.conf
totem {
        version: 2
        secauth: off
        threads: 0
        nodeid: 2
        interface {
                ringnumber: 0
                bindnetaddr: 140.110.x.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}


Step6.
Start the services.
You can simply use /etc/init.d/gfs2:
$ /etc/init.d/gfs2 start
 * Loading dlm kernel module ... [ ok ]
 * Loading lock_dlm kernel module ... [ ok ]
 * Mounting ConfigFS ... [ ok ]
 * Starting ccsd ... [ ok ]
 * Starting cman ... [ ok ]
 * Waiting for quorum (300 secs) ... [ ok ]
 * Starting groupd ... [ ok ]
 * Starting fenced ... [ ok ]
 * Joining fence domain ... [ ok ]
 * Starting dlm_controld ... [ ok ]
 * Starting gfs_controld ... [ ok ]
 * Starting gfs2 cluster:
 * Loading gfs2 kernel module ... [ ok ]
Or start everything in debug mode as follows:
$ mount -t configfs none /sys/kernel/config
$ ccsd -n
$ cman_tool join -d
$ groupd -D
$ fenced -D
$ dlm_controld -D
$ gfs_controld -D
$ fence_tool join


Step7.
Test!
$ ccs_test connect
Connect successful.
 Connection descriptor = 1950
$ cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: mycluster
Cluster Id: 56756
Cluster Member: Yes
Cluster Generation: 216
Membership state: Cluster-Member
Nodes: 3
Expected votes: 1
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 7
Flags: Dirty
Ports Bound: 0
Node name: node26
Node ID: 2
Multicast addresses: 226.94.1.1
Node addresses: 140.110.x.26
$ cman_tool services
type             level name     id       state
fence            0     default  00010002 none
[2 3 4]
$ cman_tool nodes
Node  Sts   Inc   Joined               Name
   2   M    208   2009-06-02 19:00:44  node26
   3   M    212   2009-06-02 19:03:57  node27
   4   M    216   2009-06-02 19:06:37  node28


Step8.
Format the partition and mount it.
$ mkfs -t gfs2 -p lock_dlm -t mycluster:testgfs2 -j 4 /dev/cciss/c0d1p1
This will destroy any data on /dev/cciss/c0d1p1.
  It appears to contain a LVM2_member raid.

Are you sure you want to proceed? [y/n] y

Device:                    /dev/cciss/c0d1p1
Blocksize:                 4096
Device Size                33.91 GB (8890316 blocks)
Filesystem Size:           33.91 GB (8890316 blocks)
Journals:                  4
Resource Groups:           136
Locking Protocol:          "lock_dlm"
Lock Table:                "mycluster:testgfs2"
$ mount -t gfs2 -v /dev/cciss/c0d1p1 /mnt/gfs
/sbin/mount.gfs2: mount /dev/cciss/c0d1p1 /mnt/gfs
/sbin/mount.gfs2: parse_opts: opts = "rw"
/sbin/mount.gfs2:   clear flag 1 for "rw", flags = 0
/sbin/mount.gfs2: parse_opts: flags = 0
/sbin/mount.gfs2: parse_opts: extra = ""
/sbin/mount.gfs2: parse_opts: hostdata = ""
/sbin/mount.gfs2: parse_opts: lockproto = ""
/sbin/mount.gfs2: parse_opts: locktable = ""
/sbin/mount.gfs2: message to gfs_controld: asking to join mountgroup:
/sbin/mount.gfs2: write "join /mnt/gfs gfs2 lock_dlm mycluster:testgfs2 rw /dev/cciss/c0d1p1"
/sbin/mount.gfs2: message from gfs_controld: response to join request:
/sbin/mount.gfs2: lock_dlm_join: read "0"
/sbin/mount.gfs2: message from gfs_controld: mount options:
/sbin/mount.gfs2: lock_dlm_join: read "hostdata=jid=0:id=262146:first=1"
/sbin/mount.gfs2: lock_dlm_join: hostdata: "hostdata=jid=0:id=262146:first=1"
/sbin/mount.gfs2: lock_dlm_join: extra_plus: "hostdata=jid=0:id=262146:first=1"
/sbin/mount.gfs2: mount(2) ok
/sbin/mount.gfs2: lock_dlm_mount_result: write "mount_result /mnt/gfs gfs2 0"
/sbin/mount.gfs2: read_proc_mounts: device = "/dev/cciss/c0d1p1"
/sbin/mount.gfs2: read_proc_mounts: opts = "rw,hostdata=jid=0:id=262146:first=1"
$ df -h
/dev/cciss/c0d1p1      34G  518M   34G   2% /mnt/gfs
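To remount after a reboot, a hypothetical /etc/fstab entry could look like the following (device and mount point match the mkfs/mount commands above; whether the init scripts mount gfs2 fstab entries automatically depends on your setup, so verify before relying on it):

```
/dev/cciss/c0d1p1  /mnt/gfs  gfs2  defaults,noatime  0 0
```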


Step9.
Disk sharing.
This uses the kernel's native nbd module; gnbd is the recommended choice.
9.1. nbd server configuration.
$ cat /etc/nbd-server/config (on node26)
[generic]
[export]
    exportname = /dev/cciss/c0d1p1
    port = 2000
    authfile = /etc/nbd-server/allow
$ cat /etc/nbd-server/allow (on node26)
140.110.x.26
140.110.x.27
140.110.x.28
140.110.x.0/24
9.2. nbd server export.
$ nbd-server (on node26)
9.3. nbd client import.
$ nbd-client node26 2000 /dev/nbd0 (on node27)
Negotiation: ..size = 35561264KB
bs=1024, sz=35561264
9.4. mount!
$ mount -t gfs2 /dev/nbd0 /mnt/gfs (on node27)
$ df -h (on node27)
/dev/nbd0              34G  518M   34G   2% /mnt/gfs
$ cman_tool services (on node27)
type             level name      id       state
fence            0     default   00010002 none
[2 3 4]
dlm              1     testgfs2  00020003 none
[3]
gfs              2     testgfs2  00010003 none
[3]


Step10.
Mount from another client.
$ nbd-client node26 2000 /dev/nbd0 (on node28)
Negotiation: ..size = 35561264KB
bs=1024, sz=35561264
$ mount -t gfs2 /dev/nbd0 /mnt/gfs/ (on node28)
$ df -h (on node28)
Filesystem            Size  Used Avail Use% Mounted on
/dev/nbd0              34G  518M   34G   2% /mnt/gfs
$ cman_tool services (on node28)
type             level name      id       state
fence            0     default   00010002 none
[2 3 4]
dlm              1     testgfs2  00020003 none
[3 4]
gfs              2     testgfs2  00010003 none
[3 4]
$ cman_tool services (on node27)
type             level name      id       state
fence            0     default   00010002 none
[2 3 4]
dlm              1     testgfs2  00020003 none
[3 4]
gfs              2     testgfs2  00010003 none
[3 4]


Step11.
Concurrent write test.
$ vim /mnt/gfs/concurrent_test.txt (on node28)
$ vim /mnt/gfs/concurrent_test.txt (on node27)
E325: ATTENTION
Found a swap file by the name ".concurrent_test.txt.swp"
          owned by: root   dated: Wed Jun  3 07:31:38 2009
         file name: /mnt/gfs/concurrent_test.txt
          modified: YES
         user name: root   host name: node28
        process ID: 4454
While opening file "concurrent_test.txt"

(1) Another program may be editing the same file.
    If this is the case, be careful not to end up with two
    different instances of the same file when making changes.
    Quit, or continue with caution.

(2) An edit session for this file crashed.
    If this is the case, use ":recover" or "vim -r concurrent_test.txt"
    to recover the changes (see ":help recovery").
    If you did this already, delete the swap file ".concurrent_test.txt.swp"
    to avoid this message.

Swap file ".concurrent_test.txt.swp" already exists!
"concurrent_test.txt" [New File]


Notes:
* Stopping the services:
umount [-v] "mountpoint"
nbd-client -d /dev/nbd0
fence_tool leave
cman_tool leave
* Updating cluster.conf:
ccs_tool update foo.conf (remember to bump config_version)

The revolution is not yet complete; the full architecture would be gfs2 + gnbd + clvm...

Update:
gnbd is no longer supported... see here and there.
