Introduction and Features

Pacemaker: works at the resource-allocation layer and provides the resource manager functionality.
Corosync: provides the cluster messaging layer, carrying heartbeat and cluster transaction information.
Pacemaker + Corosync together implement a high-availability cluster architecture.

Cluster Setup

Run the following on all three nodes:
# yum install pcs -y
# systemctl start pcsd ; systemctl enable pcsd
# echo 'hacluster' | passwd --stdin hacluster
# yum install haproxy rsyslog -y
# echo 'net.ipv4.ip_nonlocal_bind = 1' >> /etc/sysctl.conf   # allow services to bind to the VIP even when it is absent locally
# echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf         # enable kernel IP forwarding
# sysctl -p
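The two kernel settings above can equivalently be shipped as a sysctl drop-in file, which keeps local tuning separate from the distribution's /etc/sysctl.conf (the file name below is illustrative):

```ini
# /etc/sysctl.d/99-haproxy-vip.conf  (illustrative name)
# let haproxy bind to the VIP even while another node currently holds it
net.ipv4.ip_nonlocal_bind = 1
# enable kernel IP forwarding
net.ipv4.ip_forward = 1
```

Files under /etc/sysctl.d/ are applied at boot; `sysctl --system` reloads them all.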
On any one node, create the user that HAProxy will use to monitor MariaDB (the user needs no password and no privileges; the mysql-check probe only completes the login handshake):

MariaDB [(none)]> CREATE USER 'haproxy'@'%' ;

Configure HAProxy as the load balancer:
[root@controller1 ~]# egrep -v "^#|^$" /etc/haproxy/haproxy.cfg
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 4000
listen galera_cluster
    mode tcp
    bind 192.168.0.10:3306
    balance source
    option mysql-check user haproxy
    server controller1 192.168.0.11:3306 check inter 2000 rise 3 fall 3 backup
    server controller2 192.168.0.12:3306 check inter 2000 rise 3 fall 3
    server controller3 192.168.0.13:3306 check inter 2000 rise 3 fall 3 backup

listen memcache_cluster
    mode tcp
    bind 192.168.0.10:11211
    balance source
    option tcplog
    server controller1 192.168.0.11:11211 check inter 2000 rise 3 fall 3
    server controller2 192.168.0.12:11211 check inter 2000 rise 3 fall 3
    server controller3 192.168.0.13:11211 check inter 2000 rise 3 fall 3
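With the `check inter 2000 rise 3 fall 3` parameters used above, HAProxy probes each backend every 2000 ms and only marks it down after 3 consecutive failed checks (and back up after 3 consecutive successes). A rough sketch of the resulting worst-case detection latency, ignoring the per-check timeout:

```shell
# inter = probe interval in ms, fall/rise = consecutive checks before a state change
inter_ms=2000
fall=3
rise=3
down_ms=$(( inter_ms * fall ))   # time until a dead backend is declared down
up_ms=$(( inter_ms * rise ))     # time until a recovered backend rejoins rotation
echo "down after ~${down_ms} ms, back up after ~${up_ms} ms"
```

So with these settings, a failed Galera or memcached node drops out of the pool in roughly 6 seconds.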
Notes:
(1) Make sure the HAProxy configuration is correct; it is best to adjust the IPs and ports first and do a test start.
(2) MariaDB-Galera and RabbitMQ listen on 0.0.0.0 by default; change them to listen on the node's local address 192.168.0.x instead.
(3) Copy the verified HAProxy configuration to the other nodes; there is no need to start the haproxy service manually.

Configure logging for HAProxy (run on all controller nodes):
# vim /etc/rsyslog.conf
…
$ModLoad imudp
$UDPServerRun 514
…
local2.*    /var/log/haproxy/haproxy.log
…

# mkdir -pv /var/log/haproxy/
mkdir: created directory ‘/var/log/haproxy/’

# systemctl restart rsyslog
Start HAProxy to verify the configuration:

# systemctl start haproxy
[root@controller1 ~]# netstat -ntplu | grep ha
tcp    0    0 192.168.0.10:3306     0.0.0.0:*    LISTEN    15467/haproxy
tcp    0    0 192.168.0.10:11211    0.0.0.0:*    LISTEN    15467/haproxy
udp    0    0 0.0.0.0:43268         0.0.0.0:*              15466/haproxy

Verification succeeded; stop HAProxy again:
# systemctl stop haproxy
Run on the controller1 node:

[root@controller1 ~]# pcs cluster auth controller1 controller2 controller3 -u hacluster -p hacluster --force
controller3: Authorized
controller2: Authorized
controller1: Authorized
Create the cluster:

[root@controller1 ~]# pcs cluster setup --name openstack-cluster controller1 controller2 controller3 --force
Destroying cluster on nodes: controller1, controller2, controller3...
controller3: Stopping Cluster (pacemaker)...
controller2: Stopping Cluster (pacemaker)...
controller1: Stopping Cluster (pacemaker)...
controller3: Successfully destroyed cluster
controller1: Successfully destroyed cluster
controller2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'controller1', 'controller2', 'controller3'
controller3: successful distribution of the file 'pacemaker_remote authkey'
controller1: successful distribution of the file 'pacemaker_remote authkey'
controller2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
controller1: Succeeded
controller2: Succeeded
controller3: Succeeded

Synchronizing pcsd certificates on nodes controller1, controller2, controller3...
controller3: Success
controller2: Success
controller1: Success
Restarting pcsd on the nodes in order to reload the certificates...
controller3: Success
controller2: Success
controller1: Success
Start the cluster on all nodes:

[root@controller1 ~]# pcs cluster start --all
controller2: Starting Cluster...
controller1: Starting Cluster...
controller3: Starting Cluster...
[root@controller1 ~]# pcs cluster enable --all
controller1: Cluster Enabled
controller2: Cluster Enabled
controller3: Cluster Enabled
Check the cluster status:

[root@controller1 ~]# pcs status
Cluster name: openstack-cluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: controller3 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Thu Nov 30 19:30:43 2017
Last change: Thu Nov 30 19:30:17 2017 by hacluster via crmd on controller3

3 nodes configured
0 resources configured

Online: [ controller1 controller2 controller3 ]

No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@controller1 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: controller3 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
 Last updated: Thu Nov 30 19:30:52 2017
 Last change: Thu Nov 30 19:30:17 2017 by hacluster via crmd on controller3
 3 nodes configured
 0 resources configured

PCSD Status:
  controller2: Online
  controller3: Online
  controller1: Online

All three nodes are online.
8 M# b7 k9 g5 y: e$ s默认的表决规则建议集群中的节点个数为奇数且不低于3。当集群只有2个节点,其中1个节点崩坏,由于不符合默认的表决规则, 集群资源不发生转移,集群整体仍不可用。no-quorum-policy="ignore"可以解决此双节点的问题,但不要用于生产环境。换句话说,生 产环境还是至少要3节点。* T; U* V8 Q5 q
pe-warn-series-max、pe-input-series-max、pe-error-series-max代表日志深度。4 D6 i) s- {$ Y& r5 S
cluster-recheck-interval是节点重新检查的频率。
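The arithmetic behind the odd-node recommendation is simple: a partition has quorum only when it holds a strict majority of votes, floor(n/2) + 1, so a 3-node cluster tolerates one failure while a 2-node cluster tolerates none. A quick sketch:

```shell
# quorum = strict majority of n votes (integer division floors)
for n in 2 3 4 5; do
  need=$(( n / 2 + 1 ))
  tolerate=$(( n - need ))
  echo "n=${n}: quorum needs ${need} votes, tolerates ${tolerate} failure(s)"
done
```

Note that 4 nodes tolerate no more failures than 3, which is why odd node counts are preferred.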
[root@controller1 ~]# pcs property set pe-warn-series-max=1000 pe-input-series-max=1000 pe-error-series-max=1000 cluster-recheck-interval=5min
Disable STONITH:
STONITH requires a physical fencing device that can power a node off on command. This environment has no such device, and if the option stays enabled, every pcs command prints an error about it.
[root@controller1 ~]# pcs property set stonith-enabled=false
With only two nodes, ignore the quorum requirement:
[root@controller1 ~]# pcs property set no-quorum-policy=ignore
Validate the cluster configuration:
[root@controller1 ~]# crm_verify -L -V
Configure the virtual IP for the cluster:
[root@controller1 ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
    ip="192.168.0.10" cidr_netmask=32 nic=eno16777736 op monitor interval=30s
At this point Pacemaker + Corosync exists to serve HAProxy, so add the haproxy resource to the Pacemaker cluster:
[root@controller1 ~]# pcs resource create lb-haproxy systemd:haproxy --clone
Note: this creates a clone resource; clone resources start on every node, so haproxy starts automatically on all three nodes.
Check the Pacemaker resources:
[root@controller1 ~]# pcs resource
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started controller1    # the VIP resource
 Clone Set: lb-haproxy-clone [lb-haproxy]                              # the haproxy clone resource
     Started: [ controller1 controller2 controller3 ]
Note: the resources must be colocated here; otherwise every node keeps haproxy running, which makes access chaotic.
Bind the two resources onto the same node:

[root@controller1 ~]# pcs constraint colocation add lb-haproxy-clone ClusterIP INFINITY

Binding succeeded:
[root@controller1 ~]# pcs resource
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started controller3
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ controller1 ]
     Stopped: [ controller2 controller3 ]
Configure the resource start order so the VIP starts first and haproxy after it, since haproxy binds to the VIP:
[root@controller1 ~]# pcs constraint order ClusterIP then lb-haproxy-clone
Manually pin the resources to a preferred node. Because the two resources are bound together, moving one automatically moves the other.

[root@controller1 ~]# pcs constraint location ClusterIP prefers controller1
[root@controller1 ~]# pcs resource
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started controller1
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ controller1 ]
     Stopped: [ controller2 controller3 ]
[root@controller1 ~]# pcs resource defaults resource-stickiness=100   # set resource stickiness, so automatic switch-back does not destabilize the cluster

The VIP is now bound to the controller1 node:
[root@controller1 ~]# ip a | grep global
    inet 192.168.0.11/24 brd 192.168.0.255 scope global eno16777736
    inet 192.168.0.10/32 brd 192.168.0.255 scope global eno16777736
    inet 192.168.118.11/24 brd 192.168.118.255 scope global eno33554992
Try connecting to the database through the VIP.
Controller1:

[root@controller1 haproxy]# mysql -ugalera -pgalera -h 192.168.0.10

Controller2:

The high-availability setup works.
Test that failover actually works.
Power off the controller1 node directly:
[root@controller1 ~]# poweroff -f
The VIP moves to the controller2 node almost immediately.

Try accessing the database again:

No problems at all; the test passed.
Check the cluster status:

[root@controller2 ~]# pcs status
Cluster name: openstack-cluster
Stack: corosync
Current DC: controller3 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Thu Nov 30 23:57:28 2017
Last change: Thu Nov 30 23:54:11 2017 by root via crm_attribute on controller1

3 nodes configured
4 resources configured

Online: [ controller2 controller3 ]
OFFLINE: [ controller1 ]      # controller1 is down

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started controller2
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ controller2 ]
     Stopped: [ controller1 controller3 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled