Introduction and features

Pacemaker works at the resource-allocation layer and provides the resource-manager functionality.
Corosync provides the cluster messaging layer, carrying heartbeat and cluster-membership information.
Together, Pacemaker + Corosync form a complete high-availability cluster stack.
Cluster setup

Run the following on all three nodes:

# yum install pcs -y
# systemctl start pcsd ; systemctl enable pcsd
# echo 'hacluster' | passwd --stdin hacluster
# yum install haproxy rsyslog -y
# echo 'net.ipv4.ip_nonlocal_bind = 1' >> /etc/sysctl.conf   # allow services to bind to the VIP even when it is not local
# echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf         # enable kernel IP forwarding
# sysctl -p
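Before relying on the two kernel settings above, it can help to confirm that both keys really made it into the file. A minimal sketch — it uses a temporary file so it runs anywhere; on a real node you would point `conf` at /etc/sysctl.conf instead:

```shell
# Sketch: verify that the two required keys are set to 1 in a sysctl-style file.
# Uses a temporary copy so the check is self-contained; on a real node,
# set conf=/etc/sysctl.conf instead.
conf=$(mktemp)
cat > "$conf" <<'EOF'
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1
EOF

missing=0
for key in net.ipv4.ip_nonlocal_bind net.ipv4.ip_forward; do
    if grep -q "^${key}[[:space:]]*=[[:space:]]*1" "$conf"; then
        echo "$key: ok"
    else
        echo "$key: MISSING"
        missing=1
    fi
done
rm -f "$conf"
```

On a live system, `sysctl -n net.ipv4.ip_nonlocal_bind` confirms the value the kernel actually applied.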
On any one node, create the user that HAProxy will use to health-check MariaDB:

MariaDB [(none)]> CREATE USER 'haproxy'@'%';

Configure HAProxy as the load balancer:

[root@controller1 ~]# egrep -v "^#|^$" /etc/haproxy/haproxy.cfg
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  http-server-close
    option                  forwardfor except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 4000

listen galera_cluster
    mode tcp
    bind 192.168.0.10:3306
    balance source
    option mysql-check user haproxy
    server controller1 192.168.0.11:3306 check inter 2000 rise 3 fall 3 backup
    server controller2 192.168.0.12:3306 check inter 2000 rise 3 fall 3
    server controller3 192.168.0.13:3306 check inter 2000 rise 3 fall 3 backup

listen memcache_cluster
    mode tcp
    bind 192.168.0.10:11211
    balance source
    option tcplog
    server controller1 192.168.0.11:11211 check inter 2000 rise 3 fall 3
    server controller2 192.168.0.12:11211 check inter 2000 rise 3 fall 3
    server controller3 192.168.0.13:11211 check inter 2000 rise 3 fall 3
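With `check inter 2000 rise 3 fall 3`, HAProxy probes each backend every 2000 ms and needs 3 consecutive failed checks to mark it DOWN (and 3 consecutive successes to bring it back UP). The resulting detection times are simple arithmetic — a quick sketch:

```shell
# Health-check timing for: check inter 2000 rise 3 fall 3
inter_ms=2000   # probe interval
fall=3          # consecutive failures before a server is marked DOWN
rise=3          # consecutive successes before it is marked UP again

down_detect_ms=$((inter_ms * fall))   # ~time to declare a dead backend DOWN
up_detect_ms=$((inter_ms * rise))     # ~time to bring a recovered backend back

echo "marked DOWN after ~${down_detect_ms} ms"
echo "marked UP   after ~${up_detect_ms} ms"
```

So with these values a dead Galera or memcached node keeps receiving traffic for up to roughly 6 seconds before HAProxy stops routing to it.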
Notes:
(1) Make sure the HAProxy configuration is correct; it helps to adjust the IPs and ports first and then test that HAProxy starts cleanly.
(2) MariaDB-Galera and RabbitMQ listen on 0.0.0.0 by default; change them to listen on each node's local address 192.168.0.x so they do not conflict with HAProxy binding the VIP.
(3) Copy the verified HAProxy configuration to the other nodes; there is no need to start the haproxy service by hand.
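Point (1) can be scripted: `haproxy -c -f <file>` parses the configuration and reports errors without starting the daemon. A hedged sketch, guarded so it degrades gracefully on machines where haproxy or the file is not present:

```shell
# Validate the HAProxy configuration without starting the service.
cfg=/etc/haproxy/haproxy.cfg
if command -v haproxy >/dev/null 2>&1 && [ -f "$cfg" ]; then
    if haproxy -c -f "$cfg"; then
        result="valid"
    else
        result="invalid"
    fi
else
    result="skipped (haproxy or $cfg not present)"
fi
echo "config check: $result"
```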
Configure logging for HAProxy (run on all controller nodes):

# vim /etc/rsyslog.conf
…
$ModLoad imudp
$UDPServerRun 514
…
local2.*    /var/log/haproxy/haproxy.log
…

# mkdir -pv /var/log/haproxy/
mkdir: created directory ‘/var/log/haproxy/’

# systemctl restart rsyslog
Start HAProxy to verify the configuration:

# systemctl start haproxy
[root@controller1 ~]# netstat -ntplu | grep ha
tcp   0   0 192.168.0.10:3306    0.0.0.0:*   LISTEN   15467/haproxy
tcp   0   0 192.168.0.10:11211   0.0.0.0:*   LISTEN   15467/haproxy
udp   0   0 0.0.0.0:43268        0.0.0.0:*            15466/haproxy

Verification succeeded; stop HAProxy again (Pacemaker will manage it from here):
# systemctl stop haproxy
On the controller1 node, authenticate the cluster nodes:

[root@controller1 ~]# pcs cluster auth controller1 controller2 controller3 -u hacluster -p hacluster --force
controller3: Authorized
controller2: Authorized
controller1: Authorized
Create the cluster:

[root@controller1 ~]# pcs cluster setup --name openstack-cluster controller1 controller2 controller3 --force
Destroying cluster on nodes: controller1, controller2, controller3...
controller3: Stopping Cluster (pacemaker)...
controller2: Stopping Cluster (pacemaker)...
controller1: Stopping Cluster (pacemaker)...
controller3: Successfully destroyed cluster
controller1: Successfully destroyed cluster
controller2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'controller1', 'controller2', 'controller3'
controller3: successful distribution of the file 'pacemaker_remote authkey'
controller1: successful distribution of the file 'pacemaker_remote authkey'
controller2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
controller1: Succeeded
controller2: Succeeded
controller3: Succeeded

Synchronizing pcsd certificates on nodes controller1, controller2, controller3...
controller3: Success
controller2: Success
controller1: Success
Restarting pcsd on the nodes in order to reload the certificates...
controller3: Success
controller2: Success
controller1: Success
' ~) E3 P" \/ }) U, B* E启动集群的所有节点:
) P2 ]( H9 M! L4 F( X
$ \7 Y$ E3 N6 g6 V, @' \- v[root@controller1 ~]# pcs cluster start --all1 U2 R% u: d. {3 x9 o# _, c2 u
controller2: Starting Cluster... u) o' v0 m( B
controller1: Starting Cluster...
/ g# s+ E. o* |5 W5 Fcontroller3: Starting Cluster...& X& j5 s, \2 O. ^! v# z
[root@controller1 ~]# pcs cluster enable --all
$ }" n/ v9 f* m% a: T6 T: Ncontroller1: Cluster Enabled, }5 S, c$ |# n z4 n9 I
controller2: Cluster Enabled
/ b: S# E0 d' X. |) A! vcontroller3: Cluster Enabled" i! H7 W! q3 W- G8 m# h
. T- e. _/ Y& \5 t. S; c( Q
Check the cluster status:

[root@controller1 ~]# pcs status
Cluster name: openstack-cluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: controller3 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Thu Nov 30 19:30:43 2017
Last change: Thu Nov 30 19:30:17 2017 by hacluster via crmd on controller3

3 nodes configured
0 resources configured

Online: [ controller1 controller2 controller3 ]

No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@controller1 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: controller3 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
 Last updated: Thu Nov 30 19:30:52 2017
 Last change: Thu Nov 30 19:30:17 2017 by hacluster via crmd on controller3
 3 nodes configured
 0 resources configured

PCSD Status:
  controller2: Online
  controller3: Online
  controller1: Online

All three nodes are online.
By default, quorum requires an odd number of nodes, and at least 3. With only 2 nodes, if one fails the survivor cannot reach quorum, so resources do not fail over and the cluster as a whole stays unusable. Setting no-quorum-policy="ignore" works around this two-node case, but it should not be used in production; in other words, production clusters need at least 3 nodes.
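The voting rule is just a majority test: a partition has quorum only if it holds strictly more than half of the configured votes. A small sketch of why 3 nodes survive one failure but 2 nodes cannot:

```shell
# Quorum = strictly more than half of the total votes.
has_quorum() {
    total=$1; alive=$2
    # alive > total/2, written with integer-safe arithmetic
    [ $((alive * 2)) -gt "$total" ] && echo yes || echo no
}

q3=$(has_quorum 3 2)   # 3-node cluster, 1 node lost
q2=$(has_quorum 2 1)   # 2-node cluster, 1 node lost

echo "3 nodes, 1 failed: quorum=$q3"
echo "2 nodes, 1 failed: quorum=$q2"
```

With 3 nodes, 2 survivors hold 2 of 3 votes and keep quorum; with 2 nodes, the lone survivor holds exactly half and loses it — hence the "ignore" workaround for two-node clusters.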
pe-warn-series-max, pe-input-series-max and pe-error-series-max control how much Policy Engine history (log depth) is kept.
cluster-recheck-interval is how often the cluster re-evaluates its state.
[root@controller1 ~]# pcs property set pe-warn-series-max=1000 pe-input-series-max=1000 pe-error-series-max=1000 cluster-recheck-interval=5min
Disable STONITH:
STONITH relies on fencing hardware that can power-cycle a failed node. This environment has no such device, and with the option left enabled every pcs command prints a warning about it.
[root@controller1 ~]# pcs property set stonith-enabled=false
For a two-node cluster, ignore the quorum requirement (do not use this in production):
[root@controller1 ~]# pcs property set no-quorum-policy=ignore
Validate the cluster configuration:
[root@controller1 ~]# crm_verify -L -V
Configure the virtual IP for the cluster:
[root@controller1 ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
    ip="192.168.0.10" cidr_netmask=32 nic=eno16777736 op monitor interval=30s
Up to this point, Pacemaker + Corosync exist to serve HAProxy; add the haproxy resource to the Pacemaker cluster:
[root@controller1 ~]# pcs resource create lb-haproxy systemd:haproxy --clone
Note: this creates a clone resource; clones are started on every node, so here haproxy comes up automatically on all three nodes.
Check the Pacemaker resources:
[root@controller1 ~]# pcs resource
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started controller1    # the VIP resource
 Clone Set: lb-haproxy-clone [lb-haproxy]                              # the haproxy clone resource
     Started: [ controller1 controller2 controller3 ]
Note: the two resources must be bound together; otherwise haproxy runs on every node while the VIP sits on only one, which makes access inconsistent.
Bind the two resources to the same node:
[root@controller1 ~]# pcs constraint colocation add lb-haproxy-clone ClusterIP INFINITY
The binding succeeded:
[root@controller1 ~]# pcs resource
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started controller3
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ controller1 ]
     Stopped: [ controller2 controller3 ]
Configure the start order so the VIP starts first and haproxy starts after it, since haproxy binds to the VIP:
[root@controller1 ~]# pcs constraint order ClusterIP then lb-haproxy-clone
Pin the resources to a preferred node; because the two resources are colocated, moving one moves the other automatically.

[root@controller1 ~]# pcs constraint location ClusterIP prefers controller1
[root@controller1 ~]# pcs resource
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started controller1
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ controller1 ]
     Stopped: [ controller2 controller3 ]
[root@controller1 ~]# pcs resource defaults resource-stickiness=100   # set resource stickiness to keep automatic fail-back from destabilizing the cluster
The VIP is now bound on controller1:
[root@controller1 ~]# ip a | grep global
    inet 192.168.0.11/24 brd 192.168.0.255 scope global eno16777736
    inet 192.168.0.10/32 brd 192.168.0.255 scope global eno16777736
    inet 192.168.118.11/24 brd 192.168.118.255 scope global eno33554992
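The manual "is the VIP here?" check above is easy to script by grepping the address list. A self-contained sketch that reuses the sample output; on a real node you would replace the hard-coded text with the output of `ip -o addr show`:

```shell
# Sketch: detect whether the VIP is bound on this node.
# addrs mimics `ip a | grep global` from the node above;
# on a real node, use: addrs=$(ip -o addr show)
vip=192.168.0.10
addrs='inet 192.168.0.11/24 brd 192.168.0.255 scope global eno16777736
inet 192.168.0.10/32 brd 192.168.0.255 scope global eno16777736
inet 192.168.118.11/24 brd 192.168.118.255 scope global eno33554992'

if printf '%s\n' "$addrs" | grep -q "inet ${vip}/"; then
    vip_here=yes
else
    vip_here=no
fi
echo "VIP $vip on this node: $vip_here"
```

Matching "inet ${vip}/" (with the trailing slash) avoids false positives from longer addresses that merely start with the same digits.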
Try connecting to the database through the VIP.

Controller1:

[root@controller1 haproxy]# mysql -ugalera -pgalera -h 192.168.0.10

Controller2:

(the same connection from controller2 also succeeds)

The high-availability configuration works.
Test that the failover actually works

Power controller1 off abruptly:
[root@controller1 ~]# poweroff -f
The VIP moves to controller2 almost immediately.

Try accessing the database through the VIP again — no problems at all; the test succeeds.
Check the cluster status:

[root@controller2 ~]# pcs status
Cluster name: openstack-cluster
Stack: corosync
Current DC: controller3 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Thu Nov 30 23:57:28 2017
Last change: Thu Nov 30 23:54:11 2017 by root via crm_attribute on controller1

3 nodes configured
4 resources configured

Online: [ controller2 controller3 ]
OFFLINE: [ controller1 ]     # controller1 is down

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started controller2
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ controller2 ]
     Stopped: [ controller1 controller3 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled