Ceph is a great way to deploy persistent storage with OpenStack, and it can be used as the persistent storage backend for OpenStack Cinder block storage.
Without ceph, storage in OpenStack is ephemeral (temporary) and is deleted when the nova VM is deleted. Hence, ceph is great for any data that must persist beyond the life of a VM.
Here are some useful links about live migration in OpenStack using ceph:
https://docs.openstack.org/nova/pike/admin/configuring-migrations.html
https://docs.openstack.org/ha-guide/storage-ha-backend.html
https://docs.openstack.org/arch-design/design-storage/design-storage-concepts.html
This blog deploys ceph 0.94 (Hammer stable) with OpenStack Ocata. See Ceph Releases — Ceph Documentation for the full list of ceph releases.
Below are the configurations needed for ceph, glance, cinder and nova in OpenStack Ocata:
On the OpenStack controller host:
/etc/ceph/ceph.conf:
[global]
osd pool default crush rule = 0
osd pool default size = 3
public_network = 172.18.0.0/16
err to syslog = true
mon host = 172.18.7.160,172.18.7.161,172.18.7.162
auth cluster required = none
osd pool default min size = 0 # 0 means no specific default; ceph will use (pool_default_size)-(pool_default_size/2) so 2 if pool_default_size=3
osd pool default pgp num = 128
auth service required = none
mon client hunt interval = 40
log to syslog = true
auth supported = none
auth client required = none
clog to syslog = true
mon_initial_members = controller1,controller2,controller3
cluster_network = 172.18.0.0/16
log file = /dev/null
max open files = 16229437
fsid = c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5
osd pool default pg num = 128
[client]
rbd default map options = rw
rbd cache writethrough until flush = true
log file = /var/log/rbd-clients/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor
rbd default features = 3 # sum features digits
admin socket = /var/run/ceph/rbd-clients/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
rbd concurrent management ops = 20
rbd default format = 2
rbd cache = true
[mon.controller1]
host = controller1
[mon.controller2]
host = controller2
[mon.controller3]
host = controller3
[mon]
mon osd full ratio = 0.95
mon clock drift warn backoff = 30
mon cluster log file = /dev/null
mon lease = 20
mon osd min down reporters = 7 # number of OSDs per host + 1
mon cluster log to syslog = true
mon osd nearfull ratio = 0.9
mon pg warn max object skew = 10 # set to 20 or higher to disable complaints about number of PGs being too low if some pools have very few objects bringing down the average number of objects per pool. This happens when running RadosGW. Ceph default is 10
mon osd down out interval = 600
mon pg warn max per osd = 0 # disable complaints about low PG counts per OSD
mon clock drift allowed = 0.15
mon lease ack timeout = 40
mon lease renew interval = 12
mon osd report timeout = 300
mon osd allow primary affinity = true
mon accept timeout = 40
[osd]
osd scrub sleep = 0.1
osd recovery threads = 1
osd scrub load threshold = 10.0
osd heartbeat grace = 30
filestore op threads = 2
osd scrub begin hour = 0
osd mon heartbeat interval = 30
osd disk thread ioprio priority = 7
osd mount options xfs = noatime,largeio,inode64,swalloc
osd max backfills = 1
osd objectstore = filestore
osd op threads = 2
osd scrub end hour = 24
osd max scrubs = 1
filestore merge threshold = 40
osd recovery max chunk = 1048576
osd mkfs options xfs = -f -i size=2048
osd recovery max active = 1
osd scrub chunk max = 5
osd deep scrub stride = 1048576
osd disk thread ioprio class = idle
filestore max sync interval = 5
filestore split multiple = 8
osd crush update on start = true
osd recovery op priority = 2
osd deep scrub interval = 2419200
osd mkfs type = xfs
osd journal size = 4096
/etc/ceph/ceph.client.admin.keyring:
[client.admin]
key = ABCDEMtZYzuLGBAAktFtABCDE/ABCDErOs2Nhg==
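The pools referenced in the glance, cinder and nova configurations below (nova-images1 and backups, which also appear later in the ceph df output) must exist before the OpenStack services can use them. A minimal sketch of creating them with the PG counts used in this deployment (128, matching osd pool default pg num above):
# ceph osd pool create nova-images1 128 128
# ceph osd pool create backups 128 128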
/etc/glance/glance-api.conf:
[glance_store]
default_store=rbd
stores=rbd, file, http
rbd_store_ceph_conf=/etc/ceph/ceph.conf
rbd_store_pool=nova-images1
rbd_store_chunk_size=8
/etc/glance/glance-scrubber.conf:
[glance_store]
default_store=rbd
stores=rbd, file, http
rbd_store_ceph_conf=/etc/ceph/ceph.conf
rbd_store_pool=nova-images1
rbd_store_chunk_size=8
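With glance pointed at the nova-images1 pool, a quick way to confirm the rbd backend works is to upload an image and check that it shows up in the pool. A minimal sketch (the image file name cirros.raw is just an example; raw images are preferred with rbd so that cinder and nova can do copy-on-write clones):
# openstack image create --disk-format raw --container-format bare --file cirros.raw cirros-raw
# rbd -p nova-images1 ls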
/etc/cinder/cinder.conf:
[DEFAULT]
default_volume_type=ceph
/etc/cinder/cinder-volume.conf:
[DEFAULT]
enabled_backends=ceph-default
[ceph-default]
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_pool=nova-images1
rbd_store_ceph_conf=/etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot=false
rbd_max_clone_depth=5
rbd_store_chunk_size=4
volume_backend_name=ceph-default
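Since cinder.conf sets default_volume_type=ceph, a volume type named ceph mapped to the ceph-default backend needs to exist. A minimal sketch of creating the type, creating a test volume and verifying that it landed in the pool (the cinder rbd driver names images volume-<uuid>; the volume name test-vol is just an example):
# openstack volume type create ceph
# openstack volume type set --property volume_backend_name=ceph-default ceph
# openstack volume create --type ceph --size 1 test-vol
# rbd -p nova-images1 ls | grep volume-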
/etc/rabbitmq/rabbitmq-monitor-queues.conf:
autogen.controller.cinder-scheduler 20 5
autogen.cinder-volume.cinder-volume-1.ceph-default 20 5
On the OpenStack compute host:
/etc/ceph/ceph.conf:
[global]
osd pool default crush rule = 0
osd pool default size = 3
public_network = 172.18.0.0/16
err to syslog = true
mon host = 172.18.7.160,172.18.7.161,172.18.7.162
auth cluster required = none
osd pool default min size = 0 # 0 means no specific default; ceph will use (pool_default_size)-(pool_default_size/2) so 2 if pool_default_size=3
osd pool default pgp num = 128
auth service required = none
mon client hunt interval = 40
log to syslog = true
auth supported = none
auth client required = none
clog to syslog = true
mon_initial_members = controller1,controller2,controller3
cluster_network = 172.18.0.0/16
log file = /dev/null
max open files = 16229437
fsid = c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5
osd pool default pg num = 128
[client]
rbd default map options = rw
rbd cache writethrough until flush = true
log file = /var/log/rbd-clients/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor
rbd default features = 3 # sum features digits
admin socket = /var/run/ceph/rbd-clients/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
rbd concurrent management ops = 20
rbd default format = 2
rbd cache = true
[mon.controller1]
host = controller1
[mon.controller2]
host = controller2
[mon.controller3]
host = controller3
[mon]
mon osd full ratio = 0.95
mon clock drift warn backoff = 30
mon cluster log file = /dev/null
mon lease = 20
mon osd min down reporters = 7 # number of OSDs per host + 1
mon cluster log to syslog = true
mon osd nearfull ratio = 0.9
mon pg warn max object skew = 10 # set to 20 or higher to disable complaints about number of PGs being too low if some pools have very few objects bringing down the average number of objects per pool. This happens when running RadosGW. Ceph default is 10
mon osd down out interval = 600
mon pg warn max per osd = 0 # disable complaints about low PG counts per OSD
mon clock drift allowed = 0.15
mon lease ack timeout = 40
mon lease renew interval = 12
mon osd report timeout = 300
mon osd allow primary affinity = true
mon accept timeout = 40
[osd]
osd scrub sleep = 0.1
osd recovery threads = 1
osd scrub load threshold = 10.0
osd heartbeat grace = 30
filestore op threads = 2
osd scrub begin hour = 0
osd mon heartbeat interval = 30
osd disk thread ioprio priority = 7
osd mount options xfs = noatime,largeio,inode64,swalloc
osd max backfills = 1
osd objectstore = filestore
osd op threads = 2
osd scrub end hour = 24
osd max scrubs = 1
filestore merge threshold = 40
osd recovery max chunk = 1048576
osd mkfs options xfs = -f -i size=2048
osd recovery max active = 1
osd scrub chunk max = 5
osd deep scrub stride = 1048576
osd disk thread ioprio class = idle
filestore max sync interval = 5
filestore split multiple = 8
osd crush update on start = true
osd recovery op priority = 2
osd deep scrub interval = 2419200
osd mkfs type = xfs
osd journal size = 4096
/etc/nova/nova.conf:
[libvirt]
images_type=rbd
images_rbd_pool=nova-images1
images_rbd_ceph_conf=/etc/ceph/ceph.conf
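With images_type=rbd, nova stores instance ephemeral disks in the nova-images1 pool as <instance-uuid>_disk images instead of files under /var/lib/nova/instances. A quick sanity check after booting a test instance (the flavor and network values here are only examples):
# openstack server create --flavor m1.small --image cirros-raw --nic net-id=<network-uuid> test-vm
# rbd -p nova-images1 ls | grep _disk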
The OpenStack controller hosts run ceph-mon, glance (glance-api, glance-registry) and cinder (cinder-api, cinder-scheduler) as follows:
ceph-mon:
/usr/bin/ceph-mon -i controller1 --pid-file /var/run/ceph/mon.controller1.pid -c /etc/ceph/ceph.conf --cluster ceph -f
glance (glance-api, glance-registry):
/usr/bin/python2 /usr/bin/glance-api
/usr/bin/python2 /usr/bin/glance-registry
cinder (cinder-api, cinder-scheduler):
/usr/bin/python2 /usr/bin/cinder-api --config-file /etc/cinder/cinder.conf --config-file /etc/cinder/cinder-api.conf
/usr/bin/python2 /usr/bin/cinder-scheduler --config-file /etc/cinder/cinder.conf --config-file /etc/cinder/cinder-scheduler.conf
The OpenStack controller hosts also run the nova services such as nova-api, nova-cert, nova-conductor, nova-scheduler, nova-console, nova-consoleauth and nova-novncproxy.
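Once all of these services are running, the nova and cinder service registrations can be verified from a controller host; cinder-volume should show up once per enabled backend (host@ceph-default in this setup):
# openstack compute service list
# openstack volume service list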
The OpenStack compute hosts run ceph-osd and nova-compute as follows:
ceph-osd:
/usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f
/usr/bin/ceph-osd -i 5 --pid-file /var/run/ceph/osd.5.pid -c /etc/ceph/ceph.conf --cluster ceph -f
/usr/bin/ceph-osd -i 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph -f
/usr/bin/ceph-osd -i 13 --pid-file /var/run/ceph/osd.13.pid -c /etc/ceph/ceph.conf --cluster ceph -f
nova-compute:
/usr/bin/python2 /usr/bin/nova-compute --config-file /etc/nova/nova.conf --config-file /etc/nova/nova-compute.conf
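Because the OSDs run on the compute hosts in this deployment, ceph osd tree is handy for checking which OSD IDs map to which compute host and whether they are up:
# ceph osd tree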
There are two ways to deploy ceph in OpenStack: run ceph-osd on the compute hosts themselves, or run ceph-osd on dedicated storage hosts.
If we deploy ceph-osd on the compute hosts, we lose a ceph host whenever a compute host crashes. This approach also increases CPU and memory usage on the compute host, because ceph-osd competes with nova-compute for resources.
If we deploy ceph-osd on separate storage hosts instead, it is best to use fast 40 Gigabit links between the compute hosts and the storage hosts so that ceph traffic moves quickly between them. Refer to Jumbo Mumbo in OpenStack using Cisco's UCS servers and Nexus 9000 for configuring jumbo frames in OpenStack, since jumbo frames increase the throughput of the ceph traffic between the compute and storage hosts (a quick MTU check is sketched below). The downside is cost: this approach needs separate storage hosts just for ceph-osd.
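A simple way to confirm that jumbo frames actually work end to end between a compute host and a storage host is to ping with a large payload and the don't-fragment bit set; 8972 bytes of payload plus 28 bytes of IP/ICMP headers fills a 9000-byte MTU (the target IP below is just an example from this deployment's public network):
# ping -M do -s 8972 -c 3 172.18.7.160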
Below are the outputs of some useful ceph commands, run on a controller host, that show whether the ceph cluster backing OpenStack is healthy:
# ceph version
ceph version 0.94.9-9.el7cp (b83334e01379f267fb2f9ce729d74a0a8fa1e92c)
# ceph status
cluster c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5
health HEALTH_WARN
clock skew detected on mon.controller2
1 mons down, quorum 0,1 controller1,controller2
Monitor clock skew detected
monmap e1: 3 mons at {controller1=172.18.7.160:6789/0,controller2=172.18.7.161:6789/0,controller3=172.18.7.162:6789/0}
election epoch 8, quorum 0,1 controller1,controller2
osdmap e172: 16 osds: 16 up, 16 in
pgmap v201041: 320 pgs, 3 pools, 583 MB data, 8218 objects
2746 MB used, 4730 GB / 4733 GB avail
320 active+clean
# ceph osd pool stats
pool rbd id 0
nothing is going on
pool nova-images1 id 1
nothing is going on
pool backups id 2
nothing is going on
# ceph pg dump | head -8
dumped all in format plain
version 201067
stamp 2017-10-05 03:35:55.386523
last_osdmap_epoch 172
last_pg_scan 3
full_ratio 0.95
nearfull_ratio 0.9
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
2.7d 0 0 0 0 0 0 0 0 active+clean 2017-10-04 21:54:02.754318 0'0 172:45 [9,8,14] 9 [9,8,14] 9 0'0 2017-10-04 21:54:02.754202 0'0 2017-09-27 02:42:12.189613
# ceph mon dump
dumped monmap epoch 1
epoch 1
fsid c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5
last_changed 0.000000
created 0.000000
0: 172.18.7.160:6789/0 mon.controller1
1: 172.18.7.161:6789/0 mon.controller2
2: 172.18.7.162:6789/0 mon.controller3
# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
4733G 4730G 2753M 0.06
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
rbd 0 0 0 1576G 0
nova-images1 1 583M 0.04 1576G 8218
backups 2 0 0 1576G 0
# ceph osd pool ls detail
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 'nova-images1' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 172 flags hashpspool stripe_width 0
removed_snaps [1~6,b~8d]
pool 2 'backups' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 3 flags hashpspool stripe_width 0
# ceph -w
cluster c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5
health HEALTH_WARN
clock skew detected on mon.controller2
1 mons down, quorum 0,1 controller1,controller2
Monitor clock skew detected
monmap e1: 3 mons at {controller1=172.18.7.160:6789/0,controller2=172.18.7.161:6789/0,controller3=172.18.7.162:6789/0}
election epoch 8, quorum 0,1 controller1,controller2
osdmap e172: 16 osds: 16 up, 16 in
pgmap v201177: 320 pgs, 3 pools, 664 MB data, 9515 objects
3097 MB used, 4730 GB / 4733 GB avail
320 active+clean
client io 0 B/s wr, 90 op/s
2017-10-05 03:42:07.745506 mon.0 [INF] pgmap v201176: 320 pgs: 320 active+clean; 669 MB data, 3111 MB used, 4730 GB / 4733 GB avail; 377 kB/s wr, 96 op/s
2017-10-05 03:42:08.755441 mon.0 [INF] pgmap v201177: 320 pgs: 320 active+clean; 664 MB data, 3097 MB used, 4730 GB / 4733 GB avail; 0 B/s wr, 90 op/s
# ceph quorum_status
{"election_epoch":8,"quorum":[0,1],"quorum_names":["controller1","controller2"],"quorum_leader_name":"controller1","monmap":{"epoch":1,"fsid":"c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5","modified":"0.000000","created":"0.000000","mons":[{"rank":0,"name":"controller1","addr":"172.18.7.160:6789\/0"},{"rank":1,"name":"controller2","addr":"172.18.7.161:6789\/0"},{"rank":2,"name":"controller3","addr":"172.18.7.162:6789\/0"}]}}
# ceph mon_status
{"name":"controller1","rank":0,"state":"leader","election_epoch":8,"quorum":[0,1],"outside_quorum":[],"extra_probe_peers":["172.18.7.161:6789\/0","172.18.7.162:6789\/0"],"sync_provider":[],"monmap":{"epoch":1,"fsid":"c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5","modified":"0.000000","created":"0.000000","mons":[{"rank":0,"name":"controller1","addr":"172.18.7.160:6789\/0"},{"rank":1,"name":"controller2","addr":"172.18.7.161:6789\/0"},{"rank":2,"name":"controller3","addr":"172.18.7.162:6789\/0"}]}}
# ceph auth list | head -5
installed auth entries:
osd.0
key: AQCCF8tZf1O0HxAAXWSfmzMokX5QSPEHr00nvA==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.1
# ceph mds stat
e1: 0/0/0 up
# ceph mon stat
e1: 3 mons at {controller1=172.18.7.160:6789/0,controller2=172.18.7.161:6789/0,controller3=172.18.7.162:6789/0}, election epoch 8, quorum 0,1 controller1,controller2
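For quick scripted monitoring, ceph health and ceph health detail return just the health summary; in the state shown above they would report HEALTH_WARN along with the clock skew and mon down messages:
# ceph health
# ceph health detail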
Refer to How to attach cinder/ceph XFS volume to a nova instance in OpenStack horizon for the steps to mount a ceph-backed XFS volume inside a nova VM and read/write data to it; a CLI sketch follows below.
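That blog walks through the horizon UI; from the CLI the equivalent is roughly the following sketch. The server and volume names follow the examples above, and the device name inside the guest (/dev/vdb here) depends on the instance:
# openstack server add volume test-vm test-vol
Then, inside the VM:
# mkfs.xfs /dev/vdb
# mount /dev/vdb /mnt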
Hope this blog is useful! Please let me know your comments below!