Vikram Hosakote
Cisco Employee

Ceph (Ceph Homepage - Ceph) is a great way to deploy persistent storage with OpenStack. Ceph can be used as the persistent storage backend for OpenStack Cinder (GitHub - openstack/cinder: OpenStack Block Storage (Cinder)), Glance and Nova, to store:

  1. Volumes of nova VMs.
  2. Glance images.

Without a persistent backend such as Ceph, the storage of a nova VM is ephemeral: it is deleted when we delete the nova VM. Hence, Ceph is great for:

  1. Persisting the data in the volume of a nova VM even after we delete the nova VM.
  2. Live migration of a nova VM to a different compute host.
  3. Data backup and replication of the data in a nova VM and glance images.
  4. Data reliability in OpenStack (data will still be available if a nova VM crashes).

Here are some useful links about live migration in OpenStack using ceph:

https://docs.openstack.org/nova/pike/admin/configuring-migrations.html

https://docs.openstack.org/ha-guide/storage-ha-backend.html

https://docs.openstack.org/arch-design/design-storage/design-storage-concepts.html

This blog deploys Ceph 0.94 (Hammer, stable) with OpenStack Ocata.

Ceph Releases — Ceph Documentation

Below are the configurations needed for ceph, glance, cinder and nova in OpenStack Ocata:


On the OpenStack controller host:


/etc/ceph/ceph.conf:

[global]

osd pool default crush rule = 0

osd pool default size = 3

public_network = 172.18.0.0/16

err to syslog = true

mon host = 172.18.7.160,172.18.7.161,172.18.7.162

auth cluster required = none

osd pool default min size = 0 # 0 means no specific default; ceph will use (pool_default_size)-(pool_default_size/2) so 2 if pool_default_size=3

osd pool default pgp num = 128

auth service required = none

mon client hunt interval = 40

log to syslog = true

auth supported = none

auth client required = none

clog to syslog = true

mon_initial_members = controller1,controller2,controller3

cluster_network = 172.18.0.0/16

log file = /dev/null

max open files = 16229437

fsid = c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5

osd pool default pg num = 128

[client]

rbd default map options = rw

rbd cache writethrough until flush = true

log file = /var/log/rbd-clients/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor

rbd default features = 3 # sum of the feature bit values (1 = layering, 2 = striping)

admin socket = /var/run/ceph/rbd-clients/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor

rbd concurrent management ops = 20

rbd default format = 2

rbd cache = true

[mon.controller1]

host = controller1

[mon.controller2]

host = controller2

[mon.controller3]

host = controller3

[mon]

mon osd full ratio = 0.95

mon clock drift warn backoff = 30

mon cluster log file = /dev/null

mon lease = 20

mon osd min down reporters = 7 # number of OSDs per host + 1

mon cluster log to syslog = true

mon osd nearfull ratio = 0.9

mon pg warn max object skew = 10 # set to 20 or higher to disable complaints about number of PGs being too low if some pools have very few objects bringing down the average number of objects per pool. This happens when running RadosGW. Ceph default is 10

mon osd down out interval = 600

mon pg warn max per osd = 0 # 0 disables the warning about too many PGs per OSD

mon clock drift allowed = 0.15

mon lease ack timeout = 40

mon lease renew interval = 12

mon osd report timeout = 300

mon osd allow primary affinity = true

mon accept timeout = 40

[osd]

osd scrub sleep = 0.1

osd recovery threads = 1

osd scrub load threshold = 10.0

osd heartbeat grace = 30

filestore op threads = 2

osd scrub begin hour = 0

osd mon heartbeat interval = 30

osd disk thread ioprio priority = 7

osd mount options xfs = noatime,largeio,inode64,swalloc

osd max backfills = 1

osd objectstore = filestore

osd op threads = 2

osd scrub end hour = 24

osd max scrubs = 1

filestore merge threshold = 40

osd recovery max chunk = 1048576

osd mkfs options xfs = -f -i size=2048

osd recovery max active = 1

osd scrub chunk max = 5

osd deep scrub stride = 1048576

osd disk thread ioprio class = idle

filestore max sync interval = 5

filestore split multiple = 8

osd crush update on start = true

osd recovery op priority = 2

osd deep scrub interval = 2419200

osd mkfs type = xfs

osd journal size = 4096
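
The comment on "osd pool default min size = 0" in the ceph.conf above says that Ceph falls back to size - size/2. A minimal Python sketch of that arithmetic, assuming integer division as the comment implies (the function name is only illustrative):

```python
def default_min_size(pool_size: int) -> int:
    """Effective min_size when 'osd pool default min size = 0'.

    Follows the formula quoted in the ceph.conf comment:
    size - size/2, with integer division.
    """
    return pool_size - pool_size // 2


# Show the resulting min_size for a few replica counts.
for size in (1, 2, 3, 4, 5):
    print(f"size={size} -> min_size={default_min_size(size)}")
```

With "osd pool default size = 3" this gives a min_size of 2, which matches the "min_size 2" reported for every pool by "ceph osd pool ls detail" later in this blog.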

/etc/ceph/ceph.client.admin.keyring:

[client.admin]

  key = ABCDEMtZYzuLGBAAktFtABCDE/ABCDErOs2Nhg==

/etc/glance/glance-api.conf:

[glance_store]

default_store=rbd

stores=rbd, file, http

rbd_store_ceph_conf=/etc/ceph/ceph.conf

rbd_store_pool=nova-images1

rbd_store_chunk_size=8

/etc/glance/glance-scrubber.conf:

[glance_store]

default_store=rbd

stores=rbd, file, http

rbd_store_ceph_conf=/etc/ceph/ceph.conf

rbd_store_pool=nova-images1

rbd_store_chunk_size=8

/etc/cinder/cinder.conf:

[DEFAULT]

default_volume_type=ceph

/etc/cinder/cinder-volume.conf:


[DEFAULT]

enabled_backends=ceph-default

[ceph-default]

volume_driver=cinder.volume.drivers.rbd.RBDDriver

rbd_pool=nova-images1

rbd_ceph_conf=/etc/ceph/ceph.conf

rbd_flatten_volume_from_snapshot=false

rbd_max_clone_depth=5

rbd_store_chunk_size=4

volume_backend_name=ceph-default


/etc/rabbitmq/rabbitmq-monitor-queues.conf:


autogen.controller.cinder-scheduler 20 5

autogen.cinder-volume.cinder-volume-1.ceph-default 20 5


On the OpenStack compute host:


/etc/ceph/ceph.conf:


[global]

osd pool default crush rule = 0

osd pool default size = 3

public_network = 172.18.0.0/16

err to syslog = true

mon host = 172.18.7.160,172.18.7.161,172.18.7.162

auth cluster required = none

osd pool default min size = 0 # 0 means no specific default; ceph will use (pool_default_size)-(pool_default_size/2) so 2 if pool_default_size=3

osd pool default pgp num = 128

auth service required = none

mon client hunt interval = 40

log to syslog = true

auth supported = none

auth client required = none

clog to syslog = true

mon_initial_members = controller1,controller2,controller3

cluster_network = 172.18.0.0/16

log file = /dev/null

max open files = 16229437

fsid = c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5

osd pool default pg num = 128

[client]

rbd default map options = rw

rbd cache writethrough until flush = true

log file = /var/log/rbd-clients/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor

rbd default features = 3 # sum of the feature bit values (1 = layering, 2 = striping)

admin socket = /var/run/ceph/rbd-clients/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor

rbd concurrent management ops = 20

rbd default format = 2

rbd cache = true

[mon.controller1]

host = controller1

[mon.controller2]

host = controller2

[mon.controller3]

host = controller3

[mon]

mon osd full ratio = 0.95

mon clock drift warn backoff = 30

mon cluster log file = /dev/null

mon lease = 20

mon osd min down reporters = 7 # number of OSDs per host + 1

mon cluster log to syslog = true

mon osd nearfull ratio = 0.9

mon pg warn max object skew = 10 # set to 20 or higher to disable complaints about number of PGs being too low if some pools have very few objects bringing down the average number of objects per pool. This happens when running RadosGW. Ceph default is 10

mon osd down out interval = 600

mon pg warn max per osd = 0 # 0 disables the warning about too many PGs per OSD

mon clock drift allowed = 0.15

mon lease ack timeout = 40

mon lease renew interval = 12

mon osd report timeout = 300

mon osd allow primary affinity = true

mon accept timeout = 40

[osd]

osd scrub sleep = 0.1

osd recovery threads = 1

osd scrub load threshold = 10.0

osd heartbeat grace = 30

filestore op threads = 2

osd scrub begin hour = 0

osd mon heartbeat interval = 30

osd disk thread ioprio priority = 7

osd mount options xfs = noatime,largeio,inode64,swalloc

osd max backfills = 1

osd objectstore = filestore

osd op threads = 2

osd scrub end hour = 24

osd max scrubs = 1

filestore merge threshold = 40

osd recovery max chunk = 1048576

osd mkfs options xfs = -f -i size=2048

osd recovery max active = 1

osd scrub chunk max = 5

osd deep scrub stride = 1048576

osd disk thread ioprio class = idle

filestore max sync interval = 5

filestore split multiple = 8

osd crush update on start = true

osd recovery op priority = 2

osd deep scrub interval = 2419200

osd mkfs type = xfs

osd journal size = 4096
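
The "rbd default features = 3" line in the [client] section above is a sum of feature bits. Here is a small Python sketch that decodes such a value. The bit table is my understanding of the librbd feature flags and is not taken from this deployment, so treat it as an assumption and check it against the documentation of your Ceph release (Hammer supports only the first few of these):

```python
from typing import List

# Assumed librbd feature bit values (not taken from this blog's cluster).
RBD_FEATURES = {
    1: "layering",
    2: "striping",
    4: "exclusive-lock",
    8: "object-map",
    16: "fast-diff",
    32: "deep-flatten",
    64: "journaling",
}


def decode_rbd_features(value: int) -> List[str]:
    """Return the names of the feature bits set in 'value'."""
    return [name for bit, name in sorted(RBD_FEATURES.items()) if value & bit]


print(decode_rbd_features(3))
```

For the value 3 used in this blog, the sketch reports layering and striping. Layering is the feature that copy-on-write clones of images rely on.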


/etc/nova/nova.conf:

[libvirt]

images_type=rbd

images_rbd_pool=nova-images1

images_rbd_ceph_conf=/etc/ceph/ceph.conf
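
In this deployment glance, cinder and nova all point at the same Ceph pool (nova-images1). As a quick sanity check, the pool options can be read back with Python's configparser. This is only a sketch: the fragments below are copied from the configurations in this blog as inline strings, and the helper name read_option is illustrative. On a real host you would read the files themselves.

```python
import configparser

# Fragments copied from the glance, cinder and nova configurations above.
GLANCE_API_CONF = """
[glance_store]
default_store=rbd
rbd_store_pool=nova-images1
"""

CINDER_VOLUME_CONF = """
[ceph-default]
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_pool=nova-images1
"""

NOVA_CONF = """
[libvirt]
images_type=rbd
images_rbd_pool=nova-images1
"""


def read_option(conf_text: str, section: str, option: str) -> str:
    """Return one option from an INI-style OpenStack config fragment."""
    # interpolation=None keeps '%' characters in real config values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.get(section, option)


pools = {
    "glance": read_option(GLANCE_API_CONF, "glance_store", "rbd_store_pool"),
    "cinder": read_option(CINDER_VOLUME_CONF, "ceph-default", "rbd_pool"),
    "nova": read_option(NOVA_CONF, "libvirt", "images_rbd_pool"),
}
all_same_pool = len(set(pools.values())) == 1

print(pools)
print("all services share one pool:", all_same_pool)
```

Sharing one pool is simply what this blog does. Using separate pools per service is also possible, in which case the check would compare each value with the pool you expect for that service.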


The OpenStack controller hosts run ceph-mon, glance (glance-api, glance-registry) and cinder (cinder-api, cinder-scheduler) as follows:

ceph-mon:

/usr/bin/ceph-mon -i controller1 --pid-file /var/run/ceph/mon.controller1.pid -c /etc/ceph/ceph.conf --cluster ceph -f


glance (glance-api, glance-registry):


/usr/bin/python2 /usr/bin/glance-api

/usr/bin/python2 /usr/bin/glance-registry

cinder (cinder-api, cinder-scheduler):


/usr/bin/python2 /usr/bin/cinder-api --config-file /etc/cinder/cinder.conf --config-file /etc/cinder/cinder-api.conf

/usr/bin/python2 /usr/bin/cinder-scheduler --config-file /etc/cinder/cinder.conf --config-file /etc/cinder/cinder-scheduler.conf

The OpenStack controller hosts also run the nova services like nova-api, nova-cert, nova-conductor, nova-scheduler, nova-console, nova-consoleauth and nova-novncproxy.

The OpenStack compute hosts run ceph-osd and nova-compute as follows:

ceph-osd:

/usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f

/usr/bin/ceph-osd -i 5 --pid-file /var/run/ceph/osd.5.pid -c /etc/ceph/ceph.conf --cluster ceph -f

/usr/bin/ceph-osd -i 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph -f

/usr/bin/ceph-osd -i 13 --pid-file /var/run/ceph/osd.13.pid -c /etc/ceph/ceph.conf --cluster ceph -f

nova-compute:

/usr/bin/python2 /usr/bin/nova-compute --config-file /etc/nova/nova.conf --config-file /etc/nova/nova-compute.conf

There are two ways to deploy ceph in OpenStack:

  1. Ceph-osd on the compute hosts (the way this blog describes).
  2. Ceph-osd on separate storage hosts and not on compute hosts.

If we deploy ceph-osd on the compute hosts, we lose the OSDs on a compute host whenever that host crashes. This approach also increases the CPU and memory usage of the compute host, because ceph-osd competes with nova-compute and the VMs for the same resources.

If we deploy ceph-osd on separate storage hosts and not on compute hosts, it is best to use fast 40 Gbps links between the compute hosts and the storage hosts so that the Ceph traffic between them is not a bottleneck. Refer to Jumbo Mumbo in OpenStack using Cisco's UCS servers and Nexus 9000 to configure jumbo frames in OpenStack, as they increase the throughput of the Ceph traffic between the compute hosts and the storage hosts. This approach is more expensive because we need separate storage hosts for ceph-osd.

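Below are the outputs of some useful ceph commands run on the controller host to check the health of the Ceph cluster in OpenStack. In this capture the cluster reports HEALTH_WARN because one monitor (controller3) is out of quorum and a clock skew was detected on controller2, while all 320 placement groups are active+clean: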

# ceph version

ceph version 0.94.9-9.el7cp (b83334e01379f267fb2f9ce729d74a0a8fa1e92c)

# ceph status

    cluster c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5

     health HEALTH_WARN

            clock skew detected on mon.controller2

            1 mons down, quorum 0,1 controller1,controller2

            Monitor clock skew detected

     monmap e1: 3 mons at {controller1=172.18.7.160:6789/0,controller2=172.18.7.161:6789/0,controller3=172.18.7.162:6789/0}

            election epoch 8, quorum 0,1 controller1,controller2

     osdmap e172: 16 osds: 16 up, 16 in

      pgmap v201041: 320 pgs, 3 pools, 583 MB data, 8218 objects

            2746 MB used, 4730 GB / 4733 GB avail

                 320 active+clean

# ceph osd pool stats

pool rbd id 0

  nothing is going on

pool nova-images1 id 1

  nothing is going on

pool backups id 2

  nothing is going on

# ceph pg dump | head -8

dumped all in format plain

version 201067

stamp 2017-10-05 03:35:55.386523

last_osdmap_epoch 172

last_pg_scan 3

full_ratio 0.95

nearfull_ratio 0.9

pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp

2.7d 0 0 0 0 0 0 0 0 active+clean 2017-10-04 21:54:02.754318 0'0 172:45 [9,8,14] 9 [9,8,14] 9 0'0 2017-10-04 21:54:02.754202 0'0 2017-09-27 02:42:12.189613

# ceph mon dump

dumped monmap epoch 1

epoch 1

fsid c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5

last_changed 0.000000

created 0.000000

0: 172.18.7.160:6789/0 mon.controller1

1: 172.18.7.161:6789/0 mon.controller2

2: 172.18.7.162:6789/0 mon.controller3

# ceph df

GLOBAL:

    SIZE      AVAIL     RAW USED     %RAW USED

    4733G     4730G        2753M          0.06

POOLS:

    NAME             ID     USED     %USED     MAX AVAIL     OBJECTS

    rbd              0         0         0         1576G           0

    nova-images1     1      583M      0.04         1576G        8218

    backups          2         0         0         1576G           0
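
The numbers in the "ceph df" output above can be roughly cross-checked by hand. The Python sketch below assumes that MAX AVAIL is approximately the raw available space divided by the replica count (3 here). Ceph's real calculation also takes CRUSH weights and the fullest OSD into account, so this is an approximation and not Ceph's algorithm. The function names are illustrative.

```python
def approx_max_avail_gb(raw_avail_gb: int, replica_size: int) -> int:
    """Rough usable space: raw available space divided by the replica count."""
    return raw_avail_gb // replica_size


def percent_raw_used(raw_used_mb: float, size_gb: float) -> float:
    """Raw used space as a percentage of the total raw size."""
    return round(100 * raw_used_mb / (size_gb * 1024), 2)


# Values copied from the GLOBAL section of the "ceph df" output.
print(approx_max_avail_gb(4730, 3))   # compare with MAX AVAIL (1576G)
print(percent_raw_used(2753, 4733))   # compare with %RAW USED (0.06)
```

Both results agree with the MAX AVAIL and %RAW USED columns shown above, which is what we expect from a small, evenly filled cluster with three replicas.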

# ceph osd pool ls detail

pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0

pool 1 'nova-images1' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 172 flags hashpspool stripe_width 0

  removed_snaps [1~6,b~8d]

pool 2 'backups' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 3 flags hashpspool stripe_width 0
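
The pool details above also explain the "320 pgs" seen in "ceph status": it is the sum of pg_num over the three pools. The following Python sketch parses the pool lines with a regular expression and estimates how many PG replicas each OSD carries on average. The regular expression is written only for the line format shown in this blog, and the OSD count of 16 is taken from the "ceph status" output.

```python
import re

# Pool lines copied from the "ceph osd pool ls detail" output above.
POOL_LS_DETAIL = """\
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 'nova-images1' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 172 flags hashpspool stripe_width 0
  removed_snaps [1~6,b~8d]
pool 2 'backups' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 3 flags hashpspool stripe_width 0
"""

NUM_OSDS = 16  # from "16 osds: 16 up, 16 in" in "ceph status"

# Matches only lines that start with "pool"; the removed_snaps line is skipped.
POOL_RE = re.compile(
    r"^pool \d+ '(?P<name>[^']+)' replicated size (?P<size>\d+) "
    r".*? pg_num (?P<pg_num>\d+) ",
    re.MULTILINE,
)

pools = [
    (m["name"], int(m["size"]), int(m["pg_num"]))
    for m in POOL_RE.finditer(POOL_LS_DETAIL)
]

total_pgs = sum(pg_num for _, _, pg_num in pools)
pg_replicas = sum(size * pg_num for _, size, pg_num in pools)
pg_replicas_per_osd = pg_replicas / NUM_OSDS

print("pools:", pools)
print("total PGs:", total_pgs)
print("PG replicas per OSD:", pg_replicas_per_osd)
```

The total comes to 64 + 128 + 128 = 320 PGs, and with three replicas spread over 16 OSDs each OSD holds about 60 PG replicas on average.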

# ceph -w

    cluster c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5

     health HEALTH_WARN

            clock skew detected on mon.controller2

            1 mons down, quorum 0,1 controller1,controller2

            Monitor clock skew detected

     monmap e1: 3 mons at {controller1=172.18.7.160:6789/0,controller2=172.18.7.161:6789/0,controller3=172.18.7.162:6789/0}

            election epoch 8, quorum 0,1 controller1,controller2

     osdmap e172: 16 osds: 16 up, 16 in

      pgmap v201177: 320 pgs, 3 pools, 664 MB data, 9515 objects

            3097 MB used, 4730 GB / 4733 GB avail

                 320 active+clean

  client io 0 B/s wr, 90 op/s

2017-10-05 03:42:07.745506 mon.0 [INF] pgmap v201176: 320 pgs: 320 active+clean; 669 MB data, 3111 MB used, 4730 GB / 4733 GB avail; 377 kB/s wr, 96 op/s

2017-10-05 03:42:08.755441 mon.0 [INF] pgmap v201177: 320 pgs: 320 active+clean; 664 MB data, 3097 MB used, 4730 GB / 4733 GB avail; 0 B/s wr, 90 op/s

# ceph quorum_status

{"election_epoch":8,"quorum":[0,1],"quorum_names":["controller1","controller2"],"quorum_leader_name":"controller1","monmap":{"epoch":1,"fsid":"c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5","modified":"0.000000","created":"0.000000","mons":[{"rank":0,"name":"controller1","addr":"172.18.7.160:6789\/0"},{"rank":1,"name":"controller2","addr":"172.18.7.161:6789\/0"},{"rank":2,"name":"controller3","addr":"172.18.7.162:6789\/0"}]}}
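
The JSON printed by "ceph quorum_status" is easy to check from a script. Here is a minimal Python sketch that lists the monitors that are in the monmap but missing from the quorum, and checks that the quorum still has a majority. The JSON is a trimmed copy of the output above (only the fields the sketch needs), and the function names are illustrative.

```python
import json

# Trimmed copy of the "ceph quorum_status" output above.
QUORUM_STATUS = r'''
{"election_epoch":8,"quorum":[0,1],
 "quorum_names":["controller1","controller2"],
 "quorum_leader_name":"controller1",
 "monmap":{"epoch":1,"mons":[
   {"rank":0,"name":"controller1","addr":"172.18.7.160:6789\/0"},
   {"rank":1,"name":"controller2","addr":"172.18.7.161:6789\/0"},
   {"rank":2,"name":"controller3","addr":"172.18.7.162:6789\/0"}]}}
'''


def mons_out_of_quorum(status_json: str) -> list:
    """Names of monitors listed in the monmap but absent from the quorum."""
    status = json.loads(status_json)
    in_quorum = set(status["quorum_names"])
    return [m["name"] for m in status["monmap"]["mons"]
            if m["name"] not in in_quorum]


def has_majority(status_json: str) -> bool:
    """True if more than half of the monitors in the monmap are in quorum."""
    status = json.loads(status_json)
    total = len(status["monmap"]["mons"])
    return len(status["quorum"]) > total // 2


print(mons_out_of_quorum(QUORUM_STATUS))
print(has_majority(QUORUM_STATUS))
```

For the output in this blog the sketch reports controller3 as out of quorum, which is the "1 mons down" warning in "ceph status". Two of three monitors are still a majority, so the cluster keeps working.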

# ceph mon_status

{"name":"controller1","rank":0,"state":"leader","election_epoch":8,"quorum":[0,1],"outside_quorum":[],"extra_probe_peers":["172.18.7.161:6789\/0","172.18.7.162:6789\/0"],"sync_provider":[],"monmap":{"epoch":1,"fsid":"c29e5118-e0d3-c1c1-1d3d-fc32b8f011c5","modified":"0.000000","created":"0.000000","mons":[{"rank":0,"name":"controller1","addr":"172.18.7.160:6789\/0"},{"rank":1,"name":"controller2","addr":"172.18.7.161:6789\/0"},{"rank":2,"name":"controller3","addr":"172.18.7.162:6789\/0"}]}}

# ceph auth list | head -5

installed auth entries:

osd.0

  key: AQCCF8tZf1O0HxAAXWSfmzMokX5QSPEHr00nvA==

  caps: [mon] allow profile osd

  caps: [osd] allow *

osd.1

# ceph mds stat

e1: 0/0/0 up

# ceph mon stat

e1: 3 mons at {controller1=172.18.7.160:6789/0,controller2=172.18.7.161:6789/0,controller3=172.18.7.162:6789/0}, election epoch 8, quorum 0,1 controller1,controller2

Refer to How to attach cinder/ceph XFS volume to a nova instance in OpenStack horizon for the steps to mount a Ceph XFS volume inside a nova VM and read/write data from/to it.

Hope this blog is useful!  Please let me know your comments below! 
