Another Nova spec update

I started chasing down the list of spec freeze exceptions that had been requested, and that resulted in the list of specs for Kilo being updated. That updated list is below, but I’ll do a separate post with the exception requests highlighted soon as well.

API

  • Add more detailed network information to the metadata server: review 85673 (approved).
  • Add separated policy rule for each v2.1 api: review 127863 (requested a spec exception).
  • Add user limits to the limits API (as well as project limits): review 127094.
  • Allow all printable characters in resource names: review 126696 (approved).
  • Consolidate all console access APIs into one: review 141065 (approved).
  • Expose the lock status of an instance as a queryable item: review 127139 (abandoned); review 85928 (approved).
  • Extend api to allow specifying vnic_type: review 138808 (requested a spec exception).
  • Implement instance tagging: review 127281 (fast tracked, approved).
  • Implement the v2.1 API: review 126452 (fast tracked, approved).
  • Improve the return codes for the instance lock APIs: review 135506.
  • Microversion support: review 127127 (approved).
  • Move policy validation to just the API layer: review 127160 (approved).
  • Nova Server Count API Extension: review 134279 (fast tracked).
  • Provide a policy statement on the goals of our API policies: review 128560 (abandoned).
  • Sorting enhancements: review 131868 (fast tracked, approved, implemented).
  • Support JSON-Home for API extension discovery: review 130715 (requested a spec exception).
  • Support X509 keypairs: review 105034 (approved).

API (EC2)

  • Expand support for volume filtering in the EC2 API: review 104450.
  • Implement tags for volumes and snapshots with the EC2 API: review 126553 (fast tracked, approved).

Administrative

  • Actively hunt for orphan instances and remove them: review 137996 (abandoned); review 138627.
  • Add totalSecurityGroupRulesUsed to the quota limits: review 145689.
  • Check that a service isn’t running before deleting it: review 131633.
  • Enable the nova metadata cache to be a shared resource to improve the hit rate: review 126705 (abandoned).
  • Implement a daemon version of rootwrap: review 105404 (requested a spec exception).
  • Log request id mappings: review 132819 (fast tracked).
  • Monitor the health of hypervisor hosts: review 137768.
  • Remove the assumption that there is a single endpoint for services that nova talks to: review 132623.

Block Storage

  • Allow direct access to LVM volumes if supported by Cinder: review 127318.
  • Cache data from volumes on local disk: review 138292 (abandoned); review 138619.
  • Enhance iSCSI volume multipath support: review 134299 (requested a spec exception).
  • Failover to alternative iSCSI portals on login failure: review 137468 (requested a spec exception).
  • Give additional info in BDM when source type is “blank”: review 140133.
  • Implement support for a DRBD driver for Cinder block device access: review 134153 (requested a spec exception).
  • Poll volume status: review 142828 (abandoned).
  • Refactor ISCSIDriver to support other iSCSI transports besides TCP: review 130721 (approved).
  • StorPool volume attachment support: review 115716 (approved, requested a spec exception).
  • Support Cinder Volume Multi-attach: review 139580 (approved).
  • Support iSCSI live migration for different iSCSI target: review 132323 (approved).

Cells

Containers Service

Database

  • Develop and implement a profiler for SQL requests: review 142078 (abandoned).
  • Enforce instance uuid uniqueness in the SQL database: review 128097 (fast tracked, approved, implemented).
  • Nova db purge utility: review 132656.
  • Online schema change options: review 102545 (approved).
  • Support DB2 as a SQL database: review 141097 (fast tracked, approved).
  • Validate database migrations and model: review 134984 (approved).

Hypervisor: Docker

Hypervisor: FreeBSD

  • Implement support for FreeBSD networking in nova-network: review 127827.

Hypervisor: Hyper-V

  • Allow volumes to be stored on SMB shares instead of just iSCSI: review 102190 (approved, implemented).
  • Instance hot resize: review 141219.

Hypervisor: Ironic

Hypervisor: VMWare

  • Add ephemeral disk support to the VMware driver: review 126527 (fast tracked, approved).
  • Add support for the HTML5 console: review 127283 (requested a spec exception).
  • Allow Nova to access a VMWare image store over NFS: review 126866.
  • Enable administrators and tenants to take advantage of backend storage policies: review 126547 (fast tracked, approved).
  • Enable the mapping of raw cinder devices to instances: review 128697.
  • Implement vSAN support: review 128600 (fast tracked, approved).
  • Support multiple disks inside a single OVA file: review 128691.
  • Support the OVA image format: review 127054 (fast tracked, approved).

Hypervisor: libvirt

Instance features

Internal

  • A lock-free quota implementation: review 135296 (approved).
  • Automate the documentation of the virtual machine state transition graph: review 94835.
  • Fake Libvirt driver for simulating HW testing: review 139927 (abandoned).
  • Flatten Aggregate Metadata in the DB: review 134573 (abandoned).
  • Flatten Instance Metadata in the DB: review 134945 (abandoned).
  • Implement a new code coverage API extension: review 130855.
  • Move flavor data out of the system_metadata table in the SQL database: review 126620 (approved).
  • Move to polling for cinder operations: review 135367.
  • PCI test cases for third party CI: review 141270.
  • Transition Nova to using the Glance v2 API: review 84887 (abandoned).
  • Transition to using glanceclient instead of our own home grown wrapper: review 133485 (approved).

Internationalization

  • Enable lazy translations of strings: review 126717 (fast tracked, approved).

Networking

  • Add a new linuxbridge VIF type, macvtap: review 117465 (abandoned).
  • Add a plugin mechanism for VIF drivers: review 136827 (abandoned).
  • Add support for InfiniBand SR-IOV VIF Driver: review 131729 (requested a spec exception).
  • Neutron DNS Using Nova Hostname: review 90150 (abandoned).
  • New VIF type to allow routing VM data instead of bridging it: review 130732 (approved, requested a spec exception).
  • Nova Plugin for OpenContrail: review 126446 (approved).
  • Refactor of the Neutron network adapter to be more maintainable: review 131413.
  • Use the Nova hostname in Neutron DNS: review 137669.
  • Wrap the Python NeutronClient: review 141108.

Performance

  • Dynamically alter the interval nova polls components at based on load and expected time for an operation to complete: review 122705.

Scheduler

  • A nested quota driver API: review 129420.
  • Add a filter to take into account hypervisor type and version when scheduling: review 137714.
  • Add an IOPS weigher: review 127123 (approved, implemented); review 132614.
  • Add instance count on the hypervisor as a weight: review 127871 (abandoned).
  • Add soft affinity support for server group: review 140017 (approved).
  • Allow extra spec to match all values in a list by adding the ALL-IN operator: review 138698 (fast tracked, approved).
  • Allow limiting the flavors that can be scheduled on certain host aggregates: review 122530 (abandoned).
  • Allow the removal of servers from server groups: review 136487.
  • Cache aggregate metadata: review 141846.
  • Convert get_available_resources to use an object instead of dict: review 133728 (abandoned).
  • Convert the resource tracker to objects: review 128964 (fast tracked, approved).
  • Create an object model to represent a request to boot an instance: review 127610 (approved).
  • Decouple services and compute nodes in the SQL database: review 126895 (approved).
  • Distribute PCI Requests Across Multiple Devices: review 142094.
  • Enable adding new scheduler hints to already booted instances: review 134746.
  • Fix the race conditions when migrating with server groups: review 135527 (abandoned).
  • Implement resource objects in the resource tracker: review 127609 (approved, requested a spec exception).
  • Improve the ComputeCapabilities filter: review 133534 (requested a spec exception).
  • Isolate Scheduler DB for Filters: review 138444 (requested a spec exception).
  • Isolate the scheduler’s use of the Nova SQL database: review 89893 (approved).
  • Let schedulers reuse filter and weigher objects: review 134506 (abandoned).
  • Move select_destinations() to using a request object: review 127612 (approved).
  • Persist scheduler hints: review 88983.
  • Refactor allocate_for_instance: review 141129.
  • Stop direct lookup for host aggregates in the Nova database: review 132065 (abandoned).
  • Stop direct lookup for instance groups in the Nova database: review 131553 (abandoned).
  • Support scheduling based on more image properties: review 138937.
  • Trusted computing support: review 133106.

Scheduling

Security

  • Make key manager interface interoperable with Barbican: review 140144 (fast tracked, approved).
  • Provide a reference implementation for console proxies that uses TLS: review 126958 (fast tracked, approved).
  • Strongly validate the tenant and user for quota consuming requests with keystone: review 92507 (approved).

Service Groups

How are we going with Nova Kilo specs after our review day?

Time for another summary I think, because announcing the review day seems to have caused a rush of new specs to be filed (which wasn’t really my intention, but hey). We did approve a fair few specs on the review day, so I think overall it was a success. Here’s an updated summary of the state of play:

API

API (EC2)

  • Expand support for volume filtering in the EC2 API: review 104450.
  • Implement tags for volumes and snapshots with the EC2 API: review 126553 (fast tracked, approved).

Administrative

  • Actively hunt for orphan instances and remove them: review 137996 (abandoned); review 138627.
  • Check that a service isn’t running before deleting it: review 131633.
  • Enable the nova metadata cache to be a shared resource to improve the hit rate: review 126705 (abandoned).
  • Implement a daemon version of rootwrap: review 105404.
  • Log request id mappings: review 132819 (fast tracked).
  • Monitor the health of hypervisor hosts: review 137768.
  • Remove the assumption that there is a single endpoint for services that nova talks to: review 132623.

Block Storage

  • Allow direct access to LVM volumes if supported by Cinder: review 127318.
  • Cache data from volumes on local disk: review 138292 (abandoned); review 138619.
  • Enhance iSCSI volume multipath support: review 134299.
  • Failover to alternative iSCSI portals on login failure: review 137468.
  • Give additional info in BDM when source type is “blank”: review 140133.
  • Implement support for a DRBD driver for Cinder block device access: review 134153.
  • Refactor ISCSIDriver to support other iSCSI transports besides TCP: review 130721 (approved).
  • StorPool volume attachment support: review 115716.
  • Support Cinder Volume Multi-attach: review 139580 (approved).
  • Support iSCSI live migration for different iSCSI target: review 132323 (approved).

Cells

Containers Service

Database

Hypervisor: Docker

Hypervisor: FreeBSD

  • Implement support for FreeBSD networking in nova-network: review 127827.

Hypervisor: Hyper-V

Hypervisor: Ironic

Hypervisor: VMWare

  • Add ephemeral disk support to the VMware driver: review 126527 (fast tracked, approved).
  • Add support for the HTML5 console: review 127283.
  • Allow Nova to access a VMWare image store over NFS: review 126866.
  • Enable administrators and tenants to take advantage of backend storage policies: review 126547 (fast tracked, approved).
  • Enable the mapping of raw cinder devices to instances: review 128697.
  • Implement vSAN support: review 128600 (fast tracked, approved).
  • Support multiple disks inside a single OVA file: review 128691.
  • Support the OVA image format: review 127054 (fast tracked, approved).

Hypervisor: libvirt

Instance features

Internal

  • A lock-free quota implementation: review 135296.
  • Automate the documentation of the virtual machine state transition graph: review 94835.
  • Fake Libvirt driver for simulating HW testing: review 139927 (abandoned).
  • Flatten Aggregate Metadata in the DB: review 134573 (abandoned).
  • Flatten Instance Metadata in the DB: review 134945 (abandoned).
  • Implement a new code coverage API extension: review 130855.
  • Move flavor data out of the system_metadata table in the SQL database: review 126620 (approved).
  • Move to polling for cinder operations: review 135367.
  • PCI test cases for third party CI: review 141270.
  • Transition Nova to using the Glance v2 API: review 84887.
  • Transition to using glanceclient instead of our own home grown wrapper: review 133485 (approved).

Internationalization

  • Enable lazy translations of strings: review 126717 (fast tracked).

Networking

Performance

  • Dynamically alter the interval nova polls components at based on load and expected time for an operation to complete: review 122705.

Scheduler

  • A nested quota driver API: review 129420.
  • Add a filter to take into account hypervisor type and version when scheduling: review 137714.
  • Add an IOPS weigher: review 127123 (approved, implemented); review 132614.
  • Add instance count on the hypervisor as a weight: review 127871 (abandoned).
  • Add soft affinity support for server group: review 140017 (approved).
  • Allow extra spec to match all values in a list by adding the ALL-IN operator: review 138698 (fast tracked, approved).
  • Allow limiting the flavors that can be scheduled on certain host aggregates: review 122530 (abandoned).
  • Allow the removal of servers from server groups: review 136487.
  • Convert get_available_resources to use an object instead of dict: review 133728 (abandoned).
  • Convert the resource tracker to objects: review 128964 (fast tracked, approved).
  • Create an object model to represent a request to boot an instance: review 127610 (approved).
  • Decouple services and compute nodes in the SQL database: review 126895 (approved).
  • Enable adding new scheduler hints to already booted instances: review 134746.
  • Fix the race conditions when migrating with server groups: review 135527 (abandoned).
  • Implement resource objects in the resource tracker: review 127609.
  • Improve the ComputeCapabilities filter: review 133534.
  • Isolate Scheduler DB for Filters: review 138444.
  • Isolate the scheduler’s use of the Nova SQL database: review 89893.
  • Let schedulers reuse filter and weigher objects: review 134506 (abandoned).
  • Move select_destinations() to using a request object: review 127612 (approved).
  • Persist scheduler hints: review 88983.
  • Refactor allocate_for_instance: review 141129.
  • Stop direct lookup for host aggregates in the Nova database: review 132065 (abandoned).
  • Stop direct lookup for instance groups in the Nova database: review 131553 (abandoned).
  • Support scheduling based on more image properties: review 138937.
  • Trusted computing support: review 133106.

Scheduling

Security

  • Make key manager interface interoperable with Barbican: review 140144 (fast tracked, approved).
  • Provide a reference implementation for console proxies that uses TLS: review 126958 (fast tracked, approved).
  • Strongly validate the tenant and user for quota consuming requests with keystone: review 92507.

Service Groups

Specs for Kilo, an update

We’re now a few weeks away from the kilo-1 milestone, so I thought it was time to update my summary of the Nova specifications that have been proposed so far. So here we go…

API

API (EC2)

  • Expand support for volume filtering in the EC2 API: review 104450.
  • Implement tags for volumes and snapshots with the EC2 API: review 126553 (fast tracked, approved).

Administrative

  • Check that a service isn’t running before deleting it: review 131633.
  • Enable the nova metadata cache to be a shared resource to improve the hit rate: review 126705 (abandoned).
  • Enforce instance uuid uniqueness in the SQL database: review 128097 (fast tracked, approved).
  • Implement a daemon version of rootwrap: review 105404.
  • Log request id mappings: review 132819 (fast tracked).
  • Monitor the health of hypervisor hosts: review 137768.
  • Remove the assumption that there is a single endpoint for services that nova talks to: review 132623.

Cells

Containers Service

Database

Hypervisor: Docker

Hypervisor: FreeBSD

  • Implement support for FreeBSD networking in nova-network: review 127827.

Hypervisor: Hyper-V

  • Allow volumes to be stored on SMB shares instead of just iSCSI: review 102190 (approved).

Hypervisor: Ironic

Hypervisor: VMWare

  • Add ephemeral disk support to the VMware driver: review 126527 (fast tracked, approved).
  • Add support for the HTML5 console: review 127283.
  • Allow Nova to access a VMWare image store over NFS: review 126866.
  • Enable administrators and tenants to take advantage of backend storage policies: review 126547 (fast tracked, approved).
  • Enable the mapping of raw cinder devices to instances: review 128697.
  • Implement vSAN support: review 128600 (fast tracked, approved).
  • Support multiple disks inside a single OVA file: review 128691.
  • Support the OVA image format: review 127054 (fast tracked, approved).

Hypervisor: libvirt

Instance features

Internal

  • A lock-free quota implementation: review 135296.
  • Automate the documentation of the virtual machine state transition graph: review 94835.
  • Flatten Aggregate Metadata in the DB: review 134573.
  • Flatten Instance Metadata in the DB: review 134945.
  • Implement a new code coverage API extension: review 130855.
  • Move flavor data out of the system_metadata table in the SQL database: review 126620 (approved).
  • Move to polling for cinder operations: review 135367.
  • Transition Nova to using the Glance v2 API: review 84887.
  • Transition to using glanceclient instead of our own home grown wrapper: review 133485.

Internationalization

  • Enable lazy translations of strings: review 126717 (fast tracked).

Networking

Performance

  • Dynamically alter the interval nova polls components at based on load and expected time for an operation to complete: review 122705.

Scheduler

  • Add a filter to take into account hypervisor type and version when scheduling: review 137714.
  • Add an IOPS weigher: review 127123 (approved, implemented); review 132614.
  • Add instance count on the hypervisor as a weight: review 127871 (abandoned).
  • Allow limiting the flavors that can be scheduled on certain host aggregates: review 122530 (abandoned).
  • Allow the removal of servers from server groups: review 136487.
  • Convert get_available_resources to use an object instead of dict: review 133728.
  • Convert the resource tracker to objects: review 128964 (fast tracked, approved).
  • Create an object model to represent a request to boot an instance: review 127610.
  • Decouple services and compute nodes in the SQL database: review 126895 (approved).
  • Enable adding new scheduler hints to already booted instances: review 134746.
  • Fix the race conditions when migrating with server groups: review 135527 (abandoned).
  • Implement resource objects in the resource tracker: review 127609.
  • Improve the ComputeCapabilities filter: review 133534.
  • Isolate the scheduler’s use of the Nova SQL database: review 89893.
  • Let schedulers reuse filter and weigher objects: review 134506 (abandoned).
  • Move select_destinations() to using a request object: review 127612.
  • Persist scheduler hints: review 88983.
  • Stop direct lookup for host aggregates in the Nova database: review 132065 (abandoned).
  • Stop direct lookup for instance groups in the Nova database: review 131553.

Security

  • Provide a reference implementation for console proxies that uses TLS: review 126958 (fast tracked, approved).
  • Strongly validate the tenant and user for quota consuming requests with keystone: review 92507.

Storage

  • Allow direct access to LVM volumes if supported by Cinder: review 127318.
  • Enhance iSCSI volume multipath support: review 134299.
  • Failover to alternative iSCSI portals on login failure: review 137468.
  • Implement support for a DRBD driver for Cinder block device access: review 134153.
  • Refactor ISCSIDriver to support other iSCSI transports besides TCP: review 130721.
  • StorPool volume attachment support: review 115716.
  • Support iSCSI live migration for different iSCSI target: review 132323 (approved).

Specs for Kilo

Here’s an updated list of the specs currently proposed for Kilo. I wanted to produce this before I start travelling for the summit in the next couple of days because I think many of these will be required reading for the Nova track at the summit.

API

  • Add instance administrative lock status to the instance detail results: review 127139 (abandoned).
  • Add more detailed network information to the metadata server: review 85673.
  • Add separated policy rule for each v2.1 api: review 127863.
  • Add user limits to the limits API (as well as project limits): review 127094.
  • Allow all printable characters in resource names: review 126696.
  • Expose the lock status of an instance as a queryable item: review 85928 (approved).
  • Implement instance tagging: review 127281 (fast tracked, approved).
  • Implement tags for volumes and snapshots with the EC2 API: review 126553 (fast tracked, approved).
  • Implement the v2.1 API: review 126452 (fast tracked, approved).
  • Microversion support: review 127127.
  • Move policy validation to just the API layer: review 127160.
  • Provide a policy statement on the goals of our API policies: review 128560.
  • Support X509 keypairs: review 105034.

Administrative

  • Enable the nova metadata cache to be a shared resource to improve the hit rate: review 126705 (abandoned).
  • Enforce instance uuid uniqueness in the SQL database: review 128097 (fast tracked, approved).

Containers Service

Hypervisor: Docker

Hypervisor: FreeBSD

  • Implement support for FreeBSD networking in nova-network: review 127827.

Hypervisor: Hyper-V

  • Allow volumes to be stored on SMB shares instead of just iSCSI: review 102190 (approved).

Hypervisor: Ironic

Hypervisor: VMWare

  • Add ephemeral disk support to the VMware driver: review 126527 (fast tracked, approved).
  • Add support for the HTML5 console: review 127283.
  • Allow Nova to access a VMWare image store over NFS: review 126866.
  • Enable administrators and tenants to take advantage of backend storage policies: review 126547 (fast tracked, approved).
  • Enable the mapping of raw cinder devices to instances: review 128697.
  • Implement vSAN support: review 128600 (fast tracked, approved).
  • Support multiple disks inside a single OVA file: review 128691.
  • Support the OVA image format: review 127054 (fast tracked, approved).

Hypervisor: libvirt

Instance features

Internal

  • Move flavor data out of the system_metadata table in the SQL database: review 126620 (approved).
  • Transition Nova to using the Glance v2 API: review 84887.

Internationalization

  • Enable lazy translations of strings: review 126717 (fast tracked).

Performance

  • Dynamically alter the interval nova polls components at based on load and expected time for an operation to complete: review 122705.

Scheduler

  • Add an IOPS weigher: review 127123 (approved).
  • Add instance count on the hypervisor as a weight: review 127871 (abandoned).
  • Allow limiting the flavors that can be scheduled on certain host aggregates: review 122530 (abandoned).
  • Convert the resource tracker to objects: review 128964 (fast tracked, approved).
  • Create an object model to represent a request to boot an instance: review 127610.
  • Decouple services and compute nodes in the SQL database: review 126895.
  • Implement resource objects in the resource tracker: review 127609.
  • Isolate the scheduler’s use of the Nova SQL database: review 89893.
  • Move select_destinations() to using a request object: review 127612.

Security

  • Provide a reference implementation for console proxies that uses TLS: review 126958 (fast tracked).
  • Strongly validate the tenant and user for quota consuming requests with keystone: review 92507.

One week of Nova Kilo specifications

It’s been one week of specifications for Nova in Kilo. What are we seeing proposed so far? Here’s a summary…

API

Administrative

  • Enable the nova metadata cache to be a shared resource to improve the hit rate: review 126705.

Containers Service

Hypervisor: FreeBSD

  • Implement support for FreeBSD networking in nova-network: review 127827.

Hypervisor: Hyper-V

  • Allow volumes to be stored on SMB shares instead of just iSCSI: review 102190.

Hypervisor: VMWare

  • Add ephemeral disk support to the VMware driver: review 126527 (spec approved).
  • Add support for the HTML5 console: review 127283.
  • Allow Nova to access a VMWare image store over NFS: review 126866.
  • Enable administrators and tenants to take advantage of backend storage policies: review 126547 (spec approved).
  • Support the OVA image format: review 127054.

Hypervisor: libvirt

  • Add a new linuxbridge VIF type, macvtap: review 117465.
  • Add support for SMBFS as an image storage backend: review 103203.
  • Convert to using built in libvirt disk copy mechanisms for cold migrations on non-shared storage: review 126979.
  • Support libvirt storage pools: review 126978.
  • Support quiesce filesystems during snapshot: review 126966.

Instance features

  • Allow direct access to LVM volumes if supported by Cinder: review 127318.

Internal

  • Move flavor data out of the system_metadata table in the SQL database: review 126620.

Internationalization

Scheduler

  • Add an IOPS weigher: review 127123 (spec approved).
  • Allow limiting the flavors that can be scheduled on certain host aggregates: review 122530.
  • Create an object model to represent a request to boot an instance: review 127610.
  • Decouple services and compute nodes in the SQL database: review 126895.
  • Implement resource objects in the resource tracker: review 127609.
  • Move select_destinations() to using a request object: review 127612.

Scheduling

  • Add instance count on the hypervisor as a weight: review 127871.

Security

  • Provide a reference implementation for console proxies that uses TLS: review 126958.
  • Strongly validate the tenant and user for quota consuming requests with keystone: review 92507.

Compute Kilo specs are open

From my email last week on the topic:

I am pleased to announce that the specs process for nova in kilo is
now open. There are some tweaks to the previous process, so please
read this entire email before uploading your spec!

Blueprints approved in Juno
===========================

For specs approved in Juno, there is a fast track approval process for
Kilo. The steps to get your spec re-approved are:

 - Copy your spec from the specs/juno/approved directory to the
specs/kilo/approved directory. Note that if we declared your spec to
be a "partial" implementation in Juno, it might be in the implemented
directory. This was rare however.
 - Update the spec to match the new template
 - Commit, with the "Previously-approved: juno" commit message tag
 - Upload using git review as normal

Reviewers will still do a full review of the spec, we are not offering
a rubber stamp of previously approved specs. However, we are requiring
only one +2 to merge these previously approved specs, so the process
should be a lot faster.

A note for core reviewers here -- please include a short note on why
you're doing a single +2 approval on the spec so future generations
remember why.

Trivial blueprints
==================

We are not requiring specs for trivial blueprints in Kilo. Instead,
create a blueprint in Launchpad
at https://blueprints.launchpad.net/nova/+addspec and target the
specification to Kilo. New, targeted, unapproved specs will be
reviewed in weekly nova meetings. If it is agreed they are indeed
trivial in the meeting, they will be approved.

Other proposals
===============

For other proposals, the process is the same as Juno... Propose a spec
review against the specs/kilo/approved directory and we'll review it
from there.
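
To make the fast-track steps above a little more concrete, here is a rough sketch of what re-proposing a previously approved spec might look like from inside a nova-specs checkout. The branch and spec names are made up for the example, so substitute your own:

  # start a branch for the re-proposal
  git checkout -b kilo-my-feature

  # copy the Juno spec into the Kilo approved directory
  cp specs/juno/approved/my-feature.rst specs/kilo/approved/my-feature.rst

  # ... edit specs/kilo/approved/my-feature.rst to match the new template ...

  # commit with the Previously-approved tag in the message, then upload for review
  git add specs/kilo/approved/my-feature.rst
  git commit -m "Re-propose my-feature for Kilo" -m "Previously-approved: juno"
  git review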

After a week I’m seeing something interesting. In Juno the specs process was new, and we saw a pause in the development cycle while people actually wrote down their designs before sending the code. This time around people know what to expect, and there are leftover specs from Juno lying around. We’re therefore seeing specs approved much faster than we did in Juno, which should reduce the effect of the “pipeline flush” we saw in that cycle.

So far we have five approved specs after only a week.

On layers

There’s been a lot of talk recently about what we should include in OpenStack and what is out of scope. This is interesting, in that many of us used to believe that we should do “everything”. I think what’s changed is that we’re learning that solving all the problems in the world is hard, and that we need to re-focus on our core products. In this post I want to talk through the various “layers” proposals that have been made in the last month or so. Layers don’t directly address what we should include in OpenStack or not, but they are a useful mechanism for trying to break up OpenStack into simpler-to-examine chunks, and I think that makes them useful in their own right.

I would address what I believe the scope of the OpenStack project should be here, but I feel that doing so would make this post so long that no one would ever actually read it. Instead, I’ll cover that in a later post in this series. For now, let’s explore what people are proposing as a layering model for OpenStack.

What are layers?

Dean Troyer did a good job of describing a layers model for the OpenStack project on his blog quite a while ago. He proposed the following layers (this is a summary, you should really read his post):

  • layer 0: operating system and Oslo
  • layer 1: basic services — Keystone, Glance, Nova
  • layer 2: extended basics — Neutron, Cinder, Swift, Ironic
  • layer 3: optional services — Horizon and Ceilometer
  • layer 4: turtles all the way up — Heat, Trove, Moniker / Designate, Marconi / Zaqar

Dean notes that Neutron would move to layer 1 when nova-network goes away and Neutron becomes required for all compute deployments. Dean’s post was also over a year ago, so it misses services like Barbican that have appeared since then. Services are only allowed to require services from lower-numbered layers, but can use services from higher-numbered layers as optional add-ins. So Nova, for example, can use Neutron, but cannot require it until it moves into layer 1. Similarly, there have been proposals to add Ceilometer as a dependency to schedule instances in Nova, and if we were to do that then we would need to move Ceilometer down to layer 1 as well. (I think doing that would be a mistake by the way, and have argued against it during at least two summits).

Sean Dague re-ignited this discussion with his own blog post relatively recently. Sean proposes new names for most of the layers, but the intent remains the same — a compute-centric view of the services that are required to build a working OpenStack deployment. Sean and Dean’s layer definitions are otherwise strongly aligned, and Sean notes that the probability of seeing something deployed at a given installation reduces as the layer number increases — so for example Trove is way less commonly deployed than Nova, because the set of people who want a managed database as a service is smaller than the set of people who just want to be able to boot instances.

Now, I’m not sure I agree with the compute-centric nature of the two layers proposals mentioned so far. I see people installing just Swift to solve a storage problem, and I think that’s a completely valid use of OpenStack and should be supported as a first-class citizen. On the other hand, resolving my concern within the layers model is trivial — we just move Swift to layer 1.

What do layers give us?

Sean makes a good point about the complexity of OpenStack installs and how we scare away new users. I agree completely — we show people our architecture diagrams which are deliberately confusing, and then we wonder why they’re not impressed. I think we do it because we’re proud of the scope of the thing we’ve built, but I think our audiences walk away thinking that we don’t really know what problem we’re trying to solve. Do I really need to deploy Horizon to have working compute? No, of course not, but our architecture diagrams don’t make that obvious. I gave a talk along these lines at pyconau, and I think as a community we need to be better at explaining to people what we’re trying to do, while remembering that not everyone is as excited about writing a whole heap of cloud infrastructure code as we are. This is also why the OpenStack miniconf at linux.conf.au 2015 has pivoted from being a generic OpenStack chatfest to being something more solidly focussed on issues of interest to deployers — we’re just not great at talking to our users and we need to reboot the conversation at community conferences until it’s something which meets their needs.

[Image caption: “We intend this diagram to amaze and confuse our victims”]

Agreeing on a set of layers gives us a framework within which to describe OpenStack to our users. It lets us communicate the services we think are basic and always required, versus those which are icing on the cake. It also lets us explain the dependencies between projects better, and that helps deployers work out what order to deploy things in.

Do layers help us work out what OpenStack should focus on?

Sean’s blog post then pivots and starts talking about the size of the OpenStack ecosystem — or the “size of our tent” as he phrases it. While I agree that we need to shrink the number of projects we’re working on at the moment, I feel that the blog post is missing a logical link between the previous layers discussion and the tent size conundrum. It feels to me that Sean wanted to propose that OpenStack focus on a specific set of layers, but didn’t quite get there for whatever reason.

Next Monty Taylor had a go at furthering this conversation with his own blog post on the topic. Monty starts by making a very important point — he (like all involved) wants the OpenStack community to be as inclusive as possible. I want lots of interesting people at the design summits, even if they don’t work directly on projects that OpenStack ships. You can be a part of the OpenStack community without having our logo on your product.

A concrete example of including non-OpenStack projects in our wider community was visible at the Atlanta summit — I know for a fact that there were software engineers at the summit who work on Google Compute Engine. I know this because I used to work with them at Google when I was an SRE there. I have no problem with people working on competing products being at our summits, as long as they are there to contribute meaningfully in the sessions, and not just take from us. It needs to be a two-way street. Another concrete example is Ceph. I think Ceph is cool, and I’m completely fine with people using it as part of their OpenStack deploy. What upsets me is when people conflate Ceph with OpenStack. They are different. They’re separate. And that is fine. Let’s just not confuse people by saying Ceph is part of the OpenStack project — it simply isn’t, because it doesn’t fall under our governance model. Ceph is still a valued member of our community and more than welcome at our summits.

Do layers help us work out what to focus OpenStack on for now? I think they do. Should we simply say that we’re only going to work on a single layer? Absolutely not. What we’ve tried to do up until now is have OpenStack be a single big thing, what we call “the integrated release”. I think the layers model gives us a tool to find logical ways to break that thing up. Perhaps we need a smaller integrated release, but then continue with the other projects on their own release cycles? Or perhaps they release at the same time, but we don’t block the release of a layer 1 service on the basis of release-critical bugs in a layer 4 service?

Is there consensus on what sits in each layer?

Looking at the posts I can find on this topic so far, I’d have to say the answer is no. We’re close, but we’re not aligned yet. For example, one proposal tweaks the previously proposed layer model by moving Cinder, Designate and Neutron down into layer 1 (basic services). The author argues that this is because stateless cloud isn’t particularly useful to users of OpenStack. However, I think this is wrong to be honest. I can see that stateless cloud isn’t super useful by itself, but that argument assumes OpenStack is the only piece of infrastructure that a given organization has. Perhaps that’s true for the public cloud case, but the vast majority of OpenStack deployments at this point are private clouds. So, you’re an existing IT organization and you’re deploying OpenStack to increase the level of flexibility in compute resources. You don’t need to deploy Cinder or Designate to do that. Let’s take the storage case for a second — our hypothetical IT organization probably already has some form of storage — a SAN, or NFS appliances, or something like that. So stateful cloud is easy for them — they just have their instances mount resources from those existing storage pools like they would any other machine. Eventually they’ll decide that hand-managing that is horrible and move to Cinder, but that’s probably later, once they’ve gotten through the initial baby step of deploying Nova, Glance and Keystone.

The first step to using layers to decide what we should focus on is to decide what is in each layer. I think the conversation needs to revolve around that for now, because if we drift off into whether existing in a given layer means you’re voted off the OpenStack island, then we’ll never even come up with a set of agreed layers.

Let’s ignore tents for now

The size of the OpenStack “tent” is the metaphor being used at the moment for working out what to include in OpenStack. As I say above, I think we need to reach agreement on what is in each layer before we can move on to that very important conversation.

Conclusion

Given the focus of this post is the layers model, I want to stop introducing new concepts here for now. Instead let me summarize where I stand so far — I think the layers model is useful. I also think the layers should be an inverted pyramid — layer 1 should be as small as possible for example. This is because of the dependency model that the layers model proposes — it is important to keep the list of things that a layer 2 service must use as small and coherent as possible. Another reason to keep the lower layers as small as possible is because each layer represents the smallest possible increment of an OpenStack deployment that we think is reasonable. We believe it is currently reasonable to deploy Nova without Cinder or Neutron for example.

Most importantly of all, having those incremental stages of OpenStack deployment gives us a framework we have been missing in talking to our deployers and users. It makes OpenStack less confusing to outsiders, as it gives them bite-sized morsels to consume one at a time.

So here are the layers as I see them for now:

  • layer 0: operating system, and Oslo
  • layer 1: basic services — Keystone, Glance, Nova, and Swift
  • layer 2: extended basics — Neutron, Cinder, and Ironic
  • layer 3: optional services — Horizon, and Ceilometer
  • layer 4: application services — Heat, Trove, Designate, and Zaqar

I am not saying that everything inside a single layer is required to be deployed simultaneously, but I do think it’s reasonable for Ceilometer to assume that Swift is installed and functioning. The big difference here between my view of layers and that of Dean, Sean and Monty is that I think that Swift is a layer 1 service — it provides basic functionality that may be assumed to exist by services above it in the model.

I believe that when projects come to the Technical Committee requesting incubation or integration, they should specify what layer they see their project sitting at, and the bar for justifying a lower layer number should be higher than the bar for a higher layer. So for example, we should be reasonably willing to accept proposals at layer 4, whilst we should be super concerned about the implications of adding another project at layer 1.

In the next post in this series I’ll try to address the size of the OpenStack “tent”, and what projects we should be focussing on.

My candidacy for Kilo Compute PTL

This is mostly historical at this point, but I forgot to post it here when I emailed it a week or so ago. So, for future reference:

I'd like another term as Compute PTL, if you'll have me.

We live in interesting times. openstack has clearly gained a large
amount of mind share in the open cloud marketplace, with Nova being a
very commonly deployed component. Yet, we don't have a fantastic
container solution, which is our biggest feature gap at this point.
Worse -- we have a code base with a huge number of bugs filed against
it, an unreliable gate because of subtle bugs in our code and
interactions with other openstack code, and have a continued need to
add features to stay relevant. These are hard problems to solve.

Interestingly, I think the solution to these problems calls for a
social approach, much like I argued for in my Juno PTL candidacy
email. The problems we face aren't purely technical -- we need to work
out how to pay down our technical debt without blocking all new
features. We also need to ask for understanding and patience from
those feature authors as we try and improve the foundation they are
building on.

The specifications process we used in Juno helped with these problems,
but one of the things we've learned from the experiment is that we
don't require specifications for all changes. Let's take an approach
where trivial changes (no API changes, only one review to implement)
don't require a specification. There will of course sometimes be
variations on that rule if we discover something, but it means that
many micro-features will be unblocked.

In terms of technical debt, I don't personally believe that pulling
all hypervisor drivers out of Nova fixes the problems we face, it just
moves the technical debt to a different repository. However, we
clearly need to discuss the way forward at the summit, and come up
with some sort of plan. If we do something like this, then I am not
sure that the hypervisor driver interface is the right place to do
that work -- I'd rather see something closer to the hypervisor itself
so that the Nova business logic stays with Nova.

Kilo is also the release where we need to get the v2.1 API work done
now that we finally have a shared vision for how to progress. It took
us a long time to get to a good shared vision there, so we need to
ensure that we see that work through to the end.

We live in interesting times, but they're also exciting as well.

I have since been elected unopposed, so thanks for that!