How are we going with Nova Kilo specs after our review day?

Time for another summary, I think, because announcing the review day seems to have caused a rush of new specs to be filed (which wasn’t really my intention, but hey). We approved a fair few specs on the review day, so overall I think it was a success. Here’s an updated summary of the state of play:

API

API (EC2)

  • Expand support for volume filtering in the EC2 API: review 104450.
  • Implement tags for volumes and snapshots with the EC2 API: review 126553 (fast tracked, approved).

Administrative

  • Actively hunt for orphan instances and remove them: review 137996 (abandoned); review 138627.
  • Check that a service isn’t running before deleting it: review 131633.
  • Enable the nova metadata cache to be a shared resource to improve the hit rate: review 126705 (abandoned).
  • Implement a daemon version of rootwrap: review 105404.
  • Log request id mappings: review 132819 (fast tracked).
  • Monitor the health of hypervisor hosts: review 137768.
  • Remove the assumption that there is a single endpoint for services that nova talks to: review 132623.

Block Storage

  • Allow direct access to LVM volumes if supported by Cinder: review 127318.
  • Cache data from volumes on local disk: review 138292 (abandoned); review 138619.
  • Enhance iSCSI volume multipath support: review 134299.
  • Failover to alternative iSCSI portals on login failure: review 137468.
  • Give additional info in BDM when source type is “blank”: review 140133.
  • Implement support for a DRBD driver for Cinder block device access: review 134153.
  • Refactor ISCSIDriver to support other iSCSI transports besides TCP: review 130721 (approved).
  • StorPool volume attachment support: review 115716.
  • Support Cinder Volume Multi-attach: review 139580 (approved).
  • Support iSCSI live migration for different iSCSI target: review 132323 (approved).

Cells

Containers Service

Database

Hypervisor: Docker

Hypervisor: FreeBSD

  • Implement support for FreeBSD networking in nova-network: review 127827.

Hypervisor: Hyper-V

Hypervisor: Ironic

Hypervisor: VMware

  • Add ephemeral disk support to the VMware driver: review 126527 (fast tracked, approved).
  • Add support for the HTML5 console: review 127283.
  • Allow Nova to access a VMware image store over NFS: review 126866.
  • Enable administrators and tenants to take advantage of backend storage policies: review 126547 (fast tracked, approved).
  • Enable the mapping of raw cinder devices to instances: review 128697.
  • Implement vSAN support: review 128600 (fast tracked, approved).
  • Support multiple disks inside a single OVA file: review 128691.
  • Support the OVA image format: review 127054 (fast tracked, approved).

Hypervisor: libvirt

Instance features

Internal

  • A lock-free quota implementation: review 135296.
  • Automate the documentation of the virtual machine state transition graph: review 94835.
  • Fake Libvirt driver for simulating HW testing: review 139927 (abandoned).
  • Flatten Aggregate Metadata in the DB: review 134573 (abandoned).
  • Flatten Instance Metadata in the DB: review 134945 (abandoned).
  • Implement a new code coverage API extension: review 130855.
  • Move flavor data out of the system_metadata table in the SQL database: review 126620 (approved).
  • Move to polling for cinder operations: review 135367.
  • PCI test cases for third party CI: review 141270.
  • Transition Nova to using the Glance v2 API: review 84887.
  • Transition to using glanceclient instead of our own home grown wrapper: review 133485 (approved).

Internationalization

  • Enable lazy translations of strings: review 126717 (fast tracked).

Networking

Performance

  • Dynamically alter the interval at which nova polls components, based on load and the expected time for an operation to complete: review 122705.

Scheduler

  • A nested quota driver API: review 129420.
  • Add a filter to take into account hypervisor type and version when scheduling: review 137714.
  • Add an IOPS weigher: review 127123 (approved, implemented); review 132614.
  • Add instance count on the hypervisor as a weight: review 127871 (abandoned).
  • Allow extra spec to match all values in a list by adding the ALL-IN operator: review 138698 (fast tracked, approved).
  • Allow limiting the flavors that can be scheduled on certain host aggregates: review 122530 (abandoned).
  • Allow the removal of servers from server groups: review 136487.
  • Convert get_available_resources to use an object instead of dict: review 133728 (abandoned).
  • Convert the resource tracker to objects: review 128964 (fast tracked, approved).
  • Create an object model to represent a request to boot an instance: review 127610 (approved).
  • Decouple services and compute nodes in the SQL database: review 126895 (approved).
  • Enable adding new scheduler hints to already booted instances: review 134746.
  • Fix the race conditions when migrating with server groups: review 135527 (abandoned).
  • Implement resource objects in the resource tracker: review 127609.
  • Improve the ComputeCapabilities filter: review 133534.
  • Isolate Scheduler DB for Filters: review 138444.
  • Isolate the scheduler’s use of the Nova SQL database: review 89893.
  • Let schedulers reuse filter and weigher objects: review 134506 (abandoned).
  • Move select_destinations() to using a request object: review 127612 (approved).
  • Persist scheduler hints: review 88983.
  • Refactor allocate_for_instance: review 141129.
  • Stop direct lookup for host aggregates in the Nova database: review 132065 (abandoned).
  • Stop direct lookup for instance groups in the Nova database: review 131553 (abandoned).
  • Support scheduling based on more image properties: review 138937.
  • Trusted computing support: review 133106.

Scheduling

Security

  • Make key manager interface interoperable with Barbican: review 140144 (fast tracked, approved).
  • Provide a reference implementation for console proxies that uses TLS: review 126958 (fast tracked, approved).
  • Strongly validate the tenant and user for quota consuming requests with keystone: review 92507.

Service Groups

Scheduler

  • Add soft affinity support for server group: review 140017 (approved).

Specs for Kilo, an update

We’re now a few weeks away from the kilo-1 milestone, so I thought it was time to update my summary of the Nova specifications that have been proposed so far. So here we go…

API

API (EC2)

  • Expand support for volume filtering in the EC2 API: review 104450.
  • Implement tags for volumes and snapshots with the EC2 API: review 126553 (fast tracked, approved).

Administrative

  • Check that a service isn’t running before deleting it: review 131633.
  • Enable the nova metadata cache to be a shared resource to improve the hit rate: review 126705 (abandoned).
  • Enforce instance uuid uniqueness in the SQL database: review 128097 (fast tracked, approved).
  • Implement a daemon version of rootwrap: review 105404.
  • Log request id mappings: review 132819 (fast tracked).
  • Monitor the health of hypervisor hosts: review 137768.
  • Remove the assumption that there is a single endpoint for services that nova talks to: review 132623.

Cells

Containers Service

Database

Hypervisor: Docker

Hypervisor: FreeBSD

  • Implement support for FreeBSD networking in nova-network: review 127827.

Hypervisor: Hyper-V

  • Allow volumes to be stored on SMB shares instead of just iSCSI: review 102190 (approved).

Hypervisor: Ironic

Hypervisor: VMware

  • Add ephemeral disk support to the VMware driver: review 126527 (fast tracked, approved).
  • Add support for the HTML5 console: review 127283.
  • Allow Nova to access a VMware image store over NFS: review 126866.
  • Enable administrators and tenants to take advantage of backend storage policies: review 126547 (fast tracked, approved).
  • Enable the mapping of raw cinder devices to instances: review 128697.
  • Implement vSAN support: review 128600 (fast tracked, approved).
  • Support multiple disks inside a single OVA file: review 128691.
  • Support the OVA image format: review 127054 (fast tracked, approved).

Hypervisor: libvirt

Instance features

Internal

  • A lock-free quota implementation: review 135296.
  • Automate the documentation of the virtual machine state transition graph: review 94835.
  • Flatten Aggregate Metadata in the DB: review 134573.
  • Flatten Instance Metadata in the DB: review 134945.
  • Implement a new code coverage API extension: review 130855.
  • Move flavor data out of the system_metadata table in the SQL database: review 126620 (approved).
  • Move to polling for cinder operations: review 135367.
  • Transition Nova to using the Glance v2 API: review 84887.
  • Transition to using glanceclient instead of our own home grown wrapper: review 133485.

Internationalization

  • Enable lazy translations of strings: review 126717 (fast tracked).

Networking

Performance

  • Dynamically alter the interval at which nova polls components, based on load and the expected time for an operation to complete: review 122705.

Scheduler

  • Add a filter to take into account hypervisor type and version when scheduling: review 137714.
  • Add an IOPS weigher: review 127123 (approved, implemented); review 132614.
  • Add instance count on the hypervisor as a weight: review 127871 (abandoned).
  • Allow limiting the flavors that can be scheduled on certain host aggregates: review 122530 (abandoned).
  • Allow the removal of servers from server groups: review 136487.
  • Convert get_available_resources to use an object instead of dict: review 133728.
  • Convert the resource tracker to objects: review 128964 (fast tracked, approved).
  • Create an object model to represent a request to boot an instance: review 127610.
  • Decouple services and compute nodes in the SQL database: review 126895 (approved).
  • Enable adding new scheduler hints to already booted instances: review 134746.
  • Fix the race conditions when migrating with server groups: review 135527 (abandoned).
  • Implement resource objects in the resource tracker: review 127609.
  • Improve the ComputeCapabilities filter: review 133534.
  • Isolate the scheduler’s use of the Nova SQL database: review 89893.
  • Let schedulers reuse filter and weigher objects: review 134506 (abandoned).
  • Move select_destinations() to using a request object: review 127612.
  • Persist scheduler hints: review 88983.
  • Stop direct lookup for host aggregates in the Nova database: review 132065 (abandoned).
  • Stop direct lookup for instance groups in the Nova database: review 131553.

Security

  • Provide a reference implementation for console proxies that uses TLS: review 126958 (fast tracked, approved).
  • Strongly validate the tenant and user for quota consuming requests with keystone: review 92507.

Storage

  • Allow direct access to LVM volumes if supported by Cinder: review 127318.
  • Enhance iSCSI volume multipath support: review 134299.
  • Failover to alternative iSCSI portals on login failure: review 137468.
  • Implement support for a DRBD driver for Cinder block device access: review 134153.
  • Refactor ISCSIDriver to support other iSCSI transports besides TCP: review 130721.
  • StorPool volume attachment support: review 115716.
  • Support iSCSI live migration for different iSCSI target: review 132323 (approved).

Specs for Kilo

Here’s an updated list of the specs currently proposed for Kilo. I wanted to produce this before I start travelling for the summit in the next couple of days because I think many of these will be required reading for the Nova track at the summit.

API

  • Add instance administrative lock status to the instance detail results: review 127139 (abandoned).
  • Add more detailed network information to the metadata server: review 85673.
  • Add separated policy rule for each v2.1 api: review 127863.
  • Add user limits to the limits API (as well as project limits): review 127094.
  • Allow all printable characters in resource names: review 126696.
  • Expose the lock status of an instance as a queryable item: review 85928 (approved).
  • Implement instance tagging: review 127281 (fast tracked, approved).
  • Implement tags for volumes and snapshots with the EC2 API: review 126553 (fast tracked, approved).
  • Implement the v2.1 API: review 126452 (fast tracked, approved).
  • Microversion support: review 127127.
  • Move policy validation to just the API layer: review 127160.
  • Provide a policy statement on the goals of our API policies: review 128560.
  • Support X509 keypairs: review 105034.

Administrative

  • Enable the nova metadata cache to be a shared resource to improve the hit rate: review 126705 (abandoned).
  • Enforce instance uuid uniqueness in the SQL database: review 128097 (fast tracked, approved).

Containers Service

Hypervisor: Docker

Hypervisor: FreeBSD

  • Implement support for FreeBSD networking in nova-network: review 127827.

Hypervisor: Hyper-V

  • Allow volumes to be stored on SMB shares instead of just iSCSI: review 102190 (approved).

Hypervisor: Ironic

Hypervisor: VMware

  • Add ephemeral disk support to the VMware driver: review 126527 (fast tracked, approved).
  • Add support for the HTML5 console: review 127283.
  • Allow Nova to access a VMware image store over NFS: review 126866.
  • Enable administrators and tenants to take advantage of backend storage policies: review 126547 (fast tracked, approved).
  • Enable the mapping of raw cinder devices to instances: review 128697.
  • Implement vSAN support: review 128600 (fast tracked, approved).
  • Support multiple disks inside a single OVA file: review 128691.
  • Support the OVA image format: review 127054 (fast tracked, approved).

Hypervisor: libvirt

Instance features

Internal

  • Move flavor data out of the system_metadata table in the SQL database: review 126620 (approved).
  • Transition Nova to using the Glance v2 API: review 84887.

Internationalization

  • Enable lazy translations of strings: review 126717 (fast tracked).

Performance

  • Dynamically alter the interval at which nova polls components, based on load and the expected time for an operation to complete: review 122705.

Scheduler

  • Add an IOPS weigher: review 127123 (approved).
  • Add instance count on the hypervisor as a weight: review 127871 (abandoned).
  • Allow limiting the flavors that can be scheduled on certain host aggregates: review 122530 (abandoned).
  • Convert the resource tracker to objects: review 128964 (fast tracked, approved).
  • Create an object model to represent a request to boot an instance: review 127610.
  • Decouple services and compute nodes in the SQL database: review 126895.
  • Implement resource objects in the resource tracker: review 127609.
  • Isolate the scheduler’s use of the Nova SQL database: review 89893.
  • Move select_destinations() to using a request object: review 127612.

Security

  • Provide a reference implementation for console proxies that uses TLS: review 126958 (fast tracked).
  • Strongly validate the tenant and user for quota consuming requests with keystone: review 92507.

Juno nova mid-cycle meetup summary: slots

If I had to guess what would be a controversial topic from the mid-cycle meetup, it would have to be this slots proposal. I was actually in a Technical Committee meeting when this proposal was first made, but I’m told there were plenty of people in the room keen to give this idea a try. Since the mid-cycle Joe Gordon has written up a more formal proposal, which can be found at https://review.openstack.org/#/c/112733.

If you look at the last few Nova releases, core reviewers have been drowning under code reviews, so we need to control the review workload. What currently happens is that everyone pushes their work up to Gerrit, and then each core tries to identify the important things and review them. There is a list of prioritized blueprints in Launchpad, but it is not used much as a way of determining what to review. The result is that there are hundreds of reviews outstanding for Nova (500 when I wrote this post). Many of these will get a review, but it is hard for authors to get two cores to pay attention to a review long enough for it to be approved and merged.

If we could rate limit the number of proposed reviews in Gerrit, then cores would be able to focus their attention on the smaller number of outstanding reviews, and land more code. Because each review would merge faster, we believe this rate limiting would help us land more code rather than less, as our workload would be better managed. You could argue that this will mean we just say ‘no’ more often, but that’s not the intent. It’s more about bringing focus to what we’re reviewing, so that we can get patches through the process completely. There’s nothing more frustrating to a code author than having one +2 on their code and then hitting some merge freeze deadline.

The proposal is therefore to designate a number of blueprints that can be under review at any one time. The initial proposal was for ten, and the term ‘slot’ was coined to describe the available review capacity. If your blueprint was not allocated a slot, then it would either not be proposed in Gerrit yet, or if it was it would have a procedural -2 on it (much like code reviews associated with unapproved specifications do now).

The number of slots is arbitrary at this point. Ten is our best guess of how much we can dilute cores’ focus without losing efficiency. We would tweak the number as we gained experience if we went ahead with this proposal. Remember, too, that a slot isn’t always a single code review. If the VMware refactor was in a slot, for example, we might find that there were ten code reviews associated with that single slot.

How do you determine what occupies a review slot? The proposal is to groom the list of approved specifications more carefully. We would collaboratively produce a ranked list of blueprints in the order of their importance to Nova and OpenStack overall. As slots become available, the next highest ranked blueprint with code ready for review would be moved into one of the review slots. A blueprint would be considered ‘ready for review’ once the specification is merged, and the code is complete and ready for intensive code review.

What happens if code is in a slot and something goes wrong? Imagine if a proposer goes on vacation and stops responding to review comments. If that happened we would bump the code out of the slot, but would put it back on the backlog in the location dictated by its priority. In other words there is no penalty for being bumped, you just need to wait for a slot to reappear when you’re available again.
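
To make the mechanics above a little more concrete, here is a minimal sketch of how a ranked backlog and a fixed number of slots might interact. This is purely illustrative and not part of the proposal: the `Blueprint` and `SlotBoard` classes, their method names, and the example blueprint names are all hypothetical, and I’m assuming that a lower rank number means a more important blueprint.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Blueprint:
    """A hypothetical blueprint; only `rank` takes part in ordering."""
    rank: int                                   # assumption: lower = more important
    name: str = field(compare=False)
    ready: bool = field(default=True, compare=False)


class SlotBoard:
    """Tracks a ranked backlog plus a fixed number of review slots."""

    def __init__(self, num_slots=10):
        self.num_slots = num_slots
        self.backlog = []    # heap of Blueprints, ordered by rank
        self.slots = []      # blueprints currently under review

    def propose(self, blueprint):
        heapq.heappush(self.backlog, blueprint)

    def fill_slots(self):
        """Move the highest ranked *ready* blueprints into free slots."""
        not_ready = []
        while len(self.slots) < self.num_slots and self.backlog:
            blueprint = heapq.heappop(self.backlog)
            if blueprint.ready:
                self.slots.append(blueprint)
            else:
                not_ready.append(blueprint)
        # Blueprints that weren't ready keep their place in the ranking.
        for blueprint in not_ready:
            heapq.heappush(self.backlog, blueprint)

    def _take_from_slots(self, name):
        for index, blueprint in enumerate(self.slots):
            if blueprint.name == name:
                return self.slots.pop(index)
        raise KeyError(name)

    def bump(self, name):
        """Free a slot; the blueprint returns to the backlog at its old rank."""
        blueprint = self._take_from_slots(name)
        blueprint.ready = False
        heapq.heappush(self.backlog, blueprint)

    def merge(self, name):
        """The code landed, so the slot simply becomes available."""
        self._take_from_slots(name)

    def mark_ready(self, name):
        for blueprint in self.backlog:
            if blueprint.name == name:
                blueprint.ready = True
                return
        raise KeyError(name)

    def in_review(self):
        return [blueprint.name for blueprint in self.slots]


board = SlotBoard(num_slots=2)
board.propose(Blueprint(1, "vmware-refactor"))
board.propose(Blueprint(2, "scheduler-objects"))
board.propose(Blueprint(3, "ec2-tags"))
board.fill_slots()
print(board.in_review())

board.bump("vmware-refactor")    # the proposer went on vacation
board.fill_slots()
print(board.in_review())
```

The point of the sketch is that bumping changes nothing about a blueprint’s rank. Once the proposer is back and the blueprint is marked ready again, it is first in line for the next free slot.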

We also talked about whether we were requiring specifications for changes which are too simple. If something is relatively uncontroversial and simple (a better tag for internationalization for example), but not a bug, it falls through the cracks of our process at the moment and ends up needing to have a specification written. There was talk of finding another way to track this work. I’m not sure I agree with this part, because a trivial specification is a relatively cheap thing to do. However, it’s something I’m happy to talk about.

We also know that Nova needs to spend more time paying down its accrued technical debt, which you can see in the huge number of bugs we have outstanding at the moment. There is no shortage of people willing to write code for Nova, but there is a shortage of people fixing bugs and working on strategic things instead of new features. If we could reserve slots for technical debt, then it would help us to get people to work on those aspects, because they wouldn’t spend time on a less interesting problem and then discover they can’t even get their code reviewed. We even talked about having an alternating focus for Nova releases; we could have a release focused on paying down technical debt and stability, and then the next release focused on new features. The Linux kernel does something quite similar to this and it seems to work well for them.

Using slots would allow us to land more valuable code faster. Of course, it also means that some patches will get dropped on the floor, but if the system is working properly, those features will be ones that aren’t important to OpenStack. Considering that right now we’re not landing many features at all, this would be an improvement.

This proposal is obviously complicated, and everyone will have an opinion. We haven’t fully thought through all the mechanics yet, and it’s certainly not a done deal at this point. The ranking process seems to be the most contentious point. We could encourage the community to help us rank things by priority, but it’s not clear how that process would work. Regardless, I feel like we need to be more systematic about what code we’re trying to land. It’s embarrassing how little has landed in Juno for Nova, and we need to be working on that. I would like to continue discussing this as a community to make sure that we end up with something that works well and that everyone is happy with.

This series is nearly done, but in the next post I’ll cover the current status of the nova-network to neutron upgrade path.
