Call for presentations for the linux.conf.au 2014 OpenStack mini-conference

I’ve just emailed this out to the relevant lists, but I figured it can’t hurt to post it here as well…

linux.conf.au will be hosting the second OpenStack mini-conference to
run in Australia. The first one was well attended, and this
mini-conference will be the first OpenStack conference to be held on
Australia’s west coast. The mini-conference is a day long event
focusing on OpenStack development and operations, and is available to
attendees of linux.conf.au.

The mini-conference is therefore calling for proposals for content.
Speakers at the mini-conference must be registered for linux.conf.au
2014 as delegates, or discuss their needs with the mini-conference
organizers if that isn’t possible.

Some examples of talks we’re interested in are: talks from OpenStack
developers about what features they are working on for IceHouse; talks
from deployers of OpenStack about their experiences and how others can
learn from them; talks covering the functionality of OpenStack and how
it can be used in new and interesting ways.

Some important details:

  • linux.conf.au runs from 6 to 10 January 2014 in Perth, Australia at
    the University of Western Australia
  • the mini-conference will be on Tuesday the 7th of January
  • proposals are due to the mini-conference organizer no later than 1 November
  • there are two types of talks: full length (45 minutes) and half
    length (20 minutes)

CFP submissions are made by completing this online form:
CFP submission form

If you have questions about this call for presentations, please
contact Michael Still at openstack-lca2014@lists.stillhq.com for more
details.

Exploring a single database migration

Yesterday I was having some troubles with the downgrade step of a database migration, and Joshua Hesketh suggested I step through the migrations one at a time and see what they were doing to my sqlite test database. That’s a great idea, but it wasn’t immediately obvious to me how to do it. Now that I’ve figured out the steps required, I thought I’d document them here.

First off we need a test environment. I’m hacking on nova at the moment, and tend to build throwaway test environments in the cloud because it’s cheap and easy. So, I created a new Ubuntu 12.04 server instance in Rackspace’s Sydney data center, and then configured it like this:

    $ sudo apt-get update
    $ sudo apt-get install -y git python-pip git-review libxml2-dev libxml2-utils \
        libxslt-dev libmysqlclient-dev pep8 postgresql-server-dev-9.1 python2.7-dev \
        python-coverage python-netaddr python-mysqldb python-git virtualenvwrapper \
        python-numpy sqlite3
    $ source /etc/bash_completion.d/virtualenvwrapper
    $ mkvirtualenv migrate_204
    $ toggleglobalsitepackages
    


Simple! I should note here that we probably don’t need the virtualenv because this machine is disposable, but it’s still a good habit to be in. Now I need to fetch the code I am testing. In this case it’s from my personal fork of nova, and the git location to fetch will obviously change for other people:

    $ git clone http://github.com/mikalstill/nova
    

Now I can install the code under test. This will pull in a bunch of pip dependencies as well, so it takes a little while:

    $ cd nova
    $ python setup.py develop
    

Next we have to configure nova, because nova-manage needs to know which database to operate on before we can install specific schema versions.

    $ sudo mkdir /etc/nova
    $ sudo vim /etc/nova/nova.conf
    $ sudo chmod -R ugo+rx /etc/nova
    

The contents of my nova.conf look like this:

    $ cat /etc/nova/nova.conf
    [DEFAULT]
    sql_connection = sqlite:////tmp/foo.sqlite
    

Now I can step up to the version before the one I am testing:

    $ nova-manage db sync --version 203
    

You do the same thing with a different version number to step to another point in the migration history. It’s also pretty easy to get the schema for a table under sqlite. I just do this:

    $ sqlite3 /tmp/foo.sqlite
    SQLite version 3.7.9 2011-11-01 00:52:41
    Enter ".help" for instructions
    Enter SQL statements terminated with a ";"
    sqlite> .schema instances
    CREATE TABLE "instances" (
            created_at DATETIME,
            updated_at DATETIME,
    [...]
    

So there you go.
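
If you want to automate that stepping, a loop like the one below is handy. This is just a sketch: it assumes the nova.conf and sqlite path from above, and the migration version numbers and output paths are only examples.

    # start from a clean test database and walk through a range of migrations
    rm -f /tmp/foo.sqlite
    for version in 200 201 202 203 204; do
        nova-manage db sync --version $version

        # snapshot the instances table schema at this point in history
        sqlite3 /tmp/foo.sqlite ".schema instances" > /tmp/instances_$version.sql
    done

    # see what the migration from 203 to 204 actually changed
    diff -u /tmp/instances_203.sql /tmp/instances_204.sql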

Disclaimer: I wouldn’t recommend upgrading to a specific version like this for real deployments, because the models in the code base won’t match the tables. If you wanted to do that you’d need to work out which git commit added the version after the one you’ve installed, and then check out the commit before that one.
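
If you do need to find that commit, something like this works from inside a nova checkout. Migration 204 is just the example from this post, and the repository layout is nova’s at the time of writing:

    # find the commit that added migration 204, then check out its parent
    commit=$(git log --format=%H --diff-filter=A -- \
        nova/db/sqlalchemy/migrate_repo/versions/204_* | tail -1)
    git checkout "$commit^"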

Nova database continuous integration

I’ve had some opportunity recently to spend a little quality time offline, and I spent some of that time working on a side project I’ve wanted to do for a while: continuous integration testing of nova database migrations. The code isn’t perfect at the moment, but I think it’s an interesting direction to take and I will keep pursuing it.

One of the problems nova developers have is that we don’t have a good way of determining whether a database migration will be painful for deployers. We can eyeball code reviews, but whether or not the code looks reasonable, it’s still hard to predict how it will perform on real data. Continuous integration is the obvious solution: if we could test patch sets on real databases as part of the code review process, then reviewers would have more data about whether to approve a patch set. So I did that.

At the moment the CI implementation I’ve built isn’t posting to code reviews, but that’s because I want to be confident that the information it gathers is accurate before wasting other reviewers’ time. You can see results at openstack.stillhq.com/ci. For now, I am keeping an eye on the test results and posting manually to reviews when an error is found — that has happened twice so far.

The CI tests work by restoring a MySQL database to a known good state and upgrading that database from Folsom to Grizzly if needed. They then run the migrations already committed to trunk, followed by those in the proposed patch set. Timings for each step are reported: for example, with my biggest test database the upgrade from Folsom to Grizzly takes between about 7 and 9 minutes to run, which isn’t too bad. You can see an example log here.
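
To make that workflow concrete, here is a rough sketch of the shape of a single run. This is not the actual CI code: the database name, dataset path, checkout location and review number are placeholders, and it assumes nova was installed with setup.py develop so nova-manage tracks the checkout, with nova.conf pointing sql_connection at the restored MySQL database.

    #!/bin/bash
    # Rough sketch of one CI run; names and paths below are placeholders.

    # restore the sample dataset to a known good (Folsom era) state
    mysql -u root -e "DROP DATABASE IF EXISTS nova_ci; CREATE DATABASE nova_ci;"
    mysql -u root nova_ci < /srv/datasets/sample_folsom.sql

    cd ~/src/nova

    # upgrade the dataset to Grizzly, then to current trunk, timing each step
    git checkout stable/grizzly && time nova-manage db sync
    git checkout master && time nova-manage db sync

    # finally apply the migrations from the proposed patch set
    # (12345 is a placeholder review number)
    git review -d 12345 && time nova-manage db sync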

I’d be interested to know if anyone else has sample databases they’d like to see these checks run against. If so, reach out to me and we can make it happen.

Merged in Havana: fixed ip listing for single hosts

Nova has supported listing the fixed IPs for a single host for a while. Well, except for that time we broke it by removing the database call it used and not noticing. My change to fix that situation has just landed, so this should now work again. To list the fixed IPs used on a host, do something like:

    nova-manage fixed list hostname

I will propose a backport to Grizzly for this now.

Michael’s surprisingly unreliable predictions for the Havana Nova release

I should start out by saying that because OpenStack is an open source project, it is hard to know exactly what will land in Havana — the developers are volunteers, and sometimes things get in the way of them doing the work they intended. However, these are the notes I wrote up on the high points of the summit for me — I didn’t see all the same sessions as other nova developers, so hopefully others will pitch in with their notes as well.

Scheduler

The scheduler seems to be a point of planned work for a lot of people in this release, with talk about having more scheduling code in the common library, and of adding new filter types. There is definite interest in being able to schedule by methods we don’t currently support: things like rack or PDU diversity, or trying to co-locate a tenant’s machines. HP is also interested in being able to sell dedicated machines to tenants; in other words, they would guarantee that only one tenant’s instances appeared on a machine in return for a fee. At the moment this requires setting up a host aggregate for the tenant, as sketched below.
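
For reference, the host aggregate approach looks roughly like this today, assuming your release has the AggregateMultiTenancyIsolation scheduler filter available and enabled. The aggregate name, host name and tenant id are examples, and the exact novaclient arguments vary a little between versions.

    # make sure the filter is enabled in nova.conf on the scheduler, e.g.
    #   scheduler_default_filters = AggregateMultiTenancyIsolation,
    #       AvailabilityZoneFilter,RamFilter,ComputeFilter

    # create an aggregate for the dedicated hosts; note the id it returns
    nova aggregate-create dedicated-acme

    # add a compute node to it (here the aggregate id is 1)
    nova aggregate-add-host 1 compute01

    # only instances owned by this tenant will land on hosts in the
    # aggregate (the tenant id is an example)
    nova aggregate-set-metadata 1 filter_tenant_id=8c3a8f77a8cd4a2e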

Feeding additional data into scheduling decisions

There is also interest in being able to feed more scheduling information to the nova-scheduler. For example, ceilometer intends to start collecting monitoring data from nova-compute nodes, and perhaps it might inform nova-scheduler that a machine is running hot or has a degraded RAID array. This might also be the source of PDU or CRAC failure information which could affect scheduling decisions. These latter two examples are interesting because they are information it doesn’t make sense to get from the compute node; the correct location for this information is a data center wide system, not an individual machine. There is concern about nova-scheduler depending on other systems, so these updates from ceilometer will probably be advisory, with nova-scheduler degrading gracefully if they are not present or are stale.

Mothballing

This was almost instantly renamed to “shelving”, though “swallow / spew” was also considered. This is a request that Rackspace sees from customers: basically the ability to stop a virtual machine but keep the UUID, IP addresses and block device mapping associated with it. The proposal is to implement this as a snapshot of the machine plus a new machine state. The local disk files for the instance might get deleted if the resources are needed. To the user, unshelving would feel like a reboot of the instance.

This is of interest for workloads like “Black Friday” web servers. You could bring a whole bunch up, configure security groups, load balancers and the applications on the instances, and then shelve them. When you need them to handle load, you’d unshelve them, and once booted they would just start serving. Expect shelved instances to be cheaper than running instances, but not free, mostly because IP addresses are scarce. Restarting a shelved instance might take a while if the snapshot has to be fetched to a compute node. If you need more “instant on” bursting capacity, then just leave instances idling and pay full price.

Deferred instance file delete

This is a nice-to-have for shelving instances, but it is useful for other things as well. It is the ability to delay the deletion of instance files when an instance is torn down, which might end up being expressed as “keep these files for at least X days, unless you are tight on disk resources”. I can see other reasons this would be useful, for example helping support people rescue data from instances users tore down and now want back. It also defers the disk IO of deleting the files until it’s absolutely necessary. We could also perhaps detect times when the disks are “relatively idle” and use those to clean up file systems.

DNS in nova-network

Expect to see the current DNS driver removed, as no one uses it as best we can tell. It will be replaced with a simpler driver in nova-compute, and the recommendation that deployers use quantum DNS if possible.

Quantum

There is continued work on making quantum the default networking engine for nova. There are still some missing features, but the list of absolutely blocking features is getting smaller. A lot of discussion centered around how to live upgrade clouds from nova-network to quantum. This is not an easy problem, but smart people are looking at it. The solution might involve moving compute nodes over to quantum and then live migrating instances onto those compute nodes. However, we currently only support one network driver at a time in nova, so we will need to change some code here.

Long running periodic tasks

There will be a refactor of the periodic task code in nova this release to move periodic tasks which incur a lot of blocking IO into separate processes. These processes will be launched by nova-compute, and not be cron jobs or something like that. Most of the discussion was around how to do this safely (eventlet makes it exciting), which is nice in that it indicates some level of consensus that this is needed. The plan for now is to do this in nova-compute, but leave other nova components for later releases.

Libvirt changes

Libvirt is the compute driver I work on, so it’s the only one I want to comment on here. The other drivers are doing interesting things as well; I just don’t want to get details wrong by not understanding their efforts.

First off, there should be some work done on better console logging in Havana. At the moment we use an unbounded file on disk. This will hopefully become a Unix domain socket managing a ring buffer of some form. The Unix domain socket leaves the option open of later making this serial console interactive, but that’s not an immediate goal.

There was a lot of talk about LXC support, and how we need to support file system attachments as well as block devices. There is also some cleanup that can be done for the LXC support in the libvirt driver to make the code cleaner, but it is not clear who will work on this.

imagebackend.py will probably get refactored, in ways that don’t make a big difference to users but do make it easier to code against (and therefore more reliable). I’m including it here just because I’m excited about that refactor making this code easier to understand.

There was a lot of talk about live migration and the requirement for ssh between compute nodes. Operators don’t love that compute nodes can talk to each other; expect Havana to include some sort of on-demand ssh key management, and a later release to proxy that traffic through something like nova-conductor.

Incremental backups are of interest to deployers as well, but there is concern that glance needs more support for chains of images before we can do that.

Conclusion

The summit was fantastic once again, and the Foundation did an awesome job of hosting it. It was however a pretty tiring experience, and I’m sure I got some stuff here wrong, or missed things that others would consider important. It would be cool for other developers to write up summaries of what they saw at the summit as well.

Faster pip installs

Last week, with the help of the lovely openstack-infra people, I discovered that you can have a local cache of pip downloads. This speeds up rebuilding test environments when you need to jump between branches with different dependencies. It’s as simple as chucking something like:

    export PIP_DOWNLOAD_CACHE=~/cache/pip
    

…into your .bashrc or equivalent.
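
As a quick illustration, the first install in any virtualenv populates the cache, and later installs in freshly built virtualenvs reuse those downloads instead of hitting PyPI again. The requirements file path here is what nova used at the time of writing, so adjust it for your project:

    # make sure the cache directory exists
    mkdir -p ~/cache/pip

    # the first install downloads packages into the cache
    pip install -r tools/pip-requires

    # later installs in other virtualenvs reuse what is already here
    ls ~/cache/pip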

Merged in Havana: configurable iptables drop actions in nova

Launchpad bug 1013893 asked nicely if the drop action for iptables rules created by nova-network could be configured. The idea here is that you might want to do something other than a plain old drop, for example logging before dropping. This has now been implemented in Havana.

To configure the drop action, set the iptables_drop_action flag to the name of an already existing iptables target. Creating this target is not managed by nova, and you’ll need to do it on every compute node. When nova creates or deletes iptables rules on compute nodes it will now use this new target. There’s a bit of an upgrade problem here, in that this will stop nova from deleting rules which use the old hard-coded drop target. However, if an instance is torn down then all of its chains are torn down as well and the rules will be deleted correctly, so this is only a problem if a security group is changed while the instance is running.
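
As a concrete example, you could create a log-then-drop target on each compute node something like this. The chain name is just an example, not something nova mandates:

    # create a chain that logs and then drops, on every compute node
    sudo iptables -N nova-log-drop
    sudo iptables -A nova-log-drop -j LOG --log-prefix "nova dropped: "
    sudo iptables -A nova-log-drop -j DROP

    # then point nova-network at it in nova.conf
    echo "iptables_drop_action = nova-log-drop" | sudo tee -a /etc/nova/nova.conf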

It occurs to me that we can do better here, so I’ve sent off this review to handle the case where a rule being removed used the default drop action.

To keep things simple and safe, I would recommend only using this flag on new compute nodes that have no instances running.

Upgrade problems with the new Fixed IP quota

In the last few weeks a new quota has been added to Nova covering fixed IPs. This was done in response to Launchpad bug 1125468, which was disclosed as CVE-2013-1838.

To be honest I think there are some things the vulnerability management team learned the hard way with this disclosure. For example, we didn’t realize that we needed to update python-novaclient to allow users to set the quota, or that adding a quota would require changes in Horizon. Both of these errors have been corrected.

More importantly, the default value of the new quota was set to 10. I made this decision based on the default value of the instances quota, coupled with a desire to protect deployments from denial of service. However, this decision, combined with a failure to explicitly call out the new quota in the release notes for the Folsom stable release, has resulted in some deployers experiencing upgrade problems. This was drawn to our attention by Launchpad bug 1161190.

We have therefore moved to set the default quota for fixed IPs to unlimited. If you want to protect yourself from a potential DoS, then you should seriously consider changing this default value in your deployment. This can be done with the quota_fixed_ips flag. The code reviews implementing this change are either merged or under review, depending on the release. At the time of writing, Havana and Grizzly have a fix merged, with Folsom and Essex still under review.
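
For example, something along these lines puts a cap back in place. The limit and tenant id are purely illustrative, and the --fixed-ips option needs a python-novaclient recent enough to know about the new quota:

    # set a deployment wide default in nova.conf (-1 is the new
    # unlimited default)
    echo "quota_fixed_ips = 128" | sudo tee -a /etc/nova/nova.conf

    # or override the limit for a single tenant (the tenant id is an example)
    nova quota-update 8c3a8f77a8cd4a2e --fixed-ips 128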

I think this experience also reinforces the importance of testing all upgrades in a lab environment before doing them in production.

Sorry for any inconvenience caused.

Havana Nova PTL elections

This is just a quick reminder that there are only a couple more days to vote in the Nova PTL elections for the Havana cycle. If you’re eligible to vote, you should have a voting URL in your email.

The candidates:

The incumbent PTL, Vishvananda Ishaya, has chosen not to run.

Rackspace is hiring OpenStack developers; let me know if you want to know more.

OpenStack at linux.conf.au 2013

As some of you might know, I’m the Director for linux.conf.au 2013. I’ve tried really hard to not use my powers for evil and make the entire conference about OpenStack — in fact I haven’t pulled rank and demanded that specific content be included at all. However, the level of interest in OpenStack has grown so much since LCA 2012 that there is now a significant amount of OpenStack content in the conference without me having to do any of that.

I thought I’d take a second to highlight some of the OpenStack content that I think is particularly interesting — these are the talks I’ll be going to if I have the time (which remains to be seen):

Monday

  • Cloud Infrastructure, Distributed Storage and High Availability Miniconf: while not specifically about OpenStack, this miniconf is going to be a good warm up for all things IaaS at the conference. Here’s a list of the talks for that miniconf:
    • Delivering IaaS with Apache CloudStack – Joe Brockmeier
    • oVirt – Dan Macpherson
    • Aeolus – Dan Macpherson
    • Ops: From bare metal to cloud space – Phil Ingram
    • VMs on VLANs on Bridges on Bonds on many NICs – Kim Hawtin
    • OpenStack Swift Overview – John Dickinson
    • JORN and the rise and fall of clustering – Jamie Birse
    • MongoDB Replication & Replica Sets – Stephen Steneker
    • MariaDB Galera Cluster – Grant Allen
    • The Grand Distributed Storage Debate: GlusterFS and Ceph going head to head – Florian Haas, Sage Weil, Jeff Darcy

Tuesday

  • The OpenStack Miniconf: this is a pretty clear winner for Tuesday. Tristan Goode has been doing a fantastic job of organizing this miniconf, which might not be obvious to people who haven’t been talking to him a couple of times a week about its progress like I have. I think people will be impressed with the program, which includes:
    • Welcome and Introduction – Tristan Goode
    • Introduction to OpenStack – Joshua McKenty
    • Demonstration – Sina Sadeghi
    • NeCTAR Research Cloud: OpenStack in Production – Tom Fifeld
    • Bare metal provisioning with OpenStack – Devananda van der Veen
    • Intro to Swift for New Contributors – John Dickinson
    • All-around OpenStack storage with Ceph – Florian Haas
    • Writing API extensions for Nova – Christopher Yeoh
    • The OpenStack Metering Project – Angus Salkeld
    • Lightweight PaaS on the NCI OpenStack Cloud – Kevin Pulo
    • Enabling Compute Clusters atop OpenStack – Enis Afgan
    • Shared Panel with Open Government
  • The Open Government Miniconf: this is the other OpenStack relevant miniconf on Tuesday. This might seem like a bit of a stretch, but as best I can tell there is massive interest in government at the moment in deploying cloud infrastructure, and now is the time to be convincing the decision makers that open clouds based on open source are the right way to go. OpenStack has a lot to offer in the private cloud space, and we as a community need to make sure that people are aware of the various options that are out there. This is why there is a shared panel at the end of the day with the OpenStack miniconf.

Wednesday

There aren’t any OpenStack talks on Wednesday, but I am really hoping that someone will propose an OpenStack BoF via the wiki. I’d sure go to a BoF.

Thursday

  • Playing with OpenStack Swift by John Dickinson
  • Ceph: Managing A Distributed Storage System At Scale by Sage Weil

Friday

  • Openstack on Openstack – a single management API for all your servers by Robert Collins
  • Heat: Orchestrating multiple cloud applications on OpenStack using templates by Angus Salkeld and Steve Baker
  • How OpenStack Improves Code Quality with Project Gating and Zuul by James Blair
  • Ceph: object storage, block storage, file system, replication, massive scalability, and then some! by Tim Serong and Florian Haas

So, if you’re interested in OpenStack and haven’t considered linux.conf.au 2013 as a conference you might be interested in, now would be a good time to reconsider before we sell out!
