Juno Nova PTL Candidacy – Made by Mikal

This is a repost of an email to the openstack-dev list, which is mostly here for historical reasons.
Hi.

I would like to run for the OpenStack Compute PTL position as well.

I have been an active nova developer since late 2011, and have been a
core reviewer for quite a while. I am currently serving on the
Technical Committee, where I have recently been spending my time
liaising with the board about how to define what software should be
able to use the OpenStack trade mark. I've also served on the
vulnerability management team, and as nova bug czar in the past.

I have extensive experience running Open Source community groups,
having served on the TC, been the Director for linux.conf.au 2013, as
well as serving on the boards of various community groups over the
years.

In Icehouse I hired a team of nine software engineers who are all
working 100% on OpenStack at Rackspace Australia, developed and
deployed the turbo hipster third party CI system along with Joshua
Hesketh, as well as writing nova code. I recognize that if I am
successful I will need to rearrange my work responsibilities, and my
management is supportive of that.

The future
--------------

To be honest, I've thought for a while that the PTL role in OpenStack
is poorly named. Specifically, its the T that bothers me. Sure, we
need strong technical direction for our programs, but putting it in
the title raises technical direction above the other aspects of the
job. Compute at the moment is in an interesting position -- we're
actually pretty good on technical direction and we're doing
interesting things. What we're not doing well on is the social aspects
of the PTL role.

When I first started hacking on nova I came from an operations
background where I hadn't written open source code in quite a while. I
feel like I'm reasonably smart, but nova was certainly the largest
python project I'd ever seen. I submitted my first patch, and it was
rejected -- as it should have been. However, Vishy then took the time
to sit down with me and chat about what needed to change, and how to
improve the patch. That's really why I'm still involved with
OpenStack, Vishy took an interest and was always happy to chat. I'm
told by others that they have had similar experiences.

I think that's what compute is lacking at the moment. For the last few
cycles we're focused on the technical, and now the social aspects are
our biggest problem. I think this is a pendulum, and perhaps in a
release or two we'll swing back to needing to re-emphasise on
technical aspects, but for now we're doing poorly on social things.
Some examples:

- we're not keeping up with code reviews because we're reviewing the
wrong things. We have a high volume of patches which are unlikely to
ever land, but we just reject them. So far in the Icehouse cycle we've
seen 2,334 patchsets proposed, of which we approved 1,233. Along the
way, we needed to review 11,747 revisions. We don't spend enough time
working with the proposers to improve the quality of their code so
that it will land. Specifically, whilst review comments in gerrit are
helpful, we need to identify up and coming contributors and help them
build a relationship with a mentor outside gerrit. We can reduce the
number of reviews we need to do by improving the quality of initial
proposals.

- we're not keeping up with bug triage, or worse actually closing
bugs. I think part of this is that people want to land their features,
but part of it is also that closing bugs is super frustrating at the
moment. It can take hours (or days) to replicate and then diagnose a
bug. You propose a fix, and then it takes weeks to get reviewed. I'd
like to see us tweak the code review process to prioritise bug fixes
over new features for the Juno cycle. We should still land features,
but we should obsessively track review latency for bug fixes. Compute
fails if we're not producing reliable production grade code.

- I'd like to see us focus more on consensus building. We're a team
after all, and when we argue about solely the technical aspects of a
problem we ignore the fact that we're teaching the people involved a
behaviour that will continue on. Ultimately if we're not a welcoming
project that people want to code on, we'll run out of developers. I
personally want to be working on compute in five years, and I want the
compute of the future to be a vibrant, friendly, supportive place. We
get there by modelling the behaviour we want to see in the future.

So, some specific actions I think we should take:

- when we reject a review from a relatively new contributor, we should
try and pair them up with a more experienced developer to get some
coaching. That experienced dev should take point on code reviews for
the new person so that they receive low-latency feedback as they
learn. Once the experienced dev is ok with a review, nova-core can
pile on to actually get the code approved. This will reduce the
workload for nova-core (we're only reviewing things which are of a
known good standard), while improving the experience for new
contributors.

- we should obsessively track review performance for bug fixes, and
prioritise them where possible. Let's not ignore features, but let's
agree that each core should spend at least 50% of their review time
reviewing bug fixes.

- we should work on consensus building, and tracking the progress of
large blueprints. We should not wait until the end of the cycle to
re-assess the v3 API and discover we have concerns. We should be
talking about progress in the weekly meetings and making sure we're
all on the same page. Let's reduce the level of surprise. This also
flows into being clearer about the types of patches we don't want to
see proposed -- for example, if we think that patches that only change
whitespace are a bad idea, then let's document that somewhere so
people know before they put a lot of effort in.

Thanks for taking the time to read this email!