Complexity Arrangements for Sustained Innovation: Lessons From 3M Corporation

This is the second business paper I’ve read this week while reading along with my son’s university studies. The first is discussed here if you’re interested. This paper is better written, but also more academic in style. Ironically that makes it harder to read, because the sentence structure is more complicated and takes more effort to parse.

My main takeaway from this paper is that 3M is good at encouraging the serendipity and opportune moments that create innovation. This is similar to Google’s attempts to build internal peer networks and its deliberate lack of structure. In 3M’s case it’s partially expressed as 15% time, which is similar to Google’s 20% time. Specifically, “eureka moments” cannot be planned or scheduled, but they do require prior engagement.

chance favors only the prepared mind — Pasteur

3M has a variety of methods for encouraging peer networks, including technology fairs, “bootlegging” (borrowing idle resources from other teams), innovation grants, and so on.

At the same time, 3M tries to keep at least a partial focus on events driven by schedules. The concept of time is important here — there is a “time to wait” (we are ahead of the market); “a time in between” (15% time); and “a time across” (several parallel efforts around related innovations to speed up the process).

The idea of “a time to wait” is quite interesting. 3M has a history of discovering things for which there is no current application, but somehow corporately remembering those things so that when applications appear years later they can jump in with a solution. They embrace storytelling as part of their corporate memory, as well as a way of ensuring they learn from past successes and failures.

Finally, 3M is similar to Google in their deliberate flexibility with the rules. 15% time isn’t rigidly counted, for example — it might be 15% of a week, or 15% of a year, or more or less than that. As long as it can be justified as a good use of resources, it’s ok.

This was a good read and I enjoyed it.

 

A corporate system for continuous innovation: The case of Google Inc

So, one of my kids is studying some business units at university and was assigned this paper to read. I thought it looked interesting, so I gave it a read as well.

While not particularly well written in terms of style, the paper is an approachable introduction to the culture and values of Google and how they play into Google’s continued ability to innovate. It identifies seven important attributes of the company’s culture that promote innovation, as ranked by the interviewed employees:

  • The culture is innovation oriented.
  • They put a lot of effort into selecting individuals who will fit well with the culture at hiring time.
  • Leaders are seen as performing a facilitation role, not a directive one.
  • The organizational structure is loosely defined.
  • Goals are set via OKRs, with performance incentives aligned to them.
  • A culture of organizational learning through postmortems and building internal social networks. Learning is considered a peer to peer activity that is not heavily structured.
  • External interaction — especially in the form of aggressive acquisition of skills and technologies in areas Google feels they are struggling in.

Additionally, they identify eight habits of a good leader:

  • A good coach.
  • Empower your team and don’t micro-manage.
  • Express interest in employees’ success and well-being.
  • Be productive and results oriented.
  • Be a good communicator and listen to your team.
  • Help employees with career development.
  • Have a clear vision and strategy for the team.
  • Have key technical skills, so you can help advise the team.

Overall, this paper is well worth the time to read. I enjoyed it and found it insightful.

Goals Gone Wild

In 2009 Harvard Business School published a draft paper entitled “Goals Gone Wild”, and its abstract is quite concerning. For example:

“We identify specific side effects associated with goal setting, including a narrow focus that neglects non-goal areas, a rise in unethical behavior, distorted risk preferences, corrosion of organizational culture, and reduced intrinsic motivation.”

Are we doomed? Is all goal setting harmful? Interestingly, I came across this paper while reading Measure What Matters, which argues the exact opposite point — that OKRs provide a meaningful way to improve the productivity of an organization.

The paper starts by listing a series of examples of goal setting gone wrong: Sears’ auto repair business in the early 1990s overcharging customers to meet hourly billable goals; Enron’s sales targets being based solely on volume and revenue, not profit; and Ford Motor Company’s goal of shipping a car at a specific target price point, which resulted in significant safety failures.

The paper then provides specific examples of how goals can go wrong:

  • By being too specific and causing other important features of a task to be ignored — for example, shipping on a specific deadline but skimping on adequate testing to achieve that deadline.
  • By being too numerous — employees with more than one goal tend to focus on one and ignore the others. For example, studies have shown that if you present someone with both quality and quantity goals, they will fixate on the quantity goals over the quality ones.
  • By having an inappropriate time horizon — for example, producing quarterly results by cannibalizing longer term outcomes. Additionally, goals can be perceived as ceilings, not floors; that is, once a goal has been met, attention is diverted elsewhere instead of over delivering on the goal.
  • By encouraging inappropriate risk taking or unethical behaviour — if a goal is too challenging, then an employee is encouraged to take risks they would not normally be able to justify in order to meet the goal.
  • Stretch goals that are not met harm employees’ confidence in their abilities and hurt future performance.
  • A narrowly focused performance goal discourages learning and collaboration with coworkers, because those activities detract from time spent on the narrowly defined target and are therefore de-emphasised.

The paper also calls out that while most people can see some amount of intrinsic motivation in their own behaviours, goals are extrinsic motivation and can be overused when applied to an intrinsically motivated workforce.

Overall, the paper urges managers to consider whether the goals they are setting are necessary, and notes that goals should only be used in narrow circumstances.

High Output Management

A reading group of managers at work has been reading this book, except for the last chapter, which we were left to read by ourselves. Overall, the book is interesting and very readable. It’s a little dated, with its excitement about the then-new invention of email and some unfortunate gender pronouns, but if you can get past those minor things there is a lot of wise advice here. I’m not sure I agree with 100% of it, but I do think the vast majority is of interest. A well written book that I’d recommend to new managers.

High Output Management
Andrew S. Grove
Business & Economics
Vintage Books, 1995
243 pages

The president of Silicon Valley's Intel Corporation sets forth the three basic ideas of his management philosophy and details numerous specific techniques to increase productivity in the manager's work and that of his colleagues and subordinates.

Juno nova mid-cycle meetup summary: slots

If I had to guess what would be a controversial topic from the mid-cycle meetup, it would have to be this slots proposal. I was actually in a Technical Committee meeting when this proposal was first made, but I’m told there were plenty of people in the room keen to give this idea a try. Since the mid-cycle Joe Gordon has written up a more formal proposal, which can be found at https://review.openstack.org/#/c/112733.

If you look at the last few Nova releases, core reviewers have been drowning under code reviews, so we need to control the review workload. What is currently happening is that everyone throws up their thing into Gerrit, and then each core tries to identify the important things and review them. There is a list of prioritized blueprints in Launchpad, but it is not used much as a way of determining what to review. The result of this is that there are hundreds of reviews outstanding for Nova (500 when I wrote this post). Many of these will get a review, but it is hard for authors to get two cores to pay attention to a review long enough for it to be approved and merged.

If we could rate limit the number of proposed reviews in Gerrit, then cores would be able to focus their attention on the smaller number of outstanding reviews, and land more code. Because each review would merge faster, we believe this rate limiting would help us land more code rather than less, as our workload would be better managed. You could argue that this will mean we just say ‘no’ more often, but that’s not the intent, it’s more about bringing focus to what we’re reviewing, so that we can get patches through the process completely. There’s nothing more frustrating to a code author than having one +2 on their code and then hitting some merge freeze deadline.

The proposal is therefore to designate a number of blueprints that can be under review at any one time. The initial proposal was for ten, and the term ‘slot’ was coined to describe the available review capacity. If your blueprint was not allocated a slot, then it would either not be proposed in Gerrit yet, or if it was it would have a procedural -2 on it (much like code reviews associated with unapproved specifications do now).

The number of slots is arbitrary at this point. Ten is our best guess of how much we can dilute cores’ focus without losing efficiency. We would tweak the number as we gained experience if we went ahead with this proposal. Remember, too, that a slot isn’t always a single code review. If the VMware refactor was in a slot, for example, we might find that there were also ten code reviews associated with that single slot.

How do you determine what occupies a review slot? The proposal is to groom the list of approved specifications more carefully. We would collaboratively produce a ranked list of blueprints in the order of their importance to Nova and OpenStack overall. As slots become available, the next highest ranked blueprint with code ready for review would be moved into one of the review slots. A blueprint would be considered ‘ready for review’ once the specification is merged, and the code is complete and ready for intensive code review.

What happens if code is in a slot and something goes wrong? Imagine if a proposer goes on vacation and stops responding to review comments. If that happened we would bump the code out of the slot, but would put it back on the backlog in the location dictated by its priority. In other words there is no penalty for being bumped, you just need to wait for a slot to reappear when you’re available again.

We also talked about whether we were requiring specifications for changes which are too simple. If something is relatively uncontroversial and simple (a better tag for internationalization for example), but not a bug, it falls through the cracks of our process at the moment and ends up needing to have a specification written. There was talk of finding another way to track this work. I’m not sure I agree with this part, because a trivial specification is a relatively cheap thing to do. However, it’s something I’m happy to talk about.

We also know that Nova needs to spend more time paying down its accrued technical debt, which you can see in the huge amount of bugs we have outstanding at the moment. There is no shortage of people willing to write code for Nova, but there is a shortage of people fixing bugs and working on strategic things instead of new features. If we could reserve slots for technical debt, then it would help us to get people to work on those aspects, because they wouldn’t spend time on a less interesting problem and then discover they can’t even get their code reviewed. We even talked about having an alternating focus for Nova releases; we could have a release focused on paying down technical debt and stability, and then the next release focused on new features. The Linux kernel does something quite similar to this and it seems to work well for them.

Using slots would allow us to land more valuable code faster. Of course, it also means that some patches will get dropped on the floor, but if the system is working properly, those features will be ones that aren’t important to OpenStack. Considering that right now we’re not landing many features at all, this would be an improvement.

This proposal is obviously complicated, and everyone will have an opinion. We haven’t really thought through all the mechanics fully, yet, and it’s certainly not a done deal at this point. The ranking process seems to be the most contentious point. We could encourage the community to help us rank things by priority, but it’s not clear how that process would work. Regardless, I feel like we need to be more systematic about what code we’re trying to land. It’s embarrassing how little has landed in Juno for Nova, and we need to be working on that. I would like to continue discussing this as a community to make sure that we end up with something that works well and that everyone is happy with.

This series is nearly done, but in the next post I’ll cover the current status of the nova-network to neutron upgrade path.

Further adventures with base images in OpenStack

I was bored over the New Year’s weekend, so I figured I’d have a go at implementing image cache management as discussed previously. I actually have an implementation of about 75% of that blueprint now, but it’s not ready for prime time yet. The point of this post is more to document some stuff I learnt about VM startup along the way so I don’t forget it later.

So, you want to start a VM on a compute node. Once the scheduler has selected a node to run the VM on, the next step is for the compute service on that machine to start the VM up. First the specified disk image is fetched from your image service (in my case glance) and placed in a temporary location on disk. If the image is already a raw image, it is then renamed to the correct name in the instances/_base directory. If it isn’t a raw image, it is converted to raw format and that converted file is put in the right place. Optionally, the image can be extended to a specified size as part of this process.
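
In qemu-img terms, the conversion and extension steps look roughly like this (the temporary path, cache file name, and size here are all invented for illustration):

    $ # Convert whatever format glance handed us into raw, writing the
    $ # result into the shared image cache.
    $ qemu-img convert -O raw /tmp/tmp_fetched_image \
      /var/lib/nova/instances/_base/77de68da
    $ # Optionally grow the disk to the size the instance type asks for.
    $ qemu-img resize /var/lib/nova/instances/_base/77de68da 20G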

Then, depending on whether you have copy on write (COW) images turned on, either a COW version of the file is created inside the instances/$instance/ directory, or the file from _base is copied to instances/$instance.
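
Again as a rough sketch, using the same invented cache file name as above:

    $ # COW enabled: create a qcow2 overlay whose backing file is the
    $ # cached image. Writes go to the overlay; the base stays pristine.
    $ qemu-img create -f qcow2 \
      -b /var/lib/nova/instances/_base/77de68da \
      /var/lib/nova/instances/instance-00000001/disk

    $ # COW disabled: the instance gets its own full copy of the base image.
    $ cp /var/lib/nova/instances/_base/77de68da \
      /var/lib/nova/instances/instance-00000001/disk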

This has a side effect that had me confused for quite a while yesterday — the checksums, and even file sizes, stored in glance are not reliable indicators of base image corruption. Most of my confusion was because image files in glance are immutable, so how come they differed from what’s on disk? The other problem was that the images I was using on my development machine were raw images, for which the checksums did work. It was only when I moved to a slightly more complicated environment that I had enough data to work out what was happening.

We therefore have a problem for that blueprint. We can’t use the checksums from glance as a reliable indicator of whether something has gone wrong with the base image. I need to come up with something nicer. What this probably means for the first cut of the code is that checksums will only be verified for raw images which weren’t extended, but I haven’t written that code yet.
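
As a minimal sketch of what that verification could look like, assuming the original checksum had been recorded in a sidecar file next to the cached image (that convention, and the file name, are my invention):

    $ # Compare a cached raw, unextended base image against a recorded
    $ # checksum, complaining if they differ.
    $ base=/var/lib/nova/instances/_base/77de68da
    $ expected=$(cat "$base.md5")
    $ actual=$(md5sum "$base" | awk '{print $1}')
    $ [ "$expected" = "$actual" ] || echo "base image $base is corrupt!" >&2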

So, there we go.

Openstack compute node cleanup

I’ve never used openstack before, which I imagine is similar to many other people out there. It’s actually pretty cool, although I encountered a problem the other day that I think is worthy of some more documentation. Openstack runs virtual machines for users, in much the same manner as Amazon’s EC2 system. These instances are started from a base image, and then copy on write is used to store the differences as the instance changes things. This makes sense in a world where a given machine might be running more than one instance based on the same image.

However, I encountered a compute node which was running low on disk. This is because there is currently nothing which cleans up these base images, so even if none of the instances on a machine require an image, and even if the machine is experiencing disk stress, the images still hang around. There are a few blog posts out there about this, but nothing really definitive that I could find. I’ve filed a bug asking for the Ubuntu package to include some sort of cleanup script, and interestingly that led me to learn that there are plans for a pretty comprehensive image management system. Unfortunately, it doesn’t seem that anyone is working on this at the moment. I would offer to lend a hand, but it’s not clear to me as an openstack n00b where I should start. If you read this and have some pointers, feel free to contact me.

Anyways, we still need to clean up that node experiencing disk stress. It turns out that nova uses qemu for its copy on write disk images, so we can ask qemu which base images are in use. It goes something like this:

    $ cd /var/lib/nova/instances
    $ # For each instance disk, ask qemu-img for its backing file, then
    $ # reduce that to a sorted list of unique base images in use.
    $ find . -name "disk*" | xargs -n1 qemu-img info | grep backing | \
      sed -e 's/.*file: //' -e 's/ .*//' | sort | uniq > /tmp/inuse

/tmp/inuse will now contain a list of the images in _base that are in use at the moment. Now you can change to the base directory, which defaults to /var/lib/nova/instances/_base, and do some cleanup. What I do is look for large image files which are several days old, check if they appear in that temporary file I created, and delete them if they don’t. The sketch below shows roughly what I mean.
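
This is only a rough sketch (the size and age thresholds are just what I happen to use, and the paths assume the defaults above), so review the candidates by hand before deleting anything:

    $ cd /var/lib/nova/instances/_base
    $ # List images over 1GB that haven't been modified in three days,
    $ # then weed out anything recorded as an active backing file.
    $ find . -type f -size +1G -mtime +3 | sed 's|^\./||' | \
      while read img; do
        grep -q "$img" /tmp/inuse || echo "candidate for deletion: $img"
      done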

I’m sure that this could be better automated by a simple python script, but I haven’t gotten around to it yet. If I do, I will be sure to mention it here.

Being Geek

This book, by long-time Apple engineering manager and startup employee Michael Lopp, is a guide to how to manage geeks. That wasn’t really what I was expecting; I was hoping for the inverse, a book about how to be a geek who has to deal with management. This book helps with that by offering the opposite perspective, but I’d still like to see a book from my direction.

The book is well written, in a conversational and sometimes profane manner (a comment I see others make about his other book, “Managing Humans”). I think that’s ok in this context, where it feels as if Michael is having a personal conversation with you, the reader. An overly formal tone here would make the content much more boring, and it’s already dry enough.

I’m not sure I agree with everything said in the book, but the first half resonated especially strongly with me.

ISBN: 9780596155407 (0596155409)

Scott Adams’ blog: the boner theory of management

The Boner Theory of Economics also predicts that in the long run — perhaps in a few hundred years — the military will be 100% gay men. This is the best case scenario for taxpayers because it will keep down costs, and recruiting will be easy.

Recruiter: “We can’t afford to give you body armor, but you’ll be surrounded by young, vital men who are a long way from home. Would you like a tour of the showers?”

Recruit: “Yes, but I can’t stand up right away.”

http://dilbertblog.typepad.com/the_dilbert_blog/2007/03/the_boner_theor.html

Open Source document management from Alfresco

An Alfresco employee (Alfrescoer?) posts about some of the interesting things they’ve learnt about being an open source company along the way. The comment about PR being more effective than cold sales calls is especially interesting. I argued for years at TOWER that we should be paying more attention to people searching for our product, instead of paying pretty boys to drive sports cars to sales presentations that everyone secretly hates. If your product has a good reputation and people can find it online, surely the customers will come to you?