Why is Azure so slow to start instances?

Share

I’ve been playing with terraform recently, and decided to see how different the terraform for launching a simple Ubuntu instance in various clouds is. There are two big questions there for me — how big is the variation between OpenStack derived clouds; and how painful is it to move between the proprietary clouds? Part of this is because terraform doesn’t present a standardised layer of cloud functionality, it has a provider per cloud.

(Although, I suspect there’s nothing stopping someone from writing a libcloud provider or something like that. It is an interesting idea which requires some additional thought.)

My terraform implementations for each cloud are on github if you’re interested. I don’t want to spend a lot of analysis on the actual terraform, because I think the really interesting thing I found isn’t where I expected it to be (there’s a hint in the title for this post). That said, the OpenStack clouds vary mostly by capabilities. vexxhost for example seems to only offer flavors that require boot-from-volume. The proprietary clouds are complete re-writes, but are generally relatively simple and well documented.

However, that interesting accidental thing — as best as I can tell, Microsoft Azure is really really slow to launch instances. The graph below presents five instance launches on each cloud I tested:

As you can see, Vault, Vexxhost, and AWS are basically all in the same ballpark. Google and Azure are outliers, with Google being crazy fast (but also very slow to delete instances, a metric not presented here), and Azure being more than three times slower than everyone else.

Instance launch time isn’t a great metric to be honest, but it does matter. For example if you were trying to autoscale a web tier or a kubernetes cluster, then waiting over two minutes just for the instance to boot before it can be configured and added to the cluster is probably not ok.

I wonder why Azure is so slow?

I did some further exploring after writing this post and was able to improve performance by changing how I handled resource groups in the terraform. The performance still isn’t great though. You can read more about that in a separate post if you’d like.

Share

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.