Azure VM Diagnostics Explained

At VMPower we strive to make Azure a first-class experience (as with every cloud provider we support). VMPower periodically collects performance metrics from your VMs so that our analytics can show which VMs are underutilized.

Some of us at VMPower are ex-Microsoft engineers, so we know how to handle the quirkiness of Azure and retrieve the information our customers need.

There won't be an AWS version of this post, since it's much more straightforward to do this with the AWS CloudWatch APIs.

Detecting Azure VM Datapoints

We previously wrote about how we detect idle VMs, but we didn't share much about how that works with each of our cloud providers. So we'll start with Azure.


The first thing to understand when it comes to Azure VM diagnostics is that everything is logged to Azure Tables. It's also worth noting that Azure charges you standard storage rates for storing the diagnostics data (which is extremely inexpensive at ~$0.07 USD/GB per month).

Azure Tables is a PaaS offering that provides massively scalable tables with very basic query abilities. We won't go into much detail here, since there is already a great blog post on how to query Azure Tables effectively.

The point is that if you want to retrieve data on your Virtual Machines you need to be able to query Azure Tables, which can be done from any language.

Linux (v2)

Alright, let's get right down to it. There are different table schemas for different VM types: Linux VMs have a different schema than Windows VMs. Every table for Linux is split by the type of performance counter it logs. For example, to get Linux CPU data you need to inspect the LinuxCpuVer2v0 table.

Using Azure Storage Explorer, you can see that the table looks like this:

[Screenshot: Linux CPU table]

With Azure Tables you always need to include the partition key in your query; otherwise the query will be unacceptably slow (think up to several minutes!). For a detailed explanation, check the blog post I mentioned earlier.

Notice how each partition key starts with an odd-looking 0000000000000000__0, followed by what appears to be some kind of number? That suffix of the PartitionKey is a .NET timestamp ticks value (I know, for a Linux table, right?). For those unfamiliar, it's the number of 100-nanosecond intervals that have elapsed since midnight on January 1, 0001.
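To make the ticks arithmetic concrete, here is a minimal Python sketch that converts a datetime into a .NET ticks value. The function name to_dotnet_ticks is ours, and treating naive datetimes as UTC is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone

# .NET ticks count 100 ns intervals starting at 0001-01-01T00:00:00.
DOTNET_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)


def to_dotnet_ticks(moment: datetime) -> int:
    """Return the number of 100 ns intervals between 0001-01-01 UTC and `moment`."""
    if moment.tzinfo is None:
        # Assumption of this sketch: naive datetimes are already in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    delta = moment - DOTNET_EPOCH
    # Integer math only, so no precision is lost on 18-digit values.
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 10_000_000 + delta.microseconds * 10


five_minutes_ago = datetime.now(timezone.utc) - timedelta(minutes=5)
print(to_dotnet_ticks(five_minutes_ago))
```

A handy sanity check is the Unix epoch, which corresponds to 621355968000000000 ticks.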

Partition keys can be queried with the comparison operators that the Azure Tables service offers. So using the gt operator on a key built from 0000000000000000__0 plus a ticks value gets you just the CPU data points for the time span you care about (for example, the last 5 minutes).
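As a rough sketch of what such a filter could look like, the Python below builds an OData filter string for a time span under one prefix. The function name linux_span_filter and the example tick value are made up for illustration. The upper bound (le) is our addition so the span has an end; the key layout is simply copied from the partition keys shown in the table.

```python
def linux_span_filter(start_ticks: int, end_ticks: int, prefix_index: int = 0) -> str:
    """Build an OData filter for partition keys between two tick values."""
    # Key layout copied from the table: 16 digits, two underscores, a 0, then ticks.
    prefix = f"{prefix_index:016d}__0"
    return (
        f"PartitionKey gt '{prefix}{start_ticks}' "
        f"and PartitionKey le '{prefix}{end_ticks}'"
    )


FIVE_MINUTES = 5 * 60 * 10_000_000   # five minutes expressed in 100 ns ticks
end = 636_000_003_000_000_000        # example tick value, not real data
print(linux_span_filter(end - FIVE_MINUTES, end))
```

The resulting string is what you would hand to whichever Azure Tables client you use as the query filter.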

The catch is that the 0000000000000000__0 prefix can also be 0000000000000001__0, 0000000000000002__0, 0000000000000003__0 and so on, all the way up to 0000000000000009__0. Azure doesn't document this, but it appears that the latest data can be under any one of these prefixes. Once you find the right prefix, you can get the correct data for your time interval of choice.

Programmatically speaking, querying for the last 5 minutes of data requires you to iterate through up to N=10 partition key prefixes to determine which one holds the data.
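The prefix search could be sketched roughly like this in Python. This is not VMPower's actual code: find_recent_rows is a hypothetical name, and query stands in for whatever function runs a filter against the table with your Azure Tables client of choice. Because partition keys compare as strings, the sketch also adds an upper bound so each attempt stays inside a single prefix; that bound is our assumption and is not something Azure documents.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, object]
QueryFn = Callable[[str], Iterable[Row]]  # hypothetical: runs a filter, yields rows


def find_recent_rows(
    query: QueryFn, since_ticks: int, prefixes: int = 10
) -> Tuple[Optional[int], List[Row]]:
    """Try each prefix in turn and return the first one that has recent rows."""
    for index in range(prefixes):
        prefix = f"{index:016d}__"
        filt = (
            f"PartitionKey gt '{prefix}0{since_ticks}' "
            f"and PartitionKey lt '{prefix}1'"  # assumed bound: stay in this prefix
        )
        rows = list(query(filt))
        if rows:
            return index, rows
    return None, []  # no prefix had data newer than since_ticks


# Stand-in for a real table query: pretends only prefix 3 holds recent data.
def fake_query(filt: str) -> List[Row]:
    return [{"PartitionKey": "example"}] if "'0000000000000003__0" in filt else []


print(find_recent_rows(fake_query, 636_000_000_000_000_000))
```

Stopping at the first prefix with results keeps the worst case at ten small partition-scoped queries, which is still far cheaper than a scan without a partition key.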

You can find Disk and Memory information from the LinuxDiskVer2v0 and LinuxMemoryVer2v0 tables within the same storage account.

Windows


Windows VM tables are much more straightforward. The partition key is just a 0 followed by a .NET ticks value, and querying with gt on the partition key gives you all entries within your specified time span without the need to iterate through prefixes:

[Screenshot: Windows VM performance counter table]

Note that on Windows you'll also have to include CounterName in your table query, based on the performance metric you want.
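A Windows filter might be put together along these lines. This is a small Python sketch: windows_counter_filter is a hypothetical helper, and the counter name in the usage line is only an assumed example, so use whichever counters your diagnostics configuration actually collects.

```python
def windows_counter_filter(since_ticks: int, counter_name: str) -> str:
    """Build an OData filter for one counter, for entries newer than since_ticks."""
    # OData string literals escape a single quote by doubling it.
    escaped = counter_name.replace("'", "''")
    # Windows partition keys are a 0 followed by the ticks value.
    return f"PartitionKey gt '0{since_ticks}' and CounterName eq '{escaped}'"


# Assumed example counter name; check the names stored in your own table.
print(windows_counter_filter(636_000_000_000_000_000, r"\Processor(_Total)\% Processor Time"))
```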

Finding the Correct Storage Account

At VMPower we use the Azure Resource Manager APIs to discover the storage account you configured for diagnostics in the portal. If you open the Azure Resource Explorer, you'll see that it can easily be found within a VM's resource description.