Customize Azure Linux Virtual Machines with cloud-init

Customize Azure Linux VMs with cloud-init on first boot


Cloud-init is a popular open-source tool that automates the initial setup of a cloud instance, such as an Azure Linux virtual machine (VM). It allows administrators to define custom configurations, such as users, disk layout, and network settings, that will be applied to the instance when it is first launched. This way, you can quickly and consistently provision new instances with your desired configurations, eliminating the need for manual setup and configuration. In this post, we will explore how to use cloud-init on Azure Linux VMs to configure your instances according to your needs.

Customize Azure Linux VMs with cloud-init

Cloud-init is a popular tool for customizing Linux instances at launch time. It is commonly used to automate the initial setup of virtual machines in the cloud, including the installation of packages, creation of users, and configuration of network settings.

When deploying an Azure Linux virtual machine, cloud-init can be used to configure the instance to meet specific requirements. This can include tasks such as:

  • Installing packages: You can specify a list of packages to be installed at launch time, such as Apache, Docker, or Git.
  • Creating users: You can create new users and specify their username, password, and group membership.
  • Running custom scripts: You can include custom scripts to be executed at launch time, such as installing software or performing additional configuration tasks.
  • and much more!

By using cloud-init, you can automate the process of deploying and configuring Azure Linux virtual machines, saving time and reducing the risk of errors that can occur when manually setting up instances. This can also help to standardize the configuration of instances, ensuring that they are consistently set up and ready to use.

cloud-init supported Linux distros on Azure

Not all Linux distros have cloud-init-enabled images available in the Azure Marketplace. See the Azure documentation on cloud-init support for more information.

cloud-init config file

The cloud-init configuration file is a YAML-formatted file that specifies the initial configuration for a cloud instance. Cloud-init reads it to set up the instance, and it can contain a variety of sections that each define a different aspect of your desired configuration.

Let's walk through the following example config file:

#cloud-config
package_update: true
package_upgrade: true

groups:
- docker

system_info:
  default_user:
    groups: [docker]

packages:
- docker.io
- unattended-upgrades

The first line must specify the user data format. cloud-init can act on multiple user data formats. In the above example I used #cloud-config, which lets me specify the configuration in a human-friendly format. If you are interested in all of the available user data formats, you can find them in the documentation.
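As a quick sanity check before deploying, the first line of a config file can be verified locally. This is a minimal sketch and not part of cloud-init itself; the function name is_cloud_config is my own, and cloud-init.txt is only an example file name.

```shell
# Hypothetical helper: succeeds only if the first line of the given file
# is exactly "#cloud-config" (the header used in the example above).
is_cloud_config() {
  # Read just the first line; the "||" part handles a file without a trailing newline.
  IFS= read -r first_line < "$1" || [ -n "$first_line" ]
  [ "$first_line" = "#cloud-config" ]
}

# Example usage (assumes a file named cloud-init.txt exists):
#   is_cloud_config cloud-init.txt && echo "header looks fine"
```

The check is deliberately strict: a leading blank line or a misspelled header makes it fail, which is the kind of mistake that is easy to miss by eye.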

On the next line I have set package_update: true. This tells cloud-init to update the package list before installing anything, so that it has the latest information about the available packages. After this comes package_upgrade: true, which makes cloud-init upgrade all installed packages to the latest available versions. This way the latest security patches and bug fixes are applied to both the existing packages and the new packages that we install later in the same config file. cloud-init uses the default package manager of the system.

Because I want to install Docker on the VM I need to create a group with the name docker and add the default user to the group. The groups section specifies that the docker group should be created. The system_info section sets the default user information, including the groups that the user should belong to. In this case, the default user is added to the docker group using the groups directive.

default_user
NOTE: This configuration does not specify a username for the default user, so the username will depend on the underlying Linux distribution and cloud-init implementation.

The packages section lists all the packages to be installed.

This cloud-config file uses a simple format and structure, and it is easy to customize to meet your specific needs. To use this file, you would pass it as a user data file when creating your Azure Linux virtual machine, and cloud-init would automatically use it to set up the instance with Docker installed and running. In the next section I will show how you can achieve this.
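Once the VM is up, a small check can confirm that the tools from the packages section actually ended up on the PATH. The helper below is a hypothetical sketch (missing_commands is not a cloud-init or Azure command); it only relies on the standard command -v builtin.

```shell
# Hypothetical helper: prints every command name from its arguments
# that cannot be found on the PATH. Prints nothing if all are present.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Example usage on the VM after first boot:
#   missing_commands docker
```

An empty result for docker suggests the package was installed; it does not prove that the Docker service is running.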

Provisioning warning
WARNING: Since the configuration is applied during the initial boot process, you cannot change the configuration afterwards. Trying to do so will result in an error message.

Deploy an Azure Linux Virtual Machine with cloud-init config

When you deploy an Azure VM with the az vm create command, you can specify the cloud-init configuration using the --custom-data option. For example:

az vm create \
  --resource-group myResourceGroup \
  --name LinuxVM \
  --image Canonical:0001-com-ubuntu-server-focal:20_04-lts-gen2:latest \
  --custom-data cloud-init.txt \
  --generate-ssh-keys 

The cloud-init.txt, containing your cloud-init configuration, will be passed to the Azure VM at launch time, where cloud-init will use it to configure the instance.
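Azure limits how much custom data a VM accepts. Assuming the documented limit of 64 KB (65535 bytes of decoded data), a pre-flight check before running az vm create could look like the sketch below. The function name custom_data_fits is hypothetical, and the limit is hard-coded as an assumption that you should verify against the current Azure documentation.

```shell
# Hypothetical pre-flight check: succeeds if the file is small enough
# to be passed as custom data.
custom_data_fits() {
  max_bytes=65535                       # assumed Azure limit for decoded custom data
  size=$(( $(wc -c < "$1") ))           # arithmetic expansion strips any padding from wc
  [ "$size" -le "$max_bytes" ]
}

# Example usage:
#   custom_data_fits cloud-init.txt || echo "cloud-init.txt is too large for --custom-data"
```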

Parameterize cloud-init with Bicep

One of the use cases I had, apart from installing Docker, was to install an Azure DevOps agent on the VM so that I could use the VM as a self-hosted agent. I also wanted to be able to substitute parameters, because the VM will be deployed to multiple environments. This can be achieved by using the format (or replace) function in Bicep.

In the following cloud-init config file I have added some additional packages, which are dependencies required for the self-hosted agent. I have also added the runcmd directive, which is used to run shell commands on the machine. I will not cover the actions performed to install the agent; if you are interested in the instructions, you can find them in the Azure DevOps self-hosted agent docs.

#cloud-config
package_update: true
package_upgrade: true

groups:
- docker

system_info:
  default_user:
    groups: [docker]

apt:
  sources:
    # https://canonical-cloud-init.readthedocs-hosted.com/en/latest/reference/examples.html#additional-apt-configuration-and-repositories
    docker.list:
      # source
      # Creates a file in /etc/apt/sources.list.d/ for the sources list entry
      # based on the key: "/etc/apt/sources.list.d/docker.list"
      # One can specify [signed-by=$KEY_FILE] in the source definition, which
      # will make the key be installed in the directory /etc/cloud-init.gpg.d/
      # and the $KEY_FILE replacement variable will be replaced with the path
      # to the specified key. If $KEY_FILE is used, but no key is specified,
      # apt update will (rightfully) fail due to an invalid value.
      source: "deb [arch=amd64 signed-by=$KEY_FILE] https://download.docker.com/linux/ubuntu focal stable"
      # keyid
      # Importing a gpg key for a given key id. Used keyserver defaults to
      # keyserver.ubuntu.com
      keyid: 8D81803C0EBFCD88 # GPG key ID published on a key server

packages:
# https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository
- docker-ce
- docker-ce-cli
- containerd.io
- unattended-upgrades
# ado
- zlib1g
- libssl1.1
- libkrb5-3
- liblttng-ust-ctl4 
- liblttng-ust0 
- liburcu6

runcmd:
- mkdir /myagent && cd /myagent
- curl -sOL https://vstsagentpackage.azureedge.net/agent/{0}/vsts-agent-linux-x64-{0}.tar.gz
- tar zxvf vsts-agent-linux-x64-{0}.tar.gz
- AGENT_ALLOW_RUNASROOT=1 ./config.sh --url {1} --auth pat --token {2} --unattended --pool {3} --agent {4} --replace
- AGENT_ALLOW_RUNASROOT=1 ./svc.sh install
- AGENT_ALLOW_RUNASROOT=1 ./svc.sh start
- AGENT_ALLOW_RUNASROOT=1 ./svc.sh status
- chown -R {5} /myagent

Note that some of the values in the cloud-init file are placeholders such as {0}, {1}, etc. These will be replaced by the format function in Bicep.
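To preview what the config looks like after substitution without running a deployment, the placeholders can be filled in locally. The sketch below is only an approximation of the substitution, written with sed; it is not how Bicep implements format, it ignores escaping rules such as doubled braces, and it does not support values that contain newlines. The function name render_template is hypothetical.

```shell
# Hypothetical helper: render_template FILE VALUE0 VALUE1 ...
# Replaces {0}, {1}, ... in FILE with the given values and prints the result.
render_template() {
  template=$1
  shift
  text=$(cat "$template")
  i=0
  for value in "$@"; do
    # Escape characters that are special in a sed replacement (\ and &)
    # as well as the | delimiter used below.
    escaped=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
    text=$(printf '%s\n' "$text" | sed "s|{$i}|$escaped|g")
    i=$((i + 1))
  done
  printf '%s\n' "$text"
}

# Example usage with placeholder values (not real credentials):
#   render_template cloud-init-ado-param.yml 2.217.2 https://dev.azure.com/myorg dummy-pat selfhosted ubuntu azureuser
```

Reading the rendered runcmd section this way makes it easy to spot a placeholder index that is off by one before the VM is created.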

In main.bicep you can leverage the loadTextContent function to load the content of the specified file as a string. Next, you can call the format function to substitute the parameters in the cloud-config file. As a last step, you can use the base64 function to convert the string to base64 and pass it in.

In this sample Bicep file I'm calling a module for the virtual machine deployment. The referenced module can be found in my GitHub repository: virtualMachines/main.bicep.

@description('The name of your Virtual Machine.')
param vmName string = 'UbuntuVM'

@description('Username for the Virtual Machine.')
param adminUsername string

@description('SSH Key or password for the Virtual Machine. SSH key is recommended.')
@secure()
param adminPasswordOrKey string

@description('Location for all resources.')
param location string = 'southcentralus'

@description('Unique DNS Name for the Public IP used to access the Virtual Machine.')
param dnsLabelPrefix string = toLower('${vmName}-${uniqueString(resourceGroup().id)}')

@secure()
param patToken string

var addressPrefix = '10.1.0.0/16'
var subnetAddressPrefix = '10.1.0.0/24'

// https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/v2-linux?view=azure-devops#unattended-config
var values = {
  adoVersion: '2.217.2'
  adoOrganizationUrl: 'https://dev.azure.com/JeroenTrimbach'
  adoPatToken: patToken
  adoPoolName: 'selfhosted'
  adoAgentName: 'ubuntu'
  adoUser: adminUsername
}

var cloudInit = loadTextContent('cloud-init-ado-param.yml')

resource virtualNetwork 'Microsoft.Network/virtualNetworks@2022-07-01' = {
  name: 'virtualnetwork-01'
  location: location
  properties: {
    addressSpace: {
      addressPrefixes: [
        addressPrefix
      ]
    }
    subnets: [
      {
        name: 'subnet-01'
        properties: {
          addressPrefix: subnetAddressPrefix
        }
      }
    ]
  }
}

resource publicIP 'Microsoft.Network/publicIPAddresses@2021-05-01' = {
  name: 'publicIP'
  location: location
  sku: {
    name: 'Basic'
  }
  properties: {
    publicIPAllocationMethod: 'Dynamic'
    publicIPAddressVersion: 'IPv4'
    dnsSettings: {
      domainNameLabel: dnsLabelPrefix
    }
    idleTimeoutInMinutes: 4
  }
}

resource nsg 'Microsoft.Network/networkSecurityGroups@2021-05-01' = {
  name: 'nsg'
  location: location
  properties: {
    securityRules: [
      {
        name: 'SSH'
        properties: {
          priority: 1000
          protocol: 'Tcp'
          access: 'Allow'
          direction: 'Inbound'
          sourceAddressPrefix: '*'
          sourcePortRange: '*'
          destinationAddressPrefix: '*'
          destinationPortRange: '22'
        }
      }
    ]
  }
}

module virtualMachine 'modules/virtualMachines/main.bicep' = {
  name: 'deploy-virtualMachine'
  params: {
    location: location
    adminPasswordOrKey: adminPasswordOrKey
    adminUsername: adminUsername
    name: vmName
    // Reference the vnet resource directly so the module waits for it to be deployed.
    subnetId: virtualNetwork.properties.subnets[0].id
    publicIpId: publicIP.id
    networkSecurityGroupId: nsg.id
    authenticationType: 'sshPublicKey'
    customData: base64(format(cloudInit, values.adoVersion, values.adoOrganizationUrl, values.adoPatToken, values.adoPoolName, values.adoAgentName, values.adoUser))
  }
}

output sshCommand string = 'ssh ${adminUsername}@${publicIP.properties.dnsSettings.fqdn}'

base64
TIP: Keep in mind that we only use the loadTextContent Bicep function because we want to substitute parameters. If your cloud-init config file contains only static values, you can use the loadFileAsBase64 Bicep function instead. This function loads the file directly as a base64 string, without the need to call the base64 function separately.
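To see what the base64 step hands to Azure, the encoding can be reproduced locally with the standard base64 tool. This sketch is for inspection only; encode_custom_data and decode_custom_data are hypothetical helper names, and it assumes a base64 implementation that accepts --decode (GNU coreutils and macOS both do).

```shell
# Hypothetical helper: print the file content as a single-line base64 string.
encode_custom_data() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical helper: turn a base64 string back into the original text.
decode_custom_data() {
  printf '%s' "$1" | base64 --decode
}

# Example usage:
#   encoded=$(encode_custom_data cloud-init.txt)
#   decode_custom_data "$encoded"    # should print the original file content
```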

Then deploy your Bicep file, for example:

# login to azure
az login

# grab resource group name (this assumes the subscription has a single resource group). e.g.:
rg=$(az group list --query "[].name" --output tsv)

# deploy main.bicep with runtime params (replace the quoted placeholders with your own values)
az deployment group create \
  -g "$rg" \
  -f main.bicep \
  -p patToken='<insertValue>' \
  -p adminUsername='<insertValue>' \
  -p adminPasswordOrKey='<insertValue>'

# and for convenience
az deployment group show -g "$rg" --name main --query properties.outputs.sshCommand.value -o tsv

See my GitHub repo for all files used in this post: cloud-init-demo.

Debugging cloud-init

A log file can be found in the following location: /var/log/cloud-init-output.log.
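Scrolling through the whole log can be tedious, so a simple filter helps to find the interesting lines first. The sketch below is a rough heuristic, not an official cloud-init tool: scan_cloud_init_log is a hypothetical name, and the search pattern is my own guess at useful keywords.

```shell
# Hypothetical helper: print log lines (with line numbers) that look like problems.
# Defaults to the log location mentioned above when no file is given.
scan_cloud_init_log() {
  log_file=${1:-/var/log/cloud-init-output.log}
  grep -n -i -E 'error|fail|warn|^E:' "$log_file"
}

# Example usage on the VM (reading the log may require sudo):
#   scan_cloud_init_log
#   cloud-init status --long    # built-in summary of the cloud-init run
```

No output means that none of the keywords matched, which is a good sign but not a guarantee that every step succeeded.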

Keep in mind that for Azure Linux virtual machines cloud-init only runs on first boot. If you find an error and fix it in the config file, you have to redeploy the virtual machine to apply the corrected configuration.

Conclusion

In summary, cloud-init is a powerful tool for automating the deployment and configuration of Azure Linux virtual machines, helping to streamline the process and ensure consistent and reliable instance setups.

Have a very nice day! 👍