etcd CoreOS cluster on Azure

The basics – what is etcd?

etcd is an open-source, distributed key-value store.

It was created by CoreOS to provide data storage and service discovery for their distributed applications such as fleet, flannel and locksmithd.

etcd exposes a REST-based API for applications to communicate with it.
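
As a quick illustration of that API (a sketch, assuming an etcd member is listening on localhost on the default client port 2379 – older releases also used 4001), keys can be written and read with nothing more than curl:

```shell
# Write a key via the etcd v2 HTTP API (assumes a local member on port 2379)
curl -L http://127.0.0.1:2379/v2/keys/message -XPUT -d value="Hello etcd"

# Read it back - etcd responds with a JSON document describing the node
curl -L http://127.0.0.1:2379/v2/keys/message
```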

etcd runs as a daemon on every cluster member. Clusters should be built with the recommended odd number of nodes (3, 5, 7 and so on) so that the cluster can survive node failures while a quorum (a majority of members) remains.
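
To make the quorum arithmetic concrete, here is a minimal sketch in plain bash (nothing etcd-specific – just the majority formula):

```shell
# Quorum for an n-member cluster is a majority: floor(n/2) + 1
quorum()    { echo $(( $1 / 2 + 1 )); }
# Failures tolerated while keeping quorum: n - quorum(n)
tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 3 4 5 7; do
  echo "$n members: quorum=$(quorum $n) tolerates=$(tolerance $n)"
done
```

Note that a 4-node cluster tolerates no more failures than a 3-node one (a single node), which is why the odd sizes are recommended.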

Clusters run with a single leader that the other members follow. If the leader fails, the remaining nodes elect a new one via the Raft consensus algorithm.

What’s it used for?

Apart from the CoreOS use of etcd, who else uses it?

The big one is Kubernetes – management, orchestration and service discovery of containers across a cluster of worker nodes. etcd is used to store cluster state and configuration data, and is core to the operation of Kubernetes.

Apache Mesos and Docker Swarm can also optionally use etcd as their cluster KV stores.

etcd on Azure?

I decided to use the Azure CLI to deploy a three node etcd cluster.

I could have also used PowerShell on Windows or authored an ARM template, but it was interesting to use the Azure CLI, as I tend to use OS X and a Linux desktop more than Windows these days.

The final script sets up:
- An Azure resource group
- An Azure storage account
- A virtual network with a single subnet
- Three vNICs with static IPs
- Three public IPs for connecting in via SSH later if required
- Finally, three CoreOS Linux VMs, each using a cloud-config.yaml file to automate the configuration of the etcd cluster and systemd units. I also add my public SSH key to allow remote SSH logins if required.

The script can be git cloned from https://github.com/seedb/AzureCLI.etcd.git

Alternatively, here it is:

#!/bin/bash
#Tested on MacOS X 10.11.3 and Ubuntu 15.10 with Azure CLI tools installed
#https://azure.microsoft.com/en-us/documentation/articles/xplat-cli-install/
#Use azure login to get a token against your Azure subscription before running script
#--------------------------------------------------------------------------------------------
#Variables to change
location="australiasoutheast" #Azure DC Location of choice
resourcegroup="etcd-cluster" #Azure Resource Group name
vnetname="etcd-vnet" #Virtual Network Name
vnetaddr="10.1.0.0/16" #Virtual Network CIDR
subnetname="etcd-subnet" #Subnet name
subnetaddr="10.1.0.0/24" #Subnet CIDR
availgroup="etcd-avail-group" #Availability group for VMs
storageacname="etcdstorage01" #Storage Account for the VMs
networksecgroup="etcd-net-sec" #Network Sec Group for VMs
#VM name and private IP for each
vm01_name="etcd-01"
vm_static_IP1="10.1.0.51"
vm02_name="etcd-02"
vm_static_IP2="10.1.0.52"
vm03_name="etcd-03"
vm_static_IP3="10.1.0.53"
coreos_image="coreos:coreos:Stable:835.9.0" #CoreOS image to use
vm_size="Basic_A0" #VM sizes can be listed using: azure vm sizes --location YourAzureDCLocationOfChoice
#---------------------------------------------------------------------------------------------
#Azure CLI------------------------------------------------------------------------------------
#Azure Resource Group setup
azure config mode arm
azure group create --location $location $resourcegroup
#Azure Storage account setup
azure storage account create --location $location --resource-group $resourcegroup --type lrs $storageacname
#Azure Network Security Group setup
azure network nsg create --resource-group $resourcegroup --location $location --name $networksecgroup
#Virtual network and subnet setup
azure network vnet create --location $location --address-prefixes $vnetaddr --resource-group $resourcegroup --name $vnetname
azure network vnet subnet create --address-prefix $subnetaddr --resource-group $resourcegroup --vnet-name $vnetname --name $subnetname --network-security-group-name $networksecgroup
#Public IP for VMs (can create SSH inbound rules to access these if required)
azure network public-ip create --resource-group $resourcegroup --location $location --name "$vm01_name"-pub-ip
azure network public-ip create --resource-group $resourcegroup --location $location --name "$vm02_name"-pub-ip
azure network public-ip create --resource-group $resourcegroup --location $location --name "$vm03_name"-pub-ip
#Virtual Nics with private IPs for CoreOS VMs
azure network nic create --resource-group $resourcegroup --subnet-vnet-name $vnetname --subnet-name $subnetname --location $location --name "$vm01_name"-priv-nic --private-ip-address $vm_static_IP1 --network-security-group-name $networksecgroup --public-ip-name "$vm01_name"-pub-ip
azure network nic create --resource-group $resourcegroup --subnet-vnet-name $vnetname --subnet-name $subnetname --location $location --name "$vm02_name"-priv-nic --private-ip-address $vm_static_IP2 --network-security-group-name $networksecgroup --public-ip-name "$vm02_name"-pub-ip
azure network nic create --resource-group $resourcegroup --subnet-vnet-name $vnetname --subnet-name $subnetname --location $location --name "$vm03_name"-priv-nic --private-ip-address $vm_static_IP3 --network-security-group-name $networksecgroup --public-ip-name "$vm03_name"-pub-ip
#Create 3 CoreOS-Stable VMs, fly in cloud-config file, also provide ssh public key to connect with in future
azure vm create --custom-data=etcd-01-cloud-config.yaml --ssh-publickey-file=id_rsa.pub --admin-username core --name $vm01_name --vm-size $vm_size --resource-group $resourcegroup --vnet-subnet-name $subnetname --os-type linux --availset-name $availgroup --location $location --image-urn $coreos_image --nic-names "$vm01_name"-priv-nic --storage-account-name $storageacname
azure vm create --custom-data=etcd-02-cloud-config.yaml --ssh-publickey-file=id_rsa.pub --admin-username core --name $vm02_name --vm-size $vm_size --resource-group $resourcegroup --vnet-subnet-name $subnetname --os-type linux --availset-name $availgroup --location $location --image-urn $coreos_image --nic-names "$vm02_name"-priv-nic --storage-account-name $storageacname
azure vm create --custom-data=etcd-03-cloud-config.yaml --ssh-publickey-file=id_rsa.pub --admin-username core --name $vm03_name --vm-size $vm_size --resource-group $resourcegroup --vnet-subnet-name $subnetname --os-type linux --availset-name $availgroup --location $location --image-urn $coreos_image --nic-names "$vm03_name"-priv-nic --storage-account-name $storageacname
#Azure CLI------------------------------------------------------------------------------------

Generally it takes around 10 minutes to deploy all the components (sometimes the storage account alone can take 15 minutes – bizarre but repeatable!). After it’s done, your resource group should look like this:

[Screenshot: the deployed etcd-cluster resource group in the Azure portal]

At this point, I created an SSH inbound rule to one of the public IPs on a VM and tested that my etcd cluster was all good.

[Screenshot: checking etcd cluster health over SSH]

I can then insert some key-value pairs for testing and start thinking about deploying Kubernetes – all in another blog 🙂
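
For reference, a quick test of that kind from an SSH session on one of the nodes looks something like this (a sketch assuming the v2-era etcdctl bundled with CoreOS stable; the key names are arbitrary):

```shell
# Check that all three members are up and a leader has been elected
etcdctl cluster-health

# Write and read back a test key-value pair
etcdctl set /test/message "hello from Azure"
etcdctl get /test/message
```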

Bulk creating VMs using Nutanix Acropolis ACLI

I have been playing around with the fantastic Nutanix Community Edition (CE) now for a week or more.

Despite the availability of all KVM VM actions in the HTML5 Prism GUI, I wanted to quickly create any number of VMs for testing.

The best way to do this for me was from the shell of the Nutanix controller VM.

I SSH’d into the CVM on my Nutanix CE node, then from the bash shell ran my script which calls the Acropolis CLI commands for creating and modifying VMs.

It loops for 10 iterations, setting each VM to 2GB of memory and 1 virtual CPU. It also attaches a 20GB disk to each VM on my NDFS container called CNTR1.

for n in {1..10}
do
  acli vm.create MyVM$n memory=2048 num_vcpus=1
  acli vm.disk_create MyVM$n create_size=20G container=CNTR1
done


This is really just the tip of the iceberg: in the script you could also mount an ISO file from your container to each VM and power them on if you wanted to. Better still would have been to clone from an existing VM and create the 10 that way, but the ACLI does not yet appear to have vm.clone, as shown in the Nutanix Bible and the Acropolis Virtualization Administration docs.
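
A sketch of that idea – note I have not run this exact variant, and the CD-ROM and power-on parameters (cdrom=true empty=true, vm.on) are taken from the Acropolis CLI documentation rather than verified here:

```shell
for n in {1..10}
do
  acli vm.create MyVM$n memory=2048 num_vcpus=1
  acli vm.disk_create MyVM$n create_size=20G container=CNTR1
  # Attach an empty CD-ROM drive (an ISO from a container could be used instead)
  acli vm.disk_create MyVM$n cdrom=true empty=true
  # Power the VM on
  acli vm.on MyVM$n
done
```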

The ACLI commands present in the Community Edition (currently the 2015.06.08 version) are missing vm.clone. Make sure you check out the new "Book of Acropolis" section in the Nutanix Bible: http://stevenpoitras.com/the-nutanix-bible/#_Toc422750852

Teradici PCoIP firmware 3.5 USB performance comparison

Recently, Teradici released version 3.5 firmware for PCoIP Zero clients based on the Portal processor.

One of the main enhancements was the ability to fully utilize the USB 2.0 capable hardware that always existed in these devices, but was restricted to USB 1.1 performance by the previous firmware.

Version 3.5 firmware removes this restriction when using a PCoIP Zero client against a VMware View desktop.

I thought I would do a quick test to see what the real world differences were between version 3.4 and 3.5 firmware on a Wyse P20 Zero client.

The Test setup
Wyse P20 Zero client (on gigabit network)
VMware View 5.0 and vSphere 5.0 Infrastructure
Windows XP Virtual Desktop (1 vCPU and 1.5GB RAM) running in a View 5.0 floating pool
Teradici PCoIP Management Console 1.7

I took a basic 2GB Sandisk USB memory stick and connected it to the Wyse P20 running V3.4 firmware. I then copied a 150MB test file from the local disk of my VM to the memory stick.

I then logged off the desktop and refreshed it so as not to influence the performance via caching etc.

The Wyse P20 was then updated to V3.5 firmware, pushed out from the Teradici Management Console.

Logging back into a fresh View XP desktop, I copied the same 150MB file to the USB memory stick.

The results:

V3.4 firmware: 4m:35s to copy 150MB file

V3.5 firmware: 1m:48s to copy 150MB file

So, around two and a half times faster – a substantial increase in USB performance.
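
Working the timings through (150 MB in 4m35s = 275 s on v3.4, versus 1m48s = 108 s on v3.5):

```shell
# Effective USB throughput and speedup from the copy timings above
awk 'BEGIN {
  size  = 150           # MB copied
  t_old = 4*60 + 35     # v3.4 firmware: 275 s
  t_new = 1*60 + 48     # v3.5 firmware: 108 s
  printf "v3.4: %.2f MB/s\n", size / t_old
  printf "v3.5: %.2f MB/s\n", size / t_new
  printf "speedup: %.2fx\n",  t_old / t_new
}'
```

That works out to roughly 0.55 MB/s versus 1.39 MB/s – about a 2.5x speedup.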

This will make the Teradici based zero clients more acceptable in situations requiring moving files via USB memory stick and removable hard disks, as well as other applications involving USB based CD/DVD burners.

In my environment, this is the best feature of the latest V3.5 firmware – a must have.

Load Balancing VMware View 4.5 with Microsoft NLB

With the release of VMware View 4.5, the connection servers are now fully supported on Windows Server 2008 R2 – so it seemed like a good time to test the easy-to-set-up NLB inside the latest Windows Server OS.

To enable NLB on your View Servers you will have to enable the feature on both servers that you intend to balance.

From Server Manager, select Features, click Add Features, then check Network Load Balancing.

From there, open up Administrative Tools on one of the servers and you should see a shortcut to the Network Load Balancing Manager.

1. Launch NLB Manager, right-click the top-level tree and select New Cluster.
2. Add the host name of your first View Connection Server and click Connect – it should resolve the IP address correctly if DNS is working on your network. Click Next.
3. Accept the unique host identifier of 1, as this is the first server in your NLB cluster. Leave the other settings as they are.
4. Next, add the cluster IP address. This is a free IP address on your network that will become the virtual cluster IP of your load-balanced View servers.
5. Add a DNS name for the cluster IP (you will need to create a DNS host record for it). NOTE: VMware recommends that you use multicast mode, because unicast mode forces the physical switches on the LAN to broadcast all Network Load Balancing traffic to every machine on the LAN (see the VMware docs on NLB).
6. Define the port rules for load-balanced access to your cluster. By default it will load balance all ports 0-65535, both TCP and UDP. You may wish to only allow HTTP/HTTPS over TCP to be load balanced – edit the ruleset to suit your environment. I used the defaults.
7. Click Finish and this will add your first host to the cluster.
8. To add the second View Connection Server, right-click the top level of the cluster, select Add Host To Cluster and step through the process again, entering the details of the second View server.

When you’re done you will have two VMware View Connection Servers in an NLB cluster. You should now be able to ping both the DNS name and the IP address of the cluster from your network (depending on the firewall settings you have enabled on Server 2008 R2 – but that’s another matter).

More P4000 G2 IOMeter Benchmarking (with vSphere RoundRobin tweak)

Here we go with some more IOMeter madness. This time using a much debated vSphere round-robin tweak.

This debate has gone on in the blogosphere in recent weeks about the number of IOPS per path tweak that can be changed from the ESX command line.

The debate is simply; Does it work?

By default, using the round-robin path selection, 1000 IOPS go down the first path and then the next 1000 down the next path.

From the commandline of the ESX host we can set the round robin parameters:

esxcli nmp roundrobin setconfig --device LUNID --iops 3 --type iops

Where LUNID is your volume identifier in vSphere, e.g. naa.6000eb39539d74ca0000000000000067

This setting effectively means that the path will switch every 3 IOPS.
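
Since the setting is per-device, it needs applying to every volume. A sketch (assuming ESX 4.x esxcli syntax, and that the P4000 volumes all share the naa.6000eb39 prefix seen above – adjust the grep pattern for your environment):

```shell
# Apply the 3-IOPS round-robin switch to every matching device
for dev in $(esxcli nmp device list | grep '^naa.6000eb39' | awk '{print $1}')
do
  esxcli nmp roundrobin setconfig --device $dev --iops 3 --type iops
  # Verify the change took effect
  esxcli nmp roundrobin getconfig --device $dev
done
```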

It appears to greatly help throughput-intensive workloads, e.g. large-block sequential reads (restores and file archives), where the iSCSI load balancing of the HP P4000 nodes (ALB) can take effect.

It appears that small-block workloads (4K, 8K) are not really helped by the change – so I am only looking at larger blocks (32K and 256K).

The Setup
Three node P4500 G2 cluster, with twelve 450GB 15K SAS drives in each node. All running SAN/iQ 8.5. Node NICs using ALB.

Windows Server 2003 R2 VM Guest (Single vCPU, 2GB RAM) with a virtual disk on 50GB Volume @ Network RAID-10 (2-way replica)

ESX 4.0 Update 1, with two VMkernel ports on a single vSwitch (two Intel Pro/1000 NICs) bound to the software iSCSI adapter with Round Robin multi-path selection

IOMeter Settings

One worker
64 Outstanding requests (mid-heavy load)
8000000 sectors (4GB file)
5 Minute test runs

The Results

(100% Sequential, 100% Read, 32K Transfer size)

Default IOPS setting: 110 MB/s sustained with 3500 IOPS
Tweaked IOPS setting: 163 MB/s sustained with 5200 IOPS

(100% Sequential, 100% Read, 256K Transfer size)

Default IOPS setting: 113 MB/s sustained with 449 IOPS
Tweaked IOPS setting: 195 MB/s sustained with 781 IOPS
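
Putting relative numbers on the gain (derived directly from the figures above):

```shell
# Relative gains from the 3-IOPS round-robin tweak
awk 'BEGIN {
  printf "32K:  %.2fx throughput, %.2fx IOPS\n", 163/110, 5200/3500
  printf "256K: %.2fx throughput, %.2fx IOPS\n", 195/113, 781/449
}'
```

So roughly a 1.5x gain at 32K and 1.7x at 256K, for both throughput and IOPS.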

The NMP tweak does indeed seem to make a difference for larger-block read operations from iSCSI volumes, where the array can do its outbound load-balancing magic.

I would be interested to see if others can get similar results on the HP/LeftHand arrays, as well as on other iSCSI and NFS arrays (EqualLogic, EMC, NetApp).

The real question is: does this help in the real world, or only under IOMeter “simulation” conditions?


HP P4000 G2 IOMeter Performance

Some quick benchmarks on a three node P4500 G2 cluster, with twelve 450GB 15K SAS drives in each node. All running SAN/iQ 8.5

The Setup:
Windows Server 2003 R2 VM Guest (Single vCPU, 2GB RAM) with a virtual disk on 50GB Volume @ Network RAID-10 (2-way replica)

ESX 4.0 Update 1, with two VMkernel ports on a single vSwitch (two Intel Pro/1000 NICs) bound to the software iSCSI adapter with Round Robin multi-path selection (shown below)

More info on the vSphere iSCSI config can be found on Chad Sakac’s blog or in the PDF on HP’s P4000 site.

IOMeter Settings

One worker
64 Outstanding requests (mid-heavy load)
8000000 sectors (4GB file)

The Results

Max Read: (100% Sequential, 100% Read, 64K Transfer size)
156 MB/s sustained

Max Write: (100% Sequential, 100% Write, 64K Transfer size)
105 MB/s sustained

Max IOPS (100% Sequential, 100% Read, 512byte Transfer size)
31785 IOPS

Harsh Test IOPS (50/50 Random/Sequential, 50/50 Read/Write, 8k Transfer size)
3400 IOPS and 34MB/s

Some new “real world” tests
(50/50 Random/Sequential, 50/50 Read/Write, 4k Transfer size)
3200 IOPS and 23MB/s

(50/50 Random/Sequential, 50/50 Read/Write, 32k Transfer size)
2700 IOPS  and 78MB/s

Overall:

Excellent throughput and IO for three 1Gb iSCSI nodes on a software initiator, with replicated, fault-tolerant volumes.

And remember, IOPS scale linearly with each extra node added. And 10GbE NICs are available should you really want to get that throughput flying.