VDI and vSphere DRS
Abstract
"I would type away in my e-mail or application, but nothing appears. I can move the mouse around, but after a few seconds, the virtual desktop catches up to what I typed!"
This is not an uncommon complaint in your VDI adventures. Most of the time, the behavior shows up during a specific action and points to latency between the presented application and its back-end database, i.e. the database cannot keep up with the queries. If that applies, follow up your investigation by probing for related symptoms such as "(occasional) choppiness when scrolling within the application" or "Application Not Responding" errors.
Sometimes, however, the behavior is caused by the virtualization platform's own attempts to balance itself. VMware vSphere's Distributed Resource Scheduler (DRS) is one common example. Here are some DRS settings worth reviewing:
Automation Level
DRS constantly monitors the overall balance of a given cluster and, by default, is allowed to migrate VMs dynamically from one host to another. Even with the fastest flash storage and optimized VMs, a migration can take upwards of 12 seconds, and that can be 12 seconds during which the desktop is essentially "paused" until the migration completes. If a user is logged in at the time, it doesn't look like a disconnected session; the session is simply put on hold... with no warning to the user.
If the cluster isn't heavily overcommitted, set it to Manual, where every VM placement and migration must be approved by the administrator, or to Partially Automated, which lets DRS place VMs automatically only at power-on and leaves migrations of running VMs as recommendations for the administrator.
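To make the difference between the levels concrete, here is a small Python model of which DRS actions happen on their own at each Automation Level. It only restates the behavior described above and is not a vSphere API; the names `AutomationLevel` and `drs_actions` are hypothetical.

```python
from enum import Enum


class AutomationLevel(Enum):
    MANUAL = "Manual"
    PARTIALLY_AUTOMATED = "Partially Automated"
    FULLY_AUTOMATED = "Fully Automated"


def drs_actions(level: AutomationLevel) -> dict:
    """Return which DRS actions run on their own versus wait for an administrator.

    Hypothetical model for illustration, not part of any VMware SDK.
    """
    # Initial placement: choosing a host when a VM is powered on.
    auto_placement = level in (
        AutomationLevel.PARTIALLY_AUTOMATED,
        AutomationLevel.FULLY_AUTOMATED,
    )
    # Load-balancing migrations (vMotion) of VMs that are already running.
    auto_migration = level is AutomationLevel.FULLY_AUTOMATED
    return {
        "initial_placement": "automatic" if auto_placement else "recommendation",
        "load_balancing_migration": "automatic" if auto_migration else "recommendation",
    }


for level in AutomationLevel:
    print(f"{level.value:20} {drs_actions(level)}")
```

Only Fully Automated moves running desktops without anyone approving the move, which is where the unannounced pauses come from.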
Remember that even if the cluster has plenty of resource headroom, DRS will still do what it takes to balance the cluster. If it's too aggressive, it will migrate VMs between hosts even though each host could handle hundreds more replicas. This brings me to another DRS setting...
Migration Threshold
According to the console, the Migration Threshold measures how much imbalance across hosts in the cluster is acceptable based on CPU and memory loads. Recommendations are generated automatically (...). The more conservative the setting, the more imbalance is tolerated between hosts.
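Older DRS releases summarized cluster imbalance as the standard deviation of normalized host load and compared it with a target derived from the Migration Threshold. The Python sketch below models that idea under simplifying assumptions: each host's load is a single fraction of its capacity, and the `tolerance` values are hypothetical, not VMware's actual formula. The example also shows why DRS can migrate VMs even when every host has plenty of headroom.

```python
from statistics import pstdev


def host_load_stddev(host_loads: dict[str, float]) -> float:
    """Population standard deviation of per-host load, each load being a
    fraction of that host's capacity (0.0 to 1.0)."""
    return pstdev(list(host_loads.values()))


def is_imbalanced(host_loads: dict[str, float], tolerance: float) -> bool:
    """True when the spread of host loads exceeds the tolerated deviation.

    A more conservative threshold corresponds to a larger tolerance here
    (hypothetical mapping for illustration only).
    """
    return host_load_stddev(host_loads) > tolerance


# Both hosts are lightly loaded, yet their loads differ noticeably.
cluster = {"esx01": 0.20, "esx02": 0.40}
print(round(host_load_stddev(cluster), 2))
print(is_imbalanced(cluster, tolerance=0.05))  # aggressive setting: rebalance
print(is_imbalanced(cluster, tolerance=0.15))  # conservative setting: tolerate
```

Neither host is anywhere near full, so the only thing driving a migration here is the spread between them, not a shortage of resources.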
The default setting is acceptable in most cluster implementations.
Even when the Automation Level is set to Partially Automated, the threshold still matters: a more conservative setting makes DRS generate fewer, more conservative recommendations.
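As described in the vSphere documentation, each DRS recommendation carries a priority from 1 (mandatory, e.g. entering maintenance mode) to 5 (slight improvement), and threshold level N acts on priorities 1 through N. The Python sketch below models that filter; the `Recommendation` class, the `actionable` helper, and the VM and host names are hypothetical. Under Partially Automated, whatever survives the filter is shown for approval rather than applied.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    vm: str
    target_host: str
    priority: int  # 1 = mandatory move, 5 = slight balance improvement


def actionable(recs: list[Recommendation], threshold: int) -> list[Recommendation]:
    """Keep only the recommendations that a Migration Threshold level (1-5) acts on."""
    if not 1 <= threshold <= 5:
        raise ValueError("threshold must be between 1 and 5")
    return [r for r in recs if r.priority <= threshold]


recs = [
    Recommendation("vdi-001", "esx02", 1),
    Recommendation("vdi-014", "esx03", 3),
    Recommendation("vdi-027", "esx01", 5),
]
for level in (1, 3, 5):
    print(level, [r.vm for r in actionable(recs, level)])
# 1 ['vdi-001']
# 3 ['vdi-001', 'vdi-014']
# 5 ['vdi-001', 'vdi-014', 'vdi-027']
```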
VM Distribution and Memory Metric
VM Distribution is one of the main functions that DRS uses to balance the cluster. The Automation Level determines when DRS performs such functions. Enable this feature when:
1. Most of your VDI VMs have a similar resource usage profile.
2. You want VMs spread evenly so that, if a host fails, fewer desktops are affected and they are booted up in a balanced manner.
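Here is a minimal sketch of what even VM distribution buys you, assuming every desktop weighs the same (the similar-profile case in point 1). The greedy `distribute` function is a hypothetical illustration, not DRS's actual placement algorithm, which also weighs CPU and memory.

```python
import math


def distribute(vms: list[str], hosts: list[str]) -> dict[str, list[str]]:
    """Greedy even spread: each VM goes to the host currently running the fewest VMs."""
    placement = {h: [] for h in hosts}
    for vm in vms:
        # min() returns the first host with the lowest count, so ties follow list order.
        target = min(hosts, key=lambda h: len(placement[h]))
        placement[target].append(vm)
    return placement


vms = [f"vdi-{i:03d}" for i in range(1, 11)]
hosts = ["esx01", "esx02", "esx03"]
placement = distribute(vms, hosts)
print({h: len(v) for h, v in placement.items()})  # {'esx01': 4, 'esx02': 3, 'esx03': 3}
# Losing any single host affects at most ceil(10 / 3) = 4 desktops.
print(max(len(v) for v in placement.values()) == math.ceil(len(vms) / len(hosts)))  # True
```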
By default, DRS balances hosts based on the Active Memory used by the VMs, because Active Memory tends to be a more accurate metric of what is actually in use. The Memory Metric option switches load balancing to consumed memory instead, which works best when memory isn't overcommitted. VDI VMs tend to be clones of a gold image, so it is much easier to forecast how much memory they will consume. If you can determine that memory isn't overcommitted, this feature can be safely enabled.
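Because VDI desktops are clones with a fixed memory size, a quick calculation shows whether memory is overcommitted. The helper below and the pool sizes are hypothetical, and the calculation ignores per-VM and hypervisor overhead. A ratio at or below 1.0, ideally with a host failure accounted for, suggests consumed-memory balancing is reasonable to enable.

```python
def memory_overcommit_ratio(vm_count: int, vm_memory_gb: float,
                            host_count: int, host_memory_gb: float,
                            failures_to_tolerate: int = 0) -> float:
    """Configured VM memory divided by the physical memory of the hosts left standing.

    Simplification: ignores per-VM and hypervisor memory overhead.
    """
    usable_hosts = host_count - failures_to_tolerate
    if usable_hosts <= 0:
        raise ValueError("no hosts left after tolerating failures")
    return (vm_count * vm_memory_gb) / (usable_hosts * host_memory_gb)


# Hypothetical pool: 300 clones of a 4 GB gold image on four 512 GB hosts.
ratio = memory_overcommit_ratio(300, 4, 4, 512)
ratio_n1 = memory_overcommit_ratio(300, 4, 4, 512, failures_to_tolerate=1)
print(f"{ratio:.2f} {ratio_n1:.2f}")  # 0.59 0.78
print(ratio_n1 <= 1.0)  # True: not overcommitted even with one host down
```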