VMware Horizon View issues

DevilWAH · October 2014

Does anyone here use VMware View desktops? We have had a contractor install it but the experience is not good. It makes me think of going on the console of a VM guest server before you have installed VMware Tools. Doing things like opening Lync 2013 and scrolling pushes the CPU to 100%, and I see CPU spikes when maximizing and minimizing windows.

The hardware is ample: 4-socket, 8-core servers with 256 GB RAM. Each VDI guest has 6 GB provisioned and a quad CPU, and we are only testing with 4 guests, so I can't see it being a resource issue.

Any thoughts how to improve the user experience?

Cheers

Essendon · October 2014

Tell us more about these desktops

- linked clones/full?
- master image optimized?
- AV scan times?
- happens in the mornings/evenings? (bootstorms/log offs)
- why 4 vCPUs?
- what OS?

How's the storage doing? You say the hosts have grunt, what's their utilization like and do they run server workloads too?

DevilWAH · October 2014

1. They are dedicated desktops, so full clones.
2. I am not a VDI engineer, but according to the engineer, yes they are. Visual effects are turned off and VMware Tools is installed (I believe he ran a predefined optimisation script for Windows 7 as per VMware best practices).
3. No AV installed at the moment (UAT).
4. Happens all the time (it's 12:00 am and I doubt anyone else is logged on).
5. Sorry, it's only a single vCPU with 2 vCores.
6. Windows 7

As for storage, it's running on VSAN with 2 SSDs and 6 spindle drives per host. The servers were spec'ed by a company who specialise in VDI, and have dedicated 10 Gig running between hosts. A VDI desktop takes about 15 seconds to boot, so I don't think there is any storage limitation. The servers were spec'ed to support 200 desktops across 6 hosts.

I can reproduce it by opening the Microsoft Lync client, opening Task Manager and then dragging the Lync window around the screen. CPU will jump to 100%. It only jumps to 50% if I do the same with Word, and other applications behave similarly: some hit 100%, some around 50%.

Seems mouse operations such as dragging and scrolling are a massive CPU hit.

DevilWAH · October 2014

Oh, and it is running on VMware ESXi 5.5 hosts and the latest VMware View.

Essendon · October 2014

Thanks for the additional info.

- How about Windows Aero?
- Drop the number of vCPU's down to 1 (also a best practice, unless 2 are needed for multimedia intensive applications).
- Windows update/Super-fetch/indexing/hibernation/screensaver/system restore/fading effects/animations for maximizing and minimizing windows - OFF?
- what kind of storage adapter for the guests?
- the software on the guests - is it being streamed or is it a full install?

Essendon · October 2014

Also check CPU Ready times for these VM's please, though I suspect the problem's with the master image.

DevilWAH · October 2014

Hi,

Thanks for all the pointers. It seems that the "EXPERT" did not sort out the image very well and it was not optimised. Now we have all the user experience effects turned off (pretty much as per the optimization tool from VMware Labs, with a few other tweaks).

But we still see an issue when scrolling in the Lync client, which sends the CPU to 100%. How do I check CPU ready times?

PS. Thank you for the help

Essendon · October 2014

No problems. Too many purported experts in our industry!

Do you see the problem with Lync only? How's Lync being delivered to the desktops? Is it a full or a streaming application?

Can you drop down the vCPU's to 1? Should make a difference I reckon.

CPU Ready times can be checked with the vSphere client: VM > Performance > Advanced > Chart Options > Counters > check Ready, Usage and Co-stop > OK. Then tell me the value you see for Ready > Summation > Average for the virtual desktop. Or better, just copy and paste the resulting chart in here if you can.

Essendon · October 2014

Check the latency for the LUN(s) the desktops are on. Use Lync and again see if the latency jumps up.

DevilWAH · October 2014

Annoyingly, I have to wait for the consultant to give me the stats, as it has still to be handed over. And I am being told that he is at a customer site for the next few days!

However, I have been reading up a lot about this and I think you might well have hit on it. Lync is installed locally, and when you scroll in Lync it is checking the status of your contacts, so a lot of work is going on. Any latency in ready state would cause the issue, so as soon as I can I will be looking at it.

It started me thinking though, and I have been looking at the ready states on our servers. I took one at random, which is installed as per Cisco recommendations with 16 vCPUs. I see 15 of the vCPUs at about 200 ms ready state, but one hovering around 3500 ms, and this is repeated on a few other servers, with one CPU seeming to have latency issues while the others are OK. Is this common, and is there a way to improve it? I notice these servers are the ones that feel unresponsive.

I was also wondering if there is a difference when it comes to scheduling vCPUs and vCPU cores? I am assuming that if I reduced a dual-vCPU guest to a single-vCPU guest with 2 cores, that would actually make no difference.

Cheers

joelsfood · October 2014

Lync on View is definitely rough.

We actually went from zero clients to thin clients that run Lync natively, specifically for this issue.

Other than that, I'm definitely no VDI expert, and don't claim to be one, so I won't propose any further suggestions off the top of my head here. Thankfully, I don't run the VDI servers, just our server VMs.

dales · October 2014

View optimisation can get off to a good start by following this http://www.vmware.com/files/pdf/VMware-View-OptimizationGuideWindows7-EN.pdf page 17 onwards is the interesting bit. Office 2013 we have found increases the IO greatly in VDI deployments compared to Office 2010. There is also a VDI plugin for Lync that improves the audio and video capabilities (although still with VDI YMMV).

As with anything SBC or VDI, the infrastructure bit is easy; it's the image that should take the most time.

DevilWAH · October 2014

Yep, have the VDI plug-in, but I am not sure it works, as the documentation states:

1. The VDI plug-in must match the OS bitness (64/32) for it to load correctly.
2. The VDI plug-in must match the bitness (64/32) of the installed Office version.

Well, as it's suggested to run 64-bit Windows but Microsoft recommends against 64-bit Office, it's a bit difficult for this to work in 99% of installations. It also only helps with voice/video, and only on thin clients, not zero clients, as joelsfood pointed out.

Essendon · October 2014

DevilWAH wrote: »

However, I have been reading up a lot about this and I think you might well have hit on it. Lync is installed locally, and when you scroll in Lync it is checking the status of your contacts, so a lot of work is going on. Any latency in ready state would cause the issue, so as soon as I can I will be looking at it.

It started me thinking though, and I have been looking at the ready states on our servers. I took one at random, which is installed as per Cisco recommendations with 16 vCPUs. I see 15 of the vCPUs at about 200 ms ready state, but one hovering around 3500 ms, and this is repeated on a few other servers, with one CPU seeming to have latency issues while the others are OK. Is this common, and is there a way to improve it? I notice these servers are the ones that feel unresponsive.

I was also wondering if there is a difference when it comes to scheduling vCPUs and vCPU cores? I am assuming that if I reduced a dual-vCPU guest to a single-vCPU guest with 2 cores, that would actually make no difference.

Cheers

I particularly dislike these vendor "recommendations", but since you mention this one's from Cisco there may be a valid reason for it. What's running on these VM's with 16 vCPU's?

Lemme explain a bit about Ready times - CPU Ready Time is the time a VM waits for the hypervisor to schedule it on one of its pCores. Most VMs will have some ready time regardless of how busy a host is; the ready times will go up if:

- the host(s) is/are heavily oversubscribed
- the VM(s) have multiple vCPUs. It is totally normal for VMs with multiple vCPUs to have higher ready times than VMs with single/two vCPU's. These VM's will require more grunt work from the hypervisor because it needs to find time for all vCPUs. It is worth noting that each core accumulates time separately.
- the VM(s) are overprovisioned. Say you have a VM with 8 vCPU's. The hypervisor now needs to find time to be able to schedule all of these vCPUs concurrently, the VM continues to wait till the hypervisor can do so resulting in high CPU ready times. This is exacerbated by:

> multiple VMs that are overprovisioned (the hypervisor now needs to find time to schedule multiple VMs resulting in worse performance for all such VMs)
> there are lots of VMs with single CPUs and some with multiple CPUs (large differences in scheduling)
> utilization of host (I generally recommend a 1:4 ratio of pCPUs to vCPUs, 1:6 is probably not too bad, 1:8 is stretching it and anything more is just not right) and the NUMA config of the host (and if the overprovisioned VMs fit into a single NUMA node). This one people tend to overlook but is just as important. Read this please and I'll avoid having to rehash the info out.

If things are clearer from the above points (don't blame you if they aren't, this isn't a simple topic!), note that it's easy to misinterpret the readings from the vSphere client. Generally, anything over 10% per vCPU is critical and should be looked at right away. It's worth mentioning that Ready times aren't only a CPU matter; they can be due to CPU/RAM/disk contention. Besides, the reported values you are seeing are a summation over 20 seconds. So if some vCPUs have a 200 ms ready time, then 200/20000 x 100 = 1% (not too bad), but 3500/20000 x 100 = 17.5% is really bad. This means that to do any work, the VM is waiting 17.5% of the time to be scheduled by the hypervisor, because all its vCPUs must be scheduled concurrently.
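
The percentage arithmetic above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the 20-second sample length of the real-time chart and the 10% rule of thumb from this post; the function names are made up for the example and are not part of any VMware tool.

```python
# Convert a CPU Ready "summation" value (in ms) from a vSphere performance
# chart into a percentage of the sampling interval.

REALTIME_INTERVAL_MS = 20_000  # assumption: real-time chart, 20-second samples


def ready_percent(ready_ms: float, interval_ms: float = REALTIME_INTERVAL_MS) -> float:
    """Share of the interval a vCPU spent ready to run but waiting for a pCore."""
    return ready_ms / interval_ms * 100


def verdict(pct: float, critical_pct: float = 10.0) -> str:
    # 10% per vCPU is the rule of thumb quoted in the post, not a hard limit.
    return "critical" if pct > critical_pct else "ok"


for ms in (200, 3500):
    pct = ready_percent(ms)
    print(f"{ms} ms ready -> {pct:.1f}% ({verdict(pct)})")
```

If the value is read from a historical chart rather than the real-time one, the sample length is longer, so `interval_ms` would need to be changed to match.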

I suggest you look at the CPU and RAM utilization of your hosts. If the hosts aren't under CPU/RAM contention, then your likely problem is disk. Check the latency of the disks these servers are on.

About your question on the difference between sockets and cores - there's no difference as long as you stay within the NUMA boundary of the host (see the blog I linked to above). A VM with 1 socket and 4 cores vs 4 sockets and 1 core each will have absolutely no difference in performance. None at all (apart from licensing, if applicable).

I know I got a little carried away with the above (this is one of my pet topics!) and there are a lot more things I could ask you about your environment to point you to the issue. But I'd suggest you drop the number of vCPUs down if they aren't needed. Extra vCPUs just make your hosts work harder and your VMs wait longer.

Great suggestions by the other posters too. We have XEN VDI desktops running on ESX and the performance can be less than stellar. VDI desktops are a different ball game than server VMs and either environment must be designed carefully. NEVER, for the love of God, run your VDI on the same infrastructure as your server workloads.

As if the above isn't enough, read this link for some great information.

DevilWAH · October 2014

WOW outstanding

Again, thank you for taking the time to explain all this. I am at heart a network engineer but have a decent understanding of VMware from day-to-day management and setup. But you're making me feel unworthy.

That's a great post!

I do actually just about get it, although to be honest I have spent an hour or so reading some documents, but that just brings it all together very nicely.

As soon as I can get into the VDI estate (which is separate, on servers specced for it by the consultant), I will look at the performance and test with a single vCPU.

But meanwhile I will look at our current server estate. We have 6 hosts with 2 x 4 cores, and I know we have 120 servers with about 350 cores in total, so I think in terms of subscription that's OK. But they range from 1 vCPU to 16 vCPUs with various splits, and looking at some, the ready time is harsh.
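
For reference, the subscription ratio for the estate described here can be worked out against the 1:4 / 1:6 / 1:8 rule of thumb given earlier in the thread. A minimal Python sketch, assuming that "2 x 4 cores" means two 4-core sockets per host, that the roughly 350 cores are vCPUs, and that hyperthreading is not counted; the function names and rating labels are illustrative only.

```python
def vcpus_per_pcore(hosts: int, sockets_per_host: int,
                    cores_per_socket: int, total_vcpus: int) -> float:
    """vCPUs allocated per physical core across the whole cluster."""
    physical_cores = hosts * sockets_per_host * cores_per_socket
    return total_vcpus / physical_cores


def rate(ratio: float) -> str:
    # Thresholds follow the rule of thumb quoted in the thread (1:4, 1:6, 1:8);
    # they are a guideline, not a VMware limit.
    if ratio <= 4:
        return "recommended"
    if ratio <= 6:
        return "probably not too bad"
    if ratio <= 8:
        return "stretching it"
    return "too much"


ratio = vcpus_per_pcore(hosts=6, sockets_per_host=2, cores_per_socket=4, total_vcpus=350)
print(f"{ratio:.1f} vCPUs per pCore: {rate(ratio)}")
```

Under those assumptions the cluster has 48 physical cores, so the ratio is a little over 7 vCPUs per core before any host is taken out for maintenance or failure.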

Am I correct in thinking that if I have 2 vCPUs in the guest and my NUMA nodes are each 1 physical CPU with 4 cores, then the scheduler will be able to assign 2 per NUMA node at any one time? But if I have a guest with 3 vCPUs as well, then this would take the entire NUMA node and prevent another guest with its 2 vCPUs using the unused resource. And if I had a 6-vCPU guest, it would require 2 NUMA nodes, taking up 4 cores of one and 2 of another.

But what about if I assign 16 vCPUs to a guest when the server has 2 NUMA nodes, each with 4 cores? I assume that in this case it can't assign all at once, so does it have to assign vCPUs in groups per NUMA node? So vCPUs 1-4 and 9-12 are assigned to physical NUMA node 1 and vCPUs 5-8 and 13-16 to NUMA node 2, for example, and it then schedules once it can assign a complete NUMA node. Or how else does it manage this kind of overprovisioning?

Cheers and again thank you so much for the lesson in vmware.

Essendon · October 2014

No problems mate. When I read information coming from the likes of Frank Denneman and Duncan Epping, I get the same feeling. Heck, I ran my VCDX design idea past an existing VCDX and after 20 minutes it felt like I was just scratching the surface. The learning never ends, does it?!

DevilWAH · October 2014

Essendon wrote: »

The learning never ends, does it?!

That's why we do the job

Essendon · October 2014

Like I said, the higher the number of vCPUs a VM has, the higher the ready time. If a VM has 16 vCPUs, it's definitely going to have high ready times; that's just the nature of virtualization. What's important is that you determine whether they actually need that much grunt. What are the monster VMs being used for?

What version of ESXi/ESX are you running? ESXi 5.x introduced some nice scheduling improvements over previous versions; vNUMA is one of them. With 4.1, the NUMA architecture was not exposed to the guest VM, so allocation of CPU and RAM inside the guest wasn't NUMA aware, which resulted in sub-optimal performance for VMs with more vCPUs than a NUMA node could provide. With 5.x, the NUMA architecture is exposed to guest VMs that have more vCPUs than a single node can provide. Such VMs are called wide VMs. The guest OS now knows where locality can be achieved and efficiently schedules its processes to a local NUMA node. This avoids the VM having to go across the NUMA node interconnect bus, and a significant improvement is achieved.

If you have 2 vCPU in a VM and your NUMA architecture was 4 cores/NUMA node, then the scheduler will be able to fit the VM into one node. It's important for this reason that a VM's vCPUs be multiples of the NUMA node size. And yes you are right about the other calculations you made about fitting differently sized VMs in NUMA nodes. I am not sure if idle pCores aren't allocated to other VMs, though I think the hypervisor will dish them out. ESXi is an incredible piece of engineering!
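
The sizing arithmetic in this exchange can be sketched in Python. This is a deliberately simplified model, not a description of how the ESXi scheduler works internally: it only counts how many NUMA nodes' worth of cores a VM needs and ignores hyperthreading, memory locality and other VMs on the host. The function names, and the 2-node host with 4 cores per node, are assumptions taken from the figures in the thread.

```python
import math


def numa_nodes_needed(vcpus: int, cores_per_node: int) -> int:
    """Minimum number of NUMA nodes whose cores can cover all vCPUs."""
    return math.ceil(vcpus / cores_per_node)


def describe(vcpus: int, cores_per_node: int = 4, host_nodes: int = 2) -> str:
    needed = numa_nodes_needed(vcpus, cores_per_node)
    if needed == 1:
        return f"{vcpus} vCPUs: fits in one NUMA node"
    if needed <= host_nodes:
        return f"{vcpus} vCPUs: spans {needed} NUMA nodes (a wide VM)"
    # More vCPUs than the host has physical cores in total.
    return f"{vcpus} vCPUs: more than the host's {host_nodes * cores_per_node} physical cores"


for n in (2, 3, 6, 16):
    print(describe(n))
```

On these numbers the 2- and 3-vCPU guests each fit inside one node, the 6-vCPU guest spans both nodes, and the 16-vCPU guest asks for more cores than the host physically has.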

If a VM has 16 vCPUs, yes it can schedule all of them at once (if the host has available cycles). This is if HT is enabled. If HT is disabled, it will not even let you create a 16-vCPU VM (I think). It will allocate 1-4 to NUMA node 1, 5-8 to NUMA node 2 and so on. This of course depends on whether the host has cycles available to schedule all the VM's demands at once.

On a side note, the DRS scheduler is unaware of how a particular NUMA node is doing and will not move any VMs around. A host's CPU scheduler also does not take hyperthreading into consideration. 1 pCPU = 1 pCPU as far as the host's CPU scheduler is concerned, not 1 pCPU = 2 pCPUs as most people believe. HT will only provide at best a 30% improvement in CPU performance (note I said at best; usually it's more like 1% - 10%). For designing a virtual infrastructure or improving one, don't even consider HT.

Hope this helps.

Essendon · October 2014

Strange how this ended up into 2 separate posts, but anyway!

iBrokeIT · October 2014

TLDR - It is important to right-size your VMs, because oversizing them can actually cause a performance hit.

Amazing explanation Essendon, I enjoyed it. Also, VMware Operations Manager can help you "right size" your VMs, find performance issues and fix them. It is a really useful tool if you don't have Essendon's level of knowledge available on demand.

JBrown · October 2014

DevilWAH

Have you checked to confirm that the CPUs are set to High/Maximum Performance in the BIOS? Are these Intel E54xx or E56xx CPUs by any chance? I have had similar issues with these CPU models in the past. They are not made for real virtualization environments, IMHO.
What are the server models, btw? I have about a dozen Cisco C240 M3s in my environment.

Essendon · October 2014

@DevilWAH, so how did you go?

DevilWAH · October 2014

Oh you will love this

VMware consultants finished the build and then left site. Because of the performance issue they have not handed it over and will not give us the credentials to log on, and it looks like it will be 2 more weeks till they come back!

So £70K of server hardware, plus however much in licensing and consultancy, and we can't do a thing with it!

If you want an example of how NOT to run an IT project, this would be a great one to study. And no, it's not one of mine.

If/when we get access I will get back with what happens, but so far it is just a mess! One bit of advice: when the performance is not what was expected, don't send snotty emails to the consultants' management team. It does not help matters or encourage them to work with you to fix it!

jibbajabba · October 2014

Oh that is nothing .. A shame I cannot gossip ... But it could be worse ... BELIEVE ME

DevilWAH · October 2014

jibbajabba wrote: »

Oh that is nothing .. A shame I cannot gossip ... But it could be worse ... BELIEVE ME

I haven't scratched the surface of this one

I have worked on poorly managed projects before but!

examiner2111 · October 2014

There is a little known tool called VMware View Planner

https://my.vmware.com/web/vmware/details?downloadGroup=VIEW-PLAN-300&productId=320

There is a lot of documentation to read and takes quite a bit of time to set it up. It is something your consultant should have done. It is a tool that supposedly only should be used by consultants (you might not even be able to download it using your myvmware portal). Pressure them to use this!

In short, it is capable of simulating real-world application response for the most common business applications. Lync 2013 was not on the list last I checked, but it is still worth it. In the end it pumps out a report. You can manage workloads to see what your breaking point will be, etc.

We use View 5.2 and have Lync. I can say that Office 2013 in general has way crappier performance than Office 2010 ever did.

DevilWAH · October 2014

Oh, here's another issue with their setup.

If I copy a file from the filer to my physical desktop I get about 500 Mb/s constant transfer rate. Not stellar, but with a 1 Gb NIC and a heavily utilized filer, not too horrendous.

If I copy the same file at the same time to my VDI desktop, it starts at 500 Mb/s, then drops to 0 after about 20 seconds, and eventually the whole desktop hangs!

I can copy the same file to any physical desktop/server and it goes without a hitch; do the same to any VDI desktop and it fails and crashes. But I am told it's my network that is causing all the performance issues.

Essendon · October 2014

What kind of RAID are these desktops on? It might be the network too, how are the fabric switches/interconnects doing?

DevilWAH · October 2014

Essendon wrote: »

What kind of RAID are these desktops on? It might be the network too, how are the fabric switches/interconnects doing?

It's running VSAN with 2 SSDs and 4 spindle disks (not sure of the exact setup as, like I say, we can't see it), but its redundancy level is 1.

Three servers plug into one Cisco 6500 chassis, which is in a VSS pair with another chassis that has the other three servers in it. All 10G connections, and utilisation never hits more than 5-10% of the switch-core links and 10-20% of the switch-to-switch links (and that's including other data).

Like I say, the only thing we have issues with is the VDI. Our own VMware estate for servers uses FC and iSCSI, with iSCSI running over the same switches, and there is no issue there. Indeed, our server estate has old servers and runs many more machines than the VDI but performs so much better.

DevilWAH · October 2014

I know it well could be the network, as each of the ESXi hosts has multiple 10G and 1G connections and I am not sure how they have all been set up on the ESXi side. My side is only as they asked.

Essendon · October 2014

Let us know when you've got the keys to the kingdom and go from there. Curious, were these consultants from VMware? Usually they are good.
