
Hyper-V R2 iSCSI Pass-thru Disk Performance

Claymoore Member Posts: 1,637
I'm working with a client and ran into a puzzling issue with disk performance using pass-thru disks and iSCSI on Hyper-V R2. Here's the setup:

2 Hyper-V R2 standalone physical servers (H1 and H2)
All storage is iSCSI using MCS for load balancing to an EMC Celerra
VHDs for virtual OS drives
Pass-thru disks for other storage

2 virtual Exchange 2007 mailbox servers (MB1 and MB2)
1 per physical host
Using CCR for high availability
OS drive is a VHD
Separate log and DB luns connected as pass-thru disks (PTD)

Everything seemed to be fine until yesterday when I needed to move some files around and the copy was terribly slow. I was copying from PTD to PTD and getting 8 MB/s throughput. I figured it was a link speed mismatch since that's about the speed of a 100 Mbps connection, but everything was correctly connecting at 1 Gbps. I also verified on the physical host that both iSCSI links were carrying traffic and MCS was configured properly. We decided to test some other copy scenarios to gauge the throughput. Monitoring the network activity showed utilization levels that changed with the copy speed: higher copy speeds meant higher link utilization.

Inside Virtual server:
File share (LAN) to PTD - 100 MB/s
VHD to PTD - 50 MB/s
PTD to PTD - 25 MB/s

On Physical server:
LAN to iSCSI Lun - 100 MB/s
iSCSI to iSCSI - 300 MB/s
Local disk to iSCSI - 300 MB/s
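
A quick sanity check on the 100 Mbps suspicion above (a rough Python sketch; the 0.8 efficiency factor for TCP/iSCSI overhead is an assumption, not a measurement):

    # Rough conversion from link speed to expected file-copy throughput.
    # The 0.8 efficiency factor is an assumed allowance for protocol overhead.
    def expected_mb_per_s(link_mbps, efficiency=0.8):
        return link_mbps / 8 * efficiency

    for link_mbps in (100, 1000, 10000):
        print(f"{link_mbps:>5} Mbps link -> ~{expected_mb_per_s(link_mbps):.0f} MB/s usable")

    # 100 Mbps  -> ~10 MB/s, close to the 8 MB/s PTD to PTD copy
    # 1000 Mbps -> ~100 MB/s, which matches the LAN to PTD and LAN to LUN numbers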

Any guesses as to why the PTD to PTD copy performance is so slow? We did a test backup against a replica of the public folder database (I know PFs are supported in CCR, we'll move it when we get another virtual server built) and the iSCSI link utilization was about 10%, which is the same slow utilization we see with the 25 MB/s throughput. That kind of performance just isn't going to work.

Next we are going to advertise a LUN directly to a test virtual server using the iSCSI initiator inside the VM and test the speed. I'm used to using the iSCSI initiator inside a VM on VMware because ESX doesn't load balance iSCSI connections. We'll see how that turns out, but does anyone have any thoughts in the meantime?

Comments

  • Hyper-Me Banned Posts: 2,059
    Did you attach the pass through disks to the SCSI bus for the VM or the IDE bus?

    I think I read somewhere that pass-throughs, especially for SQL use, will suffer badly if left on the IDE bus.
  • Claymoore Member Posts: 1,637
    Each pass-thru disk is on its own virtual SCSI controller.

    Maybe I should test this with PTDs on the same virtual SCSI controller. I'm also running JetStress against these disks to see if the poor copy throughput is an indication of overall poor performance or just some strange quirk.
  • Claymoore Member Posts: 1,637
    Curiouser and curiouser...

    We did some copies on a different server using VHDs hosted on fibre channel and got similar VHD performance. We also tested copies between PTDs on the same virtual SCSI adaptor and got marginally better speeds (~45 MB/s). I ran three different JetStress tests, increasing the thread count each time, and the server passed with flying colors.

    This seems to be a weird copy issue on the VMs and not an indication of poor underlying disk performance. Since we don't regularly copy data from one disk to another on Exchange (we were moving log files to a larger drive to support a migration without turning on circular logging), this copy issue won't be a problem.
  • kejs Registered Users Posts: 4
    I'm experiencing the same speed issue.
    I have a Windows Server 2008 R2 guest installed on a PTD as a mail server (Exchange 2010).
    The PTD is an iSCSI disk attached to an IDE controller in Hyper-V.
    This is a test environment. I get 15 MB/s when copying from PTD to PTD.
    I wonder if this will impact my performance when we go to production? I don't have a lot of users, 80-100, but they each have 10 GB PST files waiting to be transferred into the Exchange DB.
    Copying from LAN to PTD works as expected on a 1 Gbps link.
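
    A rough back-of-the-envelope estimate of what those copy speeds mean for the PST import (a hypothetical Python sketch; it assumes a sustained transfer rate and ignores Exchange-side import overhead; the 100 MB/s figure is the LAN-to-PTD rate from the original post):

        # Hypothetical estimate: 100 users with a 10 GB PST each, moved at a sustained rate.
        users, pst_gb = 100, 10
        total_mb = users * pst_gb * 1024               # total data in MB

        for rate_mb_s in (15, 100):                    # 15 MB/s = PTD to PTD now, 100 MB/s = healthy LAN to PTD
            hours = total_mb / rate_mb_s / 3600
            print(f"{rate_mb_s} MB/s -> about {hours:.0f} hours of raw copy time")

        # 15 MB/s -> roughly 19 hours, 100 MB/s -> roughly 3 hours, before any import overhead.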

    Have you learned why this is happening since your original post?

    Thanks.
  • Claymoore Member Posts: 1,637
    It was a SAN problem, not a Hyper-V problem. The SAN admins mistakenly set up the volume set as concatenated disks rather than striped disks. Essentially we were only writing to one spindle until it filled up, and then we would write to the next disk. They created a new striped set and I moved the data to those disks, but I stopped using PTDs and went with fixed disks on the new setup. PTDs were unnecessarily complicating things without any real performance benefit.
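
    To illustrate the difference (a simplified Python sketch with made-up numbers, not the actual SAN layout): with concatenation, logical blocks fill one member disk before spilling onto the next, so a busy volume hammers a single spindle, while striping rotates consecutive blocks across all members.

        # Simplified mapping of logical block addresses to member disks.
        DISKS, BLOCKS_PER_DISK = 4, 1000

        def concatenated(lba):
            # Fill disk 0 completely, then disk 1, and so on.
            return lba // BLOCKS_PER_DISK

        def striped(lba):
            # Rotate consecutive blocks across all disks.
            return lba % DISKS

        hot = range(800)                               # a busy region near the start of the volume
        print({d: sum(concatenated(b) == d for b in hot) for d in range(DISKS)})
        # {0: 800, 1: 0, 2: 0, 3: 0} -- every I/O lands on one spindle
        print({d: sum(striped(b) == d for b in hot) for d in range(DISKS)})
        # {0: 200, 1: 200, 2: 200, 3: 200} -- I/O spread across all four spindles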

    I think the 32 GB of cache in their EMC Symmetrix was responsible for our passing grades when we ran Jetstress. If you haven't already, download and run Jetstress to test out the disk performance:
    Microsoft Exchange Server Jetstress 2010
    Download details: Microsoft Exchange Server Jetstress 2010 (64 bit)

    Also, keep in mind that you are running Exchange 2010 which has about a 70% reduction in disk I/O from Exchange 2007. You will probably be fine, but check out your SAN and run JetStress to be sure.
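
    As a rough illustration of what that reduction means for a small deployment like yours (the per-mailbox IOPS value below is a placeholder assumption, not a sizing figure; plug in the numbers from your own user profile):

        # Hypothetical sizing check: 100 mailboxes at an assumed 0.5 IOPS each on Exchange 2007.
        mailboxes = 100
        iops_per_mailbox_2007 = 0.5                    # placeholder, adjust to your profile
        reduction = 0.70                               # ~70% less disk I/O in Exchange 2010

        required_2007 = mailboxes * iops_per_mailbox_2007
        required_2010 = required_2007 * (1 - reduction)
        print(f"Exchange 2007: ~{required_2007:.0f} IOPS, Exchange 2010: ~{required_2010:.0f} IOPS")
        # A single 7200 RPM spindle handles very roughly 75-100 random IOPS,
        # so a healthy SAN should not be the bottleneck for a load this small.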
  • kejs Registered Users Posts: 4
    The SAN looks fine to me: RAID 10, it should fly.
    Jetstress does not pass.
    I installed the entire guest OS and Exchange on a PTD; maybe that was my error.
    Since I can now only use the IDE controller (not SCSI) for the PTD, that could be causing the slow performance.
    Migrating the OS partition to a fixed disk and the Exchange DB to a PTD now will be a major pain :)

    Anyhow, thanks for your input.
  • Claymoore Member Posts: 1,637
    An Exchange 2010 database is designed to work on a single SATA spindle. If Jetstress 2010 is failing in your environment, something is seriously wrong.

    We were able to eliminate iSCSI as a suspect by testing connections to the same disk group over fibre. Can you test with fibre to be sure it is not a problem with the SAN?

    Eliminate the guest VMs by copying files from disk to disk on the host server. Is there a performance difference?
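
    One way to put a number on that host-side test (a hypothetical Python sketch; the paths are placeholders, and use a file of several GB so write caching doesn't skew the result):

        # Time a large sequential copy between two disks on the Hyper-V host and report MB/s.
        import os
        import shutil
        import time

        src = r"D:\testdata\bigfile.bin"               # placeholder: file on the first disk/LUN
        dst = r"E:\testdata\bigfile.bin"               # placeholder: target on the second disk/LUN

        start = time.time()
        shutil.copyfile(src, dst)
        elapsed = time.time() - start

        size_mb = os.path.getsize(src) / (1024 * 1024)
        print(f"Copied {size_mb:.0f} MB in {elapsed:.1f} s -> {size_mb / elapsed:.0f} MB/s")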

    For your iSCSI connections, do you have separate NICs or ports for the iSCSI links? Is MCS (Multiple Connections per Session) enabled on the iSCSI initiator? Are you using plain NICs, NICs with TCP Offload Engines, or iSCSI HBAs? If you are using TOE NICs, is TCP Chimney enabled?

    I recommend you use fixed disks rather than pass-thru disks. Exchange 2010 uses DAGs instead of shared disk clustering so Exchange does not require direct access to the disk. PTDs just complicate management without providing any measurable increase in performance.
  • kejs Registered Users Posts: 4
    I don't think it's a problem with the SAN. Other LUNs work fine, and a simple eyeball test shows all the LEDs blinking when I copy something from LAN to PTD, which runs at around 60 MB/s on a 1 Gbps link. That's fine, since I have a heavy load on a low-end gigabit switch. I'll leave that test for last.
    The host server is fine; I get around 200 MB/s from partition to partition.
    I may have found where I messed up. I use one NIC for all Hyper-V guest traffic (1 Gbps), one NIC for the host's domain connection (100 Mbps), and one NIC for iSCSI (100 Mbps). The iSCSI NIC should be on at least 1 Gbps. As I see it, the 7 MB/s copy from PTD to PTD could reflect that mistake.
    I must buy a SOHO switch today to test it; I have no more gigabit ports from the SAN room to my lab.

    Anyway, I learned a lot during the process, so I'm not complaining :)
    MCS is set to round robin.
    TCP Chimney is enabled on the Windows Server side, and as far as I could find out the SAN supports it out of the box, though I haven't verified that.

    As for fixed disks, I can now agree.
    I tried to migrate the current installation yesterday; for some reason it does not work if I use Disk2vhd from the guest machine or against the physical disk itself. When the OS boots up from the .vhd, there is no NIC; not even a legacy NIC shows up.
    I will try with the VMM trial.
  • Claymoore Member Posts: 1,637
    kejs wrote: »
    I may have found where I messed up. I use one NIC for all Hyper-V guest traffic (1 Gbps), one NIC for the host's domain connection (100 Mbps), and one NIC for iSCSI (100 Mbps). The iSCSI NIC should be on at least 1 Gbps. As I see it, the 7 MB/s copy from PTD to PTD could reflect that mistake.
    I must buy a SOHO switch today to test it; I have no more gigabit ports from the SAN room to my lab.

    Anyway, I learned a lot during the process, so I'm not complaining :)
    MCS is set to round robin.
    TCP Chimney is enabled on the Windows Server side, and as far as I could find out the SAN supports it out of the box, though I haven't verified that.

    Looks like you found it. A 100 Mb/s link will give you around 8 MB/s transfer rate at maximum. Once you get up to around 300 Mb/s, your transfer rate is about as fast as a local disk, so a 1 Gb/s link is really what you need. For heavy iSCSI use in a virtual environment, use 10 Gb links.

    If this is a lab you can get away with a single link, but you must have multiple iSCSI links in production. These links need to run through different switches for redundancy as well. If something happens to that iSCSI link it would be like pulling the IDE cable on a local drive while it is in use. You can guarantee something will be corrupted.

    Also get switches with redundant power supplies. A redundant power supply adds an entire 9 to your uptime and is the cheapest HA investment you can make. I can tell you many stories about devices with single power supplies, single power channels or redundant power supplies plugged into the same channel, but they all end in something being shut down unintentionally.
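
    The extra-nine claim is easy to sanity-check (an idealized Python sketch; it assumes the two supplies sit on genuinely independent feeds that fail independently, which is exactly why they should be on separate power channels):

        # Combined availability of redundant power supplies on independent feeds.
        def combined_availability(single, supplies=2):
            # The device loses power only if every supply/feed fails at the same time.
            return 1 - (1 - single) ** supplies

        for single in (0.99, 0.999):
            print(f"single feed {single:.3%} -> redundant {combined_availability(single):.5%}")
        # 99% per feed becomes 99.99%, and 99.9% becomes 99.9999% -- but only when
        # the supplies really are plugged into independent power channels.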
  • kejs Registered Users Posts: 4
    Yes, that was it.
    With the new SOHO switch I get 55-65 MB/s from PTD to PTD.

    As for shutting things down... I know, I accidentally unplugged a few :))

    OK, valuable information. I hadn't thought about multiple iSCSI links, but it really doesn't cost much, so I will implement 1 Gbps failover for starters.
    Once again, thanks for your help.