CAS Failover

pham0329 · December 2010

Just wanted some clarification. I have 2 Exchange servers, both hosting the CAS, HT, and MB roles. Both servers are members of a DAG for database failover. All Outlook clients are currently connected to Server1, and the RpcClientAccessServer attribute points to Server1 for all the databases.

My boss believes that because the 2 servers are members of a DAG, if Server1 goes down, clients can connect to the CAS on Server2 and connectivity will be uninterrupted. Is this correct?

From what I understand, this is not true...and it's what the CAS Array is for. However, since we have a DAG, we can't place the servers in a CAS Array. Now, my question is: if Server1 goes down and I change the RpcClientAccessServer on the databases to Server2, will that do anything? What if I were to set up load balancing using DNS round robin, would that work?

jibbajabba · January 2011

You will need some sort of load balancing .... if both servers are in the same domain and both are running the CAS role as well, then it depends on which server the clients connect to.

For example: https://mail.domain.com/owa can obviously only point to one server. If that server goes down, the second server would not automatically take over that IP to provide connectivity.

DNS round robin really only works for the HT server. For CAS, people would get a 404 error on OWA or other connectivity issues when Server1 is down, and they would have to hit F5 a few times until round robin points them to the working server.

But you are right, using NLB in conjunction with a DAG is not a supported scenario. You will need external load balancing, for example loadbalancer.org / ZEUS / Radware or even ISA ...

pham0329 · January 2011

So if we don't have a software/hardware load balancing solution, if Server1 goes down, how do I manually redirect clients to Server2?

jibbajabba · January 2011

pham0329 wrote: »

So if we don't have a software/hardware load balancing solution, if Server1 goes down, how do I manually redirect clients to Server2?

By changing the DNS record of the FQDN used on the CAS server.
Depending on the TTL, this will obviously take a while to propagate.

There is simply no easy way if both servers have all roles installed. If you do need every role to be fault tolerant, you will need either more servers or indeed a hardware load balancer - no way around that ..

Essentially it depends how much downtime you are allowed to suffer ...

pham0329 · January 2011

Gomjaba wrote: »

By changing the DNS record of the FQDN used on the CAS server.
Depending on the TTL, this will obviously take a while to propagate.

There is simply no easy way if both servers have all roles installed. If you do need every role to be fault tolerant, you will need either more servers or indeed a hardware load balancer - no way around that ..

Essentially it depends how much downtime you are allowed to suffer ...

Took your advice and created a CAS Array in Exchange, and made an entry in DNS pointing to the primary CAS server, with a TTL of 5 minutes. If Server1 goes down, I'll just point it to Server2.

Now that I have this setup, I do have a question though. What's the point of a CAS Array? The CAS Array, by itself, is useless. It doesn't do anything until I point the DNS record to a LB solution, or am I missing something?

When you run the New-ClientAccessArray command, what exactly does that accomplish in Exchange? What would happen if I were to create a LB cluster, add a couple of CAS servers to it, create a DNS entry for the cluster, but instead of creating a CAS Array in Exchange, edit the RpcClientAccessServer to point to the cluster directly?

jibbajabba · January 2011

I am quite new to this myself, but as far as I understand, without creating a CAS array, adding a second CAS server won't do anything, as it will be using the first one in the site only, and changing the DNS pointing to it seems to do nothing either, as the site wouldn't "know" this CAS server.

Again, that is what "I" understand .. without the power of google I wouldn't put my finger on it 100% though. Maybe someone else has a bit more input.

However, speaking of google ...

So now that we rely even more on the Client Access Servers within an Exchange 2010 infrastructure, clients need to be able to quickly re-connect to another CAS server in case the one they are connected to is down for planned or unplanned reasons. Say hi to the new Client Access array feature in Exchange 2010. A Client Access array is, as the name implies, an array of CAS servers. More specifically, it is an array consisting of all the CAS servers in the Active Directory site where the array is created. So instead of connecting to a FQDN of a CAS server, an Outlook client can connect to the FQDN of the CAS array (such as outlook.domain.com). This makes sure Outlook clients connecting via MAPI are connected all the time even during mailbox database fail and switch-overs (aka *-overs).

Here is how things work in regards to CAS arrays. An Exchange 2010 mailbox database has an attribute called RpcClientAccessServer. When creating a new mailbox database in an Active Directory site where a CAS array has not been created, this attribute will be set to the first CAS server installed in the AD site. You can see what this attribute is set to by running the following command:
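
The command itself did not survive in the excerpt quoted above. A likely form, assuming the standard Exchange 2010 Management Shell cmdlets (this is a reconstruction, not the article's exact text), is:

```powershell
# List every mailbox database together with the CAS endpoint
# (a single server FQDN or a CAS array FQDN) that Outlook clients are directed to.
Get-MailboxDatabase | Format-List Name, RpcClientAccessServer
```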

If a CAS array exists in the AD site when you create a new Mailbox database, this attribute will automatically be set to the FQDN of the CAS array. This is so the CAS array on the Client Access server knows which Mailbox server and database a user should be directed to.

When the CAS array has been created you should create an “A record” in your internal DNS named outlook.domain.com pointing to the virtual IP address of your internal load balancing solution.
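
As a rough sketch of the array creation the excerpt describes, assuming the Exchange 2010 Management Shell. The array name "CASArray01" and the site name "Default-First-Site-Name" are hypothetical placeholders; only outlook.domain.com comes from the text above:

```powershell
# Create the CAS array object in AD for one AD site.
# "CASArray01" and "Default-First-Site-Name" are placeholder values; substitute your own.
New-ClientAccessArray -Name "CASArray01" -Fqdn "outlook.domain.com" -Site "Default-First-Site-Name"

# Confirm the array exists and see which CAS servers in that site it lists as members.
Get-ClientAccessArray | Format-List Name, Fqdn, Site, Members
```

The array object only records the FQDN and site. The DNS A record for that FQDN still has to be created separately and pointed at a load balancer VIP or at a single CAS server.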

Windows NLB can still be used in conjunction with a CAS array as long as the Mailbox server role is not installed on the same machine and that any mailbox databases on the server are not protected via a Database Availability Group (WNLB and clustering have some sharing conflicts that makes this an unsupported scenario). You can, of course, also choose to use a CAS array in conjunction with an external hardware load balancer, which is the recommended approach especially if you have more than 8 CAS nodes.

If you use WNLB it is just a matter of creating the WNLB cluster and pointing the DNS record at the WNLB VIP and make sure that TCP port 135 (EndPoint Mapper) and the dynamic RPC port range (TCP 1024-65535) are added to the port rules list.

Uncovering the new RPC Client Access Service in Exchange 2010 (Part 1)

and

New-ClientAccessArray: What is it? - Federated Infrastructure - Site Home - MSDN Blogs

and

Configuring Client Access Array for Exchange 2010 – Walkthrough How to MS Exchange

A ClientAccessArray is new to Exchange 2010 and simply represents a set of Exchange 2010 servers with the CAS role installed that are load balanced in some fashion.

Exchange 2010 RPC Client Access Service and the ClientAccessArray | Kraft Kennedy | Technology Blog

pham0329 · January 2011

adding a second CAS server won't do anything as it will be using the first one in the site only and changing the DNS pointing to it seems to do nothing either as the site wouldn't "know" this CAS server.

But what if you were to set the RpcClientAccessServer attribute to point to a DNS entry that points to a load-balanced cluster of CAS servers?

For example:

I create an NLB cluster of CAS servers, and a DNS entry named cas.example.com that points to the cluster.

Now, instead of creating a CAS Array, I go to PowerShell and do

Set-MailboxDatabase DB1 -RpcClientAccessServer cas.example.com

Wouldn't that work as well? When users access cas.example.com, the NLB cluster would handle the load balancing.

jibbajabba · January 2011

pham0329 wrote: »

But what if you were to set the RpcClientAccessServer attribute to point to a DNS entry that points to a load-balanced cluster of CAS servers?

For example:

I create an NLB cluster of CAS servers, and a DNS entry named cas.example.com that points to the cluster.

Now, instead of creating a CAS Array, I go to PowerShell and do

Set-MailboxDatabase DB1 -RpcClientAccessServer cas.example.com

Wouldn't that work as well? When users access cas.example.com, the NLB cluster would handle the load balancing.

What I understand, though, is that once you install a CAS server, the object in the schema basically points to that particular CAS server only. If you then install a second server that has the MB, CAS, and HT roles, only HT and MB "add" themselves to the schema, so the organization recognizes these roles and uses the AD site costs to determine, for example, which HT server to use. As far as I know, it also uses the closest MB server when creating a new mailbox, unless you specify one.

So to my understanding, even if you add a second CAS role - the organization would not know about it, because think about it - it pretty much is just an IIS webpage / application (in a way) ...

So, even if you say that the cas.domain.com is used by a specific mailbox server, I can IMAGINE that you'd get connection issues if the user hits the second CAS server UNLESS you create a CAS array.

Which again, isn't supported when using DAGs on the server anyway ...

But again - I only started studying myself for Exchange - but this is how I understand all those KB I read for this thread here ....

I really think there is no way of avoiding a third-party LB solution if you have to have both CAS and MB on the same two servers and still want to accomplish automatic failover - even if it means changing DNS ...

But it's easy to test - install virtual machines and try it - you don't need any proper certificates or DNS for testing ... you can still use Exchange within the organization - so install a server, make it an AD / DNS server, install two Exchange servers and done .. should be very easy to test ...

pham0329 · January 2011

Gomjaba wrote: »

What I understand, though, is that once you install a CAS server, the object in the schema basically points to that particular CAS server only. If you then install a second server that has the MB, CAS, and HT roles, only HT and MB "add" themselves to the schema, so the organization recognizes these roles and uses the AD site costs to determine, for example, which HT server to use. As far as I know, it also uses the closest MB server when creating a new mailbox, unless you specify one.

So to my understanding, even if you add a second CAS role - the organization would not know about it, because think about it - it pretty much is just an IIS webpage / application (in a way) ...

So, even if you say that the cas.domain.com is used by a specific mailbox server, I can IMAGINE that you'd get connection issues if the user hits the second CAS server UNLESS you create a CAS array.

Which again, isn't supported when using DAGs on the server anyway ...

But again - I only started studying myself for Exchange - but this is how I understand all those KB I read for this thread here ....

I really think there is no way of avoiding a third-party LB solution if you have to have both CAS and MB on the same two servers and still want to accomplish automatic failover - even if it means changing DNS ...

But it's easy to test - install virtual machines and try it - you don't need any proper certificates or DNS for testing ... you can still use Exchange within the organization - so install a server, make it an AD / DNS server, install two Exchange servers and done .. should be very easy to test ...

The CAS Array itself is supported with the DAG, it's the NLB cluster that's not supported...or so I understand haha. So even if I have a DAG, I can still create a CAS Array and assign the array to the mailbox databases. It's just that when something happens, I have to manually redirect the DNS record to a 2nd CAS, as opposed to having it done automatically by the NLB.

And as far as I know, also uses the closest MB server when creating a new mailbox, unless you specify a specific one.

I believe that Exchange has some way of calculating the best mailbox DB for new mailboxes, and it's not necessarily on the closest MB server.

I've spent a good chunk of my time at work setting up the virtual machine for testing...I'll update this with how it goes.

jibbajabba · January 2011

pham0329 wrote: »

The CAS Array itself is supported with the DAG, it's the NLB cluster that's not supported...or so I understand haha. So even if I have a DAG, I can still create a CAS Array and assign the array to the mailbox databases. It's just that when something happens, I have to manually redirect the DNS record to a 2nd CAS, as opposed to having it done automatically by the NLB.

Sure is, but you said "instead of creating a CAS array"

royal · January 2011

Ok, this is exactly how it works. A CAS Array is simply an object in AD that gets created with an FQDN that doesn't match an actual server name. For example, RPCArray.domain.com. This FQDN then points to a Virtual IP that will get load balanced to CAS Servers in a specific site. I say a specific site due to the fact that when creating a CAS Array, you specify the AD Site. Load Balancing CAS Servers should only occur for a given AD Site and not load balanced across sites unless you are using a DNS Load Balanced solution such as F5 GTMs which should also be used alongside service based load balancers such as F5 LTMs.

The RPC Array gets stamped on databases that will be typically housed in that given site. When a user creates a profile or gets moved to a database, it will use the RPC Array FQDN assigned to their database as their Exchange Server. If you look at the Account Properties in Outlook, the RPC Array will show as the Rpc Array FQDN. Again, the DNS record for the RPC CAS Array will point to a VIP which then gets load balanced across the CAS Servers. So if 1 CAS Server goes down, your MAPI RPC traffic gets sent to another CAS in that load balanced configuration.
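
A minimal sketch of stamping the array FQDN onto databases that were created before the array existed, assuming the Exchange 2010 Management Shell. RPCArray.domain.com is the example FQDN from this post, and the unfiltered pipeline is an assumption that every database in the organization belongs to the array's site:

```powershell
# Point every mailbox database at the CAS array FQDN instead of an individual CAS server.
# Assumption: all databases live in the array's AD site; filter the list first if they do not.
Get-MailboxDatabase | Set-MailboxDatabase -RpcClientAccessServer "RPCArray.domain.com"

# Verify which endpoint each database now advertises.
Get-MailboxDatabase | Format-Table Name, RpcClientAccessServer -AutoSize
```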

If you do not have load balancing, what you can do is point the RPC Array FQDN to 1 server, and if that server goes down, manual intervention is required to repoint your A record to the other CAS Server. In this situation, a 5 minute TTL is recommended so clients get the updated IP relatively quickly.
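
The manual repoint could look roughly like this when run on a Windows DNS server with the dnscmd tool. Every value here is a hypothetical placeholder: DNS server "DC01", zone "domain.com", record "RPCArray", and the two IP addresses standing in for Server1 and Server2.

```powershell
# Hypothetical values: DNS server DC01, zone domain.com, record RPCArray,
# Server1 = 10.0.0.11 (failed), Server2 = 10.0.0.12 (surviving). Replace with your own.

# Remove the A record that points at the failed CAS server (/f skips the confirmation prompt).
dnscmd DC01 /RecordDelete domain.com RPCArray A 10.0.0.11 /f

# Re-create the record pointing at the surviving CAS server with a 300-second (5-minute) TTL.
dnscmd DC01 /RecordAdd domain.com RPCArray 300 A 10.0.0.12

# On a test client, clear the local resolver cache and confirm the new address.
ipconfig /flushdns
nslookup RPCArray.domain.com
```

Clients that have already cached the old address keep using it until the TTL expires, which is why the short TTL matters.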

A note about the RPC CAS Array FQDN: it should never be the same as the Outlook Anywhere FQDN. The reason is that when you're outside the network, you don't want Outlook to be able to resolve the Exchange Server FQDN specified in the Outlook properties. If it resolves, Outlook will spend at least 1 minute trying to connect using RPC to that FQDN. Because you're outside the network, RPC will fail and you'll wait 1+ minutes trying to connect via RPC until it finally falls back to Outlook Anywhere. If you have separate FQDNs for Outlook Anywhere and the RPC Array, the RPC Array FQDN will fail to resolve outside the network and Outlook will immediately fall back to Outlook Anywhere without any delay.
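
One way to check that the two names differ, sketched with the Exchange 2010 Management Shell. This only lists the values side by side; comparing them is left to the reader:

```powershell
# FQDN that internal MAPI/RPC clients use (the CAS array).
Get-ClientAccessArray | Format-List Name, Fqdn

# External host name that Outlook Anywhere clients use; it should not match the array FQDN above.
Get-OutlookAnywhere | Format-List Identity, ExternalHostname
```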

Check out some blog articles I wrote here:
Exchange 2010 RTM High Availability Load Balancing Options | Elan Shudnow's Blog

Exchange 2010 RPC Client Access Service and Multiple Sites | Elan Shudnow's Blog

Exchange 2010 Databases and the RPCClientAccessServer Database Parameter | Elan Shudnow's Blog
