Standby Server?

paintb4707 · December 2007

Well, the money man came up to me yesterday and said that he wants the company to have a "backup" server that would essentially take over the role of another server if one were to crash.

I explained to him that I don't believe it would be possible for another server to just automatically take over the role of, for example, a temporary Exchange server if the original one were to crash.

Now with the new image-based backups that we plan to invest in, I suggested that we could instead buy a "standby" server that would pretty much collect dust until a hardware issue arose. At that point, I could restore the imaged backup to the standby server and we would be back in business. The only catch is that the only reason for having this standby server would be a hardware-related issue that left us dead in the water waiting for a part to be delivered. Software issues could easily be resolved by restoring a backup from the day prior.

Now I know that when you pay for off-site storage, they usually guarantee your server uptime. In case of a hardware failure outside of business hours (possibly inside as well), I believe the off-site center would be responsible for restoring the backup to the standby server.

My company wants virtually zero downtime, as we will soon be implementing an EDI system that will be completely real-time. We may have to start running backups at hourly intervals.

What do you guys think of my standby server idea? Any suggestions or possibly a better solution?

BeaverC32 · December 2007

What type of server is this? More information is needed based on the technology that is used. Active/Active and Active/Passive redundancy is done very often for BCP.

Mishra · December 2007

http://www.doubletake.com/

paintb4707 · December 2007

BeaverC32 wrote:

What type of server is this? More information is needed based on the technology that is used. Active/Active and Active/Passive redundancy is done very often for BCP.

We have an Exchange server which is the PDC that also serves DNS and DHCP and then we have a file server/SQL database for inventory which will soon be merged with Microsoft Financials.

Mishra wrote:

http://www.doubletake.com/

I don't think this is what we're looking for. If a software related issue occurred it would just be replicated to the target server. I'd basically still end up restoring a backup from a day prior.

blargoe · December 2007

I hope you have more than one domain controller.

Personally, I'd have two dedicated servers for Active Directory/DNS/DHCP, a clustered exchange server, and clustered SQL server, if they want virtually ZERO downtime. Restoring from backup or images/snapshots should be a last resort.

paintb4707 · December 2007

blargoe wrote:

I hope you have more than one domain controller.

Personally, I'd have two dedicated servers for Active Directory/DNS/DHCP, a clustered exchange server, and clustered SQL server, if they want virtually ZERO downtime. Restoring from backup or images/snapshots should be a last resort.

Nope, the Exchange server is the only domain controller.

Why do you prefer two servers for AD/DNS/DHCP?

dynamik · December 2007

paintb4707 wrote:

Why do you prefer two servers for AD/DNS/DHCP?

Because if it fails, you're totally hosed.

iowatech · December 2007

paintb4707 wrote:

BeaverC32 wrote:
What type of server is this? More information is needed based on the technology that is used. Active/Active and Active/Passive redundancy is done very often for BCP.

We have an Exchange server which is the PDC that also serves DNS and DHCP and then we have a file server/SQL database for inventory which will soon be merged with Microsoft Financials.

Mishra wrote:
http://www.doubletake.com/

I don't think this is what we're looking for. If a software related issue occurred it would just be replicated to the target server. I'd basically still end up restoring a backup from a day prior.

Double Take is exactly what you were asking for. It does instant data recovery in the event of a server failure. It's basically server mirroring using byte-level replication on a continuous basis.

And also, for someone to ask for "24/7 uptime" and then put all their eggs in one basket on one server is asinine. I would tell them immediately to spread that out over AT LEAST 2, if not 3, servers.

seuss_ssues · December 2007

We run a DC and a cluster of servers at our main site and a DC and a cluster of servers at our DR site which is 200 miles away.

We can pretty much lose our primary site and several nodes from our DR cluster and still be fine.

As everyone has already mentioned, if your DC goes down you're pretty much screwed.

Smallguy · December 2007

blargoe wrote:

I hope you have more than one domain controller.

Personally, I'd have two dedicated servers for Active Directory/DNS/DHCP, a clustered exchange server, and clustered SQL server, if they want virtually ZERO downtime. Restoring from backup or images/snapshots should be a last resort.

This is exactly what you want in order to meet the "money man's" request, but it also costs money.

Zero downtime will definitely cost you.

I did a lot of research into this about a year ago and it is not cheap. In the end my boss implemented a bastardized solution.

Do your research. I found MSExchange.org really good for Exchange clustering information.

I'm sure SQL server has similar requirements.

I would definitely let him know before you move to the EDI that currently all your eggs are in one basket and cover yourself.

paintb4707 · December 2007

iowatech wrote:

BeaverC32 wrote:
What type of server is this? More information is needed based on the technology that is used. Active/Active and Active/Passive redundancy is done very often for BCP.

We have an Exchange server which is the PDC that also serves DNS and DHCP and then we have a file server/SQL database for inventory which will soon be merged with Microsoft Financials.

Mishra wrote:
http://www.doubletake.com/

I don't think this is what we're looking for. If a software related issue occurred it would just be replicated to the target server. I'd basically still end up restoring a backup from a day prior.

Double Take is exactly what you were asking for. It does instant data recovery in the event of a server failure. It's basically server mirroring using byte-level replication on a continuous basis.

Not to be rude, but I'm having a hard time seeing how this is so useful. If a file corruption were ever to occur, it would be replicated to the target server. Now you have two failed servers. What would be the point of that? The only reason I could see for having this is that it's automated, but as I mentioned above, there are off-site storage centers that would handle your backups remotely. They could restore a backup to the standby server if the original one ever failed, and it would theoretically serve the same purpose, wouldn't it? It seems like more of a hassle dealing with two servers in replication as opposed to one on its own.

paintb4707 · December 2007

dynamik wrote:

paintb4707 wrote:

Why do you prefer two servers for AD/DNS/DHCP?

Because if it fails, you're totally hosed.

Another question. How can I spread the load between two servers for the same role?

iowatech · December 2007

paintb4707 wrote:

Not to be rude but I'm having a hard time seeing how this is so useful. If a file corruption were ever to occur it would be replicated to the target server.

I'm not taking that as rude; it's a valid question. However, that's an extreme case in which you would have to rely on tape backup to restore a valid copy of the server as a last resort. 99% of the time you'll never have to worry about that, as corruption "in most cases" is an extreme rarity.

paintb4707 wrote:

Another question. How can I spread the load between two servers for the same role?

Just run DCPROMO on the other server if you want to have multiple DCs. It will bring up the wizard, which asks whether you want to add this server to an existing domain and so on. By default it will then install DNS on the server, unless you already have your own DNS cluster that's not integrated into the domain controller. From the sounds of it, though, you don't have that. DHCP will just be on one server, though. It's pretty straightforward.

paintb4707 · December 2007

iowatech wrote:

Not to be rude but I'm having a hard time seeing how this is so useful. If a file corruption were ever to occur it would be replicated to the target server.

I'm not taking that as rude; it's a valid question. However, that's an extreme case in which you would have to rely on tape backup to restore a valid copy of the server as a last resort. 99% of the time you'll never have to worry about that, as corruption "in most cases" is an extreme rarity.

Corruption was just one of many examples. Pretty much any software-related issue would be replicated, and then we'd have two useless servers. For example, last Friday, first thing when I walked in the door, I had 5 people telling me the email was down. The first thing I usually do is restart all the Exchange services, and lo and behold, when I restarted the SA it didn't want to start. It was stuck in "starting", and after I rebooted the Exchange server it wouldn't even log in. It was stuck at "applying computer settings", which I correctly assumed was because the SA was not starting. So basically I had to go into safe mode and disable all the Exchange services just so I could do a normal boot. From there I was completely clueless, since nothing I could find related to the errors in Event Viewer was helpful at all. I had to call Microsoft and spend about 2 or 3 hours on the phone with them troubleshooting an issue whose cause I couldn't even explain.

I think if image-based backups only took 15 minutes to restore, it would probably be much more feasible in my entry-level shoes to just restore a backup rather than fiddle with it for 3 hours trying to figure it out on my own while half the company is twiddling their thumbs. You may or may not know that I'm the only IT guy at my company.

I mean, is there any reason NOT to restore a backup in 15 minutes every time there is an issue? I know it's running around the problem as opposed to facing it, but like I said above, downtime cannot affect this company.

dynamik · December 2007

iowatech wrote:

Just run DCPROMO on the other server if you want to have multiple DCs. It will bring up the wizard, which asks whether you want to add this server to an existing domain and so on. By default it will then install DNS on the server, unless you already have your own DNS cluster that's not integrated into the domain controller. From the sounds of it, though, you don't have that. DHCP will just be on one server, though. It's pretty straightforward.

Yep. You can also put the DHCP service on the other server for fault tolerance. MS recommends the 80/20 rule for the distribution of addresses. You'd put the same range on both DHCP servers, exclude 20% of the addresses on your primary server, and exclude the other 80% on your secondary server. As long as you set up your exclusions correctly, so the two machines never assign the same IP addresses, you'll be good. That way, if your primary server goes down, you'll still have 20% of the scope to cover any new leases or renewals. Since the default lease is 8 days, you will most likely get the primary server back up before a great many leases expire.
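For anyone who wants to sanity-check the 80/20 split described above, here is a minimal Python sketch. It only does the address arithmetic and does not talk to a DHCP server; the function name `split_scope` and the example range are made up for illustration and are not taken from the thread.

```python
import ipaddress


def split_scope(first_ip, last_ip, primary_percent=80):
    """Split one DHCP range between two servers (hypothetical helper).

    Both servers are configured with the same range; each one excludes
    the part that the other server hands out.
    """
    first = ipaddress.IPv4Address(first_ip)
    last = ipaddress.IPv4Address(last_ip)
    total = int(last) - int(first) + 1
    if total < 2:
        raise ValueError("need at least two addresses to split a scope")

    # Integer arithmetic avoids float rounding; keep at least one
    # address on each side of the boundary.
    primary_count = total * primary_percent // 100
    primary_count = max(1, min(primary_count, total - 1))
    boundary = first + primary_count  # first address the secondary serves

    primary_part = (str(first), str(boundary - 1))
    secondary_part = (str(boundary), str(last))
    return {
        "primary_serves": primary_part,
        "primary_excludes": secondary_part,
        "secondary_serves": secondary_part,
        "secondary_excludes": primary_part,
    }


if __name__ == "__main__":
    # Example range is an assumption, not a value from the thread.
    plan = split_scope("192.168.1.100", "192.168.1.199")
    for role, (start, end) in plan.items():
        print(f"{role}: {start} - {end}")
```

The point of computing both halves from a single boundary is that the two exclusion ranges cannot overlap or leave a gap, which is the mistake that would lead to duplicate addresses.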

I guess some of us are having a hard time understanding exactly what files you need replicated between machines. All of the server software, such as Exchange, SQL, SharePoint, etc., has very specific ways of providing fault tolerance. It's not as simple as just copying files between servers. You can also look at Shadow Copies of Shared Folders for storing previous versions of files. You can allocate a certain amount of disk space to store previous versions and set the schedule for when to run the file comparison.

Also, what size are your disks? I know I couldn't do an image restore in 15 minutes with our data. Exchange, DNS, DHCP, etc. all have their own backup mechanisms. If you keep up with regular backups of those, you should be able to just restore that specific service instead of having to image the entire system.

iowatech · December 2007

paintb4707 wrote:

I mean, is there any reason NOT to restore a backup in 15 minutes every time there is an issue? I know it's running around the problem as opposed to facing it, but like I said above, downtime cannot affect this company.

Then make them purchase two sites and/or sets of servers with the EXACT same software running on the EXACT same hardware, independent of one another. Update them manually and log each change that occurs, so you can mirror the changes yourself once you know they are correct and do not affect the primary site. This provides the most extreme version of fault tolerance, which is a hot site. There is not much else to offer here.

paintb4707 wrote:

Corruption was just one of many examples. Pretty much any software-related issue would be replicated, and then we'd have two useless servers. For example, last Friday, first thing when I walked in the door, I had 5 people telling me the email was down. The first thing I usually do is restart all the Exchange services, and lo and behold, when I restarted the SA it didn't want to start. It was stuck in "starting", and after I rebooted the Exchange server it wouldn't even log in.

Again that's because your entire company is running on one server.

nel · December 2007

paintb4707 wrote:

We have an Exchange server which is the PDC that also serves DNS and DHCP and then we have a file server/SQL database for inventory which will soon be merged with Microsoft Financials.

Doesn't look like the money men have spent much so far! Is the company you are at a small business? Because sadly, if you want near-zero downtime you have to pay big money for that.

blargoe wrote:

Personally, I'd have two dedicated servers for Active Directory/DNS/DHCP, a clustered exchange server, and clustered SQL server,

This is the method I would also use. Sadly, it costs quite a bit. You could purchase a nice large SAN to store them on and cluster all your important services.

But if you are a small business, clustering may be out of your budget. Also, don't forget about your backups, because some people get caught out by client agent costs if, for example, you back up multiple hosts to an LTO library.

paintb4707 · December 2007

nel wrote:

We have an Exchange server which is the PDC that also serves DNS and DHCP and then we have a file server/SQL database for inventory which will soon be merged with Microsoft Financials.

Doesn't look like the money men have spent much so far! Is the company you are at a small business? Because sadly, if you want near-zero downtime you have to pay big money for that.

Well, I don't necessarily blame the financial guy for this; the consultants they used before me apparently weren't very informative and/or knowledgeable. I knew that right off the bat when I discovered that there were no mailbox limits set and several users had 3 GB mailboxes.

Personally, I'd have two dedicated servers for Active Directory/DNS/DHCP, a clustered exchange server, and clustered SQL server,

This is the method I would also use. Sadly, it costs quite a bit. You could purchase a nice large SAN to store them on and cluster all your important services.

But if you are a small business, clustering may be out of your budget.

How much would something like this cost to implement? We are a small business (~100 users) but *supposedly* the company would lose 20k in payroll for 5 hours of downtime so cost may not be so much of a concern.
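To put the figure above in per-hour terms, a rough sketch of the arithmetic follows. The 20k and 5 hours come from the post; the purchase price is a placeholder invented purely to show the calculation, not a real quote, and the function names are hypothetical.

```python
def hourly_downtime_cost(total_loss, hours_down):
    """Average cost of one hour of downtime."""
    if hours_down <= 0:
        raise ValueError("hours_down must be positive")
    return total_loss / hours_down


def breakeven_hours(solution_cost, cost_per_hour):
    """Hours of avoided downtime needed to pay back a purchase."""
    if cost_per_hour <= 0:
        raise ValueError("cost_per_hour must be positive")
    return solution_cost / cost_per_hour


if __name__ == "__main__":
    rate = hourly_downtime_cost(20_000, 5)  # figures quoted in the post
    placeholder_quote = 60_000              # hypothetical, NOT a real price
    hours = breakeven_hours(placeholder_quote, rate)
    print(f"downtime costs about {rate:,.0f} per hour")
    print(f"a {placeholder_quote:,} purchase breaks even after "
          f"{hours:.1f} hours of avoided downtime")
```

Swapping in a real vendor quote for the placeholder gives a number the money man can weigh directly against how often outages actually happen.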

nel · December 2007

To be honest, I've never had the honour of actually putting together a purchase like this, so I can't say for sure. Maybe another engineer could point you to the right price bracket?

If no one can, you could write a spec for your project, take it to a company, and ask them for a quote. Then at least you will get some good experience in documenting the build you want and in its technical side. Don't forget to budget for everything, plus a little more for licenses, disks, etc. Whilst he's willing to flash the cash, maybe you could get a domain or Exchange upgrade (if needed) to plan for the long haul, and it would give you some great experience whilst you are doing it all. Just some ideas, really.

BTW, are you heading the project by yourself or are you working alongside someone?

sprkymrk · December 2007

paintb4707 wrote:

I think if image-based backups only took 15 minutes to restore, it would probably be much more feasible in my entry-level shoes to just restore a backup rather than fiddle with it for 3 hours trying to figure it out on my own while half the company is twiddling their thumbs.

Image based backups are not the 100% answer either though. Here are a few problems:

1. If your hardware is the problem, you are down until you can replace it.
2. I've never seen a 15-minute restore of a PC/workstation, let alone a server. By the time you restore the data you've got a lot more time involved, especially if it's a file server. If you say the data is included with the image, how many times a day did you create the image? Most imaging software needs to be booted from outside the OS in order to correctly image everything, which can't be done while your server is up and running.
3. If you've got file corruption, and all you do is restore an image, how do you know the same thing won't happen again? Many times file corruption comes from bad hard drives or programs with memory leaks, which can take a long time to show up.

So really the best of both worlds is a combination of hardware redundancy (like mirrored servers, which can also increase performance substantially), multiple levels of backups, and the ability to quickly reimage a machine if need be. Do as much as you can afford financially and that you have the ability to implement technically.
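The 15-minute restore question in point 2 is easy to check with back-of-the-envelope arithmetic. The sketch below is only that: it assumes restore time is dominated by raw transfer speed and ignores boot time, driver fixes, and verification. The function names are hypothetical, and you would plug in your own image size and measured throughput.

```python
def restore_minutes(data_gb, throughput_mb_per_s):
    """Minutes needed to copy data_gb at a sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_gb * 1024 / throughput_mb_per_s / 60


def required_throughput_mb_per_s(data_gb, minutes):
    """Sustained MB/s needed to restore data_gb within the given minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return data_gb * 1024 / (minutes * 60)


if __name__ == "__main__":
    image_gb = 90  # assumed image size, not a figure from the thread
    needed = required_throughput_mb_per_s(image_gb, 15)
    print(f"{image_gb} GB in 15 minutes needs about {needed:.0f} MB/s sustained")
```

If the throughput this asks for is well beyond what your backup target and network can sustain, the 15-minute figure is not realistic for that server.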

SWM · December 2007

I look after a similar-sized network: 1 server for Exchange, 1 for SQL, 1 for file and print, and 1 ISA firewall. All servers are running Backup Exec System Recovery, creating images every 2 hours to a storage PC/server.

This storage PC is in a fireproof room along with a spare server (one that was just made redundant). Backup Exec System Recovery allows files and folders to be restored from incremental images, as well as a bare-metal restore onto dissimilar hardware.

If any one of my servers dies from either OS corruption or hardware failure, I can make the call either to attempt to repair the OS problem or to "pull the plug" and restore an entire server image onto my spare server.

Yes, you still have issues if your image data also contains the corruption, but if that's the case I just roll back further through the images until I find a working one. We also use off-site tape backups and an overnight data copy link to a remote site in case of theft etc.

I don't think any one system is perfect, but several used together are pretty effective. BESR has got me out of trouble at several sites and, depending on the size of the data on a server, I can have an image restored onto a completely different server and up and running within 1-2 hours.

just my 2c worth
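The "roll back further through the images until I find a working one" approach can be sketched as a newest-first search. This is only an illustration of the idea, not how BESR works internally: the `Image` record and the `is_healthy` check are hypothetical stand-ins for whatever test you would actually run (for example, test-restoring the image and checking that the services start).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Image:
    """Hypothetical record describing one backup image."""
    taken_at: datetime
    path: str


def newest_working_image(
    images: Iterable[Image],
    is_healthy: Callable[[Image], bool],
) -> Optional[Image]:
    """Return the most recent image that passes the health check.

    Images are tried newest first, so the least amount of data is lost.
    Returns None when every image fails the check.
    """
    for image in sorted(images, key=lambda i: i.taken_at, reverse=True):
        if is_healthy(image):
            return image
    return None
```

With images taken every 2 hours, each step back costs roughly 2 more hours of data, which is why the search starts from the newest image rather than jumping straight to last night's tape.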

Sie · December 2007

Unless you start clustering, what I might do is kick off the image restore on the 'spare' server when a fault occurs.

Whilst this is running, troubleshoot the primary server. If the image restore is complete before you've fixed it, do the swap then.

You then still have the main server to troubleshoot afterwards, and you will have the least downtime that way
(excluding clustering etc.).

Not ideal, but just my 2p's worth based on what it sounds like you want to do and what you have to do it with.

Personally, where I work we have a primary site and a DR site, with clustered boxes at both.
It's usually the network devices that cause the problems!

Don't forget to take that into consideration when looking at DR solutions.

paintb4707 · December 2007

SWM wrote:

If any one of my servers dies from either OS corruption or hardware failure, I can make the call either to attempt to repair the OS problem or to "pull the plug" and restore an entire server image onto my spare server.

Sounds exactly like what I was originally planning to do. Except I'd restore the images into a virtual environment if ever a rare case occurred where both servers went down.

Sie wrote:

Unless you start clustering, what I might do is kick off the image restore on the 'spare' server when a fault occurs.

Whilst this is running, troubleshoot the primary server. If the image restore is complete before you've fixed it, do the swap then.

You then still have the main server to troubleshoot afterwards, and you will have the least downtime that way
(excluding clustering etc.).

Not ideal, but just my 2p's worth based on what it sounds like you want to do and what you have to do it with.

Personally, where I work we have a primary site and a DR site, with clustered boxes at both.
It's usually the network devices that cause the problems!

Don't forget to take that into consideration when looking at DR solutions.

Not a bad idea at all.

Additionally, I'd like to do as blargoe suggested and move AD/DNS/DHCP off our Exchange server. I never actually considered the problems that would follow if the only domain controller went down. I just tested it in a lab environment and... nothing can communicate!

Mishra · December 2007

paintb4707 wrote:

iowatech wrote:

Not to be rude but I'm having a hard time seeing how this is so useful. If a file corruption were ever to occur it would be replicated to the target server.

I'm not taking that as rude; it's a valid question. However, that's an extreme case in which you would have to rely on tape backup to restore a valid copy of the server as a last resort. 99% of the time you'll never have to worry about that, as corruption "in most cases" is an extreme rarity.

Corruption was just one of many examples. Pretty much any software-related issue would be replicated, and then we'd have two useless servers. For example, last Friday, first thing when I walked in the door, I had 5 people telling me the email was down. The first thing I usually do is restart all the Exchange services, and lo and behold, when I restarted the SA it didn't want to start. It was stuck in "starting", and after I rebooted the Exchange server it wouldn't even log in. It was stuck at "applying computer settings", which I correctly assumed was because the SA was not starting. So basically I had to go into safe mode and disable all the Exchange services just so I could do a normal boot. From there I was completely clueless, since nothing I could find related to the errors in Event Viewer was helpful at all. I had to call Microsoft and spend about 2 or 3 hours on the phone with them troubleshooting an issue whose cause I couldn't even explain.

I think if image-based backups only took 15 minutes to restore, it would probably be much more feasible in my entry-level shoes to just restore a backup rather than fiddle with it for 3 hours trying to figure it out on my own while half the company is twiddling their thumbs. You may or may not know that I'm the only IT guy at my company.

I mean, is there any reason NOT to restore a backup in 15 minutes every time there is an issue? I know it's running around the problem as opposed to facing it, but like I said above, downtime cannot affect this company.

To address your questions, here are a few answers. I am only going to discuss Exchange failover right now. I'll let the others describe anything else.

If your boss wants 100% uptime, then you have more than just the servers to consider. Here are some things to think about, and I'm probably not even touching on all of it:

1) Multiple power sources
2) Multiple internet connections (including redundancy in networking gear including the switches the exchange servers plug into)
3) Multiple MX records
4) 2 or more physical sites
5) redundancy in your AD/DHCP/DNS/WINS environments
6) multiple exchange servers
7) Exchange front end servers
8) Test and development servers

And so on... Guaranteeing that there will not be an outage takes a lot of work and money to implement. Each option reduces the risk by some percentage, but your boss has to understand that 100% uptime is hard to achieve.

Now to touch on the few problems you have brought up. Doubletake would have taken care of your services dying out as it would have failed over to the other server which would have been ready to go. This is usually the most common Exchange problem.

In Exchange 2003, if a database gets corrupted (which can be common, even though someone mentioned that it usually is not), you should have multiple storage groups set up to move your users onto. Once the database is corrupt, move your users to the other storage group so they have a working mailbox. Then run your repair tools on the corrupted storage group.

To add onto corrupted database problems, you should have a SAN that your mail resides on, and a failover SAN as well (if you are shooting for uptime). Implement snapshots on your SAN and you can recover data quickly once a database fails.

paintb4707 · December 2007

Mishra wrote:

Now to touch on the few problems you have brought up. Doubletake would have taken care of your services dying out as it would have failed over to the other server which would have been ready to go. This is usually the most common Exchange problem.

What other software-related issues could Doubletake prevent? I read that it works off the Best Practices Scanner, but how would it fail over for an issue like that, since it's not always necessarily related to the Exchange services?

And btw, I just wanted to thank you all for all the suggestions and advice so far. This thread has been very informative.
