Backup Solution Puzzle.. Help!
Hello guys,
Just looking for a second, third and fourth opinion on a problem I have. Basically I've got 5 sites all linked together by a slow internet connection. Each site deals with video data, so they are writing 100GB a week or so onto RAID-6 8TB storage arrays. I have two tasks to achieve:
1. Back up all data from the DCs.
2. Create a central store for all the data.
So in an ideal world I would pull all the data from the other sites over the internet to my central file server and just back that up, and bingo!..... but like I said, they are all linked together with slow internet links.
So I was thinking I could get a single tape drive for each site, get the users to perform an incremental backup onto the tapes every week, then have them send the tapes to me; I'll stick them in an autoloader and restore them to the central location.
That's going to take ages and be boring as hell, waiting for the data to restore from the other 4 tapes... Is that my only option? Can anyone think of another way I could do it?
Thanks in advance for any response, because my brain is fried thinking about it!
Comments
paintb4707:
I personally think you should have a backup server at every site doing disk-to-disk backups. Tapes require way too much administrative effort and are too much of a nuisance: they get lost, get broken, need to be cleaned, etc. Additionally, connect an external drive to each backup server and create two simple batch files, one to copy backups to the drive and one to delete old backups from it, in case the backup server is unbootable for any reason. The only way you're going to have backups efficiently stored in a central location is online backups, and when you're accumulating over 14GB a night in incrementals, I'd suggest you look into upgrading your ISP. The least you can do is a better job than you're doing. Do you honestly want to be responsible for restoring tapes every day? What if you go on vacation and one of your sites has a disaster?
Off-site backups are great, but they're intended for redundancy, not as an end-all solution. You need on-site backups as well. I think it would be a terrible decision to send all your backups offsite; that would dramatically ruin your RTO if you ever had a disaster. You would need to overnight an external drive to your second site, as opposed to restoring an on-site backup and getting back up the same day.
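paintb4707's two batch files (copy backups to the external drive, delete old ones from it) can be sketched as a single Python function. This is a minimal illustration rather than a batch file, and it assumes, hypothetically, that each backup is one file in a single folder and that keeping the three newest copies on the drive is enough; `mirror_recent_backups` and `keep` are illustrative names.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def newest_files(folder: Path, count: int) -> list[Path]:
    """The `count` most recently modified files in `folder`, newest first."""
    files = [p for p in folder.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:count]


def mirror_recent_backups(backup_dir: Path, external_dir: Path, keep: int = 3) -> list[str]:
    """Copy the newest `keep` backups to the external drive and delete older copies there.

    Assumes (hypothetically) that each backup is a single file in `backup_dir`.
    Returns the names now held on the external drive, sorted.
    """
    external_dir.mkdir(parents=True, exist_ok=True)
    wanted = newest_files(backup_dir, keep)
    wanted_names = {p.name for p in wanted}

    # "Copy" step: bring over any recent backup the drive does not have yet.
    for src in wanted:
        dst = external_dir / src.name
        if not dst.exists():
            shutil.copy2(src, dst)  # copy2 also preserves the modification time

    # "Delete old backups" step: remove anything on the drive that is no longer recent.
    for old in external_dir.iterdir():
        if old.is_file() and old.name not in wanted_names:
            old.unlink()

    return sorted(wanted_names)
```

Selecting the newest files from the source first (rather than copying everything and pruning afterwards) avoids re-copying old backups on every run only to delete them again.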
astorrs:
How much does the data change at each site every day, and what are the links connecting them (and how loaded are they at night)?
I would seriously consider how best to accomplish a B2D (backup-to-disk) at the remote sites combined with block-level replication to disk at the central site, and then to tape and offsite.
I can't agree with paintb4707 about the lack of a need for tape; I think not sending tapes offsite from that location and only storing on local disk is a disaster waiting to happen.
paintb4707:
astorrs wrote: "I can't agree with paintb4707 about the lack of a need for tape, I think not sending tapes offsite from that location and only storing on local disk is a disaster waiting to happen."
That's not what I meant at all. I wasn't saying that he shouldn't have off-site backups at all. All I said was there's a much better alternative to the old tape method.
astorrs:
paintb4707 wrote: "That's not what I meant at all. I wasn't saying that he shouldn't have off-site backups at all. All I said was there's a much better alternative to the old tape method."
Rikku:
astorrs wrote: "I would seriously consider how best to accomplish a B2D at the remote sites combined with block level replication to disk at the central site and then to tape and offsite."
I think astorrs has it on the right path...
Why not do an initial disk-to-disk full backup/mirror, send the disk to the central site, and then set up a secure remote incremental backup with something like Robocopy by creating and scheduling a nightly batch file/job?
I guess the nightly job will need to be balanced against how slow the links are and how large your incremental backups are... In the end you would need to upgrade the ISP, and there are some good deals out there... but I think it's a start for you to go on.
-Rikku
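A rough Python sketch of the nightly incremental copy Rikku describes: copy only files that are new or whose size or modification time has changed since the last run, which is roughly how Robocopy decides what to skip by default. This is not Robocopy itself, `incremental_copy` is a hypothetical name, and the "secure" and "nightly" parts are assumed to come from the VPN and a scheduled task, not from this code.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def incremental_copy(src_root: Path, dst_root: Path) -> list[str]:
    """Copy files that are new, or whose size or mtime differs, from src_root to dst_root.

    Returns the relative paths (POSIX style) that were copied on this run.
    """
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        src_stat = src.stat()
        if dst.exists():
            dst_stat = dst.stat()
            same_size = dst_stat.st_size == src_stat.st_size
            # Compare whole seconds to tolerate timestamp precision differences.
            same_time = int(dst_stat.st_mtime) == int(src_stat.st_mtime)
            if same_size and same_time:
                continue  # unchanged since the last run: skip it
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 keeps the mtime so the next run can compare it
        copied.append(rel.as_posix())
    return copied
```

Run nightly, only the day's new clips would cross the link; as Rikku notes, whether that finishes overnight still depends on the link speed.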
astorrs:
Rikku wrote: "then setup like a secure remote incremental backup with something like robocopy create and schedule a nightly batch file/job?"
blargoe:
astorrs wrote: "I would seriously consider how best to accomplish a B2D at the remote sites combined with block level replication to disk at the central site and then to tape and offsite."
IT guy since 12/00
Recent: 11/2019 - RHCSA (RHEL 7); 2/2019 - Updated VCP to 6.5 (just a few days before VMware discontinued the re-cert policy...)
Working on: RHCE/Ansible
Future: Probably continued Red Hat Immersion, Possibly VCAP Design, or maybe a completely different path. Depends on job demands...
Drew1872:
Hi guys,
There are no backups in place at all currently (shocking, I know). I've been with the company for less than a week and this is one of the first things they've got me doing. I've checked with the ISPs and they have no upgrades available, so we're working with 6Mb down and 1Mb up at all the sites except one, which has a 24Mb download link (not sure about the upload).
Thanks for the advice so far.
astorrs:
Those are all reasonable links (256k frame relay is slow; those links aren't, in my opinion). Do you know what the load on the circuits is after hours (is there an after-hours period, and what is the window)? Also, can you estimate how much change occurs every day? For example, you mentioned video; do you know how much new video is written daily at each site?
Is there money available to do this properly, or are they giving you $1000 to figure it out?
Drew1872:
The load on the circuits after hours is minimal, just the usual replication and keep-alive traffic that occurs between domain controllers.
There's probably about a 10-hour window available.
I would guess at about 20GB per day, per site.
There is some decent money available if I can come up with the right solution.
astorrs:
Drew1872 wrote: "I would guess at about 20 gig per day, per site."
Oh, and what kinds of links are these: MPLS, VPN, etc.? And how big is the pipe in the main office?
Drew1872:
Yeah, completely new data.
It's a VPN link to the main office, which has 2x 6Mb download links.
astorrs:
Okay, well, that won't work then. Too bad. We'll need a 10Mb link on those sites to be able to keep up with that level of change.
Now we'll have to look at option #2...
- What country (or countries) are these sites in?
- And how far apart are they located? (Roughly, as in hundreds of miles apart in different cities, or spread throughout 1 or 2 cities.)
- Are there IT staff at each location? Or, more specifically, are there even staff onsite daily at each location? (These aren't surveillance systems for automated oil platforms or something, are they?)
Drew1872:
All the sites are in the same country (UK), in different cities about 200 miles apart from each other. There are IT staff at 2 of the sites full time, but not at the other 3 sites. The other 3 have regular staff at them all the time.
Drew1872:
Just out of interest, astorrs, what was option number 1 going to be?
astorrs:
Oh, and what is the total volume of data currently at each site? You mentioned an 8TB storage array, but how much is actually in use presently?
And how long do we need to keep the video for? # of weeks, months, years...
Drew1872:
They are not using the 8TB storage yet, so nothing is on there; they have under 2TB of data currently on the systems. I would think we need to keep it on these systems for about 6 months, then it could be moved off to an archive somewhere else.
astorrs:
Drew1872 wrote: "Just out of interest astorrs what was option number 1 going to be?"
Block-level replication would only replicate changes that occur at the physical disk layer. For example, opening a 5MB document and correcting the typo "Helol World" to "Hello World" results in 5MB backed up through traditional file-level backups (NtBackup, Backup Exec, etc.), while it might only result in a few KB of changes at the block level; therefore we would only replicate those few KB instead of the entire file again.
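To make the block-level idea concrete, here is a small sketch that splits a file into fixed-size blocks, hashes each block and reports which blocks differ. The 4 KiB block size and SHA-256 are assumptions for illustration only; real replication products track changed blocks in their own ways, and a same-length edit like this one is the easy case because it doesn't shift any later blocks.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4096  # assumption: 4 KiB blocks; real products choose their own size


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 digest of each fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indexes of blocks in `new` that differ from (or do not exist in) `old`."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != h
    ]


# A 5 MB "document" containing the typo, then the same document with it fixed.
doc = bytearray(b"A" * (5 * 1024 * 1024))
doc[1_000_000:1_000_011] = b"Helol World"
before = bytes(doc)
doc[1_000_000:1_000_011] = b"Hello World"
after = bytes(doc)

changed = changed_blocks(before, after)
print(f"file-level backup sends {len(after):,} bytes")
print(f"block-level replication sends {len(changed) * BLOCK_SIZE:,} bytes")
```

Run as-is, it reports 5,242,880 bytes for the file-level copy versus 4,096 bytes for the single changed block. For brand-new video there is no "before" version, so every block counts as changed, which is exactly the problem raised next.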
The problem is that replicating 20GB of new data over a 1Mbps link requires roughly 44 hours a day... i.e., impossible.
astorrs:
Drew1872 wrote: "they are not using the 8Tb storage yet so nothing on there, they have under 2TB of data currently on the systems. I would think we need to keep it on these systems for about 6 months then it could be moved off to a archive somewhere else."
EDIT(S): Oh, and do you guys have a preferred vendor you use for servers, etc.? (Might as well give the recommendations using those products, assuming they meet the need of course.)
Also, if you were to lose all the data on the storage array at a site and needed to restore from backup, how long could you live without access to the data, and what would be the impact to the company ($$$)?
Is there a server at each site or are all the servers centralized?
What are the 8TB storage arrays? Make/model?
richyfivealive:
Have you thought about something like Cisco WAAS?
http://www.cisco.com/en/US/products/ps5680/Products_Sub_Category_Home.html
I know you have slow links, but we have recently tested this and had very good results.
Drew1872:
It's write-once. We use Dell servers; I've been looking at HP backup solutions but I'm not limited to that. We would need the data back ASAP, and the impact would be large, so we need it back quickly.
astorrs:
richyfivealive wrote: "Have you thought about something like cisco WAAS?
http://www.cisco.com/en/US/products/ps5680/Products_Sub_Category_Home.html
I know you have slow links but we have recently tested this and had very good results.."
astorrs:
Drew1872 wrote: "It write-once, we use Dell servers, i've been looking at HP backup solutions but not limited too that. We would need the data back asap and the impact would be large so need it back quick"
Is there a server at each site or are all the servers centralized?
What are the 8TB storage arrays? Make/model?
And finally, why do you need to "Create a central store for all the data"? Is it just for DR purposes, or is there a need to review it/report on it/something else that can't be done remotely?
Drew1872:
It's just for DR purposes; the data won't be needed that often unless a customer requests it again. It needs to be accessible, but not instantly.
There is a server at each site, and the disk arrays are unbranded (purchased before I joined the company).
astorrs:
Okay Drew1872, based on what you've said here, this is what I would do:
At each of the remote sites deploy a tape library and perform a full weekly backup of all the data on Friday during the backup window (after hours/at night). Perform incremental backups on each of the other nights (skip the weekend if no data is added). Remove all the tapes every Friday morning and replace them with a new set. Run a scheduled task during the day on Friday to remove all data from the array that is more than 6 months old. Start with a small number of tapes in the set; you can add tapes to the weekly sets as the amount of data grows.
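The weekly schedule and the Friday clean-up above can be expressed as a short sketch. Assumptions: "6 months" is taken as 183 days, file age is judged by modification time, and the helper only lists expired files rather than deleting them, so a real scheduled task would still need its own delete step (ideally run only after a good backup). The function names are hypothetical.

```python
from __future__ import annotations

import time
from datetime import date
from pathlib import Path

RETENTION_SECONDS = 183 * 24 * 3600  # assumption: "6 months" taken as 183 days


def backup_job_for(day: date) -> str | None:
    """Which nightly job runs on `day` under the proposed weekly schedule."""
    weekday = day.weekday()  # Monday == 0 ... Sunday == 6
    if weekday == 4:
        return "full"         # Friday night: full backup onto the fresh tape set
    if weekday < 4:
        return "incremental"  # Monday to Thursday nights
    return None               # weekend skipped (no new data assumed)


def expired_files(root: Path, now: float | None = None) -> list[Path]:
    """Files under `root` older than the retention period: Friday's clean-up candidates.

    Only lists them; the actual deletion is deliberately left out of this sketch.
    """
    now = time.time() if now is None else now
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and now - p.stat().st_mtime > RETENTION_SECONDS
    )
```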
I looked at the Dell options and as usual was not impressed (sorry Dell). The PowerVault 124T LTO-3 is old and overpriced and won't cut it, and while the Dell PowerVault TL2000 is a good solution in its own right, it is overpowered (and consequently overpriced) for what you need.
The HP 1/8 G2 LTO-4 Ultrium 1760 Tape Autoloader would just do the job, but you would have no room for an increase in requirements over time due to it only having 8 slots. The other options from HP are too overpowered/expensive.
Therefore I would look at the options from the major device manufacturers, Overland Storage, StorageTek and Quantum (pretty much everyone else just OEMs those three's equipment). Using Overland as an example, since I am most familiar with their product line, I would look at the Overland Storage ARCvault 12 LTO-4 (Mfr Part#: OV-ARC101013). It is a 2U rack-mountable autoloader with 1 tape drive and 12 slots, and it includes remote web management capabilities (for diagnostics and such). Price is around US$6700, plus onsite support (if required). This will allow you to perform backups of up to 9.6TB uncompressed (you will get really bad compression rates on video, as it has probably already been heavily compressed by the codec used, so don't listen to the 2:1 crap the vendors will try to sell you on). Also, since the libraries will be remote, purchase a cleaning tape and leave it in the highest-numbered slot. When the drive needs cleaning (only clean it if the drive tells you to!) you won't have to have someone at the site swap anything.
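A quick way to size the weekly tape sets, assuming LTO-4's 800GB native capacity per cartridge, no compression (as advised above for video), one slot reserved for the cleaning tape, and the week's four incrementals appended after Friday's full backup. With the cleaning tape in the highest slot, 11 slots (8.8TB) remain for data rather than the full 9.6TB. The function names are hypothetical.

```python
import math

LTO4_NATIVE_GB = 800   # native (uncompressed) capacity of one LTO-4 cartridge
SLOTS = 12             # ARCvault 12 slot count quoted above
CLEANING_SLOTS = 1     # highest slot kept for the cleaning tape


def tapes_per_weekly_set(array_gb: float, daily_new_gb: float = 20,
                         incremental_nights: int = 4) -> int:
    """Cartridges for one weekly set: Friday's full backup plus the week's incrementals.

    Assumes no compression (the video is already compressed) and that the
    incrementals are appended after the full backup on the same set.
    """
    weekly_gb = array_gb + daily_new_gb * incremental_nights
    return math.ceil(weekly_gb / LTO4_NATIVE_GB)


def fits_in_library(tapes: int) -> bool:
    """True if the set fits in the library alongside the cleaning tape."""
    return tapes <= SLOTS - CLEANING_SLOTS


print(tapes_per_weekly_set(2000))  # today: under 2TB on the array
print(tapes_per_weekly_set(3660))  # 183 days x 20GB, if the array fills to the 6-month limit
```

With today's ~2TB this gives 3 cartridges per weekly set; if the array grows to 183 days x 20GB (an upper bound that assumes recording every day, weekends included), it gives 5, still well inside the library.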
I would then install Symantec Backup Exec for Windows Servers 12 (Mfr Part#: 13570915, Price: US$1100 inc. 12 months essential support) on the server at each site and connect the tape library to that server. On the backup server at the main office I would install the Central Admin Server Option (Mfr Part#: 13573445, Price: US$1750 inc. 12 months essential support) for Backup Exec to allow central management and reporting of all the backup servers/jobs. I'm assuming the storage array is connected to this server; if it's not, and is instead accessed over the LAN, you will want to make sure there is a Gigabit Ethernet connection between the server and the array.
For DR purposes, negotiate a contract with someone like Iron Mountain to provide backup tape vaulting services. Essentially, every week the vendor will show up at each of the sites (there will be different couriers, not the same poor guy driving across all of the UK) and remove the previous week's tapes. He will also return the tapes from 2 weeks ago (there will need to be 3 sets of tapes in rotation for each site: 1 at the site being used, at least 1 in the vault at Iron Mountain, and 1 potentially in transit on the Friday swap run). They provide reporting, etc., and will use the barcodes on the tapes to catalog everything. The libraries and software I proposed will read those labels automatically (the library has a barcode reader inside it) and catalog each backup, so you can easily determine which tapes you require to restore something without needing to translate to the vendor's own numbering system (instead, Iron Mountain or whoever will use yours).
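The three-set rotation can be checked with a small model: the set written in week n is collected that Friday, and the set coming back from the vault is always the one written two weeks earlier. The set labels A/B/C and the function names are hypothetical.

```python
from __future__ import annotations

SETS = ("A", "B", "C")  # hypothetical labels for one site's three tape sets


def location_during(week: int) -> dict[str, object]:
    """Where each set is during week `week` (0-based), before that Friday's swap."""
    return {
        "library": SETS[week % 3],  # being written this week
        # Both offsite sets sit in the vault; the older one travels back on Friday.
        "offsite": [SETS[(week - 1) % 3], SETS[(week - 2) % 3]],
    }


def friday_swap(week: int) -> tuple[str, str]:
    """(set collected by the courier, set returned from the vault) at the end of `week`."""
    collected = SETS[week % 3]
    returned = SETS[(week + 1) % 3]  # same as the set written two weeks earlier
    return collected, returned


for w in range(4):
    print(w, location_during(w), friday_swap(w))
```

Because the returned set is always the oldest one, each site keeps two complete weekly backups offsite at any time apart from the Friday hand-over itself.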
At each site identify a primary person (and their backup) who will be responsible for changing tapes on Friday; you will want to have this signed off by their managers and potentially added to their job descriptions by HR, so there is no confusion about the importance of this task (I've found these two things work really well because the employee doesn't feel torn between the work they are paid to do and the work they are "asked" to do to help IT). If the tapes don't get swapped, the backups will fail and your data is at risk; to make sure this doesn't happen it's important to have top-down support. Also, if Friday is a holiday when the office is closed, reschedule the tape swap to Thursday for that week (don't forget to tell Iron Mountain in advance!)
If the storage array crashes and all the data is lost, it can be restored from the local tapes already in the tape library on site. If the entire facility burns to the ground, there will be a copy in Iron Mountain's vault that they can redirect to the facility of your choosing (anywhere) for restoration. The plan here allows for the loss of potentially a week's data (the site burns to the ground on Thursday night). For many companies this is acceptable, but if you need a tighter window you can increase the frequency of the backup tape vaulting exchanges. Just be aware that you will be charged for each pickup and you will probably need additional tape sets; it's up to you and your company to find the right balance.
This would allow you to have local restore capability, offsite protection in world-class secure vaults, management of rotation to/from offsite storage with monthly/quarterly reporting, centralized management and reporting of job success/failures, etc.
Thoughts? Comments? Concerns?
Andrew
blargoe:
Those aren't slow links. Maybe not fast enough for what you're trying to do, but they sure as heck aren't "slow".
Some kind of WAN accelerator might do the trick, plus some kind of method to copy the files that change as they change. Dumping 14GB at a time might not work, but every so often sending 500MB over a 6Mb link with a WAN accelerator bursting that particular traffic might do the trick.
astorrs:
blargoe wrote: "Those aren't slow links. Maybe not fast enough for what you're trying to do, but they sure as heck aren't 'slow'."
blargoe wrote: "Some kind of a wan accelerator might do the trick, plus some kind of method to copy the files that change as it happens. Dumping 14GB at a time might not work, but every so often, 500MB over a 6MB link with a wan accelerator bursting that particular traffic might do the trick"
Here are my calculations; anyone, feel free to point out any mistakes or comment on anything you see. I'd still prefer this option...
Stuff we know:
- 20GB of new video is recorded every day
- 20GB is equivalent to 160,000Mb (8 bits in a byte)
- the data needs to be transferred from the remote sites to the central site
- the 4 remote sites have a 1Mbps upload cap
- the central site has 2x6Mbps download links
- all connections are through VPN
For the purposes of these calculations assume there is no other traffic on the links and that they are "perfect" (no protocol overhead, no latency, etc).
160,000 Mb / 1 Mbps = 160,000 seconds
160,000 seconds / 60 seconds in a minute = 2,667 minutes
2,667 minutes / 60 minutes in an hour = 44.4 hours
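The same arithmetic as a small Python helper, using the same decimal units (1GB = 8,000Mb) and the same "perfect link" assumption. It also answers the inverse question: the sustained speed needed to fit 20GB into a whole day, or into the roughly 10-hour after-hours window Drew1872 mentioned. The function names are illustrative.

```python
MEGABITS_PER_GB = 8_000  # 1GB = 1,000MB x 8 bits, the decimal convention used above


def transfer_hours(gigabytes: float, link_mbps: float) -> float:
    """Ideal transfer time on a 'perfect' link: no overhead, latency or other traffic."""
    return gigabytes * MEGABITS_PER_GB / link_mbps / 3600


def mbps_needed(gigabytes: float, window_hours: float) -> float:
    """Sustained throughput needed to move `gigabytes` inside `window_hours`."""
    return gigabytes * MEGABITS_PER_GB / (window_hours * 3600)


print(f"{transfer_hours(20, 1):.1f} h")   # 20GB over the 1Mbps upload
print(f"{mbps_needed(20, 24):.2f} Mbps")  # spread over a whole day
print(f"{mbps_needed(20, 10):.2f} Mbps")  # inside the ~10-hour after-hours window
```

This prints 44.4 h, 1.85 Mbps and 4.44 Mbps: even the 24-hour figure is nearly double the 1Mbps upload, before any VPN overhead.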
Since this is all new data, block-level replication will still account for 20GB, so we won't get any help there (vs. small changes to existing files, where block-level replication can really help).
WAN acceleration is primarily focused on three things:
1) compression of data - we won't get much help here since the video is most likely already highly compressed.
2) optimization of TCP - this will provide some benefit to make TCP less chatty especially if the latency exceeds 50ms.
3) caching (to prevent duplication of data across the WAN) - since the video is always new we will only see minor benefits here in the reduction of data (packet headers and such, the occasional set of duplicate blocks)
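Point 1 is easy to demonstrate: a general-purpose compressor gains almost nothing on data that is already compressed. This sketch uses zlib, with random bytes as a stand-in for codec output (an assumption: real video streams are not literally random, but well-compressed ones come close) and a repeated phrase as highly redundant data.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (zlib, maximum compression level)."""
    return len(zlib.compress(data, 9)) / len(data)


# Stand-ins (assumptions): random bytes behave like codec output, which is already
# close to incompressible; a repeated phrase behaves like highly redundant data.
video_like = os.urandom(1_000_000)
redundant = b"Hello World. " * 80_000

print(f"video-like: {compression_ratio(video_like):.3f}")
print(f"redundant:  {compression_ratio(redundant):.3f}")
```

The random data comes out at about 1.0 (marginally larger, in fact, because of zlib's framing), while the repeated phrase shrinks to a tiny fraction of its size; the same reasoning is behind the advice above to ignore 2:1 tape compression claims for video.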
Before anyone says it, things like UDP acceleration of streaming video only benefit you when you are streaming to multiple people at once (in which case the video is only streamed across the WAN once). We're only sending it to one destination here.
I called a couple of friends who work as SEs for Riverbed and Citrix ANG (WANScaler), and they both agreed that the best that could be expected was maybe 10-15%, thus reducing the time required to 37hrs (at best) per day.
Since there is probably existing traffic going over the links, and no link is ever perfect (there is always overhead, especially with VPNs), we would need to triple the link speed to be able to transfer the data within the 24hr period.
Unless the existing traffic is high or the latency on the links is above 50ms (where the TCP optimizations really start to benefit the acceleration), neither of them felt WAN optimization at ~US$10k/site was worth it, given that upgraded 3Mbps links would still be required to meet the daily window.
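Applying the SEs' 10-15% estimate to the earlier arithmetic, and then tripling the link, can be sketched the same way. This computes only the ideal transfer time; how much headroom under 24 hours is needed for VPN overhead, latency and existing traffic is the poster's judgement call and is not modelled here. `hours_per_day` is an illustrative name.

```python
def hours_per_day(gb_per_day: float, link_mbps: float, wan_savings: float = 0.0) -> float:
    """Ideal daily transfer time after WAN optimization removes `wan_savings` of the bits."""
    megabits = gb_per_day * 8_000 * (1 - wan_savings)  # decimal units, as above
    return megabits / link_mbps / 3600


for mbps in (1, 3):
    for savings in (0.10, 0.15):
        print(f"{mbps} Mbps, {savings:.0%} saved: {hours_per_day(20, mbps, savings):.1f} h/day")
```

At 1Mbps the best case is 37.8 to 40.0 hours per day; at 3Mbps it drops to roughly 12.6 to 13.3 hours, which is where tripling the link leaves room for real-world overhead within the day.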
So I think you're still stuck with doing local backups and off-siting them at each site.