VMware snapshots are EVIL!

langenoir Member Posts: 82 ■■■□□□□□□□
If you were a VMware admin at a company where you thought people didn’t really get how to engineer or administer virtualized environments, what would you think of this?

I am trying to get monitoring set up in the main ticket system so that users cannot abuse snapshots by letting them linger for more than 48 hours, and so that we avoid snapshot nesting. I pointed out the dangers of users abusing snapshots, and the second-level manager asked if there is a way to monitor snapshots in our main ticketing system.

And this is the response from two of the first-level managers:

Manager 1
“In my opinion, snapshots are evil and should be avoided at all costs. At best they should be treated as a secondary safety net behind a clone or a Veeam backup. They degrade the performance of disk I/O, and we've seen them cause problems time and time again in production since we started using ESX.”

Manager 2
“Not to mention the fact that going back to a snapshot on machines that are members of the domain is problematic - like the machine not being able to communicate with the domain and needing to be rebuilt anyway.”

Keep in mind that most of our guests are using VAS on RHEL to integrate into AD. What would you think of their response?

I know what I think, I also know that I’m biased. So, I’m just wondering how unbiased admins would view those statements.

Comments

  • scaredoftests Mod Posts: 2,780 Mod
    How in the world can they be evil? They have saved my arse many a time when a patch or upgrade went haywire.
    Never let your fear decide your fate....
  • scott28tt Member Posts: 686 ■■■■■□□□□□
    They do have their uses - upgrades and patching, test & dev, leveraged by backup/replication tools...

    But - they're not a substitute for backups, and need to be used by people who know what they are, know what they do, and who know the benefits and risks.
    VCP2 / VCP3 / VCP4 / VCP5 / VCAP4-DCA / VCI / vExpert 2010-2012
    Blog - http://vmwaretraining.blogspot.com
    Twitter - http://twitter.com/vmtraining
    Email - vmtraining.blog@gmail.com
  • scaredoftests Mod Posts: 2,780 Mod
    Of course not a substitute for backups (unless you have Backup Exec 2012)..kidding..kidding
    Never let your fear decide your fate....
  • langenoir Member Posts: 82 ■■■□□□□□□□
    We have Veeam backing up locally and replicating to offsite. So two locations there.
  • DevilWAH Member Posts: 2,997 ■■■■■■■■□□
    Snapshots have their place, but they are not backup tools!

    I don't have a hard time limit for them. And why you would need to rebuild a machine rather than just sort out the domain issue, I have no idea.

    The positives and negatives are clear, so just balance them up. IT is never black and white. It's not right or wrong to use them, just good or bad, and that depends entirely on how you use them. Don't let them linger around, but honestly, unless you have done the performance comparisons you can't go spouting about "degrading IO performance!" Indeed, in most cases you would never notice the impact of a single snapshot.

    I see snapshots as a short-term recovery mechanism for when making changes, and I put into the description when the snapshot can be removed. So before taking it we have already planned its removal, and unless I go back in and change it again, it will get deleted when someone next looks at the snapshots, even if I have forgotten.

    Description
    "date taken - 27/04/2016
    Reason - Upgrade of Skype from version 15.0.1 to 15.0.2
    Date for removal - 04/05/2016"
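
    A minimal Python sketch of how that convention could be checked automatically, assuming every description carries a line of the form "Date for removal - DD/MM/YYYY" as in the example above. The function names are made up for illustration, and fetching the description text from vCenter is left out.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumed label format, taken from the example description above.
REMOVAL_RE = re.compile(r"date for removal\s*-\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE)


def removal_date(description: str) -> Optional[date]:
    """Return the planned removal date from a snapshot description, if present."""
    match = REMOVAL_RE.search(description)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d/%m/%Y").date()


def is_overdue(description: str, today: date) -> bool:
    """True when the description names a removal date that has already passed."""
    planned = removal_date(description)
    return planned is not None and planned < today


example = """date taken - 27/04/2016
Reason - Upgrade of Skype from version 15.0.1 to 15.0.2
Date for removal - 04/05/2016"""

print(removal_date(example))
print(is_overdue(example, today=date(2016, 5, 10)))
```

    A snapshot with no removal line comes back as None rather than overdue, so those would need to be flagged separately.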

    People get too tied up in "best practice" or "it affects performance" without understanding the real issue and its impacts, or being able to weigh things up.
    • If you can't explain it simply, you don't understand it well enough. Albert Einstein
    • An arrow can only be shot by pulling it backward. So when life is dragging you back with difficulties. It means that its going to launch you into something great. So just focus and keep aiming.
  • joelsfood Member Posts: 1,027 ■■■■■■□□□□
    Snapshots have some limited uses but are best avoided when people don't understand them and leave them around too long. I monitor both for age and size of snapshots to avoid said issue.
  • langenoir Member Posts: 82 ■■■□□□□□□□
    DevilWAH wrote: »
    I don't have a hard time limit for them. And why you would need to rebuild a machine rather than just sort out the domain issue, I have no idea.

    This was one of my big issues with the two comments.
  • scaredoftests Mod Posts: 2,780 Mod
    DevilWAH wrote: »
    Snapshots have their place, but they are not backup tools!

    I don't have a hard time limit for them. And why you would need to rebuild a machine rather than just sort out the domain issue, I have no idea.

    The positives and negatives are clear, so just balance them up. IT is never black and white. It's not right or wrong to use them, just good or bad, and that depends entirely on how you use them. Don't let them linger around, but honestly, unless you have done the performance comparisons you can't go spouting about "degrading IO performance!" Indeed, in most cases you would never notice the impact of a single snapshot.

    I see snapshots as a short-term recovery mechanism for when making changes, and I put into the description when the snapshot can be removed. So before taking it we have already planned its removal, and unless I go back in and change it again, it will get deleted when someone next looks at the snapshots, even if I have forgotten.

    Description
    "date taken - 27/04/2016
    Reason - Upgrade of Skype from version 15.0.1 to 15.0.2
    Date for removal - 04/05/2016"

    People get too tied up in "best practice" or "it affects performance" without understanding the real issue and its impacts, or being able to weigh things up.

    That is what we do: put in a description, etc.
    Never let your fear decide your fate....
  • tbgree00 Member Posts: 553 ■■■■□□□□□□
    I've been to too many clients with performance issues who had somehow decided that nightly snapshots were the best way to go for backups. 36 snapshots later, their file server was barely functional. Snapshots are good for rolling back upgrades, though, and have saved me a number of times as well.

    As part of my current monthly maintenance we run a script to delete all snapshots older than 30 days. vRealize Ops also emails us if a backup doesn't consolidate or if a snapshot is bigger than 4 GB and we deal with those immediately.
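
    Roughly, the two checks look like the Python sketch below. This is an illustration only: the Snapshot record and the sample inventory are made up, and in practice the data would come from PowerCLI or the vSphere API, which is not shown here. The 30-day and 4 GB thresholds are the ones mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=30)  # monthly cleanup threshold
MAX_SIZE_GB = 4.0             # size alert threshold


@dataclass
class Snapshot:
    """Hypothetical record; real data would come from the vSphere inventory."""
    vm: str
    name: str
    created: datetime
    size_gb: float


def classify(snapshots, now):
    """Split snapshots into those due for deletion and those needing a size alert."""
    to_delete = [s for s in snapshots if now - s.created > MAX_AGE]
    oversized = [
        s for s in snapshots
        if s.size_gb > MAX_SIZE_GB and s not in to_delete
    ]
    return to_delete, oversized


now = datetime(2016, 6, 1)
inventory = [
    Snapshot("file01", "pre-patch", datetime(2016, 4, 20), 1.5),
    Snapshot("sql01", "pre-upgrade", datetime(2016, 5, 30), 6.2),
    Snapshot("web01", "config-change", datetime(2016, 5, 31), 0.3),
]
to_delete, oversized = classify(inventory, now)
print([s.vm for s in to_delete])
print([s.vm for s in oversized])
```

    Keeping the two lists separate means an old snapshot is simply deleted, while a young but large one gets looked at by a person first.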

    My biggest problem with backups is that they can cause a stun in I/O, and on SQL or other apps that are sensitive to latency or I/O drops that can cause some issues. Other than that, they aren't evil, just misunderstood.
    I finally started that blog - www.thomgreene.com
  • gespenstern Member Posts: 1,243 ■■■■■■■■□□
    Regarding the AD issue: it is usually related to the computer account password, which is changed periodically (every 30 days by default on Windows, AFAIR). The issue arises when a snapshot is taken, then the computer and the DC negotiate a new password for the computer account, then the computer is reverted to the snapshot, resulting in the computer and the DCs having different passwords for the computer account.

    This is fixed easily on Windows: there's a netdom command and a PowerShell command that do that. A cruder approach would be to remove the machine from the domain and rejoin it.

    I can't tell for sure, but there should be an option in kinit or something on RHEL to force new computer account password negotiation after it falls out of sync.

    No need to reimage, it's just dumb and unproductive.
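
    To make the sequence concrete, here is a deliberately simplified Python model of the mismatch. It is a toy: both sides hold a single shared secret, the class and method names are invented, and it ignores that real implementations may also keep a previous password around.

```python
from dataclasses import dataclass


@dataclass
class DomainController:
    stored_password: str  # the DC's copy of the computer account secret


@dataclass
class Guest:
    local_password: str  # the guest's copy of the same secret

    def rotate(self, dc: DomainController, new_password: str) -> None:
        """Scheduled rotation: guest and DC agree on a new secret."""
        self.local_password = new_password
        dc.stored_password = new_password

    def can_authenticate(self, dc: DomainController) -> bool:
        return self.local_password == dc.stored_password


dc = DomainController("secret-v1")
guest = Guest("secret-v1")

snapshot = Guest(guest.local_password)  # the snapshot captures guest state only
guest.rotate(dc, "secret-v2")           # rotation happens after the snapshot
guest = snapshot                        # revert: guest is back on v1, the DC is not

print(guest.can_authenticate(dc))       # trust is broken at this point
guest.rotate(dc, "secret-v3")           # stand-in for a password reset or rejoin
print(guest.can_authenticate(dc))
```

    The point of the model is that the fix is a password resync between the two sides, not a rebuild of the guest.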

    Also, I don't get why people keep saying that snapshots aren't backups. Is it some kind of common misconception or what? In some senses a snapshot can be considered a backup and in others it can't, but in the end snapshots can be, and are, used all the time as a quick and convenient way to revert changes.
  • scaredoftests Mod Posts: 2,780 Mod
    Good grief, if we have too many snapshots, it creates chaos for Backup Exec.
    Never let your fear decide your fate....
  • Lexluethar Member Posts: 516
    Snapshots provide a valuable tool for administrators, but it is just that - a tool and not a backup solution.

    We take snapshots prior to upgrades because rolling back an upgrade from a snapshot takes a minute of downtime, whereas rolling back from a backup may take 30 minutes to an hour depending on the size of the VM.

    With that said, it sounds like the issue is NOT auditing snapshots but rather that too many people have access to take snapshots. As you said, I wouldn't keep a snapshot longer than 48 hours. If you keep them longer (we keep ours upwards of a week) you MUST babysit them and make sure performance is not being affected.

    Sure, audit them, but realistically you should not be allowing anyone other than an admin to take snapshots.

    Side note: Why would recovering from a snap have any effect on a domain? It uses the same SID when you restore so why would that even matter?
  • kiki162 Member Posts: 635 ■■■■■□□□□□
    @langenoir LOL, what type of company do you work for?
  • DigitalZeroOne Member Posts: 234 ■■■□□□□□□□
    langenoir wrote: »
    I am trying to get monitoring set up in the main ticket system so that users cannot abuse snapshots by letting them linger for more than 48 hours, and so that we avoid snapshot nesting. I pointed out the dangers of users abusing snapshots, and the second-level manager asked if there is a way to monitor snapshots in our main ticketing system.

    We run a script that removes snapshots after a certain period of time with PowerShell/PowerCLI. It became too cumbersome to track people down, find out if they still needed the snapshot, and then get them to remove it. Management told everyone what the new policy was, and now the removal is all automated.

    On the "snapshots are evil" claim...they are a good tool to use, just easy to abuse, so it's sometimes best to just rule with an iron fist.
  • TheProf Users Awaiting Email Confirmation Posts: 331 ■■■■□□□□□□
    I usually use a script that checks every morning for things like snapshot age and then reports by email on what servers have snapshots and how old they are.
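
    As an illustration of the reporting half, here is a small Python sketch that composes such an email without sending it. The server names, addresses and the tuple layout are placeholders, and collecting the snapshot list from vCenter is assumed to happen elsewhere.

```python
from datetime import datetime
from email.message import EmailMessage


def build_report(snapshots, now, sender, recipient):
    """Compose (but do not send) a plain-text email listing snapshot ages."""
    lines = []
    # Oldest snapshots first, since those are the ones that need attention.
    for server, name, created in sorted(snapshots, key=lambda s: s[2]):
        age_days = (now - created).days
        lines.append(f"{server:<10} {name:<20} {age_days:>3} days old")
    msg = EmailMessage()
    msg["Subject"] = f"Snapshot report: {len(lines)} snapshot(s) found"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines) or "No snapshots found.")
    return msg


now = datetime(2016, 5, 10, 7, 0)
found = [
    ("app02", "pre-upgrade", datetime(2016, 5, 9, 18, 0)),
    ("db01", "before-patch", datetime(2016, 5, 1, 9, 30)),
]
report = build_report(
    found, now, "vcenter-reports@example.com", "vm-admins@example.com"
)
print(report["Subject"])
print(report.get_content())
```

    Sending the message would be a separate step (for example with smtplib against whatever mail relay is available), scheduled to run each morning.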

    Now, snapshots by their nature are not a bad thing at all; what makes them dangerous is how people use them. The same concept applies to resource pools. Everyone says to avoid resource pools, but that's because most of the time they're not properly configured and, like snapshots, they then cause problems.

    Use snapshots for specific use cases like making modifications to various apps, upgrades, etc. Don't keep a snapshot for more than 48 hours, and don't snapshot a server that generates a lot of data fast, or you can end up running out of space.
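
    The space risk is easy to put a number on. A rough Python sketch, assuming the snapshot delta grows linearly with the amount of data the guest changes; the figures are invented for the example.

```python
def hours_until_full(free_gb: float, change_rate_gb_per_hour: float) -> float:
    """Hours before snapshot delta growth uses up the datastore's free space."""
    if change_rate_gb_per_hour <= 0:
        return float("inf")  # no changed blocks means the delta does not grow
    return free_gb / change_rate_gb_per_hour


# Hypothetical figures: 200 GB free, guest changes about 5 GB of data per hour.
print(hours_until_full(200, 5))
```

    With those made-up numbers the datastore fills in 40 hours, which is inside a 48-hour retention window, so a busy server can run out of space before the snapshot is even due for removal.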

    Snapshots do have their use cases, and they have saved me many times.
  • langenoir Member Posts: 82 ■■■□□□□□□□
    kiki162 wrote: »
    @langenoir LOL, what type of company do you work for?

    A tech company that I think should be much better at doing this sort of thing than they actually are... :/
  • blargoe Member Posts: 4,174 ■■■■■■■■■□
    TheProf wrote: »
    Now, snapshots by their nature are not a bad thing at all; what makes them dangerous is how people use them. The same concept applies to resource pools. Everyone says to avoid resource pools, but that's because most of the time they're not properly configured and, like snapshots, they then cause problems.

    Use snapshots for specific use cases like making modifications to various apps, upgrades, etc. Don't keep a snapshot for more than 48 hours, and don't snapshot a server that generates a lot of data fast, or you can end up running out of space.

    Snapshots do have their use cases, and they have saved me many times.


    I agree with all of this 100%.
    IT guy since 12/00

    Recent: 11/2019 - RHCSA (RHEL 7); 2/2019 - Updated VCP to 6.5 (just a few days before VMware discontinued the re-cert policy...)
    Working on: RHCE/Ansible
    Future: Probably continued Red Hat Immersion, Possibly VCAP Design, or maybe a completely different path. Depends on job demands...
  • alias454 Member Posts: 648 ■■■■□□□□□□
    Public shaming works for us. The vCenter alarms get sent out to a distribution list of all the admins with access to create snapshots. We have a handful of people that can create snapshots and they fear my incessant nagging about removing them if they are around too long.

    Snapshots are a great tool and make my life better for the most part. They can cause a great deal of pain if not managed correctly though.
    “I do not seek answers, but rather to understand the question.”
  • JockVSJock Member Posts: 1,118
    Ditto with what everyone is saying here.

    - Snapshots are NOT backups. They capture a point in time for an OS that we can fall back to if we are testing something. End users want to hold on to them forever, and learn fast that it will screw things up (see next bullet point).

    - We only keep snapshots for 48 hrs; otherwise they can go corrupt and eat up all of the free disk space on the LUN.

    Another thing is that we set all of the OS disk space to thin provisioning rather than thick provisioning. We only use thick for disk space that holds Oracle or MS SQL, per VMware's best practices.
    ***Freedom of Speech, Just Watch What You Say*** Example, Beware of CompTIA Certs (Deleted From Google Cached)

    "Its easier to deceive the masses then to convince the masses that they have been deceived."
    -unknown