Backups may not be the hottest of subjects, or the most exhilarating when viewed alongside current developments in technology, but they are still most definitely a cornerstone of any IT/OT strategy. Given how important a good backup is to your organisation, it intrigues me how differently backup and recovery are viewed at the IT level of a business compared with the OT level. What I quite frequently see is that IT teams have a solid approach to backups and recovery, with well-established processes and procedures in place; regular testing and off-site or offline replication are common.
In the OT realm, by comparison, backups appear to be treated as less important, and are often poorly maintained if taken at all. There seems to be an almost cultural difference in approach between the IT and OT teams.
Over the last few years, virtualisation technologies have been widely adopted within control system environments. Virtualisation can drastically reduce the issues of legacy hardware obsolescence and simplify day-to-day tasks through centralised management. Whilst it has many fantastic benefits, it also presents a few common traps that people fall into, and I regularly see one key feature being misused. I am, of course, referring to snapshots.
I would only be speculating if I tried to give a definitive answer as to why this happens, but I see it on an almost monthly basis. The main issue appears to be that snapshots are used as a method of backup, and the terms “snapshot” and “backup” are commonly used interchangeably.
“We take regular snapshots, so we’re covered, right?”
It’s normally after hearing this that my internal alarm bells start ringing, and for one main reason: on a hypervisor, snapshots are NOT backups. A snapshot is not a complete copy of the VM; it records changes relative to the underlying virtual disk and depends on those other blocks of data to make any sense. If the underlying disk is corrupt, your snapshot is useless. You need a separate backup to make sure you still have your data after some unforeseen event, ideally on a different device from the hypervisor.
It’s also best practice to keep your backup device separate from your main hypervisor hosts so that, in the event of an entire host failing, you can still reach the backups to recover the system. Additionally, that protected data should be replicated to another location to give geographical separation from the primary backup location.
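To make that dependency concrete, here is a deliberately simplified toy model in Python. It does not reflect any particular hypervisor's on-disk format; the class names `BaseDisk` and `SnapshotDelta` and the block contents are purely illustrative assumptions. The point it tries to show is that a delta only holds the blocks that changed, so everything else has to be read from the parent disk.

```python
# Toy model only: real hypervisor snapshot formats are more complex.
class BaseDisk:
    def __init__(self, blocks):
        self.blocks = dict(blocks)   # block number -> data
        self.corrupt = False         # flip to simulate losing the base disk

    def read(self, n):
        if self.corrupt:
            raise IOError("base disk unreadable")
        return self.blocks[n]


class SnapshotDelta:
    def __init__(self, parent):
        self.parent = parent
        self.changed = {}            # only blocks written since the snapshot

    def write(self, n, data):
        self.changed[n] = data

    def read(self, n):
        # Changed blocks come from the delta; all others fall through to the parent.
        if n in self.changed:
            return self.changed[n]
        return self.parent.read(n)


def can_read_all(disk, block_numbers):
    """Return True only if every requested block can be read."""
    try:
        for n in block_numbers:
            disk.read(n)
        return True
    except IOError:
        return False


base = BaseDisk({0: "boot", 1: "os", 2: "historian-data"})
snap = SnapshotDelta(base)
snap.write(2, "historian-data-v2")

healthy = can_read_all(snap, [0, 1, 2])        # delta plus intact base
base.corrupt = True                            # simulate base disk corruption
after_failure = can_read_all(snap, [0, 1, 2])  # delta alone cannot rebuild the VM
```

In this sketch the one changed block is still readable after the base is lost, but the boot and OS blocks are not, which is why a snapshot on its own cannot rebuild the machine.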
Cloud storage options are an ideal location for this, given how cost effective they now are, but local replication to another device on the same site would also be a strong contender.
Letting the Weeds Grow
Snapshots grow with every system change, and if you have multiple snapshot points your data growth will be significant. I’ve seen several systems where snapshots have got out of control and consumed the whole datastore, whilst driving disk performance through the floor! The usual outcome is a stalled VM and forced resets in an attempt to regain control of the system.
I firmly believe that snapshots do have a valid use in a virtualised controls environment, and that is solely for short-term changes and updates. They should be closely policed, and removed once the change has been proven successful.
For real backups and data protection, a separate backup technology that takes complete VM images and can automate regular backups at a reasonable frequency would be a valuable investment.
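One way to police snapshots is a routine check that flags any that have outlived their purpose. The following is a minimal sketch, assuming you can export a snapshot inventory from your hypervisor's management tooling into a list of records; the field names, the VM names and the 72-hour threshold are all hypothetical and should be adapted to your own change process.

```python
from datetime import datetime, timedelta


def stale_snapshots(snapshots, now, max_age=timedelta(hours=72)):
    """Return snapshots older than max_age, oldest first.

    The 72-hour default is an illustrative assumption, not a recommendation.
    """
    stale = [s for s in snapshots if now - s["created"] > max_age]
    return sorted(stale, key=lambda s: s["created"])


# Hypothetical inventory, as it might be exported from a management tool.
inventory = [
    {"vm": "scada-01", "name": "pre-patch", "created": datetime(2024, 3, 1, 9, 0)},
    {"vm": "hist-01", "name": "pre-upgrade", "created": datetime(2024, 3, 9, 14, 0)},
]

now = datetime(2024, 3, 10, 9, 0)
overdue = stale_snapshots(inventory, now)

for s in overdue:
    age_days = (now - s["created"]).days
    print(f"{s['vm']}: snapshot '{s['name']}' is {age_days} days old, review and remove")
```

Run on a schedule, a report like this gives someone a short list to review rather than relying on memory to clean up after each change.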
Ideally your backup storage would be away from the hypervisor to reduce any impact on disk performance on the host, as well as removing a single point of failure. Once this is operational, I would advocate regular testing of your recovery methods to ensure all goes smoothly.
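As one small piece of a recovery test, you might record checksums of key files at backup time and compare them after a trial restore. The sketch below assumes static content such as exported project or configuration files; it is not a substitute for booting and exercising a restored VM, and the function names `build_manifest` and `verify_restore` are my own for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest, restored_root):
    """Return a list of (relative_path, problem) for files that fail the check."""
    restored_root = Path(restored_root)
    problems = []
    for rel, expected in manifest.items():
        candidate = restored_root / rel
        if not candidate.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(candidate) != expected:
            problems.append((rel, "checksum mismatch"))
    return problems
```

An empty list from `verify_restore` means every file in the manifest came back intact; anything else is worth investigating before you need the backup in anger.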
Backups and snapshots can be used in a complementary manner, and using both can be a very effective approach to disaster resilience. Backup and recovery solutions have been in a competitive space for many years now, which has forced vendors to make vast improvements and to simplify their interfaces. Most are now relatively easy to configure and use, and recovery directly to the hosts can be done quickly and with minimal clicks.
A robust recovery strategy underpinned by the right technology can drastically simplify your approach to taking and testing backups, which can save time, money and effort. Equally, automating such tasks will ensure that they are performed consistently and accurately each time. Additional savings can be achieved with a complete BCDR solution, where the requirement to keep spare hardware is removed and the solution offers an additional layer of resilience to the main infrastructure.
SolutionsPT have put together a range of recovery offerings that have been rigorously tested in an OT environment against a wide range of software applications to ensure they are the right tools for the job. Recovery no longer needs to be the time-consuming and onerous task it once was, and the right solution could actually free up your time to concentrate on other key tasks.
To find out more about the range, and how it could benefit your organisation in protecting your systems and data, please contact us to arrange a recovery solution review.