In our penultimate piece looking in more detail at the Department for Education Cyber security standards, we will be looking at two related themes:

  • You should have at least 3 backup copies of important data, on at least 2 separate devices, at least 1 must be off-site
  • Your business continuity and disaster recovery plan should include a regularly tested contingency plan in response to a cyber attack

Backup

Having an adequate backup regime in place is one of the most basic features of a functioning IT operation and yet it is not always something that is at the forefront of the minds of those responsible for the overall school operation.

It may surprise people to learn that backups are barely referenced in the Cyber Essentials accreditation scheme: a cursory note in the infrastructure text that they are a good idea is as far as it gets.  It is therefore reassuring to see a much more detailed section in these standards, fleshing out the minimum level of backup that should be in place.

That said, there are still some grey areas here that need careful consideration and some omissions that need highlighting.  Once we’ve looked at what the standard says, we can look at these in more detail.

To summarise the key points:

  • You should have 3 backup copies, on at least 2 devices and at least 1 copy must be stored off-site (or sufficiently far from the original to not be affected by dangers such as fire or flood).
  • You should schedule backups regularly; although the standard doesn’t use the word, the implication is that backup schedules should be automated
  • At least 1 of the backups must be offline at all times
  • Access to the backups should be restricted to a limited number of devices and user accounts and those accounts should be protected by multi-factor authentication
  • Backups must be regularly checked to make sure they work

Where this leaves some room for interpretation is that individual schools and colleges are to determine what data is classed as important and what schedule for backing up is appropriate for that data.

To make those decisions we need to cross into business continuity/disaster recovery language and consider some key decisions that need to be made.  Firstly, how will you classify your data and determine what level of importance to assign to each element?  You could of course simply lump all data together and have everything backed up to the same schedule and the same level of cover but that probably isn’t very efficient and may be quite expensive when you start looking at the offline backup options.

Now, hopefully if you went through GDPR adoption well you will already have impact analysis for your key data and this is probably a good starting point for the exercise at hand.  Identify the different data sources you may have, for example finance data, payroll data, personnel data, student records, office files, student files.  Then, there are two key metrics that you will start to hear – RTO (Recovery Time Objective) and RPO (Recovery Point Objective).

In the simplest terms, these mean the following:

  • RTO – how quickly would you need to recover the data in the event of something going wrong?
  • RPO – how much data would you be comfortable with potentially losing?  For example, if you back up a data set at 11 pm every day and have a problem at 4:30 pm, you will potentially have lost everything since 11 pm the previous day.
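The RPO example above is simple arithmetic, and a minimal sketch makes the size of the exposure explicit.  The function names and the dates below are illustrative assumptions, not part of the standard; only the 11 pm and 4:30 pm times come from the example.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, incident: datetime) -> timedelta:
    """Return the time elapsed between the last good backup and the incident."""
    if incident < last_backup:
        raise ValueError("incident cannot precede the last backup")
    return incident - last_backup


def meets_rpo(last_backup: datetime, incident: datetime, rpo: timedelta) -> bool:
    """True if the potential loss is within the agreed Recovery Point Objective."""
    return data_loss_window(last_backup, incident) <= rpo


# Hypothetical dates: backup at 11 pm, problem at 4:30 pm the following day.
last_backup = datetime(2024, 3, 4, 23, 0)
incident = datetime(2024, 3, 5, 16, 30)

print(data_loss_window(last_backup, incident))                  # 17:30:00
print(meets_rpo(last_backup, incident, timedelta(hours=24)))    # True
print(meets_rpo(last_backup, incident, timedelta(hours=4)))     # False
```

In other words, a nightly backup can leave up to a full day of work at risk, so a data set with an RPO of a few hours would need a more frequent schedule or another means of recovery.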

These decisions need careful consideration when defining the business continuity/disaster recovery plan, as there is usually considerably more to them than first appears.  An alternative contingency plan may reduce the time pressure on restoring a system.  Equally, a system built on a database may have technology that tracks transactions between backups, meaning a once-a-day backup is fine because you can recover all of the day’s transactions via the system.

There will also need to be a prioritisation of the order in which to restore data in the event of multiple systems being impacted.  Again this may be complex, as an initial review may reveal linkages and dependencies that require systems to be brought back in a certain order.
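One way to reason about restore order is to write the dependencies down and let a topological sort propose a sequence.  The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the system names and their dependencies are hypothetical and would need replacing with your own.

```python
from graphlib import TopologicalSorter

# Hypothetical systems: each key lists what must be restored before it.
dependencies = {
    "network": set(),
    "directory_service": {"network"},
    "file_server": {"directory_service"},
    "finance_system": {"directory_service"},
    "mis": {"directory_service", "file_server"},
}


def restore_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every system follows the systems it depends on.

    graphlib raises CycleError if two systems depend on each other,
    which is itself worth knowing before an incident.
    """
    return list(TopologicalSorter(deps).static_order())


for step, system in enumerate(restore_order(dependencies), start=1):
    print(step, system)
```

Systems with no dependency between them can appear in either order, so the output is a starting point for the priority discussion rather than a final answer.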

Whatever point you arrive at, there needs to be at least three copies of your data stored in more than one location.  An interesting detail of the standard is that this doesn’t necessitate three separate devices and locations.  As one copy has to be off-site (and by definition this should be on a separate device to the main copy), it is acceptable to have two copies on the same device as long as they are separate copies, that is, one isn’t overwriting the earlier copy.  However you end up, as long as you have at least 3 copies, on at least 2 devices, with at least 1 off-site, you will meet the specification.
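The counting rule can be expressed as a small check.  This is a minimal sketch under assumptions: the `BackupCopy` record, the `check_backup_rule` function and the device names are invented for illustration, and the offline test reflects the separate requirement that at least one backup is offline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    device: str            # hypothetical label for the device holding the copy
    offsite: bool          # stored away from the original site
    offline: bool = False  # not reachable over the network once written


def check_backup_rule(copies: list[BackupCopy]) -> list[str]:
    """Return a list of unmet requirements; an empty list means all are met."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 backup copies")
    if len({c.device for c in copies}) < 2:
        problems.append("copies are on fewer than 2 devices")
    if not any(c.offsite for c in copies):
        problems.append("no copy is off-site")
    if not any(c.offline for c in copies):
        problems.append("no copy is offline")
    return problems


# Two separate copies on one on-site device plus one off-site, offline copy.
plan = [
    BackupCopy(device="onsite-nas", offsite=False),
    BackupCopy(device="onsite-nas", offsite=False),
    BackupCopy(device="offsite-vault", offsite=True, offline=True),
]
print(check_backup_rule(plan))  # []
```

Note that the example plan passes with only two devices, which matches the reading of the standard described above.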

As always, there will be a trade-off between how many copies you have and the cost of storage and time to create the backups.  There is a tendency to think of having more backups to reduce the RPO but this may increase the costs substantially so each source of data should be considered on a case-by-case basis.

Another critical part of the standard is the need to have at least one copy of the backup offline.  That is, once written it is no longer accessible via the network and cannot be changed.  The standard uses the term cold backup; another term, used in the marketing of systems that achieve this, is immutable backup.  This is now so critical because of an increasing trend over the past few years for cyber attacks, particularly those linked with ransomware, to behave stealthily for long periods after the initial breach.  There are many cases where systems have been compromised for months before anything malicious happens.  Part of the reason for this is to allow the compromised systems to make their way onto the backups and, in some cases, ransomware has attacked the main copy and the backup copy simultaneously, rendering both useless.

The immutable copy gives a level of protection over and above this as, once written, it can’t be corrupted or changed as it is no longer connected to the network.  Naturally, such systems come with their own costs and this is usually a trade-off based mainly on storage requirements.  However, this needs to be actively considered as part of your backup planning to ensure that you have maximum protection.

Finally, on the subject of backups, an area not to miss is your cloud-based systems.  I have seen many organisations move to a cloud-based system such as Microsoft 365 or Google and then assume that they no longer have to worry about backing up.  I cannot stress this enough: these systems do not perform backups in the traditional sense.  They are very good at maintaining multiple copies of documents, with version control allowing you to go back to various points in a document’s life, and they don’t actually delete files when you press delete but send them to a recycle area for a set number of days.  What they don’t do is create a backup.  You may of course look at the data stored, decide that the level of risk posed by not having a backup is low, and consciously choose not to back the data up.  However, this is something that needs to be actively considered.  It is particularly relevant if you use a cloud-based email service.  Would you be OK with potentially losing large amounts of email data with no way of recovering it?

As you may see, the technical specification for backup is quite straightforward but opens several large cans of worms to sort through before coming up with a workable solution.  The best way to achieve this is to make it an output of the school or college business continuity and disaster recovery planning process, which takes us neatly to the next part of the standard.

Business Continuity/Disaster Recovery

The first and most important thing to say about this element is that it should not be led by IT, whether internally or externally managed.  The IT elements should be taken as part of the wider school or college business continuity/disaster recovery planning process.  While digital resources and systems are increasingly critical to everyday operations, decisions about priorities and elements such as RTO/RPO should be made collaboratively with senior leaders and wider stakeholders, to be sure that the plan will match the expectations of the organisation.

It is also vital that not every technical issue is seen to require a technical solution in the first instance.  For example, a natural response may be that systems have to be back up and running asap but this isn’t helpful in a recovery scenario as each recovery will take time and there may be dependencies that dictate the order systems come back online.  For example, if a key piece of hardware needs replacing, can this be ordered without the finance system being active?  When looking at the priority order of recovery and what a reasonable time to recover may be – is there a simple analogue process that could be used as a contingency to relieve pressure to recover the electronic system?

There are of course different scales of magnitude for what could go wrong.  The planning should consider all the ways operations could be interrupted, everything from a snow day to a major incident such as a fire.

One area that IT will need to take a big hand in relates to business interruption due to cyber attacks.  Although this is a subset of the wider continuity plan, there will be significant elements of planning specific to this scenario, and there are resources available from the NCSC to help guide this process.

In order to meet the standard there needs to be, as part of the wider business continuity/disaster recovery plan, a plan that contains as a minimum:

  • staff responsibilities
  • out-of-hours contacts and procedures
  • internal and external reporting and communication plans
  • priorities for service restoration
  • the minimum operational IT requirements
  • where additional help and resources can be found

Hard copies of the key documentation should be kept, including off-site, in the event of a total system (or site) failure.

Most importantly of all, the plans must be tested regularly.  What’s more, I would recommend desktop testing with the wider business continuity group to ensure that everyone is aware of their role.  Such an exercise can either examine a specific scenario, such as a cyber attack, or form an escalating scenario, with new information and challenges added as the test goes along to fully stress test the plan.  Without the testing a few things are likely to happen:

  • Your contingencies won’t actually allow you to operate as you anticipate
  • Your timings may be grossly optimistic and recovery will be longer than expected
  • You will miss a key element that may make the plan less effective
  • The people tasked with implementing the plan may not understand their roles and responsibilities resulting in them not being able to act effectively
  • Your risk analysis of various scenarios may not be accurate – one scenario may lead to another that wasn’t foreseen

Developing a robust business continuity and disaster recovery plan is a complex and time-consuming exercise but doing so will significantly improve the chances of successfully recovering from an incident.

Should you require any help with the development of a plan, particularly in relation to IT aspects and testing, please get in touch.  We’d be happy to help.
