Incorporating Security into your Next HPC Procurement v1.1

Updated: 2020-04-07 (V.1.1)

If you’re not specifying security requirements from the outset of your HPC procurement process then it may become much more difficult to integrate them later. So what should you be asking?

This article describes five security requirements which we believe are relevant to and should be incorporated into any HPC procurement.

Whilst it might seem a good idea to make all of these requirements mandatory, security is unfortunately not well practiced in the HPC world. Mandating some activities will simply result in a selection of non-compliant bids, or in you having to accept a lower bar or turn a blind eye to non-compliance from vendors. Doing so effectively condones a lower standard of security than you specified for your procurement and does not drive the continued improvement the industry needs.

With a bit of specialist HPC security knowledge we can ensure that there are no showstoppers. The last thing you want is for a security issue to be uncovered that requires some level of redesign of the system in order to be addressed; that is a costly and time-consuming exercise, and it may well not be a commitment the vendor is held to or is even able to fulfill. This guide takes that as its starting point:

Separate User and Management Networks

Requirement: Separate user and management networks
Mandatory? Mandatory
Detail: Hosts accessible by users (e.g. login nodes and compute nodes) should not be able to access the network on which nodes used for administration, system management and operation exist.
Why? This will likely be a difficult, and potentially impossible, change to make once a system is built. At the very least it will require configuration changes which may not be fully supported, but in some cases it will simply not be possible with the hardware setup. It is, however, one of the single biggest security controls you can have in a multi-user supercomputer. Even if a user is able to gain privileged access to a node, this separation can prevent a full system compromise.
Importantly, separating user and management networks also provides a network on which other devices can live, for example the management interfaces for storage controllers and switches, BMCs/IPMI, power supplies and even the embedded devices that control aspects of cooling. Some of these devices will unfortunately not uphold a good security posture, and any weak component can result in the compromise of an entire system.
Segmenting these systems/interfaces means that even if vulnerabilities do exist in them, they are not exploitable unless an attacker has first gained access to this network segment.
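
As an illustration of how this requirement might be spot-checked during acceptance testing, the following is a minimal Python sketch, intended to be run from a user-accessible host such as a login or compute node. It attempts TCP connections to a list of management endpoints and reports any that succeed. The function names, addresses and ports are hypothetical placeholders (the addresses come from the reserved documentation range), not values from any real system. A clean result only shows that the listed TCP ports are unreachable; it does not prove that the networks are fully separated.

```python
import socket


def reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable networks.
        return False


def find_separation_violations(endpoints, timeout=1.0):
    """Return the (host, port) pairs that this host is able to reach.

    Run from a user-accessible node, any entry returned here is a
    management endpoint that users could potentially talk to.
    """
    return [(h, p) for h, p in endpoints if reachable(h, p, timeout)]


if __name__ == "__main__":
    # Hypothetical management endpoints; replace with your own inventory.
    management_endpoints = [
        ("192.0.2.10", 22),   # assumed: admin node, SSH
        ("192.0.2.20", 443),  # assumed: BMC web interface
        ("192.0.2.30", 443),  # assumed: storage controller management UI
    ]
    violations = find_separation_violations(management_endpoints)
    for host, port in violations:
        print(f"REACHABLE from user network: {host}:{port}")
    if not violations:
        print("No listed management endpoints were reachable over TCP.")
```

A check like this is best treated as a quick sanity test alongside a proper review of the network design, since it says nothing about UDP services, unlisted ports or other routes between the networks.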

Evidence of Proactivity in Cyber Security

Requirement: Vendor should provide evidence of proactivity in the field of cyber security
Mandatory? Nice to have
Detail: Evidence of proactivity around security. This could come in many different forms, for example the publication of security advisories relating to their products, contributing to the security of components/software on which their products rely, or actively undertaking security assurance of their products.
Why? A vendor that is undertaking proactive security work should be able to tell you about the activities they undertake to validate the security of the system they are looking to sell you, and what the attack surface of that system looks like. This should be true of both the components and software that they manufacture and any third party components and software that the system comprises. A vendor who understands their product's attack surface is far less likely to be producing a product that is affected by systemic vulnerabilities requiring significant work to resolve.

Be wary of vendors who cite compliance standards (e.g. FIPS) for the technologies they use; this is a clear indicator that they’ve not spent the time to understand the attack surface of the system they are selling you. Equally, any vendor who dismisses your security concerns or suggests that security has a performance trade-off either does not understand security or is attempting to justify a poor design choice.

Waiting for security notifications about the technologies they use and then reacting to them is not proactive. You should expect the vendor to be undertaking their own security activities and initiatives with the intention of identifying and resolving security issues in their product. A very proactive vendor will also be undertaking assurance activities on the third party components that make up their product.

There is nothing stopping you undertaking your own assurance activities (and this is strongly recommended), but evidence of proactivity upfront almost always correlates with a more trouble-free relationship on the security front if security issues are identified.

If at any point you feel that you are not seeing that evidence, and your procurement process allows for it, ask for a conversation with their security team/person. It’s hard to be proactive on security if it’s nobody’s job to look after security; it need not be a dedicated role, but the responsibility should live somewhere, and asking to speak with them should clarify what this looks like.

System Administration Documentation

Requirement: Visibility of detailed system administration documentation
Mandatory? Nice to have
Detail: Information on how system administration activities are undertaken – for example re-imaging of nodes or changing other system attributes.
Why? One area that tends not to be very transparent until you’ve got your hands on a system is how you go about administration. The toolsets used for administering HPC systems frequently open up attack surface and have been the downfall of many HPC systems that HPCsec have looked at.

The perfect scenario here would be to get hands-on access to a system, with the opportunity to explore it. However, if this is not possible, a read through the system administration documentation will often give visibility of potential security issues and areas that should be explored further.

If you do not have the HPC security expertise in-house then you’ll likely need to pass this documentation on to your HPC security expert to take a view on. Don’t expect to see well-known security issues like RSH being used instead of SSH; the issues will be more subtle and will likely revolve around the manipulation of logic, so knowledge of how to attack a system will play a key part in digesting this.

Vulnerability Handling Process

Requirement: A security vulnerability handling process
Mandatory? Mandatory
Detail: Information on the vendor’s vulnerability handling process, including how to report security vulnerabilities to them and how they will handle those vulnerabilities when reported.
Why? The moment a security issue arises is not the time you want to find out that your vendor has no way of handling security vulnerabilities; this is a process that any responsible vendor should have in place. It needn’t be complex, and in most cases it is highly likely that they’ll request that you simply use the support contract and process that is already in place, which is absolutely fine.

However, be aware that security vulnerabilities can be identified by anyone. If the vendor has no means for non-customers to report vulnerabilities, the vendor will likely never get to know about them and your system will remain vulnerable. Again, this needn’t be overly complex; a simple form on their webpage allowing security researchers to notify them about vulnerabilities would suffice. You could look online now and find out what the vendors you are considering have in place.

Another thing you can look for online is CVEs relating to the vendor. A vendor who has no CVEs reported against their product is unlikely to be particularly active when it comes to security. A CVE indicates that a vulnerability has been identified and publicly disclosed, usually alongside a fix. An absence of CVEs suggests that no vulnerabilities have been identified or resolved, and is almost always a sign of a lack of looking rather than of a secure product. You should take some comfort from seeing CVEs relating to the vendor you are considering, as this indicates that they have handled vulnerabilities in the past.

It is worth discussing an SLA here too, as there is often a chain: if your system uses a storage device that is found to contain an issue, this may well not be something that your vendor can resolve directly, and they will have to fall back on the contracts they have with the storage device vendor. Equally, security vulnerabilities come in different shapes and sizes, which can make it very difficult to put a time-frame on fixes. At HPCSec we offer a brokering and arbitration service to ensure that security issues are being handled appropriately and in the best interests of all involved. If a fundamental flaw in a technology is uncovered then unfortunately it simply is not going to be resolved overnight; a complete fix may take months or even years. In some cases a workaround may be a sensible intermediate step, although in most cases it will be absolutely right to expect a timely fix from the vendor, and their process should show this.

A final point to consider: if security vulnerabilities are identified, when will you get to know about them? Will the vendor notify you the moment they know, at the point they have produced a fix, or at some other time?

Undertaking Third Party Assurance

Requirement: Proactive Security on Third Party Components
Mandatory? Nice to have
Detail: Evidence that some proactive work is being undertaken to ensure that third party components or dependencies uphold expected security standards and that their security model is understood.
Why? A supercomputer comprises many component parts, a lot of which will not be under the full control of the vendor. Take storage as an example: it is not uncommon for a supercomputer vendor to provide a supercomputer which utilises a storage appliance provided by a third party (i.e. a company other than the one selling you the supercomputer). Equally, other components like workload managers and other core software are often provided by a company other than the vendor. Where this is the case, the security of that component is not fully within the control of the vendor selling you the supercomputer. It is therefore reasonable to expect that the vendor has done some form of due diligence on the security of that component to ensure that it meets your security needs and will not adversely impact the security of the system. The fact that you are reading this guide suggests you expect your suppliers to be proactive when it comes to security, so it is only reasonable to expect that they are doing the same with their suppliers.

There is no hard and fast rule as to what this should look like, but a few positive indicators of a proactive vendor would include:

* Evidence of some form of security assessment of third party components, perhaps a security testing report or even the publication or joint publication of some security advisories

* Ability to articulate a threat model of the third party components and how that interacts with and alters the threat model of the supercomputer as a whole

* Evidence of some form of engagement with the third party on the security front or at the very least some SLAs in place

A vendor engaging with their suppliers on security topics is the type of vendor who keeps their suppliers sharp when it comes to security. It is also a vendor who is likely to have a flow of communication on security topics being pushed at them by their suppliers. That means the vendor will be receiving advance notification of security topics affecting their suppliers’ kit, from which your system may benefit.

It would be infeasible for any supercomputer vendor to cover all third party components, so pick the key ones and investigate what work the vendor is doing with their suppliers on those. Recommendations would be to start with storage, workload management and potentially networking kit.
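
To make the mandatory versus nice-to-have split concrete, the five requirements in this guide can be captured in a simple structure and used to screen bids. The following Python sketch is one possible way to do that. The `assess_bid` function and its output format are hypothetical and not part of any procurement standard; only the requirement names and their mandatory status come from this guide.

```python
# The five requirements from this guide, as (name, mandatory) pairs.
REQUIREMENTS = [
    ("Separate user and management networks", True),
    ("Evidence of proactivity in cyber security", False),
    ("Visibility of detailed system administration documentation", False),
    ("A security vulnerability handling process", True),
    ("Proactive security on third party components", False),
]


def assess_bid(met):
    """Screen a bid given the set of requirement names it satisfies.

    A bid is compliant only if every mandatory requirement is met.
    Nice-to-have requirements do not affect compliance, but are listed
    so that bids can be compared against each other.
    """
    missing_mandatory = [
        name for name, mandatory in REQUIREMENTS
        if mandatory and name not in met
    ]
    nice_to_have_met = [
        name for name, mandatory in REQUIREMENTS
        if not mandatory and name in met
    ]
    return {
        "compliant": not missing_mandatory,
        "missing_mandatory": missing_mandatory,
        "nice_to_have_met": nice_to_have_met,
    }


if __name__ == "__main__":
    # Hypothetical bid that meets one mandatory and one optional item.
    example = assess_bid({
        "Separate user and management networks",
        "Evidence of proactivity in cyber security",
    })
    print(example)
```

Keeping the mandatory list short, as this guide does, means a bid is only rejected outright for gaps that would be difficult or impossible to fix after delivery.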

Security is still in its infancy within the HPC world. Over time, and as the security maturity level increases, we will update this guide to ensure that the bar continues to rise and that we can all benefit from increased security in the systems that we are procuring.

“Security is one of those things that can be extremely difficult to retrospectively add. Make sure you get it right from the start”