The High-Performance Computing (HPC) market is huge. According to the consulting firm Emergen Research, HPC spending will exceed $66 billion in 2028.
Advances in HPC compute and storage workloads continue to drive data management challenges. Each new storage technology brings its own problems: multiple storage silos, heterogeneous file systems and long-term storage requirements, all of which drive the need for reliable and powerful solutions.
In HPC environments, rather than move existing project data for burst and compute sessions, data is often recompiled from scratch, at high cost and with uncertainty over the results. Backups can be a way to avoid recompiling, but are there still insurmountable barriers to backing up HPC data? Here is Atempo's take on the three major HPC backup challenges.
HPC Data Backup Challenge #1 – Tree Walking
For HPC file systems, traditional data movement solutions have to go back to basics: they "tree walk" the file system to identify changes. This can take days or even weeks. HPC storage administrators then face a difficult choice: protect or move only part of their data in the time slot available.
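To make the cost concrete, here is a minimal Python sketch of a generic tree walk. It is not taken from any particular product: the function names `scan_tree` and `diff_scans` are hypothetical, and comparing size and modification time is an assumed (and simplistic) change test. The point is that every file must be visited and stat'ed on every run, even when almost nothing has changed.

```python
import os
from typing import Dict, Set, Tuple

# Assumed change signature per file: (size in bytes, mtime in nanoseconds).
Signature = Tuple[int, int]


def scan_tree(root: str) -> Dict[str, Signature]:
    """Walk the whole tree under root and record a signature for every file."""
    index: Dict[str, Signature] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # one metadata call per file, every run
            except OSError:
                continue  # file vanished between listing and stat
            index[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return index


def diff_scans(
    previous: Dict[str, Signature], current: Dict[str, Signature]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare two scans and return (new, changed, deleted) relative paths."""
    new = current.keys() - previous.keys()
    deleted = previous.keys() - current.keys()
    changed = {
        p for p in current.keys() & previous.keys() if current[p] != previous[p]
    }
    return set(new), changed, set(deleted)
```

The work done by `scan_tree` grows with the total number of files, not with the number of changes, which is why this approach struggles on file systems holding billions of entries.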
Atempo Miria’s FastScan feature collects lists of new, changed and deleted files at node level, enriched with metadata. The information is automatically made available to a Miria server, which then orchestrates data movement and data protection.
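The general idea of working from a change list rather than a full scan can be sketched as follows. This is an illustration only and does not represent Miria's or FastScan's actual interface or data format: the `ChangeEvent` record, its field names and the `apply_changes` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record: kind is 'new', 'changed' or 'deleted'."""
    kind: str
    path: str
    size: int = 0
    mtime_ns: int = 0


def apply_changes(
    catalog: Dict[str, Tuple[int, int]], events: Iterable[ChangeEvent]
) -> List[str]:
    """Update the catalog in place and return the paths that need copying."""
    pending: Dict[str, None] = {}  # dict used as an insertion-ordered set
    for ev in events:
        if ev.kind == "deleted":
            catalog.pop(ev.path, None)
            pending.pop(ev.path, None)  # nothing left to copy for this path
        elif ev.kind in ("new", "changed"):
            catalog[ev.path] = (ev.size, ev.mtime_ns)
            pending[ev.path] = None
        else:
            raise ValueError(f"unknown change kind: {ev.kind!r}")
    return list(pending)
```

Here the work is proportional to the number of events, so a quiet file system costs almost nothing to process, whatever its total size.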
HPC Data Backup Challenge #2 – Versioning at Volume
HPC workloads typically generate petascale volumes of data which need to be managed at each storage tier. Archiving is perfect for long-term storage on tape or cloud but is not suited to version management.
At very high data volumes, snapshots cannot provide a sufficient depth of versions; only backups integrate extensive time navigation and versioning functionality.
Although versioning petascale data sets is a real challenge, Atempo Miria provides full large-scale file archiving and backup functionality for HPC storage.
HPC Data Backup Challenge #3 – Long Retention Periods
The major advantage of having access to historical data sets is that you can not only reconstruct point-in-time data but also access data as it was before a cyberattack. Embedded viral attacks are often dormant for several months, so recently backed-up data may already be compromised. Long-retention, air-gapped copies of HPC data increase your chances of recovering precious and costly HPC research data.
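The effect of retention length on recoverability can be shown with a small Python sketch. The helper names `retained` and `latest_clean_version` are hypothetical, and the model is deliberately simple: it assumes every backup taken before the compromise date is clean and that versions older than the retention window have been discarded.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence


def retained(
    backup_times: Sequence[datetime], now: datetime, retention: timedelta
) -> List[datetime]:
    """Keep only the backup versions still inside the retention window."""
    return [t for t in backup_times if now - t <= retention]


def latest_clean_version(
    backup_times: Sequence[datetime], compromised_since: datetime
) -> Optional[datetime]:
    """Return the most recent backup taken before the compromise, if any."""
    clean = [t for t in backup_times if t < compromised_since]
    return max(clean) if clean else None


# Monthly backups from January to June; the attack was planted on 10 March
# but only detected on 15 June.
versions = [datetime(2024, month, 1) for month in range(1, 7)]
now = datetime(2024, 6, 15)
compromised_since = datetime(2024, 3, 10)

short = retained(versions, now, timedelta(days=30))
long_term = retained(versions, now, timedelta(days=365))
```

With a 30-day window, `latest_clean_version(short, compromised_since)` finds nothing to restore, while the one-year window still holds the 1 March version.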
Because many HPC compute results cannot be recreated or are very expensive to run more than once, backing up can be a cost-effective way of preserving essential data over time for re-use, for versioning, for compliance and for responding to security incidents. Whatever the file system (GPFS, Lustre, NAS…), backups to the storage destination of your choice are now possible with Atempo Miria.
Learn more about how Atempo Miria addresses HPC data protection challenges: