Accessibility and Preservation: Better and Cheaper Archiving through Archive-tiering

Atempo Archiving

Many organizations are surprised by the impact of Big Data on their business, which is driving the exponential increase in the amount of data they produce.

One of the least anticipated - but most painful - aspects of this phenomenon is the skyrocketing storage costs associated with archiving data.

Fortunately, cost-control solutions are emerging that allow organizations to implement efficient data management practices. At its simplest, data management means moving data from hot storage to cooler storage. Today, this one-way traffic is no longer sufficient, because all data is potentially required for analysis, value extraction or repurposing: it should be able to move freely between cold and hot storage. The principle of data tiering is to match the class of storage to the life cycle of the data. Data tiering already yields savings on primary storage by transferring data to lower-cost archive storage.

Thanks to archive-tiering and an optimized classification of their archived data according to its actual use, organizations can drastically reduce their secondary and archive storage costs.

***

Controlling data archiving costs with an optimized archive-tiering strategy

The first objective of archive-tiering is to reduce archive storage costs, but its effectiveness depends on several other factors.

Tiering, or archive-tiering?

Conventional tiering solutions seek first to rationalize the cost of operating secondary storage, which is driven largely by the cost of storing backups and grows with each full backup.

Tiering is based on a simple fact: 80% of the data on secondary storage will be used rarely or never beyond 90 days after creation. It therefore makes sense to seek to reduce the cost of storing this data - without compromising its security.
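To check how closely a given data set follows this pattern, the share of "cold" bytes can be measured directly. The following is a minimal sketch rather than part of any product: the `FileRecord` type, the `cold_fraction` function and the use of last-access time as a stand-in for "use" are all assumptions made for illustration, and the 90-day threshold is simply the figure quoted above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical inventory entry for one file on secondary storage."""
    path: str
    size_bytes: int
    last_access: datetime


def cold_fraction(records, now, threshold_days=90):
    """Return the share of bytes not accessed within threshold_days.

    Assumption: 'rarely or never used' is approximated by last-access time.
    """
    cutoff = now - timedelta(days=threshold_days)
    total = sum(r.size_bytes for r in records)
    if total == 0:
        return 0.0  # nothing stored, so nothing is cold
    cold = sum(r.size_bytes for r in records if r.last_access < cutoff)
    return cold / total


# Illustrative inventory (made-up paths and sizes).
inventory = [
    FileRecord("/projects/old_render.mov", 800, datetime(2023, 6, 1)),
    FileRecord("/projects/current_cut.mov", 200, datetime(2023, 12, 15)),
]
share = cold_fraction(inventory, now=datetime(2024, 1, 1))
```

In practice the inventory would come from a file system scan or a storage catalog; the calculation itself stays the same.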

Archive-tiering has the same objective of optimizing storage costs according to the age of the data, its strategic importance, the targeted storage duration, and the reuse ratio of archived data.

Note that the technical challenge is greater than for tiering. A backup merely copies data, whereas archiving manages the last remaining copies of data that no longer exists on secondary storage. Archive-tiering must therefore transfer these copies from one storage platform to another while guaranteeing their durability.

Archive-tiering at the service of organizations

More and more organizations are in need of an archive-tiering solution - and those with large long-term archive needs are leading the way. But it is not just a question of inert storage: even research centers that archive very large volumes of data over the long term must make it possible to access them for new analyses.

***

Building an efficient and cost-effective archive-tiering strategy

The efficiency of archiving depends first and foremost on the savings made, but also on the accessibility of the data. To get the most out of both factors, organizations must adapt their archiving plan to the length of time they intend to keep the data archived.

Short-term archiving: the advantages of the nearline archive

Nearline archive-tiering concerns "hot" data, which is likely to be called on by operations in the short term.

This solution will be of particular interest to organizations that handle volumes of data that exceed backup capabilities.

Audio-visual production companies, for example, can no longer simply keep all the data they need online for the duration of a film or video series production. They adapt their processing chain accordingly: after each stage (colorization, special effects, titling...), the data is archived in a disk-based “active” archive shared between the different teams. This is called nearline archiving.

Some data remains there for up to a year, during which time it is reused to produce derived content such as TV trailers or summaries of the previous season. Only when this need for reuse becomes rare does the data switch to a less expensive, colder archiving mode.

Intermediate archiving: the advantages of air gap archiving on tape or in the cloud

Staying with audiovisual production, when volumes are very large and processing stages do not follow one another as quickly, it is also common to use multi-drive tape libraries for this nearline archiving. Archiving is then completely integrated into the processing workflow and quickly moves large volumes of 4K productions from disk to tape and vice versa.

Companies also commonly deal with data that is neither completely hot nor completely cold. This "warm" data is not given the same strategic priority as data that qualifies for nearline archiving. It is therefore preferable to store it on tape, a medium that provides a secure air gap at lower cost. The tapes are no longer kept online: they are simply moved to a third-party site.

Medium-term archiving: the advantages of archive-tiering to Amazon Glacier or Google Cloud Archive

Medium-term archiving concerns very large volumes of data for which the cost of in-house multicopy protection would be prohibitive, data from an operational transition, data linked to short-term regulations, and so on. This "cold" data, which we want to remain accessible over the medium term but which we know in advance will most likely never be restored, qualifies for storage on services such as Amazon Glacier or Google Cloud Archive, or with other players such as Wasabi.

The choice between these storage services is usually based on a comparison of their pricing policies. A data management solution, such as Miria for Archiving by Atempo, helps simplify the archiving process and maintain full control over the data. It is Miria that controls cloud storage and tiering in the archive. So, if you decide to subscribe to a new cloud operator to achieve additional savings on the storage of your archives, you can simply migrate part of the content of your archives directly from the Miria interface. The solution will then automatically move multiple copies of the archived data between these cloud storage services.

Miria for Archiving also simplifies the restoration procedure (manually from the interface or via APIs). However, despite recent price reductions, egress from or restoration of Glacier archives can remain expensive.
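The trade-off between storage price and restoration price can be made concrete with a little arithmetic. The sketch below is only an illustration: the `archive_cost` function is hypothetical, and every price in it is a made-up placeholder, not a real tariff from Amazon, Google, Wasabi or any other provider. Real pricing also includes elements (request fees, minimum storage durations, retrieval speed options) that are left out here.

```python
def archive_cost(volume_tb, months, storage_price_per_tb_month,
                 restored_tb, retrieval_price_per_tb):
    """Total cost over a period: storage plus restoration/egress.

    All prices are caller-supplied assumptions in the same currency.
    """
    storage = volume_tb * months * storage_price_per_tb_month
    retrieval = restored_tb * retrieval_price_per_tb
    return storage + retrieval


# Two hypothetical services (placeholder prices, not real tariffs):
# "A" stores at a higher price but restores cheaply, "B" is the opposite.
volume_tb, months, restored_tb = 100, 12, 50

cost_a = archive_cost(volume_tb, months, 4.0, restored_tb, 10.0)
cost_b = archive_cost(volume_tb, months, 1.0, restored_tb, 90.0)

# With half of the archive restored during the year, the service with the
# cheaper storage price ends up more expensive overall.
cheaper = "A" if cost_a < cost_b else "B"
```

With these placeholder figures, service A costs 5300 and service B costs 5700 over the year. If almost nothing is ever restored, the comparison reverses, which is why the expected restoration volume matters as much as the headline storage price.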

Long-term archiving: the advantages of archive-tiering to Glacier Deep Archive

Finally, for data related to the company's assets, or one-of-a-kind collection data (seismic measurements, cosmological data, etc.) that must be archived over the very long term, Amazon Glacier Deep Archive remains the reference service. The data becomes less accessible but more secure, at a storage cost that continues to decrease.
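The four archiving horizons described above can be summarized as a simple classification rule. The following is a rough sketch under stated assumptions, not how Miria or any other product decides: the `Tier` names, the `choose_tier` function and both thresholds (restores per year, retention in years) are hypothetical values chosen for illustration, and a real policy would also weigh strategic importance and regulatory constraints.

```python
from enum import Enum


class Tier(Enum):
    NEARLINE = "nearline disk archive"     # short term, "hot" data
    TAPE = "air-gapped tape"               # intermediate, "warm" data
    CLOUD_ARCHIVE = "cloud archive"        # medium term, "cold" data
    DEEP_ARCHIVE = "deep archive"          # very long term


def choose_tier(expected_restores_per_year, retention_years):
    """Pick an archive tier from expected reuse and retention.

    Thresholds are illustrative assumptions, not recommendations.
    """
    if expected_restores_per_year >= 12:   # reused about monthly or more
        return Tier.NEARLINE
    if expected_restores_per_year >= 1:    # reused occasionally
        return Tier.TAPE
    if retention_years < 10:               # rarely restored, medium term
        return Tier.CLOUD_ARCHIVE
    return Tier.DEEP_ARCHIVE               # rarely restored, very long term


# Example: footage still reused for trailers vs. seismic measurements.
trailer_source = choose_tier(expected_restores_per_year=24, retention_years=1)
seismic_data = choose_tier(expected_restores_per_year=0, retention_years=30)
```

Because data should be able to move in both directions, a rule like this would be re-evaluated periodically so that archived data whose reuse increases can return to a warmer tier.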

***

Archive-tiering puts the cost and protection of archived data under control, while ensuring the accessibility of data over time, as well as its integrity throughout the archiving cycle. It requires a classification of the data to be archived, which also strengthens the cyber-resilience of organizations.

By segmenting archiving costs in a way that evolves with the data, archive-tiering supports the digital transformation of organizations, and the integration of Big Data into their activities, with precision and pragmatism.

For more information: