The accessibility and usage of unstructured data for regulatory, analytical and decision-making purposes is driving the need to search and scrutinize this data. The volume of unstructured data is set to grow from 33 zettabytes in 2018 to 175 zettabytes (175 billion terabytes) by 2025 (source: IDC). What was once cold data stored on tape will increasingly be used for analytics, machine learning and business intelligence.
Legacy archiving methods typically fall short compared to cloud computing and AI applications, where extracting value from data is built into storage and processing. Traditional data management is moving towards automation and delivery as a service. To reduce costs and relieve IT management overheads, organizations need powerful, scale-out unstructured data management solutions for the long term that ensure data is readily available.
Traditional backup technologies such as the Network Data Management Protocol (NDMP) max out at approximately 100 TB of data or 100 million files. With too much data and too many files, NDMP does not guarantee disaster recovery; instead it forces many organizations to rely on data replication, resulting in little or no backup history. Cyberattacks also propagate to replicated data sets. Limitations in I/O parallelization and filesystem scanning mean that NDMP backup technology and legacy backup software have truly served their time.
These traditional approaches give rise to challenges that can be solved by new storage and data synchronization technologies designed for petabyte-scale, scale-out NAS and parallel storage systems. In other words, data can still be protected even when volumes exceed the scope of standard protection tools.
The six challenges facing unstructured data protection:
1. Long waiting period for detecting new and changed data
Scanning a scale-out NAS for changes can result in a long and painful wait. Identifying daily changes across volumes containing hundreds of millions or even billions of small files requires quick and efficient technology. Atempo’s Miria FastScan Data Discovery rapidly collects the list of new, changed and deleted files on storage without parsing the complete storage filesystem. More companies need ways to avoid performing full file tree scans each time a backup, archive or synchronization task is performed.
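As an illustration of the principle (not Miria’s actual implementation), the sketch below assumes the storage exposes a change feed or changelog listing only the touched paths, so a run compares those entries against its previous catalog instead of walking the whole tree:

```python
# Minimal sketch of change detection without a full tree walk.
# FastScan's internals are not public; this assumes a hypothetical change feed
# (e.g. a snapshot diff or changelog API) that yields only touched paths.

from typing import Dict, Iterable, List, Tuple

Catalog = Dict[str, Tuple[float, int]]   # path -> (mtime, size) from the last run

def classify_changes(previous: Catalog,
                     change_feed: Iterable[Tuple[str, str, float, int]]
                     ) -> Tuple[List[str], List[str], List[str]]:
    """Split a storage change feed into new, changed and deleted file lists.

    change_feed yields (path, event, mtime, size), where event is one of
    "create", "modify" or "delete" -- an illustrative format, not Miria's.
    """
    new, changed, deleted = [], [], []
    for path, event, mtime, size in change_feed:
        if event == "delete":
            deleted.append(path)
        elif path not in previous:
            new.append(path)
        elif previous[path] != (mtime, size):
            changed.append(path)
    return new, changed, deleted

# Example usage with a tiny in-memory feed:
prev = {"/data/a.raw": (1000.0, 42)}
feed = [("/data/a.raw", "modify", 2000.0, 43),
        ("/data/b.raw", "create", 2000.0, 10)]
print(classify_changes(prev, feed))   # (['/data/b.raw'], ['/data/a.raw'], [])
```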
2. Inability to protect all data at risk
Many organizations’ daily backup windows simply cannot handle even incremental backups because too many files are added or changed each day. More often than not, these organizations face consequential decisions, such as choosing which volumes of data NOT to protect, or protecting them with unsatisfactory retention and frequency. Technologies such as Miria’s Incremental Forever Backup optimize the incremental process so that, after the first full backup, only new and modified data is collected. As a result, the backup window is minimized while complete data restore capabilities are preserved.
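The logic behind an incremental-forever cycle can be sketched in a few lines. The example below is a simplified assumption of how such a loop might look, not Miria’s code: each run copies only the files reported as new or changed and records them in a catalog that later drives a complete restore:

```python
# Minimal sketch of an "incremental forever" loop, reusing the change lists
# from the previous example. After the first full pass, each run copies only
# new or modified files and updates a catalog used to rebuild a restore point.

import os
import shutil
import time

def incremental_backup(changed_paths, backup_root, catalog):
    """Copy only the files reported as new/changed, then update the catalog."""
    run_id = time.strftime("%Y%m%d-%H%M%S")
    for src in changed_paths:
        dst = os.path.join(backup_root, run_id, src.lstrip("/"))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)        # copies data plus timestamps/permission bits
        catalog[src] = dst            # the latest version of each path wins
    return catalog

def restore_point(catalog):
    """A full restore is assembled from the catalog, not from one big image."""
    return dict(catalog)              # path -> location of its most recent copy
```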
3. Complex data management
Traditional approaches regularly rely on multiple data protection solutions to work around the limitations of each individual solution. Companies frequently run four or five data protection products for different requirements, multiplying the cost of separate hardware, software and storage resources. Miria offers a scalable, storage-agnostic solution on a powerful data management platform that scales horizontally to hundreds of nodes controlled from a single system.
4. Hard-to-meet service level agreements and recovery time objectives
Massive data volumes, together with the complexity of the various backup processes, make data protection, retrieval and business continuity a long and complex endeavour. SLAs are difficult to attain over time and RTOs are not always met. Miria’s advanced capabilities offer backup and recovery for storage systems and shared filesystems from 100 TB to multiple petabytes, while controlling the volume of backup storage used through automatic consolidation of the number of retained versions.
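Version consolidation can be pictured as a pruning rule over each file’s version history. The sketch below is purely illustrative, assuming a hypothetical per-file history and a fixed retention count; Miria’s actual consolidation policy is more sophisticated:

```python
# Minimal sketch of version consolidation over a hypothetical version history.
# Controlling backup volume by limiting retained versions reduces, in effect,
# to a pruning rule applied per file.

from typing import Dict, List

def consolidate(versions: Dict[str, List[str]], keep: int = 5) -> Dict[str, List[str]]:
    """Keep only the `keep` most recent versions of each file.

    `versions` maps a path to its version identifiers, oldest first.
    The identifiers dropped here would be reclaimed from backup storage.
    """
    return {path: history[-keep:] for path, history in versions.items()}

history = {"/data/a.raw": ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]}
print(consolidate(history, keep=5))
# {'/data/a.raw': ['v3', 'v4', 'v5', 'v6', 'v7']}
```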
5. Lack of storage independence
Legacy backup limitations often force organizations to implement expensive replication, which locks their data into one storage provider and one brand of storage. Replication is a risky backup substitute because replicated data is just as easily infected in the event of a cyberattack. Miria eliminates dependency on any single filesystem or hardware provider. The platform offers complete storage independence by supporting a broad range of storage options: disk, tape, NAS, object, cloud storage and more.
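Storage independence is, at heart, an abstraction layer between the data mover and its targets. The following sketch assumes a minimal, hypothetical put/get interface (it is not Miria’s API) to show how disk, object or cloud targets can be swapped without touching the data-movement logic:

```python
# Minimal sketch of storage independence via a backend abstraction.
# The class and method names are illustrative assumptions, not Miria's API.

from abc import ABC, abstractmethod
import pathlib

class StorageTarget(ABC):
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class FilesystemTarget(StorageTarget):
    """A plain on-disk target; object, cloud or tape targets would implement
    the same two methods, so the data mover never depends on one brand."""
    def __init__(self, root: str):
        self.root = pathlib.Path(root)
    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()
```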
6. Lack of data mobility
Organizations that attempt to use non-scalable backup software for business-sensitive data migrations are unable to move data quickly and securely between storage platforms. This makes migrating data between old and new storage a major challenge.
Atempo’s open-format solutions move, archive and retrieve data across operating systems and filesystems. Miria enables organizations to transparently retrieve and move files from one system to another. Filesystem attributes and data formats are respected whatever the source and whatever the destination storage.
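To make the attribute-preservation requirement concrete, the sketch below shows one way a single file could be migrated on POSIX-style storage while carrying over timestamps, permissions and, where possible, ownership; it illustrates the requirement, not Atempo’s transfer engine:

```python
# Minimal sketch of a migration step that preserves file attributes,
# assuming POSIX-style source and destination mounts.

import os
import shutil

def migrate_file(src: str, dst: str) -> None:
    """Copy one file and carry over its timestamps, mode and (where possible)
    ownership, so the destination looks like the source to applications."""
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    shutil.copy2(src, dst)            # copies data plus timestamps/permission bits
    st = os.stat(src)
    try:
        os.chown(dst, st.st_uid, st.st_gid)   # requires sufficient privileges
    except (PermissionError, AttributeError):
        pass                          # e.g. non-root user or non-POSIX platform
```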
At a minimum, unstructured data backup and archiving technology must handle massive data volumes and meet compliance requirements in a cost-effective manner. The most immediate and apparent benefits of handling data with Miria are:
- Saving space on the primary NAS
- The ability to retain data for purposes such as regulatory compliance
- Cost-effective solutions that enable the use of huge data sets for enhanced analytics or intelligent workflows
Miria automates and simplifies the process of identifying and inspecting unstructured file sets. Miria not only protects and archives, but also copies, synchronizes and migrates unstructured data between storage platforms.
Backup and archiving can provide a final rampart against cyberattacks. Having multiple data copies on different storage targets increases your chances of restoring valid and valuable data in the event of loss or attack.