Long-Time Archiving (LTA) using large pools of desktop-grade hard disk drives that are only online for 4 hours a day. By turning off the enclosures we save power, and it is possible to store petabytes of data in a single rack.
A petablock can hold up to 5.76 PB in a single 42U rack (720 disks × 8 TB). This is a list of the main hardware required for a fully redundant installation. It is possible to start with a smaller setup of at least 3 enclosures and 240 disks.
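The capacity figures are plain arithmetic. The sketch below works them out in decimal units (1 TB = 10^12 bytes, as printed on drive labels). Two assumptions are ours, not the source's: disks are spread evenly, so the 3-enclosure minimum implies 80 disks per enclosure, and the starter setup also uses 8 TB disks. The results are raw capacity; usable capacity is lower once redundancy is taken into account.

```python
TB = 10**12  # decimal terabyte, as printed on drive labels
PB = 10**15  # decimal petabyte


def raw_capacity_pb(disks: int, disk_tb: int) -> float:
    """Raw capacity in PB, before any redundancy overhead."""
    return disks * disk_tb * TB / PB


# Assumption: disks are spread evenly (240 disks / 3 enclosures).
DISKS_PER_ENCLOSURE = 240 // 3

full_rack_pb = raw_capacity_pb(720, 8)
starter_pb = raw_capacity_pb(240, 8)  # assumes 8 TB disks in the starter setup too
full_rack_enclosures = 720 // DISKS_PER_ENCLOSURE

print(f"full rack: {full_rack_pb:.2f} PB raw in {full_rack_enclosures} enclosures")
print(f"starter:   {starter_pb:.2f} PB raw in 3 enclosures")
```

Running it prints 5.76 PB raw in 9 enclosures for the full rack and 1.92 PB raw for the starter setup.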
All enclosures are powered off by default; only the server and the switch draw power 24/7. When data needs to be archived or retrieved, the enclosures are turned on 1 to 4 times a day, depending on usage. Reads and writes go to the head of the solution (the server running the LTA software), which stores and processes the data until the next online cycle. The software then decides which part of the data is stored on which disk and powers on the enclosures one by one. This means only one enclosure is switched on, and drawing power, at any given time.
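A minimal sketch of one online cycle, assuming the jobs buffered on the head already know their target enclosure. The `Enclosure` and `Job` classes and `run_online_cycle` are hypothetical names; real power switching and the actual data transfer are only marked by comments. What it shows is the ordering: jobs are grouped per enclosure, so each enclosure is powered on once per cycle and never at the same time as another.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Enclosure:
    """Hypothetical stand-in for one power-switchable disk enclosure."""
    number: int
    powered: bool = False


@dataclass
class Job:
    """A read or write buffered on the head until the next online cycle."""
    kind: str         # "read" or "write"
    archive_set: str
    enclosure: int    # target enclosure, chosen by the placement logic


def run_online_cycle(queue: list[Job], enclosures: dict[int, Enclosure]) -> list[tuple[int, str, str]]:
    """Serve all buffered jobs, one powered enclosure at a time."""
    by_enclosure: dict[int, list[Job]] = defaultdict(list)
    for job in queue:
        by_enclosure[job.enclosure].append(job)

    served = []
    for number in sorted(by_enclosure):
        enclosure = enclosures[number]
        enclosure.powered = True  # a real power-on call would go here
        # Invariant: this is the only enclosure drawing power right now.
        assert sum(e.powered for e in enclosures.values()) == 1
        for job in by_enclosure[number]:
            served.append((number, job.kind, job.archive_set))  # transfer goes here
        enclosure.powered = False
    queue.clear()  # everything buffered on the head has been served
    return served


enclosures = {n: Enclosure(n) for n in range(3)}
queue = [Job("write", "set-A", 2), Job("read", "set-B", 0), Job("write", "set-C", 2)]
order = run_online_cycle(queue, enclosures)
print(order)  # [(0, 'read', 'set-B'), (2, 'write', 'set-A'), (2, 'write', 'set-C')]
```

Grouping by enclosure keeps the number of power cycles per online window to one per enclosure, which matters more for desktop-grade disks than the order of jobs within an enclosure.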
The disks in the petablock are only switched on for a couple of hours a day. Because of this short duty cycle we can use normal desktop-grade disks, and there is no need for enterprise disks. The non-RAID approach we use to achieve very high redundancy also means we can mix different brands, models, versions, and even capacities.
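Without a RAID set tying disks together, placement can treat every disk individually. The sketch below is a hypothetical greedy policy, not the petablock's actual algorithm: it puts each fragment of an archive set on a different disk, choosing the disks with the most free space, so disks of different brands and capacities share one pool. `Disk` and `place_fragments` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    serial: str
    model: str          # brands and models can be mixed freely
    capacity_gb: int    # capacities can differ as well
    used_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb


def place_fragments(disks: list[Disk], fragment_gb: int, count: int) -> list[str]:
    """Place `count` fragments on `count` different disks, most free space first."""
    candidates = [d for d in disks if d.free_gb >= fragment_gb]
    if len(candidates) < count:
        raise ValueError("not enough disks with free space")
    candidates.sort(key=lambda d: d.free_gb, reverse=True)
    chosen = candidates[:count]
    for disk in chosen:
        disk.used_gb += fragment_gb
    return [d.serial for d in chosen]


pool = [
    Disk("S1", "Seagate 2TB", 2000),
    Disk("H1", "Hitachi 4TB", 4000),
    Disk("S2", "Seagate 3TB", 3000, used_gb=500),
]
print(place_fragments(pool, fragment_gb=100, count=2))  # ['H1', 'S2']
```

Filling by free space (rather than by percentage used) naturally sends more data to the larger disks, which is one simple way to make mixed capacities work.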
For example, to migrate from 2TB Seagate disks to 4TB Hitachi disks, you simply mark the first 10 disks for migration. During the next online cycle, we write all data from those disks to other disks and alert the user to replace them. When the new 4TB disks are online, we write the data back and you can migrate the next 10 disks. This makes data migration automatic, safe, and fast when replacing or expanding your offline storage.
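The batch migration can be modelled as a small per-slot state machine that advances once per online cycle. This is a hypothetical model: the `Migration` class, its states, and the `parked` dictionary (standing in for the other disks that temporarily hold the evacuated data) are illustrative, and the batch is 2 disks instead of 10 to keep it short.

```python
from collections.abc import Set
from enum import Enum


class Slot(Enum):
    ACTIVE = "active"
    MARKED = "marked for migration"
    AWAITING_SWAP = "evacuated, waiting for a new disk"


class Migration:
    """Hypothetical model of the batch disk-migration workflow."""

    def __init__(self, files: dict[int, list[str]]):
        self.files = files                      # slot -> files on that disk
        self.state = {slot: Slot.ACTIVE for slot in files}
        self.parked: dict[int, list[str]] = {}  # data held on other disks meanwhile
        self.alerts: list[str] = []

    def mark(self, slots: list[int]) -> None:
        for slot in slots:
            self.state[slot] = Slot.MARKED

    def online_cycle(self, swapped: Set[int] = frozenset()) -> None:
        """Advance every slot one step; `swapped` holds slots with a new disk."""
        for slot, state in list(self.state.items()):
            if state is Slot.MARKED:
                # Evacuate to other disks, then ask the user for the swap.
                self.parked[slot] = self.files[slot]
                self.files[slot] = []
                self.state[slot] = Slot.AWAITING_SWAP
                self.alerts.append(f"replace disk in slot {slot}")
            elif state is Slot.AWAITING_SWAP and slot in swapped:
                # The new disk is online: write the data back.
                self.files[slot] = self.parked.pop(slot)
                self.state[slot] = Slot.ACTIVE


m = Migration({0: ["a.tar"], 1: ["b.tar"], 2: ["c.tar"]})
m.mark([0, 1])
m.online_cycle()                # slots 0 and 1 evacuated, two alerts raised
m.online_cycle(swapped={0, 1})  # new disks online, data written back
print(m.alerts)  # ['replace disk in slot 0', 'replace disk in slot 1']
```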
The disks in the enclosures are checked for reliability during every transfer cycle. When a disk produces errors, we move the data that can still be read from it to another disk and mark the failing disk for replacement. When a disk has failed completely, we also mark it for replacement. Once the disk has been replaced, we rebuild its data from the other disks, as in RAID. Because we know exactly which data was stored on the disk, we can prioritize which archive sets to restore first; this depends on the parity level.
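The source does not say which parity scheme the petablock uses. As an assumption, the sketch below uses single XOR parity, the simplest RAID-style scheme, to show how a replaced disk's block can be rebuilt from the surviving disks. It also assumes that "depends on the parity level" means archive sets with the fewest remaining parity blocks are restored first; `restore_order` and its input are hypothetical.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (single parity, as in RAID 5)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one lost block: XOR of the parity and all surviving blocks."""
    return xor_parity(surviving + [parity])


def restore_order(parity_left: dict[str, int]) -> list[str]:
    """Archive sets with the fewest remaining parity blocks are restored first."""
    return sorted(parity_left, key=lambda name: (parity_left[name], name))


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # one block per disk
parity = xor_parity(data)
# The disk holding data[1] failed and was replaced; rebuild its block.
rebuilt = rebuild_missing([data[0], data[2]], parity)
print(rebuilt == data[1])  # True
print(restore_order({"finance": 1, "photos": 0, "mail": 2}))  # ['photos', 'finance', 'mail']
```

Single parity survives only one lost disk per stripe; schemes with more parity blocks tolerate more failures, which is why the remaining parity level is a sensible restore priority.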
When your data is archived, the properties of every file and folder are indexed and saved in a database. This database is kept online, which makes it possible to search through your offline data. When you find a file or folder, you can see which archive set it belongs to and request that the data be brought back online during the next cycle.
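A minimal sketch of such an online index using Python's built-in `sqlite3` module. The schema (a `files` table and a `recall_requests` table) and the helper functions are hypothetical; the source only says that file and folder properties are kept in a database that stays online. A search returns the archive set, and a recall request is queued for the next online cycle.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS files (
            path        TEXT PRIMARY KEY,
            size_bytes  INTEGER,
            modified    TEXT,
            archive_set TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS recall_requests (
            archive_set TEXT PRIMARY KEY
        );
        """
    )
    return db


def index_file(db: sqlite3.Connection, path: str, size_bytes: int,
               modified: str, archive_set: str) -> None:
    with db:  # commits the transaction on success
        db.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
            (path, size_bytes, modified, archive_set),
        )


def search(db: sqlite3.Connection, pattern: str) -> list[tuple[str, str]]:
    """Find archived files by SQL LIKE pattern; returns (path, archive_set)."""
    return db.execute(
        "SELECT path, archive_set FROM files WHERE path LIKE ? ORDER BY path",
        (pattern,),
    ).fetchall()


def request_recall(db: sqlite3.Connection, archive_set: str) -> None:
    """Queue an archive set to be brought back online on the next cycle."""
    with db:
        db.execute("INSERT OR IGNORE INTO recall_requests VALUES (?)", (archive_set,))


db = open_index()
index_file(db, "/projects/2015/report.pdf", 120_000, "2015-06-01", "set-2015-Q2")
index_file(db, "/projects/2016/plan.docx", 50_000, "2016-01-10", "set-2016-Q1")
hits = search(db, "%report%")
print(hits)  # [('/projects/2015/report.pdf', 'set-2015-Q2')]
request_recall(db, hits[0][1])
```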
The software can collect data manually or automatically using an advanced scheduling manager. Data can be gathered from a local or network (CIFS) source, including remote (WAN) sources. Gathering can be incremental, for example when you want to archive the output of a document management system. Data can be collected on a daily, weekly, monthly, or quarterly basis.
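Incremental gathering can be sketched by comparing each file's size and modification time with what the previous run recorded, and scheduling can be reduced to computing the next run date for each interval. Both functions are hypothetical illustrations, not the product's scheduler; a CIFS share is assumed to be mounted so `os.walk` can traverse it like a local directory.

```python
import os
from datetime import date, timedelta


def changed_files(root: str, previous: dict[str, tuple[int, float]]) -> dict[str, tuple[int, float]]:
    """Files under `root` that are new or changed since the last run.

    `previous` maps path -> (size, mtime) as recorded by the previous run.
    """
    changed = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            signature = (st.st_size, st.st_mtime)
            if previous.get(path) != signature:
                changed[path] = signature
    return changed


def add_months(day: date, months: int) -> date:
    """Same day of month `months` later, clamped to that month's last day."""
    month_index = day.month - 1 + months
    year, month = day.year + month_index // 12, month_index % 12 + 1
    # Last day of the target month: first of the following month minus one day.
    next_first = date(year + month // 12, month % 12 + 1, 1)
    last_day = (next_first - timedelta(days=1)).day
    return date(year, month, min(day.day, last_day))


def next_run(last: date, interval: str) -> date:
    if interval == "daily":
        return last + timedelta(days=1)
    if interval == "weekly":
        return last + timedelta(weeks=1)
    if interval == "monthly":
        return add_months(last, 1)
    if interval == "quarterly":
        return add_months(last, 3)
    raise ValueError(f"unknown interval: {interval}")


print(next_run(date(2024, 1, 31), "monthly"))     # 2024-02-29
print(next_run(date(2024, 11, 30), "quarterly"))  # 2025-02-28
```

Size plus modification time is a cheap change signal that works over network shares; a content hash would be more robust but requires reading every file on each run.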