
Once in a while, the topic of DVD backups arises in #ab. Earlier this week natsuneko [or nekosasu] mentioned backing up some 300GB onto DVD. All well and good, but DVDs do get scratched and, depending on the media, may start to lose data blocks over time. Is there anything that can be done to help preserve them?
Yes, it’s called PAR2 (Parity Archive, version 2).
Brief Overview
PAR2 is a format for recovery data computed from a set of input files, most often posted alongside binaries on Usenet. Its goal is to provide data verification and the ability to reconstruct lost pieces of the dataset, which it does using Reed-Solomon coding. Finding decent advice on PAR2 parameters for anything other than Usenet is rather difficult, so I will focus on PAR2 for DVD backups.
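PAR2’s Reed-Solomon coding can rebuild as many missing blocks as there are recovery blocks. The underlying idea is easier to see with the simplest possible parity scheme: one XOR block that can restore any single lost block. This is a toy illustration of the principle only, not PAR2’s actual algorithm or file format; `xor_blocks` is a hypothetical helper.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"Gint", b"ama ", b"ep41"]   # three equal-sized source blocks
parity = xor_blocks(data)            # a single "recovery block"

# Pretend block 1 was unreadable: XOR of the survivors and the parity restores it.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered)  # b'ama '
```

A single XOR block can only cover one missing block; Reed-Solomon generalizes this so that N recovery blocks can stand in for any N missing source blocks.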
I will be focusing on the command-line tool, but on Windows QuickPar is decent: I’ve used it, and it lets you choose specific settings that mimic the CLI version. I won’t cover the actual recovery in detail because it’s rather straightforward.
Settings
There are two settings I recommend tuning together: redundancy and block size. Redundancy is a percentage representing the size of the recovery set relative to the size of the original data. It sets an upper bound on repair: the amount of data that can go bad while the dataset remains reconstructible is at most the redundancy level. Because recovery works in whole blocks, any damage inside a block costs that entire block, which is why block size matters just as much.
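It can help to think of redundancy as a block count. The sketch below assumes the number of recovery blocks is simply the redundancy percentage of the source block count, rounded down (par2cmdline’s exact rounding may differ); `recovery_capacity` is a hypothetical helper and the file and block sizes are illustrative.

```python
def recovery_capacity(file_size, block_size, redundancy_pct):
    """Return (source_blocks, recovery_blocks, max_repairable_bytes)."""
    # Ceiling division: a partial last block still counts as a block.
    source_blocks = -(-file_size // block_size)
    recovery_blocks = source_blocks * redundancy_pct // 100
    # Each recovery block can replace any one damaged source block, so the
    # best case is losing exactly recovery_blocks whole blocks of data.
    return source_blocks, recovery_blocks, recovery_blocks * block_size

MiB = 1024 * 1024
print(recovery_capacity(1000 * MiB, 512 * 1024, 10))  # (2000, 200, 104857600)
```

The byte figure is a best case: a single bad sector anywhere in a block uses up a whole recovery block.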
It is up to the user to decide how much recovery data to keep. I tend to stick to between 9% and 12%; anything over 30% is probably unnecessary, and at that point it may be more efficient to simply back up onto two sets of DVDs.
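For the 300GB backup mentioned at the top, the trade-off is easy to put in numbers. This sketch assumes single-layer DVDs holding a rounded 4.7 billion bytes (actual usable capacity varies slightly by format), treats 300GB as decimal gigabytes, and ignores the small PAR2 index files; `disc_counts` is a hypothetical helper.

```python
DVD_BYTES = 4_700_000_000  # assumed single-layer DVD capacity (rounded)

def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)

def disc_counts(data_bytes, redundancy_pct):
    """Return (discs for the data, discs for a PAR2 set at redundancy_pct)."""
    data_discs = ceil_div(data_bytes, DVD_BYTES)
    par2_discs = ceil_div(data_bytes * redundancy_pct, 100 * DVD_BYTES)
    return data_discs, par2_discs

for pct in (10, 30, 100):
    data, par2 = disc_counts(300 * 10**9, pct)
    print(f"{pct:>3}%: {data} data discs + {par2} recovery discs")
```

At 100% the recovery set is as large as a full second copy, so past a certain point burning a second set of discs is the simpler option.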
Perhaps the most crucial setting when generating a PAR2 recovery set for DVD backups is the block size. DVDs store data in 2048-byte sectors (the often-quoted 2352 bytes is the raw sector size of an audio CD, not a DVD), so if a number of sectors on a disc are unreadable, the number of missing bytes will be a multiple of 2048.
With this in mind, it is important to choose a block size that is a multiple of 2048, but what makes a good block size?
Block Size
There is a trade-off in choosing a block size, mainly between speed of computation and tolerance for scattered damage. By scattered damage, I mean the number of non-contiguous locations in the data that may be unrecoverable.
For instance, assume a 1000MB file is missing the last 100MB of data. Regardless of the block size, if the PAR2 set was generated with 10% or greater redundancy, recovery is no problem, because the missing data is contiguous.
Now, assume the same file is instead missing 3,200 separate 32KB segments (100MB in total) spread randomly throughout the file. With a block size of 32,768 bytes (2048 x 16), the file splits into 32,000 blocks and 10% redundancy gives 3,200 recovery blocks, so recovery is guaranteed as long as each damaged segment lines up with a single block. With a larger block size, this may not be the case.
Under the same conditions, assume the PAR2 set was generated with a 512KB block size (2048 x 256), which breaks the 1000MB file into 2,000 blocks. A 10% redundancy means at most 200 blocks are recoverable, but nothing guarantees that 3,200 scattered 32KB segments fall inside only 200 of those 512KB blocks; spread at random, they will almost certainly touch far more.
It’s an extreme case, but hopefully it shows the value of a smaller block size. I do not recommend going as small as 32,768 bytes, because it would take days to compute even on a modern system, but I do recommend anything from 131,072 bytes (2048 x 64, for very high priority backups) to 819,200 bytes (2048 x 400, for lower priority backups).
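A quick simulation shows why scattered damage hurts large blocks. The sketch uses hypothetical numbers: a 1000 MiB file, 10% redundancy, and 100 MiB of damage arriving as 3,200 randomly placed 32 KiB chunks aligned to chunk boundaries. It only counts how many PAR2 source blocks are touched; it does not run PAR2, and `damaged_blocks` and `report` are hypothetical helpers.

```python
import random

KiB = 1024
MiB = 1024 * KiB

def damaged_blocks(file_size, chunk_size, bad_chunks, block_size, seed=0):
    """Count source blocks touched by randomly placed bad chunks.

    Assumes block_size is a multiple of chunk_size, so each bad chunk
    falls entirely inside one block."""
    rng = random.Random(seed)
    chunk_count = file_size // chunk_size
    bad = rng.sample(range(chunk_count), bad_chunks)
    return len({(c * chunk_size) // block_size for c in bad})

def report(file_size, chunk_size, bad_chunks, block_size, redundancy_pct):
    source_blocks = -(-file_size // block_size)            # ceiling division
    recovery_blocks = source_blocks * redundancy_pct // 100
    damaged = damaged_blocks(file_size, chunk_size, bad_chunks, block_size)
    return source_blocks, recovery_blocks, damaged, damaged <= recovery_blocks

if __name__ == "__main__":
    for block in (32 * KiB, 128 * KiB, 512 * KiB):
        src, rec, dmg, ok = report(1000 * MiB, 32 * KiB, 3200, block, 10)
        print(f"{block // KiB:>4} KiB blocks: {src} source, {rec} recovery, "
              f"{dmg} damaged -> {'repairable' if ok else 'NOT repairable'}")
```

With 32 KiB blocks every bad chunk costs exactly one block, so the 3,200 recovery blocks are just enough; with 512 KiB blocks the same damage touches well over a thousand blocks against only 200 recovery blocks.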
Real Example
The other night I was merrily watching Gintama when the disc froze (old DVD+RWs pick up a lot of scratches over time). Here’s what I did:
- dd if=/media/cdrom/Gintama41.avi of=/media/disk/Gintama41.avi conv=noerror,sync
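- The disc had read errors, so I copied what I could with dd; conv=noerror,sync keeps reading past errors and pads unreadable blocks with zeros, so the copy keeps its original size.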
- par2 create -s588000 -r10 Gin41 Gintama41.avi
- On the server, which still had a good copy, I generated a PAR2 set for the episode (588,000-byte blocks, 10% redundancy).
- scp server:~/Gin41.par2 /media/disk/Gin41.par2
- I copied the main .par2 file, which holds the verification data but no recovery blocks, to my machine.
- par2 verify Gin41.par2 Gintama41.avi
- Verifying the file showed that it needed 48 recovery blocks.
- scp server:~/Gin41.vol015* .
- scp server:~/Gin41.vol031* .
- I copied over two of the recovery volumes, which held 16 and 32 blocks: exactly the 48 needed (it just happened to work out that way).
- par2 repair Gin41.par2
- I repaired the file, and fin.
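Here 48 blocks happened to match two volumes exactly. Choosing which volumes to fetch can be sketched as a small search. This assumes a layout where each recovery volume holds twice as many blocks as the one before (1, 2, 4, ...), as the 015 and 031 in the example’s volume names suggest, and a hypothetical total of 63 recovery blocks; `doubling_volumes` and `pick_volumes` are hypothetical helpers, not part of par2.

```python
from itertools import combinations

def doubling_volumes(total_blocks):
    """(first_block, count) per volume: 1, 2, 4, ... blocks, last one truncated."""
    volumes, first, size = [], 0, 1
    while first < total_blocks:
        count = min(size, total_blocks - first)
        volumes.append((first, count))
        first += count
        size *= 2
    return volumes

def pick_volumes(volumes, needed):
    """Return the volumes with the fewest total blocks that still cover `needed`."""
    best = None
    for r in range(1, len(volumes) + 1):
        for combo in combinations(volumes, r):
            total = sum(count for _, count in combo)
            if total >= needed and (best is None or total < best[0]):
                best = (total, list(combo))
    return None if best is None else best[1]

vols = doubling_volumes(63)     # hypothetical: 63 recovery blocks in the set
print(vols)                     # [(0, 1), (1, 2), (3, 4), (7, 8), (15, 16), (31, 32)]
print(pick_volumes(vols, 48))   # [(15, 16), (31, 32)]
```

Brute force is fine because a set only has a handful of volumes; ties go to the first combination found, and a real tool might also prefer fewer files.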
This little process took under 10 minutes. I’ve also tested it in other situations, such as removing an entire episode from a group and seeing whether PAR2 could rebuild the missing episode (it did). PAR2 is quite a viable solution, so be sure to make friends with it! ^_^