Interesting read: btrfs vs ext4 reliability on SSD drives

Not a new article, but there is some interesting stuff to read about CoW filesystems. I'm not sure how far along btrfs has come in overcoming some of the issues pointed out in the PDF study…

Any newer insights into possible progress, or opinions, are welcome…

So what is the bottom line? I hate it when knowledge is transported via videos…


At the end there was a question about RAID, and I think the answer was incomplete.
I asked a similar question on the Synology forum and got a very good answer from one of the users.

Let me share the thread; the user's name is Telos:

If I understood correctly, Telos is saying that a RAID with three or more drives is much more reliable, since it will have parity to check which of the other drives has the correct data.

The post says this about btrfs drives:

“If drive does not report error but returns bad data (bitrot) the scrub will see two conflicting bits of data and not know what to do. By default, the scrub presumes that the data is correct and that the parity is wrong and recompute parity. Practically speaking… this is a coin toss, whether the correct data is preserved, or the bitrotted data.”

Is that really the case? btrfs just assumes the parity, not the data, is wrong and “repairs” the parity, making the data owner believe that everything is repaired and good? That is worrying.

btrfs should instead report an unrecoverable error and let the data owner decide what to do. Most likely the data owner would want to restore the affected file from a backup. But if btrfs just “fixes” the parity, the data owner is lost.
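To make the quoted failure mode concrete, here is a toy model of a scrub on a parity array *without* per-block data checksums, the scenario Telos describes. All names are made up for illustration; this is not btrfs code, just a sketch of why "assume the data is right, recompute parity" can silently preserve bitrot.

```python
def xor_parity(blocks):
    """Compute the XOR parity over equal-length data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

# A healthy 3-block stripe plus its parity.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(stripe)

# One data block silently bitrots: the drive reports no error.
stripe[1] = b"BXBB"

# Scrub sees that data and parity no longer agree...
assert xor_parity(stripe) != parity

# ...and, per the quoted default, assumes the DATA is correct
# and recomputes parity. The corrupted block is now "consistent".
parity = xor_parity(stripe)
assert xor_parity(stripe) == parity  # looks healthy, but the data is bad
```

With parity alone there is no way to tell which side is wrong; only a per-block checksum (as btrfs keeps for data) identifies the corrupt copy.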

That is a good point…
I would like to see more people discussing this to get a better picture.

But the RAID part makes sense: with two drives saying different things, which one should be trusted?
So, at least what I take from this is that three or more drives is the way to make things much more reliable.

Correct me if I’m wrong about the statements below:
In RAID 1 with three drives, two matching copies would win and correct the third drive.
RAID 5 or 6 would have the parity check.

In a RAID 10, both drives have a copy of the data with separate checksums. If one drive has a checksum mismatch, the other does not and thus has the correct data. If both drives have a checksum error, the data cannot be restored.

I didn’t take RAID 10 into account; to be honest, I completely forgot about it.
With RAID 10 you can lose two drives, one from each side of the array, which is good, and you also get speed.

As for error correction, based on what has been written in the previous comments in this topic, I suppose it would have the same reliability as a single RAID 1, or even worse, since you are now splitting files between drives. Or am I wrong about this? I think my head is bugging out right now :thinking:

I think the best option with 4 drives is RAID 5: one disk can fail, and you don’t lose a lot of space, since usable capacity is n-1 drives while RAID 6 is n-2.
RAID 10 is also n-2 with four drives (n/2 in general), but without a parity check, and you get more speed.
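A quick sanity check of those capacity figures (hypothetical helper, ignoring metadata overhead):

```python
def usable_drives(level, n):
    """Usable capacity in whole drives for an n-drive array (toy model)."""
    if level == "raid5":
        return n - 1    # one drive's worth of parity
    if level == "raid6":
        return n - 2    # two drives' worth of parity
    if level == "raid10":
        return n // 2   # every block is stored twice
    raise ValueError(level)

# With 4 drives, RAID 10 and RAID 6 happen to give the same capacity,
# but only because n/2 == n-2 exactly when n == 4.
assert usable_drives("raid5", 4) == 3
assert usable_drives("raid6", 4) == 2
assert usable_drives("raid10", 4) == 2
assert usable_drives("raid10", 6) == 3  # n/2, not n-2, in general
```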

I think a better answer is here:


Here is a lot to read about RAID:

At the end of the day you want redundancy, i.e. RAID 1/5/6. Without redundancy, with btrfs (or ZFS, for that matter) on a single drive, the data cannot be repaired; the filesystem can only tell you that it is corrupt.
