There was recent news about attacks on content-defined chunking (CDC) as used by borgbackup, restic, tarsnap and likely all other backup tools that use CDC. These attacks can extract the chunking parameters, including the randomly chosen chunker secret:
The purpose of the chunker secret is to counteract fingerprinting attacks via the sizes of the produced chunks, so that a repo-side attacker cannot easily determine which known files you have by looking at the chunk sizes in the repository. So, an attacker potentially being able to determine the chunker secret as described in the paper is bad news.
An attacker could e.g. determine which mp3 files of a well-known mp3 collection you have backed up to the repo.
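To illustrate the kind of check a repo-side attacker could run once the chunk sizes of a known file can be predicted, here is a minimal sketch. The function name and all sizes are made up for illustration, and it ignores that real stored sizes also include compression effects and encryption overhead, which an attacker would have to account for.

```python
from collections import Counter

def could_contain(known_chunk_sizes, repo_chunk_sizes):
    """Return True if every chunk size of the known file occurs in the
    repo at least as often as the known file needs it.

    This is only a necessary condition: a match is a hint, not a proof.
    """
    need = Counter(known_chunk_sizes)
    have = Counter(repo_chunk_sizes)
    return all(have[size] >= count for size, count in need.items())

# Hypothetical chunk sizes observed in a repository (bytes).
repo_sizes = [4_194_304, 3_512_877, 1_048_576, 2_731_004, 812_345]

# Hypothetical chunk sizes the attacker computed for two known files.
known_track = [3_512_877, 2_731_004]
other_track = [3_512_877, 999_999]

print(could_contain(known_track, repo_sizes))
print(could_contain(other_track, repo_sizes))
```

For files smaller than the minimum chunk size, each file becomes a single chunk, so the same check works on plain file sizes without knowing any chunker parameters.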
Don’t panic, it’s likely not as bad as it sounds at first:
There might be no easy solution to all size-related fingerprinting problems:
Any ideas about how to improve the situation are welcome, but please consider that any change to how chunking works / works by default would impact the deduplication on existing repos, potentially doubling the amount of repository storage needed (and also having a very slow first-after-change backup).
Switching on obfuscation does not have these negative effects, because the chunk id is computed from the plaintext (before obfuscation is added). It’s not without overhead though and the desired amount of obfuscation depends a lot on the situation, so it’s the users’ choice how much obfuscation overhead they want to add. If you switch on obfuscation, please note that it will only affect NEW chunks written to the repo.
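Why obfuscation does not hurt deduplication can be sketched as follows. This is a simplified illustration, not borg's actual storage format: the keyed hash (HMAC-SHA-256), the zlib compression, the length prefix and the names `chunk_id`, `store_blob` and `load_blob` are all assumptions made for the example.

```python
import hashlib
import hmac
import os
import secrets
import zlib

def chunk_id(id_key: bytes, plaintext: bytes) -> bytes:
    """Chunk id from the plaintext only, so padding cannot change it."""
    return hmac.new(id_key, plaintext, hashlib.sha256).digest()

def store_blob(plaintext: bytes, max_pad: int = 0) -> bytes:
    """Compress, then append 0..max_pad random bytes (toy obfuscation)."""
    compressed = zlib.compress(plaintext)
    pad_len = secrets.randbelow(max_pad + 1)
    # A 4-byte length prefix lets the reader strip the padding again.
    return len(compressed).to_bytes(4, "big") + compressed + os.urandom(pad_len)

def load_blob(blob: bytes) -> bytes:
    """Drop the padding and decompress."""
    n = int.from_bytes(blob[:4], "big")
    return zlib.decompress(blob[4:4 + n])

key = os.urandom(32)
data = b"some chunk content " * 100

plain_blob = store_blob(data)             # no obfuscation
padded_blob = store_blob(data, 4096)      # random padding added

# Same plaintext, same id: an already stored chunk still deduplicates.
print(chunk_id(key, data) == chunk_id(key, load_blob(padded_blob)))
# The stored size no longer reveals the exact compressed size.
print(len(plain_blob), len(padded_blob))
```

Because the id lookup happens before anything is written, chunks that are already in the repo are never rewritten, which is also why only new chunks get obfuscated.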
Changing the buzhash chunker in borg is not easy:
OTOH, changing the “fixed” chunker is easier:
Considering these are stable release series, there won’t be big changes in there. The risk of breaking something seems higher than the fingerprinting risk. Small ideas with little risk and few side effects which improve the security are welcome though.
Users can either live with the fingerprinting risk or use the existing “obfuscate” pseudo compressor (some may already do so; it has existed since 1.2.0).
The first paper suggests that using compression improves the security. borg uses lz4 by default (which is usually faster than no compression). A non-default algorithm with a non-default level might even add a bit more security.
borg2 will be a breaking release and data will need to be transferred to a new repo anyway, so that could be a good time to implement changes in chunking while avoiding the above-mentioned “doubling repo space” issue. It would make “borg transfer” more complex and slower though.
But the question is whether we need changes there or whether it is better to just use the obfuscation.
The first paper seems to suggest using a 64bit buzhash instead of the current 32bit implementation. That only fixes the CDC issue though, not the issue of fingerprinting sets of small files.
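To make the hash-width idea concrete, here is a toy buzhash-style chunker with a configurable width. It is a sketch under assumptions, not borg's chunker: the table is derived from the secret via Python's `random` module, and the window size, cut mask and minimum chunk size are tiny values picked so the example runs quickly.

```python
import random

def make_table(secret: int, bits: int) -> list:
    """256 pseudo-random table entries of the given width (toy derivation)."""
    rng = random.Random(secret)
    return [rng.getrandbits(bits) for _ in range(256)]

def rotl(x: int, r: int, bits: int) -> int:
    """Rotate a bits-wide integer left by r positions."""
    r %= bits
    return ((x << r) | (x >> (bits - r))) & ((1 << bits) - 1)

def chunk_sizes(data: bytes, secret: int, bits: int = 32, window: int = 16,
                mask_bits: int = 6, min_size: int = 32) -> list:
    """Cut where the rolling hash has mask_bits zero low bits."""
    table = make_table(secret, bits)
    cut_mask = (1 << mask_bits) - 1
    sizes, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Rotate the hash and mix in the byte entering the window.
        h = rotl(h, 1, bits) ^ table[byte]
        if i >= window:
            # Remove the byte that just left the window.
            h ^= rotl(table[data[i - window]], window, bits)
        if i + 1 - start >= min_size and (h & cut_mask) == 0:
            sizes.append(i + 1 - start)
            start = i + 1
    if start < len(data):
        sizes.append(len(data) - start)  # trailing partial chunk
    return sizes

rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(5000))
print(chunk_sizes(data, secret=1, bits=32)[:5])
print(chunk_sizes(data, secret=1, bits=64)[:5])
```

Small files that end up as a single chunk get their size from the file itself, no matter how wide the hash is, which is why a wider hash does not help against fingerprinting sets of small files.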
The second paper suggests encrypting the output of the buzhash function with AES. As above, this does not fix fingerprinting of sets of small files.
The second paper also suggests padding the chunks. This probably has a similar effect to the already existing “obfuscate” functionality.