
Merge pull request #8389 from ThomasWaldmann/files-cache-from-archive

files cache improvements
TW, 8 months ago
parent commit
b862f2b95f

+ 12 - 33
docs/faq.rst

@@ -837,50 +837,29 @@ already used.
 By default, ctime (change time) is used for the timestamps to have a rather
 safe change detection (see also the --files-cache option).
 
-Furthermore, pathnames recorded in files cache are always absolute, even if you
-specify source directories with relative pathname. If relative pathnames are
-stable, but absolute are not (for example if you mount a filesystem without
-stable mount points for each backup or if you are running the backup from a
-filesystem snapshot whose name is not stable), borg will assume that files are
-different and will report them as 'added', even though no new chunks will be
-actually recorded for them. To avoid this, you could bind mount your source
-directory in a directory with the stable path.
+Furthermore, pathnames used as keys into the files cache are **as archived**,
+so make sure these are always the same (see ``borg list``).
 
 .. _always_chunking:
 
 It always chunks all my files, even unchanged ones!
 ---------------------------------------------------
 
-Borg maintains a files cache where it remembers the timestamp, size and
+Borg maintains a files cache where it remembers the timestamps, size and
 inode of files. When Borg does a new backup and starts processing a
 file, it first looks whether the file has changed (compared to the values
 stored in the files cache). If the values are the same, the file is assumed
 unchanged and thus its contents won't get chunked (again).
 
-Borg can't keep an infinite history of files of course, thus entries
-in the files cache have a "maximum time to live" which is set via the
-environment variable BORG_FILES_CACHE_TTL (and defaults to 20).
-Every time you do a backup (on the same machine, using the same user), the
-cache entries' ttl values of files that were not "seen" are incremented by 1
-and if they reach BORG_FILES_CACHE_TTL, the entry is removed from the cache.
-
-So, for example, if you do daily backups of 26 different data sets A, B,
-C, ..., Z on one machine (using the default TTL), the files from A will be
-already forgotten when you repeat the same backups on the next day and it
-will be slow because it would chunk all the files each time. If you set
-BORG_FILES_CACHE_TTL to at least 26 (or maybe even a small multiple of that),
-it would be much faster.
-
-Besides using a higher BORG_FILES_CACHE_TTL (which also increases memory usage),
-there is also BORG_FILES_CACHE_SUFFIX which can be used to have separate (smaller)
-files caches for each backup set instead of the default one (big) unified files cache.
-
-Another possible reason is that files don't always have the same path, for
-example if you mount a filesystem without stable mount points for each backup
-or if you are running the backup from a filesystem snapshot whose name is not
-stable. If the directory where you mount a filesystem is different every time,
-Borg assumes they are different files. This is true even if you back up these
-files with relative pathnames - borg uses full pathnames in files cache regardless.
+The files cache is stored separately (using a different filename suffix) per
+archive series, so always using the same name for the archive is strongly
+recommended. The "rebuild files cache from previous archive in repo" feature
+also depends on that.
+Alternatively, there is also BORG_FILES_CACHE_SUFFIX which can be used to
+manually set a custom suffix (if you can't just use the same archive name).
+
+Another possible reason is that files don't always have the same path -
+borg keys the files cache by the path as stored in the archive (see ``borg list``).
 
 It is possible for some filesystems, such as ``mergerfs`` or network filesystems,
 to return inconsistent inode numbers across runs, causing borg to consider them changed.
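
The rewritten FAQ entry boils down to a simple lookup: the files cache key is derived from the path exactly as it will be stored in the archive, and the cached stat values decide whether chunking can be skipped. Here is a minimal sketch of that decision, with hypothetical stand-ins (`id_hash`, `FILES_CACHE`) for borg's internals; borg's real id_hash is keyed, a plain SHA-256 is used only to keep the sketch self-contained:

```python
import os
from hashlib import sha256

# Hypothetical stand-ins for borg internals.
FILES_CACHE = {}  # path_hash -> (inode, size, ctime_ns, mtime_ns, chunks)

def id_hash(data: bytes) -> bytes:
    # borg uses a keyed hash here; plain SHA-256 keeps the sketch self-contained.
    return sha256(data).digest()

def cached_chunks_if_unchanged(archived_path: str, st: os.stat_result):
    """Return the cached chunk list if the file looks unchanged, else None."""
    path_hash = id_hash(archived_path.encode())  # path as stored in the archive
    entry = FILES_CACHE.get(path_hash)
    if entry is None:
        return None  # never seen under this (archived) path -> status 'A'
    inode, size, ctime_ns, mtime_ns, chunks = entry
    if size != st.st_size or inode != st.st_ino or ctime_ns != st.st_ctime_ns:
        return None  # stat values differ -> read and chunk the file again
    return chunks  # unchanged: reuse the cached chunk list, skip chunking
```

Because the key is the archived path, backing up the same data from differently named snapshot mount points produces different keys and therefore 'A' statuses, which is exactly what the rewritten paragraph warns about.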

+ 8 - 6
docs/internals/data-structures.rst

@@ -474,18 +474,20 @@ guess what files you have based on a specific set of chunk sizes).
 The cache
 ---------
 
-The **files cache** is stored in ``cache/files`` and is used at backup time to
-quickly determine whether a given file is unchanged and we have all its chunks.
+The **files cache** is stored in ``cache/files.<SUFFIX>`` and is used at backup
+time to quickly determine whether a given file is unchanged and we have all its
+chunks.
 
 In memory, the files cache is a key -> value mapping (a Python *dict*) and contains:
 
-* key: id_hash of the encoded, absolute file path
+* key: id_hash of the encoded path (same path as seen in archive)
 * value:
 
+  - age (0 [newest], ..., BORG_FILES_CACHE_TTL - 1)
   - file inode number
   - file size
-  - file ctime_ns (or mtime_ns)
-  - age (0 [newest], 1, 2, 3, ..., BORG_FILES_CACHE_TTL - 1)
+  - file ctime_ns
+  - file mtime_ns
   - list of chunk (id, size) tuples representing the file's contents
 
 To determine whether a file has not changed, cached values are looked up via
@@ -514,7 +516,7 @@ be told to ignore the inode number in the check via --files-cache.
 The age value is used for cache management. If a file is "seen" in a backup
 run, its age is reset to 0, otherwise its age is incremented by one.
 If a file was not seen in BORG_FILES_CACHE_TTL backups, its cache entry is
-removed. See also: :ref:`always_chunking` and :ref:`a_status_oddity`
+removed.
 
 The files cache is a python dictionary, storing python objects, which
 generates a lot of overhead.
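
For illustration, one in-memory entry under the layout just described (the `FileCacheEntry` namedtuple is the one from the ``src/borg/cache.py`` hunk further down; the values are made up, and borg keeps each entry msgpack-packed at roughly 240 bytes per file, with timestamps stored via its msgpack timestamp helpers rather than as raw integers):

```python
from collections import namedtuple

# Same field layout as documented above / defined in src/borg/cache.py.
FileCacheEntry = namedtuple("FileCacheEntry", "age inode size ctime mtime chunks")

entry = FileCacheEntry(
    age=0,                              # 0 = seen in the newest backup run
    inode=1234567,
    size=8192,
    ctime=1_700_000_000_000_000_000,    # nanosecond timestamps (illustrative)
    mtime=1_700_000_000_000_000_000,
    chunks=[(b"\x01" * 32, 4096), (b"\x02" * 32, 4096)],  # (chunk id, size) tuples
)
# key in the files cache dict: id_hash(encoded path as seen in the archive)
```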

+ 1 - 11
docs/usage/general/environment.rst.inc

@@ -66,8 +66,7 @@ General:
         cache entries for backup sources other than the current sources.
     BORG_FILES_CACHE_TTL
         When set to a numeric value, this determines the maximum "time to live" for the files cache
-        entries (default: 20). The files cache is used to determine quickly whether a file is unchanged.
-        The FAQ explains this more detailed in: :ref:`always_chunking`
+        entries (default: 2). The files cache is used to determine quickly whether a file is unchanged.
     BORG_USE_CHUNKS_ARCHIVE
         When set to no (default: yes), the ``chunks.archive.d`` folder will not be used. This reduces
         disk space usage but slows down cache resyncs.
@@ -85,15 +84,6 @@ General:
         - ``pyfuse3``: only try to load pyfuse3
         - ``llfuse``: only try to load llfuse
         - ``none``: do not try to load an implementation
-    BORG_CACHE_IMPL
-        Choose the implementation for the clientside cache, choose one of:
-
-        - ``adhoc``: builds a non-persistent chunks cache by querying the repo. Chunks cache contents
-          are somewhat sloppy for already existing chunks, concerning their refcount ("infinite") and
-          size (0). No files cache (slow, will chunk all input files). DEPRECATED.
-        - ``adhocwithfiles``: Like ``adhoc``, but with a persistent files cache. Default implementation.
-        - ``cli``: Determine the cache implementation from cli options. Without special options, will
-          usually end up with the ``local`` implementation.
     BORG_SELFTEST
         This can be used to influence borg's builtin self-tests. The default is to execute the tests
         at the beginning of each borg command invocation.
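
The TTL default dropping from 20 to 2 goes hand in hand with the per-series files caches introduced in this PR: a cache file is only ever touched by backups of one series, so its entries are normally seen on every run and rarely need to survive being unseen. The aging rule described in ``data-structures.rst`` above can be sketched as follows (a simplification; borg applies aging while reading and writing the msgpack-packed cache file):

```python
import os

def expire_unseen(files: dict, seen_in_this_run: set) -> dict:
    """Age files cache entries after a backup run.

    files maps path_hash -> (age, *stat_and_chunk_info); entries seen in this
    run are reset to age 0, unseen entries age by one and are dropped once
    their age reaches BORG_FILES_CACHE_TTL (default 2).
    """
    ttl = int(os.environ.get("BORG_FILES_CACHE_TTL", 2))
    kept = {}
    for path_hash, (age, *rest) in files.items():
        age = 0 if path_hash in seen_in_this_run else age + 1
        if age < ttl:
            kept[path_hash] = (age, *rest)
    return kept
```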

+ 18 - 5
src/borg/archive.py

@@ -1345,7 +1345,7 @@ class FilesystemObjectProcessors:
                         item.chunks.append(chunk_entry)
                 else:  # normal case, no "2nd+" hardlink
                     if not is_special_file:
-                        hashed_path = safe_encode(os.path.join(self.cwd, path))
+                        hashed_path = safe_encode(item.path)  # path as in archive item!
                         started_hashing = time.monotonic()
                         path_hash = self.key.id_hash(hashed_path)
                         self.stats.hashing_time += time.monotonic() - started_hashing
@@ -1376,6 +1376,7 @@ class FilesystemObjectProcessors:
                     # Only chunkify the file if needed
                     changed_while_backup = False
                     if "chunks" not in item:
+                        start_reading = time.time_ns()
                         with backup_io("read"):
                             self.process_file_chunks(
                                 item,
@@ -1385,13 +1386,25 @@ class FilesystemObjectProcessors:
                                 backup_io_iter(self.chunker.chunkify(None, fd)),
                             )
                             self.stats.chunking_time = self.chunker.chunking_time
+                        end_reading = time.time_ns()
                         if not is_win32:  # TODO for win32
                             with backup_io("fstat2"):
                                 st2 = os.fstat(fd)
-                            # special files:
-                            # - fifos change naturally, because they are fed from the other side. no problem.
-                            # - blk/chr devices don't change ctime anyway.
-                            changed_while_backup = not is_special_file and st.st_ctime_ns != st2.st_ctime_ns
+                            if is_special_file:
+                                # special files:
+                                # - fifos change naturally, because they are fed from the other side. no problem.
+                                # - blk/chr devices don't change ctime anyway.
+                                pass
+                            elif st.st_ctime_ns != st2.st_ctime_ns:
+                                # ctime was changed, this is either a metadata or a data change.
+                                changed_while_backup = True
+                            elif start_reading - TIME_DIFFERS1_NS < st2.st_ctime_ns < end_reading + TIME_DIFFERS1_NS:
+                                # this is to treat a very special race condition, see #3536.
+                                # - file was changed right before st.ctime was determined.
+                                # - then, shortly afterwards, but already while we read the file, the
+                                #   file was changed again, but st2.ctime is the same due to ctime granularity.
+                                # when comparing file ctime to local clock, widen interval by TIME_DIFFERS1_NS.
+                                changed_while_backup = True
                         if changed_while_backup:
                             # regular file changed while we backed it up, might be inconsistent/corrupt!
                             if last_try:
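
The reworked check reads as: the file definitely changed if its ctime moved between the two ``fstat`` calls, and it is additionally treated as changed if its ctime falls inside the (slightly widened) window during which we were reading it, to cover the #3536 race. A standalone sketch of the same logic; TIME_DIFFERS1_NS is the 20 ms tolerance added in the ``constants.py`` hunk below:

```python
TIME_DIFFERS1_NS = 20_000_000  # 20 ms: clock skew / ctime granularity tolerance

def changed_while_backup(st_before, st_after, start_reading_ns, end_reading_ns,
                         is_special_file=False) -> bool:
    """st_before/st_after: os.stat_result taken before and after reading the
    file; start/end_reading_ns bracket the read (time.time_ns())."""
    if is_special_file:
        # fifos change naturally (fed from the other side),
        # blk/chr devices don't change ctime anyway
        return False
    if st_before.st_ctime_ns != st_after.st_ctime_ns:
        return True  # metadata or data change while we backed the file up
    # #3536 race: changed right before the first stat and again while being
    # read, with both changes landing in the same ctime granule.
    return (start_reading_ns - TIME_DIFFERS1_NS
            < st_after.st_ctime_ns
            < end_reading_ns + TIME_DIFFERS1_NS)
```

Files flagged this way are retried as before; the new part is only the widened time-window test.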

+ 1 - 0
src/borg/archiver/__init__.py

@@ -125,6 +125,7 @@ class Archiver(
     def __init__(self, lock_wait=None, prog=None):
         self.lock_wait = lock_wait
         self.prog = prog
+        self.start_backup = None
 
     def print_warning(self, msg, *args, **kw):
         warning_code = kw.get("wc", EXIT_WARNING)  # note: wc=None can be used to not influence exit code

+ 3 - 4
src/borg/archiver/_common.py

@@ -161,14 +161,14 @@ def with_repository(
                     if "compression" in args:
                         manifest_.repo_objs.compressor = args.compression.compressor
                     if secure:
-                        assert_secure(repository, manifest_, self.lock_wait)
+                        assert_secure(repository, manifest_)
                 if cache:
                     with Cache(
                         repository,
                         manifest_,
                         progress=getattr(args, "progress", False),
-                        lock_wait=self.lock_wait,
                         cache_mode=getattr(args, "files_cache_mode", FILES_CACHE_MODE_DISABLED),
+                        start_backup=getattr(self, "start_backup", None),
                         iec=getattr(args, "iec", False),
                     ) as cache_:
                         return method(self, args, repository=repository, cache=cache_, **kwargs)
@@ -230,7 +230,7 @@ def with_other_repository(manifest=False, cache=False, compatibility=None):
                     manifest_ = Manifest.load(
                         repository, compatibility, ro_cls=RepoObj if repository.version > 1 else RepoObj1
                     )
-                    assert_secure(repository, manifest_, self.lock_wait)
+                    assert_secure(repository, manifest_)
                     if manifest:
                         kwargs["other_manifest"] = manifest_
                 if cache:
@@ -238,7 +238,6 @@ def with_other_repository(manifest=False, cache=False, compatibility=None):
                         repository,
                         manifest_,
                         progress=False,
-                        lock_wait=self.lock_wait,
                         cache_mode=getattr(args, "files_cache_mode", FILES_CACHE_MODE_DISABLED),
                         iec=getattr(args, "iec", False),
                     ) as cache_:

+ 2 - 8
src/borg/archiver/create_cmd.py

@@ -214,6 +214,7 @@ class CreateMixIn:
         self.noxattrs = args.noxattrs
         self.exclude_nodump = args.exclude_nodump
         dry_run = args.dry_run
+        self.start_backup = time.time_ns()
         t0 = archive_ts_now()
         t0_monotonic = time.monotonic()
         logger.info('Creating archive at "%s"' % args.location.processed)
@@ -222,10 +223,9 @@ class CreateMixIn:
                 repository,
                 manifest,
                 progress=args.progress,
-                lock_wait=self.lock_wait,
-                prefer_adhoc_cache=args.prefer_adhoc_cache,
                 cache_mode=args.files_cache_mode,
                 iec=args.iec,
+                archive_name=args.name,
             ) as cache:
                 archive = Archive(
                     manifest,
@@ -787,12 +787,6 @@ class CreateMixIn:
             help="only display items with the given status characters (see description)",
         )
         subparser.add_argument("--json", action="store_true", help="output stats as JSON. Implies ``--stats``.")
-        subparser.add_argument(
-            "--prefer-adhoc-cache",
-            dest="prefer_adhoc_cache",
-            action="store_true",
-            help="experimental: prefer AdHocCache (w/o files cache) over AdHocWithFilesCache (with files cache).",
-        )
         subparser.add_argument(
             "--stdin-name",
             metavar="NAME",
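
``start_backup`` is captured once per ``borg create`` run, handed through ``with_repository`` (see the ``_common.py`` hunk above) into the cache, and finally used in ``_write_files_cache`` to decide which freshly seen entries are safe to persist. The keep/discard rule from the ``cache.py`` hunk below, written out on its own; TIME_DIFFERS2_NS is the 3 s tolerance from ``constants.py``:

```python
TIME_DIFFERS2_NS = 3_000_000_000  # 3 s: coarser granularity / clock differences

def keep_entry(age, ctime_ns, mtime_ns, newest_cmtime_ns, start_backup_ns, ttl=2):
    """Decide whether one files cache entry is written back to disk.

    Entries seen in this run (age == 0) are only persisted if their timestamps
    are comfortably older than anything touched during the backup; entries
    from earlier runs survive until they reach the TTL.
    """
    max_time_ns = 2**63 - 1  # effectively "no limit" if a bound was never set
    newest = newest_cmtime_ns if newest_cmtime_ns is not None else max_time_ns
    started = (start_backup_ns - TIME_DIFFERS2_NS
               if start_backup_ns is not None else max_time_ns)
    discard_after = min(newest, started)
    if age == 0:
        return max(ctime_ns, mtime_ns) < discard_after
    return age < ttl
```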

+ 1 - 1
src/borg/archiver/list_cmd.py

@@ -37,7 +37,7 @@ class ListMixIn:
 
         # Only load the cache if it will be used
         if ItemFormatter.format_needs_cache(format):
-            with Cache(repository, manifest, lock_wait=self.lock_wait) as cache:
+            with Cache(repository, manifest) as cache:
                 _list_inner(cache)
         else:
             _list_inner(cache=None)

+ 1 - 1
src/borg/archiver/prune_cmd.py

@@ -111,7 +111,7 @@ class PruneMixIn:
                 keep += prune_split(archives, rule, num, kept_because)
 
         to_delete = set(archives) - set(keep)
-        with Cache(repository, manifest, lock_wait=self.lock_wait, iec=args.iec) as cache:
+        with Cache(repository, manifest, iec=args.iec) as cache:
             list_logger = logging.getLogger("borg.output.list")
             # set up counters for the progress display
             to_delete_len = len(to_delete)

+ 174 - 165
src/borg/cache.py

@@ -3,6 +3,7 @@ import os
 import shutil
 import stat
 from collections import namedtuple
+from datetime import datetime, timezone
 from time import perf_counter
 
 from .logger import create_logger
@@ -11,12 +12,13 @@ logger = create_logger()
 
 files_cache_logger = create_logger("borg.debug.files_cache")
 
-from .constants import CACHE_README, FILES_CACHE_MODE_DISABLED, ROBJ_FILE_STREAM
+from .constants import CACHE_README, FILES_CACHE_MODE_DISABLED, ROBJ_FILE_STREAM, TIME_DIFFERS2_NS
+from .checksums import xxh64
 from .hashindex import ChunkIndex, ChunkIndexEntry
 from .helpers import Error
 from .helpers import get_cache_dir, get_security_dir
-from .helpers import hex_to_bin, parse_stringified_list
-from .helpers import format_file_size
+from .helpers import hex_to_bin, bin_to_hex, parse_stringified_list
+from .helpers import format_file_size, safe_encode
 from .helpers import safe_ns
 from .helpers import yes
 from .helpers import ProgressIndicatorMessage
@@ -25,14 +27,13 @@ from .helpers.msgpack import int_to_timestamp, timestamp_to_int
 from .item import ChunkListEntry
 from .crypto.key import PlaintextKey
 from .crypto.file_integrity import IntegrityCheckedFile, FileIntegrityError
-from .fslocking import Lock
 from .manifest import Manifest
 from .platform import SaveFile
 from .remote import RemoteRepository
 from .repository import LIST_SCAN_LIMIT, Repository
 
-# note: cmtime might be either a ctime or a mtime timestamp, chunks is a list of ChunkListEntry
-FileCacheEntry = namedtuple("FileCacheEntry", "age inode size cmtime chunks")
+# chunks is a list of ChunkListEntry
+FileCacheEntry = namedtuple("FileCacheEntry", "age inode size ctime mtime chunks")
 
 
 class SecurityManager:
@@ -154,7 +155,7 @@ class SecurityManager:
         if self.known() and not self.key_matches(key):
             raise Cache.EncryptionMethodMismatch()
 
-    def assert_secure(self, manifest, key, *, warn_if_unencrypted=True, lock_wait=None):
+    def assert_secure(self, manifest, key, *, warn_if_unencrypted=True):
         # warn_if_unencrypted=False is only used for initializing a new repository.
         # Thus, avoiding asking about a repository that's currently initializing.
         self.assert_access_unknown(warn_if_unencrypted, manifest, key)
@@ -194,9 +195,9 @@ class SecurityManager:
                 raise Cache.CacheInitAbortedError()
 
 
-def assert_secure(repository, manifest, lock_wait):
+def assert_secure(repository, manifest):
     sm = SecurityManager(repository)
-    sm.assert_secure(manifest, manifest.key, lock_wait=lock_wait)
+    sm.assert_secure(manifest, manifest.key)
 
 
 def cache_dir(repository, path=None):
@@ -204,13 +205,11 @@ def cache_dir(repository, path=None):
 
 
 class CacheConfig:
-    def __init__(self, repository, path=None, lock_wait=None):
+    def __init__(self, repository, path=None):
         self.repository = repository
         self.path = cache_dir(repository, path)
         logger.debug("Using %s as cache", self.path)
         self.config_path = os.path.join(self.path, "config")
-        self.lock = None
-        self.lock_wait = lock_wait
 
     def __enter__(self):
         self.open()
@@ -235,7 +234,6 @@ class CacheConfig:
             config.write(fd)
 
     def open(self):
-        self.lock = Lock(os.path.join(self.path, "lock"), exclusive=True, timeout=self.lock_wait).acquire()
         self.load()
 
     def load(self):
@@ -278,9 +276,7 @@ class CacheConfig:
             self._config.write(fd)
 
     def close(self):
-        if self.lock is not None:
-            self.lock.release()
-            self.lock = None
+        pass
 
     def _check_upgrade(self, config_path):
         try:
@@ -296,10 +292,6 @@ class CacheConfig:
             raise Exception("%s does not look like a Borg cache." % config_path) from None
 
 
-def get_cache_impl():
-    return os.environ.get("BORG_CACHE_IMPL", "adhocwithfiles")
-
-
 class Cache:
     """Client Side cache"""
 
@@ -330,8 +322,7 @@ class Cache:
 
     @staticmethod
     def break_lock(repository, path=None):
-        path = cache_dir(repository, path)
-        Lock(os.path.join(path, "lock"), exclusive=True).break_lock()
+        pass
 
     @staticmethod
     def destroy(repository, path=None):
@@ -350,71 +341,120 @@ class Cache:
         sync=True,
         warn_if_unencrypted=True,
         progress=False,
-        lock_wait=None,
-        prefer_adhoc_cache=False,
         cache_mode=FILES_CACHE_MODE_DISABLED,
         iec=False,
+        archive_name=None,
+        start_backup=None,
     ):
-        def adhocwithfiles():
-            return AdHocWithFilesCache(
-                manifest=manifest,
-                path=path,
-                warn_if_unencrypted=warn_if_unencrypted,
-                progress=progress,
-                iec=iec,
-                lock_wait=lock_wait,
-                cache_mode=cache_mode,
-            )
-
-        def adhoc():
-            return AdHocCache(manifest=manifest, lock_wait=lock_wait, iec=iec)
-
-        impl = get_cache_impl()
-        if impl != "cli":
-            methods = dict(adhocwithfiles=adhocwithfiles, adhoc=adhoc)
-            try:
-                method = methods[impl]
-            except KeyError:
-                raise RuntimeError("Unknown BORG_CACHE_IMPL value: %s" % impl)
-            return method()
-
-        return adhoc() if prefer_adhoc_cache else adhocwithfiles()
+        return AdHocWithFilesCache(
+            manifest=manifest,
+            path=path,
+            warn_if_unencrypted=warn_if_unencrypted,
+            progress=progress,
+            iec=iec,
+            cache_mode=cache_mode,
+            archive_name=archive_name,
+            start_backup=start_backup,
+        )
 
 
 class FilesCacheMixin:
     """
-    Massively accelerate processing of unchanged files by caching their chunks list.
-    With that, we can avoid having to read and chunk them to get their chunks list.
+    Massively accelerate processing of unchanged files.
+    We read the "files cache" (either from cache directory or from previous archive
+    in repo) that has metadata for all "already stored" files, like size, ctime/mtime,
+    inode number and chunks id/size list.
+    When finding a file on disk, we use the metadata to determine if the file is unchanged.
+    If so, we use the cached chunks list and skip reading/chunking the file contents.
     """
 
     FILES_CACHE_NAME = "files"
 
-    def __init__(self, cache_mode):
+    def __init__(self, cache_mode, archive_name=None, start_backup=None):
+        self.archive_name = archive_name  # ideally a SERIES name
+        assert not ("c" in cache_mode and "m" in cache_mode)
+        assert "d" in cache_mode or "c" in cache_mode or "m" in cache_mode
         self.cache_mode = cache_mode
         self._files = None
-        self._newest_cmtime = None
+        self._newest_cmtime = 0
+        self._newest_path_hashes = set()
+        self.start_backup = start_backup
 
     @property
     def files(self):
         if self._files is None:
-            self._files = self._read_files_cache()
+            self._files = self._read_files_cache()  # try loading from cache dir
+        if self._files is None:
+            self._files = self._build_files_cache()  # try loading from repository
+        if self._files is None:
+            self._files = {}  # start from scratch
         return self._files
 
+    def _build_files_cache(self):
+        """rebuild the files cache by reading previous archive from repository"""
+        if "d" in self.cache_mode:  # d(isabled)
+            return
+
+        if not self.archive_name:
+            return
+
+        from .archive import Archive
+
+        # get the latest archive with the IDENTICAL name, supporting archive series:
+        archives = self.manifest.archives.list(match=self.archive_name, sort_by=["ts"], last=1)
+        if not archives:
+            # nothing found
+            return
+        prev_archive = archives[0]
+
+        files = {}
+        logger.debug(
+            f"Building files cache from {prev_archive.name} {prev_archive.ts} {bin_to_hex(prev_archive.id)} ..."
+        )
+        files_cache_logger.debug("FILES-CACHE-BUILD: starting...")
+        archive = Archive(self.manifest, prev_archive.id)
+        for item in archive.iter_items(preload=False):
+            # only put regular files' infos into the files cache:
+            if stat.S_ISREG(item.mode):
+                path_hash = self.key.id_hash(safe_encode(item.path))
+                # keep track of the key(s) for the most recent timestamp(s):
+                ctime_ns = item.ctime
+                if ctime_ns > self._newest_cmtime:
+                    self._newest_cmtime = ctime_ns
+                    self._newest_path_hashes = {path_hash}
+                elif ctime_ns == self._newest_cmtime:
+                    self._newest_path_hashes.add(path_hash)
+                mtime_ns = item.mtime
+                if mtime_ns > self._newest_cmtime:
+                    self._newest_cmtime = mtime_ns
+                    self._newest_path_hashes = {path_hash}
+                elif mtime_ns == self._newest_cmtime:
+                    self._newest_path_hashes.add(path_hash)
+                # add the file to the in-memory files cache
+                entry = FileCacheEntry(
+                    age=0, inode=item.get("inode", 0), size=item.size,
+                    ctime=int_to_timestamp(ctime_ns), mtime=int_to_timestamp(mtime_ns), chunks=item.chunks,
+                )
+                files[path_hash] = msgpack.packb(entry)  # takes about 240 Bytes per file
+        # deal with special snapshot / timestamp granularity case, see FAQ:
+        for path_hash in self._newest_path_hashes:
+            del files[path_hash]
+        files_cache_logger.debug("FILES-CACHE-BUILD: finished, %d entries loaded.", len(files))
+        return files
+
     def files_cache_name(self):
         suffix = os.environ.get("BORG_FILES_CACHE_SUFFIX", "")
-        return self.FILES_CACHE_NAME + "." + suffix if suffix else self.FILES_CACHE_NAME
-
-    def discover_files_cache_name(self, path):
-        return [
-            fn for fn in os.listdir(path) if fn == self.FILES_CACHE_NAME or fn.startswith(self.FILES_CACHE_NAME + ".")
-        ][0]
+        # when using archive series, we automatically make up a separate cache file per series.
+        # when not, the user may manually do that by using the env var.
+        if not suffix:
+            # avoid issues with too complex or long archive_name by hashing it:
+            suffix = bin_to_hex(xxh64(self.archive_name.encode()))
+        return self.FILES_CACHE_NAME + "." + suffix
 
-    def _create_empty_files_cache(self, path):
-        with IntegrityCheckedFile(path=os.path.join(path, self.files_cache_name()), write=True) as fd:
-            pass  # empty file
-        return fd.integrity_data
+    def discover_files_cache_names(self, path):
+        return [fn for fn in os.listdir(path) if fn.startswith(self.FILES_CACHE_NAME + ".")]
 
     def _read_files_cache(self):
+        """read files cache from cache directory"""
         if "d" in self.cache_mode:  # d(isabled)
             return
 
@@ -447,39 +487,50 @@ class FilesCacheMixin:
         except FileIntegrityError as fie:
             msg = "The files cache is corrupted. [%s]" % str(fie)
         if msg is not None:
-            logger.warning(msg)
-            logger.warning("Continuing without files cache - expect lower performance.")
-            files = {}
-        files_cache_logger.debug("FILES-CACHE-LOAD: finished, %d entries loaded.", len(files))
+            logger.debug(msg)
+            files = None
+        files_cache_logger.debug("FILES-CACHE-LOAD: finished, %d entries loaded.", len(files or {}))
         return files
 
     def _write_files_cache(self, files):
-        if self._newest_cmtime is None:
-            # was never set because no files were modified/added
-            self._newest_cmtime = 2**63 - 1  # nanoseconds, good until y2262
-        ttl = int(os.environ.get("BORG_FILES_CACHE_TTL", 20))
+        """write files cache to cache directory"""
+        max_time_ns = 2**63 - 1  # nanoseconds, good until y2262
+        # self._newest_cmtime might be None if it was never set because no files were modified/added.
+        newest_cmtime = self._newest_cmtime if self._newest_cmtime is not None else max_time_ns
+        start_backup_time = self.start_backup - TIME_DIFFERS2_NS if self.start_backup is not None else max_time_ns
+        # we don't want to persist files cache entries of potentially problematic files:
+        discard_after = min(newest_cmtime, start_backup_time)
+        ttl = int(os.environ.get("BORG_FILES_CACHE_TTL", 2))
         files_cache_logger.debug("FILES-CACHE-SAVE: starting...")
         # TODO: use something like SaveFile here, but that didn't work due to SyncFile missing .seek().
         with IntegrityCheckedFile(path=os.path.join(self.path, self.files_cache_name()), write=True) as fd:
-            entry_count = 0
+            entries = 0
+            age_discarded = 0
+            race_discarded = 0
             for path_hash, item in files.items():
-                # Only keep files seen in this backup that are older than newest cmtime seen in this backup -
-                # this is to avoid issues with filesystem snapshots and cmtime granularity.
-                # Also keep files from older backups that have not reached BORG_FILES_CACHE_TTL yet.
                 entry = FileCacheEntry(*msgpack.unpackb(item))
-                if (
-                    entry.age == 0
-                    and timestamp_to_int(entry.cmtime) < self._newest_cmtime
-                    or entry.age > 0
-                    and entry.age < ttl
-                ):
+                if entry.age == 0:  # current entries
+                    if max(timestamp_to_int(entry.ctime), timestamp_to_int(entry.mtime)) < discard_after:
+                        # Only keep files seen in this backup that are old enough not to suffer race conditions relating
+                        # to filesystem snapshots and ctime/mtime granularity or being modified while we read them.
+                        keep = True
+                    else:
+                        keep = False
+                        race_discarded += 1
+                else:  # old entries
+                    if entry.age < ttl:
+                        # Also keep files from older backups that have not reached BORG_FILES_CACHE_TTL yet.
+                        keep = True
+                    else:
+                        keep = False
+                        age_discarded += 1
+                if keep:
                     msgpack.pack((path_hash, entry), fd)
-                    entry_count += 1
-        files_cache_logger.debug("FILES-CACHE-KILL: removed all old entries with age >= TTL [%d]", ttl)
-        files_cache_logger.debug(
-            "FILES-CACHE-KILL: removed all current entries with newest cmtime %d", self._newest_cmtime
-        )
-        files_cache_logger.debug("FILES-CACHE-SAVE: finished, %d remaining entries saved.", entry_count)
+                    entries += 1
+        files_cache_logger.debug(f"FILES-CACHE-KILL: removed {age_discarded} entries with age >= TTL [{ttl}]")
+        t_str = datetime.fromtimestamp(discard_after / 1e9, timezone.utc).isoformat()
+        files_cache_logger.debug(f"FILES-CACHE-KILL: removed {race_discarded} entries with ctime/mtime >= {t_str}")
+        files_cache_logger.debug(f"FILES-CACHE-SAVE: finished, {entries} remaining entries saved.")
         return fd.integrity_data
 
     def file_known_and_unchanged(self, hashed_path, path_hash, st):
@@ -517,51 +568,48 @@ class FilesCacheMixin:
         if "i" in cache_mode and entry.inode != st.st_ino:
             files_cache_logger.debug("KNOWN-CHANGED: file inode number has changed: %r", hashed_path)
             return True, None
-        if "c" in cache_mode and timestamp_to_int(entry.cmtime) != st.st_ctime_ns:
+        ctime = int_to_timestamp(safe_ns(st.st_ctime_ns))
+        if "c" in cache_mode and entry.ctime != ctime:
             files_cache_logger.debug("KNOWN-CHANGED: file ctime has changed: %r", hashed_path)
             return True, None
-        elif "m" in cache_mode and timestamp_to_int(entry.cmtime) != st.st_mtime_ns:
+        mtime = int_to_timestamp(safe_ns(st.st_mtime_ns))
+        if "m" in cache_mode and entry.mtime != mtime:
             files_cache_logger.debug("KNOWN-CHANGED: file mtime has changed: %r", hashed_path)
             return True, None
-        # we ignored the inode number in the comparison above or it is still same.
+        # V = any of the inode number, mtime, ctime values.
+        # we ignored V in the comparison above or it is still the same value.
         # if it is still the same, replacing it in the tuple doesn't change it.
-        # if we ignored it, a reason for doing that is that files were moved to a new
-        # disk / new fs (so a one-time change of inode number is expected) and we wanted
-        # to avoid everything getting chunked again. to be able to re-enable the inode
-        # number comparison in a future backup run (and avoid chunking everything
-        # again at that time), we need to update the inode number in the cache with what
-        # we see in the filesystem.
-        self.files[path_hash] = msgpack.packb(entry._replace(inode=st.st_ino, age=0))
+        # if we ignored it, a reason for doing that is that files were moved/copied to
+        # a new disk / new fs (so a one-time change of V is expected) and we wanted
+        # to avoid everything getting chunked again. to be able to re-enable the
+        # V comparison in a future backup run (and avoid chunking everything again at
+        # that time), we need to update V in the cache with what we see in the filesystem.
+        self.files[path_hash] = msgpack.packb(entry._replace(inode=st.st_ino, ctime=ctime, mtime=mtime, age=0))
         chunks = [ChunkListEntry(*chunk) for chunk in entry.chunks]  # convert to list of namedtuple
         return True, chunks
 
     def memorize_file(self, hashed_path, path_hash, st, chunks):
         if not stat.S_ISREG(st.st_mode):
             return
-        cache_mode = self.cache_mode
         # note: r(echunk) modes will update the files cache, d(isabled) mode won't
-        if "d" in cache_mode:
+        if "d" in self.cache_mode:
             files_cache_logger.debug("FILES-CACHE-NOUPDATE: files cache disabled")
             return
-        if "c" in cache_mode:
-            cmtime_type = "ctime"
-            cmtime_ns = safe_ns(st.st_ctime_ns)
-        elif "m" in cache_mode:
-            cmtime_type = "mtime"
-            cmtime_ns = safe_ns(st.st_mtime_ns)
-        else:  # neither 'c' nor 'm' in cache_mode, avoid UnboundLocalError
-            cmtime_type = "ctime"
-            cmtime_ns = safe_ns(st.st_ctime_ns)
+        ctime_ns = safe_ns(st.st_ctime_ns)
+        mtime_ns = safe_ns(st.st_mtime_ns)
         entry = FileCacheEntry(
-            age=0, inode=st.st_ino, size=st.st_size, cmtime=int_to_timestamp(cmtime_ns), chunks=chunks
+            age=0,
+            inode=st.st_ino,
+            size=st.st_size,
+            ctime=int_to_timestamp(ctime_ns),
+            mtime=int_to_timestamp(mtime_ns),
+            chunks=chunks,
         )
         self.files[path_hash] = msgpack.packb(entry)
-        self._newest_cmtime = max(self._newest_cmtime or 0, cmtime_ns)
+        self._newest_cmtime = max(self._newest_cmtime or 0, ctime_ns)
+        self._newest_cmtime = max(self._newest_cmtime or 0, mtime_ns)
         files_cache_logger.debug(
-            "FILES-CACHE-UPDATE: put %r [has %s] <- %r",
-            entry._replace(chunks="[%d entries]" % len(entry.chunks)),
-            cmtime_type,
-            hashed_path,
+            "FILES-CACHE-UPDATE: put %r <- %r", entry._replace(chunks="[%d entries]" % len(entry.chunks)), hashed_path
         )
 
 
@@ -615,7 +663,7 @@ class ChunksMixin:
         if entry.refcount and size is not None:
             assert isinstance(entry.size, int)
             if not entry.size:
-                # AdHocWithFilesCache / AdHocCache:
+                # AdHocWithFilesCache:
                 # Here *size* is used to update the chunk's size information, which will be zero for existing chunks.
                 self.chunks[id] = entry._replace(size=size)
         return entry.refcount != 0
@@ -659,7 +707,13 @@ class ChunksMixin:
 
 class AdHocWithFilesCache(FilesCacheMixin, ChunksMixin):
     """
-    Like AdHocCache, but with a files cache.
+    An ad-hoc chunks and files cache.
+
+    Chunks: it does not maintain accurate reference count.
+    Chunks that were not added during the current lifetime won't have correct size set (0 bytes)
+    and will have an infinite reference count (MAX_VALUE).
+
+    Files: if an archive name is given, ad-hoc build an in-memory files cache from the latest archive with that name.
     """
 
     def __init__(
@@ -668,16 +722,16 @@ class AdHocWithFilesCache(FilesCacheMixin, ChunksMixin):
         path=None,
         warn_if_unencrypted=True,
         progress=False,
-        lock_wait=None,
         cache_mode=FILES_CACHE_MODE_DISABLED,
         iec=False,
+        archive_name=None,
+        start_backup=None,
     ):
         """
         :param warn_if_unencrypted: print warning if accessing unknown unencrypted repository
-        :param lock_wait: timeout for lock acquisition (int [s] or None [wait forever])
         :param cache_mode: what shall be compared in the file stat infos vs. cached stat infos comparison
         """
-        FilesCacheMixin.__init__(self, cache_mode)
+        FilesCacheMixin.__init__(self, cache_mode, archive_name, start_backup)
         ChunksMixin.__init__(self)
         assert isinstance(manifest, Manifest)
         self.manifest = manifest
@@ -688,7 +742,7 @@ class AdHocWithFilesCache(FilesCacheMixin, ChunksMixin):
 
         self.path = cache_dir(self.repository, path)
         self.security_manager = SecurityManager(self.repository)
-        self.cache_config = CacheConfig(self.repository, self.path, lock_wait)
+        self.cache_config = CacheConfig(self.repository, self.path)
 
         # Warn user before sending data to a never seen before unencrypted repository
         if not os.path.exists(self.path):
@@ -721,7 +775,6 @@ class AdHocWithFilesCache(FilesCacheMixin, ChunksMixin):
         with open(os.path.join(self.path, "README"), "w") as fd:
             fd.write(CACHE_README)
         self.cache_config.create()
-        self._create_empty_files_cache(self.path)
 
     def open(self):
         if not os.path.isdir(self.path):
@@ -757,7 +810,6 @@ class AdHocWithFilesCache(FilesCacheMixin, ChunksMixin):
     def wipe_cache(self):
         logger.warning("Discarding incompatible cache and forcing a cache rebuild")
         self._chunks = ChunkIndex()
-        self._create_empty_files_cache(self.path)
         self.cache_config.manifest_id = ""
         self.cache_config._config.set("cache", "manifest", "")
 
@@ -773,46 +825,3 @@ class AdHocWithFilesCache(FilesCacheMixin, ChunksMixin):
 
         self.cache_config.ignored_features.update(repo_features - my_features)
         self.cache_config.mandatory_features.update(repo_features & my_features)
-
-
-class AdHocCache(ChunksMixin):
-    """
-    Ad-hoc, non-persistent cache.
-
-    The AdHocCache does not maintain accurate reference count, nor does it provide a files cache
-    (which would require persistence).
-    Chunks that were not added during the current AdHocCache lifetime won't have correct size set
-    (0 bytes) and will have an infinite reference count (MAX_VALUE).
-    """
-
-    def __init__(self, manifest, warn_if_unencrypted=True, lock_wait=None, iec=False):
-        ChunksMixin.__init__(self)
-        assert isinstance(manifest, Manifest)
-        self.manifest = manifest
-        self.repository = manifest.repository
-        self.key = manifest.key
-        self.repo_objs = manifest.repo_objs
-
-        self.security_manager = SecurityManager(self.repository)
-        self.security_manager.assert_secure(manifest, self.key, lock_wait=lock_wait)
-
-    # Public API
-
-    def __enter__(self):
-        self._chunks = None
-        return self
-
-    def __exit__(self, exc_type, exc_val, exc_tb):
-        if exc_type is None:
-            self.security_manager.save(self.manifest, self.key)
-        self._chunks = None
-
-    files = None  # type: ignore
-    cache_mode = "d"
-
-    def file_known_and_unchanged(self, hashed_path, path_hash, st):
-        files_cache_logger.debug("UNKNOWN: files cache not implemented")
-        return False, None
-
-    def memorize_file(self, hashed_path, path_hash, st, chunks):
-        pass
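
The net effect of the new ``files_cache_name()`` is one ``cache/files.<suffix>`` file per archive series, with the suffix either taken from BORG_FILES_CACHE_SUFFIX or derived by hashing the archive name. A way to reproduce the derived suffix outside of borg, using the third-party ``xxhash`` package as a stand-in for borg's internal ``checksums.xxh64`` (an assumption: the hex digests should line up, but treat this as illustrative):

```python
import os
import xxhash  # third-party stand-in for borg's checksums.xxh64 (assumption)

FILES_CACHE_NAME = "files"

def files_cache_name(archive_name: str) -> str:
    """Per-series files cache filename, following the files_cache_name() hunk above."""
    suffix = os.environ.get("BORG_FILES_CACHE_SUFFIX", "")
    if not suffix:
        # avoid issues with long or complex archive names by hashing them
        suffix = xxhash.xxh64(archive_name.encode()).hexdigest()
    return FILES_CACHE_NAME + "." + suffix

# Backups of the series "pc1-home" then always read/write the same
# cache/files.<16 hex digits> file, independent of other series in the cache dir.
print(files_cache_name("pc1-home"))
```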

+ 7 - 0
src/borg/constants.py

@@ -113,6 +113,13 @@ CH_DATA, CH_ALLOC, CH_HOLE = 0, 1, 2
 FILES_CACHE_MODE_UI_DEFAULT = "ctime,size,inode"  # default for "borg create" command (CLI UI)
 FILES_CACHE_MODE_DISABLED = "d"  # most borg commands do not use the files cache at all (disable)
 
+# account for clocks being slightly out-of-sync, timestamps granularity.
+# we can't go much higher here (like e.g. to 2s) without causing issues.
+TIME_DIFFERS1_NS = 20000000
+
+# similar to above, but for bigger granularity / clock differences
+TIME_DIFFERS2_NS = 3000000000
+
 # return codes returned by borg command
 EXIT_SUCCESS = 0  # everything done, no problems
 EXIT_WARNING = 1  # reached normal end of operation, but there were issues (generic warning)

+ 1 - 3
src/borg/helpers/parseformat.py

@@ -1189,7 +1189,7 @@ class BorgJsonEncoder(json.JSONEncoder):
         from ..legacyremote import LegacyRemoteRepository
         from ..remote import RemoteRepository
         from ..archive import Archive
-        from ..cache import AdHocCache, AdHocWithFilesCache
+        from ..cache import AdHocWithFilesCache
 
         if isinstance(o, (LegacyRepository, LegacyRemoteRepository)) or isinstance(o, (Repository, RemoteRepository)):
             return {"id": bin_to_hex(o.id), "location": o._location.canonical_path()}
@@ -1197,8 +1197,6 @@ class BorgJsonEncoder(json.JSONEncoder):
             return o.info()
         if isinstance(o, (AdHocWithFilesCache,)):
             return {"path": o.path}
-        if isinstance(o, AdHocCache):
-            return {}
         if callable(getattr(o, "to_json", None)):
             return o.to_json()
         return super().default(o)

+ 0 - 21
src/borg/testsuite/archiver/corruption.py

@@ -1,4 +1,3 @@
-import io
 import json
 import os
 from configparser import ConfigParser
@@ -16,26 +15,6 @@ def corrupt_archiver(archiver):
     archiver.cache_path = json.loads(cmd(archiver, "repo-info", "--json"))["cache"].get("path")
 
 
-def corrupt(file, amount=1):
-    with open(file, "r+b") as fd:
-        fd.seek(-amount, io.SEEK_END)
-        corrupted = bytes(255 - c for c in fd.read(amount))
-        fd.seek(-amount, io.SEEK_END)
-        fd.write(corrupted)
-
-
-def test_cache_files(archiver):
-    corrupt_archiver(archiver)
-    if archiver.cache_path is None:
-        pytest.skip("no cache path for this kind of Cache implementation")
-
-    cmd(archiver, "create", "test", "input")
-    corrupt(os.path.join(archiver.cache_path, "files"))
-    out = cmd(archiver, "create", "test1", "input")
-    # borg warns about the corrupt files cache, but then continues without files cache.
-    assert "files cache is corrupted" in out
-
-
 def test_old_version_interfered(archiver):
     corrupt_archiver(archiver)
     if archiver.cache_path is None:

+ 14 - 34
src/borg/testsuite/archiver/create_cmd.py

@@ -11,7 +11,6 @@ import time
 import pytest
 
 from ... import platform
-from ...cache import get_cache_impl
 from ...constants import *  # NOQA
 from ...manifest import Manifest
 from ...platform import is_cygwin, is_win32, is_darwin
@@ -525,25 +524,6 @@ def test_create_pattern_intermediate_folders_first(archivers, request):
     assert out_list.index("d x/b") < out_list.index("- x/b/foo_b")
 
 
-@pytest.mark.skipif(get_cache_impl() != "adhoc", reason="only works with AdHocCache")
-def test_create_no_cache_sync_adhoc(archivers, request):  # TODO: add test for AdHocWithFilesCache
-    archiver = request.getfixturevalue(archivers)
-    create_test_files(archiver.input_path)
-    cmd(archiver, "repo-create", RK_ENCRYPTION)
-    cmd(archiver, "repo-delete", "--cache-only")
-    create_json = json.loads(
-        cmd(archiver, "create", "--no-cache-sync", "--prefer-adhoc-cache", "--json", "test", "input")
-    )
-    info_json = json.loads(cmd(archiver, "info", "-a", "test", "--json"))
-    create_stats = create_json["cache"]["stats"]
-    info_stats = info_json["cache"]["stats"]
-    assert create_stats == info_stats
-    cmd(archiver, "repo-delete", "--cache-only")
-    cmd(archiver, "create", "--no-cache-sync", "--prefer-adhoc-cache", "test2", "input")
-    cmd(archiver, "repo-info")
-    cmd(archiver, "check")
-
-
 def test_create_archivename_with_placeholder(archivers, request):
     archiver = request.getfixturevalue(archivers)
     create_test_files(archiver.input_path)
@@ -676,7 +656,7 @@ def test_file_status(archivers, request):
     assert "A input/file1" in output
     assert "A input/file2" in output
     # should find first file as unmodified
-    output = cmd(archiver, "create", "--list", "test2", "input")
+    output = cmd(archiver, "create", "--list", "test", "input")
     assert "U input/file1" in output
     # although surprising, this is expected. For why, see:
     # https://borgbackup.readthedocs.org/en/latest/faq.html#i-am-seeing-a-added-status-for-a-unchanged-file
@@ -693,13 +673,13 @@ def test_file_status_cs_cache_mode(archivers, request):
     time.sleep(1)  # file2 must have newer timestamps than file1
     create_regular_file(archiver.input_path, "file2", size=10)
     cmd(archiver, "repo-create", RK_ENCRYPTION)
-    cmd(archiver, "create", "test1", "input", "--list", "--files-cache=ctime,size")
+    cmd(archiver, "create", "test", "input", "--list", "--files-cache=ctime,size")
     # modify file1, but cheat with the mtime (and atime) and also keep same size:
     st = os.stat("input/file1")
     create_regular_file(archiver.input_path, "file1", contents=b"321")
     os.utime("input/file1", ns=(st.st_atime_ns, st.st_mtime_ns))
     # this mode uses ctime for change detection, so it should find file1 as modified
-    output = cmd(archiver, "create", "test2", "input", "--list", "--files-cache=ctime,size")
+    output = cmd(archiver, "create", "test", "input", "--list", "--files-cache=ctime,size")
     assert "M input/file1" in output
 
 
@@ -710,12 +690,12 @@ def test_file_status_ms_cache_mode(archivers, request):
     time.sleep(1)  # file2 must have newer timestamps than file1
     create_regular_file(archiver.input_path, "file2", size=10)
     cmd(archiver, "repo-create", RK_ENCRYPTION)
-    cmd(archiver, "create", "--list", "--files-cache=mtime,size", "test1", "input")
+    cmd(archiver, "create", "--list", "--files-cache=mtime,size", "test", "input")
     # change mode of file1, no content change:
     st = os.stat("input/file1")
     os.chmod("input/file1", st.st_mode ^ stat.S_IRWXO)  # this triggers a ctime change, but mtime is unchanged
     # this mode uses mtime for change detection, so it should find file1 as unmodified
-    output = cmd(archiver, "create", "--list", "--files-cache=mtime,size", "test2", "input")
+    output = cmd(archiver, "create", "--list", "--files-cache=mtime,size", "test", "input")
     assert "U input/file1" in output
 
 
@@ -726,9 +706,9 @@ def test_file_status_rc_cache_mode(archivers, request):
     time.sleep(1)  # file2 must have newer timestamps than file1
     create_regular_file(archiver.input_path, "file2", size=10)
     cmd(archiver, "repo-create", RK_ENCRYPTION)
-    cmd(archiver, "create", "--list", "--files-cache=rechunk,ctime", "test1", "input")
+    cmd(archiver, "create", "--list", "--files-cache=rechunk,ctime", "test", "input")
     # no changes here, but this mode rechunks unconditionally
-    output = cmd(archiver, "create", "--list", "--files-cache=rechunk,ctime", "test2", "input")
+    output = cmd(archiver, "create", "--list", "--files-cache=rechunk,ctime", "test", "input")
     assert "A input/file1" in output
 
 
@@ -748,7 +728,7 @@ def test_file_status_excluded(archivers, request):
     if has_lchflags:
         assert "- input/file3" in output
     # should find second file as excluded
-    output = cmd(archiver, "create", "test1", "input", "--list", "--exclude-nodump", "--exclude", "*/file2")
+    output = cmd(archiver, "create", "test", "input", "--list", "--exclude-nodump", "--exclude", "*/file2")
     assert "U input/file1" in output
     assert "- input/file2" in output
     if has_lchflags:
@@ -781,14 +761,14 @@ def test_file_status_counters(archivers, request):
     create_regular_file(archiver.input_path, "testfile1", contents=b"test1")
     time.sleep(1.0 if is_darwin else 0.01)  # testfile2 must have newer timestamps than testfile1
     create_regular_file(archiver.input_path, "testfile2", contents=b"test2")
-    result = cmd(archiver, "create", "--stats", "test_archive2", archiver.input_path)
+    result = cmd(archiver, "create", "--stats", "test_archive", archiver.input_path)
     result = to_dict(result)
     assert result["Added files"] == 2
     assert result["Unchanged files"] == 0
     assert result["Modified files"] == 0
     # Archive a dir with 1 unmodified file and 1 modified
     create_regular_file(archiver.input_path, "testfile1", contents=b"new data")
-    result = cmd(archiver, "create", "--stats", "test_archive3", archiver.input_path)
+    result = cmd(archiver, "create", "--stats", "test_archive", archiver.input_path)
     result = to_dict(result)
     # Should process testfile2 as added because of
     # https://borgbackup.readthedocs.io/en/stable/faq.html#i-am-seeing-a-added-status-for-an-unchanged-file
@@ -826,18 +806,18 @@ def test_create_topical(archivers, request):
     output = cmd(archiver, "create", "test", "input")
     assert "file1" not in output
     # shouldn't be listed even if unchanged
-    output = cmd(archiver, "create", "test0", "input")
+    output = cmd(archiver, "create", "test", "input")
     assert "file1" not in output
     # should list the file as unchanged
-    output = cmd(archiver, "create", "test1", "input", "--list", "--filter=U")
+    output = cmd(archiver, "create", "test", "input", "--list", "--filter=U")
     assert "file1" in output
     # should *not* list the file as changed
-    output = cmd(archiver, "create", "test2", "input", "--list", "--filter=AM")
+    output = cmd(archiver, "create", "test", "input", "--list", "--filter=AM")
     assert "file1" not in output
     # change the file
     create_regular_file(archiver.input_path, "file1", size=1024 * 100)
     # should list the file as changed
-    output = cmd(archiver, "create", "test3", "input", "--list", "--filter=AM")
+    output = cmd(archiver, "create", "test", "input", "--list", "--filter=AM")
     assert "file1" in output
 
 

+ 10 - 9
src/borg/testsuite/cache.py

@@ -1,17 +1,17 @@
-import os.path
+import os
 
 import pytest
 
 from .hashindex import H
 from .key import TestKey
 from ..archive import Statistics
-from ..cache import AdHocCache
+from ..cache import AdHocWithFilesCache
 from ..crypto.key import AESOCBRepoKey
 from ..manifest import Manifest
 from ..repository import Repository
 
 
-class TestAdHocCache:
+class TestAdHocWithFilesCache:
     @pytest.fixture
     def repository(self, tmpdir):
         self.repository_location = os.path.join(str(tmpdir), "repository")
@@ -32,7 +32,7 @@ class TestAdHocCache:
 
     @pytest.fixture
     def cache(self, repository, key, manifest):
-        return AdHocCache(manifest)
+        return AdHocWithFilesCache(manifest)
 
     def test_does_not_contain_manifest(self, cache):
         assert not cache.seen_chunk(Manifest.MANIFEST_ID)
@@ -40,11 +40,6 @@ class TestAdHocCache:
     def test_seen_chunk_add_chunk_size(self, cache):
         assert cache.add_chunk(H(1), {}, b"5678", stats=Statistics()) == (H(1), 4)
 
-    def test_files_cache(self, cache):
-        assert cache.file_known_and_unchanged(b"foo", bytes(32), None) == (False, None)
-        assert cache.cache_mode == "d"
-        assert cache.files is None
-
     def test_reuse_after_add_chunk(self, cache):
         assert cache.add_chunk(H(3), {}, b"5678", stats=Statistics()) == (H(3), 4)
         assert cache.reuse_chunk(H(3), 4, Statistics()) == (H(3), 4)
@@ -52,3 +47,9 @@ class TestAdHocCache:
     def test_existing_reuse_after_add_chunk(self, cache):
         assert cache.add_chunk(H(1), {}, b"5678", stats=Statistics()) == (H(1), 4)
         assert cache.reuse_chunk(H(1), 4, Statistics()) == (H(1), 4)
+
+    def test_files_cache(self, cache):
+        st = os.stat(".")
+        assert cache.file_known_and_unchanged(b"foo", bytes(32), st) == (False, None)
+        assert cache.cache_mode == "d"
+        assert cache.files == {}

+ 0 - 1
src/borg/testsuite/conftest.py

@@ -104,7 +104,6 @@ def archiver(tmp_path, set_env_variables):
     archiver.patterns_file_path = os.fspath(tmp_path / "patterns")
     os.environ["BORG_KEYS_DIR"] = archiver.keys_path
     os.environ["BORG_CACHE_DIR"] = archiver.cache_path
-    # os.environ["BORG_CACHE_IMPL"] = "adhocwithfiles"
     os.mkdir(archiver.input_path)
     os.chmod(archiver.input_path, 0o777)  # avoid troubles with fakeroot / FUSE
     os.mkdir(archiver.output_path)