
Merge pull request #6741 from fantasya-pbem/docs/5310_overhaul-help-patterns

docs: overhaul borg help patterns, fixes #5310
TW 3 years ago
parent
commit
4ff0a29209
1 changed file with 115 additions and 102 deletions
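
Note on the help text changed in this commit: the following is a minimal sketch, not Borg's actual implementation, of two rules the new text states. The first two characters of a pattern act as a style selector when followed by a colon, and the first matching include/exclude rule decides. The names `split_style`, `matches`, `decide` and `KNOWN_STYLES` are hypothetical. Only the `pp:` and `pf:` styles are modelled, and treating unmatched paths as included is an assumption made for illustration.

```python
# Hypothetical helper names; a sketch of the documented rules, not Borg code.
KNOWN_STYLES = {"fm", "sh", "re", "pp", "pf"}


def split_style(pattern, default_style="sh"):
    """Split 'xx:body' into (style, body); fall back to default_style."""
    # Two alphanumeric characters followed by ':' are read as a style selector.
    if len(pattern) > 2 and pattern[2] == ":" and pattern[:2].isalnum():
        style, body = pattern[:2], pattern[3:]
        if style not in KNOWN_STYLES:
            raise ValueError(f"unknown pattern style: {style!r}")
        return style, body
    return default_style, pattern


def matches(style, body, path):
    """Match a relative path against a 'pp' or 'pf' pattern body."""
    body = body.lstrip("/")  # a leading path separator is always removed
    if style == "pf":
        return path == body  # full path must be given
    if style == "pp":
        # the directory itself and everything therein
        return path == body or path.startswith(body.rstrip("/") + "/")
    raise NotImplementedError(f"style {style!r} is not modelled in this sketch")


def decide(rules, path, default=True):
    """Return True if path is kept; the first matching rule wins."""
    for prefix, pattern in rules:
        style, body = split_style(pattern)
        if matches(style, body, path):
            return prefix == "+"
    return default  # assumption: unmatched paths are included


# Mirrors the "pics" example from the help text, with explicit pp: selectors.
rules = [("+", "pp:pics/2018/good"), ("-", "pp:pics/2018")]
```

With these rules, `decide(rules, "pics/2018/good/a.jpg")` keeps the file because the include rule is listed first, while `decide(rules, "pics/2018/bad.jpg")` drops it. A pattern that itself begins with two alphanumeric characters and a colon needs an explicit selector, for example `fm:aa:something/*`.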

src/borg/archiver.py

@@ -2485,40 +2485,35 @@ class Archiver:
 
     helptext = collections.OrderedDict()
     helptext['patterns'] = textwrap.dedent('''
-        The path/filenames used as input for the pattern matching start from the
-        currently active recursion root. You usually give the recursion root(s)
-        when invoking borg and these can be either relative or absolute paths.
-
-        If you give `/absolute/` as root, the paths going into the matcher will
-        look relative like `absolute/.../file.ext`, because file paths in Borg
-        archives are always stored normalized and relative. This means that e.g.
-        ``borg create /path/to/repo ../some/path`` will store all files as
-        `some/path/.../file.ext` and ``borg create /path/to/repo /home/user``
-        will store all files as `home/user/.../file.ext`.
-
-        A directory exclusion pattern can end either with or without a slash ('/').
-        If it ends with a slash, such as `some/path/`, the directory will be
-        included but not its content. If it does not end with a slash, such as
-        `some/path`, both the directory and content will be excluded.
-
-        File patterns support these styles: fnmatch, shell, regular expressions,
-        path prefixes and path full-matches. By default, fnmatch is used for
-        ``--exclude`` patterns and shell-style is used for the ``--pattern``
-        option. For commands that support patterns in their ``PATH`` argument
-        like (``borg list``), the default pattern is path prefix.
-
-        Starting with Borg 1.2, discovered fs paths are normalised, have leading
-        slashes removed and then are matched against your patterns.
-        Note: You need to review your include / exclude patterns and make
-        sure they do not expect leading slashes. Borg can only deal with this
-        for some very simple patterns by removing leading slashes there also.
-
-        If followed by a colon (':') the first two characters of a pattern are
-        used as a style selector. Explicit style selection is necessary when a
-        non-default style is desired or when the desired pattern starts with
-        two alphanumeric characters followed by a colon (i.e. `aa:something/*`).
-
-        `Fnmatch <https://docs.python.org/3/library/fnmatch.html>`_, selector `fm:`
+        When specifying one or more file paths in a Borg command that supports
+        patterns for the respective option or argument, you can apply the
+        patterns described here to include only desired files and/or exclude
+        unwanted ones. Patterns can be used
+
+        - for the ``--exclude`` option,
+        - in the file given with the ``--exclude-from`` option,
+        - for the ``--pattern`` option,
+        - in the file given with the ``--patterns-from`` option and
+        - for ``PATH`` arguments that explicitly support them.
+
+        Borg always stores all file paths normalized and relative to the
+        current recursion root. The recursion root is also named ``PATH`` in
+        Borg commands like `borg create` that do a file discovery, so do not
+        confuse the root with the ``PATH`` argument of e.g. `borg extract`.
+
+        Starting with Borg 1.2, paths that are matched against patterns always
+        appear relative. If you give ``/absolute/`` as root, the paths going
+        into the matcher will look relative like ``absolute/.../file.ext``.
+        If you give ``../some/path`` as root, the paths will look like
+        ``some/path/.../file.ext``.
+
+        File patterns support five different styles. If followed by a colon ':',
+        the first two characters of a pattern are used as a style selector.
+        Explicit style selection is necessary if a non-default style is desired
+        or when the desired pattern starts with two alphanumeric characters
+        followed by a colon (e.g. ``aa:something/*``).
+
+        `Fnmatch <https://docs.python.org/3/library/fnmatch.html>`_, selector ``fm:``
             This is the default style for ``--exclude`` and ``--exclude-from``.
             These patterns use a variant of shell pattern syntax, with '\\*' matching
             any number of characters, '?' matching any single character, '[...]'
@@ -2526,7 +2521,7 @@ class Archiver:
             matching any character not specified. For the purpose of these patterns,
             the path separator (backslash for Windows and '/' on other systems) is not
             treated specially. Wrap meta-characters in brackets for a literal
-            match (i.e. `[?]` to match the literal character `?`). For a path
+            match (e.g. ``[?]`` to match the literal character '?'). For a path
             to match a pattern, the full path must match, or it must match
             from the start of the full path to just before a path separator. Except
             for the root path, paths will never end in the path separator when
@@ -2534,33 +2529,31 @@ class Archiver:
             separator, a '\\*' is appended before matching is attempted. A leading
             path separator is always removed.
 
-        Shell-style patterns, selector `sh:`
+        Shell-style patterns, selector ``sh:``
             This is the default style for ``--pattern`` and ``--patterns-from``.
             Like fnmatch patterns these are similar to shell patterns. The difference
-            is that the pattern may include `**/` for matching zero or more directory
-            levels, `*` for matching zero or more arbitrary characters with the
+            is that the pattern may include ``**/`` for matching zero or more directory
+            levels, ``*`` for matching zero or more arbitrary characters with the
             exception of any path separator. A leading path separator is always removed.
 
-        Regular expressions, selector `re:`
-            Regular expressions similar to those found in Perl are supported. Unlike
-            shell patterns regular expressions are not required to match the full
+        `Regular expressions <https://docs.python.org/3/library/re.html>`_, selector ``re:``
+            Unlike shell patterns, regular expressions are not required to match the full
             path and any substring match is sufficient. It is strongly recommended to
             anchor patterns to the start ('^'), to the end ('$') or both. Path
             separators (backslash for Windows and '/' on other systems) in paths are
-            always normalized to a forward slash ('/') before applying a pattern. The
-            regular expression syntax is described in the `Python documentation for
-            the re module <https://docs.python.org/3/library/re.html>`_.
+            always normalized to a forward slash '/' before applying a pattern.
 
-        Path prefix, selector `pp:`
+        Path prefix, selector ``pp:``
             This pattern style is useful to match whole sub-directories. The pattern
-            `pp:root/somedir` matches `root/somedir` and everything therein. A leading
-            path separator is always removed.
+            ``pp:root/somedir`` matches ``root/somedir`` and everything therein.
+            A leading path separator is always removed.
 
-        Path full-match, selector `pf:`
+        Path full-match, selector ``pf:``
             This pattern style is (only) useful to match full paths.
             This is kind of a pseudo pattern as it can not have any variable or
-            unspecified parts - the full path must be given. `pf:root/file.ext` matches
-            `root/file.ext` only. A leading path separator is always removed.
+            unspecified parts - the full path must be given. ``pf:root/file.ext``
+            matches ``root/file.ext`` only. A leading path separator is always
+            removed.
 
             Implementation note: this is implemented via very time-efficient O(1)
             hashtable lookups (this means you can have huge amounts of such patterns
@@ -2573,20 +2566,20 @@ class Archiver:
 
         .. note::
 
-            `re:`, `sh:` and `fm:` patterns are all implemented on top of the Python SRE
-            engine. It is very easy to formulate patterns for each of these types which
-            requires an inordinate amount of time to match paths. If untrusted users
-            are able to supply patterns, ensure they cannot supply `re:` patterns.
-            Further, ensure that `sh:` and `fm:` patterns only contain a handful of
-            wildcards at most.
+            ``re:``, ``sh:`` and ``fm:`` patterns are all implemented on top of
+            the Python SRE engine. It is very easy to formulate patterns for each
+            of these types which require an inordinate amount of time to match
+            paths. If untrusted users are able to supply patterns, ensure they
+            cannot supply ``re:`` patterns. Further, ensure that ``sh:`` and
+            ``fm:`` patterns only contain a handful of wildcards at most.
 
         Exclusions can be passed via the command line option ``--exclude``. When used
         from within a shell, the patterns should be quoted to protect them from
         expansion.
 
         The ``--exclude-from`` option permits loading exclusion patterns from a text
-        file with one pattern per line. Lines empty or starting with the number sign
-        ('#') after removing whitespace on both ends are ignored. The optional style
+        file with one pattern per line. Lines empty or starting with the hash sign
+        '#' after removing whitespace on both ends are ignored. The optional style
         selector prefix is also supported for patterns loaded from a file. Due to
         whitespace removal, paths with whitespace at the beginning or end can only be
         excluded using regular expressions.
@@ -2597,21 +2590,21 @@ class Archiver:
         Examples::
 
             # Exclude '/home/user/file.o' but not '/home/user/file.odt':
-            $ borg create -e '*.o' backup /
+            $ borg create -e '*.o' /path/to/repo::archive /
 
             # Exclude '/home/user/junk' and '/home/user/subdir/junk' but
             # not '/home/user/importantjunk' or '/etc/junk':
-            $ borg create -e 'home/*/junk' backup /
+            $ borg create -e 'home/*/junk' /path/to/repo::archive /
 
             # Exclude the contents of '/home/user/cache' but not the directory itself:
-            $ borg create -e home/user/cache/ backup /
+            $ borg create -e home/user/cache/ /path/to/repo::archive /
 
             # The file '/home/user/cache/important' is *not* backed up:
-            $ borg create -e home/user/cache/ backup / /home/user/cache/important
+            $ borg create -e home/user/cache/ /path/to/repo::archive / /home/user/cache/important
 
             # The contents of directories in '/home' are not backed up when their name
             # ends in '.tmp'
-            $ borg create --exclude 're:^home/[^/]+\\.tmp/' backup /
+            $ borg create --exclude 're:^home/[^/]+\\.tmp/' /path/to/repo::archive /
 
             # Load exclusions from file
             $ cat >exclude.txt <<EOF
@@ -2624,36 +2617,56 @@ class Archiver:
             # Example with spaces, no need to escape as it is processed by borg
             some file with spaces.txt
             EOF
-            $ borg create --exclude-from exclude.txt backup /
+            $ borg create --exclude-from exclude.txt /path/to/repo::archive /
 
-        A more general and easier to use way to define filename matching patterns exists
-        with the ``--pattern`` and ``--patterns-from`` options. Using these, you may
-        specify the backup roots (starting points) and patterns for inclusion/exclusion.
-        A root path starts with the prefix `R`, followed by a path (a plain path, not a
-        file pattern). An include rule starts with the prefix +, an exclude rule starts
-        with the prefix -, an exclude-norecurse rule starts with !, all followed by a pattern.
+        A more general and easier to use way to define filename matching patterns
+        exists with the ``--pattern`` and ``--patterns-from`` options. Using
+        these, you may specify the backup roots, default pattern styles and
+        patterns for inclusion and exclusion.
 
-        .. note::
+        Root path prefix ``R``
+            A recursion root path starts with the prefix ``R``, followed by a path
+            (a plain path, not a file pattern). Use this prefix to have the root
+            paths in the patterns file rather than as command line arguments.
 
-            Via ``--pattern`` or ``--patterns-from`` you can define BOTH inclusion and exclusion
-            of files using pattern prefixes ``+`` and ``-``. With ``--exclude`` and
-            ``--exclude-from`` ONLY excludes are defined.
+        Pattern style prefix ``P``
+            To change the default pattern style, use the ``P`` prefix, followed by
+            the pattern style abbreviation (``fm``, ``pf``, ``pp``, ``re``, ``sh``).
+            All patterns following this line will use this style until another style
+            is specified.
 
-        Inclusion patterns are useful to include paths that are contained in an excluded
-        path. The first matching pattern is used so if an include pattern matches before
-        an exclude pattern, the file is backed up. If an exclude-norecurse pattern matches
-        a directory, it won't recurse into it and won't discover any potential matches for
-        include rules below that directory.
+        Exclude pattern prefix ``-``
+            Use the prefix ``-``, followed by a pattern, to define an exclusion.
+            This has the same effect as the ``--exclude`` option.
 
-        .. note::
+        Exclude no-recurse pattern prefix ``!``
+            Use the prefix ``!``, followed by a pattern, to define an exclusion
+            that does not recurse into subdirectories. This saves time, but
+            prevents include patterns from matching any files in subdirectories.
+
+        Include pattern prefix ``+``
+            Use the prefix ``+``, followed by a pattern, to define inclusions.
+            This is useful to include paths that are covered in an exclude
+            pattern and would otherwise not be backed up.
+
+        The first matching pattern is used, so if an include pattern matches
+        before an exclude pattern, the file is backed up. Note that a no-recurse
+        exclude stops examination of subdirectories so that potential includes
+        will not match - use normal excludes for such use cases.
+
+        **Tip: You can easily test your patterns with --dry-run and --list**::
 
-            It's possible that a sub-directory/file is matched while parent directories are not.
-            In that case, parent directories are not backed up thus their user, group, permission,
-            etc. can not be restored.
+            $ borg create --dry-run --list --patterns-from patterns.txt /path/to/repo::archive
 
-        Note that the default pattern style for ``--pattern`` and ``--patterns-from`` is
-        shell style (`sh:`), so those patterns behave similar to rsync include/exclude
-        patterns. The pattern style can be set via the `P` prefix.
+        This will list the considered files one per line, prefixed with a
+        character that indicates the action (e.g. 'x' for excluding, see
+        **Item flags** in `borg create` usage docs).
+
+        .. note::
+
+            It's possible that a sub-directory/file is matched while parent
+            directories are not. In that case, parent directories are not backed
+            up and thus their user, group, permission, etc. cannot be restored.
 
         Patterns (``--pattern``) and excludes (``--exclude``) from the command line are
         considered first (in the order of appearance). Then patterns from ``--patterns-from``
@@ -2663,44 +2676,44 @@ class Archiver:
 
             # backup pics, but not the ones from 2018, except the good ones:
             # note: using = is essential to avoid cmdline argument parsing issues.
-            borg create --pattern=+pics/2018/good --pattern=-pics/2018 repo::arch pics
+            borg create --pattern=+pics/2018/good --pattern=-pics/2018 /path/to/repo::archive pics
 
-            # use a file with patterns:
-            borg create --patterns-from patterns.lst repo::arch
+            # backup only JPG/JPEG files (case insensitive) in all home directories:
+            borg create --pattern '+ re:(?i)\\.jpe?g$' /path/to/repo::archive /home
+
+            # backup homes, but exclude big downloads (like .ISO files) or hidden files:
+            borg create --exclude 're:(?i)\\.iso$' --exclude 'sh:home/**/.*' /path/to/repo::archive /home
+
+            # use a file with patterns (recursion root '/' via command line):
+            borg create --patterns-from patterns.lst /path/to/repo::archive /
 
         The patterns.lst file could look like that::
 
-            # "sh:" pattern style is the default, so the following line is not needed:
-            P sh
-            R /
-            # can be rebuild
+            # "sh:" pattern style is the default
+            # exclude caches
             - home/*/.cache
-            # they're downloads for a reason
-            - home/*/Downloads
-            # susan is a nice person
             # include susans home
             + home/susan
             # also back up this exact file
             + pf:home/bobby/specialfile.txt
             # don't backup the other home directories
             - home/*
-            # don't even look in /proc
-            ! proc
+            # don't even look in /dev, /proc, /run, /sys, /tmp (note: would exclude files like /device, too)
+            ! re:^(dev|proc|run|sys|tmp)
 
         You can specify recursion roots either on the command line or in a patternfile::
 
             # these two commands do the same thing
-            borg create --exclude home/bobby/junk repo::arch /home/bobby /home/susan
-            borg create --patterns-from patternfile.lst repo::arch
+            borg create --exclude home/bobby/junk /path/to/repo::archive /home/bobby /home/susan
+            borg create --patterns-from patternfile.lst /path/to/repo::archive
 
-        The patternfile::
+        patternfile.lst::
 
             # note that excludes use fm: by default and patternfiles use sh: by default.
             # therefore, we need to specify fm: to have the same exact behavior.
             P fm
             R /home/bobby
             R /home/susan
-
             - home/bobby/junk
 
         This allows you to share the same patterns between multiple repositories