add --json-lines option to diff command

Robert Blenis 4 years ago
parent
commit
b2dea4422e
5 changed files with 215 additions and 40 deletions
  1. 79 7
      docs/internals/frontends.rst
  2. 9 1
      docs/usage/diff.rst
  3. 10 3
      src/borg/archiver.py
  4. 39 29
      src/borg/item.pyx
  5. 78 0
      src/borg/testsuite/archiver.py

+ 79 - 7
docs/internals/frontends.rst

@@ -231,11 +231,16 @@ Standard output
 *stdout* is different and more command-dependent than logging. Commands like :ref:`borg_info`, :ref:`borg_create`
 and :ref:`borg_list` implement a ``--json`` option which turns their regular output into a single JSON object.
 
+Some commands, like :ref:`borg_list` and :ref:`borg_diff`, can produce *a lot* of JSON. Since many JSON implementations
+don't support a streaming mode of operation, which is pretty much required to deal with this amount of JSON, these
+commands implement a ``--json-lines`` option which generates output in the `JSON lines <http://jsonlines.org/>`_ format,
+which is simply a number of JSON objects separated by new lines.
+
 Dates are formatted according to ISO 8601 in local time. No explicit time zone is specified *at this time*
 (subject to change). The equivalent strftime format string is '%Y-%m-%dT%H:%M:%S.%f',
 e.g. ``2017-08-07T12:27:20.123456``.
 
-The root object at least contains a *repository* key with an object containing:
+The root object of '--json' output will contain at least a *repository* key with an object containing:
 
 id
     The ID of the repository, normally 64 hex characters
@@ -439,12 +444,7 @@ The same archive with more information (``borg info --last 1 --json``)::
 File listings
 +++++++++++++
 
-Listing the contents of an archive can produce *a lot* of JSON. Since many JSON implementations
-don't support a streaming mode of operation, which is pretty much required to deal with this amount of
-JSON, output is generated in the `JSON lines <http://jsonlines.org/>`_ format, which is simply
-a number of JSON objects separated by new lines.
-
-Each item (file, directory, ...) is described by one object in the :ref:`borg_list` output.
+Each archive item (file, directory, ...) is described by one object in the :ref:`borg_list` output.
 Refer to the *borg list* documentation for the available keys and their meaning.
 
 Example (excerpt) of ``borg list --json-lines``::
@@ -452,6 +452,78 @@ Example (excerpt) of ``borg list --json-lines``::
     {"type": "d", "mode": "drwxr-xr-x", "user": "user", "group": "user", "uid": 1000, "gid": 1000, "path": "linux", "healthy": true, "source": "", "linktarget": "", "flags": null, "mtime": "2017-02-27T12:27:20.023407", "size": 0}
     {"type": "d", "mode": "drwxr-xr-x", "user": "user", "group": "user", "uid": 1000, "gid": 1000, "path": "linux/baz", "healthy": true, "source": "", "linktarget": "", "flags": null, "mtime": "2017-02-27T12:27:20.585407", "size": 0}
 
+Archive Differencing
+++++++++++++++++++++
+
+Each archive difference item (file contents, user/group/mode) output by :ref:`borg_diff` is represented by an *ItemDiff* object.
+The properties of an *ItemDiff* object are:
+
+path:
+    The filename/path of the *Item* (file, directory, symlink).
+
+changes:
+    A list of *Change* objects describing how the item differs between the two archives. For example,
+    there will be two changes if both the contents of a file and its ownership are changed.
+
+The *Change* object can contain a number of properties depending on the type of change that occurred.
+A property that does not apply to the type of change is omitted from the output.
+The possible properties of a *Change* object are:
+
+type:
+  The **type** property is always present. It identifies the type of change and will be one of these values:
+  
+  - *modified* - file contents changed.
+  - *added* - the file was added.
+  - *removed* - the file was removed.
+  - *added directory* - the directory was added.
+  - *removed directory* - the directory was removed.
+  - *added link* - the symlink was added.
+  - *removed link* - the symlink was removed.
+  - *changed link* - the symlink target was changed.
+  - *mode* - the file/directory/link mode was changed. Note that this can also indicate a change of item
+    type (file/directory/link), for example when a file is deleted and replaced
+    with a directory of the same name.
+  - *owner* - user and/or group ownership changed.
+
+size:
+    If **type** == '*added*' or '*removed*', then **size** provides the size of the added or removed file.
+
+added:
+    If **type** == '*modified*' and chunk ids can be compared, then **added** and **removed** indicate the amount
+    of data 'added' and 'removed'. If chunk ids can not be compared, then **added** and **removed** properties are
+    not provided and the only information available is that the file contents were modified.
+
+removed:
+    See **added** property.
+    
+old_mode:
+    If **type** == '*mode*', then **old_mode** and **new_mode** provide the mode and permissions changes.
+
+new_mode:
+    See **old_mode** property.
+ 
+old_user:
+    If **type** == '*owner*', then **old_user**, **new_user**, **old_group** and **new_group** provide the user
+    and group ownership changes.
+
+old_group:
+    See **old_user** property.
+ 
+new_user:
+    See **old_user** property.
+ 
+new_group:
+    See **old_user** property.
+    
+
+Example (excerpt) of ``borg diff --json-lines``::
+
+    {"path": "file1", "changes": [{"type": "modified", "added": 17, "removed": 5}, {"type": "mode", "old_mode": "-rw-r--r--", "new_mode": "-rwxr-xr-x"}]}
+    {"path": "file2", "changes": [{"type": "modified", "added": 135, "removed": 252}]}
+    {"path": "file4", "changes": [{"type": "added", "size": 0}]}
+    {"path": "file3", "changes": [{"type": "removed", "size": 0}]}
+
+
 .. _msgid:
 
 Message IDs
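
A minimal consumer-side sketch of the format documented in this hunk, assuming the output of `borg diff --json-lines` has already been captured as text lines. The helper names `iter_item_diffs` and `summarize` are hypothetical and not part of borg; the sample lines are copied from the documentation example.

```python
import json
from typing import Iterable, Iterator


def iter_item_diffs(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded ItemDiff object per non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            # each line is a complete JSON document, so no streaming parser is needed
            yield json.loads(line)


def summarize(lines: Iterable[str]) -> dict:
    """Map each path to the list of change types reported for it."""
    return {d["path"]: [c["type"] for c in d["changes"]] for d in iter_item_diffs(lines)}


# Sample lines taken from the documented example output.
SAMPLE = [
    '{"path": "file1", "changes": [{"type": "modified", "added": 17, "removed": 5}, '
    '{"type": "mode", "old_mode": "-rw-r--r--", "new_mode": "-rwxr-xr-x"}]}',
    '{"path": "file4", "changes": [{"type": "added", "size": 0}]}',
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Because every line is parsed independently, a frontend could process the output incrementally instead of holding one large JSON document in memory.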

+ 9 - 1
docs/usage/diff.rst

@@ -16,6 +16,7 @@ Examples
     $ echo "something" >> file2
     $ borg create ../testrepo::archive2 .
 
+    $ echo "testing 123" >> file1
     $ rm file3
     $ touch file4
     $ borg create ../testrepo::archive3 .
@@ -26,11 +27,18 @@ Examples
        +135 B    -252 B file2
 
     $ borg diff testrepo::archive2 archive3
+        +17 B      -5 B file1
     added           0 B file4
     removed         0 B file3
 
     $ borg diff testrepo::archive1 archive3
-    [-rw-r--r-- -> -rwxr-xr-x] file1
+        +17 B      -5 B [-rw-r--r-- -> -rwxr-xr-x] file1
        +135 B    -252 B file2
     added           0 B file4
     removed         0 B file3
+
+    $ borg diff --json-lines testrepo::archive1 archive3
+    {"path": "file1", "changes": [{"type": "modified", "added": 17, "removed": 5}, {"type": "mode", "old_mode": "-rw-r--r--", "new_mode": "-rwxr-xr-x"}]}
+    {"path": "file2", "changes": [{"type": "modified", "added": 135, "removed": 252}]}
+    {"path": "file4", "changes": [{"type": "added", "size": 0}]}
+    {"path": "file3", "changes": [{"type": "removed", "size": 0}]}

+ 10 - 3
src/borg/archiver.py

@@ -1149,8 +1149,13 @@ class Archiver:
     def do_diff(self, args, repository, manifest, key, archive):
         """Diff contents of two archives"""
 
-        def print_output(diff, path):
-            print("{:<19} {}".format(diff, path))
+        def print_json_output(diff, path):
+            print(json.dumps({"path": path, "changes": [j for j, _ in diff]}))
+
+        def print_text_output(diff, path):
+            print("{:<19} {}".format(' '.join(text for _, text in diff), path))
+
+        print_output = print_json_output if args.json_lines else print_text_output
 
         archive1 = archive
         archive2 = Archive(repository, key, manifest, args.archive2,
@@ -1167,7 +1172,7 @@ class Archiver:
 
         diffs = Archive.compare_archives_iter(archive1, archive2, matcher, can_compare_chunk_ids=can_compare_chunk_ids)
         # Conversion to string and filtering for diff.equal to save memory if sorting
-        diffs = ((path, str(diff)) for path, diff in diffs if not diff.equal)
+        diffs = ((path, diff.changes()) for path, diff in diffs if not diff.equal)
 
         if args.sort:
             diffs = sorted(diffs)
@@ -3709,6 +3714,8 @@ class Archiver:
                                help='Override check of chunker parameters.')
         subparser.add_argument('--sort', dest='sort', action='store_true',
                                help='Sort the output lines by file path.')
+        subparser.add_argument('--json-lines', action='store_true',
+                               help='Format output as JSON Lines.')
         subparser.add_argument('location', metavar='REPO::ARCHIVE1',
                                type=location_validator(archive=True),
                                help='repository location and ARCHIVE1 name')
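
The hunks above switch between two printers that consume the same list of `(json_dict, text)` pairs. A standalone sketch of that idea follows, assuming the pair layout shown in this commit. The function `render` and the sample data are hypothetical; they return strings instead of printing so the result is easy to inspect.

```python
import json


def render(path, diff, json_lines=False):
    """Render one diff entry; `diff` is a list of (json_dict, text) pairs."""
    if json_lines:
        # keep only the machine-readable half of each pair
        return json.dumps({"path": path, "changes": [j for j, _ in diff]})
    # keep only the human-readable half, left-aligned to 19 columns
    return "{:<19} {}".format(" ".join(text for _, text in diff), path)


# Hypothetical sample pairs, shaped like the tuples built in item.pyx.
added_file = [({"type": "added", "size": 0}, "added {:>13}".format("0 B"))]
mode_change = [(
    {"type": "mode", "old_mode": "-rw-r--r--", "new_mode": "-rwxr-xr-x"},
    "[-rw-r--r-- -> -rwxr-xr-x]",
)]

if __name__ == "__main__":
    print(render("file4", added_file))
    print(render("file4", added_file, json_lines=True))
    print(render("file1", mode_change))
```

Carrying both representations in one pair means the choice of output format can be made once, at print time.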

+ 39 - 29
src/borg/item.pyx

@@ -418,27 +418,31 @@ class ItemDiff:
         self._numeric_owner = numeric_owner
         self._can_compare_chunk_ids = can_compare_chunk_ids
         self.equal = self._equal(chunk_iterator1, chunk_iterator2)
-
-    def __repr__(self):
-        if self.equal:
-            return 'equal'
-
         changes = []
 
         if self._item1.is_link() or self._item2.is_link():
-            changes.append(self._link_string())
+            changes.append(self._link_diff())
 
         if 'chunks' in self._item1 and 'chunks' in self._item2:
-            changes.append(self._content_string())
+            changes.append(self._content_diff())
 
         if self._item1.is_dir() or self._item2.is_dir():
-            changes.append(self._dir_string())
+            changes.append(self._dir_diff())
 
         if not (self._item1.get('deleted') or self._item2.get('deleted')):
-            changes.append(self._owner_string())
-            changes.append(self._mode_string())
+            changes.append(self._owner_diff())
+            changes.append(self._mode_diff())
+
+        # filter out empty changes
+        self._changes = [ch for ch in changes if ch] 
 
-        return ' '.join((x for x in changes if x))
+    def changes(self):
+        return self._changes
+
+    def __repr__(self):
+        if self.equal:
+            return 'equal'
+        return ' '.join(text for _, text in self._changes)
 
     def _equal(self, chunk_iterator1, chunk_iterator2):
         # if both are deleted, there is nothing at path regardless of what was deleted
@@ -461,46 +465,52 @@ class ItemDiff:
 
         return True
 
-    def _link_string(self):
+    def _link_diff(self):
         if self._item1.get('deleted'):
-            return 'added link'
+            return ({"type": 'added link'}, 'added link')
         if self._item2.get('deleted'):
-            return 'removed link'
+            return ({"type": 'removed link'}, 'removed link')
         if 'source' in self._item1 and 'source' in self._item2 and self._item1.source != self._item2.source:
-            return 'changed link'
+            return ({"type": 'changed link'}, 'changed link')
 
-    def _content_string(self):
+    def _content_diff(self):
         if self._item1.get('deleted'):
-            return ('added {:>13}'.format(format_file_size(self._item2.get_size())))
+            sz = self._item2.get_size()
+            return ({"type": "added", "size": sz}, 'added {:>13}'.format(format_file_size(sz)))
         if self._item2.get('deleted'):
-            return ('removed {:>11}'.format(format_file_size(self._item1.get_size())))
+            sz = self._item1.get_size()
+            return ({"type": "removed", "size": sz}, 'removed {:>11}'.format(format_file_size(sz)))
         if not self._can_compare_chunk_ids:
-            return 'modified'
+            return ({"type": "modified"}, "modified")
         chunk_ids1 = {c.id for c in self._item1.chunks}
         chunk_ids2 = {c.id for c in self._item2.chunks}
         added_ids = chunk_ids2 - chunk_ids1
         removed_ids = chunk_ids1 - chunk_ids2
         added = self._item2.get_size(consider_ids=added_ids)
         removed = self._item1.get_size(consider_ids=removed_ids)
-        return ('{:>9} {:>9}'.format(format_file_size(added, precision=1, sign=True),
-                                     format_file_size(-removed, precision=1, sign=True)))
-
-    def _dir_string(self):
+        return ({"type": "modified", "added": added, "removed": removed},
+            '{:>9} {:>9}'.format(format_file_size(added, precision=1, sign=True),
+            format_file_size(-removed, precision=1, sign=True)))
+ 
+    def _dir_diff(self):
         if self._item2.get('deleted') and not self._item1.get('deleted'):
-            return 'removed directory'
+            return ({"type": 'removed directory'}, 'removed directory')
         if self._item1.get('deleted') and not self._item2.get('deleted'):
-            return 'added directory'
+            return ({"type": 'added directory'}, 'added directory')
 
-    def _owner_string(self):
+    def _owner_diff(self):
         u_attr, g_attr = ('uid', 'gid') if self._numeric_owner else ('user', 'group')
         u1, g1 = self._item1.get(u_attr), self._item1.get(g_attr)
         u2, g2 = self._item2.get(u_attr), self._item2.get(g_attr)
         if (u1, g1) != (u2, g2):
-            return '[{}:{} -> {}:{}]'.format(u1, g1, u2, g2)
+            return ({"type": "owner", "old_user": u1, "old_group": g1, "new_user": u2, "new_group": g2},
+                    '[{}:{} -> {}:{}]'.format(u1, g1, u2, g2))
 
-    def _mode_string(self):
+    def _mode_diff(self):
         if 'mode' in self._item1 and 'mode' in self._item2 and self._item1.mode != self._item2.mode:
-            return '[{} -> {}]'.format(stat.filemode(self._item1.mode), stat.filemode(self._item2.mode))
+            mode1 = stat.filemode(self._item1.mode)
+            mode2 = stat.filemode(self._item2.mode)
+            return ({"type": "mode", "old_mode": mode1, "new_mode": mode2}, '[{} -> {}]'.format(mode1, mode2))
 
     def _content_equal(self, chunk_iterator1, chunk_iterator2):
         if self._can_compare_chunk_ids:
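
The `added` and `removed` numbers in a `modified` change come from set differences over chunk ids. A self-contained sketch of that computation follows. The `Chunk` tuple and `content_change` function are hypothetical stand-ins, and summing the sizes of the selected chunks is an assumption about what `get_size(consider_ids=...)` does.

```python
from collections import namedtuple

# Hypothetical stand-in for borg's chunk list entries: an id plus a size in bytes.
Chunk = namedtuple("Chunk", "id size")


def content_change(chunks1, chunks2):
    """Build a 'modified' change dict from two chunk lists."""
    ids1 = {c.id for c in chunks1}
    ids2 = {c.id for c in chunks2}
    added_ids = ids2 - ids1      # chunks only present in the second archive
    removed_ids = ids1 - ids2    # chunks only present in the first archive
    # Assumption: the size of a set of chunks is the sum of their sizes.
    added = sum(c.size for c in chunks2 if c.id in added_ids)
    removed = sum(c.size for c in chunks1 if c.id in removed_ids)
    return {"type": "modified", "added": added, "removed": removed}


old = [Chunk("a", 100), Chunk("b", 5)]
new = [Chunk("a", 100), Chunk("c", 17)]

if __name__ == "__main__":
    print(content_change(old, new))
```

Chunks shared by both versions contribute to neither number, so the result describes changed data rather than the difference in total file size.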

+ 78 - 0
src/borg/testsuite/archiver.py

@@ -4060,9 +4060,87 @@ class DiffArchiverTestCase(ArchiverTestCaseBase):
             if are_hardlinks_supported():
                 assert 'input/hardlink_target_replaced' not in output
 
+        def do_json_asserts(output, can_compare_ids):
+            def get_changes(filename, data):
+                chgsets = [j['changes'] for j in data if j['path'] == filename]
+                assert len(chgsets) < 2
+                # return a flattened list of changes for given filename
+                return [chg for chgset in chgsets for chg in chgset]
+
+            # convert output to list of dicts
+            joutput = [json.loads(line) for line in output.split('\n') if line]
+
+            # File contents changed (deleted and replaced with a new file)
+            expected = {'type': 'modified', 'added': 4096, 'removed': 1024} if can_compare_ids else {'type': 'modified'}
+            assert expected in get_changes('input/file_replaced', joutput)
+
+            # File unchanged
+            assert not any(get_changes('input/file_unchanged', joutput))
+
+            # Directory replaced with a regular file
+            if 'BORG_TESTS_IGNORE_MODES' not in os.environ:
+                assert {'type': 'mode', 'old_mode': 'drwxr-xr-x', 'new_mode': '-rwxr-xr-x'} in \
+                    get_changes('input/dir_replaced_with_file', joutput)
+
+            # Basic directory cases
+            assert {'type': 'added directory'} in get_changes('input/dir_added', joutput)
+            assert {'type': 'removed directory'} in get_changes('input/dir_removed', joutput)
+
+            if are_symlinks_supported():
+                # Basic symlink cases
+                assert {'type': 'changed link'} in get_changes('input/link_changed', joutput)
+                assert {'type': 'added link'} in get_changes('input/link_added', joutput)
+                assert {'type': 'removed link'} in get_changes('input/link_removed', joutput)
+
+                # Symlink replacing or being replaced
+                assert any(chg['type'] == 'mode' and chg['new_mode'].startswith('l') for chg in
+                    get_changes('input/dir_replaced_with_link', joutput))
+                assert any(chg['type'] == 'mode' and chg['old_mode'].startswith('l') for chg in
+                    get_changes('input/link_replaced_by_file', joutput))
+
+                # Symlink target removed. Should not affect the symlink at all.
+                assert not any(get_changes('input/link_target_removed', joutput))
+
+            # The inode has two links and the file contents changed. Borg
+            # should notice the changes in both links. However, the symlink
+            # pointing to the file is not changed.
+            expected = {'type': 'modified', 'added': 13, 'removed': 0} if can_compare_ids else {'type': 'modified'}
+            assert expected in get_changes('input/empty', joutput)
+            if are_hardlinks_supported():
+                assert expected in get_changes('input/hardlink_contents_changed', joutput)
+            if are_symlinks_supported():
+                assert not any(get_changes('input/link_target_contents_changed', joutput))
+
+            # Added a new file and a hard link to it. Both links to the same
+            # inode should appear as separate files.
+            assert {'type': 'added', 'size': 2048} in get_changes('input/file_added', joutput)
+            if are_hardlinks_supported():
+                assert {'type': 'added', 'size': 2048} in get_changes('input/hardlink_added', joutput)
+
+            # check if a diff between non-existent and empty new file is found
+            assert {'type': 'added', 'size': 0} in get_changes('input/file_empty_added', joutput)
+
+            # The inode has two links and both of them are deleted. They should
+            # appear as two deleted files.
+            assert {'type': 'removed', 'size': 256} in get_changes('input/file_removed', joutput)
+            if are_hardlinks_supported():
+                assert {'type': 'removed', 'size': 256} in get_changes('input/hardlink_removed', joutput)
+
+            # Another link (marked previously as the source in borg) to the
+            # same inode was removed. This should not change this link at all.
+            if are_hardlinks_supported():
+                assert not any(get_changes('input/hardlink_target_removed', joutput))
+
+            # Another link (marked previously as the source in borg) to the
+            # same inode was replaced with a new regular file. This should not
+            # change this link at all.
+            if are_hardlinks_supported():
+                assert not any(get_changes('input/hardlink_target_replaced', joutput))
+
         do_asserts(self.cmd('diff', self.repository_location + '::test0', 'test1a'), True)
         # We expect exit_code=1 due to the chunker params warning
         do_asserts(self.cmd('diff', self.repository_location + '::test0', 'test1b', exit_code=1), False)
+        do_json_asserts(self.cmd('diff', self.repository_location + '::test0', 'test1a', '--json-lines'), True)
 
     def test_sort_option(self):
         self.cmd('init', '--encryption=repokey', self.repository_location)