Use a local index and SHA-256 digests to track Kindle books #6

gfitzp · 2025-02-19T23:39:19Z

Made some updates to make my previous Kindle logic a bit more generic so the ETag comparison works with both Kindle and ePub formats, and saves a local cache file for both file types.

pbryan · 2025-02-23T20:22:14Z

Apologies, Glenn, I've been slammed this week. I'll be reviewing this today.

gfitzp · 2025-02-23T23:33:02Z

No worries, no rush!

pbryan

The most significant change I'm suggesting here is to drop the idea of tracking ETags and focus on file hashes, as they allow for more durable file tracking. It's not foolproof (a file could be rewritten by something else, changing its hash), but it will be far superior to tracking by file path.
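To illustrate why a content hash survives moves and renames, here is a minimal sketch using only the standard library. The function name `file_hexdigest` and the chunk size are assumptions for illustration, not code from this PR.

```python
import hashlib
from pathlib import Path


def file_hexdigest(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of the file's content."""
    digest = hashlib.sha256()
    with path.open("rb") as file:
        # Read in chunks so a large book is never held in memory all at once.
        while chunk := file.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest depends only on the bytes in the file, so renaming or moving a book leaves it unchanged, while any rewrite of the content produces a different value.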

.gitignore

sebsync.py

gfitzp · 2025-02-27T03:21:26Z

I think I've got everything taken care of, if you'd like to take another look when you get a chance! I tested it by locally renaming and deleting both .azw3 and .epub files, and it seems to work as I expected.

pbryan

Thanks for continuing to work on this. There are some more issues I've raised to consider. I hope you find this helpful.

pbryan · 2025-02-27T06:30:21Z

sebsync.py

 import xml.etree.ElementTree as ElementTree
 import zipfile

 from dataclasses import dataclass
 from datetime import datetime, timezone
 from pathlib import Path
+from platformdirs import *


This will need to be added as a dependency in pyproject.toml.

pbryan · 2025-02-27T06:32:27Z

sebsync.py

+            if not path.is_file():
+                continue
+            try:
+                if path.name in local_cache:


If the file were moved or renamed, how would it be located in the index? (See my comment below on line 171.) I suggest that the hash itself become the dictionary key (or alternatively, if you prefer a more structured approach, just a list of dicts with key-value pairs for each book, hexdigest being one of the keys as you're doing below.)
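A rough sketch of the digest-keyed index suggested above, assuming a JSON file on disk. The helper names (`load_index`, `lookup`) and the entry fields are hypothetical and only meant to show the lookup direction.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def file_hexdigest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (read whole, for brevity)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_index(index_path: Path) -> dict:
    """Load the index, keyed by hex digest; empty if the file is missing."""
    if not index_path.exists():
        return {}
    return json.loads(index_path.read_text())


def lookup(index: dict, path: Path) -> Optional[dict]:
    """Find a book's entry by its content, whatever its current name is."""
    return index.get(file_hexdigest(path))
```

Because the key is derived from the content, `lookup` finds the same entry after the file is moved or renamed, which a `path.name` key cannot do.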

pbryan · 2025-02-27T06:34:59Z

sebsync.py

                        path=path,
-                        modified=fromisoformat(modified.text),
+                        modified=fromisoformat(local_cache[path.name].get("modified")),


I suggest that the modification timestamp should be from the filesystem metadata. The other side of that coin is we should set the file modification time when we download it so that it can be compared in books_are_different, which already performs such a comparison.
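As a sketch of what that could look like: stamp the file after download, then read the time back from filesystem metadata. The function names `set_modified` and `get_modified` are hypothetical, and this assumes the remote modification time is already available as a timezone-aware `datetime`.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def set_modified(path: Path, modified: datetime) -> None:
    """Stamp the file with the given timezone-aware modification time."""
    timestamp = modified.timestamp()
    os.utime(path, (timestamp, timestamp))  # (access time, modification time)


def get_modified(path: Path) -> datetime:
    """Read the modification time back from filesystem metadata, in UTC."""
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
```

With this approach the index would not need to store a timestamp at all, since the file carries it.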

pbryan · 2025-02-27T06:40:49Z

sebsync.py

+        # first create a list of all the local titles
+        local_titles = {}
+
+        if options.type == "kindle":


Isn't it guaranteed to be kindle already from line 543 above?

pbryan · 2025-02-27T06:42:39Z

sebsync.py

+        if options.type == "kindle":
+            stored_files = options.books.glob("**/*.azw3")
+            downloaded_files = options.downloads.glob("**/*.azw3")
+        else:


This code will never execute.

pbryan · 2025-02-27T06:46:55Z

sebsync.py

+                            local_ebooks.append(local_ebook)
+                            local_cache[path.name] = local_cache[entry]
+                            local_cache.pop(entry, None)
+            except Exception as e:


I understand why to check for this exception in the case of EPUBs (corrupt zip file, unexpected internal files, unexpected XML structure). It's unclear to me when an exception would be raised for a Kindle book, especially as we never actually open it to process its content.
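For the EPUB case, one option is to catch only the failures named above rather than a bare `Exception`. This is a sketch under assumptions: `read_epub_root` and `EPUB_ERRORS` are hypothetical names, and the exception list may not cover everything sebsync encounters in practice.

```python
import xml.etree.ElementTree as ElementTree
import zipfile
from pathlib import Path
from typing import Optional

# Failures that reading an EPUB can plausibly raise: a corrupt archive,
# a missing internal file, or malformed XML.
EPUB_ERRORS = (zipfile.BadZipFile, KeyError, ElementTree.ParseError)


def read_epub_root(path: Path, member: str) -> Optional[ElementTree.Element]:
    """Parse an XML member of an EPUB; return None if the file is unusable."""
    try:
        with zipfile.ZipFile(path) as archive:
            with archive.open(member) as file:
                return ElementTree.parse(file).getroot()
    except EPUB_ERRORS:
        return None
```

A Kindle file that is only hashed, never parsed, would not pass through any of these failure paths, so anything else raised there would surface instead of being silently reported as unknown.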

pbryan · 2025-02-27T06:48:39Z

sebsync.py

                    )
                    local_ebooks.append(local_ebook)
-        except Exception:
-            echo_status(path, Status.UNKNOWN)
+                else:


OK, so I see the attempt to use the hash here. It seems simpler to just do this and not even try to index by path name, which I would expect, in a majority of cases, to never match because the file was moved or renamed.

pbryan · 2025-02-27T06:51:56Z

.gitignore

@@ -9,3 +9,4 @@ dist/
 .vscode
 .venv
 prof/
+


An empty line?

pbryan · 2025-02-27T06:53:18Z

sebsync.py

 import xml.etree.ElementTree as ElementTree
 import zipfile

 from dataclasses import dataclass
 from datetime import datetime, timezone
 from pathlib import Path
+from platformdirs import *


Please, no wildcard imports.

pbryan · 2025-02-27T06:55:05Z

sebsync.py

@@ -184,6 +239,16 @@ def download_ebook(url: str, path: Path, status: str) -> None:
        for chunk in response.iter_content(chunk_size=1 * 1024 * 1024):
            file.write(chunk)
    download.replace(path)
+    return response.headers


Why return headers?

gfitzp · 2025-02-28T16:36:12Z

Thanks; yes, the feedback is helpful. I'm also kicking myself for not catching some basic errors. I apologize if there was more hand-holding than you might have anticipated as programming is more of a hobby for me than a profession. There's also a lot more functionality within sebsync than I anticipated, simply because I wasn't using the application the same way: I was just downloading from SE and keeping my local directory up to date with the current SE library, and used that directory to import into Calibre, and wasn't trying to commingle the sebsync library with my working one. I'll be busy for the next few days, but I'll take a look at these issues when I get a chance.

gfitzp · 2025-03-08T02:12:49Z

Thanks again for your help and patience! I realized I was overthinking things a bit so I started anew rather than play whack-a-mole, and looked more into uv, and hopefully this attempt might be a bit better. Let me know if further changes might be necessary.

Glenn Fitzpatrick added 3 commits February 18, 2025 21:14

Added Kindle support via cache file

32e21a6

Formatted code with Black

ea6c497

Add ETag comparison for both Kindle and ePub files

5912d47

gfitzp mentioned this pull request Feb 19, 2025

Support Kindle ebooks #2

Open

Prevent modifications when using --dry-run

971e8d2

pbryan requested changes Feb 25, 2025

Glenn Fitzpatrick added 9 commits February 26, 2025 16:57

Formatted with ruff

70d3206

Remove eTag functionality

8bb8f6e

Use platformdirs to save cache index

0a9e7c2

Use platformdirs to save cache index

aa82e57

Merge branch 'etag' of github.com:gfitzp/sebsync into etag

8dbddcb

Restore HEAD request to proper spot in books_are_different

0b7b54c

Use JSON instead of pickle

409a5d3

Convert date modified from ISO format when creating local_ebooks

d7fb640

Use cache index and hexdigests only for Kindle files

e8c473a

Glenn Fitzpatrick added 2 commits February 26, 2025 22:25

One last Kindle-centric fix

03cec4e

Errant debug statement

e57bb5a

gfitzp changed the title ~~Added logic for Kindle ebooks and using ETag comparison when determining if local and remote files are the same~~ Use a local index and SHA-256 digests to track Kindle books Feb 27, 2025

pbryan requested changes Feb 27, 2025

Glenn Fitzpatrick added 2 commits March 7, 2025 20:59

Keep track of Kindle files via hexdigests

85aded6

Merge branch 'hexdigest' into etag

6c35a15

# Conflicts: # sebsync.py
