'check if file already in db by URLs' does not work #1667
Comments
Thank you for this report. I am not sure if this is a bug, but I think I can say there is bad user feedback on what is going on here.

The 'have we seen this file before' logic in the downloader can get pretty tricky. There are a bunch of situations where we cannot be confident in the result, and here the system generally falls back to 'I do not know for sure, so we'll let the download go ahead'. An example of this is when a URL that has an ostensible match to a file also has matches to other files. These duplicate mappings can be added in various ways: a merge in the duplicates system, a booru that suggests an incorrect 'source URL', or, as in your case, a manual copy. In this case, hydrus has good evidence that the URL-mapping it matched is not definite, and that suspicion was indeed correct--the URL would result in downloading a new file for which it has no delete record, so it imports it. Had 'clear deletion record' not been set, I think the download would either have fetched a hash and matched it to 'previously deleted', or downloaded the file again, calculated the hash locally, and come to the same result.

There is a similar logical exception for a file that has two refer-capable URLs within the same domain, e.g. somebooru.net/123 and somebooru.net/567--hydrus is not certain which URL or file mapping is correct here, so it discards that URL as a potential to rely on.

I will remove the bug label, but this is a good place to say that I should have some UI, let's say somewhere in the file log, that can record or otherwise better explain the logic behind the downloader engine's decisions here. Maybe I can set a 'note' for odd situations like this, despite them coming up as 'success' in the end.
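To illustrate the shape of that decision, here is a simplified sketch--not the actual hydrus source--where the two lookup tables are hypothetical stand-ins for the client's file-url mapping store:

```python
# Simplified sketch, not the actual hydrus source. url_to_hashes and
# hash_to_urls are hypothetical stand-ins for the file-url mapping store.
from urllib.parse import urlparse

def trusted_url_match(url, url_to_hashes, hash_to_urls):
    """Return the matched file hash if the mapping looks definite, else None."""
    hashes = url_to_hashes.get(url, set())
    if len(hashes) != 1:
        # The URL maps to zero or several files: the mapping is ambiguous,
        # so fall back to 'I do not know for sure, let the download go ahead'.
        return None
    (matched_hash,) = hashes
    domain = urlparse(url).netloc
    same_domain = [u for u in hash_to_urls.get(matched_hash, set())
                   if urlparse(u).netloc == domain]
    if len(same_domain) > 1:
        # e.g. somebooru.net/123 and somebooru.net/567 both mapped to one
        # file: hydrus cannot tell which is correct, so discard the match.
        return None
    return matched_hash
```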
Oh--I should say, if you want to build a workflow out of this sort of conversion, I recommend you move your URLs rather than copying them. I think that will preserve the 1-to-1 nature of your file-url mapping store, even if it doesn't actually reflect the reality of which files are at those URL endpoints. I did look at some advanced logic that would navigate this situation of n-mapped URLs when the n files are actually duplicates, but it wasn't a trivial problem to solve, and this stuff can get complicated, so I stepped back for KISS reasons. I will likely revisit it seriously when we get to more automated duplicate merging and en masse file conversion tech.
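If you are scripting this, a minimal sketch of that 'move' using the Client API's /add_urls/associate_url endpoint might look like the following. The port, access key, and hashes are placeholders; check the parameter names against the Client API documentation for your version.

```python
# Minimal sketch, assuming the Client API's /add_urls/associate_url endpoint.
# API address, access key, and the example hashes are placeholders.
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "YOUR_ACCESS_KEY"}

def move_url(url: str, from_hash: str, to_hash: str) -> None:
    # Delete the URL mapping from the old file...
    requests.post(f"{API}/add_urls/associate_url", headers=HEADERS,
                  json={"hash": from_hash, "url_to_delete": url}).raise_for_status()
    # ...then add it to the new file, keeping the mapping 1-to-1.
    requests.post(f"{API}/add_urls/associate_url", headers=HEADERS,
                  json={"hash": to_hash, "url_to_add": url}).raise_for_status()

# Example call (placeholder hashes):
move_url("https://danbooru.donmai.us/posts/1",
         from_hash="<sha256 of old file>", to_hash="<sha256 of new file>")
```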
I have read the API documentation in detail again and done some tests. Anyway, thanks for your work. I have already managed over 1.2 TB of files with this; the detailed API documentation makes my work easier, and this is definitely the best file manager I have ever used.
Great--that sounds good. I am really glad you like my program! Let me know if you run into any more trouble. |
Hydrus version
v606
Qt major version
Qt 6
Operating system
Windows 11
Install method
Extract
Install and OS comments
No response
Bug description and reproduction
In my use case, I need to use URLs to check for duplicates,
but the image was downloaded again although another image already has the same URL.
A simple way to reproduce the bug:
It downloaded again although the modified file has "https://danbooru.donmai.us/posts/1" in its URLs.
Step 4 is important; without it, the bug does not occur.
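For anyone checking this themselves, the Client API can report what the client currently believes about a URL, which makes the ambiguous mapping visible before an import. A sketch, assuming the /add_urls/get_url_files endpoint; the access key and port are placeholders:

```python
# Sketch: ask the client what it already knows about a URL before importing.
# Assumes GET /add_urls/get_url_files; access key and port are placeholders.
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "YOUR_ACCESS_KEY"}

r = requests.get(f"{API}/add_urls/get_url_files", headers=HEADERS,
                 params={"url": "https://danbooru.donmai.us/posts/1"})
r.raise_for_status()
for status in r.json().get("url_file_statuses", []):
    # More than one entry means the URL maps to multiple files, which is
    # the ambiguous case the downloader refuses to trust.
    print(status)
```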
Log output