
'check if file already in db by URLs' does not work #1667

Open
444man opened this issue Jan 24, 2025 · 4 comments
Labels
feature-request system:user-interface Looks and actions of the user interface

Comments

@444man

444man commented Jan 24, 2025

Hydrus version

v606

Qt major version

Qt 6

Operating system

Windows 11

Install method

Extract

Install and OS comments

No response

Bug description and reproduction

In my use case, I need to use URLs to check for duplicates,
but an image is downloaded again even though another image already has the same URL.

A simple way to reproduce the bug:

  1. download "https://danbooru.donmai.us/posts/1" with the url downloader
  2. modify the file in an external program (so its hash changes) and import it to hydrus
  3. copy all URLs of the old file to the modified file
  4. choose the old file, then [delete], then [delete physically now], then [clear deletion record]
  5. download "https://danbooru.donmai.us/posts/1" again with the url downloader

The file is downloaded again although the modified file has "https://danbooru.donmai.us/posts/1" in its URLs.
Step 4 is important; without it, the bug does not occur.

Log output


@444man 444man added the bug label Jan 24, 2025
@hydrusnetwork
Owner

Thank you for this report. I am not sure if this is a bug, but I think I can say there is bad user feedback on what is going on here.

The 'have we seen this file before' logic in the downloader can get pretty tricky. There's a bunch of situations where we cannot be confident in the result, and here the system generally falls back to 'I do not know for sure, so we'll let the download go ahead'. An example of this is when a URL that has an ostensible match to a file also has matches to other files. These duplicate mappings can be added by various means, either merging in the duplication system, or a booru that suggests an incorrect 'source URL', or, as in your case, a manual copy. In this case, hydrus has good evidence that the URL mapping it matched is not definite, and in your case it was indeed correct--the URL would result in downloading a new file for which it has no delete record, and so it imports it. Had the deletion record not been cleared, I think the download would have either fetched a hash and matched it to 'previously deleted', or it would have downloaded the file again, calculated the hash locally, and then come to the same result.

There is a similar logical exception for a file that has two refer-capable URLs within the same domain, e.g. somebooru.net/123 and somebooru.net/567--hydrus is not certain about which URL or file mapping is correct here, so it discards that URL as something to rely on.
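
To put those two exceptions into rough pseudocode (just an illustration of the behaviour described above, not the actual hydrus source; all names here are invented for clarity):

```python
# Simplified illustration of the two 'do not trust the URL match' exceptions
# described above. This is not hydrus code; the names are made up for clarity.

def can_trust_url_match(url, matched_file, files_for_url, domain_urls_for_file):
    # Exception 1: the URL maps to more than one file (duplicate merging, a bad
    # 'source URL' from a booru, a manual copy, ...), so the mapping is not
    # definite and the download is allowed to go ahead.
    if len(files_for_url[url]) > 1:
        return False

    # Exception 2: the matched file holds two refer-capable URLs within the same
    # domain (e.g. somebooru.net/123 and somebooru.net/567), so hydrus cannot
    # tell which mapping is correct and discards this URL as something to rely on.
    if len(domain_urls_for_file[matched_file]) > 1:
        return False

    return True
```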

I will remove the bug label, but this is a good place to say that I should have some UI, let's say somewhere in the file log, that can record or otherwise better explain the logic behind the downloader engine's decisions here. Maybe I can set a 'note' for odd situations like this, despite them coming up as 'success' in the end.

@hydrusnetwork hydrusnetwork added feature-request system:user-interface Looks and actions of the user interface and removed bug labels Jan 25, 2025
@hydrusnetwork
Owner

Oh--I should say, if you want to build a workflow out of this sort of conversion, I recommend you move your URLs rather than copying them. I think that'll preserve the 1-to-1 nature of your file-URL mapping store, even if it doesn't reflect the reality of which files are actually at those URL endpoints.

I did look at some advanced logic that would navigate this situation of n-mapped-urls when the n files are actually duplicates, but it wasn't a trivial problem to solve, and this stuff can get complicated, so I stepped back for KISS reasons. I will likely revisit this seriously when we get to more automated duplicate merging and en masse file conversion tech.

@444man
Author

444man commented Jan 26, 2025

I have read the API documentation in detail again and done some tests.
GET /add_urls/get_url_files returns the status of both the old file and the new file, so I think that is the reason.
POST /add_urls/associate_url saved me. After POST /add_files/clear_file_deletion_record, I send all of the old file's URLs for deletion via POST /add_urls/associate_url, and then the workflow runs as I wish.
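
For anyone wanting to script the same thing, a minimal sketch of that API workflow could look like this (my assumptions: a local client on the default port 45869, a placeholder access key, and the parameter names as I read them in the Client API docs; adjust to your own setup):

```python
# Minimal sketch of the workflow described above. Assumptions: local client on the
# default port 45869, a placeholder access key, and Client API parameter names as
# documented.
import requests

API = 'http://127.0.0.1:45869'
HEADERS = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY'}  # placeholder key

def url_statuses(url):
    # GET /add_urls/get_url_files reports every file the client knows for this URL,
    # which is how both the old and the new file show up.
    r = requests.get(f'{API}/add_urls/get_url_files', params={'url': url}, headers=HEADERS)
    r.raise_for_status()
    return r.json().get('url_file_statuses', [])

def forget_old_file(old_hash, urls):
    # Clear the old file's deletion record...
    requests.post(f'{API}/add_files/clear_file_deletion_record', headers=HEADERS,
                  json={'hashes': [old_hash]}).raise_for_status()
    # ...then strip its URL associations so only the modified file keeps those URLs.
    requests.post(f'{API}/add_urls/associate_url', headers=HEADERS,
                  json={'hashes': [old_hash], 'urls_to_delete': urls}).raise_for_status()
```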

Anyway, thanks for your work. I already manage over 1.2 TB of files with hydrus; the detailed API documentation makes my work easier, and this is definitely the best file manager I have ever used.

@hydrusnetwork
Owner

Great--that sounds good. I am really glad you like my program! Let me know if you run into any more trouble.
