Skip to content

Conversation

@chaitanyak175
Copy link

@chaitanyak175 chaitanyak175 commented Oct 22, 2025

Motivation and context

The issue addressed is #8076, which describes a problem where image names could conflict during dataset export when multiple images share the same name across tasks or jobs.
This caused overwriting or inconsistent naming during export.

This PR modifies the mangle_image_name function to handle duplicate image names more gracefully:

  • Returns the original filename for the first occurrence.
  • If a duplicate name is found and task_id is provided, the filename now includes -task_{task_id}.
  • If the name is still taken, a numeric suffix (_1, _2, etc.) is appended.

This ensures consistent and conflict-free exports, especially in multi-task environments.

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (#8076)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

@sonarqubecloud
Copy link

names[(subset, name_base)] += 1
return osp.extsep.join([name_base, ext])

if task_id is not None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unnecessary complex. Why not just append the task id if it exists and keep the original logic?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I'll update the codebase.

@alexyao2015
Copy link

This PR combined with the description seems to be fully written by an LLM. If you write things like that, you should be reviewing it before forcing others to do so

@chaitanyak175
Copy link
Author

Sorry won't happen again I'll manually write the code and send a PR

Copy link
Contributor

@zhiltsov-max zhiltsov-max left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sending the PR, please check the comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid changing the file directly, there is an explanatory comment in the beginning of the file.


def mangle_image_name(name: str, subset: str, names: defaultdict[tuple[str, str], int]) -> str:
name, ext = name.rsplit(osp.extsep, maxsplit=1)
def mangle_image_name(name: str, subset: str, names: defaultdict[tuple[str, str], int], task_id: Optional[int] = None) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def mangle_image_name(name: str, subset: str, names: defaultdict[tuple[str, str], int], task_id: Optional[int] = None) -> str:
def mangle_image_name(name: str, subset: str, names: defaultdict[tuple[str, str], int], *, task_id: int) -> str:

Comment on lines +2037 to +2040
if not names[(subset, mangled_base)]:
names[(subset, name_base)] += 1
names[(subset, mangled_base)] += 1
return osp.extsep.join([mangled_base, ext])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block can probably be merged with the while body below.


if not names[(subset, mangled_base)]:
names[(subset, name_base)] += 1
names[(subset, mangled_base)] += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should probably be done in the while block below as well.


i = 1
while i < sys.maxsize:
mangled_base = f"{name_base}_{names[(subset, name_base)]}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable can discard the mangled_base defined above, effectively restoring the old behavior. Consider doing mangled_name = f"{mangled_base}_{names[(subset, name_base)]}" instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a test for this problem with export, it should probably be added in tests/python/rest_api/test_tasks.py.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants