demo dataset added to labelstudio on every run #126

Open
themattinthehatt opened this issue Dec 9, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@themattinthehatt
Collaborator

Another thing I noticed is that whenever I restart the app, it spawns an additional "mouse mirror" project in label studio. As if it's adding to the label studio database every time it is restarted.

Originally posted by @hummuscience in #125 (comment)

@themattinthehatt
Collaborator Author

I haven't seen that before - here is the logic that checks to see if the demo dataset has been imported:

Pose-app/app.py, line 116 in 3942b2c:

    # check to see if the demo dataset has already been imported

Do you see the demo dataset anywhere in ....Pose-app/data/labelstudio_db/export/{proj_id}-info.json files? The demo dataset should have an associated json with project title "mirror-mouse-example" and the "task_number" field should be >=90, indicating all labeled frames were uploaded.
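For anyone who wants to run the same kind of check by hand, here is a minimal sketch. It is not the code from app.py; the function name `demo_already_imported` and its parameters are hypothetical, and it assumes each `*-info.json` keeps the title and task count under a top-level `"project"` key, as in the file shared later in this thread.

```python
import json
from pathlib import Path


def demo_already_imported(export_dir, title="mirror-mouse-example", min_tasks=90):
    """Return True if any *-info.json in export_dir describes a fully imported demo project.

    Hypothetical helper: the JSON layout is assumed from the info file posted in this thread.
    """
    for info_path in sorted(Path(export_dir).glob("*-info.json")):
        try:
            info = json.loads(info_path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or partially written files
        if not isinstance(info, dict):
            continue  # unexpected layout, ignore this file
        project = info.get("project", {})
        if project.get("title") == title and project.get("task_number", 0) >= min_tasks:
            return True
    return False
```

Calling `demo_already_imported("data/labelstudio_db/export")` from inside the Pose-app directory should return True if at least one info file matches; if it returns False even though the demo project is visible in Label Studio, that would point at the info files rather than the check.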

@hummuscience

[Screenshot 2024-12-09 at 18 45 10]

I just restarted the app and this is what I see.

Inside the mentioned folder it looks like this:

-rw-r--r--. 1 abdelhaym departmentn5 348K Dec  9 18:44 project-2-at-2024-12-09-17-44-0a980389.json
-rw-r--r--. 1 abdelhaym departmentn5  505 Dec  9 18:44 project-2-at-2024-12-09-17-44-0a980389-info.json
-rw-r--r--. 1 abdelhaym departmentn5 348K Dec  9 18:45 project-2-at-2024-12-09-17-45-0a980389.json
-rw-r--r--. 1 abdelhaym departmentn5  505 Dec  9 18:45 project-2-at-2024-12-09-17-45-0a980389-info.json
-rw-r--r--. 1 abdelhaym departmentn5 348K Dec  9 18:46 project-2-at-2024-12-09-17-46-0a980389.json
-rw-r--r--. 1 abdelhaym departmentn5  505 Dec  9 18:46 project-2-at-2024-12-09-17-46-0a980389-info.json
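A quick way to summarize a folder like this is to count export snapshots per project id. The sketch below is only an illustration: the function name `count_exports_by_project` is hypothetical, and the filename pattern is inferred from the listing above, not taken from Label Studio documentation.

```python
import re
from collections import Counter

# Pattern inferred from the listing above (an assumption):
# project-<id>-at-<timestamp>-<hash>.json, with an optional -info suffix
EXPORT_NAME = re.compile(r"^project-(\d+)-at-[\d-]+-[0-9a-f]+(-info)?\.json$")


def count_exports_by_project(filenames):
    """Count export snapshots per project id, using one -info.json file per snapshot."""
    counts = Counter()
    for name in filenames:
        match = EXPORT_NAME.match(name)
        if match and match.group(2):  # count each snapshot once, via its -info file
            counts[int(match.group(1))] += 1
    return counts
```

Passing `os.listdir(...)` of the export folder gives a per-project tally. For the six files listed above, the result is a count of 3 for project id 2, so every snapshot in this listing belongs to the same project id.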

@themattinthehatt
Collaborator Author

If you open up the *-info.json files do you see two in there for mirror-mouse-example?

I haven't run into this issue before locally or on the cloud, I wonder if running on your network is part of the issue? Though I'm not sure off the top of my head why that would be any different as far as the creation of this LS demo dataset is concerned.

@hummuscience

These are the contents of the -info file:

{"project":{"title":"mirror-mouse-example","id":2,"created_at":"2024-12-09T17:44:31Z","created_by":"user@localhost","task_number":90,"annotation_number":90},"platform":{"version":"1.13.1+0.gd9b816a.dirty"},"download":{"GET":{"exportType":["JSON"],"download_all_tasks":["False"],"download_resources":["False"]},"time":"2024-12-10T11:51:47Z","result_filename":"\/home\/abdelhaym\/Pose-app\/data\/labelstudio_db\/export\/project-2-at-2024-12-10-11-51-0a980389.json","md5":"0a9803897066fb505806d07141f65196"}}

Is it normal that Pose App keeps creating more and more of these files? There are now 2k files in the folder since yesterday.

@themattinthehatt
Collaborator Author

No, definitely not normal - on startup the app checks all the *-info files, and if it finds one with the title "mirror-mouse-example" and "task_number" >= 90 it should skip that step. Not sure why the file you shared above isn't triggering that. One hack for now would be to update the "import_demo_dataset" function in app.py, which looks like this:

    def import_demo_dataset(self, src_dir_abs, dst_dir_abs):

        """NOTE
        This is an ugly solution. Previously this function was called from the app constructor,
        which required label studio to be started inside the constructor as well. This led to
        issues with ports. Therefore this import needs to happen in the app's run method.
        However, this means that various parts of this function will execute several times before
        it is finished. Furthermore, this function runs *every* time the app is called.
        """

        if self.import_demo_count > 0:
            return

to this:

    def import_demo_dataset(self, src_dir_abs, dst_dir_abs):

        """NOTE
        This is an ugly solution. Previously this function was called from the app constructor,
        which required label studio to be started inside the constructor as well. This led to
        issues with ports. Therefore this import needs to happen in the app's run method.
        However, this means that various parts of this function will execute several times before
        it is finished. Furthermore, this function runs *every* time the app is called.
        """
        return  # temporary workaround: skip the demo import entirely
        if self.import_demo_count > 0:
            return

(i.e. don't run this function)

Not a pretty fix but should work. I can try looking into the issue on my side but I haven't seen this behavior before so might be a bit tricky. Can you run git rev-parse HEAD inside the Pose-app repo and tell me which commit you're on?

themattinthehatt added the bug label Dec 16, 2024

@olivier-cuttlefish

I actually have the same issue, my dataset page is flooded with the demo dataset.
git rev-parse HEAD returned the following: edfe0e5ef825611d791f82151cc71efd22840eac

What would be an easy way to clear out all those duplicates without reinstalling the app?

Cheers

@themattinthehatt
Collaborator Author

@olivier-cuttlefish yes, you can delete Label Studio projects manually, but it's a bit of a pain - you have to click through a couple of menus for each duplicate:
click the three dots in the upper right corner of the project tile > Settings > Danger Zone > Delete Project
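If there are many duplicates, it can help to list which project ids to delete before clicking through the menus. The sketch below is hypothetical: `duplicate_project_ids` is not part of Pose-app or Label Studio, it assumes you already have a list of project records with "id" and "title" fields (the same fields that appear in the -info.json "project" object), and it assumes the lowest id is the original to keep.

```python
def duplicate_project_ids(projects, title="mirror-mouse-example"):
    """Return ids of projects with the given title, excluding the lowest id.

    Assumption: the project with the lowest id is the original and should be kept.
    """
    ids = sorted(p["id"] for p in projects if p.get("title") == title)
    return ids[1:]  # everything after the first (lowest) id is treated as a duplicate
```

The returned ids are only candidates; deleting them still goes through the Label Studio menus described above.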

Can you let me know what platform you're on? Linux vs WSL, local or cloud? I'm not able to reproduce the issue on my end so I'll have to sort that out before I can work on a bug fix.

Thanks for sharing your HEAD id; you're pretty up-to-date, so there's no issue with out-of-date code being the culprit.

@themattinthehatt
Collaborator Author

also linking my reply from above, you can just add a single "return" statement in app.py to keep this behavior from happening until I get a more permanent bugfix
#126 (comment)


3 participants