
"Tensor had NaN values" when encountering improbable label values #12

@obarnstedt

Hi and thanks for all the great work!
I just wanted to point out a potential problem we encountered while training DGP. With our dataset, we could successfully run the first 50k iterations of "DGP on labeled frames only", but then during "Running DGP" we encountered

tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had NaN values
[[{{node VerifyFinite/CheckNumerics}}]]

This occurs in line 818 of fitdgp.py:
[loss_eval, _] = sess.run([loss, train_op], feed_dict)
After some debugging, I traced the error to labeled frames in which some labels had accidentally been set far outside the normal range (DLC deletes markers set at x=0, y=0, but here they were accidentally at x=1, y=4; normally, our labels have x and y > 200). After removing these improbable labels, training continued normally.
It's great that we now had a chance to clean our training dataset, but it would be better if DGP could simply ignore such labels and emit a precise warning message to alert the user. Otherwise, it's quite hard for the user to figure out where the actual problem is.
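For illustration, a pre-training check along these lines might be enough. This is only a rough sketch and not DGP code: the function name `mask_improbable_labels`, the `min_xy` threshold, and the `(n_frames, n_markers, 2)` label layout are all assumptions on my side, as is the idea that the training code would treat NaN as a missing label.

```python
import warnings

import numpy as np


def mask_improbable_labels(labels, min_xy=10.0):
    """Replace labels near the image origin with NaN and warn about each one.

    labels: float array of shape (n_frames, n_markers, 2) holding x, y in pixels
            (assumed layout, not taken from DGP).
    min_xy: hypothetical threshold; a label counts as improbable when both
            its x and its y are below this value.
    """
    labels = np.array(labels, dtype=float)  # copy, so the caller's array is untouched
    x, y = labels[..., 0], labels[..., 1]
    # Comparisons with NaN are False, so labels that are already missing are skipped.
    suspicious = (x < min_xy) & (y < min_xy)
    for frame, marker in np.argwhere(suspicious):
        warnings.warn(
            f"Ignoring improbable label at frame {frame}, marker {marker}: "
            f"x={x[frame, marker]:g}, y={y[frame, marker]:g}"
        )
    labels[suspicious] = np.nan  # masks both x and y of each flagged label
    return labels, suspicious


# Example with one stray label at x=1, y=4, as in our dataset.
example = np.array([[[250.0, 310.0], [1.0, 4.0]],
                    [[260.0, 305.0], [240.0, 400.0]]])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cleaned, flagged = mask_improbable_labels(example)
```

A fixed threshold near the origin is the simplest possible rule; a per-marker check against the typical label position would probably be more robust, but the main point is that the message names the offending frame and marker.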
Thanks,
Oliver
