TF test fails on Xavier platform #207

Open
dellaert opened this issue Nov 8, 2020 · 5 comments

Comments

@dellaert
Member

dellaert commented Nov 8, 2020

There was one test in feature/box_scaling that failed for me on Xavier NX:

Test Case 'TensorFlowMatrixTests.testConcat' passed (4.669 seconds)
Test Case 'TensorFlowMatrixTests.test_log' started at 2020-11-07 19:45:31.242
/home/dellaert/git/SwiftFusion/Tests/SwiftFusionTests/Core/TensorFlowMatrixTests.swift:38: error: TensorFlowMatrixTests.test_log : XCTAssertTrue failed - value mismatch:
[              -inf,                0.0, 0.6931471805599453, 1.0986122886681098,
 1.3862943611198906, 1.6094379124341003]
is not equal to
[              -inf,                0.0, 0.6931471805599453, 1.0986122886681098,
 1.3862943611198906, 1.6094379124341003]
with accuracy 1e-08
Test Case 'TensorFlowMatrixTests.test_log' failed (0.054 seconds)

Might be related to the infinity?

@dellaert
Member Author

dellaert commented Nov 8, 2020

PS: it passes fine on Mac.

@ProfFan
Collaborator

ProfFan commented Nov 8, 2020

Test Case 'TensorFlowMatrixTests.test_log' started at 2020-11-08 14:43:21.650
/workspaces/SwiftFusion/Tests/SwiftFusionTests/Core/TensorFlowMatrixTests.swift:38: error: TensorFlowMatrixTests.test_log : XCTAssertTrue failed - value mismatch:
[              -inf,                0.0, 0.6931471805599453, 1.0986122886681098,
 1.3862943611198906, 1.6094379124341003]
is not equal to
[              -inf,                0.0, 0.6931471805599453, 1.0986122886681098,
 1.3862943611198906, 1.6094379124341003]

It fails for me on Linux as well. I think it is very likely related to CUDA.

CC @marcrasi @BradLarson is this precision-related?

@BradLarson

That test is trying to take log(0), which seems like a bad thing to be testing for. I wouldn't be surprised to have that break in different ways on different platforms. Maybe having the range start at 1 would be a safer lower value?
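
A minimal sketch of that suggestion in plain Swift, assuming the test's only goal is to check log values elementwise. The helper `allClose` and the variable names are hypothetical and are not taken from SwiftFusion or its test file; the point is only that starting the inputs at 1 keeps `-inf` out of the comparison.

```swift
import Foundation

// Hypothetical helper (not SwiftFusion's assertion): elementwise
// comparison of two arrays within an absolute tolerance.
func allClose(_ a: [Double], _ b: [Double], accuracy: Double) -> Bool {
    guard a.count == b.count else { return false }
    return zip(a, b).allSatisfy { abs($0 - $1) <= accuracy }
}

// Inputs start at 1 instead of 0, so log never receives 0
// and no infinite value enters the comparison.
let inputs: [Double] = (1...6).map { Double($0) }
let actual: [Double] = inputs.map { log($0) }
let expected: [Double] = [0.0, 0.6931471805599453, 1.0986122886681098,
                          1.3862943611198906, 1.6094379124341003,
                          1.791759469228055]

print(allClose(actual, expected, accuracy: 1e-8))
```

With finite inputs only, the tolerance check no longer depends on how a given platform or backend handles infinities.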

@ProfFan
Collaborator

ProfFan commented Nov 10, 2020

I think it is still perfectly valid to assume -inf == -inf, since that is guaranteed by the IEEE 754 standard. I could be wrong, so please correct me if so :)
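
Both points can hold at once. The sketch below shows, in plain Swift, that `-inf == -inf` is true under IEEE 754, while a tolerance check that subtracts first yields NaN and fails. Whether the failing assertion in this test actually compares values that way is an assumption; the thread does not confirm the root cause. `isClose` is a hypothetical helper, not an existing SwiftFusion or TensorFlow function.

```swift
let negInf = -Double.infinity

// IEEE 754 equality: infinities of the same sign compare equal.
let exactlyEqual = (negInf == negInf)            // true

// A tolerance check typically subtracts first. (-inf) - (-inf) is NaN,
// and every ordered comparison involving NaN is false.
let difference = negInf - negInf
let withinTolerance = abs(difference) <= 1e-8    // false

// Hypothetical comparison that checks exact equality before the
// tolerance, so matching infinities are accepted.
func isClose(_ a: Double, _ b: Double, accuracy: Double) -> Bool {
    return a == b || abs(a - b) <= accuracy
}

print(exactlyEqual, difference.isNaN, withinTolerance)
print(isClose(negInf, negInf, accuracy: 1e-8))
```

A comparison of this shape accepts equal infinities but still rejects NaN, since NaN is not equal to itself.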

@dellaert
Member Author

I agree with @BradLarson
