Bug: Unordered Unique Variables lead to non-deterministic data sampling despite seeding #30

@lfrommelt

Description

To reproduce, run the following example, restart the runtime, and run it again. In this two-variable example there is a 50% chance that the variables are ordered differently between runs, which leads to different evaluations:

from equation_tree import EquationTree
from sympy import symbols
import numpy as np

# Get an arbitrary EquationTree object with at least two variables
x1, x2 = symbols('x1 x2')
expr = x1**x2
equation = EquationTree.from_sympy(expr, variable_test=lambda x: "x" in x)

# Setting a global seed for numpy makes sure that we re-sample the same "crossings"
np.random.seed(10)

# However, the order of variables_unique can differ between runs, because it is built from a set, which is an unordered data type
print(equation.variables_unique)

# Hence the re-sampled values for x1 and x2 might be swapped, changing the evaluation result as well
print(equation.get_evaluation(num_samples=2))
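
Why restarting the runtime matters can be shown without EquationTree at all. The following is a minimal sketch, assuming the cause is Python's per-process string hash randomization (controlled by the `PYTHONHASHSEED` environment variable), which `np.random.seed` does not touch. The helper `run_child` and the two child snippets are made up for illustration:

```python
import os
import subprocess
import sys

# Child programs: iterate over a set of variable names, raw and sorted.
CHILD_SET = "print(list({'x1', 'x2'}))"
CHILD_SORTED = "print(sorted({'x1', 'x2'}))"


def run_child(code: str, hash_seed: int) -> str:
    """Run `code` in a fresh interpreter with a fixed string-hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Each hash seed stands in for one "restart of the runtime".
raw = [run_child(CHILD_SET, seed) for seed in range(8)]
fixed = [run_child(CHILD_SORTED, seed) for seed in range(8)]

print("orders seen when iterating the set:", set(raw))
print("orders seen after sorted():", set(fixed))
```

Iterating the raw set may print either order depending on the hash seed, while the sorted version prints the same list in every process.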

To make it work for me, I wrapped the return value of the variables_unique property in a call to sorted() (line 555 in tree.py). If that is not intended, because the property should keep returning the original set type, then at least the argument of enumerate in line 1370 should be sorted.
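
The idea behind that workaround can be sketched roughly as follows. This is not the code from tree.py: `TreeSketch` and its `sample` method are hypothetical stand-ins, and the standard library's `random.Random` is used instead of numpy only to keep the example self-contained.

```python
import random


class TreeSketch:
    """Hypothetical stand-in for EquationTree, not the library's real class."""

    def __init__(self, variables):
        self.variables = list(variables)

    @property
    def variables_unique(self):
        # sorted() returns a list whose order depends only on the names,
        # not on the hash-dependent iteration order of the set.
        return sorted(set(self.variables))

    def sample(self, seed):
        # Values are drawn in the order of variables_unique, so a stable
        # order means each variable gets the same value for the same seed.
        rng = random.Random(seed)
        return {name: rng.random() for name in self.variables_unique}


tree = TreeSketch(["x2", "x1", "x2"])
first = tree.sample(seed=10)
second = tree.sample(seed=10)

print(tree.variables_unique)
print(first == second)
```

Note that sorting by name is lexicographic, so "x10" would come before "x2". That is still deterministic, which is all that matters for reproducibility here.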

This one was extremely hard to find, partly because I had basically the same bug in my own script as well (on sympy's free_symbols, which is a set just like in EquationTree), giving a very confusing 25% chance of successfully replicating my experiments xD

Cheers,
Leonard
