
BayesDB unpredictably forgets a model learned with the Loom backend #591

@versar


On multiple occasions, I have built a CrossCat model from some data using the Loom backend and then successfully run a variety of queries. Then, without warning, bayeslite starts behaving as if the model had never been trained with ANALYZE. Some commands fail (such as SIMULATE and further ANALYZE) while others still work (such as ESTIMATE DEPENDENCE and ESTIMATE SIMILARITY). This is quite frustrating, especially when ANALYZE had already run for a long time and the CrossCat model looked fine on multiple queries. The only fix I have found is to delete the .bdb file and start from scratch.

Error with ANALYZE
To demonstrate that the generator exists for the purpose of this issue report, I tried %mml INITIALIZE 16 MODELS FOR <generator>. In the notebook I was using, <generator> = 'glove_loom'. As expected when the generator exists, I get the error Generator 'glove_loom' already has models: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]. I had already been working with this .bdb file for multiple days using a number of different queries, after training the generator on a large data set using ANALYZE.

When I ran ANALYZE after verifying as above that the generator exists, it failed. The command was %mml ANALYZE "glove_loom" FOR 30 ITERATION WAIT; and the stack trace printed to the notebook cell is attached here.
stack_trace_ANALZYE.txt

Error with SIMULATE
I tried the command %bql SIMULATE <variable> FROM <population> LIMIT 1. It failed with the error message 'NoneType' object has no attribute '_predict'. Note: this SIMULATE query was the first point in my workflow at which I noticed this set of errors. Attached here is a stack trace from running the same SIMULATE query through the bayeslite API as bdb.execute('SIMULATE <variable> FROM <population> LIMIT 1').fetchall(). In this case, <variable> = 'glove_vector_90' and <population> = 'glove'. This is in the same notebook that produced the ANALYZE error above.
stack_trace_simulate.txt

No error with ESTIMATE DEPENDENCE
In the same notebook producing the errors above, the following works as expected:
ESTIMATE DEPENDENCE PROBABILITY FROM PAIRWISE VARIABLES OF glove MODELED BY glove_loom

No error with ESTIMATE SIMILARITY
In the same notebook producing the errors above, the following works as expected:
ESTIMATE SIMILARITY IN THE CONTEXT OF "glove_vector_247" FROM PAIRWISE glove MODELED BY glove_loom
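
To make the pattern above easier to re-check, here is a minimal sketch of a probe that runs each query and records whether it succeeds or raises. The helper probe_queries and the QUERIES list are hypothetical names introduced for illustration, not part of bayeslite; the only assumption about bdb is that it offers execute(query) returning an object with fetchall(), as in the bdb.execute(...).fetchall() call quoted earlier. The query strings are the ones from this report, with <variable> and <population> filled in.

```python
def probe_queries(bdb, queries):
    """Run each (label, query) pair and record success or failure.

    `bdb` is assumed to be any object with an `execute(query)` method
    whose result has `fetchall()`. Nothing here is bayeslite-specific.
    """
    results = []
    for label, query in queries:
        try:
            rows = bdb.execute(query).fetchall()
            results.append((label, 'ok', '{} rows'.format(len(rows))))
        except Exception as exc:
            # Deliberately broad: the goal is only to record which
            # queries fail and with what message, not to handle them.
            results.append(
                (label, 'error', '{}: {}'.format(type(exc).__name__, exc)))
    return results


# Read-only queries from this report (ANALYZE is left out on purpose,
# since it would modify the models if it succeeded).
QUERIES = [
    ('simulate',
     'SIMULATE glove_vector_90 FROM glove LIMIT 1'),
    ('dependence',
     'ESTIMATE DEPENDENCE PROBABILITY FROM PAIRWISE VARIABLES OF glove '
     'MODELED BY glove_loom'),
    ('similarity',
     'ESTIMATE SIMILARITY IN THE CONTEXT OF "glove_vector_247" '
     'FROM PAIRWISE glove MODELED BY glove_loom'),
]
```

With an open handle to the affected .bdb file bound to bdb, calling probe_queries(bdb, QUERIES) and printing each tuple should show at a glance which queries still work. In the state described in this issue, I would expect 'simulate' to come back as an error and the two ESTIMATE queries as ok.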
