
⭐️ Entity embedder interface is here #286

Merged: 13 commits, Jan 20, 2025

Conversation

EssamWisam
Collaborator

Add an interface for EntityEmbedder that can wrap any basic deep learning model in an unsupervised model.

using MLJFlux
using MLJ
using CategoricalArrays

N = 200
X = (;
    Column1 = repeat(Float32[1.0, 2.0, 3.0, 4.0, 5.0], Int(N / 5)),
    Column2 = categorical(repeat(['a', 'b', 'c', 'd', 'e'], Int(N / 5))),
    Column3 = categorical(repeat(["b", "c", "d", "f", "f"], Int(N / 5)), ordered = true),
    Column4 = repeat(Float32[1.0, 2.0, 3.0, 4.0, 5.0], Int(N / 5)),
    Column5 = randn(Float32, N),
    Column6 = categorical(
        repeat(["group1", "group1", "group2", "group2", "group3"], Int(N / 5)),
    ),
)

y = categorical(repeat([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], Int(N / 10)))           # Classification target with N labels

NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux

clf = NeuralNetworkClassifier(embedding_dims=Dict(:Column2 => 2, :Column3 => 2))

emb = EntityEmbedder(clf)

mach = machine(emb, X, y)

fit!(mach)

Xnew = transform(mach, X)
Xnew

PR Checklist

  • Tests are added
  • Documentation, if applicable

@ablaom
Collaborator

ablaom commented Dec 28, 2024

Thanks @EssamWisam for this valuable contribution.

Do we know why buildkite tests are failing?

@EssamWisam
Collaborator Author

Everything runs fine locally. Here are the buildkite logs:
[screenshot of buildkite logs]

Potential change that triggered this is adding the rng argument here:

        clf = models[1](
            builder = MLJFlux.Short(n_hidden = 5, dropout = 0.2),
            optimiser = Optimisers.Adam(0.01),
            batch_size = 8,
            epochs = 100,
            acceleration = CUDALibs(),
            optimiser_changes_trigger_retraining = true,
            embedding_dims = embedding_dims[3],
            rng = 42,
        )

It's needed to check that the output is the same with or without the wrapper.

@ablaom
Collaborator

ablaom commented Dec 28, 2024

Thanks @EssamWisam for the diagnostics. That's very helpful. The good news is that a warning we added previously has correctly flagged the issue.

The buildkite tests include GPU tests, which is why you are not seeing the issue locally, I expect. Reproducibility using RNGs on a GPU is a can of worms, and we need to dodge that. The layers are initialised on the CPU and moved across; it's just the dropout that causes the problems.

Can you please try this:

  • get rid of the dropout (set to 0)
  • in a preliminary step, define stable_rng = StableRNG(123), and in your classifier set rng = stable_rng instead of seeding it.
  • make a second classifier clf2 = deepcopy(clf). Use clf2 instead of clf in your second machine (line 204). (At present, when you fit using clf, its rng hyperparameter is mutated. By making a deep copy, we ensure clf2 has an rng beginning in the same state. Probably, we should change fit to make a deep copy of the rng before use, to ensure the model is never mutated, but that's a discussion for another day.)
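The deep-copy step matters because fitting consumes, and thereby mutates, the model's rng in place. A minimal stdlib-only sketch of the idea (MersenneTwister stands in for the model's RNG; no MLJFlux code involved):

```julia
using Random

rng = MersenneTwister(123)
rng2 = deepcopy(rng)    # snapshot the RNG before it is ever used

# Drawing from rng stands in for fit! consuming (and mutating) the
# model's rng; rng2 still starts from the identical state.
a = rand(rng, 3)
b = rand(rng2, 3)

a == b    # identical draws, so the two fits are comparable
```

Seeding both models with the same integer would not help here, because a single shared RNG object advances its state during the first fit.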

Unrelated comment: It looks like models[1] was (at some time) meant to be models[i]. Is the more restrictive test intentional? Probably testing one of the models is sufficient, so I don't have a problem with this. I just noticed it.

@EssamWisam
Collaborator Author

I did the deep copy and removed the dropout as you suggested. When I use the stable_rng variable, the equality check fails, so I left the seed as 42.

@EssamWisam
Collaborator Author

It looks like models[1] was (at some time) meant to be models[i].

Indeed, I can look into this later.

@ablaom
Collaborator

ablaom commented Dec 30, 2024

Okay, sorry, we need something without a dropout layer at all. How about something like builder = MLP(hidden=(10, 10))?

@EssamWisam
Collaborator Author

Done.

@ablaom ablaom left a comment

Thanks for the great work. Please see what you can do with my suggestions.

@EssamWisam
Collaborator Author

@ablaom I have addressed all the points. Please check whether it's ready now, and thank you.

@ablaom ablaom left a comment

This is great work, thanks.

One final thing I noticed is that NeuralNetworkBinaryClassifier is supported by the wrapper, but this is missing from the docs.

Otherwise good to go

@EssamWisam
Collaborator Author

This is great work, thanks.

One final thing I noticed is that NeuralNetworkBinaryClassifier is supported by the wrapper, but this is missing from the docs.

Otherwise good to go

I used to think that NeuralNetworkBinaryClassifier was redundant (and did not know it's exposed in the docs), because NeuralNetworkClassifier exists and the sigmoid function is a special case of softmax when there are two classes.

I do indeed wonder why both exist, now that I realize the binary classifier is exposed in the MLJFlux docs.

In any case, I added it in the docs here. Feel free to merge.

@ablaom
Collaborator

ablaom commented Jan 20, 2025

I used to think that NeuralNetworkBinaryClassifier was redundant (and did not know it's exposed in the docs), because NeuralNetworkClassifier exists and the sigmoid function is a special case of softmax when there are two classes.

This classifier was added by @tiemvanderdeure. My understanding is that it provides some binary-specific optimisations, which was worth doing as it is such a common use case.

@EssamWisam
Collaborator Author

I used to think that NeuralNetworkBinaryClassifier was redundant (and did not know it's exposed in the docs), because NeuralNetworkClassifier exists and the sigmoid function is a special case of softmax when there are two classes.

This classifier was added by @tiemvanderdeure. My understanding is that it provides some binary-specific optimisations, which was worth doing as it is such a common use case.

If a sigmoid is used, I can imagine the final matrix multiplication does become faster, because the corresponding weight matrix has one fewer column. That said, if it's true that they are mathematically equivalent, in the way I understand, then it may be better to have NeuralNetworkClassifier automatically switch to a sigmoid when there are two classes, and expose only that model.
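The equivalence claimed above is easy to check numerically: for binary logits [0, z], softmax assigns the second class probability e^z / (1 + e^z), which is exactly the sigmoid of z. A quick stdlib-only sketch (illustrative, not MLJFlux code):

```julia
# For binary logits [0, z], the softmax probability of class 2
# collapses to the sigmoid of z.
softmax2(z) = exp(z) / (exp(0.0) + exp(z))
sigmoid(z)  = 1 / (1 + exp(-z))

all(isapprox(softmax2(z), sigmoid(z); atol = 1e-12) for z in -5.0:0.5:5.0)  # true
```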

Just sharing thoughts.

@ablaom ablaom merged commit 111b308 into dev Jan 20, 2025
6 checks passed
@ablaom ablaom deleted the entity-embedder branch January 20, 2025 02:48
@tiemvanderdeure
Contributor

Yes, I added the binary classifier because it wasn't possible to use a sigmoid finalizer or binary cross-entropy loss with the existing classifier.

That said, if it's true that they are mathematically equivalent, in the way I understand, then it may be better to automatically switch the NeuralNetworkClassifier to Sigmoid when there are two classes and only expose it.

I'm not against this at all. I wasn't aware that they are mathematically equivalent. I think I did some testing and got better results in the binary case (or it was just faster; I don't remember exactly).
