
Silent Bug during loss computation #152

Closed
leonardtschora opened this issue Jun 7, 2021 · 4 comments · Fixed by #153
@leonardtschora (Collaborator)

Hi everyone,

I think the loss computation of the generic fit! function is bugged.

The loss function is defined as loss_func(x, y) = loss(chain(x), y) and computed as mean(loss_func(X[i], y[i]) for i=1:length(X)). However, chain(X[i]) yields a batch_size * 1 matrix, which causes a silent bug during the loss computation.

The default loss is Flux.Losses.mse, which is computed as agg((ŷ .- y) .^ 2). We expect the result of (ŷ .- y) to be a vector of length batch_size containing the differences between predictions and labels. However, given that ŷ is a matrix, the broadcast yields a batch_size * batch_size matrix containing inconsistent values, which is then reduced to a scalar by the agg function. The loss computation therefore raises no error, but the computed value is wrong!
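The same broadcasting trap is easy to reproduce outside MLJFlux. Here is a minimal NumPy sketch, not MLJFlux code, using a batch size of 25 as in the MWE below and made-up random values:

```python
import numpy as np

batch_size = 25                        # same batch size as in the MWE
rng = np.random.default_rng(0)

y_true = rng.random(batch_size)        # labels: a vector of length 25
y_pred = rng.random((batch_size, 1))   # matrix-shaped prediction, 25 x 1

diff = y_pred - y_true                 # silently broadcasts to 25 x 25
buggy_mse = np.mean(diff ** 2)         # aggregates 625 inconsistent values
true_mse = np.mean((np.ravel(y_pred) - y_true) ** 2)

print(diff.shape)                      # (25, 25)
```

Under NumPy's broadcasting rules, shapes (25, 1) against (25,) blow up to (25, 25), just as the Julia broadcast does, and the aggregated value then bears no relation to the true MSE.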

Here is a MWE to illustrate:

using MLJ, MLJFlux, Flux, DataFrames

# Define a small network structure
mutable struct TESTBuilder <: MLJFlux.Builder end
MLJFlux.build(builder::TESTBuilder, n_in, n_out) = Chain(Dense(n_in, n_out))

n = 100
d = 5
# We generate dummy data in the appropriate format
X_ = DataFrame(rand(Float64, n, d), :auto);
y_ = X_.x1 .^2 + X_.x2 .* X_.x3 - 4 * X_.x4

# Create the model
Model = @load NeuralNetworkRegressor pkg=MLJFlux verbosity=0
model = Model(builder = TESTBuilder(), batch_size=25, epochs=100)

# Check that the model works
mach = machine(model, X_, y_)
fit!(mach)
predict(mach, X_)
model.loss(predict(mach, X_), y_)

# Let's dig into the code.
# In MLJFlux.Regressor fit function
n_input = Tables.schema(X_).names |> length
data = MLJFlux.collate(model, X_, y_)
target_is_multivariate = Tables.istable(y_)
if target_is_multivariate
    target_column_names = Tables.schema(y_).names
else
    target_column_names = [""] # We won't be using this
end
n_output = length(target_column_names)
chain = MLJFlux.build(model.builder, n_input, n_output)

# We then call MLJFlux.core fit! function
loss = model.loss
X = data[1]
y = data[2]
loss_func(x, y) = loss(chain(x), y)
prev_loss = mean(loss_func(X[i], y[i]) for i=1:length(X)) # The loss is computed but is wrong!

# Let's check it : 
i = 1
x = X[i]
ytrue = y[i]
ypred = chain(x)
loss(ypred, ytrue)
ypred .- ytrue # We obtain a 25 * 25 matrix...

# Let's compute manually the mse loss
function mymse(ytrue, ypred)
    acc = 0.0
    for (yp, yt) in zip(ypred, ytrue)
        acc += (yp - yt)^2
    end
    return acc / length(ytrue)
end
mymse(ytrue, ypred)
loss(ypred, ytrue) # The 2 results are drastically different.

I think this issue is quite important because it potentially prevents all models from converging when using a batch_size > 1. Moreover, it gives the user unreliable results.

I have thought of 2 ways to tackle it:

  1. Use safe loss functions such as mymse
  2. Convert the result of the call to chain into a safe format
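For intuition, option 2 amounts to flattening the matrix-shaped prediction before the loss sees it. A NumPy sketch with made-up toy values (not MLJFlux code):

```python
import numpy as np

y_true = np.arange(5, dtype=float)       # labels: [0, 1, 2, 3, 4]
y_pred = y_true.reshape(5, 1) + 0.5      # "chain" output as a 5 x 1 matrix

unsafe = np.mean((y_pred - y_true) ** 2)          # broadcasts to 5 x 5
safe = np.mean((np.ravel(y_pred) - y_true) ** 2)  # flatten first

print(unsafe, safe)                      # 4.25 0.25
```

Every prediction is off by exactly 0.5, so the true MSE is 0.25; the broadcast version averages 25 inconsistent cross-differences instead.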

Anyway, please keep me posted on this issue; I think it is very important.

Thanks in advance.

@ablaom ablaom added the bug Something isn't working label Jun 8, 2021
@ablaom (Collaborator) commented Jun 8, 2021

Yes, @Leonardbcm, this is a bug. Great catch! This is invaluable user feedback.

Rather than messing with the result of chain, what do you think of inserting your tomat mapping from #142?

loss_func(x, y) = loss(chain(x), tomat(y))   # option 1

Or, perhaps even more natural, would be to redefine reformat to return a 1 x n matrix instead of a vector:

reformat(y, ::Type{<:AbstractVector{<:Continuous}}) = reshape(y, 1, length(y))   # option 2
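The effect of this reshape can be mimicked in NumPy (shapes and values are assumed, for illustration only): once the target is a 1 x n matrix like the chain output, the subtraction stays elementwise:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0])           # original labels, a vector
y_pred = np.array([[1.5, 2.5, 3.5, 4.5]])    # 1 x n chain-style output

y_mat = y.reshape(1, len(y))                 # analogue of option 2's reshape
diff = y_pred - y_mat

print(diff.shape)              # (1, 4): no silent blow-up
print(np.mean(diff ** 2))      # 0.25
```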

@leonardtschora (Collaborator, Author)

This option seems fine to me. The new test passes on my computer, and I seem to get more coherent results when training my models.

I hope there are not too many other silent bugs like this one. Comparison of performance with scikit-learn's models becomes handy at this point!

Thanks for the quick and solid support.

@ablaom (Collaborator) commented Jun 9, 2021

Comparison of performance with scikit-learn's models becomes handy at this point!

Agreed. Can one control initialisation of weights there (to avoid RNG dependencies)?

Be great if you have a chance to look into this yourself: #157

@leonardtschora (Collaborator, Author)

Yes, it is possible and, as with everything else in Python, a bit less straightforward. One has to define a new class deriving from tf.keras.initializers.Initializer. The following example should be an entry point (I used it for another experiment, and it should run).

Anyway, the recipe is well described.

# Kernel and bias initializers
import tensorflow as tf
class ManualInitializer(tf.keras.initializers.Initializer):
    def __init__(self, weights):
        self.weights = weights
        
    def __call__(self, shape, dtype=None):
        w = tf.convert_to_tensor(self.weights, dtype=dtype)
        w = tf.reshape(w, shape)        
        return w

    def get_config(self):  # To support serialization
        return {'weights': self.weights}
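The core of the class above is just "reshape a fixed flat weight list to whatever shape the layer requests". A framework-free NumPy sketch of that logic (the weight values are assumptions for illustration):

```python
import numpy as np

fixed_weights = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]   # assumed toy values

def manual_init(shape, weights=fixed_weights):
    # Mirrors the initializer's __call__: convert, then reshape to `shape`
    w = np.asarray(weights, dtype=np.float64)
    return w.reshape(shape)

kernel = manual_init((2, 3))     # e.g. a kernel for a 2-in / 3-out dense layer
print(kernel.shape)              # (2, 3)
```

Feeding both frameworks the same flat weight list this way removes the RNG dependency from the comparison.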

I'll see if I can manage to spare some time to make a small comparison experiment.
