Silent Bug during loss computation #152
Comments
Yes, @Leonardbcm, this is a bug. Great catch! This is invaluable user feedback. Rather than messing with the result of chain, one could convert the target when computing the loss:

loss_func(x, y) = loss(chain(x), tomat(y)) # option 1

Or, perhaps even more natural, would be to redefine

reformat(y, ::Type{<:AbstractVector{<:Continuous}}) = reshape(y, 1, length(y)) # option 2
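To make the shape issue behind both options concrete, here is a rough sketch in plain Julia (not the package's code; ŷ simply stands in for the output of chain(x)):

y = [1.0, 2.0, 3.0]                    # target as a plain vector (length batch_size)
ŷ = [1.0 2.0 3.0]                      # stands in for chain(x): a 1 * batch_size Matrix
size(ŷ .- y)                           # (3, 3): the vector broadcasts against the row matrix
size(ŷ .- reshape(y, 1, length(y)))    # (1, 3): elementwise, as both options intend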
This option seems fine to me. The new test is passing on my computer and I seem to have more coherent results in my model trainings. I hope there are not too many other silent bugs like this one. A comparison of performance against scikit-learn's models would come in handy at this point! Thanks for the quick and solid support.
Agreed. Can one control initialisation of weights there (to avoid RNG dependencies)? It would be great if you have a chance to look into this yourself: #157
Yes, it is possible and, as for everything else in Python, a bit less straightforward. One has to define a new class derived from tf.keras.initializers.Initializer. Anyway, the recipe is well described.

# Kernel and bias initializers
import tensorflow as tf

class ManualInitializer(tf.keras.initializers.Initializer):
    def __init__(self, weights):
        self.weights = weights

    def __call__(self, shape, dtype=None):
        # Hand the stored weights back as a tensor of the requested shape and dtype
        w = tf.convert_to_tensor(self.weights, dtype=dtype)
        w = tf.reshape(w, shape)
        return w

    def get_config(self):  # To support serialization
        return {'weights': self.weights}

I'll see if I can manage to spare some time to make a small comparison experiment.
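On the Flux side, the RNG dependency can be removed in a similar way. A minimal sketch, assuming Flux's Dense(W, b) constructor that accepts an explicit weight matrix and bias vector:

using Flux

W = [0.1 0.2; 0.3 0.4]      # hand-chosen 2 * 2 weight matrix
b = [0.0, 0.0]              # hand-chosen bias vector
layer = Dense(W, b)         # layer built from the given weights; no random initialisation
layer([1.0, 1.0])           # deterministic output: W * x .+ b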
Hi everyone,
I think the loss computation of the generic fit! function is bugged.
The loss function is defined as
loss_func(x, y) = loss(chain(x), y)
and computed as mean(loss_func(X[i], y[i]) for i=1:length(X)). However, chain(X[i]) yields a 1 * batch_size matrix, which causes a silent bug during the loss computation. The default loss is Flux.Losses.mse, which is computed as agg((ŷ .- y) .^ 2). We expect the result of (ŷ .- y) to be a vector of length batch_size containing the differences between predictions and labels. However, given that ŷ is a Matrix, the broadcast yields a batch_size * batch_size Matrix containing inconsistent values, which is then reduced to a scalar by the agg function. The loss computation therefore raises no error, but the computed value is wrong!

Here is a MWE to illustrate:
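A sketch to that effect, assuming the default Flux.Losses.mse, with ŷ standing in for the output of chain(x):

using Flux

y = [1.0, 2.0, 3.0]            # targets: a plain vector of length batch_size = 3
ŷ = [1.0 2.0 3.0]              # perfect predictions, but as a 1 * 3 Matrix (like the chain output)

Flux.Losses.mse(ŷ, y)          # ŷ .- y broadcasts to a 3 * 3 matrix: returns ≈ 1.33 instead of 0.0
Flux.Losses.mse(vec(ŷ), y)     # with matching shapes the loss is 0.0, as expected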
I think this issue is quite important because it potentially makes all models fail to converge when using a batch_size > 1. Moreover, it gives the user unreliable results. I have thought of 2 ways to tackle it:
- defining a custom loss function mymse
- reshaping the result of chain into a safe format

Anyway, let me know the updates on this issue, I think it is very important.
Thanks in advance.