-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
early_stopping_rounds? #158
Comments
I don't see that parameter listed here... what makes you think it exists? Also, I'm not sure what the difference is supposed to be between this and |
yeah I think unfortunately it's not part of the model parameter
|
btw related question, is it possible to error when unsupported keywords gets passed to XGBoost? for example the |
I would really like to, this has been an annoyance of mine as well. I've also been burned by this many times. I just opened an issue for this. |
From past work in R, early stopping is a convenience feature provided by the package as opposed to a libxgboost function. The feature relies on capturing the evaluation log and applying a criteria of when to stop. The optimum num_round value can change with other parameters and it is difficult to tune num_round parameter independent of others. There are implications in how one does a grid search. If it's important enough to a workflow, it is possible for one to write their own early stopping function. In order to plot learning curves from cross validation, without duplicating the metrics already available in libxgboost, it was necessary to 'intercept' and parse the evaluation log. I posted my modest attempt(i.e. hack) in #148. It has been working quite well although am sure it could be improved. All are welcome to use and adapt to their needs. For early stopping one would need to modify and parse each line of the log as it is recorded; the posted function parses all the lines together after the run. Using this approach I've been able to create a function to duplicate the cv learning curves one is accustomed to in R (if that is one's workflow). An example .... |
yeah, well, we have to do this externally in Julia because we can't pass a callable to |
This isn't that hard to do with current XGBoost.jl, but yes, it is annoying that one can't access the evaluation metrics it's already computing via I'd be happy to help get through a PR that makes this more ergonomic. I'm open to replacing the |
Not sure I understand. I am able to set the evaluation metrics with the eval_metric keyword parameter. The results show up in the log and are parse-'able. It does require a bit of an end-around by writing a personal version of one or two of the functions in booster.jl but the crux of it is only a handful of lines. |
Recopied from #148. This is my function to capture the logs.
|
I'm not enthused about the prospect of opening the pandora's box of parsing log output, it's bad enough that we are already trying to do this to identify warning messages, in my opinion it would be much nicer to just have it run evaluation normally so the whole thing is accessible and controllable. That said, I wouldn't stand in the way of it, particularly if you left it flexible enough to leave it open for later improvements, it could be replaced with a non-parsing version at some point in the future. |
I am not advocating to add this to the package, just offering a solution to those who 'need' the functionality. I am not a programmer and am not sophisticated to know the downsides. I needed this functionality and did what I could to make it work. The only difference between my function and what is already in booster.jl is that instead of sending a string to INFO , it is saved in a vector. The rest is just combining 3 existing booster.jl functions in to one. The R package does essentially the same thing but has a callback system. Creating that sort of thing is over my head, so I opted for what I could manage. It is working quite well for me. Parsing turned out to be easier than I feared - even with multiple metrics. Of course if libxgboost changes their reporting string structure I am back to revise. Before going this route I did write my own evaluation routines but it became unmanageable. Each objective has its own set of possible metrics and after 3 or 4 it became tedious - there are a lot of objective/metric pairs. Am still trying to get the hang of MLJ and I suppose that is out there for the user. Since this approach uses libxgboost for the metrics, everything is in line with their documentation. I am quite fine with how you have XGBoost.jl structured. I've learned where most things are; if I need a wrinkle here or there am happy to personalize my functions - I thought that was kind of the Julia way??? You do an excellent job maintaining this very important package and providing prompt feedback to users. I thank you !!!! |
actually
I'm not sure if it's for or against the idea to parsing logs to get loss during training, I mean given we already do this we might as well utilize existing infrastructure :) |
It's not infrastructure, it's one badly-written regex, and it's done because there is no alternative. For evaluations there is an alternative, there is no clear reason why this needs to be done by the Again, I'm not going to stand in the way of a PR that provides parsing of the logs to extract evaluation results as a feature, but I think we should keep it flexible enough that it could be made more reasonable in the future. |
one reason to not do it in Julia would be performance, also keep in mind we'd have to some how evaluate it with the same metric as the training and handle GPU etc. |
Why should it be more expensive to run via a separate call to the
We already have calls to do all this stuff, it shouldn't be hard. I don't think it should be more complicated than providing a function argument to |
Actually, it was the need for performance that lead me to explore parsing the log. One could argue that learning curves are old fashioned but that is how I was trained to build a gradient boost model, so for me learning curve it is. I did write a cross validation function based on predict and it works okay. However, consider the task. In a typical 10 fold learning curve one requires 20 calls to predict per round (10 for train and 10 for test). A typical model has 500 rounds. That's 10,000 calls to predict and 10,000 calls to a function that processes the return in to a metric. Although predict allows one to specify a particular round the lower rounds are not cache'd. Predicting round 100 drops through all 1-99 trees even though it went through the same trees when predicting for round 99. For even a simple model this is a lot of stuff. Pre-allocating space for the predictions and as efficient broadcasting as I could, the processing time was considerably improved but still a bit long. Reducing the curve to every 10 rounds brought the processing time to what I considered appropriate. I haven't figured out yet how to do this with MLJ. I am sure it will be better at the 1,000 metric calculations but the 1,000 calls to libxgboost for prediction will be the same. In comparison, the time to parse an evaluation log is trivial. After I re-wrote my learning curve function with data from the parsed log, the run times with evaluations every round are comparable to the 'predict' method every 10 rounds. Also, I don't need to keep track of or code the default metrics. For my needs this seemed the better solution. I did observe that booster.jl grows every tree one round at a time ( a call to libxgboost for each round). I was wondering/hoping if libxgboost relegated any cache space for prior levels predictions as that is needed to grow the next tree, but that is off topic. My best effort at creating a 'predict' approach may not be the best metric but it is what I have; parsing the evaluation log won out for my purpose and level of skill. This is all predicated on simple cross validation. If one is in to nested cross validation, comparing different algorithms, then MLJ would seem to be the best ( if not only) way to go. Aside from my bashfulness, the reason I hesitate to submit a PR is that I am next to useless when it comes to git and GitHub. It's all I can do to keep my personal packages straight and not corrupt my Julia environment. I am afraid if I fork and download, my laptop might blow up due to my ignorance. All in all, XGBoost.jl is just fine. |
Hey all, I too have had a need for early stopping rounds. Understand that there could be a few ways to approach it, but it's a start and want to work towards getting this functionality added. |
I don't see it being explicitly tested and supported but if it's part of C++ library it should just work?
The text was updated successfully, but these errors were encountered: