Update backtest scripts to use ocf-data-sampler, for site #313
Comments
I think @AUdaltsova has been working on this, so I'll let @AUdaltsova update you.
Hi @zaryab-ali, thanks for jumping in! You are right, it refers to the script you've mentioned and this UK GSP script, which is what I've been working on. We still very much need the site version sorted as well, so the help would be very welcome if it's something you want to do! The general idea behind these scripts is to create samples for a given time window, run them through a pretrained model, and save inference results. In general, I expect you'll be able to get rid of some functions that build the datapipe from scratch and use the site dataset as is. Things like I appreciate that it's a lot to get through those scripts and what they're trying to do! Certainly took me a while :) So don't hesitate to ping me if you need help. Also, I expect I'll be adding the UK GSP script soon, so if you want you can wait for that to have something for reference.
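To make the "create samples for a time window, run them through a model, save results" idea concrete, here is a minimal sketch using only the standard library. It is not the actual backtest script: `init_times`, `run_backtest`, `get_sample`, `model` and the CSV column names are all hypothetical placeholders, and in the real scripts the sample would come from the site dataset and the model from a pretrained checkpoint.

```python
import csv
from datetime import datetime, timedelta
from typing import Callable, Iterator, Sequence


def init_times(start: datetime, end: datetime, step: timedelta) -> Iterator[datetime]:
    """Yield forecast init times from start (inclusive) to end (exclusive)."""
    t = start
    while t < end:
        yield t
        t += step


def run_backtest(
    start: datetime,
    end: datetime,
    step: timedelta,
    get_sample: Callable[[datetime], object],       # hypothetical: builds one sample for t0
    model: Callable[[object], Sequence[float]],      # hypothetical: returns one value per horizon step
    out_path: str,
) -> int:
    """Build a sample per init time, run the model, and save predictions to CSV.

    Returns the number of prediction rows written (header excluded).
    """
    rows = 0
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["init_time", "horizon_step", "prediction"])
        for t0 in init_times(start, end, step):
            sample = get_sample(t0)
            predictions = model(sample)
            for i, p in enumerate(predictions):
                writer.writerow([t0.isoformat(), i, p])
                rows += 1
    return rows
```

Passing `get_sample` and `model` in as callables keeps the loop itself independent of how samples are built, which is the part that changes when moving from ocf_datapipes to ocf-data-sampler.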
@zaryab-ali hi again! Just wanted to let you know that I've put the version of the UK GSP backtest I've been working on here, so you can take a look if you want! NB that this script uses an older version of Hope that's helpful, let me know if you have any questions!
@AUdaltsova thank you for the reference code,
@zaryab-ali no problem, thanks for volunteering! As far as I know we don't operate a community Slack or anything similar at the moment, so feel free to just ping me here. We have a discussions page if you have other questions unrelated to this issue.
Hi @zaryab-ali, I think you've deleted your comment about data download, just wanted to check if you need any help with that?
@AUdaltsova thanks for reaching out. I was having an issue with the Hugging Face data, but that's resolved. I was hoping you could help me with another issue: I followed the README instructions to the letter, but I am getting this error when running save_batches.py
Hi! Glad to hear it's resolved! Re: config error, you need to change where datamodule points in the
@AUdaltsova hi again, I've made the necessary changes to backtest_sites.py (they might be a little rough and may need review and changes). I wanted to ask which branch of PVNet I should open a pull request against.
Also, I think the site: let me know if I am mistaken about any of this.
@AUdaltsova I created a pull request here. I noticed that some of the changes I mentioned in the above comment about Also, I wanted to know if you could share a link to any data other than the PV link in the README. Right now I am using a somewhat hacky approach of creating my own CSV and NetCDF files because, as I mentioned earlier, the column names don't match what ocf-data-sampler is expecting. Basically, what I want to know is whether this can be solved by changing the CSV and NetCDF files, or whether I need to add some logic in backtest_sites.py to handle it at run time.
@zaryab-ali hi again! That's great news, thanks for being on it! Re: config changes, I think we merged the dev branch we were using for all ocf-data-sampler compatible changes into main right about the time you left those comments, which might be the source of the confusion. PVNet's main branch should be the one relying on ocf-data-sampler now, so your PR is in the right place I think. I'll leave further comments on the PR itself to keep it in one place. Re: PV data structure, this script is not supposed to be working with the uk_pv dataset we have on HF. Instead it should open local data with paths given in the config files, so yeah, if it works with the CSV and NetCDF files you've created it should be fine. I'll see if I can find an example of those structures for you so you can double-check everything is working as it should.
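For the column-name mismatch discussed above, one option is a one-off conversion of the local metadata CSV rather than run-time logic in backtest_sites.py. The sketch below uses only the standard library; `rename_columns` is a hypothetical helper, and the names in `COLUMN_MAP` are made up for illustration. They are not taken from ocf-data-sampler, so check the names it actually expects before using anything like this.

```python
import csv

# Hypothetical mapping: names in a hand-made file -> names the loader is assumed to expect.
COLUMN_MAP = {
    "id": "site_id",
    "lat": "latitude",
    "lon": "longitude",
}


def rename_columns(src_path: str, dst_path: str, column_map: dict) -> list:
    """Copy a CSV, renaming header columns found in column_map.

    Columns not listed in column_map are kept unchanged.
    Returns the list of column names written to dst_path.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        new_fields = [column_map.get(name, name) for name in (reader.fieldnames or [])]
        writer = csv.DictWriter(dst, fieldnames=new_fields)
        writer.writeheader()
        for row in reader:
            writer.writerow({column_map.get(k, k): v for k, v in row.items()})
    return new_fields
```

Converting the files once keeps the backtest script free of dataset-specific renaming, at the cost of maintaining a converted copy of the data.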
@AUdaltsova I'm working on the changes you suggested on the PR. Let me know when you can provide the example data you mentioned above for double-checking.
@zaryab-ali hi again! I don't think we have any sites data in open use at the moment, but this script creating data for testing should be helpful! You can just use whatever it spits out or maybe extend the time a bit so you have more data to play with. Though I've realised we don't actually have any models trained with ocf-data-sampler for sites yet, so it will be really hard to validate it's working correctly. I think a first pass is still going to be useful but unless you're willing to train a model yourself just for the sake of doing this, it can get really tricky to finish the script. Just to be super clear, I think it will still be useful to merge your PR after some changes because it does a lot of useful work for this script and is not going to break anything anyway, and then if you want I can ping you when we have the models to validate this and you can do another PR to finish this? If this sounds okay, I can comment more on your PR with what I would check for in these circumstances, and we can go from there.
@AUdaltsova I have made the changes you suggested on the PR, and I guess you are right: training my own model would be a little excessive just for backtesting (plus I don't have the hardware to train a model that would yield any meaningful results). Let me know if you need any other changes besides the ones you mentioned in PR #323, so I can update the PR, and we can do further testing when a model does become available.
@peterdudfield you mentioned this in phase 2 in openclimatefix/ocf-data-sampler#98
If my understanding is correct, we are trying to switch from ocf_datapipes to ocf-data-sampler, and the changes need to be made in this file: https://github.com/openclimatefix/PVNet/blob/main/scripts/backtest_sites.py
Any guidance or reference for getting started would be really helpful.