Improve the document about training data #1267

Open
Technew94 opened this issue Jan 15, 2025 · 3 comments

Comments

@Technew94

Dear developer,

After reading the documentation, if I understand correctly, the training datasets come from the "sitsdata" package, right?
If I want to run a classification in, e.g., Germany, what kind of training dataset should I use? Should I label the targets (e.g. 'forest', 'pasture', and so on) manually and then create a new training dataset myself?
It would be very helpful for beginners like me if you could add a paragraph explaining this, along with some examples of how to prepare a "self-prepared" training dataset.

This may be a silly suggestion, but thank you for your time and patience.

All the best

@gilbertocamara
Contributor

Dear @Technew94, many thanks for your suggestion. It is an important point that needs to be better addressed in the documentation. In fact, users can bring their own data into sits for any part of the globe. I will give you a simple example below:

library(sits)

# select a study area
cube_cloud <- sits_cube(
    source = "MPC",  # could be CDSE, AWS or many other cloud services supported by sits
    collection = "SENTINEL-2-L2A",  # use sits_list_collections() to see which collections are supported
    roi = c(lon_min = ..., lat_min = ..., lon_max = ..., lat_max = ...),  # region of interest - can also be a MGRS tile
    bands = c("...."),  # put the bands you want
    start_date = ....,  # initial date of your data series
    end_date = ....     # final date of your series
)

# regularize the data cube
cube_reg <- sits_regularize(
    cube = cube_cloud,
    res = 10,        # in meters
    period = "P1M",  # monthly data is one option; see the documentation for more
    output_dir = <where the regular data cube will be stored>
)

# get the time series
# suppose you have a shapefile with points whose labels are stored in column "LABEL"
time_series_data <- sits_get_data(
    cube = cube_reg,
    samples = <shapefile>,
    label_attr = "LABEL"
)

The above code will allow you to create data cubes for Germany and retrieve time series from a point shapefile. All of this is described in the documentation. However, I fully agree that the docs can be improved, and we are working hard to do so.
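As a complement to the shapefile route above, here is a minimal sketch of how a "self-prepared" sample table could be built by hand in base R. It assumes that sits can also read samples from a CSV file with the columns `longitude`, `latitude`, `start_date`, `end_date` and `label`; please check the sits documentation to confirm this for your version. The coordinates, labels and file name are hypothetical and only illustrate the layout.

```r
# Hypothetical hand-labelled points; coordinates and labels are illustrative only.
samples <- data.frame(
    longitude  = c(10.45, 10.52, 10.61),
    latitude   = c(51.16, 51.20, 51.09),
    start_date = "2019-01-01",   # should fall inside the period covered by the cube
    end_date   = "2021-01-01",
    label      = c("forest", "pasture", "forest"),
    stringsAsFactors = FALSE
)

# Write the table to a CSV file (the file name is an assumption).
csv_file <- file.path(tempdir(), "samples_germany.csv")
write.csv(samples, csv_file, row.names = FALSE)

# Read it back to confirm the layout before handing it to sits.
samples_check <- read.csv(csv_file, stringsAsFactors = FALSE)
print(samples_check)

# Assumed usage with the regular cube created earlier in this thread:
# time_series_data <- sits_get_data(cube = cube_reg, samples = csv_file)
```

Each row is one labelled location, so adding more training points only means adding more rows with the same five columns.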

@Technew94
Author

Dear Gilberto,
Thanks for your detailed explanation. If I understand correctly, when I want to run a classification from 20190101 to 20210101, I can use a shapefile with points from 20190102, or from any other time inside the time series, am I right? If so, people can create a shapefile from any tile as long as it falls inside the time series.
If possible, could you please add a simple shapefile to the "self-prepared" section of the documentation? It would serve as an example for beginners like me.

Thanks again!

@gilbertocamara
Contributor

Dear @Technew94, thanks for the nice words! Usually, shapefiles carry no temporal information, so you can use a shapefile from any date to get training data from the cube. Of course, the model will perform better if the data collection that produced the shapefile took place between the start and end dates of the data cube.

Please note that sits uses the convention YYYY-MM-DD for dates, as in "2024-02-03".
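
For dates written in the compact form used earlier in this thread (e.g. 20190101), a small base R sketch of the conversion to the YYYY-MM-DD convention could look like this. The variable names are illustrative only.

```r
# Compact dates as written in the question above.
raw_dates <- c("20190101", "20210101")

# Parse with an explicit format, then print in the YYYY-MM-DD convention.
iso_dates <- format(as.Date(raw_dates, format = "%Y%m%d"), "%Y-%m-%d")
print(iso_dates)
```

The resulting strings can then be passed as `start_date` and `end_date` when creating the cube.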
