Conversation

@RuslanAntjuschin

This PR adds a Python method for creating few-shot training subsets of BirdSet datasets. A small usage example is included in the added notebook.
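
For readers skimming the PR, a rough usage sketch is below; the function name create_few_shot_subset, its arguments, and the dataset/config names are placeholders rather than the PR's actual API (the real call is shown in the added notebook).

from datasets import load_dataset

# load a BirdSet subset (repo id and config name are assumptions, e.g. the HSN subset)
dataset = load_dataset("DBD-research-group/BirdSet", "HSN", trust_remote_code=True)

# placeholder call illustrating the idea: keep only a few training samples per class
few_shot_train = create_few_shot_subset(dataset["train"], shots_per_class=5)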

@lurauch (Contributor) commented Mar 4, 2025

thank you!

a few remarks:

  • the object orientation is IMO not necessary; you could just offer two options ("lenient" and "strict") and define the corresponding parameters in the code, for example (see the sketch below this list)
  • I tried your code; there are a few steps missing that would be nice to have:
    • defining the path from which to load the dataset
    • defining the save path and automatically saving the dataset directly to the respective folder
    • removing some columns
    • one-hot encoding the labels
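
For illustration, the preset idea from the first point could look something like this (the preset names match the remark, but the parameter names, values, and selection logic below are only placeholders, not the PR's actual implementation):

import random
from collections import defaultdict

# sketch only: two presets instead of a class hierarchy; the preset contents are made up
FEW_SHOT_PRESETS = {
    "lenient": {"require_detected_event": False},
    "strict": {"require_detected_event": True},
}

def create_few_shot_subset(train_split, shots_per_class=5, mode="lenient", seed=42):
    """Keep at most `shots_per_class` training samples per class label."""
    params = FEW_SHOT_PRESETS[mode]
    rng = random.Random(seed)
    events = train_split["detected_events"]  # used by the "strict" preset

    # group sample indices by class label (a sample may carry several labels)
    by_class = defaultdict(list)
    for idx, labels in enumerate(train_split["ebird_code_multilabel"]):
        for label in labels if isinstance(labels, list) else [labels]:
            by_class[label].append(idx)

    selected = set()
    for label, indices in by_class.items():
        if params["require_detected_event"]:
            indices = [i for i in indices if events[i]]
        rng.shuffle(indices)
        selected.update(indices[:shots_per_class])

    return train_split.select(sorted(selected))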

e.g., for the missing steps, you could add something like this to the code:

import torch
from datasets import Dataset, DatasetDict

def one_hot_encode_batch(batch, num_classes):
    """
    Converts integer class labels in a batch to one-hot encoded tensors.
    """
    label_list = batch["labels"]
    batch_size = len(label_list)
    one_hot = torch.zeros((batch_size, num_classes), dtype=torch.float32)
    for i, label in enumerate(label_list):
        # works for a single integer label as well as a list of labels (multilabel case)
        one_hot[i, label] = 1
    return {"labels": one_hot}

# the code below is meant to go inside the subset-creation function (hence the final return);
# selected_samples is the list of picked few-shot samples, NUM_CLASSES and SAVE_DIR come from
# the config, and `dataset` on the right-hand side is the BirdSet DatasetDict loaded earlier
dataset = DatasetDict({"train": Dataset.from_list(selected_samples), "test": dataset["test_5s"]})

print("Selecting relevant columns and renaming...", flush=True)
columns_to_keep = ["filepath", "ebird_code_multilabel", "detected_events", "start_time", "end_time"]
dataset = DatasetDict({
    split: dataset[split].select_columns(columns_to_keep).rename_column("ebird_code_multilabel", "labels")
    for split in dataset.keys()
})

print("Applying one-hot encoding to labels...", flush=True)
dataset = dataset.map(lambda batch: one_hot_encode_batch(batch, NUM_CLASSES), batched=True)

print("Saving processed dataset to disk...", flush=True)
dataset.save_to_disk(SAVE_DIR)
print("Dataset saved to", SAVE_DIR, flush=True)

return dataset
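
For completeness, the saved dataset can later be reloaded from the same directory, e.g.:

from datasets import load_from_disk

# reload the processed few-shot dataset that was saved above
dataset = load_from_disk(SAVE_DIR)
print(dataset)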

@RuslanAntjuschin (Author)

@lurauch I've added the requested changes.
