No software in the context #39

Open
Samuel-Scalbert opened this issue Jan 16, 2025 · 3 comments

Comments

@Samuel-Scalbert

For the second time, when I process the file hal-03882318 with softcite's Docker image, I get this mention where the software name does not appear in the context:

{
  "type": "software",
  "software-type": "software",
  "software-name": {
    "rawForm": "nnU-Net",
    "normalizedForm": "nnU-Net",
    "offsetStart": 49,
    "offsetEnd": 56
  },
  "context": "best network configuration. In the end, the model (or ensemble) which got the best pe"
}
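For anyone who wants to spot such mentions automatically, here is a minimal sketch of a sanity check. It assumes that `offsetStart` and `offsetEnd` are character offsets into the `context` string, which is my reading of the output and not something confirmed in this thread. The helper names are hypothetical and not part of softcite.

```python
# Mention copied from the report above, reduced to the fields the check needs.
mention = {
    "software-name": {"rawForm": "nnU-Net", "offsetStart": 49, "offsetEnd": 56},
    "context": "best network configuration. In the end, the model (or ensemble) which got the best pe",
}


def offsets_match_context(mention):
    """True if slicing the context at the offsets yields the raw software name.

    Assumption: the offsets are relative to the context string.
    """
    name = mention["software-name"]
    span = mention["context"][name["offsetStart"]:name["offsetEnd"]]
    return span == name["rawForm"]


def name_in_context(mention):
    """True if the raw software name occurs anywhere in the context."""
    return mention["software-name"]["rawForm"] in mention["context"]


if __name__ == "__main__":
    print("offsets match:", offsets_match_context(mention))
    print("name in context:", name_in_context(mention))
```

Both checks fail for the mention above, so a filter like this would flag it.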
@jameshowison

That is odd. Can you point to the pdf you are processing?

@Samuel-Scalbert
Author

Yes, of course. Here is the link: https://hal.science/hal-03882318/document

@willbeason

willbeason commented Jan 27, 2025

The paragraph in question:

We randomly split all 23 patients into four folds using a cross-validation scheme. Table 1 presents the partition of the
dataset. Multiple images (i.e., planning and daily images) from individual patients were not distributed among datasets. The
test data was only used to evaluate the performance of the model in this fold and was not involved in training. nnU-Net further
divided the training data into training and validation sets and performed a five-fold cross-validation to automatically select the
best network configuration. In the end, the model (or ensemble) which got the best performance was chosen to perform the
inference on the test sets of this fold. The number of epochs during training was 1000 for every fold. The evaluation of the
segmented volumes is described in part 2.4.1. Table 2 reports the network configurations generated by nnU-Net for the
considered dataset.

So nnU-Net does appear in the paragraph, and the paragraph (at least in PDF form) includes a line break at the end of each line of text. In this case, "best network configuration" is the end of a sentence that begins with "nnU-Net", but it sits at the beginning of a line of text. I wonder whether the sentence segmentation logic got confused by the multiple line breaks in the middle of the sentence, or by the fact that the sentence begins with a lowercase letter?

I'd expect the context to be:

nnU-Net further divided the training data into training and validation sets and performed a five-fold cross-validation to automatically select the best network configuration.

(line breaks removed)
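To make the expectation concrete, here is a rough sketch of how that context could be recovered: join the PDF lines with spaces, split into sentences, and keep the sentences containing the software name. The regex splitter is a naive stand-in that I am assuming for illustration. It is not Softcite's actual segmentation, and the function names are hypothetical.

```python
import re

# Three consecutive lines from the paragraph above, with the PDF line breaks kept.
LINES = [
    "test data was only used to evaluate the performance of the model in this fold and was not involved in training. nnU-Net further",
    "divided the training data into training and validation sets and performed a five-fold cross-validation to automatically select the",
    "best network configuration. In the end, the model (or ensemble) which got the best performance was chosen to perform the",
]


def join_lines(lines):
    """Replace the PDF line breaks with single spaces."""
    return " ".join(line.strip() for line in lines)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text)


def sentences_mentioning(lines, name):
    """Return the sentences of the joined text that contain the given name."""
    return [s for s in split_sentences(join_lines(lines)) if name in s]


if __name__ == "__main__":
    for sentence in sentences_mentioning(LINES, "nnU-Net"):
        print(sentence)
```

Note that this splitter does not care about the lowercase first letter, so it cannot tell us which of the two suspected causes is the real one.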

Another possibility (or coincidence) is that the sentence beginning with "nnU-Net" is the fifth sentence in the paragraph, and it is the fifth line of text that was chosen as the context.
