Visualization of the segmentation of the text #258

Open
FLazzerini opened this issue Mar 10, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@FLazzerini

(With reference to the latest version of the Lucullus)
We have implemented the division of the text using <seg> to separate the sections. Here's the marking for sections 1 and 2 of the Lucullus:

<text>
   <body>
      <div type="book" n="2">
         <p>
            <seg n="001" xml:id="Luc-001">
               <w xml:id="Luc-001-1">Magnum</w>
               <w xml:id="Luc-001-2">ingenium</w>
               <w xml:id="Luc-001-3">L.</w>
               <w xml:id="Luc-001-4">Luculli</w>
(...)
               <w xml:id="Luc-001-107">superiorum</w>;</seg>
            <seg n="002" xml:id="Luc-002">
               <w xml:id="Luc-002-1">idque</w>
               <w xml:id="Luc-002-2">eo</w>
               <w xml:id="Luc-002-3">fuit</w>
               <w xml:id="Luc-002-4">mirabilius</w>
(...)
            </seg>

The document is now displayed according to this division, but not consistently. In the Lucullus, the first two <seg>s are displayed together as a single portion of text. It is not clear what criterion the software is using, because this grouping does not correspond to the division into <p>s either.
Screenshot: the passage in question, with the beginning of <seg n="002"> marked in red.
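
To check, independently of EVT, how the <seg>s are distributed across the <p>s, one could list every <seg> together with the paragraph that contains it. This is only a minimal sketch using Python's standard xml.etree.ElementTree; the function name segs_by_paragraph and the tiny sample are illustrative assumptions, and the sample declares no TEI namespace (a real file normally does, in which case the tag names would need the namespace prefix).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under this expanded name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Tiny stand-in for the Lucullus markup (assumption: no TEI namespace declared).
SAMPLE = """
<text><body><div type="book" n="2">
  <p>
    <seg n="001" xml:id="Luc-001"><w xml:id="Luc-001-1">Magnum</w></seg>
    <seg n="002" xml:id="Luc-002"><w xml:id="Luc-002-1">idque</w></seg>
  </p>
</div></body></text>
"""


def segs_by_paragraph(root):
    """Return (paragraph index, seg n, seg xml:id) for each <seg> directly inside a <p>."""
    rows = []
    for p_index, p in enumerate(root.iter("p"), start=1):
        for seg in p.findall("seg"):
            rows.append((p_index, seg.get("n"), seg.get(XML_ID)))
    return rows


rows = segs_by_paragraph(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

If two <seg>s that EVT shows together turn out to share a <p> here, that would at least hint at which boundary the viewer is following.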


@RobertoRDT
Member

RobertoRDT commented Mar 10, 2025

After exploring the configuration options of the current prototype, I can confirm that there is indeed a problem with text segmentation: at the moment, even when <seg> is specified as the minimal text unit to display, EVT conflates <p> and <seg> elements in an unclear way. This is also due to the fact that the <p> elements in the Lucullus have no attributes that could be used to show headings and/or numbers in the selector.

Provisional conclusions from today's test and discussion:

  • all structural elements that the editor wants to include in the document structure navigation should have n attributes;

  • xml:id attributes would also be a good idea, e.g. to provide navigation hints when looking for text, named entities, etc.;

  • document structure will be built on the basis of

    1. the configuration options: an "editionStructureSeparation": "div, p, seg" option tells EVT that the smallest text unit <seg> should be used to define the document structure navigation, i.e. that the text frame will show complete <seg> elements;
    2. the attribute values of the elements singled out in the configuration option: I would suggest that n is used first; if it is not present, then xml:id; if that is not present either, EVT should generate some sort of automatic numbering (it really is a last resort though).
  • note that if <seg> is the smallest text unit driving navigation, choosing a <seg> will show only its content, while choosing a <p> will show its content starting from the beginning; the same will happen for <div> elements (also when nested, and for other, larger structural elements);

  • at the moment only the text frame selector is available for navigation purposes;

  • the hierarchy in the selector should be highlighted thanks to separator lines and text indenting;

  • once a full navigation panel is available, phrase-level elements such as <seg> won't be included in the selector; we could also handle this dynamically: the TOC will have all structure levels, the selector just the first 2-3.
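
The label rule proposed above (n first, then xml:id, then automatic numbering) and the indented hierarchy could be sketched as follows. This is not EVT code: it is a hedged Python illustration using the standard library, where build_entries, label_for and the "#3" style of automatic label are hypothetical names and choices, and only the "editionStructureSeparation" option string comes from the discussion.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Option string quoted in the comment above; everything else here is an assumption.
CONFIG = {"editionStructureSeparation": "div, p, seg"}

# Made-up sample covering all three cases: n present, only xml:id, neither.
SAMPLE = """
<body><div type="book" n="2">
  <p><seg n="001" xml:id="Luc-001">a</seg><seg xml:id="Luc-002">b</seg><seg>c</seg></p>
</div></body>
"""


def label_for(elem, position):
    """Prefer n, then xml:id, then an automatic number as a last resort."""
    return elem.get("n") or elem.get(XML_ID) or f"#{position}"


def build_entries(root, config):
    """Return (depth, tag, label) for every configured structural element, in document order."""
    names = [name.strip() for name in config["editionStructureSeparation"].split(",")]
    counters = {name: 0 for name in names}
    entries = []

    def walk(elem, depth):
        if elem.tag in counters:
            counters[elem.tag] += 1
            entries.append((depth, elem.tag, label_for(elem, counters[elem.tag])))
            depth += 1  # children of a structural element are indented one level deeper
        for child in elem:
            walk(child, depth)

    walk(root, 0)
    return entries


entries = build_entries(ET.fromstring(SAMPLE), CONFIG)
for depth, tag, label in entries:
    print("  " * depth + f"{tag} {label}")
```

A selector limited to the first 2-3 levels would simply filter these entries by depth, while a full TOC would keep them all.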

Navigation in EVT 2:

[Image: screenshot of navigation in EVT 2]

@RobertoRDT RobertoRDT added the bug Something isn't working label Mar 10, 2025
@FLazzerini
Author

Noted. I have numbered the <p>s with the attribute n as well.
To recap the situation and the needed functions:
The structural elements of Lucullus will be:

  • 1 <div> (it is the second of Cicero's Academici libri, so there will be only one <div type="book" n="2">)
  • 148 <seg>s (the standard established subdivision into sections)
  • a number of <p>s at the discretion of the present editor (myself).
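
Since navigation is meant to rely on n attributes, it may help to verify before loading the file that no structural element lacks one. A minimal sketch, assuming Python's standard xml.etree.ElementTree and a file without a TEI namespace; the function name missing_n and the sample data are invented for illustration.

```python
import xml.etree.ElementTree as ET

STRUCTURAL = ("div", "p", "seg")  # element names taken from the list above

# Made-up sample: one <p> and one <seg> deliberately lack an n attribute.
SAMPLE = """
<body><div type="book" n="2">
  <p n="1"><seg n="001">a</seg><seg>b</seg></p>
  <p><seg n="003">c</seg></p>
</div></body>
"""


def missing_n(root, names=STRUCTURAL):
    """Count, per element name, how many elements have no n attribute."""
    counts = {name: 0 for name in names}
    for elem in root.iter():
        if elem.tag in counts and elem.get("n") is None:
            counts[elem.tag] += 1
    return counts


report = missing_n(ET.fromstring(SAMPLE))
print(report)
```

For the Lucullus, every count in the report should be zero once the <p>s have been numbered.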

Matteo Di Franco will have a similar situation with his edition of Aelius Aristides.

What we do and do not need EVT to do:

  • We need a way to navigate between <seg>s. This can take any convenient form (I appreciate that a drop-down menu may not be the most convenient option with so many sections), but it is the standard reference subdivision that any user would need in order to engage with a classical text.
  • We don't need a way to navigate between <p>s. I see no drawback in having one, it's just not needed.
  • That being said, we need a way to make the file lighter when it is loaded by the application, otherwise it will be unusable. We have seen that the file becomes lighter when only a portion of the text is loaded at a time. It would make sense for this portion to correspond to the <seg>, maybe with a little bit of the text that precedes and follows each <seg>, just for the sake of continuity.
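
The last point, loading one <seg> at a time with a little surrounding text, could look roughly like this. It is a sketch only, in plain Python and independent of how EVT actually loads data; seg_with_context and the context_words parameter are hypothetical, and each <seg> is reduced to a list of its words.

```python
def seg_with_context(segs, index, context_words=5):
    """Return (before, current, after) word lists for the seg at position index.

    before holds the last context_words words of the previous seg,
    after the first context_words words of the next one; either is
    empty at the edges of the text.
    """
    prev_words = segs[index - 1] if index > 0 else []
    next_words = segs[index + 1] if index + 1 < len(segs) else []
    keep = max(context_words, 0)
    before = prev_words[len(prev_words) - min(keep, len(prev_words)):]
    after = next_words[:keep]
    return before, segs[index], after


# Opening words of sections 1 and 2 as quoted in the issue (both truncated here).
segs = [
    ["Magnum", "ingenium", "L.", "Luculli"],
    ["idque", "eo", "fuit", "mirabilius"],
]

first = seg_with_context(segs, 0, context_words=2)
second = seg_with_context(segs, 1, context_words=2)
print(first)
print(second)
```

Only the current section would be the navigation target; the context on either side is there for continuity of reading.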
