-
Notifications
You must be signed in to change notification settings - Fork 736
Block Level Identifiers #1482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Block Level Identifiers #1482
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, thanks for taking on this big task!
I am still getting myself familiar with the PR, so my comments are meant as a conversation at this stage more than just feedback.
I have left a few comments, questions and suggestions.
I admit the PR is massive, if there were also a way to breaking it down into smaller ones it would make it easier to review.
Finally, can you tell me what AI support you used if any?
Thanks again!
packages/foam-vscode/package.json
Outdated
| "test": "yarn test-setup && node ./out/test/run-tests.js", | ||
| "test:unit": "yarn test-setup && node ./out/test/run-tests.js --unit", | ||
| "test:e2e": "yarn test-setup && node ./out/test/run-tests.js --e2e", | ||
| "test:tdd": "yarn build:node && jest --runInBand", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this allow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new test:tdd command allows individual testing of any parsing focused test.ts file, which was mainly used for heavy testing of the block id parsing test file. For now I will rename it better unless it's not wanted.
| id?: string; // A unique identifier for the section within the note. | ||
| label: string; | ||
| range: Range; | ||
| blockId?: string; // The optional block identifier, if one exists (e.g., '^my-id'). | ||
| isHeading?: boolean; // A boolean flag to clearly distinguish headings from other content blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few points/questions:
- it's an interesting idea "computing" the
idof a section at parsing time, so if we need to reference it afterwards we can easily do so, and it allows us to slightly "decouple" label from id - is this the reason why you introduced that? - given
idandisHeading, do we have enough to identify a block? Isn'tblockId === idfor a block Section? - from glancing at where
isHeadingis used, do we need it explicitly or can we just check if theidstarts with^as a convetion? - also,
isHeadingrefers to the anchor, but this object describes a section. I wonder if we should call this something likeisBlock(in which case we are inverting the semantics though) or something likeisHeaderSection(which I think it's a bit uglier but "compatible" with the existing PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had these questions also about the simplicity of the data model.
We do extend Foam's heading section label for a different use for block ids instead of changing the section object. Always using the computed id instead of findSection would be a performance improvement, but I also wanted to maintain familiarity of the object and not change every existing reference to label across the project.
After some experiments, it has been found that id, blockId, and isHeading are each fairly essential additions to the section object. There is not a great way to get rid of any one of them without making downstream code on the consumers end more complicated.
We could rename isHeading to hasHeader, but that's ambiguous about containment. The meaning of isHeading for me, is that it is a markdown heading section as opposed to an html header or yaml frontmatter, and also as opposed to a block identifier section. isHeading as a value contained in the Section object has more clarity than checking for the ^ of the block identifier, which is a complicated string comparison.
We are generally forced to check for isHeading in many places on the consumer end. Heading sections and block id sections (and each kind of block id) simply have many different ways of displaying across different interfaces. It would not be a good idea to try to pack all of those displays into the Section object itself, the current separation of concerns across different renderers is the correct approach.
Let's say from this point that what the consumers really need more than anything is a unified section results list, and this is exactly what we construct among the collection of heading sections and block id sections.
blockId is used in places like link-completion.ts and wikilink-diagnostics.ts. Were it removed, we would have to check isHeading, and then reconstruct the string from the ^ and id each time it needed to be used in the consumers. It is a good addition to the Section object. Of course, having a blockId value is enough to confirm being a block id, which suggests the use of isHeading instead.
For interest, the Section object I was giving consideration to before I encountered a lot of problems with implementation was this:
label: string;
range: Range;
canonicalId: string; // used when creating new links, slug for a heading, blockid with carat for a block
linkableIds: string[]; // list of all valid ids that resolve to this section, including canonicalId, also could be used for multiple block ids on a line
edit:
isHeading acts as a good reminder to need to act on its presence in the Section object. We might also make use of discriminated unions or explicit type fields for more strict enforcement.
id can be used for a heading and a heading block id to refer to the same section, like when we reconstruct heading block id backlink hrefs to point to the correct heading section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably the part of the PR I want to pay the closest attention to, as the data model for me is extremely important, so bear with me as I am trying to fully understand these choices and their implications.
We do extend Foam's heading section label for a different use for block ids instead of changing the section object.
Can you please help me understand that? In which way are you changing that?
At a high level, I agree that label and id are not necessarily the same. At the same time, right now findSection does a simple label comparison, and in the case of a block, I could see how it would simply be a section where the label is the ID of the block (e.g. ^block-53, notice I am including the ^).
Does it feel semantically incorrect to you?
After some experiments, it has been found that id, blockId, and isHeading are each fairly essential additions to the section object. There is not a great way to get rid of any one of them without making downstream code on the consumers end more complicated
I would like to understand why.
I am asking for a simple purpose: I would like to avoid a sub-optimal change to the model that is driven by "practical" concerns like how many downstream refactorings it requires, because the refactoring is a one off - whereas a change to the core model will be carried over time (accumulates tech debt, and especially on the "public model", which is a sort of API, feels like a heavy burden)
We could rename isHeading to hasHeader, but that's ambiguous about containment. The meaning of isHeading for me, is that it is a markdown heading section as opposed to an html header or yaml frontmatter, and also as opposed to a block identifier section. isHeading as a value contained in the Section object has more clarity than checking for the ^ of the block identifier, which is a complicated string comparison
What about having a type property, which could be 'section' | 'block'. It's semantically closer to the object semantics ("what type of section is this?") and it's extensible shall we need new types in the future (I don't think we need, but it was interesting what you were saying about html/yaml sections)
Heading sections and block id sections (and each kind of block id) simply have many different ways of displaying across different interfaces
In what way they are different? The only thing I can think of is that a section has a header whereas a block will not, beside that I don't see any difference. Am I missing something?
Let's say from this point that what the consumers really need more than anything is a unified section results list, and this is exactly what we construct among the collection of heading sections and block id sections
I agree, this is why I am trying to keep them as "aligned" as possible. My goal would be that from the user's POV you can treat the two in (almost) the same way. This is why I am a bit hesitant about having e.g. custom fields (like blockId) that only really apply to only one type of section.
id can be used for a heading and a heading block id to refer to the same section, like when we reconstruct heading block id backlink hrefs to point to the correct heading section.
Sorry I didn't understand this, can you add an example?
packages/foam-vscode/src/features/preview/wikilink-embed.spec.ts
Outdated
Show resolved
Hide resolved
| await deleteFile(noteB.uri); | ||
| }); | ||
|
|
||
| describe('Block Identifiers', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new tests for the block identifiers. shouldn't all the other tests "just work" as I would assume the change to backwards compatible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary changes in expected test data in this file are a result of improved transclusion parsing to remove incorrect double paragraph elements. There is also updated expected test data in markdown-parser.test.ts that is a result of improving getBlockFor range selection accuracy.
| document.getText() | ||
| ); | ||
|
|
||
| const diagnostics = resource.links.flatMap(link => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting approach as opposed to a filter/reduce
| } else { | ||
| // No fragment: transclude the whole note (excluding frontmatter if present) | ||
| // Remove YAML frontmatter if present | ||
| noteText = noteText.replace(/^---[\s\S]*?---\s*/, ''); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: this could be in utils/md.ts as it might be helpful somewhere else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have used stripFrontMatter instead, it seems to be usable on any string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces block level identifiers (using the ^ prefix) to support transclusions, backlinks, and improved navigation within markdown files. Key changes include modifications to the markdown parser, diagnostic providers, link navigation and embed modules, and extensive new tests covering paragraphs, list items, headings, code blocks, and tables.
Reviewed Changes
Copilot reviewed 37 out of 39 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tsconfig.json | Enables JS support and sets CommonJS module for VS Code integration. |
| test-data/block-identifiers/*.md | Adds sample markdown files using the ^ syntax for block identifiers. |
| src/**/*.ts (diagnostics, navigation, embed, hover, link completion, parser, graph, etc.) | Updates multiple core modules and providers to recognize and process block identifiers, with new tests verifying their correct behavior. |
| package.json | Updates dependency types to support the new syntax features. |
Comments suppressed due to low confidence (2)
packages/foam-vscode/src/core/model/note.ts:95
- Consider adding inline documentation to clarify the normalization and matching rules for block identifiers. In particular, explain how the caret prefix is handled and how slugified heading IDs are compared.
): Section | null {
packages/foam-vscode/src/core/utils/md.ts:76
- It would be beneficial to add documentation explaining the block extraction behavior in getBlockFor, including how the number of lines (nLines) is determined. This clarification will help maintain consistency with test expectations and future modifications.
): { block: string; nLines: number } {
I'm not sure about splitting the PR, the main thing is that the AST parser was introduced not just for performance but that I couldn't get the line and indent level heuristics to cooperate. The rest is pretty tightly coupled after making that choice. I can definitely give explanation for specific changes that were made to each file.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple more comments, and especially some questions around the Section object (as anything that is inside the model package is basically Foam's API and foundational for everything else, I need high understanding of everything that goes in there).
Thanks again for your help!
| * - Removes block IDs from the rendered text for all block types. | ||
| * - For paragraphs and list items, cleans the block ID from the text. | ||
| */ | ||
| export function markdownItblockIdRemoval( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting idea, i like it
| id?: string; // A unique identifier for the section within the note. | ||
| label: string; | ||
| range: Range; | ||
| blockId?: string; // The optional block identifier, if one exists (e.g., '^my-id'). | ||
| isHeading?: boolean; // A boolean flag to clearly distinguish headings from other content blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably the part of the PR I want to pay the closest attention to, as the data model for me is extremely important, so bear with me as I am trying to fully understand these choices and their implications.
We do extend Foam's heading section label for a different use for block ids instead of changing the section object.
Can you please help me understand that? In which way are you changing that?
At a high level, I agree that label and id are not necessarily the same. At the same time, right now findSection does a simple label comparison, and in the case of a block, I could see how it would simply be a section where the label is the ID of the block (e.g. ^block-53, notice I am including the ^).
Does it feel semantically incorrect to you?
After some experiments, it has been found that id, blockId, and isHeading are each fairly essential additions to the section object. There is not a great way to get rid of any one of them without making downstream code on the consumers end more complicated
I would like to understand why.
I am asking for a simple purpose: I would like to avoid a sub-optimal change to the model that is driven by "practical" concerns like how many downstream refactorings it requires, because the refactoring is a one off - whereas a change to the core model will be carried over time (accumulates tech debt, and especially on the "public model", which is a sort of API, feels like a heavy burden)
We could rename isHeading to hasHeader, but that's ambiguous about containment. The meaning of isHeading for me, is that it is a markdown heading section as opposed to an html header or yaml frontmatter, and also as opposed to a block identifier section. isHeading as a value contained in the Section object has more clarity than checking for the ^ of the block identifier, which is a complicated string comparison
What about having a type property, which could be 'section' | 'block'. It's semantically closer to the object semantics ("what type of section is this?") and it's extensible shall we need new types in the future (I don't think we need, but it was interesting what you were saying about html/yaml sections)
Heading sections and block id sections (and each kind of block id) simply have many different ways of displaying across different interfaces
In what way they are different? The only thing I can think of is that a section has a header whereas a block will not, beside that I don't see any difference. Am I missing something?
Let's say from this point that what the consumers really need more than anything is a unified section results list, and this is exactly what we construct among the collection of heading sections and block id sections
I agree, this is why I am trying to keep them as "aligned" as possible. My goal would be that from the user's POV you can treat the two in (almost) the same way. This is why I am a bit hesitant about having e.g. custom fields (like blockId) that only really apply to only one type of section.
id can be used for a heading and a heading block id to refer to the same section, like when we reconstruct heading block id backlink hrefs to point to the correct heading section.
Sorry I didn't understand this, can you add an example?
|
I have added a unified section object according to the requested parameters. Note that this does not allow us to avoid forking code in multiple files for header sections and block ID sections, but it should now be more obvious that this check is required in new files we might want to add. While working on this solution, I did catch a few more edge case problems that were not noticed previously. One of the main changes I needed to make was to give up trying to clean the transclusion duplicate paragraph tags (so I returned to the original Foam expected test data), it's technically invalid but it's pretty complicated to match the open and closing tags given different inputs and it renders the same either way.
This occurs in wikilink-navigation.ts, lines 78-117. For the header: Header Text ^header-idWe want both "Header Text" and "^header-id" to be separate usable sections for the user that point to the same header section. We did additionally add a new section level parameter to the unified section object. This is primarily used in wikilink-embed.ts for determining the end of a section for correct transclusion. Beyond this immediate requirement, making the level part of the data model opens up possibilities for future features and would be beneficial in other potential downstream consumers, even if they don't strictly require it now. These include:
Lastly, we could consider writing a Pandoc lua script file that would allow publishing for Foam in general and with the new block ID feature. There are existing examples that would make the script pretty simple to write. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am really loving the direction this is going, and amazing job on the testing!
I have left various comments, looking forward to your thoughts
| const markdown = ` | ||
| ## My Heading ^heading-id | ||
| `; | ||
| const actual = parse(markdown); | ||
| expect(actual.sections).toEqual([ | ||
| { | ||
| id: 'my-heading', | ||
| blockId: '^heading-id', | ||
| type: 'heading', | ||
| label: 'My Heading', | ||
| level: 2, // Add level property | ||
| range: Range.create(1, 0, 2, 0), | ||
| }, | ||
| ]); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting, very good example you picked here.
In my mind, this markdown should actually produce 2 "sections", one triggered by the heading, and one triggered by the block-id.
tbh I am not sure why anyone would write something like this, but being valid markdown I agree we should handle it.
Also, I am seeing that in the case of the heading, the label doesn't include the block-id, whereas in the test case above of the paragraph, the label includes it. I am wondering whether I like this difference in behavior, what od you think?
| expect(actual.sections).toEqual([ | ||
| { | ||
| id: 'parent-id', | ||
| blockId: '^parent-id', | ||
| type: 'block', | ||
| label: `- Parent item ^parent-id | ||
| - Child item 1 | ||
| - Child item 2`, | ||
| range: Range.create(1, 0, 3, 16), | ||
| }, | ||
| ]); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the label is really the content? I wonder if for lists this should just be the name of the item the block is declared on (maybe not including the block id itself?). In this test case it would be Parent item.
The range can be used to extract the content (as we would do for a heading section).
Notice a lot of my comments are about unifying as much as possible the semantics of block sections with heading sections.
In fact, in my view they are exactly the same thing, only defined by different mechanisms (which is why I wrote that comment about # my section ^section-id generating 2 sections. it's kinda having the same block with two anchors pointing at it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(notice that I am not sure how I would "generate" the label for tables or code blocks, any ideas? maybe the de-slugified block id? just thinking out loud)
| // A section created from a markdown heading | ||
| export interface HeadingSection extends BaseSection { | ||
| type: 'heading'; | ||
| level: number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the level property. In theory we could also use it for list items (maybe on a different number-space (e.g. starting at 10) then headers), which i think we can also easily get from the md parser, but I don't consider it a blocker for moving forward.
Actually we could even have it for paragraphs (which i guess would just be level 10 or 0 depending how we want to do the numbering for non-heading sections)
| export interface HeadingSection extends BaseSection { | ||
| type: 'heading'; | ||
| level: number; | ||
| blockId?: string; // A heading can ALSO have a block-id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
echoing what i wrote above, if a heading has a block id, I wonder if we should consider that text to generate 2 sections, one of type type: 'block' and the other of type type: 'section'.
| public getIdentifier( | ||
| forResource: URI, | ||
| exclude?: URI[], | ||
| section?: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you adding this section parameter? I am not seeing it being used anywhere
| return item; | ||
| const items = resource.sections.flatMap(section => { | ||
| const sectionItems: vscode.CompletionItem[] = []; | ||
| switch (section.type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder whether we are actually converging to a consistent treatment of the two types of sections, especially if we don't do special treatment for headings with block ids
| : Range.createFromPosition(Position.create(0, 0), Position.create(0, 0)); | ||
| const targetSelectionRange = section | ||
| ? section.range | ||
| ? (section as any).labelRange || section.range // Use labelRange for headings, fallback to full section range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure the special casing is necessary, but happy to go with how you feel works best
| .slice(section.range.start.line, section.range.end.line) | ||
| .join('\n'); | ||
| if (section.type === 'heading') { | ||
| // For headings, extract all content from that heading to the next. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you already get this from the range property? Why are you re-computing it?
the range property should also already take into account the right level
|
|
||
| if (isSome(section)) { | ||
| rows = rows.slice(section.range.start.line, section.range.end.line); | ||
| if (section.type === 'heading') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, you could use the range and add 1 to the row in the start position
| md: markdownit, | ||
| workspace: FoamWorkspace | ||
| workspace: FoamWorkspace, | ||
| options?: { root?: vscode.Uri } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find where this is being used and where it's being set?
|
Incredible work on this! I am excited to see this merged. |
|
hi @ryanncode - just wanted to check in and see your thoughts on the PR, thanks! |
|
This is starting to vibe like LogSeq and SiYuan ngl |
|
@TomLucidor not sure I understand, can you elaborate? |
|
Obsidian is traditionally a page-based PKM (link by file so it is easier to code), while LogSeq and SiYuan are block-based PKM (and linking is more modular with outline-like structure). The way they design how the latter works is kinda weird, but often useful when people cannot stand long articles. |
Trying to get time to get back on working on this. The first problem I encountered is that I can't fulfill both of your first two requests. I spent a good deal of time working on the first request, only to find that it conflicts with the more important requirements of the second request. In general, I don't see a way to get the VS Code interface to cooperate with a perfectly logical and semantic structure to the code. The best I can do is to clean up at least some of the output on the consumers end. Will submit something new within 1-2 weeks. |
|
Sorry, just to be clear I understand, can you tell me what the first and second requests you are referring to are? |
This feature adds ^abc123 style block level identifiers. We can use block identifiers generally in every way that you could use a header section for backlinks and transclusions, like ![[doc#^block-id]]. The block identifier must be placed either in-line at the end of a paragraph or list item, or on its own line following a markdown node (full list, block quote, code block, table).
I have also found this was previously requested in this issue:
#294
Some conditions we are faced with:
Other notes: