Skip to content

Conversation

@ryanncode
Copy link

@ryanncode ryanncode commented Jun 24, 2025

This feature adds ^abc123 style block level identifiers. We can use block identifiers generally in every way that you could use a header section for backlinks and transclusions, like ![[doc#^block-id]]. The block identifier must be placed either in-line at the end of a paragraph or list item, or on its own line following a markdown node (full list, block quote, code block, table).

I have also found this was previously requested in this issue:
#294

Some conditions we are faced with:

  • In Markdown Preview, block id backlinks link to their nearest header section. Across many attempts, it has been found that VS Code sandboxed default Markdown Preview window is heavily restricted from setting custom html id tags or querying them. There may be a solution I'm not familiar with. In every other case including editor view, hover view, transclusion, autocomplete, etc., we link directly to the block id as needed. To me this seems like a fair tradeoff, since the main interest would be block identifier transclusions.
  • This is not compatible with Prettier since it is very opinionated about how Markdown nodes must be ordered, and will not allow the block id to be placed on the last line. My suggestion would be to pair Markdown All in One with Markdown Linter, which is a trade off between strict automation of the layout vs more user control over which errors they want to catch.
  • For simplicity we choose to only recognize the last block id on the line, it could be improved to allow multiple block ids on the same line (to be linked to for separate contexts).

Other notes:

  • The parser has been reworked to be AST-based instead of line # and indentation level heuristic based, which should be an overall improvement in performance. This could use some testing in large documents or workspaces with many documents and so on.
  • Since this does require heavy changes, I can follow up until we confirm every part of the feature is well tested. I did add many tests to start off with, and they should be currently passing.

Copy link
Collaborator

@riccardoferretti riccardoferretti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, thanks for taking on this big task!
I am still getting myself familiar with the PR, so my comments are meant as a conversation at this stage more than just feedback.
I have left a few comments, questions and suggestions.
I admit the PR is massive, if there were also a way to breaking it down into smaller ones it would make it easier to review.
Finally, can you tell me what AI support you used if any?
Thanks again!

"test": "yarn test-setup && node ./out/test/run-tests.js",
"test:unit": "yarn test-setup && node ./out/test/run-tests.js --unit",
"test:e2e": "yarn test-setup && node ./out/test/run-tests.js --e2e",
"test:tdd": "yarn build:node && jest --runInBand",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this allow?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new test:tdd command allows individual testing of any parsing focused test.ts file, which was mainly used for heavy testing of the block id parsing test file. For now I will rename it better unless it's not wanted.

Comment on lines 43 to 47
id?: string; // A unique identifier for the section within the note.
label: string;
range: Range;
blockId?: string; // The optional block identifier, if one exists (e.g., '^my-id').
isHeading?: boolean; // A boolean flag to clearly distinguish headings from other content blocks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few points/questions:

  • it's an interesting idea "computing" the id of a section at parsing time, so if we need to reference it afterwards we can easily do so, and it allows us to slightly "decouple" label from id - is this the reason why you introduced that?
  • given id and isHeading, do we have enough to identify a block? Isn't blockId === id for a block Section?
  • from glancing at where isHeading is used, do we need it explicitly or can we just check if the id starts with ^ as a convetion?
  • also, isHeading refers to the anchor, but this object describes a section. I wonder if we should call this something like isBlock (in which case we are inverting the semantics though) or something like isHeaderSection (which I think it's a bit uglier but "compatible" with the existing PR)

Copy link
Author

@ryanncode ryanncode Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had these questions also about the simplicity of the data model.

We do extend Foam's heading section label for a different use for block ids instead of changing the section object. Always using the computed id instead of findSection would be a performance improvement, but I also wanted to maintain familiarity of the object and not change every existing reference to label across the project.

After some experiments, it has been found that id, blockId, and isHeading are each fairly essential additions to the section object. There is not a great way to get rid of any one of them without making downstream code on the consumers end more complicated.

We could rename isHeading to hasHeader, but that's ambiguous about containment. The meaning of isHeading for me, is that it is a markdown heading section as opposed to an html header or yaml frontmatter, and also as opposed to a block identifier section. isHeading as a value contained in the Section object has more clarity than checking for the ^ of the block identifier, which is a complicated string comparison.

We are generally forced to check for isHeading in many places on the consumer end. Heading sections and block id sections (and each kind of block id) simply have many different ways of displaying across different interfaces. It would not be a good idea to try to pack all of those displays into the Section object itself, the current separation of concerns across different renderers is the correct approach.

Let's say from this point that what the consumers really need more than anything is a unified section results list, and this is exactly what we construct among the collection of heading sections and block id sections.

blockId is used in places like link-completion.ts and wikilink-diagnostics.ts. Were it removed, we would have to check isHeading, and then reconstruct the string from the ^ and id each time it needed to be used in the consumers. It is a good addition to the Section object. Of course, having a blockId value is enough to confirm being a block id, which suggests the use of isHeading instead.

For interest, the Section object I was giving consideration to before I encountered a lot of problems with implementation was this:

label: string;
range: Range;
canonicalId: string; // used when creating new links, slug for a heading, blockid with carat for a block
linkableIds: string[]; // list of all valid ids that resolve to this section, including canonicalId, also could be used for multiple block ids on a line

edit:

isHeading acts as a good reminder to need to act on its presence in the Section object. We might also make use of discriminated unions or explicit type fields for more strict enforcement.

id can be used for a heading and a heading block id to refer to the same section, like when we reconstruct heading block id backlink hrefs to point to the correct heading section.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably the part of the PR I want to pay the closest attention to, as the data model for me is extremely important, so bear with me as I am trying to fully understand these choices and their implications.

We do extend Foam's heading section label for a different use for block ids instead of changing the section object.

Can you please help me understand that? In which way are you changing that?
At a high level, I agree that label and id are not necessarily the same. At the same time, right now findSection does a simple label comparison, and in the case of a block, I could see how it would simply be a section where the label is the ID of the block (e.g. ^block-53, notice I am including the ^).
Does it feel semantically incorrect to you?

After some experiments, it has been found that id, blockId, and isHeading are each fairly essential additions to the section object. There is not a great way to get rid of any one of them without making downstream code on the consumers end more complicated

I would like to understand why.
I am asking for a simple purpose: I would like to avoid a sub-optimal change to the model that is driven by "practical" concerns like how many downstream refactorings it requires, because the refactoring is a one off - whereas a change to the core model will be carried over time (accumulates tech debt, and especially on the "public model", which is a sort of API, feels like a heavy burden)

We could rename isHeading to hasHeader, but that's ambiguous about containment. The meaning of isHeading for me, is that it is a markdown heading section as opposed to an html header or yaml frontmatter, and also as opposed to a block identifier section. isHeading as a value contained in the Section object has more clarity than checking for the ^ of the block identifier, which is a complicated string comparison

What about having a type property, which could be 'section' | 'block'. It's semantically closer to the object semantics ("what type of section is this?") and it's extensible shall we need new types in the future (I don't think we need, but it was interesting what you were saying about html/yaml sections)

Heading sections and block id sections (and each kind of block id) simply have many different ways of displaying across different interfaces

In what way they are different? The only thing I can think of is that a section has a header whereas a block will not, beside that I don't see any difference. Am I missing something?

Let's say from this point that what the consumers really need more than anything is a unified section results list, and this is exactly what we construct among the collection of heading sections and block id sections

I agree, this is why I am trying to keep them as "aligned" as possible. My goal would be that from the user's POV you can treat the two in (almost) the same way. This is why I am a bit hesitant about having e.g. custom fields (like blockId) that only really apply to only one type of section.

id can be used for a heading and a heading block id to refer to the same section, like when we reconstruct heading block id backlink hrefs to point to the correct heading section.

Sorry I didn't understand this, can you add an example?

await deleteFile(noteB.uri);
});

describe('Block Identifiers', () => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the new tests for the block identifiers. shouldn't all the other tests "just work" as I would assume the change to backwards compatible

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary changes in expected test data in this file are a result of improved transclusion parsing to remove incorrect double paragraph elements. There is also updated expected test data in markdown-parser.test.ts that is a result of improving getBlockFor range selection accuracy.

document.getText()
);

const diagnostics = resource.links.flatMap(link => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting approach as opposed to a filter/reduce

} else {
// No fragment: transclude the whole note (excluding frontmatter if present)
// Remove YAML frontmatter if present
noteText = noteText.replace(/^---[\s\S]*?---\s*/, '');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this could be in utils/md.ts as it might be helpful somewhere else

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have used stripFrontMatter instead, it seems to be usable on any string

@riccardoferretti riccardoferretti requested a review from Copilot June 24, 2025 15:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces block level identifiers (using the ^ prefix) to support transclusions, backlinks, and improved navigation within markdown files. Key changes include modifications to the markdown parser, diagnostic providers, link navigation and embed modules, and extensive new tests covering paragraphs, list items, headings, code blocks, and tables.

Reviewed Changes

Copilot reviewed 37 out of 39 changed files in this pull request and generated 1 comment.

File Description
tsconfig.json Enables JS support and sets CommonJS module for VS Code integration.
test-data/block-identifiers/*.md Adds sample markdown files using the ^ syntax for block identifiers.
src/**/*.ts (diagnostics, navigation, embed, hover, link completion, parser, graph, etc.) Updates multiple core modules and providers to recognize and process block identifiers, with new tests verifying their correct behavior.
package.json Updates dependency types to support the new syntax features.
Comments suppressed due to low confidence (2)

packages/foam-vscode/src/core/model/note.ts:95

  • Consider adding inline documentation to clarify the normalization and matching rules for block identifiers. In particular, explain how the caret prefix is handled and how slugified heading IDs are compared.
  ): Section | null {

packages/foam-vscode/src/core/utils/md.ts:76

  • It would be beneficial to add documentation explaining the block extraction behavior in getBlockFor, including how the number of lines (nLines) is determined. This clarification will help maintain consistency with test expectations and future modifications.
): { block: string; nLines: number } {

@ryanncode
Copy link
Author

First of all, thanks for taking on this big task! I am still getting myself familiar with the PR, so my comments are meant as a conversation at this stage more than just feedback. I have left a few comments, questions and suggestions. I admit the PR is massive, if there were also a way to breaking it down into smaller ones it would make it easier to review. Finally, can you tell me what AI support you used if any? Thanks again!

I'm not sure about splitting the PR, the main thing is that the AST parser was introduced not just for performance but that I couldn't get the line and indent level heuristics to cooperate. The rest is pretty tightly coupled after making that choice. I can definitely give explanation for specific changes that were made to each file.

  • Primary AI dev tools are RooCode for architecture planning and task orchestration, and Copilot for smaller editing tasks.

  • Removed a few leftover artifacts from dev process, for example stuff related to trying to use javascript injections that I ended up not using.

Copy link
Collaborator

@riccardoferretti riccardoferretti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple more comments, and especially some questions around the Section object (as anything that is inside the model package is basically Foam's API and foundational for everything else, I need high understanding of everything that goes in there).
Thanks again for your help!

* - Removes block IDs from the rendered text for all block types.
* - For paragraphs and list items, cleans the block ID from the text.
*/
export function markdownItblockIdRemoval(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting idea, i like it

Comment on lines 43 to 47
id?: string; // A unique identifier for the section within the note.
label: string;
range: Range;
blockId?: string; // The optional block identifier, if one exists (e.g., '^my-id').
isHeading?: boolean; // A boolean flag to clearly distinguish headings from other content blocks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably the part of the PR I want to pay the closest attention to, as the data model for me is extremely important, so bear with me as I am trying to fully understand these choices and their implications.

We do extend Foam's heading section label for a different use for block ids instead of changing the section object.

Can you please help me understand that? In which way are you changing that?
At a high level, I agree that label and id are not necessarily the same. At the same time, right now findSection does a simple label comparison, and in the case of a block, I could see how it would simply be a section where the label is the ID of the block (e.g. ^block-53, notice I am including the ^).
Does it feel semantically incorrect to you?

After some experiments, it has been found that id, blockId, and isHeading are each fairly essential additions to the section object. There is not a great way to get rid of any one of them without making downstream code on the consumers end more complicated

I would like to understand why.
I am asking for a simple purpose: I would like to avoid a sub-optimal change to the model that is driven by "practical" concerns like how many downstream refactorings it requires, because the refactoring is a one off - whereas a change to the core model will be carried over time (accumulates tech debt, and especially on the "public model", which is a sort of API, feels like a heavy burden)

We could rename isHeading to hasHeader, but that's ambiguous about containment. The meaning of isHeading for me, is that it is a markdown heading section as opposed to an html header or yaml frontmatter, and also as opposed to a block identifier section. isHeading as a value contained in the Section object has more clarity than checking for the ^ of the block identifier, which is a complicated string comparison

What about having a type property, which could be 'section' | 'block'. It's semantically closer to the object semantics ("what type of section is this?") and it's extensible shall we need new types in the future (I don't think we need, but it was interesting what you were saying about html/yaml sections)

Heading sections and block id sections (and each kind of block id) simply have many different ways of displaying across different interfaces

In what way they are different? The only thing I can think of is that a section has a header whereas a block will not, beside that I don't see any difference. Am I missing something?

Let's say from this point that what the consumers really need more than anything is a unified section results list, and this is exactly what we construct among the collection of heading sections and block id sections

I agree, this is why I am trying to keep them as "aligned" as possible. My goal would be that from the user's POV you can treat the two in (almost) the same way. This is why I am a bit hesitant about having e.g. custom fields (like blockId) that only really apply to only one type of section.

id can be used for a heading and a heading block id to refer to the same section, like when we reconstruct heading block id backlink hrefs to point to the correct heading section.

Sorry I didn't understand this, can you add an example?

@ryanncode ryanncode marked this pull request as ready for review July 15, 2025 03:14
@ryanncode
Copy link
Author

ryanncode commented Jul 15, 2025

I have added a unified section object according to the requested parameters. Note that this does not allow us to avoid forking code in multiple files for header sections and block ID sections, but it should now be more obvious that this check is required in new files we might want to add.

While working on this solution, I did catch a few more edge case problems that were not noticed previously. One of the main changes I needed to make was to give up trying to clean the transclusion duplicate paragraph tags (so I returned to the original Foam expected test data), it's technically invalid but it's pretty complicated to match the open and closing tags given different inputs and it renders the same either way.

id can be used for a heading and a heading block id to refer to the same section, like when we reconstruct heading block id backlink hrefs to point to the correct heading section.
Sorry I didn't understand this, can you add an example?

This occurs in wikilink-navigation.ts, lines 78-117.

For the header:

Header Text ^header-id

We want both "Header Text" and "^header-id" to be separate usable sections for the user that point to the same header section.

We did additionally add a new section level parameter to the unified section object. This is primarily used in wikilink-embed.ts for determining the end of a section for correct transclusion.

Beyond this immediate requirement, making the level part of the data model opens up possibilities for future features and would be beneficial in other potential downstream consumers, even if they don't strictly require it now. These include:

  • Outline and Tree Views: Any feature that displays a document's structure (like the "Outline" view in VS Code or a panel showing a note's headings) could use the level to render the hierarchy correctly with proper indentation, without having to re-parse the file. The current backlink tree view in tree-view-utils.ts doesn't do this, but it's a logical extension.

  • Navigation Features: A "Go to Next/Previous Heading" command could be made more intelligent. Instead of just jumping to any heading, it could jump to the next heading of the same level, which is a common navigation pattern in structured documents.

  • Refactoring Tools: A tool that refactors a document's headings (e.g., promoting or demoting all subheadings under a specific parent) would need the level information to perform its task correctly.

Lastly, we could consider writing a Pandoc lua script file that would allow publishing for Foam in general and with the new block ID feature. There are existing examples that would make the script pretty simple to write.

Copy link
Collaborator

@riccardoferretti riccardoferretti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really loving the direction this is going, and amazing job on the testing!
I have left various comments, looking forward to your thoughts

Comment on lines +32 to +46
const markdown = `
## My Heading ^heading-id
`;
const actual = parse(markdown);
expect(actual.sections).toEqual([
{
id: 'my-heading',
blockId: '^heading-id',
type: 'heading',
label: 'My Heading',
level: 2, // Add level property
range: Range.create(1, 0, 2, 0),
},
]);
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting, very good example you picked here.
In my mind, this markdown should actually produce 2 "sections", one triggered by the heading, and one triggered by the block-id.
tbh I am not sure why anyone would write something like this, but being valid markdown I agree we should handle it.

Also, I am seeing that in the case of the heading, the label doesn't include the block-id, whereas in the test case above of the paragraph, the label includes it. I am wondering whether I like this difference in behavior, what od you think?

Comment on lines +194 to +205
expect(actual.sections).toEqual([
{
id: 'parent-id',
blockId: '^parent-id',
type: 'block',
label: `- Parent item ^parent-id
- Child item 1
- Child item 2`,
range: Range.create(1, 0, 3, 16),
},
]);
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the label is really the content? I wonder if for lists this should just be the name of the item the block is declared on (maybe not including the block id itself?). In this test case it would be Parent item.
The range can be used to extract the content (as we would do for a heading section).

Notice a lot of my comments are about unifying as much as possible the semantics of block sections with heading sections.
In fact, in my view they are exactly the same thing, only defined by different mechanisms (which is why I wrote that comment about # my section ^section-id generating 2 sections. it's kinda having the same block with two anchors pointing at it)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(notice that I am not sure how I would "generate" the label for tables or code blocks, any ideas? maybe the de-slugified block id? just thinking out loud)

// A section created from a markdown heading
export interface HeadingSection extends BaseSection {
type: 'heading';
level: number;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the level property. In theory we could also use it for list items (maybe on a different number-space (e.g. starting at 10) then headers), which i think we can also easily get from the md parser, but I don't consider it a blocker for moving forward.
Actually we could even have it for paragraphs (which i guess would just be level 10 or 0 depending how we want to do the numbering for non-heading sections)

export interface HeadingSection extends BaseSection {
type: 'heading';
level: number;
blockId?: string; // A heading can ALSO have a block-id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

echoing what i wrote above, if a heading has a block id, I wonder if we should consider that text to generate 2 sections, one of type type: 'block' and the other of type type: 'section'.

public getIdentifier(
forResource: URI,
exclude?: URI[],
section?: string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you adding this section parameter? I am not seeing it being used anywhere

return item;
const items = resource.sections.flatMap(section => {
const sectionItems: vscode.CompletionItem[] = [];
switch (section.type) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder whether we are actually converging to a consistent treatment of the two types of sections, especially if we don't do special treatment for headings with block ids

: Range.createFromPosition(Position.create(0, 0), Position.create(0, 0));
const targetSelectionRange = section
? section.range
? (section as any).labelRange || section.range // Use labelRange for headings, fallback to full section range
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure the special casing is necessary, but happy to go with how you feel works best

.slice(section.range.start.line, section.range.end.line)
.join('\n');
if (section.type === 'heading') {
// For headings, extract all content from that heading to the next.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you already get this from the range property? Why are you re-computing it?
the range property should also already take into account the right level


if (isSome(section)) {
rows = rows.slice(section.range.start.line, section.range.end.line);
if (section.type === 'heading') {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, you could use the range and add 1 to the row in the start position

md: markdownit,
workspace: FoamWorkspace
workspace: FoamWorkspace,
options?: { root?: vscode.Uri }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find where this is being used and where it's being set?

@tctlrd
Copy link

tctlrd commented Sep 4, 2025

Incredible work on this! I am excited to see this merged.

@riccardoferretti
Copy link
Collaborator

hi @ryanncode - just wanted to check in and see your thoughts on the PR, thanks!

@TomLucidor
Copy link

This is starting to vibe like LogSeq and SiYuan ngl

@riccardoferretti
Copy link
Collaborator

@TomLucidor not sure I understand, can you elaborate?

@TomLucidor
Copy link

Obsidian is traditionally a page-based PKM (link by file so it is easier to code), while LogSeq and SiYuan are block-based PKM (and linking is more modular with outline-like structure). The way they design how the latter works is kinda weird, but often useful when people cannot stand long articles.

@ryanncode
Copy link
Author

hi @ryanncode - just wanted to check in and see your thoughts on the PR, thanks!

Trying to get time to get back on working on this. The first problem I encountered is that I can't fulfill both of your first two requests. I spent a good deal of time working on the first request, only to find that it conflicts with the more important requirements of the second request. In general, I don't see a way to get the VS Code interface to cooperate with a perfectly logical and semantic structure to the code. The best I can do is to clean up at least some of the output on the consumers end. Will submit something new within 1-2 weeks.

@riccardoferretti
Copy link
Collaborator

Sorry, just to be clear I understand, can you tell me what the first and second requests you are referring to are?
And in which way they conflict?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants