@maju-degrandi
Collaborator

This pull request adds the complete implementation of the transpiler responsible for converting documents written in the sweet flavor of the Vinum language into the dry flavor.

Key features included in this implementation:

  • Full sweet-to-dry transformation: The transpiler parses the sweet syntax, including nested and chained function calls, and generates the corresponding dry-style blocks.
  • Accurate lexical analysis: The tokenizer distinguishes between content and function names correctly, even in edge cases involving dots, parentheses, and arbitrary characters.
  • Right-associative parsing: Supports nested call chains with proper block insertion, preserving syntactic structure.
  • Output handling: Transpiled results are written to a new .vin file derived from the input filename, using the _dry suffix.
  • Extensibility: Designed to be extended; formatting rules such as paragraph detection (e.g., via double newlines) can be added later.

This version prioritizes syntactic correctness and faithful interpretation of the sweet input, ensuring that all elements — including text, parentheses, and function markers — are processed as intended.
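The output naming rule described above can be sketched in a few lines. This is only an illustration in Python, not the transpiler's actual code: the helper name `dry_output_path` is hypothetical, and it assumes the input's own extension is dropped before `_dry.vin` is appended.

```python
from pathlib import Path


def dry_output_path(input_path):
    """Return the assumed output path for a sweet input file.

    Hypothetical helper: assumes "intro.vin" becomes "intro_dry.vin",
    i.e. the original extension is dropped before adding "_dry.vin".
    """
    path = Path(input_path)
    # with_name() keeps the parent directory and swaps only the file name.
    return path.with_name(path.stem + "_dry.vin")
```

For an input named `intro.vin`, this sketch yields `intro_dry.vin` in the same directory.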

arthurvergacas and others added 18 commits December 4, 2024 22:41
Changes to the lexer so that it no longer differentiates rules during tokenization -> passing everything to bison
Cases like (content).name and .name are working in flex and bison
Sweet, the "markdown" defined by Monaco, still lacks some features
The error that occurred when blocks of different types, with contents of more than one word, were linked together has been resolved by adding start states in Flex.

Now only one error remains unresolved:
blocks of type .block.global need to be transpiled to [global [block]], but are being transpiled to [block][global]. (I think this could be resolved by fixing the parser, but I don't know how yet)

Furthermore, we still don't accept '(', ')' and '.' as content.
This error was resolved with changes to the parser. A new "call" rule has been created to handle function calls in the ".function" format. In addition, "program" is now built with more involved string manipulation rather than plain concatenation: the "call" rule returns the prepared block to "program", and its content is simply inserted in the correct position.

NOTE: the characters '(', ')' and '.' are not yet allowed as block content.
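The difference between the buggy and the intended output for chained calls can be shown with a small sketch. It is written in Python for brevity and is not the bison rule itself; `nest_calls` and `concat_calls` are hypothetical names used only for this illustration.

```python
def concat_calls(names):
    """Buggy behaviour described above: each call becomes a sibling block."""
    return "".join(f"[{name}]" for name in names)


def nest_calls(names, content=None):
    """Intended behaviour: each later call wraps the block built so far.

    `names` are the function names in source order; `content` is the
    optional parenthesised argument of the first call.
    """
    block = content
    for name in names:
        # The first call without an argument yields a bare [name] block.
        block = f"[{name}]" if block is None else f"[{name} {block}]"
    return block
```

With `["block", "global"]`, `concat_calls` gives `[block][global]`, while `nest_calls` gives `[global [block]]`, which is the form the commit message asks for.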
Some developers are using Ubuntu 22.04. This distribution packages a
_very_ old version of meson.

Add support for it by checking the meson version before adding code that
would break older versions. Also, use the old file name for meson options.
…d output handling

This update significantly improves the tokenizer and parser to correctly interpret the syntax of the "sweet" flavor of the Vinum language.

Changes and improvements include:

- **Improved Tokenization**:
  - The previous implementation incorrectly classified most words as `NAME`, unnecessarily increasing the number of parse tree nodes. This is now fixed: names are only recognized after a dot (`.`) followed by an identifier, as intended by the language design.
  - Introduced a custom lexer state (`AFTER_DOT`) to properly tokenize the sequence `.<name>`. If a dot is not followed by a valid identifier, it is now treated as part of `CONTENT`.
  - The lexer now correctly accepts all characters, including `(`, `)`, and `.`, as valid parts of `CONTENT` when they do not form a function call. This ensures that full text content is preserved and interpreted correctly.

- **Parser Functionality**:
  - Blocks with or without parentheses are now properly parsed, including arbitrarily nested function calls, right-associative chains, and inline content.
  - Function calls without arguments (`.name`) are correctly recognized and translated into dry-style `[name]` blocks.
  - Chained calls like `(a (b) (c).d).e` are parsed into right-associative dry blocks: `[e a (b) [d c]]`.

- **Output File Generation**:
  - The transpiler now writes the dry-style output to a `.vin` file derived from the input filename by appending the `_dry.vin` suffix.
  - Existing dry blocks (already in `[]` form) are preserved as-is.

- **Design Prioritization**:
  - The current focus is on syntactic correctness and faithful transformation from sweet to dry syntax.
  - Features such as double line breaks marking new paragraphs will be implemented later as extensions.

Overall, this refactor brings the transpiler much closer to the intended behavior of the Vinum language by improving syntax recognition, handling edge cases in content, and producing clean and structured output files.
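The rules listed above can be condensed into a small recursive-descent sketch. It is written in Python rather than flex/bison and is only an approximation under stated assumptions: the identifier pattern in `NAME` is a guess, the function names are hypothetical, and existing `[]` blocks get no special handling (their characters are passed through like any other content).

```python
import re

# Assumed shape of a function name; the real lexer may accept other characters.
NAME = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)")


def wrap_calls(src, pos, body):
    """Consume consecutive `.name` markers starting at `pos`.

    Each name wraps whatever was built before it, so later names end up
    outermost. `body=None` means the first call has no argument.
    """
    current = body
    while (match := NAME.match(src, pos)) is not None:
        name = match.group(1)
        current = f"[{name}]" if current is None else f"[{name} {current}]"
        pos = match.end()
    return current, pos


def parse(src, pos, nested):
    """Translate text from `pos` until the end or, if nested, a closing ')'."""
    parts = []
    while pos < len(src):
        ch = src[pos]
        if ch == ")" and nested:
            break
        if ch == "(":
            inner, end = parse(src, pos + 1, True)
            if end >= len(src):
                # Unclosed parenthesis: keep it as plain content.
                parts.append("(" + inner)
                pos = end
                continue
            block, after = wrap_calls(src, end + 1, inner)
            # Without a following `.name`, the parentheses stay as content.
            parts.append(block if after > end + 1 else "(" + inner + ")")
            pos = after
        elif NAME.match(src, pos):
            block, pos = wrap_calls(src, pos, None)
            parts.append(block)
        else:
            # Any other character, including a lone '.', is content.
            parts.append(ch)
            pos += 1
    return "".join(parts), pos


def transpile(src):
    """Return the dry form of a sweet string (sketch only)."""
    return parse(src, 0, False)[0]
```

Under these assumptions, `transpile("(a (b) (c).d).e")` produces `[e a (b) [d c]]`, matching the example in the commit message, and a dot that is not followed by an identifier (as in `3.14`) is left untouched.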
By ensuring calls to external functions
always have an ARGS node, we can prevent
them from being skipped during evaluation
fix: prevent external calls from being skipped
Member

@Grillo-0 left a comment

Please rebase your changes to the current develop branch

@@ -0,0 +1,1475 @@
/* A Bison parser, made by GNU Bison 3.8.2. */
Member

Please remove this file

@@ -0,0 +1,14 @@
(foo).bar
Member

Now that we have the vunit test library, we could use it here for testing.

@@ -1,9 +1,9 @@
AM_CFLAGS = -Wall -Wextra -Werror -g
Member

We don't use autotools anymore; please use the meson build system.

@@ -0,0 +1,93 @@
/* A Bison parser, made by GNU Bison 3.8.2. */
Member

We don't want the files generated by lex/bison in the repo. Please add them to the .gitignore.

@Grillo-0
Member

About the commits.

For now, just keep adding commits that address the reviews. In the end, since only one "thing" is being added to the project, we want a single commit, but it's faster to work this way and squash them all afterwards.

arthurvergacas and others added 7 commits June 30, 2025 16:47
@artP2
Contributor

artP2 commented Jul 2, 2025

Create the vinumt subproject, and include it in vinumc as a shared library using meson.
