diff --git a/UPGRADING.md b/UPGRADING.md new file mode 100644 index 0000000..4227709 --- /dev/null +++ b/UPGRADING.md @@ -0,0 +1,294 @@ +# Upgrading Markbridge + +## 0.x: migration-API redesign + +This release reshapes the top-level API around `Conversion`/`Parse` +result types and a single `renderer:` kwarg for render-side +customization. There is no backwards-compatibility shim; the changes +are mechanical, but every importer call site needs to be updated. + +### Convenience methods now return a `Conversion`, not a `String` + +```ruby +# Before +markdown = Markbridge.bbcode_to_markdown(input) +markdown.gsub(/.../, "...") # String operation + +# After +result = Markbridge.bbcode_to_markdown(input) +result.markdown.gsub(/.../, "...") # explicit access + +# Or, if you only need the string for puts/interpolation: +puts result # to_s delegates to markdown +"got #{result}" # works +``` + +`Conversion` carries `markdown`, `ast`, `format`, `unknown_tags`, +`diagnostics`, `errors`. It does *not* delegate other +String methods: `result.gsub(...)` will raise `NoMethodError`. Use +`result.markdown.gsub(...)`. + +### Singleton config and per-process default registries are gone + +The following are removed: + +- `Markbridge.configuration` +- `Markbridge.configure { |c| c.escape_hard_line_breaks = ... 
}` +- `Markbridge.reset_defaults!` +- `Markbridge.default_handlers` +- `Markbridge.default_html_handlers` +- `Markbridge.default_text_formatter_handlers` +- `Markbridge.default_tag_library` +- `Markbridge::Configuration` (the class) + +To customize rendering, build a `Renderer` once via the new factory +and pass it through `renderer:`: + +```ruby +# Before +Markbridge.configure { |c| c.escape_hard_line_breaks = true } +Markbridge.default_tag_library.register(MyAst::Bold, MyTag.new) +Markbridge.bbcode_to_markdown(input) + +# After +RENDERER = + Markbridge.discourse_renderer( + tags: { MyAst::Bold => MyTag.new }, + escape_hard_line_breaks: true, + ) +Markbridge.bbcode_to_markdown(input, renderer: RENDERER) +``` + +Build the renderer once outside your migration loop and reuse it +across thousands of posts. + +### `tags:`, `tag_library:`, `escaper:`, `escape_hard_line_breaks:` removed from per-call signature + +All four moved into `Markbridge.discourse_renderer(...)`. The four +`*_to_markdown` methods plus `Markbridge.convert` now accept only: + +- `handlers:` — parser handler registry +- `renderer:` — pre-built Renderer +- `raise_on_error:` — boolean (default `true`) + +### MediaWiki kwarg renamed: `inline_tag_registry:` → `handlers:` + +```ruby +# Before +Markbridge.parse_mediawiki(input, inline_tag_registry: my_registry) +Markbridge::Parsers::MediaWiki::Parser.new(inline_tag_registry: my_registry) + +# After +Markbridge.parse_mediawiki(input, handlers: my_registry) +Markbridge::Parsers::MediaWiki::Parser.new(handlers: my_registry) +``` + +The accepted *type* is unchanged — still an `InlineTagRegistry`. Only +the parameter name moves, for parity with the BBCode/HTML/TextFormatter +parsers. 
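The `Conversion` contract described in the first section above (string coercion through `to_s`, no delegation of other String methods) can be pictured with a small stand-in. This is a sketch only: `ConversionSketch` is a hypothetical class written for illustration and is not Markbridge's implementation.

```ruby
# Hypothetical stand-in for illustration; NOT Markbridge's Conversion class.
class ConversionSketch
  attr_reader :markdown, :errors

  def initialize(markdown:, errors: [])
    @markdown = markdown
    @errors = errors
  end

  # String coercion hands back the markdown payload, so interpolation
  # works without an explicit `.markdown` call.
  def to_s
    markdown.to_s
  end
end

result = ConversionSketch.new(markdown: "**hi**")

# Interpolation calls to_s implicitly.
interpolated = "got #{result}"

# String methods need explicit access to the payload.
replaced = result.markdown.gsub("hi", "yo")

# Calling a String method on the result object itself is not delegated.
gsub_delegated =
  begin
    result.gsub("hi", "yo")
    true
  rescue NoMethodError
    false
  end
```

Keeping the result object distinct from `String` means a forgotten `.markdown` fails loudly at the call site instead of silently operating on the wrong value.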
+ +### TextFormatter handlers must accept `processor:` + +`Parsers::TextFormatter::Handlers::BaseHandler#process` now has a +three-keyword signature: + +```ruby +# Before +def process(element:, parent:) + +# After +def process(element:, parent:, processor: nil) +``` + +Update every custom subclass under your importer's TextFormatter +handler tree. The `processor:` argument is the parser instance and +exposes `process_children(xml_element, ast_node)` for handlers that +want to recurse into children manually. + +### Proc/lambda handlers no longer supported + +Both HTML and TextFormatter previously accepted a `Proc`/lambda as a +handler. They now accept only objects responding to `#process(...)`. +Existing default handlers were already class-based; the only places +this affected built-in code were the HTML self-closing-tag lambdas (now +`HTML::Handlers::SelfClosingHandler`) and the +`examples/custom_text_formatter_mappings.rb` lambdas (now Handler +classes). + +Migration: define a tiny class extending the parser's `BaseHandler` +and move your lambda body into `#process(element:, parent:[, processor:])`. + +```ruby +# Before +registry.register("HIGHLIGHT", ->(element:, parent:, processor:) { + parent << HighlightNode.new(...) + nil +}) + +# After +class HighlightHandler < Markbridge::Parsers::TextFormatter::Handlers::BaseHandler + def initialize; @element_class = HighlightNode; end + attr_reader :element_class + + def process(element:, parent:, processor:) + parent << HighlightNode.new(...) + nil + end +end +registry.register("HIGHLIGHT", HighlightHandler.new) +``` + +The `BBCode` parser has always required class handlers (its +`on_open`/`on_close` lifecycle doesn't fit the lambda shape). All +three parsers now follow the same rule. + +### Resolution lives in handlers, not Tags + +The migration use case resolves placeholders (uploads, mentions, +internal links) at parse time via custom handler subclasses. The +handler stores the source-side reference in the converter's +upload/user/topic store, gets back a stable identifier, and pins +it on the AST node directly. Renderer Tags remain trivial output +formatting: no per-post state, no side-channel. 
+ +```ruby +# Custom AST node carrying the resolved id +class AttachmentPlaceholder < Markbridge::AST::Node + attr_reader :upload_id + def initialize(upload_id:); super(); @upload_id = upload_id; end +end + +# Handler: resolves at parse, pins id on the node +class AttachmentHandler < Markbridge::Parsers::BBCode::Handlers::BaseHandler + def initialize(uploads:); @uploads = uploads; end + def on_open(token:, context:, registry:, tokens: nil) + upload = @uploads.store_or_lookup(token.attrs[:option]) + context.add_child(AttachmentPlaceholder.new(upload_id: upload.id)) + end + def element_class; AttachmentPlaceholder; end +end + +# Tag: trivial output formatter, no state +class AttachmentTag < Markbridge::Renderers::Discourse::Tag + def render(element, _interface) = "[upload|#{element.upload_id}]" +end + +RENDERER = Markbridge.discourse_renderer( + tags: { AttachmentPlaceholder => AttachmentTag.new }, +) +``` + +`interface.emit` and `Conversion#emissions` (intermediate API in +earlier drafts of this redesign) are not part of the shipped API. +Resolution-aware base handlers belong in the converter framework +that wraps Markbridge; per-format converters (phpBB, vBulletin, +SMF, IPB attachment handlers) subclass them. + +### `RawHandler` no longer requires `language:` on the AST class + +`Markbridge::Parsers::BBCode::Handlers::RawHandler` used to call +`@element_class.new(language:)` unconditionally. Custom AST classes +reused with `RawHandler` had to declare a `language:` kwarg even when +unused. Now the handler introspects the AST class once and only passes +`language:` when the class accepts it. No code action needed unless +you'd previously added a dummy `def initialize(language: nil); super(); end` +just to satisfy the handler — you can remove it. 
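The `RawHandler` change above amounts to checking whether an AST class's constructor accepts a `language:` keyword before passing one. A minimal sketch of that check using Ruby's standard reflection follows; `accepts_language_kwarg?`, `build_raw_node`, `PlainNode`, and `CodeNode` are hypothetical names for illustration, and the real handler (which the text says introspects once per class) may be implemented differently.

```ruby
# Hypothetical helper, not Markbridge source: does klass#initialize
# accept a `language:` keyword (directly or through **opts)?
def accepts_language_kwarg?(klass)
  klass.instance_method(:initialize).parameters.any? do |type, name|
    type == :keyrest || (%i[key keyreq].include?(type) && name == :language)
  end
end

# An AST-like class with no language concept.
class PlainNode
  def initialize; end
end

# An AST-like class that records a language.
class CodeNode
  attr_reader :language

  def initialize(language: nil)
    @language = language
  end
end

# Pass language: only when the class can take it. Caching the check
# per class is omitted here for brevity.
def build_raw_node(klass, language)
  accepts_language_kwarg?(klass) ? klass.new(language: language) : klass.new
end

plain = build_raw_node(PlainNode, "ruby")
code = build_raw_node(CodeNode, "ruby")
```

With a check of this kind, a class such as `PlainNode` no longer needs a dummy `language:` parameter just to be constructible by the handler.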
+ +### Selective Markdown escaping (`allow:`) + +Importers that want list markers (or other block-level constructs) +to survive escaping no longer need to subclass `MarkdownEscaper`: + +```ruby +# Before +class ListPermissiveEscaper < Markbridge::Renderers::Discourse::MarkdownEscaper + private + def escape_block_level(content, prev_was_paragraph) + case content.getbyte(0) + when 0x2D, 0x2A, 0x2B then return content, false if content.match?(/\A[-*+]\s/) + when 0x30..0x39 then return content, false if content.match?(/\A\d+[.)]\s/) + end + super + end +end +RENDERER = Markbridge.discourse_renderer(escaper: ListPermissiveEscaper.new) + +# After +RENDERER = Markbridge.discourse_renderer(allow: :lists) +``` + +Recognised keys: `:bullet_list`, `:ordered_list`, `:atx_heading`, +`:block_quote`. Aliases: `:lists` → `[:bullet_list, :ordered_list]`. +Unknown keys raise `ArgumentError`. Thematic breaks (`---`, `***`) +and setext underlines (`===`) are still escaped — the kwarg +allow-lists specific block markers, not whole sections of the +escaper. + +### Disabling Markdown escaping wholesale + +For migration paths where the source content is already trusted +Markdown: + +```ruby +NO_ESCAPE = Markbridge.discourse_renderer(escape: false) +Markbridge.bbcode_to_markdown(input, renderer: NO_ESCAPE) +``` + +Internally this swaps in `Markbridge::Renderers::Discourse::IdentityEscaper` +(a tiny `#escape(text) → text || ""` class). `escape: false` is +mutually exclusive with `escape_hard_line_breaks:` / `allow:` — +those configure `MarkdownEscaper`, which `escape: false` replaces +wholesale. An explicit `escaper:` always wins over either. + +For *per-AST-node* opt-out, `AST::MarkdownText` already exists and +bypasses the escaper for that node only. + +### Modifying the AST between parse and render + +Two new shapes let you mutate the parsed AST before rendering, e.g. 
+to append attachments that weren't in the source post: + +```ruby +# Block form on every *_to_markdown / convert method +Markbridge.bbcode_to_markdown(input, renderer: RENDERER) do |ast| + attachments.each { |a| ast << OrphanAttachment.new(source_id: a.id) } +end + +# Or pass a Parse explicitly to .render +parse = Markbridge.parse_bbcode(input) +parse.ast << OrphanAttachment.new(source_id: 7) +result = Markbridge.render(parse, renderer: RENDERER, raise_on_error: false) +# result.unknown_tags / .diagnostics / .format are preserved from the Parse. +``` + +`Markbridge.render` accepts either a `Parse` (preferred — preserves +`unknown_tags`/`diagnostics`/source `format`) or a bare AST node +(fields default to empty / `:discourse`). Mutations made between +parse and render persist in `Conversion#ast`. + +### Per-row failure isolation + +For migration loops, set `raise_on_error: false` to surface render +exceptions on `Conversion#errors` instead of crashing the loop: + +```ruby +posts.each do |post| + result = Markbridge.bbcode_to_markdown(post.body, renderer: RENDERER, raise_on_error: false) + if result.errors.any? + log_failure(post, result.errors) + else + write_markdown(post, result.markdown) + end +end +``` + +The default is still `raise_on_error: true`, preserving the prior +behavior of letting exceptions propagate. + +### See also + +- `examples/forum_migration.rb` — canonical end-to-end importer shape + exercising every new path: `discourse_renderer` factory, `tags:`, + `unregister:`, `allow: :lists`, the AST-mutation block, + `raise_on_error: false`, `Markbridge.convert(format:)` dispatch. +- `docs/extending.md` — how to add custom tags and handlers. 
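The two modes of `raise_on_error:` described under "Per-row failure isolation" can be pictured with a self-contained stand-in. Everything here is hypothetical (`ResultSketch`, `render_sketch`), and the sketch simplifies one detail: on failure it returns an empty string, whereas the method documentation says the real call returns whatever the renderer produced before failing.

```ruby
# Hypothetical stand-in result type; the real Conversion carries more fields.
ResultSketch = Struct.new(:markdown, :errors, keyword_init: true)

# Hypothetical wrapper: the block plays the role of the render step.
def render_sketch(raise_on_error: true)
  ResultSketch.new(markdown: yield, errors: [])
rescue StandardError => e
  # Default mode: let the exception propagate to the caller.
  raise if raise_on_error

  # Isolation mode: report the failure on the result instead.
  ResultSketch.new(markdown: "", errors: [e])
end

ok = render_sketch(raise_on_error: false) { "**fine**" }
failed = render_sketch(raise_on_error: false) { raise ArgumentError, "bad node" }

# With the default, the same failure escapes the call.
propagated =
  begin
    render_sketch { raise ArgumentError, "bad node" }
    false
  rescue ArgumentError
    true
  end
```

In a migration loop the isolation mode lets one bad row be logged and skipped while the remaining rows continue to convert.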
diff --git a/docs/extending.md b/docs/extending.md index 49dba40..010c3de 100644 --- a/docs/extending.md +++ b/docs/extending.md @@ -194,6 +194,26 @@ def self.default end ``` +### Auto-passthrough for unregistered AST classes + +A custom AST class that has *no* Tag bound to it doesn't need a +"passthrough" Tag — `Renderer#render` falls through to +`render_children` automatically (see `lib/markbridge/renderers/discourse/renderer.rb`). +You only need to register a Tag when the class needs a non-trivial +rendering. To remove a built-in binding so this passthrough kicks in, +use `TagLibrary#unregister`: + +```ruby +library.unregister(AST::Color) # Color now renders as just its children +library.unregister(AST::Size) # Size too +``` + +Or, more concisely, via the `Markbridge.discourse_renderer` factory: + +```ruby +Markbridge.discourse_renderer(unregister: [AST::Color, AST::Size]) +``` + ### Step 6: Add Requires **File:** `lib/markbridge/ast.rb` diff --git a/docs/parsers/mediawiki.md b/docs/parsers/mediawiki.md index 6babd82..c645e2a 100644 --- a/docs/parsers/mediawiki.md +++ b/docs/parsers/mediawiki.md @@ -99,7 +99,7 @@ ast = parser.parse("highlighted") registry = Markbridge::Parsers::MediaWiki::InlineTagRegistry.build_from_default do |r| r.register("mark", :formatting, Markbridge::AST::Bold) end -parser = Markbridge::Parsers::MediaWiki::Parser.new(inline_tag_registry: registry) +parser = Markbridge::Parsers::MediaWiki::Parser.new(handlers: registry) ``` ### Via Top-Level API diff --git a/docs/renderers/discourse.md b/docs/renderers/discourse.md index 9367f6e..8aefe30 100644 --- a/docs/renderers/discourse.md +++ b/docs/renderers/discourse.md @@ -621,31 +621,11 @@ end ## Configuration -### Global Configuration - -Use `Markbridge.configure` to set options that apply to all `*_to_markdown` convenience methods: - -```ruby -Markbridge.configure do |config| - # Strip trailing spaces before newlines to prevent hard line breaks (
). - # Defaults to false (Discourse has this disabled by default). - config.escape_hard_line_breaks = true -end - -Markbridge.bbcode_to_markdown("[b]Hello[/b]") # uses configured settings -``` - -You can also read the current configuration: - -```ruby -Markbridge.configuration.escape_hard_line_breaks # => false (default) -``` - -Available settings: - -| Setting | Default | Description | -|---------|---------|-------------| -| `escape_hard_line_breaks` | `false` | Strip trailing spaces before newlines to prevent `
` | +Markbridge has no global configuration. Render-side options (custom +escaper, custom Tags, custom postprocessor) are passed per call via a +configured `Renderer`. The escaper and postprocessor will become +configurable in a follow-up step of the API redesign; meanwhile, the +default Renderer is used for every convenience-method call. ### Using Default Library diff --git a/examples/basic_usage.rb b/examples/basic_usage.rb index 4de91a5..7a266fe 100644 --- a/examples/basic_usage.rb +++ b/examples/basic_usage.rb @@ -3,71 +3,35 @@ require_relative "../lib/markbridge/bbcode" # Example 1: Basic formatting -bbcode = "[b]Bold[/b] and [i]italic[/i] text" -markdown = Markbridge.bbcode_to_markdown(bbcode) -puts markdown +result = Markbridge.bbcode_to_markdown("[b]Bold[/b] and [i]italic[/i] text") +puts result.markdown # => "**Bold** and *italic* text" # Example 2: Code blocks -bbcode = "[code]def hello\n puts 'world'\nend[/code]" -markdown = Markbridge.bbcode_to_markdown(bbcode) -puts markdown +result = Markbridge.bbcode_to_markdown("[code]def hello\n puts 'world'\nend[/code]") +puts result.markdown # => "```\ndef hello\n puts 'world'\nend\n```" -# Example 3: Custom handlers and tags -# Create custom handler registry -handlers = Markbridge::Parsers::BBCode::HandlerRegistry.new - -# Add a simple custom element -class CustomElement < Markbridge::AST::Element +# Example 3: Custom Tag via the renderer factory +class ShoutTag < Markbridge::Renderers::Discourse::Tag + def render(element, interface) + interface.render_children(element, context: interface.with_parent(element)).upcase + end end -# Register handler for custom tag -custom_handler = Markbridge::Parsers::BBCode::Handlers::SimpleHandler.new(CustomElement) -handlers.register("custom", custom_handler) -handlers.register_element_handler(CustomElement, custom_handler) - -# # Create custom tag library for rendering -# tag_library = Markbridge::Renderers::Discourse::TagLibrary.new -# custom_tag = -# 
Markbridge::Renderers::Discourse::Tag.new do |element, renderer| -# content = renderer.render_children(element) -# "<<#{content}>>" -# end -# tag_library.register(CustomElement, custom_tag) -# -# markdown = -# Markbridge.bbcode_to_markdown( -# "[custom]test[/custom]", -# handlers:, -# tag_library: -# ) -# puts markdown -# # => "<<test>>" - -# Example 4: Parse to AST and inspect -ast = Markbridge.parse_bbcode("[b]Hello[/b]") -puts ast.inspect -# You'll see Bold instead of BoldElement - -# Example 5: Nested lists (ordered and unordered) -bbcode = <<~BBCODE - [list] - [*]Item 1 - [*]Item 2 - [list=1] - [*]Subitem 2.1 - [*]Subitem 2.2 - [/list] - [*]Item 3 - [/list] -BBCODE - -markdown = Markbridge.bbcode_to_markdown(bbcode) -puts markdown -# => -# - Item 1 -# - Item 2 -# 1. Subitem 2.1 -# 2. Subitem 2.2 -# - Item 3 +renderer = Markbridge.discourse_renderer(tags: { Markbridge::AST::Bold => ShoutTag.new }) +result = Markbridge.bbcode_to_markdown("[b]hello[/b] world", renderer:) +puts result.markdown +# => "HELLO world" + +# Example 4: Inspect parse-side data without rendering +parse = Markbridge.parse_bbcode("[b]hi[/b][unknownext]x[/unknownext]") +puts "AST root has #{parse.ast.children.size} children" +puts "unknown tags: #{parse.unknown_tags}" + +# Example 5: Conversion result is more than just a string +result = Markbridge.bbcode_to_markdown("[b]hi[/b]") +puts "markdown: #{result.markdown.inspect}" +puts "format: #{result.format}" +puts "errors: #{result.errors.inspect}" +puts "string-coerce works: #{result}" # via to_s diff --git a/examples/custom_text_formatter_mappings.rb b/examples/custom_text_formatter_mappings.rb index 9154a3e..ffdd8fe 100644 --- a/examples/custom_text_formatter_mappings.rb +++ b/examples/custom_text_formatter_mappings.rb @@ -3,19 +3,24 @@ # Example: Customizing the s9e/TextFormatter XML parser # -# This demonstrates how to extend or override element mappings in the s9e/TextFormatter parser -# to handle custom XML elements or change the default 
behavior. +# Demonstrates how to extend or override element mappings in the +# s9e/TextFormatter parser by registering Handler classes. Every +# handler must respond to `#process(element:, parent:, processor:)` +# and return either an AST element (parser recurses into children) +# or nil (leaf, no further processing). require "bundler/setup" require "markbridge/textformatter" -# Example 1: Add a custom element mapping using a lambda -# ======================================================= +# ---------------------------------------------------------------- +# Example 1: Add a custom element mapping +# ---------------------------------------------------------------- # -# Suppose your forum uses a custom BBCode plugin that adds a <HIGHLIGHT> element -# to the s9e/TextFormatter XML output. +# Suppose your forum uses a custom BBCode plugin that adds a +# <HIGHLIGHT> element to the s9e/TextFormatter XML output. Provide +# a Handler that constructs your AST node and recurses into the +# element's children. -# Create a custom AST node (or reuse existing one) class HighlightNode < Markbridge::AST::Element attr_reader :color @@ -25,45 +30,51 @@ def initialize(color: "yellow") end end -# Create parser with custom lambda handler +class HighlightHandler < Markbridge::Parsers::TextFormatter::Handlers::BaseHandler + def initialize + @element_class = HighlightNode + end + + def process(element:, parent:, processor:) + attrs = extract_attributes(element) + node = HighlightNode.new(color: attrs[:color] || "yellow") + parent << node + processor.process_children(element, node) + nil # we recursed manually; don't double-process + end + + attr_reader :element_class +end + parser = Markbridge::Parsers::TextFormatter::Parser.new do |registry| - # Add lambda handler for custom HIGHLIGHT element - registry.register( - "HIGHLIGHT", - lambda do |element:, parent:, processor:| - attrs = {} - element.attributes.each { |name, attr| attrs[name.downcase.to_sym] = attr.value } - node = HighlightNode.new(color: 
attrs[:color] || "yellow") - parent << node - processor.process_children(element, node) - end, - ) - end - -# Parse XML with custom element + registry.register("HIGHLIGHT", HighlightHandler.new) + end + xml = 'Normal text highlighted text more text' ast = parser.parse(xml) -puts "Example 1: Custom element mapping with lambda" +puts "Example 1: Custom element mapping" puts "AST contains #{ast.children.length} elements" highlight = ast.children.find { |c| c.is_a?(HighlightNode) } puts "Highlight color: #{highlight&.color}" puts -# Example 2: Override default element mapping with a handler class -# ================================================================== +# ---------------------------------------------------------------- +# Example 2: Override a default mapping with a custom handler class +# ---------------------------------------------------------------- # -# You can override default mappings by creating custom handler classes. +# Just register your handler under the same name; it overwrites the +# default. Returning the constructed element lets the parser recurse +# into children automatically. 
class CustomQuoteHandler < Markbridge::Parsers::TextFormatter::Handlers::BaseHandler def initialize @element_class = Markbridge::AST::Quote end - def process(element:, parent:, processor:) + def process(element:, parent:, processor: nil) attrs = extract_attributes(element) - # Add custom logic here - for example, default author to "Anonymous" quote = Markbridge::AST::Quote.new( author: attrs[:author] || "Anonymous", @@ -72,7 +83,7 @@ def process(element:, parent:, processor:) username: attrs[:username], ) parent << quote - processor.process_children(element, quote) + quote # returning the node lets the parser process children into it end attr_reader :element_class @@ -80,79 +91,97 @@ def process(element:, parent:, processor:) parser = Markbridge::Parsers::TextFormatter::Parser.new do |registry| - # Override the default QUOTE handler with our custom handler registry.register("QUOTE", CustomQuoteHandler.new) end xml = 'Custom quote handling' ast = parser.parse(xml) -puts "Example 2: Override default mapping with handler class" +puts "Example 2: Override default mapping" puts "Quote author: #{ast.children.first.author}" puts -# Example 3: Building from defaults with multiple customizations using lambdas -# ============================================================================== +# ---------------------------------------------------------------- +# Example 3: Multiple customizations on top of the defaults +# ---------------------------------------------------------------- +# +# A leaf-node handler returns nil; the parser does not recurse. +# A wrapping handler returns the AST node it just appended. 
+ +class CustomSpoilerHandler < Markbridge::Parsers::TextFormatter::Handlers::BaseHandler + def initialize + @element_class = Markbridge::AST::Spoiler + end + + def process(element:, parent:, processor: nil) + attrs = extract_attributes(element) + node = Markbridge::AST::Spoiler.new(title: attrs[:title] || "Click to reveal") + parent << node + node + end + + attr_reader :element_class +end + +class CustomTextHandler < Markbridge::Parsers::TextFormatter::Handlers::BaseHandler + def initialize + @element_class = Markbridge::AST::Text + end + + def process(element:, parent:, processor: nil) + attrs = extract_attributes(element) + parent << Markbridge::AST::Text.new("[CUSTOM: #{attrs[:value]}]") + nil # leaf + end + + attr_reader :element_class +end + +class MentionHandler < Markbridge::Parsers::TextFormatter::Handlers::BaseHandler + def initialize + @element_class = Markbridge::AST::Text + end + + def process(element:, parent:, processor: nil) + attrs = extract_attributes(element) + parent << Markbridge::AST::Text.new("@#{attrs[:username]}") + nil # leaf + end + + attr_reader :element_class +end parser = Markbridge::Parsers::TextFormatter::Parser.new do |registry| - # Override default spoiler with lambda - registry.register( - "SPOILER", - lambda do |element:, parent:, processor:| - attrs = {} - element.attributes.each { |name, attr| attrs[name.downcase.to_sym] = attr.value } - node = Markbridge::AST::Spoiler.new(title: attrs[:title] || "Click to reveal") - parent << node - processor.process_children(element, node) - end, - ) - - # Map unknown custom element to text (leaf node, no children) - registry.register( - "CUSTOM", - lambda do |element:, parent:, processor:| - attrs = {} - element.attributes.each { |name, attr| attrs[name.downcase.to_sym] = attr.value } - parent << Markbridge::AST::Text.new("[CUSTOM: #{attrs[:value]}]") - end, - ) - - # Add support for user mentions (leaf node) - registry.register( - "MENTION", - lambda do |element:, parent:, processor:| - 
attrs = {} - element.attributes.each { |name, attr| attrs[name.downcase.to_sym] = attr.value } - parent << Markbridge::AST::Text.new("@#{attrs[:username]}") - end, - ) + registry.register("SPOILER", CustomSpoilerHandler.new) + registry.register("CUSTOM", CustomTextHandler.new) + registry.register("MENTION", MentionHandler.new) end xml = 'Hidden ' ast = parser.parse(xml) -puts "Example 3: Multiple customizations with lambdas" +puts "Example 3: Multiple customizations" puts "AST has #{ast.children.length} top-level elements" puts -# Example 4: Using HandlerRegistry directly with handler objects -# ================================================================ +# ---------------------------------------------------------------- +# Example 4: Building a HandlerRegistry directly +# ---------------------------------------------------------------- # -# For more control, you can create a custom registry and pass it to the parser. +# For more control, build the registry yourself and pass it via +# `handlers:` instead of using the block form. 
-# Create a handler class for VIDEO elements class VideoHandler < Markbridge::Parsers::TextFormatter::Handlers::BaseHandler def initialize @element_class = Markbridge::AST::Url end - def process(element:, parent:, processor:) + def process(element:, parent:, processor: nil) attrs = extract_attributes(element) - # Map VIDEO to a URL node (could create custom Video node instead) node = Markbridge::AST::Url.new(href: attrs[:url]) parent << node - processor.process_children(element, node) + node # parser will process children into the returned node end attr_reader :element_class @@ -160,24 +189,24 @@ def process(element:, parent:, processor:) registry = Markbridge::Parsers::TextFormatter::HandlerRegistry.new registry.register_defaults # Load default handlers - -# Add custom handler registry.register("VIDEO", VideoHandler.new) -# Create parser with custom registry parser = Markbridge::Parsers::TextFormatter::Parser.new(handlers: registry) xml = '' ast = parser.parse(xml) -puts "Example 4: Custom handler registry with handler objects" +puts "Example 4: Custom handler registry" puts "Parsed video as URL: #{ast.children.first.href}" puts -# Example 5: Preserving unknown elements vs. custom handling -# =========================================================== +# ---------------------------------------------------------------- +# Example 5: Unknown elements +# ---------------------------------------------------------------- +# +# By default, unknown elements are preserved as text and tracked +# in `parser.unknown_tags`. 
-# By default, unknown elements are preserved as text default_parser = Markbridge::Parsers::TextFormatter::Parser.new xml = 'content' diff --git a/examples/forum_migration.rb b/examples/forum_migration.rb new file mode 100644 index 0000000..7023111 --- /dev/null +++ b/examples/forum_migration.rb @@ -0,0 +1,195 @@ +#!/usr/bin/env ruby +# frozen_string_literal: true + +# Example: Forum migration to Discourse +# +# This is the canonical end-to-end example for the Markbridge migration +# API. It exercises every feature an importer cares about: +# +# - Custom AST node + BBCode handler ([font]) +# - Handler delegation via `overlay` (custom -style URL handler +# wrapping the default) +# - `Markbridge.discourse_renderer(...)` factory +# - `tags:` Hash, `unregister:` Array, `allow: :lists` for selective +# escaper passthrough +# - The AST-mutation block on `Markbridge.convert` (append orphan +# attachments before rendering) +# - `Conversion#errors` / `Conversion#unknown_tags` +# - `raise_on_error: false` for per-row failure isolation +# - `Markbridge.convert(input, format:)` dispatch +# +# Note: this example deliberately keeps the converter side trivial. +# Real importers put placeholder resolution (uploads, mentions, +# internal links) in custom handler subclasses at parse time, with +# the AST node carrying the resolved identifier — render Tags then +# stay one-line output formatters. That layer lives in the +# converter framework, not Markbridge. +# +# Run it: bundle exec ruby examples/forum_migration.rb + +require "bundler/setup" +require "markbridge/all" + +# -- Custom AST + handler ---------------------------------------------------- + +# AST node for [font=courier]...[/font] BBCode tags. +class FontNode < Markbridge::AST::Element + attr_reader :font + + def initialize(font: nil) + super() + @font = font + end +end + +# Parser handler. BBCode handlers are class-based (not lambda-based) +# because the open/close lifecycle and auto_closeable? 
introspection +# don't fit a single lambda shape. +class FontHandler < Markbridge::Parsers::BBCode::Handlers::BaseHandler + def initialize + @element_class = FontNode + end + + def on_open(token:, context:, registry:, tokens: nil) + font = token.attrs[:font] || token.attrs[:option] + context.push(FontNode.new(font:), token:) + end + + def auto_closeable? + true + end + + attr_reader :element_class +end + +# Renderer Tag: monospace fonts → inline code, everything else falls +# through to the children (no marker). +FONT_TAG = + Markbridge::Renderers::Discourse::Tag.new do |element, interface| + child_context = interface.with_parent(element) + content = interface.render_children(element, context: child_context) + + if element.font&.match?(/\b(courier|monospace|consolas|menlo|monaco)\b/i) + "`#{content.strip}`" + else + content + end + end + +# -- Build the importer's reusable parts (handlers + renderer) -------------- +# +# Forum posts using "1. item" / "- item" syntax need their leading +# markers preserved (the default escaper would escape them as +# `\- item`). Pass `allow: :lists` to the renderer factory — no +# subclassing required. + +class LoggingUrlHandler < Markbridge::Parsers::BBCode::Handlers::BaseHandler + def initialize(default:) + @default = default + @element_class = default.element_class + end + + def on_open(token:, context:, registry:, tokens: nil) + # ...real importer would log here; we just delegate. + @default.on_open(token:, context:, registry:, tokens:) + end + + attr_reader :element_class +end + +HANDLERS = + Markbridge::Parsers::BBCode::HandlerRegistry.build_from_default do |r| + r.register("font", FontHandler.new) + + # Demo of overlay/delegation: wrap the default URL handler once + # and re-bind it under every tag name that should use it. 
Note: + # when multiple tag names share an element class (url/link/iurl + # all build AST::Url), the wrapper must be a *single* instance + # so the closing-strategy's element→handler reconciliation + # finds the same object on both sides. `overlay` with one name + # is the simple case; multi-name aliases need the explicit + # `register` form below. + default_url = r["url"] + r.register(%w[url link iurl], LoggingUrlHandler.new(default: default_url)) + end + +RENDERER = + Markbridge.discourse_renderer( + tags: { + FontNode => FONT_TAG, + }, + # Drop built-ins so they fall through to render_children. Forum + # posts often use [color]/[size]/[u] decoratively; importers + # typically don't want those bytes in the migrated Markdown. + unregister: [Markbridge::AST::Color, Markbridge::AST::Size, Markbridge::AST::Underline], + allow: :lists, + ) + +# -- Sample posts to migrate ------------------------------------------------ + +POSTS = [ + { + id: 1, + format: :bbcode, + body: "[b]hello[/b] [color=red]world[/color] [font=courier]code[/font]", + }, + { + id: 2, + format: :bbcode, + body: "see [url=https://forum.example.com/t/42]this[/url] and [url=https://example.org]ext[/url]", + }, + { + id: 3, + format: :bbcode, + body: "[unknownext]hello[/unknownext]", + }, + { + id: 4, + format: :html, + body: "html alice", + }, + { + id: 5, + format: :bbcode, + body: "[b]see attachments below[/b]", + orphan_attachments: %w[5001 5002], + }, +] + +# -- The migration loop ------------------------------------------------------ + +stats = { ok: 0, errors: 0 } + +POSTS.each do |post| + # `Markbridge.convert(..., format:, &block)` yields the parsed AST + # between parse and render, so we can append attachments that + # weren't in the source post but should appear at the bottom of the + # rendered Markdown. + result = + Markbridge.convert( + post[:body], + format: post[:format], + handlers: post[:format] == :bbcode ? 
HANDLERS : nil, + renderer: RENDERER, + raise_on_error: false, + ) do |ast| + Array(post[:orphan_attachments]).each do |att| + ast << Markbridge::AST::Text.new("\n\n[attachment:#{att}]") + end + end + + if result.errors.any? + stats[:errors] += 1 + puts "post ##{post[:id]} FAILED: #{result.errors.first.message}" + next + end + + stats[:ok] += 1 + puts "post ##{post[:id]} (#{post[:format]}): #{result.markdown.inspect}" + puts " unknown_tags: #{result.unknown_tags}" if result.unknown_tags.any? +end + +puts +puts "Migration complete:" +puts " ok: #{stats[:ok]}" +puts " errors: #{stats[:errors]}" diff --git a/lib/markbridge.rb b/lib/markbridge.rb index 4fccf3d..41c44ef 100644 --- a/lib/markbridge.rb +++ b/lib/markbridge.rb @@ -1,7 +1,8 @@ # frozen_string_literal: true require_relative "markbridge/version" -require_relative "markbridge/configuration" +require_relative "markbridge/parse" +require_relative "markbridge/conversion" require_relative "markbridge/ast" require_relative "markbridge/renderers/discourse" @@ -9,158 +10,284 @@ module Markbridge class << self - # Parse BBCode to AST + # Parse BBCode to AST. + # # @param input [String] BBCode source - # @param handlers [HandlerRegistry, nil] custom handler registry or use default - # @return [AST::Document] + # @param handlers [Parsers::BBCode::HandlerRegistry, nil] custom handlers (defaults to .default) + # @return [Parse] def parse_bbcode(input, handlers: nil) - handlers ||= default_handlers - parse_with(Parsers::BBCode::Parser, input, handlers:) + raise ArgumentError, "input cannot be nil" if input.nil? + + parser = Parsers::BBCode::Parser.new(handlers:) + ast = parser.parse(input.to_s) + + Parse.new( + ast:, + format: :bbcode, + unknown_tags: parser.unknown_tags, + diagnostics: bbcode_diagnostics(parser), + ) end - # Convert BBCode to Discourse Markdown + # Convert BBCode to Discourse Markdown. 
+ # + # If a block is given, it is called with the parsed AST between + # parse and render — the caller can append/remove/replace nodes + # before rendering. Mutations to the yielded AST persist in + # {Conversion#ast}. + # # @param input [String] BBCode source - # @param handlers [HandlerRegistry, nil] custom handler registry or use default - # @param tag_library [TagLibrary, nil] custom tag library or use default - # @return [String] Markdown output - def bbcode_to_markdown(input, handlers: nil, tag_library: nil) - ast = parse_bbcode(input, handlers:) - render_to_markdown(ast, tag_library:) + # @param handlers [Parsers::BBCode::HandlerRegistry, nil] custom handlers + # @param renderer [Renderers::Discourse::Renderer, nil] custom renderer + # (build with {.discourse_renderer}); defaults to a fresh default Renderer + # @param raise_on_error [Boolean] when true (default), let render-time + # exceptions propagate; when false, rescue them, return an empty + # Markdown string, and surface them via + # {Conversion#errors}. + # @yieldparam ast [AST::Document] mutate before rendering (optional) + # @return [Conversion] + def bbcode_to_markdown(input, handlers: nil, renderer: nil, raise_on_error: true) + parse = parse_bbcode(input, handlers:) + yield(parse.ast) if block_given? + build_conversion(parse, renderer:, raise_on_error:) end - # Parse HTML to AST + # Parse HTML to AST. + # # @param input [String] HTML source - # @param handlers [HandlerRegistry, nil] custom handler registry or use default - # @return [AST::Document] + # @param handlers [Parsers::HTML::HandlerRegistry, nil] custom handlers + # @return [Parse] def parse_html(input, handlers: nil) - handlers ||= default_html_handlers - parse_with(Parsers::HTML::Parser, input, handlers:) + raise ArgumentError, "input cannot be nil" if input.nil?
+ + parser = Parsers::HTML::Parser.new(handlers:) + ast = parser.parse(input.to_s) + + Parse.new(ast:, format: :html, unknown_tags: parser.unknown_tags, diagnostics: {}) end - # Convert HTML to Discourse Markdown + # Convert HTML to Discourse Markdown. + # # @param input [String] HTML source - # @param handlers [HandlerRegistry, nil] custom handler registry or use default - # @param tag_library [TagLibrary, nil] custom tag library or use default - # @return [String] Markdown output - def html_to_markdown(input, handlers: nil, tag_library: nil) - ast = parse_html(input, handlers:) - render_to_markdown(ast, tag_library:) + # @param handlers [Parsers::HTML::HandlerRegistry, nil] custom handlers + # @param renderer [Renderers::Discourse::Renderer, nil] custom renderer + # @param raise_on_error [Boolean] + # @yieldparam ast [AST::Document] mutate before rendering (optional) + # @return [Conversion] + def html_to_markdown(input, handlers: nil, renderer: nil, raise_on_error: true) + parse = parse_html(input, handlers:) + yield(parse.ast) if block_given? + build_conversion(parse, renderer:, raise_on_error:) end - # Parse s9e/TextFormatter XML to AST - # @param input [String] XML source in s9e/TextFormatter format - # @param handlers [Parsers::TextFormatter::HandlerRegistry, nil] custom handler registry or use default - # @return [AST::Document] + # Parse s9e/TextFormatter XML to AST. + # + # @param input [String] XML source + # @param handlers [Parsers::TextFormatter::HandlerRegistry, nil] custom handlers + # @return [Parse] def parse_text_formatter_xml(input, handlers: nil) - handlers ||= default_text_formatter_handlers - parse_with(Parsers::TextFormatter::Parser, input, handlers:) + raise ArgumentError, "input cannot be nil" if input.nil? 
+ + parser = Parsers::TextFormatter::Parser.new(handlers:) + ast = parser.parse(input.to_s) + + Parse.new( + ast:, + format: :text_formatter_xml, + unknown_tags: parser.unknown_tags, + diagnostics: { + }, + ) end - # Convert s9e/TextFormatter XML to Discourse Markdown - # @param input [String] XML source in s9e/TextFormatter format - # @param handlers [Parsers::TextFormatter::HandlerRegistry, nil] custom handler registry or use default - # @param tag_library [TagLibrary, nil] custom tag library or use default - # @return [String] Markdown output - def text_formatter_xml_to_markdown(input, handlers: nil, tag_library: nil) - ast = parse_text_formatter_xml(input, handlers:) - render_to_markdown(ast, tag_library:) + # Convert s9e/TextFormatter XML to Discourse Markdown. + # + # @param input [String] XML source + # @param handlers [Parsers::TextFormatter::HandlerRegistry, nil] custom handlers + # @param renderer [Renderers::Discourse::Renderer, nil] custom renderer + # @param raise_on_error [Boolean] + # @yieldparam ast [AST::Document] mutate before rendering (optional) + # @return [Conversion] + def text_formatter_xml_to_markdown(input, handlers: nil, renderer: nil, raise_on_error: true) + parse = parse_text_formatter_xml(input, handlers:) + yield(parse.ast) if block_given? + build_conversion(parse, renderer:, raise_on_error:) end - # Parse MediaWiki wikitext to AST + # Parse MediaWiki wikitext to AST. + # # @param input [String] MediaWiki source - # @param inline_tag_registry [Parsers::MediaWiki::InlineTagRegistry, nil] custom registry - # @return [AST::Document] - def parse_mediawiki(input, inline_tag_registry: nil) + # @param handlers [Parsers::MediaWiki::InlineTagRegistry, nil] custom inline-tag registry + # @return [Parse] + def parse_mediawiki(input, handlers: nil) raise ArgumentError, "input cannot be nil" if input.nil? 
- input = input.to_s - parser = Parsers::MediaWiki::Parser.new(inline_tag_registry:) - parser.parse(input) + parser = Parsers::MediaWiki::Parser.new(handlers:) + ast = parser.parse(input.to_s) + + Parse.new(ast:, format: :mediawiki, unknown_tags: parser.unknown_tags, diagnostics: {}) end - # Convert MediaWiki wikitext to Discourse Markdown + # Convert MediaWiki wikitext to Discourse Markdown. + # # @param input [String] MediaWiki source - # @param inline_tag_registry [Parsers::MediaWiki::InlineTagRegistry, nil] custom registry - # @param tag_library [TagLibrary, nil] custom tag library or use default - # @return [String] Markdown output - def mediawiki_to_markdown(input, inline_tag_registry: nil, tag_library: nil) - ast = parse_mediawiki(input, inline_tag_registry:) - render_to_markdown(ast, tag_library:) + # @param handlers [Parsers::MediaWiki::InlineTagRegistry, nil] + # @param renderer [Renderers::Discourse::Renderer, nil] custom renderer + # @param raise_on_error [Boolean] + # @yieldparam ast [AST::Document] mutate before rendering (optional) + # @return [Conversion] + def mediawiki_to_markdown(input, handlers: nil, renderer: nil, raise_on_error: true) + parse = parse_mediawiki(input, handlers:) + yield(parse.ast) if block_given? + build_conversion(parse, renderer:, raise_on_error:) end - # Get default handler registry - # @return [Parsers::BBCode::HandlerRegistry] - def default_handlers - @default_handlers ||= Parsers::BBCode::HandlerRegistry.default + # Convert input in the given format. Thin dispatcher over the + # four +*_to_markdown+ methods; useful when the format is data- + # driven (e.g. iterating posts whose +:format+ column varies). + # An optional block is forwarded to the dispatched method. + # + # @param input [String] + # @param format [Symbol] one of +:bbcode+, +:html+, + # +:text_formatter_xml+, +:mediawiki+ + # @param kwargs [Hash] forwarded to the underlying convenience method + # (e.g. +handlers:+, +renderer:+, +raise_on_error:+). 
+ # @yieldparam ast [AST::Document] mutate before rendering (optional) + # @return [Conversion] + def convert(input, format:, **kwargs, &block) + case format + when :bbcode + bbcode_to_markdown(input, **kwargs, &block) + when :html + html_to_markdown(input, **kwargs, &block) + when :text_formatter_xml + text_formatter_xml_to_markdown(input, **kwargs, &block) + when :mediawiki + mediawiki_to_markdown(input, **kwargs, &block) + else + raise ArgumentError, + "unknown format #{format.inspect} " \ + "(expected :bbcode, :html, :text_formatter_xml, or :mediawiki)" + end end - # Get default HTML handler registry - # @return [Parsers::HTML::HandlerRegistry] - def default_html_handlers - @default_html_handlers ||= Parsers::HTML::HandlerRegistry.default - end + # Render a {Parse} or a bare AST node to Discourse Markdown. + # Useful when the caller has mutated the AST between parse and + # render (e.g. appending attachments not present in the source), + # or built an AST programmatically. + # + # When given a {Parse}, the returned {Conversion} carries the + # parser's +unknown_tags+, +diagnostics+, and source +format+ + # forward. When given an AST node, those fields default to empty + # and +format+ falls back to +:discourse+. 
+ # + # @param parse_or_ast [Parse, AST::Node] + # @param format [Symbol] :discourse (only renderer currently shipped) + # @param renderer [Renderers::Discourse::Renderer, nil] + # @param raise_on_error [Boolean] + # @return [Conversion] + def render(parse_or_ast, format: :discourse, renderer: nil, raise_on_error: true) + raise ArgumentError, "unknown render format #{format.inspect}" unless format == :discourse - # Get default tag library - # @return [Renderers::Discourse::TagLibrary] - def default_tag_library - @default_tag_library ||= Renderers::Discourse::TagLibrary.default - end + parse = + case parse_or_ast + when Parse + parse_or_ast + when AST::Node + Parse.new(ast: parse_or_ast, format: :discourse, unknown_tags: {}, diagnostics: {}) + else + raise ArgumentError, "expected Parse or AST::Node, got #{parse_or_ast.class}" + end - # Get default s9e/TextFormatter handler registry - # @return [Parsers::TextFormatter::HandlerRegistry] - def default_text_formatter_handlers - @default_text_formatter_handlers ||= Parsers::TextFormatter::HandlerRegistry.default + build_conversion(parse, renderer:, raise_on_error:) end - # Get the global configuration - # @return [Configuration] - def configuration - @configuration ||= Configuration.new - end + # Build a configured Discourse {Renderers::Discourse::Renderer} + # for use with the +renderer:+ kwarg on the +*_to_markdown+ + # convenience methods. + # + # @param tags [Hash{Class => Tag, nil}, nil] mappings to merge on + # top of the default library; +nil+ values unregister the class. + # @param tag_library [Renderers::Discourse::TagLibrary, nil] base + # library to start from. Defaults to a fresh {TagLibrary.default}. + # @param unregister [Array, nil] AST classes to drop from + # the library so they fall through to +render_children+. + # @param escaper [#escape, nil] when given, used as-is; +escape:+, + # +escape_hard_line_breaks:+, and +allow:+ are then ignored. 
+ # @param escape [Boolean] when +false+, the renderer is built with + # {Renderers::Discourse::IdentityEscaper} (no Markdown escaping). + # Mutually exclusive with +escape_hard_line_breaks:+ / +allow:+. + # @param escape_hard_line_breaks [Boolean] forwarded to a fresh + # {MarkdownEscaper} when no explicit +escaper:+ is given. + # @param allow [Symbol, Array, nil] block-level constructs to + # pass through unescaped (e.g. +:lists+, +:bullet_list+, + # +:ordered_list+, +:atx_heading+, +:block_quote+); forwarded to a + # fresh {MarkdownEscaper}. + # @param postprocessor [Renderers::Discourse::Postprocessor, nil] + # @return [Renderers::Discourse::Renderer] + def discourse_renderer( + tags: nil, + tag_library: nil, + unregister: nil, + escaper: nil, + escape: true, + escape_hard_line_breaks: false, + allow: nil, + postprocessor: nil + ) + library = tag_library || Renderers::Discourse::TagLibrary.default + library.merge(tags) if tags + Array(unregister).each { |klass| library.unregister(klass) } - # Configure Markbridge with a block - # @yield [Configuration] - def configure - yield configuration - end + escaper ||= build_escaper(escape:, escape_hard_line_breaks:, allow:) - # Reset defaults (useful for testing) - def reset_defaults! - @default_handlers = nil - @default_html_handlers = nil - @default_tag_library = nil - @default_text_formatter_handlers = nil - @configuration = nil + Renderers::Discourse::Renderer.new(tag_library: library, escaper:, postprocessor:) end private - def parse_with(parser_class, input, handlers:) - raise ArgumentError, "input cannot be nil" if input.nil? 
- - parser = parser_class.new(handlers:) - parser.parse(input.to_s) + def bbcode_diagnostics(parser) + { + auto_closed_tags_count: parser.auto_closed_tags_count, + depth_exceeded_count: parser.depth_exceeded_count, + unclosed_raw_tags: parser.unclosed_raw_tags, + } end - def render_to_markdown(ast, tag_library:) - tag_library ||= default_tag_library - renderer = build_renderer(tag_library:) - cleanup_markdown(renderer.render(ast)) + def build_conversion(parse, renderer:, raise_on_error:) + renderer ||= Renderers::Discourse::Renderer.new + markdown, errors = render_through(renderer, parse.ast, raise_on_error:) + + Conversion.new( + markdown:, + ast: parse.ast, + format: parse.format, + unknown_tags: parse.unknown_tags, + diagnostics: parse.diagnostics, + errors:, + ) end - def build_renderer(tag_library:) - escaper = - Renderers::Discourse::MarkdownEscaper.new( - escape_hard_line_breaks: configuration.escape_hard_line_breaks, - ) - Renderers::Discourse::Renderer.new(tag_library:, escaper:) + def build_escaper(escape:, escape_hard_line_breaks:, allow:) + if escape == false + if escape_hard_line_breaks || allow + raise ArgumentError, + "escape: false is mutually exclusive with " \ + "escape_hard_line_breaks: / allow: (those configure " \ + "MarkdownEscaper, which escape: false replaces wholesale)" + end + Renderers::Discourse::IdentityEscaper.new + else + Renderers::Discourse::MarkdownEscaper.new(escape_hard_line_breaks:, allow:) + end end - def cleanup_markdown(text) - text - .gsub(/\n{3,}/, "\n\n") # Max 2 consecutive newlines - .gsub(/^[ \t]+$/, "") # Remove whitespace-only lines - .strip # Trim leading/trailing whitespace + def render_through(renderer, ast, raise_on_error:) + raw = renderer.render(ast) + [renderer.postprocessor.call(raw), []] + rescue StandardError => e + raise if raise_on_error + ["", [e]] end end end diff --git a/lib/markbridge/configuration.rb b/lib/markbridge/configuration.rb deleted file mode 100644 index 2224d91..0000000 --- 
a/lib/markbridge/configuration.rb +++ /dev/null @@ -1,11 +0,0 @@ -# frozen_string_literal: true - -module Markbridge - class Configuration - attr_accessor :escape_hard_line_breaks - - def initialize - @escape_hard_line_breaks = false - end - end -end diff --git a/lib/markbridge/conversion.rb b/lib/markbridge/conversion.rb new file mode 100644 index 0000000..2061cd4 --- /dev/null +++ b/lib/markbridge/conversion.rb @@ -0,0 +1,26 @@ +# frozen_string_literal: true + +module Markbridge + # Result of a *_to_markdown / convert / render call. + # + # @!attribute [r] markdown + # @return [String] the rendered Discourse-flavored Markdown + # @!attribute [r] ast + # @return [AST::Document] the AST that produced the markdown + # @!attribute [r] format + # @return [Symbol] :bbcode, :html, :text_formatter_xml, or :mediawiki + # @!attribute [r] unknown_tags + # @return [Hash{String => Integer}] tag-name → occurrence count + # @!attribute [r] diagnostics + # @return [Hash{Symbol => Object}] format-specific diagnostics + # @!attribute [r] errors + # @return [Array] render-time errors collected when + # +raise_on_error: false+ was passed; empty otherwise. + Conversion = + Data.define(:markdown, :ast, :format, :unknown_tags, :diagnostics, :errors) do + # Allows +puts result+ and +"text: #{result}"+ to work seamlessly. + def to_s + markdown + end + end +end diff --git a/lib/markbridge/parse.rb b/lib/markbridge/parse.rb new file mode 100644 index 0000000..82da4db --- /dev/null +++ b/lib/markbridge/parse.rb @@ -0,0 +1,18 @@ +# frozen_string_literal: true + +module Markbridge + # Result of a parse-only call (Markbridge.parse_bbcode and friends). + # + # @!attribute [r] ast + # @return [AST::Document] + # @!attribute [r] format + # @return [Symbol] :bbcode, :html, :text_formatter_xml, or :mediawiki + # @!attribute [r] unknown_tags + # @return [Hash{String => Integer}] tag-name → occurrence count. + # Empty for parsers that do not yet track unknown tags. 
+ # @!attribute [r] diagnostics + # @return [Hash{Symbol => Object}] format-specific diagnostics. + # BBCode supplies :auto_closed_tags_count, :depth_exceeded_count, + # :unclosed_raw_tags. Other parsers supply an empty hash for now. + Parse = Data.define(:ast, :format, :unknown_tags, :diagnostics) +end diff --git a/lib/markbridge/parsers/bbcode/handler_registry.rb b/lib/markbridge/parsers/bbcode/handler_registry.rb index d543e9d..5509d98 100644 --- a/lib/markbridge/parsers/bbcode/handler_registry.rb +++ b/lib/markbridge/parsers/bbcode/handler_registry.rb @@ -37,6 +37,27 @@ def register(tag_names, handler) self end + # Replace the handler bound to one or more tag names by yielding + # the previously-bound handler (which may be +nil+) and + # registering whatever the block returns. Used to install a + # delegating handler that wraps the default. + # + # @example Wrap the default URL handler + # registry.overlay("url") do |default| + # LinkifyingUrlHandler.new(default:) + # end + # + # @param tag_names [String, Array] + # @yieldparam previous [BaseHandler, nil] previously bound handler + # @return [self] + def overlay(tag_names) + Array(tag_names).each do |name| + previous = self[name] + register(name, yield(previous)) + end + self + end + # Get handler for a tag name # @param tag_name [String] # @return [BaseHandler, nil] @@ -69,8 +90,10 @@ def close_element(token:, context:, tokens: nil) # Create the default handler registry with common BBCode tags. # # Each call returns a *fresh* instance — mutations made to one will - # not be visible to another. If you want a process-wide singleton, - # use {Markbridge.default_handlers} instead, which memoizes. + # not be visible to another. Convenience methods on +Markbridge+ + # build a fresh default registry per call when none is supplied; + # to share state across calls, build one once and pass it via + # the +handlers:+ kwarg.
# # @param closing_strategy [Object, nil] optional closing strategy to apply, defaults to Reordering strategy # @return [HandlerRegistry] diff --git a/lib/markbridge/parsers/bbcode/handlers/raw_handler.rb b/lib/markbridge/parsers/bbcode/handlers/raw_handler.rb index 18841bd..5c36011 100644 --- a/lib/markbridge/parsers/bbcode/handlers/raw_handler.rb +++ b/lib/markbridge/parsers/bbcode/handlers/raw_handler.rb @@ -32,11 +32,22 @@ def on_close(token:, context:, registry:, tokens: nil) private def create_element(token:, content:) - language = token.attrs[:lang] || token.attrs[:option] - element = @element_class.new(language:) + element = + if accepts_language? + @element_class.new(language: token.attrs[:lang] || token.attrs[:option]) + else + @element_class.new + end element << AST::Text.new(content) unless content.empty? element end + + def accepts_language? + @element_class + .instance_method(:initialize) + .parameters + .any? { |_kind, name| name == :language } + end end end end diff --git a/lib/markbridge/parsers/html.rb b/lib/markbridge/parsers/html.rb index d0afcea..27394ba 100644 --- a/lib/markbridge/parsers/html.rb +++ b/lib/markbridge/parsers/html.rb @@ -10,6 +10,7 @@ # Handlers require_relative "html/handlers/base_handler" require_relative "html/handlers/simple_handler" +require_relative "html/handlers/self_closing_handler" require_relative "html/handlers/raw_handler" require_relative "html/handlers/url_handler" require_relative "html/handlers/image_handler" diff --git a/lib/markbridge/parsers/html/handler_registry.rb b/lib/markbridge/parsers/html/handler_registry.rb index 3272516..c463a11 100644 --- a/lib/markbridge/parsers/html/handler_registry.rb +++ b/lib/markbridge/parsers/html/handler_registry.rb @@ -11,15 +11,32 @@ def initialize # Register a handler for one or more tag names # @param tag_names [String, Array] tag name(s) to register - # @param handler [BaseHandler, Proc] the handler instance or proc + # @param handler [BaseHandler] the handler instance 
— must + # respond to +#process(element:, parent:)+ def register(tag_names, handler) Array(tag_names).each { |tag_name| @handlers[tag_name.to_s.downcase] = handler } self end + # Replace the handler bound to one or more tag names by yielding + # the previously-bound handler (which may be +nil+) and + # registering whatever the block returns. Used to install a + # delegating handler that wraps the default. + # + # @param tag_names [String, Array] + # @yieldparam previous [BaseHandler, nil] previously bound handler + # @return [self] + def overlay(tag_names) + Array(tag_names).each do |name| + previous = self[name] + register(name, yield(previous)) + end + self + end + # Get handler for a tag name # @param tag_name [String] - # @return [BaseHandler, Proc, nil] + # @return [BaseHandler, nil] def [](tag_name) @handlers[tag_name.to_s.downcase] end @@ -38,20 +55,8 @@ def self.default registry.register("a", Handlers::UrlHandler.new) registry.register("img", Handlers::ImageHandler.new) registry.register("blockquote", Handlers::QuoteHandler.new) - registry.register( - "br", - lambda do |element:, parent:| - parent << AST::LineBreak.new - nil - end, - ) - registry.register( - "hr", - lambda do |element:, parent:| - parent << AST::HorizontalRule.new - nil - end, - ) + registry.register("br", Handlers::SelfClosingHandler.new(AST::LineBreak)) + registry.register("hr", Handlers::SelfClosingHandler.new(AST::HorizontalRule)) registry.register(%w[ul ol], Handlers::ListHandler.new) registry.register("li", Handlers::ListItemHandler.new) registry.register("table", Handlers::TableHandler.new) diff --git a/lib/markbridge/parsers/html/handlers/self_closing_handler.rb b/lib/markbridge/parsers/html/handlers/self_closing_handler.rb new file mode 100644 index 0000000..f7c46a0 --- /dev/null +++ b/lib/markbridge/parsers/html/handlers/self_closing_handler.rb @@ -0,0 +1,26 @@ +# frozen_string_literal: true + +module Markbridge + module Parsers + module HTML + module Handlers + # Handler for 
self-closing leaf tags (br, hr, etc.). Creates + # an instance of +element_class+, appends it to +parent+, and + # returns nil so the parser does not try to recurse into + # children. + class SelfClosingHandler < BaseHandler + def initialize(element_class) + @element_class = element_class + end + + def process(element:, parent:) + parent << @element_class.new + nil + end + + attr_reader :element_class + end + end + end + end +end diff --git a/lib/markbridge/parsers/html/parser.rb b/lib/markbridge/parsers/html/parser.rb index 79457fa..51d0d44 100644 --- a/lib/markbridge/parsers/html/parser.rb +++ b/lib/markbridge/parsers/html/parser.rb @@ -85,15 +85,8 @@ def process_element_node(node, parent) handler = @handlers[tag_name] if handler - # Handler returns element if children should be processed, nil otherwise - ast_element = - if handler.respond_to?(:process) - handler.process(element: node, parent:) - else - handler.call(element: node, parent:) - end - - # Automatically process children if handler returned element + # Handler returns element if children should be processed, nil otherwise. 
+ ast_element = handler.process(element: node, parent:) process_children(node, ast_element) if ast_element else handle_unknown_tag(node, parent) diff --git a/lib/markbridge/parsers/media_wiki/inline_parser.rb b/lib/markbridge/parsers/media_wiki/inline_parser.rb index 46408c1..0d3db73 100644 --- a/lib/markbridge/parsers/media_wiki/inline_parser.rb +++ b/lib/markbridge/parsers/media_wiki/inline_parser.rb @@ -11,13 +11,20 @@ module MediaWiki # registry = InlineTagRegistry.build_from_default do |r| # r.register("mark", :formatting, AST::Bold) # end - # parser = InlineParser.new(inline_tag_registry: registry) + # parser = InlineParser.new(handlers: registry) class InlineParser MAX_INLINE_DEPTH = 20 - def initialize(inline_tag_registry: nil, depth: 0) - @registry = inline_tag_registry || InlineTagRegistry.default + # @return [Hash{String => Integer}] tag-name → occurrence count for + # HTML-like inline tags whose names are not registered. Shared + # with nested InlineParser instances so depth-recursive parses + # contribute to the same tally. + attr_reader :unknown_tags + + def initialize(handlers: nil, depth: 0, unknown_tags: nil) + @registry = handlers || InlineTagRegistry.default @depth = depth + @unknown_tags = unknown_tags || Hash.new(0) end # Parse inline markup and append resulting AST nodes to the parent element. @@ -110,10 +117,11 @@ def parse_inner_content(content, parent:) return end - InlineParser.new(inline_tag_registry: @registry, depth: @depth + 1).parse( - content, - parent:, - ) + InlineParser.new( + handlers: @registry, + depth: @depth + 1, + unknown_tags: @unknown_tags, + ).parse(content, parent:) end # Collect text until we find n consecutive apostrophes. @@ -203,9 +211,14 @@ def parse_html_tag self_closing = !tag_match[3].empty? tag_name = tag_match[2].downcase - # Closing/self-closing tags and unknown tags are treated as literal text + # Closing/self-closing tags and unknown tags are treated as literal text. 
+ # Track *unknown* opening tags so callers can surface them via + # Parse/Conversion#unknown_tags. We deliberately don't track + # closing/self-closing forms — they often pair up with the + # opening tag that's already counted. entry = @registry[tag_name] if closing || self_closing || !entry + @unknown_tags[tag_name] += 1 if !entry && !closing && !self_closing advance_as_text(full_match) return end diff --git a/lib/markbridge/parsers/media_wiki/parser.rb b/lib/markbridge/parsers/media_wiki/parser.rb index 91567a0..7f42ab8 100644 --- a/lib/markbridge/parsers/media_wiki/parser.rb +++ b/lib/markbridge/parsers/media_wiki/parser.rb @@ -21,13 +21,20 @@ module MediaWiki # parser = Markbridge::Parsers::MediaWiki::Parser.new # ast = parser.parse("'''bold''' and ''italic''") class Parser - # @param inline_tag_registry [InlineTagRegistry, nil] custom registry or use default + # @return [Hash{String => Integer}] tag-name → occurrence count for + # inline HTML-like tags whose names are not registered. Reset at + # the start of every #parse call. + attr_reader :unknown_tags + + # @param handlers [InlineTagRegistry, nil] custom registry or use default. + # Named +handlers:+ for consistency with sibling parsers; the + # value is still an +InlineTagRegistry+ instance. # @yield [InlineTagRegistry] optional block to customize the default registry - def initialize(inline_tag_registry: nil, &block) + def initialize(handlers: nil, &block) # InlineParser falls back to InlineTagRegistry.default when this is # nil, so we don't need to materialise it here. - @inline_tag_registry = - block_given? ? InlineTagRegistry.build_from_default(&block) : inline_tag_registry + @handlers = block_given? ? InlineTagRegistry.build_from_default(&block) : handlers + @unknown_tags = Hash.new(0) end # Parse MediaWiki wikitext into an AST Document. 
@@ -38,8 +45,9 @@ def parse(input) normalized = normalize_line_endings(input) lines = normalized.split("\n") + @unknown_tags.clear @document = AST::Document.new - @inline_parser = InlineParser.new(inline_tag_registry: @inline_tag_registry) + @inline_parser = InlineParser.new(handlers: @handlers, unknown_tags: @unknown_tags) @list_stack = [] process_lines(lines) diff --git a/lib/markbridge/parsers/text_formatter/handler_registry.rb b/lib/markbridge/parsers/text_formatter/handler_registry.rb index 62f9497..8f60dac 100644 --- a/lib/markbridge/parsers/text_formatter/handler_registry.rb +++ b/lib/markbridge/parsers/text_formatter/handler_registry.rb @@ -43,6 +43,28 @@ def register(element_name, handler) @mappings[element_name.upcase] = handler end + # Look up the handler for an element name (case-insensitive). + # @param element_name [String] + # @return [#process, nil] + def [](element_name) + @mappings[element_name.upcase] + end + + # Replace the handler bound to one or more element names by + # yielding the previously-bound handler (which may be +nil+) + # and registering whatever the block returns. 
+ # + # @param element_names [String, Array] + # @yieldparam previous [#process, nil] + # @return [self] + def overlay(element_names) + Array(element_names).each do |name| + previous = self[name] + register(name, yield(previous)) + end + self + end + # Check if a handler is registered for an element # @param element_name [String] XML element name # @return [Boolean] true if handler is registered @@ -53,11 +75,12 @@ def has_handler?(element_name) # Process an XML element using the registered handler # @param element [Nokogiri::XML::Element] # @param parent [AST::Element] parent node to add children to + # @param processor [Parser] the parser, exposed to handlers so + # they can call back into +process_children+ for nested content # @return [AST::Element, nil] the created element if children should be processed, nil otherwise - def process_element(element, parent) - tag_name = element.name.upcase - handler = @mappings[tag_name] - handler&.process(element:, parent:) + def process_element(element, parent, processor) + handler = self[element.name] + handler&.process(element:, parent:, processor:) end # Register all default s9e/TextFormatter element mappings diff --git a/lib/markbridge/parsers/text_formatter/handlers/attachment_handler.rb b/lib/markbridge/parsers/text_formatter/handlers/attachment_handler.rb index 92b7f26..9ca77e5 100644 --- a/lib/markbridge/parsers/text_formatter/handlers/attachment_handler.rb +++ b/lib/markbridge/parsers/text_formatter/handlers/attachment_handler.rb @@ -10,7 +10,7 @@ def initialize @element_class = AST::Attachment end - def process(element:, parent:) + def process(element:, parent:, processor: nil) attrs = extract_attributes(element) node = AST::Attachment.new( diff --git a/lib/markbridge/parsers/text_formatter/handlers/attribute_handler.rb b/lib/markbridge/parsers/text_formatter/handlers/attribute_handler.rb index 3d0b1ed..85a7d19 100644 --- a/lib/markbridge/parsers/text_formatter/handlers/attribute_handler.rb +++ 
b/lib/markbridge/parsers/text_formatter/handlers/attribute_handler.rb @@ -23,7 +23,7 @@ def initialize(element_class, attribute:, param: nil) @param = param || attribute end - def process(element:, parent:) + def process(element:, parent:, processor: nil) attrs = extract_attributes(element) node = @element_class.new(@param => attrs[@attribute]) parent << node diff --git a/lib/markbridge/parsers/text_formatter/handlers/base_handler.rb b/lib/markbridge/parsers/text_formatter/handlers/base_handler.rb index 471a693..c90435f 100644 --- a/lib/markbridge/parsers/text_formatter/handlers/base_handler.rb +++ b/lib/markbridge/parsers/text_formatter/handlers/base_handler.rb @@ -16,7 +16,7 @@ class BaseHandler # @param element [Nokogiri::XML::Element] the XML element to process # @param parent [AST::Element] the parent AST node to add children to # @return [AST::Element, nil] the created element if children should be processed, nil otherwise - def process(element:, parent:) + def process(element:, parent:, processor: nil) raise NotImplementedError, "#{self.class} must implement #process" end diff --git a/lib/markbridge/parsers/text_formatter/handlers/code_handler.rb b/lib/markbridge/parsers/text_formatter/handlers/code_handler.rb index be7dccc..88aff31 100644 --- a/lib/markbridge/parsers/text_formatter/handlers/code_handler.rb +++ b/lib/markbridge/parsers/text_formatter/handlers/code_handler.rb @@ -10,7 +10,7 @@ def initialize @element_class = AST::Code end - def process(element:, parent:) + def process(element:, parent:, processor: nil) attrs = extract_attributes(element) lang = attrs[:lang] || attrs[:language] node = AST::Code.new(language: lang) diff --git a/lib/markbridge/parsers/text_formatter/handlers/email_handler.rb b/lib/markbridge/parsers/text_formatter/handlers/email_handler.rb index 08dfd63..3114743 100644 --- a/lib/markbridge/parsers/text_formatter/handlers/email_handler.rb +++ b/lib/markbridge/parsers/text_formatter/handlers/email_handler.rb @@ -10,7 +10,7 @@ def 
initialize @element_class = AST::Email end - def process(element:, parent:) + def process(element:, parent:, processor: nil) attrs = extract_attributes(element) node = AST::Email.new(address: attrs[:email]) parent << node diff --git a/lib/markbridge/parsers/text_formatter/handlers/image_handler.rb b/lib/markbridge/parsers/text_formatter/handlers/image_handler.rb index 58aa06a..aa6c247 100644 --- a/lib/markbridge/parsers/text_formatter/handlers/image_handler.rb +++ b/lib/markbridge/parsers/text_formatter/handlers/image_handler.rb @@ -10,7 +10,7 @@ def initialize @element_class = AST::Image end - def process(element:, parent:) + def process(element:, parent:, processor: nil) attrs = extract_attributes(element) node = AST::Image.new( diff --git a/lib/markbridge/parsers/text_formatter/handlers/list_handler.rb b/lib/markbridge/parsers/text_formatter/handlers/list_handler.rb index 65f2e3a..ae6009a 100644 --- a/lib/markbridge/parsers/text_formatter/handlers/list_handler.rb +++ b/lib/markbridge/parsers/text_formatter/handlers/list_handler.rb @@ -10,7 +10,7 @@ def initialize @element_class = AST::List end - def process(element:, parent:) + def process(element:, parent:, processor: nil) attrs = extract_attributes(element) type_str = attrs[:type] # Ordered if type is not empty, disc, circle, or square diff --git a/lib/markbridge/parsers/text_formatter/handlers/quote_handler.rb b/lib/markbridge/parsers/text_formatter/handlers/quote_handler.rb index 8a98080..37879ff 100644 --- a/lib/markbridge/parsers/text_formatter/handlers/quote_handler.rb +++ b/lib/markbridge/parsers/text_formatter/handlers/quote_handler.rb @@ -10,7 +10,7 @@ def initialize @element_class = AST::Quote end - def process(element:, parent:) + def process(element:, parent:, processor: nil) attrs = extract_attributes(element) node = AST::Quote.new( diff --git a/lib/markbridge/parsers/text_formatter/handlers/simple_handler.rb b/lib/markbridge/parsers/text_formatter/handlers/simple_handler.rb index 012df29..266fcb0 
100644 --- a/lib/markbridge/parsers/text_formatter/handlers/simple_handler.rb +++ b/lib/markbridge/parsers/text_formatter/handlers/simple_handler.rb @@ -21,7 +21,7 @@ def initialize(element_class) # Process the element by creating an AST node and processing children # @param element [Nokogiri::XML::Element] # @param parent [AST::Element] - def process(element:, parent:) + def process(element:, parent:, processor: nil) node = @element_class.new parent << node diff --git a/lib/markbridge/parsers/text_formatter/handlers/table_cell_handler.rb b/lib/markbridge/parsers/text_formatter/handlers/table_cell_handler.rb index 2280327..45fabcf 100644 --- a/lib/markbridge/parsers/text_formatter/handlers/table_cell_handler.rb +++ b/lib/markbridge/parsers/text_formatter/handlers/table_cell_handler.rb @@ -10,7 +10,7 @@ def initialize @element_class = AST::TableCell end - def process(element:, parent:) + def process(element:, parent:, processor: nil) node = AST::TableCell.new(header: element.name.upcase == "TH") parent << node node diff --git a/lib/markbridge/parsers/text_formatter/handlers/url_handler.rb b/lib/markbridge/parsers/text_formatter/handlers/url_handler.rb index 94f9712..6f7774f 100644 --- a/lib/markbridge/parsers/text_formatter/handlers/url_handler.rb +++ b/lib/markbridge/parsers/text_formatter/handlers/url_handler.rb @@ -12,7 +12,7 @@ def initialize @element_class = AST::Url end - def process(element:, parent:) + def process(element:, parent:, processor: nil) attrs = extract_attributes(element) node = AST::Url.new(href: attrs[:url]) parent << node diff --git a/lib/markbridge/parsers/text_formatter/parser.rb b/lib/markbridge/parsers/text_formatter/parser.rb index 486a589..25bdcf0 100644 --- a/lib/markbridge/parsers/text_formatter/parser.rb +++ b/lib/markbridge/parsers/text_formatter/parser.rb @@ -101,7 +101,7 @@ def process_element(element, ast_parent) # Process element with registered handler # Handler returns element if children should be processed, nil otherwise - 
result_element = @handlers.process_element(element, ast_parent) + result_element = @handlers.process_element(element, ast_parent, self) if result_element # Handler succeeded and returned element - process children into it diff --git a/lib/markbridge/renderers/discourse.rb b/lib/markbridge/renderers/discourse.rb index e49b13d..a9bedbf 100644 --- a/lib/markbridge/renderers/discourse.rb +++ b/lib/markbridge/renderers/discourse.rb @@ -5,7 +5,9 @@ require_relative "discourse/render_context" require_relative "discourse/rendering_interface" require_relative "discourse/markdown_escaper" +require_relative "discourse/identity_escaper" require_relative "discourse/html_escaper" +require_relative "discourse/postprocessor" # Builders require_relative "discourse/builders/list_item_builder" diff --git a/lib/markbridge/renderers/discourse/identity_escaper.rb b/lib/markbridge/renderers/discourse/identity_escaper.rb new file mode 100644 index 0000000..e20bade --- /dev/null +++ b/lib/markbridge/renderers/discourse/identity_escaper.rb @@ -0,0 +1,26 @@ +# frozen_string_literal: true + +module Markbridge + module Renderers + module Discourse + # Pass-through escaper. Returns its input unchanged. + # + # Useful for migration paths where the source content is already + # valid Markdown (or otherwise trusted not to need escaping) and + # should reach the postprocessor verbatim. For *partial* + # passthrough (e.g. allow lists but still escape headings), see + # {MarkdownEscaper#initialize}'s +allow:+ kwarg. 
+ # + # @example Per-call use via the renderer factory + # renderer = Markbridge.discourse_renderer(escape: false) + # Markbridge.bbcode_to_markdown(post.body, renderer:) + class IdentityEscaper + # @param text [String, nil] + # @return [String] +text+ unchanged, or +""+ when +text+ is nil + def escape(text) + text || "" + end + end + end + end +end diff --git a/lib/markbridge/renderers/discourse/markdown_escaper.rb b/lib/markbridge/renderers/discourse/markdown_escaper.rb index 49f3b9c..9fcf195 100644 --- a/lib/markbridge/renderers/discourse/markdown_escaper.rb +++ b/lib/markbridge/renderers/discourse/markdown_escaper.rb @@ -30,12 +30,29 @@ module Discourse # escaper.escape("") # => "\\" # class MarkdownEscaper + # Block-level constructs that callers can opt into letting + # through unescaped via the +allow:+ kwarg. The check fires + # only after a line's first byte has matched the relevant + # case arm, so this is a cold-path lookup with no measurable + # hot-path cost. + ALLOW_KEYS = %i[bullet_list ordered_list atx_heading block_quote].freeze + ALLOW_ALIASES = { lists: %i[bullet_list ordered_list] }.freeze + private_constant :ALLOW_KEYS, :ALLOW_ALIASES + # @param escape_hard_line_breaks [Boolean] when true, strip trailing spaces # before newlines to prevent CommonMark hard line breaks (
<br>). # Defaults to false because Discourse has trailing-space hard line # breaks disabled by default. - def initialize(escape_hard_line_breaks: false) + # @param allow [Symbol, Array<Symbol>, nil] block-level constructs + # to pass through unescaped. Recognised keys: + # +:bullet_list+, +:ordered_list+, +:atx_heading+, + # +:block_quote+. The alias +:lists+ expands to + # `[:bullet_list, :ordered_list]`. Thematic breaks, setext + # underlines, fenced code, and indented code remain escaped + # even when their first byte matches an allow-listed marker. + def initialize(escape_hard_line_breaks: false, allow: nil) @escape_hard_line_breaks = escape_hard_line_breaks + @allow = resolve_allow(allow) # @inline_content / @inline_result / @inline_len are set by # escape_inline on every call before any helper reads them; # no defensive init needed. @@ -131,6 +148,24 @@ def escape(text) private + def resolve_allow(allow) + # `flat_map` flattens Array results and appends scalar + # results as-is, so `|| key` keeps non-alias keys without + # extra wrapping. + keys = Array(allow).flat_map { |key| ALLOW_ALIASES[key] || key } + unknown = keys - ALLOW_KEYS + unless unknown.empty? + raise ArgumentError, + "unknown allow keys: #{unknown.inspect} " \ + "(expected #{ALLOW_KEYS.inspect} or alias #{ALLOW_ALIASES.keys.inspect})" + end + # Array, not Set: with at most 4 keys the linear `include?` + # is observably identical to `Set#include?` and avoids the + # Set allocation. The array isn't reachable from outside + # the escaper, so we don't bother freezing it. + keys + end + def escape_text(text) # On CRLF input, consume `\r` as part of the line terminator instead # of leaving it on the line.
A trailing `\r` breaks line-end anchored @@ -224,13 +259,20 @@ def escape_block_level(content, prev_was_paragraph) case first_byte when HASH - return escape_first_char_inline(content, "\\#") if ATX_HEADING.match?(content) + if (match = ATX_HEADING.match(content)) + return pass_marker_inline(content, match[0].length) if @allow.include?(:atx_heading) + return escape_first_char_inline(content, "\\#") + end when GT + return pass_first_char_inline(content) if @allow.include?(:block_quote) return escape_first_char_inline(content, "\\>") when DASH return escape_block_dash(content, prev_was_paragraph) when PLUS - return escape_first_char_inline(content, "\\+") if BULLET_LIST.match?(content) + if BULLET_LIST.match?(content) + return pass_first_char_inline(content) if @allow.include?(:bullet_list) + return escape_first_char_inline(content, "\\+") + end when STAR return escape_block_star(content) when UNDERSCORE @@ -268,24 +310,46 @@ def escape_block_dash(content, prev_was_paragraph) (prev_was_paragraph && SETEXT_UNDERLINE_DASH.match?(content)) return escape_all_chars(content, DASH, "\\-"), true end - return escape_first_char_inline(content, "\\-") if BULLET_LIST.match?(content) + if BULLET_LIST.match?(content) + return pass_first_char_inline(content) if @allow.include?(:bullet_list) + return escape_first_char_inline(content, "\\-") + end [content, false] end def escape_block_star(content) return escape_all_chars(content, STAR, "\\*"), true if THEMATIC_BREAK_STAR.match?(content) - return escape_first_char_inline(content, "\\*") if BULLET_LIST.match?(content) + if BULLET_LIST.match?(content) + return pass_first_char_inline(content) if @allow.include?(:bullet_list) + return escape_first_char_inline(content, "\\*") + end [content, false] end def escape_block_ordered_list(content) if (match = ORDERED_LIST.match(content)) rest = content[match[0].length..] 
+ return pass_marker_inline(content, match[0].length) if @allow.include?(:ordered_list) + return "#{match[1]}\\#{match[2]}#{escape_inline(rest)}", true end [content, false] end + # Like {#escape_first_char_inline} but the leading character is + # preserved verbatim (used when allow: lets a single-byte + # marker like `-`, `+`, `*`, or `>` through). + def pass_first_char_inline(content) + ["#{content[0]}#{escape_inline(content[1..])}", true] + end + + # Preserve a multi-byte marker (e.g. `1.`, `99)`, `##`) and + # inline-escape the rest. Used when allow: lets ordered lists + # or ATX headings through. + def pass_marker_inline(content, marker_length) + ["#{content[0, marker_length]}#{escape_inline(content[marker_length..])}", true] + end + def escape_all_chars(str, byte_val, escaped) result = String.new(capacity: str.bytesize * 2, encoding: str.encoding) str.each_byte do |byte| diff --git a/lib/markbridge/renderers/discourse/postprocessor.rb b/lib/markbridge/renderers/discourse/postprocessor.rb new file mode 100644 index 0000000..6e9fa38 --- /dev/null +++ b/lib/markbridge/renderers/discourse/postprocessor.rb @@ -0,0 +1,27 @@ +# frozen_string_literal: true + +module Markbridge + module Renderers + module Discourse + # Cleans up the raw Markdown produced by the Renderer: + # + # 1. collapses runs of 3+ newlines down to two, + # 2. clears whitespace-only lines, + # 3. trims leading/trailing whitespace from the whole document. + # + # Subclass to customize. The +call+ method is the entry point. 
+ class Postprocessor + # @param text [String] + # @return [String] + def call(text) + text + .gsub(/\n{3,}/, "\n\n") # Max 2 consecutive newlines + .gsub(/^[ \t]+$/, "") # Remove whitespace-only lines + .strip # Trim leading/trailing whitespace + end + + DEFAULT = new + end + end + end +end diff --git a/lib/markbridge/renderers/discourse/renderer.rb b/lib/markbridge/renderers/discourse/renderer.rb index d04bbdd..0d0a251 100644 --- a/lib/markbridge/renderers/discourse/renderer.rb +++ b/lib/markbridge/renderers/discourse/renderer.rb @@ -5,13 +5,15 @@ module Renderers module Discourse # Renders AST to Discourse-flavored Markdown in-memory. class Renderer - def initialize(tag_library: nil, escaper: nil, html_escaper: nil) + attr_reader :postprocessor + + def initialize(tag_library: nil, escaper: nil, html_escaper: nil, postprocessor: nil) @tag_library = tag_library || TagLibrary.default @escaper = escaper || MarkdownEscaper.new @html_escaper = html_escaper || HtmlEscaper + @postprocessor = postprocessor || Postprocessor::DEFAULT # @interface_cache is lazily initialized in #render's top-level - # call and reset to nil after the call completes. No init - # needed here — unset ivar returns nil under `.nil?` check. + # call and reset to nil after the call completes. end # Render a node to Markdown @@ -20,7 +22,7 @@ def initialize(tag_library: nil, escaper: nil, html_escaper: nil) # @return [String] def render(node, context: RenderContext.new) root_call = @interface_cache.nil? 
- @interface_cache ||= {} + @interface_cache = {} if root_call tag = @tag_library[node.class] if tag diff --git a/lib/markbridge/renderers/discourse/tag_library.rb b/lib/markbridge/renderers/discourse/tag_library.rb index be33149..7f7755b 100644 --- a/lib/markbridge/renderers/discourse/tag_library.rb +++ b/lib/markbridge/renderers/discourse/tag_library.rb @@ -19,6 +19,34 @@ def register(element_class, tag) self end + # Remove a tag binding so the renderer falls through to + # +render_children+ for that element class. See + # +Renderer#render+ for the auto-passthrough path. + # + # @param element_class [Class] + # @return [self] + def unregister(element_class) + @tags.delete(element_class) + self + end + + # Merge a Hash of class → Tag mappings on top of this library. + # A +nil+ value unregisters the corresponding class (so the + # default auto-passthrough kicks in). + # + # @param mapping [Hash{Class => Tag, nil}] + # @return [self] + def merge(mapping) + mapping.each_pair do |klass, tag| + if tag.nil? + unregister(klass) + else + register(klass, tag) + end + end + self + end + # Get tag for an element class # @param element_class [Class] # @return [Tag, nil] @@ -59,8 +87,7 @@ def ast_class_for(tag_constant) # Create the default tag library for Discourse Markdown. # # Each call returns a *fresh* instance — mutations made to one will - # not be visible to another. If you want a process-wide singleton, - # use {Markbridge.default_tag_library} instead, which memoizes. + # not be visible to another. # # @return [TagLibrary] def self.default diff --git a/mutant.yml b/mutant.yml index f8be099..adff66d 100644 --- a/mutant.yml +++ b/mutant.yml @@ -334,14 +334,6 @@ mutation: - "send{receiver=send{receiver=self selector=class} selector=new}" - "index{receiver=lvar{value=new_cache}}" - # HTML::Parser#process_element_node's `handler.call(element: node, - # parent:)` is the lambda-dispatch path, used only by the `br` / - # `hr` handlers in HandlerRegistry.default. 
Neither lambda reads - # `element:` — both just emit a void AST node into `parent`. So - # `element: nil` is equivalent for every registered caller. - # Bucket A. - - "send{receiver=lvar{value=handler} selector=call}" - # MarkdownEscaper#block_construct?'s `when DIGIT_0..DIGIT_9` range. # Mutations `when DIGIT_0..nil` / `when nil..DIGIT_9` extend the # range to unbounded, but the body is `ORDERED_LIST.match?(content)` diff --git a/spec/markbridge_spec.rb b/spec/markbridge_spec.rb index 717e678..f1c4772 100644 --- a/spec/markbridge_spec.rb +++ b/spec/markbridge_spec.rb @@ -1,174 +1,97 @@ # frozen_string_literal: true RSpec.describe Markbridge do - after { described_class.reset_defaults! } - - fixed_output_tag = - Class.new(Markbridge::Renderers::Discourse::Tag) do - def initialize(output) - super() - @output = output - end - - def render(_element, _interface, **_kwargs) - @output - end - end - it "has a version number" do - expect(Markbridge::VERSION).not_to be nil - end - - describe ".configuration" do - it "returns a Configuration instance" do - expect(described_class.configuration).to be_a(Markbridge::Configuration) - end - - it "memoizes the configuration" do - expect(described_class.configuration).to be(described_class.configuration) - end + expect(Markbridge::VERSION).not_to be_nil end - describe ".configure" do - it "yields the configuration" do - yielded = nil - described_class.configure { |config| yielded = config } + describe ".parse_bbcode" do + it "returns a Parse with format :bbcode" do + result = described_class.parse_bbcode("[b]hi[/b]") - expect(yielded).to be(described_class.configuration) + expect(result).to be_a(Markbridge::Parse) + expect(result.format).to eq(:bbcode) end - end - describe ".reset_defaults!" do - it "resets the configuration" do - old_config = described_class.configuration - described_class.reset_defaults! 
- expect(described_class.configuration).not_to be(old_config) - end + it "produces an AST that reflects the input" do + result = described_class.parse_bbcode("[b]hi[/b]") - it "resets the default handlers" do - old = described_class.default_handlers - described_class.reset_defaults! - expect(described_class.default_handlers).not_to be(old) + expect(result.ast).to be_a(Markbridge::AST::Document) + expect(result.ast.children.first).to be_a(Markbridge::AST::Bold) end - it "resets the default HTML handlers" do - old = described_class.default_html_handlers - described_class.reset_defaults! - expect(described_class.default_html_handlers).not_to be(old) + it "raises ArgumentError on nil input" do + expect { described_class.parse_bbcode(nil) }.to raise_error( + ArgumentError, + /input cannot be nil/, + ) end - it "resets the default tag library" do - old = described_class.default_tag_library - described_class.reset_defaults! - expect(described_class.default_tag_library).not_to be(old) + it "coerces non-string input via to_s" do + expect(described_class.parse_bbcode(123).ast).to be_a(Markbridge::AST::Document) end - it "resets the default text formatter handlers" do - old = described_class.default_text_formatter_handlers - described_class.reset_defaults! 
- expect(described_class.default_text_formatter_handlers).not_to be(old) - end - end + it "uses the provided handler registry" do + registry = Markbridge::Parsers::BBCode::HandlerRegistry.new + registry.register( + "weird", + Markbridge::Parsers::BBCode::Handlers::SimpleHandler.new(Markbridge::AST::Italic), + ) - describe ".default_handlers" do - it "returns a BBCode HandlerRegistry" do - expect(described_class.default_handlers).to be_a(Markbridge::Parsers::BBCode::HandlerRegistry) - end + result = described_class.parse_bbcode("[weird]x[/weird]", handlers: registry) - it "memoizes the registry across calls" do - expect(described_class.default_handlers).to be(described_class.default_handlers) + expect(result.ast.children.first).to be_a(Markbridge::AST::Italic) end - end - describe ".default_html_handlers" do - it "returns an HTML HandlerRegistry" do - expect(described_class.default_html_handlers).to be_a( - Markbridge::Parsers::HTML::HandlerRegistry, - ) - end + it "exposes unknown_tags from the parser" do + result = described_class.parse_bbcode("[neverknown]x[/neverknown]") - it "memoizes the registry across calls" do - expect(described_class.default_html_handlers).to be(described_class.default_html_handlers) + expect(result.unknown_tags["neverknown"]).to eq(2) end - end - describe ".default_tag_library" do - it "returns a Discourse TagLibrary" do - expect(described_class.default_tag_library).to be_a( - Markbridge::Renderers::Discourse::TagLibrary, - ) - end + it "exposes BBCode diagnostics with integer counters and an unclosed-raw-tags hash" do + result = described_class.parse_bbcode("[b]hi[/b]") - it "memoizes the library across calls" do - expect(described_class.default_tag_library).to be(described_class.default_tag_library) + expect(result.diagnostics[:auto_closed_tags_count]).to eq(0) + expect(result.diagnostics[:depth_exceeded_count]).to eq(0) + expect(result.diagnostics[:unclosed_raw_tags]).to eq({}) end - end - describe ".default_text_formatter_handlers" do 
- it "returns a TextFormatter HandlerRegistry" do - expect(described_class.default_text_formatter_handlers).to be_a( - Markbridge::Parsers::TextFormatter::HandlerRegistry, - ) - end + it "increments auto_closed_tags_count when the parser auto-closes a tag" do + # [b][i]x[/b] forces auto-close of [i] when [/b] arrives. + result = described_class.parse_bbcode("[b][i]x[/b]") - it "memoizes the registry across calls" do - expect(described_class.default_text_formatter_handlers).to be( - described_class.default_text_formatter_handlers, - ) + expect(result.diagnostics[:auto_closed_tags_count]).to be > 0 end end - describe ".parse_bbcode" do - it "returns an AST::Document" do - expect(described_class.parse_bbcode("[b]hi[/b]")).to be_a(Markbridge::AST::Document) - end - - it "produces children that reflect the input" do - doc = described_class.parse_bbcode("[b]hi[/b]") + describe ".bbcode_to_markdown" do + it "returns a Conversion whose markdown reflects the input" do + result = described_class.bbcode_to_markdown("[b]hi[/b]") - expect(doc.children.first).to be_a(Markbridge::AST::Bold) + expect(result).to be_a(Markbridge::Conversion) + expect(result.markdown).to eq("**hi**") end - it "raises ArgumentError on nil input" do - expect { described_class.parse_bbcode(nil) }.to raise_error( - ArgumentError, - /input cannot be nil/, - ) - end + it "carries the parsed AST through to Conversion#ast" do + result = described_class.bbcode_to_markdown("[b]hi[/b]") - it "coerces non-string input via to_s" do - expect(described_class.parse_bbcode(123)).to be_a(Markbridge::AST::Document) + expect(result.ast).to be_a(Markbridge::AST::Document) + expect(result.ast.children.first).to be_a(Markbridge::AST::Bold) end - it "uses Markbridge.default_handlers when handlers not provided" do - # Register a custom tag on the shared default registry; it must be - # picked up by parse_bbcode (proves the default is reused, not re-built) - described_class.default_handlers.register( - "customtag", - 
Markbridge::Parsers::BBCode::Handlers::SimpleHandler.new(Markbridge::AST::Bold), - ) - - doc = described_class.parse_bbcode("[customtag]x[/customtag]") - - expect(doc.children.first).to be_a(Markbridge::AST::Bold) + it "carries the parsed format through to Conversion#format" do + expect(described_class.bbcode_to_markdown("[b]hi[/b]").format).to eq(:bbcode) end - it "uses the provided handler registry" do - registry = Markbridge::Parsers::BBCode::HandlerRegistry.new - registry.register( - "weird", - Markbridge::Parsers::BBCode::Handlers::SimpleHandler.new(Markbridge::AST::Italic), - ) - - doc = described_class.parse_bbcode("[weird]x[/weird]", handlers: registry) + it "carries parser-side diagnostics through to Conversion#diagnostics" do + result = described_class.bbcode_to_markdown("[b][i]x[/b]") - expect(doc.children.first).to be_a(Markbridge::AST::Italic) + expect(result.diagnostics[:auto_closed_tags_count]).to be > 0 end - end - describe ".bbcode_to_markdown" do - it "renders BBCode to markdown" do - expect(described_class.bbcode_to_markdown("[b]hi[/b]")).to eq("**hi**") + it "delegates to_s to markdown for string-coercion contexts" do + expect("got #{described_class.bbcode_to_markdown("[b]hi[/b]")}").to eq("got **hi**") end it "passes the provided handler registry through to the parser" do @@ -178,64 +101,66 @@ def render(_element, _interface, **_kwargs) Markbridge::Parsers::BBCode::Handlers::SimpleHandler.new(Markbridge::AST::Italic), ) - # Custom registry maps [b] to italic; markdown output uses *_* - expect(described_class.bbcode_to_markdown("[b]hi[/b]", handlers: registry)).to eq("*hi*") + result = described_class.bbcode_to_markdown("[b]hi[/b]", handlers: registry) + + expect(result.markdown).to eq("*hi*") end it "raises ArgumentError on nil input" do expect { described_class.bbcode_to_markdown(nil) }.to raise_error(ArgumentError) end - it "respects escape_hard_line_breaks configuration" do - described_class.configure { |c| c.escape_hard_line_breaks = true } 
+ it "exposes unknown_tags on the Conversion" do + result = described_class.bbcode_to_markdown("[neverknown]x[/neverknown]") - result = described_class.bbcode_to_markdown("hello \nworld") - expect(result).to eq("hello\nworld") + expect(result.unknown_tags["neverknown"]).to eq(2) end - it "preserves trailing-space hard line breaks when escape_hard_line_breaks is false (default)" do - # Default config keeps trailing spaces; build_renderer must read .escape_hard_line_breaks, - # not the Configuration object itself (which is always truthy) - result = described_class.bbcode_to_markdown("hello \nworld") - expect(result).to eq("hello \nworld") + it "returns an empty errors array by default" do + expect(described_class.bbcode_to_markdown("[b]hi[/b]").errors).to eq([]) end - it "uses the provided tag library to render" do - library = Markbridge::Renderers::Discourse::TagLibrary.new - library.register(Markbridge::AST::Bold, fixed_output_tag.new("BOLDED")) - - expect(described_class.bbcode_to_markdown("[b]hi[/b]", tag_library: library)).to eq("BOLDED") + it "collapses three or more consecutive newlines to exactly two" do + expect(described_class.bbcode_to_markdown("a\n\n\n\nb").markdown).to eq("a\n\nb") end - it "uses Markbridge.default_tag_library when tag_library not provided" do - # Customize the shared default library; output must reflect the customization - described_class.default_tag_library.register( - Markbridge::AST::Bold, - fixed_output_tag.new("OUTPUT_FROM_CUSTOMIZED_DEFAULT"), - ) - - expect(described_class.bbcode_to_markdown("[b]hi[/b]")).to eq( - "OUTPUT_FROM_CUSTOMIZED_DEFAULT", - ) + it "removes whitespace-only lines" do + expect(described_class.bbcode_to_markdown("a\n \nb").markdown).to eq("a\n\nb") end - it "collapses three or more consecutive newlines to exactly two" do - # BBCode -> markdown -> cleanup turns runs of blank lines into a single blank line - expect(described_class.bbcode_to_markdown("a\n\n\n\nb")).to eq("a\n\nb") + it "strips leading and 
trailing whitespace from the final output" do + expect(described_class.bbcode_to_markdown(" hi ").markdown).to eq("hi") end - it "removes whitespace-only lines" do - expect(described_class.bbcode_to_markdown("a\n \nb")).to eq("a\n\nb") + it "lets render-time errors propagate by default (raise_on_error defaults to true)" do + tag = + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_e, _i) + raise "boom" + end + end + renderer = described_class.discourse_renderer(tags: { Markbridge::AST::Bold => tag.new }) + + expect { described_class.bbcode_to_markdown("[b]x[/b]", renderer:) }.to raise_error(/boom/) end - it "strips leading and trailing whitespace from the final output" do - expect(described_class.bbcode_to_markdown(" hi ")).to eq("hi") + it "yields the AST to a block between parse and render so callers can mutate it" do + result = + described_class.bbcode_to_markdown("[b]hi[/b]") do |ast| + ast << Markbridge::AST::Text.new(" extra") + end + + expect(result.markdown).to eq("**hi** extra") end end describe ".parse_html" do - it "returns an AST::Document" do - expect(described_class.parse_html("hi")).to be_a(Markbridge::AST::Document) + it "returns a Parse with format :html" do + result = described_class.parse_html("hi") + + expect(result).to be_a(Markbridge::Parse) + expect(result.format).to eq(:html) + expect(result.ast).to be_a(Markbridge::AST::Document) end it "raises ArgumentError on nil input" do @@ -245,22 +170,39 @@ def render(_element, _interface, **_kwargs) ) end - it "uses Markbridge.default_html_handlers when handlers not provided" do - # Register a custom handler on the shared default registry - described_class.default_html_handlers.register( + it "uses the provided handler registry" do + registry = Markbridge::Parsers::HTML::HandlerRegistry.new + registry.register( "b", Markbridge::Parsers::HTML::Handlers::SimpleHandler.new(Markbridge::AST::Italic), ) - doc = described_class.parse_html("hi") + result = described_class.parse_html("hi", 
handlers: registry) - expect(doc.children.first).to be_a(Markbridge::AST::Italic) + expect(result.ast.children.first).to be_a(Markbridge::AST::Italic) + end + + it "coerces non-string input via to_s" do + expect(described_class.parse_html(123).ast).to be_a(Markbridge::AST::Document) + end + + it "exposes unknown_tags from the parser as a queryable Hash" do + result = described_class.parse_html("<neverknown>x</neverknown>") + + expect(result.unknown_tags["neverknown"]).to eq(1) + end + + it "exposes an empty diagnostics Hash" do + expect(described_class.parse_html("hi").diagnostics).to eq({}) end end describe ".html_to_markdown" do - it "renders HTML to markdown" do - expect(described_class.html_to_markdown("<b>hi</b>")).to eq("**hi**") + it "renders HTML to a Conversion" do + result = described_class.html_to_markdown("<b>hi</b>") + + expect(result).to be_a(Markbridge::Conversion) + expect(result.markdown).to eq("**hi**") end it "raises ArgumentError on nil input" do @@ -274,23 +216,41 @@ def render(_element, _interface, **_kwargs) Markbridge::Parsers::HTML::Handlers::SimpleHandler.new(Markbridge::AST::Italic), ) - # With a custom registry mapping to italic, the markdown output uses *_* - expect(described_class.html_to_markdown("<b>hi</b>", handlers: registry)).to eq("*hi*") + result = described_class.html_to_markdown("<b>hi</b>", handlers: registry) + + expect(result.markdown).to eq("*hi*") + end + + it "lets render-time errors propagate by default (raise_on_error defaults to true)" do + tag = + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_e, _i) + raise "boom" + end + end + renderer = described_class.discourse_renderer(tags: { Markbridge::AST::Bold => tag.new }) + + expect { described_class.html_to_markdown("<b>x</b>", renderer:) }.to raise_error(/boom/) end - it "uses the provided tag library to render" do - library = Markbridge::Renderers::Discourse::TagLibrary.new - library.register(Markbridge::AST::Bold, fixed_output_tag.new("BOLDED")) + it "yields the AST to a block between parse and render so
callers can mutate it" do + result = + described_class.html_to_markdown("hi") do |ast| + ast << Markbridge::AST::Text.new(" extra") + end - expect(described_class.html_to_markdown("hi", tag_library: library)).to eq("BOLDED") + expect(result.markdown).to eq("**hi** extra") end end describe ".parse_text_formatter_xml" do let(:xml) { "[b]hi[/b]" } - it "returns an AST::Document" do - expect(described_class.parse_text_formatter_xml(xml)).to be_a(Markbridge::AST::Document) + it "returns a Parse with format :text_formatter_xml" do + result = described_class.parse_text_formatter_xml(xml) + + expect(result).to be_a(Markbridge::Parse) + expect(result.format).to eq(:text_formatter_xml) end it "raises ArgumentError on nil input" do @@ -300,25 +260,40 @@ def render(_element, _interface, **_kwargs) ) end - it "uses Markbridge.default_text_formatter_handlers when handlers not provided" do - # Register a custom handler on the shared default registry; it must be - # picked up by parse_text_formatter_xml (proves the default is reused) - described_class.default_text_formatter_handlers.register( + it "uses the provided handler registry" do + registry = Markbridge::Parsers::TextFormatter::HandlerRegistry.new + registry.register( "B", Markbridge::Parsers::TextFormatter::Handlers::SimpleHandler.new(Markbridge::AST::Italic), ) - doc = described_class.parse_text_formatter_xml(xml) + result = described_class.parse_text_formatter_xml(xml, handlers: registry) - expect(doc.children.first).to be_a(Markbridge::AST::Italic) + expect(result.ast.children.first).to be_a(Markbridge::AST::Italic) + end + + it "coerces non-string input via to_s" do + expect(described_class.parse_text_formatter_xml(123).ast).to be_a(Markbridge::AST::Document) + end + + it "exposes unknown_tags from the parser as a queryable Hash" do + result = described_class.parse_text_formatter_xml("x") + + expect(result.unknown_tags["NEVERKNOWN"]).to eq(1) + end + + it "exposes an empty diagnostics Hash" do + 
expect(described_class.parse_text_formatter_xml(xml).diagnostics).to eq({}) end end describe ".text_formatter_xml_to_markdown" do let(:xml) { "[b]hi[/b]" } - it "renders TextFormatter XML to markdown" do - expect(described_class.text_formatter_xml_to_markdown(xml)).to eq("**hi**") + it "renders TextFormatter XML to a Conversion" do + result = described_class.text_formatter_xml_to_markdown(xml) + + expect(result.markdown).to eq("**hi**") end it "raises ArgumentError on nil input" do @@ -332,22 +307,41 @@ def render(_element, _interface, **_kwargs) Markbridge::Parsers::TextFormatter::Handlers::SimpleHandler.new(Markbridge::AST::Italic), ) - expect(described_class.text_formatter_xml_to_markdown(xml, handlers: registry)).to eq("*hi*") + result = described_class.text_formatter_xml_to_markdown(xml, handlers: registry) + + expect(result.markdown).to eq("*hi*") end - it "uses the provided tag library to render" do - library = Markbridge::Renderers::Discourse::TagLibrary.new - library.register(Markbridge::AST::Bold, fixed_output_tag.new("BOLDED")) + it "lets render-time errors propagate by default (raise_on_error defaults to true)" do + tag = + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_e, _i) + raise "boom" + end + end + renderer = described_class.discourse_renderer(tags: { Markbridge::AST::Bold => tag.new }) - expect(described_class.text_formatter_xml_to_markdown(xml, tag_library: library)).to eq( - "BOLDED", + expect { described_class.text_formatter_xml_to_markdown(xml, renderer:) }.to raise_error( + /boom/, ) end + + it "yields the AST to a block between parse and render so callers can mutate it" do + result = + described_class.text_formatter_xml_to_markdown(xml) do |ast| + ast << Markbridge::AST::Text.new(" extra") + end + + expect(result.markdown).to eq("**hi** extra") + end end describe ".parse_mediawiki" do - it "returns an AST::Document" do - expect(described_class.parse_mediawiki("'''hi'''")).to be_a(Markbridge::AST::Document) + it "returns 
a Parse with format :mediawiki" do + result = described_class.parse_mediawiki("'''hi'''") + + expect(result).to be_a(Markbridge::Parse) + expect(result.format).to eq(:mediawiki) end it "raises ArgumentError on nil input" do @@ -358,72 +352,87 @@ def render(_element, _interface, **_kwargs) end it "coerces non-string input via to_s" do - expect(described_class.parse_mediawiki(123)).to be_a(Markbridge::AST::Document) + expect(described_class.parse_mediawiki(123).ast).to be_a(Markbridge::AST::Document) end - it "forwards the inline_tag_registry kwarg to the parser" do + it "forwards the handlers kwarg to the parser" do registry = Markbridge::Parsers::MediaWiki::InlineTagRegistry.build_from_default do |r| r.register("highlight", :formatting, Markbridge::AST::Bold) end - doc = - described_class.parse_mediawiki("<highlight>x</highlight>", inline_tag_registry: registry) - paragraph = doc.children.first - # Custom registry maps <highlight> to Bold; default registry doesn't - # know the tag and would have left it as literal text. + result = described_class.parse_mediawiki("<highlight>x</highlight>", handlers: registry) + paragraph = result.ast.children.first + expect(paragraph.children.first).to be_a(Markbridge::AST::Bold) end + + it "exposes unknown HTML-like inline tags via Parse#unknown_tags" do + result = described_class.parse_mediawiki("hello world") + + expect(result.unknown_tags["neverknown"]).to eq(1) + end + + it "exposes an empty diagnostics Hash so callers can index into it without nil-checks" do + expect(described_class.parse_mediawiki("'''hi'''").diagnostics).to eq({}) + end end describe ".mediawiki_to_markdown" do - it "renders MediaWiki to markdown" do - expect(described_class.mediawiki_to_markdown("'''hi'''")).to eq("**hi**") + it "renders MediaWiki to a Conversion" do + expect(described_class.mediawiki_to_markdown("'''hi'''").markdown).to eq("**hi**") end it "raises ArgumentError on nil input" do expect { described_class.mediawiki_to_markdown(nil) }.to raise_error(ArgumentError) end - it "uses the provided tag 
library to render" do - library = Markbridge::Renderers::Discourse::TagLibrary.new - library.register(Markbridge::AST::Bold, fixed_output_tag.new("BOLDED")) - - expect(described_class.mediawiki_to_markdown("'''hi'''", tag_library: library)).to eq( - "BOLDED", - ) - end - - it "forwards the inline_tag_registry kwarg through to the parser" do + it "forwards the handlers kwarg through to the parser" do registry = Markbridge::Parsers::MediaWiki::InlineTagRegistry.build_from_default do |r| r.register("highlight", :formatting, Markbridge::AST::Bold) end + result = described_class.mediawiki_to_markdown("<highlight>x</highlight>", handlers: registry) + + expect(result.markdown).to eq("**x**") + end + + it "lets render-time errors propagate by default (raise_on_error defaults to true)" do + tag = + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_e, _i) + raise "boom" + end + end + renderer = described_class.discourse_renderer(tags: { Markbridge::AST::Bold => tag.new }) + + expect { described_class.mediawiki_to_markdown("'''x'''", renderer:) }.to raise_error(/boom/) + end + + it "yields the AST to a block between parse and render so callers can mutate it" do result = - described_class.mediawiki_to_markdown( - "<highlight>x</highlight>", - inline_tag_registry: registry, - ) - # Custom registry parses <highlight> as Bold, which renders to **x**. - # Without forwarding, the tag would survive as literal text. - expect(result).to eq("**x**") + described_class.mediawiki_to_markdown("'''hi'''") do |ast| + ast << Markbridge::AST::Text.new(" extra") + end + + # MediaWiki wraps the Bold in a Paragraph, so the appended Text is a + # second top-level child and a paragraph break separates them. 
+ expect(result.markdown).to eq("**hi**\n\n extra") end end describe "cleanup behavior in *_to_markdown methods" do it "removes whitespace-only lines (preserving multiple of them)" do - # gsub vs sub: with sub only the first occurrence is replaced; gsub catches them all - expect(described_class.bbcode_to_markdown("a\n \nb\n\t\nc")).to eq("a\n\nb\n\nc") + expect(described_class.bbcode_to_markdown("a\n \nb\n\t\nc").markdown).to eq("a\n\nb\n\nc") end it "collapses every run of 3+ newlines, not just the first" do - # Two distinct runs of 3+ newlines must both be reduced - expect(described_class.bbcode_to_markdown("a\n\n\nb\n\n\nc")).to eq("a\n\nb\n\nc") + expect(described_class.bbcode_to_markdown("a\n\n\nb\n\n\nc").markdown).to eq("a\n\nb\n\nc") end it "preserves paragraph breaks (single blank line) without collapsing" do - expect(described_class.bbcode_to_markdown("a\n\nb")).to eq("a\n\nb") + expect(described_class.bbcode_to_markdown("a\n\nb").markdown).to eq("a\n\nb") end end @@ -431,12 +440,410 @@ def render(_element, _interface, **_kwargs) it "calls to_s on the input (not on Markbridge itself)" do wrapper = StringWrapper.new("'''hi'''") - doc = described_class.parse_mediawiki(wrapper) + result = described_class.parse_mediawiki(wrapper) + + expect(result.ast.children.first).to be_a(Markbridge::AST::Paragraph) + expect(result.ast.children.first.children.first).to be_a(Markbridge::AST::Bold) + end + end + + describe "raise_on_error: kwarg" do + let(:exploding_tag) do + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_element, _interface) + raise "boom" + end + end + end + + it "raises by default (raise_on_error: true)" do + renderer = + described_class.discourse_renderer(tags: { Markbridge::AST::Bold => exploding_tag.new }) + + expect { described_class.bbcode_to_markdown("[b]hi[/b]", renderer:) }.to raise_error(/boom/) + end + + it "swallows the error and surfaces it on Conversion#errors when raise_on_error: false" do + renderer = + 
described_class.discourse_renderer(tags: { Markbridge::AST::Bold => exploding_tag.new }) + + result = described_class.bbcode_to_markdown("[b]hi[/b]", renderer:, raise_on_error: false) + + expect(result.markdown).to eq("") + expect(result.errors.size).to eq(1) + expect(result.errors.first.message).to match(/boom/) + end + + it "returns an empty errors array when render succeeds" do + result = described_class.bbcode_to_markdown("[b]hi[/b]", raise_on_error: false) + + expect(result.errors).to eq([]) + end + end + + describe ".convert" do + it "dispatches :bbcode to bbcode_to_markdown" do + result = described_class.convert("[b]hi[/b]", format: :bbcode) + + expect(result.markdown).to eq("**hi**") + expect(result.format).to eq(:bbcode) + end + + it "dispatches :html to html_to_markdown" do + expect(described_class.convert("hi", format: :html).markdown).to eq("**hi**") + end + + it "dispatches :text_formatter_xml to text_formatter_xml_to_markdown" do + xml = "[b]hi[/b]" + + expect(described_class.convert(xml, format: :text_formatter_xml).markdown).to eq("**hi**") + end + + it "dispatches :mediawiki to mediawiki_to_markdown" do + expect(described_class.convert("'''hi'''", format: :mediawiki).markdown).to eq("**hi**") + end + + it "raises ArgumentError for unknown formats" do + expect { described_class.convert("x", format: :unknown) }.to raise_error( + ArgumentError, + /unknown format/, + ) + end + + it "forwards renderer: kwarg to the dispatched method" do + fixed_bold = + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_e, _i) + "B" + end + end + renderer = + described_class.discourse_renderer(tags: { Markbridge::AST::Bold => fixed_bold.new }) + + expect(described_class.convert("[b]x[/b]", format: :bbcode, renderer:).markdown).to eq("B") + end + + it "forwards renderer: kwarg through the :html branch" do + fixed_bold = + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_e, _i) + "HB" + end + end + renderer = + 
described_class.discourse_renderer(tags: { Markbridge::AST::Bold => fixed_bold.new }) + + expect(described_class.convert("x", format: :html, renderer:).markdown).to eq("HB") + end + + it "forwards renderer: kwarg through the :mediawiki branch" do + fixed_bold = + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_e, _i) + "MB" + end + end + renderer = + described_class.discourse_renderer(tags: { Markbridge::AST::Bold => fixed_bold.new }) + + expect(described_class.convert("'''x'''", format: :mediawiki, renderer:).markdown).to eq("MB") + end + + it "forwards renderer: kwarg through the :text_formatter_xml branch" do + fixed_bold = + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_e, _i) + "TB" + end + end + renderer = + described_class.discourse_renderer(tags: { Markbridge::AST::Bold => fixed_bold.new }) + + result = described_class.convert("x", format: :text_formatter_xml, renderer:) + expect(result.markdown).to eq("TB") + end + + it "forwards a block to the :bbcode branch" do + result = + described_class.convert("[b]hi[/b]", format: :bbcode) do |ast| + ast << Markbridge::AST::Text.new(" extra") + end + + expect(result.markdown).to eq("**hi** extra") + end + + it "forwards a block to the :html branch" do + result = + described_class.convert("hi", format: :html) do |ast| + ast << Markbridge::AST::Text.new(" extra") + end + + expect(result.markdown).to eq("**hi** extra") + end + + it "forwards a block to the :text_formatter_xml branch" do + result = + described_class.convert("hi", format: :text_formatter_xml) do |ast| + ast << Markbridge::AST::Text.new(" extra") + end + + expect(result.markdown).to eq("**hi** extra") + end + + it "forwards a block to the :mediawiki branch" do + result = + described_class.convert("'''hi'''", format: :mediawiki) do |ast| + ast << Markbridge::AST::Text.new(" extra") + end + + # Paragraph wrap puts the appended Text after a blank line. 
+ expect(result.markdown).to eq("**hi**\n\n extra") + end + end + + describe ".render" do + it "renders a Document AST through the default Discourse renderer" do + doc = described_class.parse_bbcode("[b]hi[/b]").ast + + result = described_class.render(doc) + + expect(result).to be_a(Markbridge::Conversion) + expect(result.markdown).to eq("**hi**") + expect(result.format).to eq(:discourse) + end + + it "honors a custom renderer:" do + doc = described_class.parse_bbcode("[b]hi[/b]").ast + fixed_bold = + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_e, _i) + "BB" + end + end + renderer = + described_class.discourse_renderer(tags: { Markbridge::AST::Bold => fixed_bold.new }) + + expect(described_class.render(doc, renderer:).markdown).to eq("BB") + end + + it "raises ArgumentError for unknown render format with the offending format inspected" do + doc = Markbridge::AST::Document.new + expect { described_class.render(doc, format: :weird) }.to raise_error( + ArgumentError, + "unknown render format :weird", + ) + end + + it "lets render-time errors propagate by default (raise_on_error defaults to true)" do + tag = + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_e, _i) + raise "boom" + end + end + renderer = described_class.discourse_renderer(tags: { Markbridge::AST::Bold => tag.new }) + doc = described_class.parse_bbcode("[b]x[/b]").ast + + expect { described_class.render(doc, renderer:) }.to raise_error(/boom/) + end + + it "carries the AST through to Conversion#ast" do + doc = described_class.parse_bbcode("[b]hi[/b]").ast + + expect(described_class.render(doc).ast).to be(doc) + end + + it "exposes empty Hashes for unknown_tags and diagnostics (no parser-side data available)" do + result = described_class.render(Markbridge::AST::Document.new) + + expect(result.unknown_tags).to eq({}) + expect(result.diagnostics).to eq({}) + end + + it "exposes an empty Array for errors when render succeeds" do + 
expect(described_class.render(Markbridge::AST::Document.new).errors).to eq([]) + end + + context "with a Parse argument" do + it "renders the Parse's AST" do + parse = described_class.parse_bbcode("[b]hi[/b]") + + expect(described_class.render(parse).markdown).to eq("**hi**") + end + + it "carries the Parse's source format through to Conversion#format" do + parse = described_class.parse_bbcode("[b]hi[/b]") + + expect(described_class.render(parse).format).to eq(:bbcode) + end + + it "carries the Parse's unknown_tags forward" do + parse = described_class.parse_bbcode("[neverknown]x[/neverknown]") + + expect(described_class.render(parse).unknown_tags["neverknown"]).to eq(2) + end + + it "carries the Parse's diagnostics forward" do + parse = described_class.parse_bbcode("[b][i]x[/b]") + + expect(described_class.render(parse).diagnostics[:auto_closed_tags_count]).to be > 0 + end + + it "renders mutations made to the Parse's AST after parsing" do + parse = described_class.parse_bbcode("[b]hi[/b]") + parse.ast << Markbridge::AST::Text.new(" extra") + + expect(described_class.render(parse).markdown).to eq("**hi** extra") + end + end + + context "with neither a Parse nor an AST::Node" do + it "raises ArgumentError naming the offending class" do + expect { described_class.render("a string") }.to raise_error( + ArgumentError, + /expected Parse or AST::Node, got String/, + ) + end + end + end + + describe ".discourse_renderer" do + it "returns a Renderer that converts BBCode using a custom Tag" do + fixed_bold = + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_element, _interface) + "BOLDED" + end + end + + renderer = + described_class.discourse_renderer(tags: { Markbridge::AST::Bold => fixed_bold.new }) + + expect(described_class.bbcode_to_markdown("[b]hi[/b]", renderer:).markdown).to eq("BOLDED") + end + + it "uses the default library when called without arguments" do + renderer = described_class.discourse_renderer + + 
expect(described_class.bbcode_to_markdown("[b]hi[/b]", renderer:).markdown).to eq("**hi**") + end + + it "honors unregister: by falling through to render_children" do + renderer = described_class.discourse_renderer(unregister: [Markbridge::AST::Bold]) + + # Without a Tag for AST::Bold the renderer falls through to + # render_children, so the bold marker disappears entirely. + expect(described_class.bbcode_to_markdown("[b]hi[/b]", renderer:).markdown).to eq("hi") + end + + it "honors escape_hard_line_breaks: true via the sugar" do + renderer = described_class.discourse_renderer(escape_hard_line_breaks: true) + + expect(described_class.bbcode_to_markdown("hello \nworld", renderer:).markdown).to eq( + "hello\nworld", + ) + end + + it "preserves trailing-space hard line breaks by default" do + renderer = described_class.discourse_renderer + + expect(described_class.bbcode_to_markdown("hello \nworld", renderer:).markdown).to eq( + "hello \nworld", + ) + end + + it "uses an explicit tag_library: as the base when one is provided" do + base = Markbridge::Renderers::Discourse::TagLibrary.new + base.register( + Markbridge::AST::Bold, + Markbridge::Renderers::Discourse::Tag.new { |_e, _i| "FROM-BASE" }, + ) + + renderer = described_class.discourse_renderer(tag_library: base) + + # The *default* library would render bold with "**" markers; the literal + # "FROM-BASE" output shows that the explicit base is the one being used. 
+ expect(described_class.bbcode_to_markdown("[b]x[/b]", renderer:).markdown).to eq("FROM-BASE") + end + + it "forwards an explicit postprocessor: through to the Renderer" do + shouting = + Class.new(Markbridge::Renderers::Discourse::Postprocessor) do + def call(text) + text.upcase + end + end + + renderer = described_class.discourse_renderer(postprocessor: shouting.new) + + expect(described_class.bbcode_to_markdown("[b]hi[/b]", renderer:).markdown).to eq("**HI**") + end + + it "forwards :allow to the constructed escaper (lists alias)" do + renderer = described_class.discourse_renderer(allow: :lists) + + # The `-` and `1.` markers would normally be escaped to `\-` + # and `1\.`. With allow: :lists they pass through verbatim. + expect(renderer.render(Markbridge::AST::Text.new("- item"))).to eq("- item") + expect(renderer.render(Markbridge::AST::Text.new("1. item"))).to eq("1. item") + end + + it "uses IdentityEscaper when escape: false" do + renderer = described_class.discourse_renderer(escape: false) + + # `*raw*` would normally be escaped to `\*raw\*`. With + # IdentityEscaper, it survives. 
+ result = described_class.html_to_markdown("*raw*", renderer:) + expect(result.markdown).to eq("*raw*") + end + + it "raises when escape: false is combined with escape_hard_line_breaks: true" do + expect { + described_class.discourse_renderer(escape: false, escape_hard_line_breaks: true) + }.to raise_error(ArgumentError, /mutually exclusive/) + end + + it "raises when escape: false is combined with allow:" do + expect { described_class.discourse_renderer(escape: false, allow: :lists) }.to raise_error( + ArgumentError, + /mutually exclusive/, + ) + end + + it "lets an explicit escaper: win even when escape: false is given" do + explicit = Markbridge::Renderers::Discourse::MarkdownEscaper.new + renderer = described_class.discourse_renderer(escaper: explicit, escape: false) + + result = described_class.html_to_markdown("a*b", renderer:) + expect(result.markdown).to eq('a\*b') + end + end + + describe "renderer: kwarg" do + it "is honored by html_to_markdown" do + fixed_bold = + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_element, _interface) + "HBOLD" + end + end + renderer = + described_class.discourse_renderer(tags: { Markbridge::AST::Bold => fixed_bold.new }) + + expect(described_class.html_to_markdown("hi", renderer:).markdown).to eq("HBOLD") + end + + it "is honored by mediawiki_to_markdown" do + fixed_bold = + Class.new(Markbridge::Renderers::Discourse::Tag) do + def render(_element, _interface) + "MBOLD" + end + end + renderer = + described_class.discourse_renderer(tags: { Markbridge::AST::Bold => fixed_bold.new }) - # If `input.to_s` were replaced with `self.to_s`, the parsed document would - # contain the literal text "Markbridge" instead of a Bold inside a Paragraph - expect(doc.children.first).to be_a(Markbridge::AST::Paragraph) - expect(doc.children.first.children.first).to be_a(Markbridge::AST::Bold) + expect(described_class.mediawiki_to_markdown("'''hi'''", renderer:).markdown).to eq("MBOLD") end end diff --git 
a/spec/system/bbcode_to_markdown_spec.rb b/spec/system/bbcode_to_markdown_spec.rb index 99220db..617ac89 100644 --- a/spec/system/bbcode_to_markdown_spec.rb +++ b/spec/system/bbcode_to_markdown_spec.rb @@ -25,7 +25,7 @@ MARKDOWN result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "handles deeply nested lists" do @@ -48,7 +48,7 @@ MARKDOWN result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -75,7 +75,7 @@ MARKDOWN result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -102,7 +102,7 @@ MARKDOWN result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "renders unordered inside ordered" do @@ -127,7 +127,7 @@ MARKDOWN result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -148,7 +148,7 @@ MARKDOWN result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "handles complex nested content in lists" do @@ -167,7 +167,7 @@ MARKDOWN result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end end @@ -175,34 +175,34 @@ describe "basic formatting" do it "converts bold tags" do result = Markbridge.bbcode_to_markdown("[b]bold text[/b]") - expect(result).to eq("**bold text**") + expect(result.markdown).to eq("**bold text**") end it "converts italic tags" do result = Markbridge.bbcode_to_markdown("[i]italic text[/i]") - expect(result).to eq("*italic text*") + expect(result.markdown).to eq("*italic text*") end it "converts code tags" do result = Markbridge.bbcode_to_markdown("[code]code text[/code]") - expect(result).to eq("`code text`") + expect(result.markdown).to 
eq("`code text`") end it "handles nested formatting" do result = Markbridge.bbcode_to_markdown("[b][i]bold italic[/i][/b]") - expect(result).to eq("***bold italic***") + expect(result.markdown).to eq("***bold italic***") end end describe "line breaks and horizontal rules" do it "converts line breaks" do result = Markbridge.bbcode_to_markdown("line 1[br]line 2") - expect(result).to eq("line 1\nline 2") + expect(result.markdown).to eq("line 1\nline 2") end it "converts horizontal rules" do result = Markbridge.bbcode_to_markdown("before[hr]after") - expect(result).to eq("before\n\n---\n\nafter") + expect(result.markdown).to eq("before\n\n---\n\nafter") end end @@ -223,7 +223,7 @@ MARKDOWN result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "converts simple ordered list" do @@ -242,7 +242,7 @@ MARKDOWN result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -252,12 +252,12 @@ Markbridge.bbcode_to_markdown( "Plain text with [b]bold[/b] and [i]italic[/i] and [code]code[/code].", ) - expect(result).to eq("Plain text with **bold** and *italic* and `code`.") + expect(result.markdown).to eq("Plain text with **bold** and *italic* and `code`.") end it "preserves plain text" do result = Markbridge.bbcode_to_markdown("Just plain text") - expect(result).to eq("Just plain text") + expect(result.markdown).to eq("Just plain text") end end @@ -267,23 +267,23 @@ "[color=green][b]Skill Name[/b]\n[list]\n[*]Level 2: Upgrade\n[*]Level 3: Upgrade\n[/list][/color]" result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).not_to include("[/color]") - expect(result).to include("**Skill Name**") + expect(result.markdown).not_to include("[/color]") + expect(result.markdown).to include("**Skill Name**") end it "converts size wrapping a list without leaking closing tags" do bbcode = "[size=150][b]Title[/b]\n[list]\n[*]Item 1\n[*]Item 
2\n[/list][/size]" result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).not_to include("[/size]") - expect(result).to include("**Title**") + expect(result.markdown).not_to include("[/size]") + expect(result.markdown).to include("**Title**") end it "converts color with bold inside list items" do bbcode = "[color=#FFBF00][b]Wolverine (X-Force)[/b][/color]" result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq('**Wolverine (X-Force)**') + expect(result.markdown).to eq('**Wolverine (X-Force)**') end it "converts nested color and list pattern from real forum data" do @@ -295,202 +295,204 @@ BBCODE result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).not_to include("[/color]") - expect(result).not_to include("[/list]") - expect(result).to include("**Godlike Power - Green 14**") + expect(result.markdown).not_to include("[/color]") + expect(result.markdown).not_to include("[/list]") + expect(result.markdown).to include("**Godlike Power - Green 14**") end end describe "urls" do it "converts url with href option" do result = Markbridge.bbcode_to_markdown("[url=https://example.com]Click here[/url]") - expect(result).to eq("[Click here](https://example.com)") + expect(result.markdown).to eq("[Click here](https://example.com)") end it "converts url with content only (no href attribute)" do result = Markbridge.bbcode_to_markdown("[url]https://example.com[/url]") - expect(result).to eq("https://example.com") + expect(result.markdown).to eq("https://example.com") end it "converts url with formatted content" do result = Markbridge.bbcode_to_markdown("[url=https://example.com][b]Bold link[/b][/url]") - expect(result).to eq("[**Bold link**](https://example.com)") + expect(result.markdown).to eq("[**Bold link**](https://example.com)") end end describe "images" do it "converts simple image" do result = Markbridge.bbcode_to_markdown("[img]https://example.com/photo.jpg[/img]") - expect(result).to eq("![](https://example.com/photo.jpg)") + 
expect(result.markdown).to eq("![](https://example.com/photo.jpg)") end it "converts image with dimensions" do result = Markbridge.bbcode_to_markdown("[img=100x200]https://example.com/photo.jpg[/img]") - expect(result).to eq("![|100x200](https://example.com/photo.jpg)") + expect(result.markdown).to eq("![|100x200](https://example.com/photo.jpg)") end it "converts image with width attribute" do result = Markbridge.bbcode_to_markdown("[img width=100]https://example.com/photo.jpg[/img]") - expect(result).to eq("![|100](https://example.com/photo.jpg)") + expect(result.markdown).to eq("![|100](https://example.com/photo.jpg)") end end describe "quotes" do it "converts simple quote" do result = Markbridge.bbcode_to_markdown("[quote]Hello world[/quote]") - expect(result).to eq("> Hello world") + expect(result.markdown).to eq("> Hello world") end it "converts quote with author" do result = Markbridge.bbcode_to_markdown("[quote=John]Hello world[/quote]") - expect(result).to eq("[quote=\"John\"]\nHello world\n[/quote]") + expect(result.markdown).to eq("[quote=\"John\"]\nHello world\n[/quote]") end it "converts quote with Discourse context" do result = Markbridge.bbcode_to_markdown('[quote="alice, post:123, topic:456"]Quoted text[/quote]') - expect(result).to eq("[quote=\"alice, post:123, topic:456\"]\nQuoted text\n[/quote]") + expect(result.markdown).to eq("[quote=\"alice, post:123, topic:456\"]\nQuoted text\n[/quote]") end it "separates two consecutive plain quotes with a blank line" do result = Markbridge.bbcode_to_markdown("[quote]first[/quote][quote]second[/quote]") - expect(result).to eq("> first\n\n> second") + expect(result.markdown).to eq("> first\n\n> second") end it "separates two consecutive named quotes with a blank line" do result = Markbridge.bbcode_to_markdown("[quote=A]first[/quote][quote=B]second[/quote]") - expect(result).to eq("[quote=\"A\"]\nfirst\n[/quote]\n\n[quote=\"B\"]\nsecond\n[/quote]") + expect(result.markdown).to eq( + 
"[quote=\"A\"]\nfirst\n[/quote]\n\n[quote=\"B\"]\nsecond\n[/quote]", + ) end it "separates a plain quote from trailing text with a blank line" do result = Markbridge.bbcode_to_markdown("[quote]quoted[/quote]after paragraph") - expect(result).to eq("> quoted\n\nafter paragraph") + expect(result.markdown).to eq("> quoted\n\nafter paragraph") end it "separates a named quote from trailing text with a blank line" do result = Markbridge.bbcode_to_markdown("[quote=A]quoted[/quote]after paragraph") - expect(result).to eq("[quote=\"A\"]\nquoted\n[/quote]\n\nafter paragraph") + expect(result.markdown).to eq("[quote=\"A\"]\nquoted\n[/quote]\n\nafter paragraph") end end describe "strikethrough" do it "converts strikethrough tags" do result = Markbridge.bbcode_to_markdown("[s]deleted text[/s]") - expect(result).to eq("~~deleted text~~") + expect(result.markdown).to eq("~~deleted text~~") end it "converts strike alias" do result = Markbridge.bbcode_to_markdown("[strike]deleted[/strike]") - expect(result).to eq("~~deleted~~") + expect(result.markdown).to eq("~~deleted~~") end end describe "underline" do it "passes underline through as BBCode (Discourse renders [u] natively)" do result = Markbridge.bbcode_to_markdown("[u]underlined[/u]") - expect(result).to eq("[u]underlined[/u]") + expect(result.markdown).to eq("[u]underlined[/u]") end end describe "superscript and subscript" do it "converts superscript to HTML" do result = Markbridge.bbcode_to_markdown("[sup]2[/sup]") - expect(result).to eq("2") + expect(result.markdown).to eq("2") end it "converts subscript to HTML" do result = Markbridge.bbcode_to_markdown("[sub]2[/sub]") - expect(result).to eq("2") + expect(result.markdown).to eq("2") end it "handles superscript in context" do result = Markbridge.bbcode_to_markdown("x[sup]2[/sup] + y[sup]3[/sup]") - expect(result).to eq("x2 \\+ y3") + expect(result.markdown).to eq("x2 \\+ y3") end end describe "spoiler" do it "converts simple spoiler" do result = 
Markbridge.bbcode_to_markdown("[spoiler]secret content[/spoiler]") - expect(result).to eq("[spoiler]secret content[/spoiler]") + expect(result.markdown).to eq("[spoiler]secret content[/spoiler]") end it "converts spoiler with title" do result = Markbridge.bbcode_to_markdown("[spoiler=Click to reveal]secret[/spoiler]") - expect(result).to eq("\\[spoiler=Click to reveal]secret\\[/spoiler]") + expect(result.markdown).to eq("\\[spoiler=Click to reveal]secret\\[/spoiler]") end it "converts hide alias" do result = Markbridge.bbcode_to_markdown("[hide]hidden content[/hide]") - expect(result).to eq("[spoiler]hidden content[/spoiler]") + expect(result.markdown).to eq("[spoiler]hidden content[/spoiler]") end end describe "email" do it "converts email with address option" do result = Markbridge.bbcode_to_markdown("[email=user@example.com]Contact us[/email]") - expect(result).to eq("[Contact us](mailto:user@example.com)") + expect(result.markdown).to eq("[Contact us](mailto:user@example.com)") end it "converts email with content as address" do result = Markbridge.bbcode_to_markdown("[email]user@example.com[/email]") - expect(result).to eq("user@example.com") + expect(result.markdown).to eq("user@example.com") end end describe "alignment" do it "converts center alignment" do result = Markbridge.bbcode_to_markdown("[center]centered text[/center]") - expect(result).to eq('
centered text
') + expect(result.markdown).to eq('
centered text
') end it "converts right alignment" do result = Markbridge.bbcode_to_markdown("[right]right-aligned[/right]") - expect(result).to eq('
right-aligned
') + expect(result.markdown).to eq('
right-aligned
') end it "separates two consecutive aligned blocks with a blank line" do result = Markbridge.bbcode_to_markdown("[left]a[/left][right]b[/right]") - expect(result).to eq(%(
a
\n\n
b
)) + expect(result.markdown).to eq(%(
a
\n\n
b
)) end it "separates an aligned block from trailing text with a blank line" do result = Markbridge.bbcode_to_markdown("[center]a[/center]after") - expect(result).to eq(%(
a
\n\nafter)) + expect(result.markdown).to eq(%(
a
\n\nafter)) end end describe "block code separation" do it "separates two consecutive block code fences with a blank line" do result = Markbridge.bbcode_to_markdown("[code]line1\nline2[/code][code]line3\nline4[/code]") - expect(result).to eq("```\nline1\nline2\n```\n\n```\nline3\nline4\n```") + expect(result.markdown).to eq("```\nline1\nline2\n```\n\n```\nline3\nline4\n```") end it "separates a block code fence from trailing text with a blank line" do result = Markbridge.bbcode_to_markdown("[code]line1\nline2[/code]after") - expect(result).to eq("```\nline1\nline2\n```\n\nafter") + expect(result.markdown).to eq("```\nline1\nline2\n```\n\nafter") end end describe "edge cases" do it "drops unknown tag brackets but preserves content" do result = Markbridge.bbcode_to_markdown("[unknown]some text[/unknown]") - expect(result).to eq("some text") + expect(result.markdown).to eq("some text") end it "handles empty input" do result = Markbridge.bbcode_to_markdown("") - expect(result).to eq("") + expect(result.markdown).to eq("") end it "handles deeply nested formatting" do result = Markbridge.bbcode_to_markdown("[b][i][u]deep[/u][/i][/b]") - expect(result).to eq("***[u]deep[/u]***") + expect(result.markdown).to eq("***[u]deep[/u]***") end it "handles unclosed tags gracefully" do result = Markbridge.bbcode_to_markdown("[b]bold text") - expect(result).to eq("**bold text**") + expect(result.markdown).to eq("**bold text**") end it "inserts an HTML comment to break colliding emphasis delimiters between siblings" do @@ -499,39 +501,39 @@ # ambiguously in CommonMark. 
result = Markbridge.bbcode_to_markdown("[b]bold [i]italic [u]underline[/b] still here[/i][/u]") - expect(result).to eq("**bold *italic [u]underline[/u]****[u] still here[/u]*") + expect(result.markdown).to eq("**bold *italic [u]underline[/u]****[u] still here[/u]*") end end describe "attachments" do it "converts attachment with numeric id (vBulletin/XenForo format)" do result = Markbridge.bbcode_to_markdown("[attach]1234[/attach]") - expect(result).to eq("") + expect(result.markdown).to eq("") end it "converts attachment with index and filename (phpBB format)" do result = Markbridge.bbcode_to_markdown("[attachment=0]image.jpg[/attachment]") - expect(result).to eq("") + expect(result.markdown).to eq("") end it "converts attachment with index only (phpBB format)" do result = Markbridge.bbcode_to_markdown("[attachment=2][/attachment]") - expect(result).to eq("") + expect(result.markdown).to eq("") end it "converts attachment with id and alt text (XenForo 2.1+ format)" do result = Markbridge.bbcode_to_markdown('[attach alt="diagram"]5678[/attach]') - expect(result).to eq("") + expect(result.markdown).to eq("") end it "converts self-closing attachment with SMF format" do result = Markbridge.bbcode_to_markdown("[attach id=2 msg=9876]") - expect(result).to eq("") + expect(result.markdown).to eq("") end it "converts attachment with filename only" do result = Markbridge.bbcode_to_markdown("[attach]document.pdf[/attach]") - expect(result).to eq("") + expect(result.markdown).to eq("") end it "handles attachment in context with text" do @@ -540,7 +542,7 @@ "Check out this image: for details." 
result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "handles multiple attachments" do @@ -548,7 +550,7 @@ expected = " and " result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "handles attachments in formatted text" do @@ -556,7 +558,7 @@ expected = "**Bold text with inside**" result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -566,7 +568,7 @@ result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq("| Name | Age |\n| --- | --- |\n| Alice | 30 |") + expect(result.markdown).to eq("| Name | Age |\n| --- | --- |\n| Alice | 30 |") end it "renders a table without headers using first row as header" do @@ -574,7 +576,7 @@ result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to eq("| A | B |\n| --- | --- |\n| 1 | 2 |") + expect(result.markdown).to eq("| A | B |\n| --- | --- |\n| 1 | 2 |") end it "renders formatted content inside table cells" do @@ -582,7 +584,7 @@ result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to include("| **Alice** |") + expect(result.markdown).to include("| **Alice** |") end it "falls back to HTML for uneven rows" do @@ -590,8 +592,8 @@ result = Markbridge.bbcode_to_markdown(bbcode) - expect(result).to include("") - expect(result).to include("") + expect(result.markdown).to include("
A
") + expect(result.markdown).to include("") end end end diff --git a/spec/system/html_to_markdown_spec.rb b/spec/system/html_to_markdown_spec.rb index 77c16f3..c1ec932 100644 --- a/spec/system/html_to_markdown_spec.rb +++ b/spec/system/html_to_markdown_spec.rb @@ -7,7 +7,7 @@ expected = "**bold text**" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "converts italic text" do @@ -15,7 +15,7 @@ expected = "*italic text*" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "converts strikethrough text" do @@ -23,7 +23,7 @@ expected = "~~deleted text~~" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "converts nested formatting" do @@ -31,7 +31,7 @@ expected = "***bold italic***" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -41,7 +41,7 @@ expected = "[Click here](https://example.com)" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "converts link with no text" do @@ -49,7 +49,7 @@ expected = "[](https://example.com)" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -59,7 +59,7 @@ expected = "![](https://example.com/photo.jpg)" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "converts image with dimensions" do @@ -67,7 +67,7 @@ expected = "![|100x200](photo.jpg)" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -88,7 +88,7 @@ MARKDOWN result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it 
"converts ordered list" do @@ -107,7 +107,7 @@ MARKDOWN result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "converts nested lists" do @@ -133,7 +133,7 @@ MARKDOWN result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -143,7 +143,7 @@ expected = "`var x = 1;`" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "converts code block" do @@ -151,7 +151,7 @@ expected = "```\nfunction hello() {\n console.log('hi');\n}\n```" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -161,7 +161,7 @@ expected = "> Quoted text" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "converts blockquote with multiple paragraphs" do @@ -169,7 +169,7 @@ expected = "> First paragraph\n> \n> Second paragraph" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -179,7 +179,7 @@ expected = "Line 1\nLine 2" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "converts horizontal rule" do @@ -187,7 +187,7 @@ expected = "Text\n\n---\n\nMore text" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -209,48 +209,48 @@ MARKDOWN result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end describe "underline" do it "converts underline to [u] BBCode" do result = Markbridge.html_to_markdown("underlined text") - expect(result).to eq("[u]underlined text[/u]") + expect(result.markdown).to eq("[u]underlined text[/u]") end end describe "superscript and 
subscript" do it "converts superscript" do result = Markbridge.html_to_markdown("2") - expect(result).to eq("2") + expect(result.markdown).to eq("2") end it "converts subscript" do result = Markbridge.html_to_markdown("2") - expect(result).to eq("2") + expect(result.markdown).to eq("2") end it "handles inline superscript" do result = Markbridge.html_to_markdown("x2 + y3") - expect(result).to eq("x2 \\+ y3") + expect(result.markdown).to eq("x2 \\+ y3") end end describe "edge cases" do it "handles empty input" do result = Markbridge.html_to_markdown("") - expect(result).to eq("") + expect(result.markdown).to eq("") end it "preserves plain text" do result = Markbridge.html_to_markdown("Just plain text") - expect(result).to eq("Just plain text") + expect(result.markdown).to eq("Just plain text") end it "handles deeply nested formatting" do result = Markbridge.html_to_markdown("deep") - expect(result).to eq("***[u]deep[/u]***") + expect(result.markdown).to eq("***[u]deep[/u]***") end end @@ -260,7 +260,7 @@ expected = "**strong text**" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "converts em to italic" do @@ -268,7 +268,7 @@ expected = "*emphasized text*" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -278,7 +278,7 @@ expected = "One\n\nTwo" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "handles multiple paragraphs" do @@ -286,7 +286,7 @@ expected = "First\n\nSecond\n\nThird" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end it "handles paragraphs from minified HTML without whitespace" do @@ -294,7 +294,7 @@ expected = "Paragraph one\n\nParagraph two" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to 
eq(expected) end it "handles paragraphs with formatted content" do @@ -302,7 +302,7 @@ expected = "**Bold** text\n\n*Italic* text" result = Markbridge.html_to_markdown(html) - expect(result).to eq(expected) + expect(result.markdown).to eq(expected) end end @@ -312,7 +312,7 @@ result = Markbridge.html_to_markdown(html) - expect(result).to eq("| Name | Age |\n| --- | --- |\n| Alice | 30 |") + expect(result.markdown).to eq("| Name | Age |\n| --- | --- |\n| Alice | 30 |") end it "handles thead and tbody" do @@ -321,7 +321,7 @@ result = Markbridge.html_to_markdown(html) - expect(result).to eq("| A | B |\n| --- | --- |\n| 1 | 2 |") + expect(result.markdown).to eq("| A | B |\n| --- | --- |\n| 1 | 2 |") end it "falls back to HTML for uneven rows" do @@ -329,7 +329,7 @@ result = Markbridge.html_to_markdown(html) - expect(result).to include("
A
") + expect(result.markdown).to include("
") end it "drops the spurious

<p> wrapper when a single paragraph fills a cell" do @@ -341,8 +341,8 @@ result = Markbridge.html_to_markdown(html) - expect(result).to include("

") - expect(result).not_to include("") + expect(result.markdown).not_to include("
line one
line two

") + expect(result.markdown).to include("

line one
line two

") end end end diff --git a/spec/system/mediawiki_to_markdown_spec.rb b/spec/system/mediawiki_to_markdown_spec.rb index dcc1614..e17a6e2 100644 --- a/spec/system/mediawiki_to_markdown_spec.rb +++ b/spec/system/mediawiki_to_markdown_spec.rb @@ -4,91 +4,91 @@ describe "inline formatting" do it "converts bold text" do result = Markbridge.mediawiki_to_markdown("'''bold text'''") - expect(result).to eq("**bold text**") + expect(result.markdown).to eq("**bold text**") end it "converts italic text" do result = Markbridge.mediawiki_to_markdown("''italic text''") - expect(result).to eq("*italic text*") + expect(result.markdown).to eq("*italic text*") end it "converts bold italic text" do result = Markbridge.mediawiki_to_markdown("'''''bold italic'''''") - expect(result).to eq("***bold italic***") + expect(result.markdown).to eq("***bold italic***") end it "converts strikethrough with " do result = Markbridge.mediawiki_to_markdown("deleted") - expect(result).to eq("~~deleted~~") + expect(result.markdown).to eq("~~deleted~~") end it "converts strikethrough with " do result = Markbridge.mediawiki_to_markdown("deleted") - expect(result).to eq("~~deleted~~") + expect(result.markdown).to eq("~~deleted~~") end it "converts underline with " do result = Markbridge.mediawiki_to_markdown("underlined") - expect(result).to eq("[u]underlined[/u]") + expect(result.markdown).to eq("[u]underlined[/u]") end it "converts underline with " do result = Markbridge.mediawiki_to_markdown("inserted") - expect(result).to eq("[u]inserted[/u]") + expect(result.markdown).to eq("[u]inserted[/u]") end it "converts superscript" do result = Markbridge.mediawiki_to_markdown("x2") - expect(result).to eq("x2") + expect(result.markdown).to eq("x2") end it "converts subscript" do result = Markbridge.mediawiki_to_markdown("H2O") - expect(result).to eq("H2O") + expect(result.markdown).to eq("H2O") end it "converts inline code" do result = Markbridge.mediawiki_to_markdown("var x = 1") - expect(result).to eq("`var x 
= 1`") + expect(result.markdown).to eq("`var x = 1`") end it "converts line break" do result = Markbridge.mediawiki_to_markdown("Line 1
Line 2") - expect(result).to eq("Line 1\nLine 2") + expect(result.markdown).to eq("Line 1\nLine 2") end it "converts self-closing line break" do result = Markbridge.mediawiki_to_markdown("Line 1
Line 2") - expect(result).to eq("Line 1\nLine 2") + expect(result.markdown).to eq("Line 1\nLine 2") end end describe "nowiki" do it "preserves wiki markup as literal text" do result = Markbridge.mediawiki_to_markdown("'''not bold'''") - expect(result).to eq("'''not bold'''") + expect(result.markdown).to eq("'''not bold'''") end end describe "links" do it "converts external link with display text" do result = Markbridge.mediawiki_to_markdown("[https://example.com Click here]") - expect(result).to eq("[Click here](https://example.com)") + expect(result.markdown).to eq("[Click here](https://example.com)") end it "converts external link without display text" do result = Markbridge.mediawiki_to_markdown("[https://example.com]") - expect(result).to eq("[https://example.com](https://example.com)") + expect(result.markdown).to eq("[https://example.com](https://example.com)") end it "converts internal link" do result = Markbridge.mediawiki_to_markdown("[[Page Name]]") - expect(result).to eq("Page Name") + expect(result.markdown).to eq("Page Name") end it "converts internal link with display text" do result = Markbridge.mediawiki_to_markdown("[[Page Name|display text]]") - expect(result).to eq("display text") + expect(result.markdown).to eq("display text") end end @@ -97,51 +97,51 @@ wiki = "* Item 1\n* Item 2\n* Item 3" result = Markbridge.mediawiki_to_markdown(wiki) - expect(result).to eq("- Item 1\n- Item 2\n- Item 3") + expect(result.markdown).to eq("- Item 1\n- Item 2\n- Item 3") end it "converts ordered list" do wiki = "# First\n# Second\n# Third" result = Markbridge.mediawiki_to_markdown(wiki) - expect(result).to eq("1. First\n1. Second\n1. Third") + expect(result.markdown).to eq("1. First\n1. Second\n1. 
Third") end it "converts nested unordered list" do wiki = "* Item 1\n** Subitem 1.1\n** Subitem 1.2\n* Item 2" result = Markbridge.mediawiki_to_markdown(wiki) - expect(result).to include("- Item 1") - expect(result).to include("- Subitem 1.1") - expect(result).to include("- Item 2") + expect(result.markdown).to include("- Item 1") + expect(result.markdown).to include("- Subitem 1.1") + expect(result.markdown).to include("- Item 2") end it "converts nested ordered list" do wiki = "# Item 1\n## Subitem 1.1\n## Subitem 1.2\n# Item 2" result = Markbridge.mediawiki_to_markdown(wiki) - expect(result).to include("1. Item 1") - expect(result).to include("1. Subitem 1.1") - expect(result).to include("1. Item 2") + expect(result.markdown).to include("1. Item 1") + expect(result.markdown).to include("1. Subitem 1.1") + expect(result.markdown).to include("1. Item 2") end it "converts list items with formatting" do wiki = "* '''Important''' item\n* Normal item" result = Markbridge.mediawiki_to_markdown(wiki) - expect(result).to eq("- **Important** item\n- Normal item") + expect(result.markdown).to eq("- **Important** item\n- Normal item") end end describe "horizontal rules" do it "converts ---- to horizontal rule" do result = Markbridge.mediawiki_to_markdown("----") - expect(result).to eq("---") + expect(result.markdown).to eq("---") end it "converts longer dashes to horizontal rule" do result = Markbridge.mediawiki_to_markdown("------") - expect(result).to eq("---") + expect(result.markdown).to eq("---") end end @@ -150,48 +150,48 @@ wiki = " preformatted line 1\n preformatted line 2" result = Markbridge.mediawiki_to_markdown(wiki) - expect(result).to eq("```\npreformatted line 1\npreformatted line 2\n```") + expect(result.markdown).to eq("```\npreformatted line 1\npreformatted line 2\n```") end it "converts

<pre> block to code block" do wiki = "<pre>code block\nline 2</pre>
" result = Markbridge.mediawiki_to_markdown(wiki) - expect(result).to eq("```\ncode block\nline 2\n```") + expect(result.markdown).to eq("```\ncode block\nline 2\n```") end end describe "headings" do it "converts level 1 heading" do result = Markbridge.mediawiki_to_markdown("= Heading 1 =") - expect(result).to eq("# Heading 1") + expect(result.markdown).to eq("# Heading 1") end it "converts level 2 heading" do result = Markbridge.mediawiki_to_markdown("== Heading 2 ==") - expect(result).to eq("## Heading 2") + expect(result.markdown).to eq("## Heading 2") end it "converts level 3 heading" do result = Markbridge.mediawiki_to_markdown("=== Heading 3 ===") - expect(result).to eq("### Heading 3") + expect(result.markdown).to eq("### Heading 3") end it "converts heading with inline formatting" do result = Markbridge.mediawiki_to_markdown("== '''Bold''' heading ==") - expect(result).to eq("## **Bold** heading") + expect(result.markdown).to eq("## **Bold** heading") end end describe "edge cases" do it "handles empty input" do result = Markbridge.mediawiki_to_markdown("") - expect(result).to eq("") + expect(result.markdown).to eq("") end it "preserves plain text" do result = Markbridge.mediawiki_to_markdown("Just plain text") - expect(result).to eq("Just plain text") + expect(result.markdown).to eq("Just plain text") end end @@ -207,7 +207,7 @@ result = Markbridge.mediawiki_to_markdown(wiki) - expect(result).to eq("| Name | Age |\n| --- | --- |\n| Alice | 30 |") + expect(result.markdown).to eq("| Name | Age |\n| --- | --- |\n| Alice | 30 |") end it "handles header and data rows" do @@ -224,7 +224,7 @@ result = Markbridge.mediawiki_to_markdown(wiki) - expect(result).to eq("| A | B |\n| --- | --- |\n| 1 | 2 |") + expect(result.markdown).to eq("| A | B |\n| --- | --- |\n| 1 | 2 |") end it "handles inline formatting in cells" do @@ -238,7 +238,7 @@ result = Markbridge.mediawiki_to_markdown(wiki) - expect(result).to include("| **Alice** |") + expect(result.markdown).to 
include("| **Alice** |") end it "preserves pipes inside internal links within cells" do @@ -252,7 +252,7 @@ result = Markbridge.mediawiki_to_markdown(wiki) - expect(result).to eq("| Page | Status |\n| --- | --- |\n| Home | Ready |") + expect(result.markdown).to eq("| Page | Status |\n| --- | --- |\n| Home | Ready |") end it "preserves pipes inside internal links on per-line cells" do @@ -265,7 +265,7 @@ result = Markbridge.mediawiki_to_markdown(wiki) - expect(result).to eq("| Home |\n| --- |") + expect(result.markdown).to eq("| Home |\n| --- |") end end end diff --git a/spec/system/text_formatter_xml_to_markdown_spec.rb b/spec/system/text_formatter_xml_to_markdown_spec.rb index bf67c94..0b203d9 100644 --- a/spec/system/text_formatter_xml_to_markdown_spec.rb +++ b/spec/system/text_formatter_xml_to_markdown_spec.rb @@ -7,7 +7,7 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to eq("Hello **world**!\n[example](https://example.com)") + expect(result.markdown).to eq("Hello **world**!\n[example](https://example.com)") end it "renders Discourse quote markup when attribution is present" do @@ -15,7 +15,7 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to eq("[quote=\"alice, post:123, topic:456\"]\nQuoted text\n[/quote]") + expect(result.markdown).to eq("[quote=\"alice, post:123, topic:456\"]\nQuoted text\n[/quote]") end it "renders ordered lists with proper spacing" do @@ -23,7 +23,7 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to eq("1. First\n1. Second") + expect(result.markdown).to eq("1. First\n1. 
Second") end it "renders multi-line code blocks with fences" do @@ -31,22 +31,22 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to eq("```ruby\nputs 'hello'\nputs 'world'\n```") + expect(result.markdown).to eq("```ruby\nputs 'hello'\nputs 'world'\n```") end it "converts italic text" do result = Markbridge.text_formatter_xml_to_markdown("italic") - expect(result).to eq("*italic*") + expect(result.markdown).to eq("*italic*") end it "converts underline text" do result = Markbridge.text_formatter_xml_to_markdown("underlined") - expect(result).to eq("[u]underlined[/u]") + expect(result.markdown).to eq("[u]underlined[/u]") end it "converts strikethrough text" do result = Markbridge.text_formatter_xml_to_markdown("deleted") - expect(result).to eq("~~deleted~~") + expect(result.markdown).to eq("~~deleted~~") end it "converts unordered lists" do @@ -54,7 +54,7 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to eq("- One\n- Two") + expect(result.markdown).to eq("- One\n- Two") end it "converts images" do @@ -62,7 +62,7 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to eq("![](https://example.com/photo.jpg)") + expect(result.markdown).to eq("![](https://example.com/photo.jpg)") end it "converts email links" do @@ -70,7 +70,7 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to eq("[Contact us](mailto:user@example.com)") + expect(result.markdown).to eq("[Contact us](mailto:user@example.com)") end it "converts inline code" do @@ -78,7 +78,7 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to eq("`var x = 1`") + expect(result.markdown).to eq("`var x = 1`") end it "converts nested formatting" do @@ -86,17 +86,17 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to eq("***bold italic***") + expect(result.markdown).to eq("***bold italic***") end it "handles plain text without XML wrapper" do result = 
Markbridge.text_formatter_xml_to_markdown("Just plain text") - expect(result).to eq("Just plain text") + expect(result.markdown).to eq("Just plain text") end it "handles empty input" do result = Markbridge.text_formatter_xml_to_markdown("") - expect(result).to eq("") + expect(result.markdown).to eq("") end it "converts spoiler tags" do @@ -104,7 +104,7 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to eq("[spoiler=Reveal]hidden content[/spoiler]") + expect(result.markdown).to eq("[spoiler=Reveal]hidden content[/spoiler]") end it "converts color tags" do @@ -112,7 +112,7 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to eq('red text') + expect(result.markdown).to eq('red text') end it "converts size tags" do @@ -120,7 +120,7 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to eq('big text') + expect(result.markdown).to eq('big text') end end @@ -131,7 +131,7 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to eq("| Name | Age |\n| --- | --- |\n| Alice | 30 |") + expect(result.markdown).to eq("| Name | Age |\n| --- | --- |\n| Alice | 30 |") end it "falls back to HTML for uneven rows" do @@ -139,7 +139,7 @@ result = Markbridge.text_formatter_xml_to_markdown(xml) - expect(result).to include("") + expect(result.markdown).to include("
") end end end diff --git a/spec/unit/markbridge/configuration_spec.rb b/spec/unit/markbridge/configuration_spec.rb deleted file mode 100644 index 8a408cc..0000000 --- a/spec/unit/markbridge/configuration_spec.rb +++ /dev/null @@ -1,16 +0,0 @@ -# frozen_string_literal: true - -RSpec.describe Markbridge::Configuration do - subject(:configuration) { described_class.new } - - describe "#escape_hard_line_breaks" do - it "defaults to false" do - expect(configuration.escape_hard_line_breaks).to be false - end - - it "can be set to true" do - configuration.escape_hard_line_breaks = true - expect(configuration.escape_hard_line_breaks).to be true - end - end -end diff --git a/spec/unit/markbridge/parsers/bbcode/handler_registry_spec.rb b/spec/unit/markbridge/parsers/bbcode/handler_registry_spec.rb index e05dcb0..33a1236 100644 --- a/spec/unit/markbridge/parsers/bbcode/handler_registry_spec.rb +++ b/spec/unit/markbridge/parsers/bbcode/handler_registry_spec.rb @@ -460,4 +460,66 @@ def fake_handler(element_class: Markbridge::AST::Bold, auto_closeable: false) expect(reconciler.instance_variable_get(:@registry)).to eq(registry) end end + + describe "#overlay" do + let(:registry) { described_class.default } + + it "yields the previously bound handler so a wrapper can delegate to it" do + seen = nil + registry.overlay("url") do |previous| + seen = previous + previous + end + + expect(seen).to be_a(Markbridge::Parsers::BBCode::Handlers::UrlHandler) + end + + it "yields nil when no handler was previously bound" do + seen = :unset + registry.overlay("brand-new-tag") do |previous| + seen = previous + Markbridge::Parsers::BBCode::Handlers::SimpleHandler.new(Markbridge::AST::Bold) + end + + expect(seen).to be_nil + end + + it "registers whatever the block returns" do + replacement = + Markbridge::Parsers::BBCode::Handlers::SimpleHandler.new(Markbridge::AST::Italic) + + registry.overlay("url") { |_| replacement } + + expect(registry["url"]).to be(replacement) + end + + it "applies to 
every tag name in the array" do + replacement = + Markbridge::Parsers::BBCode::Handlers::SimpleHandler.new(Markbridge::AST::Italic) + + registry.overlay(%w[url link iurl]) { |_| replacement } + + expect(registry["url"]).to be(replacement) + expect(registry["link"]).to be(replacement) + expect(registry["iurl"]).to be(replacement) + end + + it "yields each name's previously-bound handler when called with an Array" do + yielded = [] + registry.overlay(%w[url link iurl]) do |previous| + yielded << previous + previous + end + + expect(yielded.size).to eq(3) + yielded.each do |handler| + expect(handler).to be_a(Markbridge::Parsers::BBCode::Handlers::UrlHandler) + end + end + + it "returns self for chaining" do + result = registry.overlay("url") { |p| p } + expect(result).to be(registry) + end + end end diff --git a/spec/unit/markbridge/parsers/bbcode/handlers/raw_handler_spec.rb b/spec/unit/markbridge/parsers/bbcode/handlers/raw_handler_spec.rb index 4011942..3086340 100644 --- a/spec/unit/markbridge/parsers/bbcode/handlers/raw_handler_spec.rb +++ b/spec/unit/markbridge/parsers/bbcode/handlers/raw_handler_spec.rb @@ -250,4 +250,79 @@ def next_token expect { handler.on_close(token:, context:, registry:) }.not_to raise_error end end + + describe "with an AST class that does not accept language:" do + let(:bare_class) do + Class.new(Markbridge::AST::Element) do + def self.name + "BareElement" + end + end + end + + let(:bare_handler) { described_class.new(bare_class) } + + it "instantiates the AST class without passing language:" do + document = Markbridge::AST::Document.new + context = Markbridge::Parsers::BBCode::ParserState.new(document) + registry = Markbridge::Parsers::BBCode::HandlerRegistry.new + + open_token = + Markbridge::Parsers::BBCode::TagStartToken.new( + tag: "bare", + # lang: "ruby" attr exists but the AST class doesn't accept + # language:, so the handler must not forward it. 
+ attrs: { + lang: "ruby", + }, + pos: 0, + source: "[bare lang=ruby]", + ) + close_token = + Markbridge::Parsers::BBCode::TagEndToken.new(tag: "bare", pos: 6, source: "[/bare]") + scanner = MockScanner.new([close_token]) + + expect { + bare_handler.on_open(token: open_token, context:, registry:, tokens: scanner) + }.not_to raise_error + + expect(document.children.first).to be_an_instance_of(bare_class) + end + end + + describe "with an AST class that takes a non-:language kwarg" do + let(:other_class) do + Class.new(Markbridge::AST::Element) do + def initialize(other: nil) + super() + @other = other + end + end + end + + it "does not pass the lang attr through (the AST class would raise on unknown :language)" do + handler = described_class.new(other_class) + document = Markbridge::AST::Document.new + context = Markbridge::Parsers::BBCode::ParserState.new(document) + registry = Markbridge::Parsers::BBCode::HandlerRegistry.new + + open_token = + Markbridge::Parsers::BBCode::TagStartToken.new( + tag: "x", + attrs: { + lang: "ruby", + }, + pos: 0, + source: "[x lang=ruby]", + ) + close_token = Markbridge::Parsers::BBCode::TagEndToken.new(tag: "x", pos: 0, source: "[/x]") + scanner = MockScanner.new([close_token]) + + expect { + handler.on_open(token: open_token, context:, registry:, tokens: scanner) + }.not_to raise_error + + expect(document.children.first).to be_an_instance_of(other_class) + end + end end diff --git a/spec/unit/markbridge/parsers/html/handler_registry_spec.rb b/spec/unit/markbridge/parsers/html/handler_registry_spec.rb index 15a23bc..f7a322c 100644 --- a/spec/unit/markbridge/parsers/html/handler_registry_spec.rb +++ b/spec/unit/markbridge/parsers/html/handler_registry_spec.rb @@ -124,25 +124,21 @@ expect(registered).to be_a(Markbridge::Parsers::HTML::Handlers::SpanHandler) end - # br and hr are inline lambdas, not handler instances - it "registers a lambda for
<br> that emits a LineBreak and returns nil" do + it "registers a SelfClosingHandler for <br> that emits a LineBreak and returns nil" do parent = Markbridge::AST::Paragraph.new - result = default_registry["br"].call(element: nil, parent:) + result = default_registry["br"].process(element: nil, parent:) - # Assert exactly one child of the right type — `all(be_a(...))` - # passes vacuously on empty arrays, so mutations that drop the - # `parent << AST::LineBreak.new` would slip through. expect(parent.children.size).to eq(1) expect(parent.children.first).to be_a(Markbridge::AST::LineBreak) - # Not a HorizontalRule — kills cross-lambda `.new` swaps. + # Not a HorizontalRule — kills cross-handler element_class swaps. expect(parent.children.first).not_to be_a(Markbridge::AST::HorizontalRule) # Returns nil so the parser does NOT descend into children. expect(result).to be_nil end - it "registers a lambda for <hr> that emits a HorizontalRule and returns nil" do + it "registers a SelfClosingHandler for <hr>
that emits a HorizontalRule and returns nil" do parent = Markbridge::AST::Paragraph.new - result = default_registry["hr"].call(element: nil, parent:) + result = default_registry["hr"].process(element: nil, parent:) expect(parent.children.size).to eq(1) expect(parent.children.first).to be_a(Markbridge::AST::HorizontalRule) @@ -173,4 +169,52 @@ expect(registry["b"]).to be_a(Markbridge::Parsers::HTML::Handlers::SimpleHandler) end end + + describe "#overlay" do + let(:registry) { described_class.default } + + it "yields the previously bound handler" do + seen = nil + registry.overlay("a") { |p| seen = p } + expect(seen).to be_a(Markbridge::Parsers::HTML::Handlers::UrlHandler) + end + + it "yields nil for unbound names" do + seen = :unset + registry.overlay("never-seen") do |p| + seen = p + Markbridge::Parsers::HTML::Handlers::SimpleHandler.new(Markbridge::AST::Bold) + end + expect(seen).to be_nil + end + + it "registers whatever the block returns" do + replacement = Markbridge::Parsers::HTML::Handlers::SimpleHandler.new(Markbridge::AST::Italic) + registry.overlay("a") { |_| replacement } + expect(registry["a"]).to be(replacement) + end + + it "iterates over an Array of names" do + replacement = Markbridge::Parsers::HTML::Handlers::SimpleHandler.new(Markbridge::AST::Italic) + registry.overlay(%w[a b]) { |_| replacement } + expect(registry["a"]).to be(replacement) + expect(registry["b"]).to be(replacement) + end + + it "yields each name's previously-bound handler when called with an Array" do + yielded = [] + registry.overlay(%w[a img]) do |previous| + yielded << previous + previous + end + + expect(yielded.size).to eq(2) + expect(yielded.first).to be_a(Markbridge::Parsers::HTML::Handlers::UrlHandler) + expect(yielded.last).to be_a(Markbridge::Parsers::HTML::Handlers::ImageHandler) + end + + it "returns self for chaining" do + expect(registry.overlay("a") { |p| p }).to be(registry) + end + end end diff --git 
a/spec/unit/markbridge/parsers/html/handlers/self_closing_handler_spec.rb b/spec/unit/markbridge/parsers/html/handlers/self_closing_handler_spec.rb new file mode 100644 index 0000000..e110bde --- /dev/null +++ b/spec/unit/markbridge/parsers/html/handlers/self_closing_handler_spec.rb @@ -0,0 +1,49 @@ +# frozen_string_literal: true + +RSpec.describe Markbridge::Parsers::HTML::Handlers::SelfClosingHandler do + let(:parent) { Markbridge::AST::Paragraph.new } + + describe "#initialize" do + it "exposes the element_class via reader" do + expect(described_class.new(Markbridge::AST::LineBreak).element_class).to eq( + Markbridge::AST::LineBreak, + ) + end + end + + describe "#process" do + it "appends a fresh instance of element_class to parent" do + handler = described_class.new(Markbridge::AST::LineBreak) + + handler.process(element: nil, parent:) + + expect(parent.children.size).to eq(1) + expect(parent.children.first).to be_a(Markbridge::AST::LineBreak) + end + + it "returns nil so the parser does not recurse into children" do + handler = described_class.new(Markbridge::AST::LineBreak) + + expect(handler.process(element: nil, parent:)).to be_nil + end + + it "produces a fresh instance on every call (not a shared object)" do + handler = described_class.new(Markbridge::AST::HorizontalRule) + + handler.process(element: nil, parent:) + handler.process(element: nil, parent:) + + expect(parent.children.size).to eq(2) + expect(parent.children[0]).not_to equal(parent.children[1]) + end + + it "respects the configured element_class (HorizontalRule, not LineBreak)" do + handler = described_class.new(Markbridge::AST::HorizontalRule) + + handler.process(element: nil, parent:) + + expect(parent.children.first).to be_a(Markbridge::AST::HorizontalRule) + expect(parent.children.first).not_to be_a(Markbridge::AST::LineBreak) + end + end +end diff --git a/spec/unit/markbridge/parsers/html/handlers/span_handler_spec.rb b/spec/unit/markbridge/parsers/html/handlers/span_handler_spec.rb 
index 50545ff..a139b49 100644 --- a/spec/unit/markbridge/parsers/html/handlers/span_handler_spec.rb +++ b/spec/unit/markbridge/parsers/html/handlers/span_handler_spec.rb @@ -229,7 +229,7 @@ def fragment(html) 'X', ) - expect(result).to eq("[u]**X**[/u]") + expect(result.markdown).to eq("[u]**X**[/u]") end end end diff --git a/spec/unit/markbridge/parsers/media_wiki/inline_parser_spec.rb b/spec/unit/markbridge/parsers/media_wiki/inline_parser_spec.rb index 0701c0e..b852661 100644 --- a/spec/unit/markbridge/parsers/media_wiki/inline_parser_spec.rb +++ b/spec/unit/markbridge/parsers/media_wiki/inline_parser_spec.rb @@ -469,7 +469,7 @@ def parse(text) r.register("mark", :formatting, Markbridge::AST::Bold) end end - let(:parser) { described_class.new(inline_tag_registry: registry) } + let(:parser) { described_class.new(handlers: registry) } it "handles custom registered tags" do doc = parse("highlighted") @@ -556,7 +556,7 @@ def parse(text) Markbridge::Parsers::MediaWiki::InlineTagRegistry.build_from_default do |r| r.register("highlight", :formatting, Markbridge::AST::Bold) end - parser = described_class.new(inline_tag_registry: registry) + parser = described_class.new(handlers: registry) parent = Markbridge::AST::Document.new # Outer ''…'' wraps the content in Italic and recurses via # parse_inner_content; the inner tag must still resolve diff --git a/spec/unit/markbridge/parsers/media_wiki/parser_spec.rb b/spec/unit/markbridge/parsers/media_wiki/parser_spec.rb index 2013b19..210d821 100644 --- a/spec/unit/markbridge/parsers/media_wiki/parser_spec.rb +++ b/spec/unit/markbridge/parsers/media_wiki/parser_spec.rb @@ -746,12 +746,12 @@ def parse_table(wikitext) end describe "constructor customization" do - it "accepts a custom inline_tag_registry" do + it "accepts a custom handlers registry" do registry = Markbridge::Parsers::MediaWiki::InlineTagRegistry.build_from_default do |r| r.register("mark", :formatting, Markbridge::AST::Bold) end - parser = 
described_class.new(inline_tag_registry: registry) + parser = described_class.new(handlers: registry) doc = parser.parse("highlighted") paragraph = doc.children.first @@ -766,4 +766,43 @@ def parse_table(wikitext) expect(paragraph.children.first).to be_a(Markbridge::AST::Bold) end end + + describe "#parse" do + it "clears unknown_tags from the previous parse so a fresh call has a fresh tally" do + parser = described_class.new + parser.parse("x") + expect(parser.unknown_tags).to eq("neverknown" => 1) + + parser.parse("hello") + + expect(parser.unknown_tags).to eq({}) + end + + it "forwards the configured handler registry into the inline parser" do + registry = + Markbridge::Parsers::MediaWiki::InlineTagRegistry.build_from_default do |r| + r.register("highlight", :formatting, Markbridge::AST::Bold) + end + + doc = described_class.new(handlers: registry).parse("x") + paragraph = doc.children.first + + # The default registry doesn't know <highlight>; the custom one + # maps it to Bold. If parse dropped the handlers: kwarg the + # InlineParser would fall back to .default and the tag would + # survive as literal text. + expect(paragraph.children.first).to be_a(Markbridge::AST::Bold) + end + + it "normalizes line endings before splitting into lines" do + # Use the Unicode line separator (U+2028). split("\n") does NOT + # split on it, so without the normalize step the input would + # collapse into a single line with a literal separator inside — + # producing one paragraph instead of three headings. + doc = described_class.new.parse("== H1 ==\u2028== H2 ==\u2028== H3 ==") + + headings = doc.children.select { |c| c.is_a?(Markbridge::AST::Heading) } + expect(headings.size).to eq(3) + end + end end diff --git a/spec/unit/markbridge/parsers/text_formatter/handler_registry_spec.rb b/spec/unit/markbridge/parsers/text_formatter/handler_registry_spec.rb index 4fe53ac..a29d15a 100644 --- a/spec/unit/markbridge/parsers/text_formatter/handler_registry_spec.rb +++ b/spec/unit/markbridge/parsers/text_formatter/handler_registry_spec.rb @@ -5,10 +5,13 @@ RSpec.describe Markbridge::Parsers::TextFormatter::HandlerRegistry do let(:registry) { described_class.default } let(:parent) { Markbridge::AST::Document.new } + let(:processor) do + instance_double(Markbridge::Parsers::TextFormatter::Parser, process_children: nil) + end def process_and_get_node(xml_string) xml = Nokogiri.XML(xml_string).root - registry.process_element(xml, parent) + registry.process_element(xml, parent, processor) parent.children.last end @@ -18,15 +21,17 @@ def process_and_get_node(xml_string) xml = Nokogiri.XML("").root fake_node = Markbridge::AST::Text.new("x") registry.register("custom", handler) - allow(handler).to receive(:process).with(element: xml, parent:).and_return(fake_node) + allow(handler).to receive(:process).with(element: xml, parent:, processor:).and_return( + fake_node, + ) - expect(registry.process_element(xml, parent)).to eq(fake_node) + expect(registry.process_element(xml, parent, processor)).to eq(fake_node) end it "returns nil when no handler is registered for the element name" do xml = Nokogiri.XML("").root - expect(registry.process_element(xml, parent)).to be_nil + expect(registry.process_element(xml, parent, processor)).to be_nil end context "with default handlers" do @@ -88,7 +93,7 @@ def process_and_get_node(xml_string) it "dispatches the asterisk element (non-XML name, registered directly) to a ListItem handler" do element = instance_double(Nokogiri::XML::Element, name: "*") - result = registry.process_element(element, parent) + result =
registry.process_element(element, parent, processor) expect(result).to be_a(Markbridge::AST::ListItem) expect(parent.children.last).to eq(result) @@ -173,9 +178,11 @@ def process_and_get_node(xml_string) xml = Nokogiri.XML("").root replacement = Markbridge::AST::Text.new("replaced") registry.register("B", new_handler) - allow(new_handler).to receive(:process).with(element: xml, parent:).and_return(replacement) + allow(new_handler).to receive(:process).with(element: xml, parent:, processor:).and_return( + replacement, + ) - expect(registry.process_element(xml, parent)).to eq(replacement) + expect(registry.process_element(xml, parent, processor)).to eq(replacement) end end @@ -306,4 +313,58 @@ def handler_for(name) expect(registry.has_handler?("B")).to be true end end + + describe "#[]" do + it "returns the handler bound to an element name (case-insensitive)" do + registry = described_class.default + expect(registry["b"]).to be_a(Markbridge::Parsers::TextFormatter::Handlers::SimpleHandler) + expect(registry["B"]).to be(registry["b"]) + end + + it "returns nil when no handler is bound" do + expect(described_class.new["never-seen"]).to be_nil + end + end + + describe "#overlay" do + let(:registry) { described_class.default } + + it "yields the previously bound handler" do + seen = nil + registry.overlay("URL") { |p| seen = p } + expect(seen).to be_a(Markbridge::Parsers::TextFormatter::Handlers::UrlHandler) + end + + it "yields nil for unbound names" do + seen = :unset + registry.overlay("NEVER-SEEN") do |p| + seen = p + Markbridge::Parsers::TextFormatter::Handlers::SimpleHandler.new(Markbridge::AST::Bold) + end + expect(seen).to be_nil + end + + it "registers whatever the block returns" do + replacement = + Markbridge::Parsers::TextFormatter::Handlers::SimpleHandler.new(Markbridge::AST::Italic) + + registry.overlay("URL") { |_| replacement } + + expect(registry["URL"]).to be(replacement) + end + + it "iterates over an Array of names" do + replacement = + 
Markbridge::Parsers::TextFormatter::Handlers::SimpleHandler.new(Markbridge::AST::Italic) + + registry.overlay(%w[URL EMAIL]) { |_| replacement } + + expect(registry["URL"]).to be(replacement) + expect(registry["EMAIL"]).to be(replacement) + end + + it "returns self for chaining" do + expect(registry.overlay("URL") { |p| p }).to be(registry) + end + end end diff --git a/spec/unit/markbridge/parsers/text_formatter/parser_spec.rb b/spec/unit/markbridge/parsers/text_formatter/parser_spec.rb index 6e6aa4d..f893f70 100644 --- a/spec/unit/markbridge/parsers/text_formatter/parser_spec.rb +++ b/spec/unit/markbridge/parsers/text_formatter/parser_spec.rb @@ -113,7 +113,7 @@ it "does not track a registered handler as unknown even when it returns nil" do void_handler = Class.new(Markbridge::Parsers::TextFormatter::Handlers::BaseHandler) do - def process(element:, parent:) + def process(element:, parent:, processor: nil) nil end end @@ -166,4 +166,31 @@ def process(element:, parent:) expect(parent.children.map(&:class)).to eq([Markbridge::AST::Text, Markbridge::AST::Bold]) end end + + describe "custom handlers that recurse manually" do + it "passes element:, parent:, processor: and lets a handler recurse via processor.process_children" do + wrap_handler = + Class.new(Markbridge::Parsers::TextFormatter::Handlers::BaseHandler) do + def initialize + @element_class = Markbridge::AST::Bold + end + attr_reader :element_class + + def process(element:, parent:, processor:) + wrapper = Markbridge::AST::Bold.new + parent << wrapper + processor.process_children(element, wrapper) + nil # we recursed manually; don't double-process + end + end + + parser = described_class.new { |r| r.register("WRAP", wrap_handler.new) } + + doc = parser.parse("x") + wrap = doc.children.first + + expect(wrap).to be_a(Markbridge::AST::Bold) + expect(wrap.children.first).to be_a(Markbridge::AST::Italic) + end + end end diff --git a/spec/unit/markbridge/renderers/discourse/identity_escaper_spec.rb 
b/spec/unit/markbridge/renderers/discourse/identity_escaper_spec.rb new file mode 100644 index 0000000..6d25a32 --- /dev/null +++ b/spec/unit/markbridge/renderers/discourse/identity_escaper_spec.rb @@ -0,0 +1,72 @@ +# frozen_string_literal: true + +RSpec.describe Markbridge::Renderers::Discourse::IdentityEscaper do + let(:escaper) { described_class.new } + + describe "#escape" do + it "returns the input unchanged" do + expect(escaper.escape("**hi** *star* `code` ")).to eq("**hi** *star* `code` ") + end + + it "preserves whitespace and newlines verbatim" do + input = " leading\n middle\n trailing " + expect(escaper.escape(input)).to eq(input) + end + + it "returns the same object when given a String (no allocation)" do + input = +"plain" + expect(escaper.escape(input)).to be(input) + end + + it "returns an empty string for nil (parity with MarkdownEscaper#escape)" do + expect(escaper.escape(nil)).to eq("") + end + end + + describe "as plumbed through Markbridge.discourse_renderer(escape: false)" do + let(:renderer) { Markbridge.discourse_renderer(escape: false) } + + it "leaves Markdown-special characters in Text nodes untouched" do + result = renderer.render(Markbridge::AST::Text.new("a*b_c [d](e)")) + + expect(result).to eq("a*b_c [d](e)") + end + + it "leaves block-level constructs untouched (lists, headings, quotes)" do + result = renderer.render(Markbridge::AST::Text.new("# Heading\n- item\n1. ordered\n> quoted")) + + expect(result).to eq("# Heading\n- item\n1. ordered\n> quoted") + end + + it "is end-to-end usable through bbcode_to_markdown" do + result = Markbridge.bbcode_to_markdown("[b]hi[/b] *raw* `untouched`", renderer:) + + # The Bold tag still wraps; the surrounding text is *not* escaped. 
+ expect(result.markdown).to eq("**hi** *raw* `untouched`") + end + end + + describe "discourse_renderer mutual-exclusion" do + it "raises when escape: false is combined with escape_hard_line_breaks: true" do + expect { + Markbridge.discourse_renderer(escape: false, escape_hard_line_breaks: true) + }.to raise_error(ArgumentError, /mutually exclusive/) + end + + it "raises when escape: false is combined with allow:" do + expect { Markbridge.discourse_renderer(escape: false, allow: :lists) }.to raise_error( + ArgumentError, + /mutually exclusive/, + ) + end + + it "lets an explicit escaper: win even when escape: false is given" do + explicit = Markbridge::Renderers::Discourse::MarkdownEscaper.new + renderer = Markbridge.discourse_renderer(escaper: explicit, escape: false) + + # The MarkdownEscaper still escapes, so `*` becomes `\*`. + result = renderer.render(Markbridge::AST::Text.new("a*b")) + expect(result).to eq('a\*b') + end + end +end diff --git a/spec/unit/markbridge/renderers/discourse/markdown_escaper/allow_spec.rb b/spec/unit/markbridge/renderers/discourse/markdown_escaper/allow_spec.rb new file mode 100644 index 0000000..ace7d5a --- /dev/null +++ b/spec/unit/markbridge/renderers/discourse/markdown_escaper/allow_spec.rb @@ -0,0 +1,234 @@ +# frozen_string_literal: true + +RSpec.describe Markbridge::Renderers::Discourse::MarkdownEscaper, "#initialize" do + describe "escape_hard_line_breaks:" do + it "defaults to false (preserves trailing-space hard breaks)" do + expect(described_class.new.escape("foo \nbar")).to eq("foo \nbar") + end + + it "when true, strips trailing spaces before newlines" do + expect(described_class.new(escape_hard_line_breaks: true).escape("foo \nbar")).to eq( + "foo\nbar", + ) + end + end + describe "default behavior (allow: nil)" do + let(:escaper) { described_class.new } + + it "escapes the leading dash of a bullet list" do + expect(escaper.escape("- item")).to eq("\\- item") + end + + it "escapes the leading plus of a bullet list" do + 
expect(escaper.escape("+ item")).to eq("\\+ item") + end + + it "escapes the leading star of a bullet list" do + expect(escaper.escape("* item")).to eq("\\* item") + end + + it "escapes the period of an ordered list" do + expect(escaper.escape("1. item")).to eq("1\\. item") + end + + it "escapes the close-paren of an ordered list" do + expect(escaper.escape("1) item")).to eq("1\\) item") + end + end + + describe "allow: :bullet_list" do + let(:escaper) { described_class.new(allow: :bullet_list) } + + it "passes a `- item` line through unescaped" do + expect(escaper.escape("- item")).to eq("- item") + end + + it "passes a `+ item` line through unescaped" do + expect(escaper.escape("+ item")).to eq("+ item") + end + + it "passes a `* item` line through unescaped" do + expect(escaper.escape("* item")).to eq("* item") + end + + it "still escapes ordered lists (only bullets allowed)" do + expect(escaper.escape("1. item")).to eq("1\\. item") + end + + it "still escapes a thematic break of dashes" do + expect(escaper.escape("---")).to eq("\\-\\-\\-") + end + + it "still escapes a thematic break of stars" do + expect(escaper.escape("***")).to eq("\\*\\*\\*") + end + + it "still escapes a setext underline of dashes after a paragraph" do + expect(escaper.escape("paragraph\n---")).to eq("paragraph\n\\-\\-\\-") + end + + it "still inline-escapes content after the bullet marker" do + # The leading "- " passes through, but inline `*emphasis*` markers + # inside the line still get escaped. + expect(escaper.escape("- a *star* mark")).to eq("- a \\*star\\* mark") + end + end + + describe "allow: :ordered_list" do + let(:escaper) { described_class.new(allow: :ordered_list) } + + it "passes a `1. item` line through unescaped" do + expect(escaper.escape("1. item")).to eq("1. item") + end + + it "passes a `1) item` line through unescaped" do + expect(escaper.escape("1) item")).to eq("1) item") + end + + it "passes large ordered numbers through unescaped" do + expect(escaper.escape("99. 
item")).to eq("99. item") + end + + it "still escapes bullet lists (only ordered allowed)" do + expect(escaper.escape("- item")).to eq("\\- item") + end + + it "still inline-escapes content after the marker" do + # `1.` passes through; an inline `*emphasis*` in the rest + # is still escaped. + expect(escaper.escape("1. a *star* mark")).to eq("1. a \\*star\\* mark") + end + end + + describe "allow: :atx_heading" do + let(:escaper) { described_class.new(allow: :atx_heading) } + + it "passes an h1 through unescaped" do + expect(escaper.escape("# Heading")).to eq("# Heading") + end + + it "passes an h6 through unescaped" do + expect(escaper.escape("###### Heading")).to eq("###### Heading") + end + + it "passes a 7-hash run through (CommonMark rejects 7+ hashes as a heading; not the kwarg's concern)" do + # ATX_HEADING is `\#{1,6}(?=[ \t]|$)` — 7 hashes do not match, + # so this never enters the allow-checked branch; behaviour is + # identical to the default escaper. + expect(escaper.escape("####### Heading")).to eq("####### Heading") + end + + it "still inline-escapes content after the heading marker" do + expect(escaper.escape("## a *star* h2")).to eq("## a \\*star\\* h2") + end + + it "passes a `# ` empty heading through unescaped" do + # Edge case from the plan: `# ` matches ATX_HEADING with empty + # content. With :atx_heading allowed, the marker passes verbatim; + # Discourse renders this as an empty <h1></h1>. + expect(escaper.escape("# ")).to eq("# ") + end + end + + describe "allow: :block_quote" do + let(:escaper) { described_class.new(allow: :block_quote) } + + it "passes a `> quoted` line through unescaped" do + expect(escaper.escape("> quoted")).to eq("> quoted") + end + + it "still inline-escapes content after the `>`" do + expect(escaper.escape("> a *star*")).to eq("> a \\*star\\*") + end + end + + describe "allow: :lists (alias for both)" do + let(:escaper) { described_class.new(allow: :lists) } + + it "passes bullet lists through unescaped" do + expect(escaper.escape("- item")).to eq("- item") + end + + it "passes ordered lists through unescaped" do + expect(escaper.escape("1. item")).to eq("1. item") + end + + it "still escapes thematic breaks" do + expect(escaper.escape("---")).to eq("\\-\\-\\-") + end + end + + describe "allow: as an Array" do + it "accepts an Array of granular keys" do + escaper = described_class.new(allow: %i[bullet_list ordered_list]) + + expect(escaper.escape("- item")).to eq("- item") + expect(escaper.escape("1. item")).to eq("1. item") + end + + it "accepts an Array containing aliases (expanded)" do + escaper = described_class.new(allow: [:lists]) + + expect(escaper.escape("- item")).to eq("- item") + expect(escaper.escape("1.
item") + end + end + + describe "allow: with unknown keys" do + it "raises ArgumentError naming the unknown key and the recognised set" do + expect { described_class.new(allow: :headings) }.to raise_error(ArgumentError) do |error| + expect(error.message).to include("headings") + expect(error.message).to include("bullet_list") + expect(error.message).to include("ordered_list") + expect(error.message).to include("atx_heading") + expect(error.message).to include("block_quote") + expect(error.message).to include("lists") + end + end + + it "raises when one element of an Array is unknown (others ignored)" do + expect { described_class.new(allow: %i[bullet_list typos]) }.to raise_error( + ArgumentError, + /typos/, + ) + end + end + + describe "interaction with thematic breaks and setext underlines" do + let(:escaper) { described_class.new(allow: :lists) } + + it "still escapes a thematic break of dashes even with :bullet_list allowed" do + expect(escaper.escape("---")).to eq("\\-\\-\\-") + end + + it "still escapes a setext-dash underline after a paragraph" do + expect(escaper.escape("paragraph\n---")).to eq("paragraph\n\\-\\-\\-") + end + + it "still escapes a thematic break of stars" do + expect(escaper.escape("***")).to eq("\\*\\*\\*") + end + end + + describe "as plumbed through Markbridge.discourse_renderer" do + it "forwards :lists to the constructed escaper" do + renderer = Markbridge.discourse_renderer(allow: :lists) + input = "- item" + + # The default postprocessor strips trailing whitespace; the + # bullet line itself passes through unescaped. + result = renderer.render(Markbridge::AST::Text.new(input)) + expect(result).to eq("- item") + end + + it "is ignored when an explicit escaper: is supplied" do + explicit = described_class.new # no allow + renderer = Markbridge.discourse_renderer(escaper: explicit, allow: :lists) + + # The factory must not override an explicit escaper — the user's + # instance wins. 
+ result = renderer.render(Markbridge::AST::Text.new("- item")) + expect(result).to eq("\\- item") + end + end +end diff --git a/spec/unit/markbridge/renderers/discourse/postprocessor_spec.rb b/spec/unit/markbridge/renderers/discourse/postprocessor_spec.rb new file mode 100644 index 0000000..446334a --- /dev/null +++ b/spec/unit/markbridge/renderers/discourse/postprocessor_spec.rb @@ -0,0 +1,55 @@ +# frozen_string_literal: true + +RSpec.describe Markbridge::Renderers::Discourse::Postprocessor do + let(:postprocessor) { described_class.new } + + describe "#call" do + it "collapses runs of three or more newlines to exactly two" do + expect(postprocessor.call("a\n\n\n\nb")).to eq("a\n\nb") + end + + it "collapses every run of 3+ newlines, not just the first" do + # Two distinct runs — `sub` would only catch the first. + expect(postprocessor.call("a\n\n\nb\n\n\nc")).to eq("a\n\nb\n\nc") + end + + it "removes whitespace-only lines (preserving multiple of them)" do + expect(postprocessor.call("a\n \nb\n\t\nc")).to eq("a\n\nb\n\nc") + end + + it "strips leading and trailing whitespace from the document" do + expect(postprocessor.call(" hi ")).to eq("hi") + end + + it "leaves a single blank line between paragraphs alone" do + expect(postprocessor.call("a\n\nb")).to eq("a\n\nb") + end + end + + describe "DEFAULT" do + it "is a Postprocessor instance" do + expect(described_class::DEFAULT).to be_a(described_class) + end + + it "behaves like a fresh instance" do + expect(described_class::DEFAULT.call("a\n\n\nb")).to eq("a\n\nb") + end + end + + describe "as a Renderer dependency" do + it "is invoked by Markbridge.bbcode_to_markdown via the renderer" do + custom = + Class.new(described_class) do + def call(text) + "PROCESSED:#{text.strip}" + end + end + + renderer = Markbridge.discourse_renderer(postprocessor: custom.new) + + expect(Markbridge.bbcode_to_markdown("[b]hi[/b]", renderer:).markdown).to eq( + "PROCESSED:**hi**", + ) + end + end +end diff --git 
a/spec/unit/markbridge/renderers/discourse/renderer_spec.rb b/spec/unit/markbridge/renderers/discourse/renderer_spec.rb index 3fb5d93..04c3610 100644 --- a/spec/unit/markbridge/renderers/discourse/renderer_spec.rb +++ b/spec/unit/markbridge/renderers/discourse/renderer_spec.rb @@ -42,6 +42,18 @@ expect(result).to eq('a\*b') end + it "falls back to Postprocessor::DEFAULT when no postprocessor is provided" do + expect(described_class.new.postprocessor).to be( + Markbridge::Renderers::Discourse::Postprocessor::DEFAULT, + ) + end + + it "uses an explicit postprocessor when one is provided" do + custom = Markbridge::Renderers::Discourse::Postprocessor.new + + expect(described_class.new(postprocessor: custom).postprocessor).to be(custom) + end + it "uses an explicit html_escaper when one is provided" do html_escaper = class_double(Markbridge::Renderers::Discourse::HtmlEscaper) allow(html_escaper).to receive(:escape).and_return("HTML-ESCAPED") diff --git a/spec/unit/markbridge/renderers/discourse/tag_library_spec.rb b/spec/unit/markbridge/renderers/discourse/tag_library_spec.rb index 747311a..a3bfecb 100644 --- a/spec/unit/markbridge/renderers/discourse/tag_library_spec.rb +++ b/spec/unit/markbridge/renderers/discourse/tag_library_spec.rb @@ -171,4 +171,57 @@ end end end + + describe "#unregister" do + it "removes a previously registered binding" do + tag = Markbridge::Renderers::Discourse::Tag.new { |_, _| "x" } + library.register(Markbridge::AST::Bold, tag) + + library.unregister(Markbridge::AST::Bold) + + expect(library[Markbridge::AST::Bold]).to be_nil + end + + it "is a no-op when the class was never registered" do + expect { library.unregister(Markbridge::AST::Bold) }.not_to raise_error + end + + it "returns self for chaining" do + expect(library.unregister(Markbridge::AST::Bold)).to be(library) + end + end + + describe "#merge" do + let(:bold_tag) { Markbridge::Renderers::Discourse::Tag.new { |_, _| "b" } } + let(:italic_tag) { 
Markbridge::Renderers::Discourse::Tag.new { |_, _| "i" } } + + it "registers each non-nil mapping" do + library.merge(Markbridge::AST::Bold => bold_tag, Markbridge::AST::Italic => italic_tag) + + expect(library[Markbridge::AST::Bold]).to be(bold_tag) + expect(library[Markbridge::AST::Italic]).to be(italic_tag) + end + + it "unregisters classes with a nil value" do + library.register(Markbridge::AST::Bold, bold_tag) + + library.merge(Markbridge::AST::Bold => nil) + + expect(library[Markbridge::AST::Bold]).to be_nil + end + + it "removes the class from iteration when given a nil value (vs. registering nil)" do + library.register(Markbridge::AST::Bold, bold_tag) + + library.merge(Markbridge::AST::Bold => nil) + + # Iteration must reflect deletion — registering `nil` would leave the + # class as a key with a nil value. + expect(library.map { |klass, _| klass }).not_to include(Markbridge::AST::Bold) + end + + it "returns self for chaining" do + expect(library.merge({})).to be(library) + end + end end diff --git a/spec/unit/markbridge/renderers/discourse/tags/table_tag_spec.rb b/spec/unit/markbridge/renderers/discourse/tags/table_tag_spec.rb index d1b1417..8488aa3 100644 --- a/spec/unit/markbridge/renderers/discourse/tags/table_tag_spec.rb +++ b/spec/unit/markbridge/renderers/discourse/tags/table_tag_spec.rb @@ -645,4 +645,17 @@ def build_uneven_table_with(child_in_first_cell) expect(result).to include("

") end end + + describe "with rows-less children" do + it "returns the empty string for a table whose children are all non-Row" do + # The render method's empty_table? predicate must check that no + # child is an AST::TableRow specifically — not just that the + # children list is non-empty. A bare Text child is not a row, + # so the table is effectively empty. + table = Markbridge::AST::Table.new + table << Markbridge::AST::Text.new("stray") + + expect(tag.render(table, interface)).to eq("") + end + end end
x