Block Delimiter: add package contents #43648

Merged
merged 13 commits into from
May 28, 2025
354 changes: 354 additions & 0 deletions projects/packages/block-delimiter/README.md
@@ -6,6 +6,360 @@ Efficiently work with block structure

To use this package in your WordPress plugin, you can require both this package and the [Jetpack Autoloader](https://packagist.org/packages/automattic/jetpack-autoloader) in your project's `composer.json` file.

## Usage

The Block Delimiter package provides an efficient, streaming parser for working with WordPress block structure without the memory overhead of `parse_blocks()`. It's designed for scenarios where you need to inspect, find, or modify specific blocks without parsing the entire block tree.

### Basic Block Scanning

Find and iterate through all block delimiters in a document:

```php
use Automattic\Block_Delimiter;

$post_content = '<!-- wp:paragraph -->
<p>Hello world!</p>
<!-- /wp:paragraph -->
<!-- wp:image {"id":123} -->
<figure><img src="example.jpg" /></figure>
<!-- /wp:image -->';

foreach ( Block_Delimiter::scan_delimiters( $post_content ) as $where => $delimiter ) {
	// $where is an array: [ byte_offset, byte_length ]
	// $delimiter is a Block_Delimiter instance

	echo "Found block: " . $delimiter->allocate_and_return_block_type() . "\n";
	echo "Type: " . $delimiter->get_delimiter_type() . "\n";
}
```

**Output:**
```
Found block: core/paragraph
Type: opener
Found block: core/paragraph
Type: closer
Found block: core/image
Type: opener
Found block: core/image
Type: closer
```
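
Because `$where` holds a byte offset and a byte length, it can be passed straight to `substr()` to recover the raw delimiter text. The sketch below illustrates that convention with a hypothetical stand-in generator, `fake_scan()`, built on `preg_match_all()`. It is not the package's scanner and only recognizes simple delimiters; it is here to show that a PHP generator may yield an array as its key, which is what makes `as $where => $delimiter` possible.

```php
<?php
/**
 * Hypothetical stand-in for a delimiter scanner (NOT the package's implementation).
 * Yields [ byte_offset, byte_length ] => raw comment text.
 */
function fake_scan( string $html ): Generator {
	preg_match_all( '~<!-- /?wp:\S+ .*?-->~', $html, $matches, PREG_OFFSET_CAPTURE );

	foreach ( $matches[0] as list( $text, $offset ) ) {
		// Unlike arrays, generators may use an array as the key.
		yield array( $offset, strlen( $text ) ) => $text;
	}
}

$html = '<!-- wp:paragraph --><p>Hi</p><!-- /wp:paragraph -->';

$spans = array();
foreach ( fake_scan( $html ) as $where => $text ) {
	list( $at, $length ) = $where;
	// Slicing the source with the key reproduces the delimiter text exactly.
	$spans[] = substr( $html, $at, $length );
}

print_r( $spans );
```

With the real package, the same `substr( $post_content, $where[0], $where[1] )` call should return the delimiter comment that `$delimiter` describes.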

### Finding Specific Block Types

Efficiently find blocks of a specific type without parsing everything:

```php
use Automattic\Block_Delimiter;

$post_content = '<!-- wp:paragraph -->
<p>Welcome to my blog!</p>
<!-- /wp:paragraph -->
<!-- wp:image {"id":456} -->
<figure><img src="photo.jpg" /></figure>
<!-- /wp:image -->
<!-- wp:gallery {"ids":[789,101]} -->
<figure class="wp-block-gallery">...</figure>
<!-- /wp:gallery -->';

// Find the first image block that has an `id` attribute
foreach ( Block_Delimiter::scan_delimiters( $post_content ) as $delimiter ) {
	if ( ! $delimiter->is_block_type( 'image' ) ) {
		continue;
	}

	if ( Block_Delimiter::OPENER === $delimiter->get_delimiter_type() ) {
		$attributes = $delimiter->allocate_and_return_parsed_attributes();
		if ( isset( $attributes['id'] ) ) {
			echo "Found image with ID: " . $attributes['id'];
			break;
		}
	}
}
```

**Output:**
```
Found image with ID: 456
```

### Extracting Block Attributes

Parse JSON attributes only when needed:

```php
use Automattic\Block_Delimiter;

$post_content = '<!-- wp:paragraph {"fontSize":"large"} -->
<p class="has-large-font-size">This paragraph has a large font size.</p>
<!-- /wp:paragraph -->
<!-- wp:paragraph -->
<p>This paragraph has no custom font size.</p>
<!-- /wp:paragraph -->
<!-- wp:paragraph {"fontSize":"small","textColor":"primary"} -->
<p class="has-primary-color has-small-font-size">This paragraph has a small font size and primary color.</p>
<!-- /wp:paragraph -->';

foreach ( Block_Delimiter::scan_delimiters( $post_content ) as $delimiter ) {
	if ( $delimiter->is_block_type( 'core/paragraph' ) &&
		Block_Delimiter::OPENER === $delimiter->get_delimiter_type() ) {

		$attributes = $delimiter->allocate_and_return_parsed_attributes();
		if ( isset( $attributes['fontSize'] ) ) {
			echo "Paragraph with font size: " . $attributes['fontSize'] . "\n";
		}
	}
}
```

**Output:**
```
Paragraph with font size: large
Paragraph with font size: small
```

### Counting Block Types

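Collect the distinct block types used in a document. Each type is recorded once, so the result is a sorted list of names rather than a per-type tally: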

```php
use Automattic\Block_Delimiter;

$post_content = '<!-- wp:heading {"level":2} -->
<h2>My Blog Post</h2>
<!-- /wp:heading -->
<!-- wp:paragraph -->
<p>Introduction paragraph.</p>
<!-- /wp:paragraph -->
<!-- wp:image {"id":123} -->
<figure><img src="hero.jpg" /></figure>
<!-- /wp:image -->
<!-- wp:paragraph -->
<p>Another paragraph.</p>
<!-- /wp:paragraph -->
<!-- wp:list -->
<ul><li>Item 1</li><li>Item 2</li></ul>
<!-- /wp:list -->';

function get_block_types_in( string $html ): array {
	$block_types = [];

	foreach ( Block_Delimiter::scan_delimiters( $html ) as $delimiter ) {
		if ( Block_Delimiter::OPENER === $delimiter->get_delimiter_type() ) {
			$block_types[ $delimiter->allocate_and_return_block_type() ] = true;
		}
	}

	$block_types = array_keys( $block_types );
	sort( $block_types );
	return $block_types;
}

$block_types = get_block_types_in( $post_content );
print_r( $block_types );
```

**Output:**
```
Array
(
[0] => core/heading
[1] => core/image
[2] => core/list
[3] => core/paragraph
)
```
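
The `get_block_types_in()` example records each block type once. If per-type counts are wanted, a tally can be kept instead. The sketch below is a minimal illustration that assumes the opener names have already been collected into an array, for instance by appending `allocate_and_return_block_type()` for every opener. `tally_block_types()` is a hypothetical helper, and `$opener_names` is hard-coded sample data matching the document above.

```php
<?php
/**
 * Hypothetical helper: count how many times each block type occurs.
 *
 * @param string[] $opener_names One entry per opening delimiter.
 * @return array<string,int> Block type => occurrence count, sorted by type.
 */
function tally_block_types( array $opener_names ): array {
	$counts = array_count_values( $opener_names );
	ksort( $counts );
	return $counts;
}

// Sample data (an assumption): names in the order the openers appear above.
$opener_names = array( 'core/heading', 'core/paragraph', 'core/image', 'core/paragraph', 'core/list' );

$counts = tally_block_types( $opener_names );
print_r( $counts );
```

Here `core/paragraph` is counted twice, while every other type is counted once.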

### Extracting Complete Block Content

Extract an entire block including its delimiters and content:

```php
use Automattic\Block_Delimiter;

$post_content = '<!-- wp:paragraph -->
<p>First paragraph.</p>
<!-- /wp:paragraph -->
<!-- wp:heading {"level":3} -->
<h3>Section Title</h3>
<!-- /wp:heading -->
<!-- wp:paragraph -->
<p>Second paragraph with more content.</p>
<!-- /wp:paragraph -->';

function extract_block( string $block_name, string $html ): ?string {
	$depth     = 0;
	$starts_at = null;

	foreach ( Block_Delimiter::scan_delimiters( $html ) as $where => $delimiter ) {
		if ( ! $delimiter->is_block_type( $block_name ) ) {
			continue;
		}

		switch ( $delimiter->get_delimiter_type() ) {
			case Block_Delimiter::VOID:
				return substr( $html, $where[0], $where[1] );

			case Block_Delimiter::OPENER:
				$depth++;
				$starts_at = $starts_at ?? $where[0];
				break;

			case Block_Delimiter::CLOSER:
				if ( --$depth === 0 ) {
					return substr( $html, $starts_at, $where[0] + $where[1] - $starts_at );
				}
		}
	}

	return null;
}

$heading_block = extract_block( 'heading', $post_content );
echo $heading_block;
```

**Output:**
```
<!-- wp:heading {"level":3} -->
<h3>Section Title</h3>
<!-- /wp:heading -->
```
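
The `$depth` counter in `extract_block()` matters when a block of the requested type is nested inside another block of the same type. The sketch below shows the same depth-counting idea on a nested group block. It uses a regex as a stand-in for the scanner, which is an assumption made only to keep the example self-contained; the input string is hypothetical.

```php
<?php
// Hypothetical input: a group block nested inside another group block.
$html = '<!-- wp:group --><!-- wp:group --><p>Hi</p><!-- /wp:group --><!-- /wp:group --><p>Tail</p>';

// Stand-in for the scanner (an assumption, not the package's code): find group
// delimiters with a regex and capture each match together with its byte offset.
preg_match_all( '~<!-- (/?)wp:group -->~', $html, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE );

$depth     = 0;
$starts_at = null;
$block     = null;

foreach ( $matches as $match ) {
	list( $text, $at ) = $match[0];
	$is_closer         = '/' === $match[1][0];

	if ( ! $is_closer ) {
		// Remember only the outermost opener's position.
		$depth++;
		$starts_at = $starts_at ?? $at;
		continue;
	}

	// Only the closer that brings the depth back to zero ends the block.
	if ( 0 === --$depth ) {
		$block = substr( $html, $starts_at, $at + strlen( $text ) - $starts_at );
		break;
	}
}

echo $block;
```

Stopping at the first closer instead would cut the outer block short and leave its own closing delimiter behind.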

### Modifying Block Content

Transform block content efficiently without parsing the entire tree:

```php
use Automattic\Block_Delimiter;

$post_content = '<!-- wp:paragraph -->
<p>Some text content.</p>
<!-- /wp:paragraph -->
<!-- wp:image {"id":123} -->
<figure class="wp-block-image"><img src="photo1.jpg" /></figure>
<!-- /wp:image -->
<!-- wp:paragraph -->
<p>More text content.</p>
<!-- /wp:paragraph -->
<!-- wp:image {"id":456} -->
<figure class="wp-block-image"><img src="photo2.jpg" /></figure>
<!-- /wp:image -->';

function add_css_class_to_images( string $post_content, string $css_class ): string {
	$output    = '';
	$starts_at = null;
	$was_at    = 0;

	foreach ( Block_Delimiter::scan_delimiters( $post_content ) as $where => $delimiter ) {
		if ( ! $delimiter->is_block_type( 'image' ) ) {
			continue;
		}

		list( $at, $length ) = $where;

		if ( Block_Delimiter::OPENER === $delimiter->get_delimiter_type() ) {
			$starts_at = $at + $length;
		} elseif ( Block_Delimiter::CLOSER === $delimiter->get_delimiter_type() ) {
			// Copy untouched content before this block, up to the end of its opener
			$output .= substr( $post_content, $was_at, $starts_at - $was_at );

			// Transform the block content between the opener and the closer
			$block_content = substr( $post_content, $starts_at, $at - $starts_at );
			$output       .= add_css_class( $block_content, $css_class );

			$was_at = $at;
		}
	}

	// Add any remaining content
	$output .= substr( $post_content, $was_at );
	return $output;
}

function add_css_class( string $html, string $css_class ): string {
	// Simple example - add class to figure elements
	return str_replace( 'class="wp-block-image"', 'class="wp-block-image ' . $css_class . '"', $html );
}

$modified_content = add_css_class_to_images( $post_content, 'custom-image-style' );
echo $modified_content;
```

**Output:**
```
<!-- wp:paragraph -->
<p>Some text content.</p>
<!-- /wp:paragraph -->
<!-- wp:image {"id":123} -->
<figure class="wp-block-image custom-image-style"><img src="photo1.jpg" /></figure>
<!-- /wp:image -->
<!-- wp:paragraph -->
<p>More text content.</p>
<!-- /wp:paragraph -->
<!-- wp:image {"id":456} -->
<figure class="wp-block-image custom-image-style"><img src="photo2.jpg" /></figure>
<!-- /wp:image -->
```

### Error Handling

Check for parsing errors:

```php
use Automattic\Block_Delimiter;

$post_content = '<!-- wp:paragraph {"invalid": json} -->
<p>This block has invalid JSON attributes.</p>
<!-- /wp:paragraph -->
<!-- wp:image -->
<figure><img src="valid.jpg" /></figure>
<!-- /wp:image -->';

foreach ( Block_Delimiter::scan_delimiters( $post_content ) as $delimiter ) {
	if ( Block_Delimiter::OPENER === $delimiter->get_delimiter_type() ) {
		$attributes = $delimiter->allocate_and_return_parsed_attributes();
		if ( null === $attributes && $delimiter->get_last_json_error() !== JSON_ERROR_NONE ) {
			echo "Invalid JSON in " . $delimiter->allocate_and_return_block_type() . " block\n";
		} elseif ( is_array( $attributes ) ) {
			echo "Valid " . $delimiter->allocate_and_return_block_type() . " block\n";
		} else {
			echo "No attributes in " . $delimiter->allocate_and_return_block_type() . " block\n";
		}
	}
}

// Check for incomplete input
$incomplete_content = '<!-- wp:paragraph';
$delimiter          = Block_Delimiter::next_delimiter( $incomplete_content, 0 );

if ( null === $delimiter ) {
	$error = Block_Delimiter::get_last_error();
	if ( Block_Delimiter::INCOMPLETE_INPUT === $error ) {
		echo "Document appears to be truncated\n";
	}
}
```

**Output:**
```
Invalid JSON in core/paragraph block
No attributes in core/image block
Document appears to be truncated
```
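
The comparison against `JSON_ERROR_NONE` suggests that `get_last_json_error()` reports PHP's standard JSON error codes; that is an assumption, not something the package documentation above states. The sketch below uses only PHP's built-in `json_decode()` and `json_last_error()` to show what those codes look like for the attribute strings in this example. `decode_attributes()` is a hypothetical helper.

```php
<?php
/**
 * Hypothetical helper: decode an attribute string and report the JSON status.
 * Uses PHP's built-in JSON functions only, not the Block Delimiter package.
 */
function decode_attributes( string $json ): array {
	$attributes = json_decode( $json, true );

	return array(
		'attributes' => is_array( $attributes ) ? $attributes : null,
		'error'      => json_last_error(),
		'message'    => json_last_error_msg(),
	);
}

$bad  = decode_attributes( '{"invalid": json}' );
$good = decode_attributes( '{"id":123}' );

// The bare word `json` is not a valid JSON value, so decoding fails.
var_dump( null === $bad['attributes'] );
var_dump( JSON_ERROR_SYNTAX === $bad['error'] );

// Well-formed attributes decode to an array with no error recorded.
var_dump( JSON_ERROR_NONE === $good['error'] );
var_dump( $good['attributes'] );
```

Checking the error code as well as the `null` result matters because a block with no attributes at all also has nothing to decode, as the `core/image` case above shows.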

### Performance Benefits

The Block Delimiter approach offers significant performance advantages:

- **Low memory overhead**: No block tree is constructed
- **Streaming processing**: Process only what you need
- **Lazy parsing**: JSON attributes parsed only when accessed
- **String-based operations**: Work directly with the source text
- **Early termination**: Stop processing when you find what you need

This makes it ideal for operations like finding specific blocks, counting block types, or making targeted modifications without the cost of full block tree parsing.
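
The streaming and early-termination points follow from how PHP generators behave, assuming `scan_delimiters()` is implemented as one (the array-valued keys in the examples suggest so, but this is an inference). The sketch below uses a hypothetical generator, `lazy_comment_offsets()`, to show that breaking out of a `foreach` stops the producer from doing any further searching.

```php
<?php
/**
 * Hypothetical lazy scanner: yields the byte offset of each HTML comment opener.
 * $searches counts how many matches the generator has actually produced.
 */
function lazy_comment_offsets( string $html, int &$searches ): Generator {
	$at = 0;
	while ( false !== ( $found = strpos( $html, '<!--', $at ) ) ) {
		$searches++;
		yield $found;
		$at = $found + 4;
	}
}

// A document with 1000 void blocks (sample data).
$html     = str_repeat( '<!-- wp:spacer /-->', 1000 );
$searches = 0;

foreach ( lazy_comment_offsets( $html, $searches ) as $offset ) {
	// Stop as soon as the first match is found.
	break;
}

echo $searches;
```

Only one match is produced before the loop exits, so the remaining 999 delimiters are never visited.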

## Contribute

You can contribute to this package by submitting a pull request to the [Jetpack repository](https://github.com/Automattic/jetpack/tree/trunk/projects/packages/block-delimiter).
@@ -0,0 +1,4 @@
Significance: minor
Type: added

Add initial functionality.
5 changes: 1 addition & 4 deletions projects/packages/block-delimiter/composer.json
@@ -47,10 +47,7 @@
 			"link-template": "https://github.com/Automattic/block-delimiter/compare/v${old}...v${new}"
 		},
 		"mirror-repo": "Automattic/block-delimiter",
-		"textdomain": "jetpack-block-delimiter",
-		"version-constants": {
-			"::PACKAGE_VERSION": "src/class-block-delimiter.php"
-		}
+		"textdomain": "jetpack-block-delimiter"
 	},
 	"suggest": {
 		"automattic/jetpack-autoloader": "Allow for better interoperability with other plugins that use this package."