-
Notifications
You must be signed in to change notification settings - Fork 4.7k
strip whitespaces in render_block_core_cover #40859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
09739d8
849197a
9be451c
ee8fe32
2981dd7
e374216
7d65028
98424c7
7cd71a2
4a4aa98
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -60,10 +60,13 @@ class="wp-block-cover__image-background" | |||||||||||||||||||||||||||||||||||||||
| esc_attr( $object_position ) | ||||||||||||||||||||||||||||||||||||||||
| ); | ||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||
| $content = str_replace( | ||||||||||||||||||||||||||||||||||||||||
| '</span><div', | ||||||||||||||||||||||||||||||||||||||||
| '</span>' . $image . '<div', | ||||||||||||||||||||||||||||||||||||||||
| $content | ||||||||||||||||||||||||||||||||||||||||
| /* Inserts the featured image between the (1st) cover 'background' `span` and 'inner_container' `div`, | ||||||||||||||||||||||||||||||||||||||||
| and removes eventual withespace characters between the two (typically introduced at template level) */ | ||||||||||||||||||||||||||||||||||||||||
| $content = preg_replace( | ||||||||||||||||||||||||||||||||||||||||
| '/(<span\b[^>]*wp-block-cover__background[\s|"][^>]*>.*<\/span>)\s*(<div\b[^>]*wp-block-cover__inner-container[\s|"][^>]*>)/U', | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
| /** | |
| * For themes without theme.json file, make sure | |
| * to restore the inner div for the group block | |
| * to avoid breaking styles relying on that div. | |
| * | |
| * @param string $block_content Rendered block content. | |
| * @param array $block Block object. | |
| * @return string Filtered block content. | |
| */ | |
| function gutenberg_restore_group_inner_container( $block_content, $block ) { | |
| $tag_name = isset( $block['attrs']['tagName'] ) ? $block['attrs']['tagName'] : 'div'; | |
| $group_with_inner_container_regex = sprintf( | |
| '/(^\s*<%1$s\b[^>]*wp-block-group(\s|")[^>]*>)(\s*<div\b[^>]*wp-block-group__inner-container(\s|")[^>]*>)((.|\S|\s)*)/U', | |
| preg_quote( $tag_name, '/' ) | |
| ); |
As you can see, no class= to match the div with class="wp-block-group", and the div withclass="wp-block-group__inner-container"
And again:
gutenberg/lib/block-supports/layout.php
Lines 243 to 246 in 51db4bf
| $replace_regex = sprintf( | |
| '/(^\s*<%1$s\b[^>]*wp-block-group[^>]*>)(.*)(<\/%1$s>\s*$)/ms', | |
| preg_quote( $tag_name, '/' ) | |
| ); |
No class=. And the matching of a closing tag. And .* between the two.
As in our case, they had to put something between the two groups.
With your HTML malformed every day usecase, putting a div inside a span:
<span><span>...</span><div>...</span><div>...my regex would insert the featured image before the last div. While the regex you suggested before the first div.
So what?
It would seem you are maintaining at the same time:
- "It's a bad practice trying to cotrol what comes before the opening div. That's why I was suggesting no control at all"
- "I can't accept we are so permissive about what comes (within the span) before the div.We must absolutly control it"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be wrong, but I bet it could accomplished rewriting less than 10% of the entire Gutenberg!
- Insert an HTML element as a placeholder in the editor.
Easy: rewite the Cover Block (and probably the Image Block)!
-
Reconstruct the block's HTML in PHP
More or less this is what has been done by @draganescu. Who has added such a cool feature to WP. I was so impacient that installed Gutenberg for the first time to try it. And I had the misfortune to discover some kind of bug. It's not a real bug. And it wasn't me to discover the nature of this pseudo 'bug'. We have to remove some spaces and line breaks. That's it. If yu have some code, let's do it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@navigatrum first of all I want to say thank for working hard on this. please know that when I look at a proposed change my purpose and goal is to try and find the places where it could break down and so if I seem pessimistic it's because that's how I'm looking at the changes (and all changes for any PR).
Concerning the class question would it be right to understand that you are saying you removed it because there was other code in the project that also doesn't? If so, I wouldn't suggest we need to feel bound to match that. I'm not sure why it was left out there either, and it's not a critical thing, but it just seems surprising that we'd want to intentionally decouple the class name we're looking for from the class attribute we expect to find it in. 🤷♂️
my regex would insert the featured image before the last div. While the regex you suggested before the first div.
So what?
I might not have presented a great example because it looks malformed. In this example it should be more apparent that the .* will continue consuming input as long as it can while still matching the entire thing. Even a small change there could prevent a match from continuing past any unexpected tags (for example, if we use [^<]* instead of .* then we at least shouldn't break past unexpected tags that might sit between the <span> and the <div>).
We're probably somewhat shielded here because we're only being supplied the HTML from within the block itself (and also because we added the match on the class name), but if this were applied to the post (or if we were only looking at the tag names) we could easily get into trouble.
It would seem you are maintaining at the same time:
I think there's a misunderstand here. What I'm saying is that we should be aware that whitespace isn't significant in HTML in the way we were treating it and we should look for an approach that doesn't depend on a specific whitespace formatting. We have very few guarantees about how it will reach our code.
When moving to the more robust RegExp there was simply another point I wanted to raise about the risk .* introduces and how it can cause our matcher to accidentally consume significantly more than we expect it to (and perform much slower while doing it!)
It's unfortunate we have more cases where we've already introduced these risks but I believe that's a good enough reason to be on alert to help spot these before they get merged. RegExp patterns can be hard enough, let alone with the complexity of trying to parse HTML…
In summary I think there's several ways to move this forward without throwing much out. Replacing the .* addresses the biggest issue. The point about associating a start tag with its end tag isn't a demand; it's an acknowledgement of how this stuff works unless we build far more complicated patterns (purely due to the recursive nature of HTML).
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the regex again, do you think the span tag matching can be removed? Doesn't seem needed as long as the opening div is matched. Even if there is a possibility to have the div without a span before it, thinking it would be good to insert the featured image before that div. This also removes the need to use .* and match white space between the nodes.
So the regex would be something like: '/<div\b[^>]+wp-block-cover__inner-container(\s|")[^>]*>/U'.
(Note I also changed * to + after the first [^>] as it has to match at least one char.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me the logic of the block is:
the image goes between a span with class "wp-block-cover__background" followed by a div with class "wp-block-cover__inner-container". If you remove one or the other, no image. What if I want to put something inside the span which in the react code is empty? Good luck!
To you and @dmsnell since regexes are problematic for HTML parsing, and for a better maintainability, the php porting of the react logic should be reduced to: the featured image goes before a div with class "wp-block-cover__inner-container". Do what you want before that div and good luck!
Or , dsmell variant: if we want to require both the span and the div, at that point we must require an empty span.
Now, just in case for @draganescu and @dmsnell '/<div\b[^>]+wp-block-cover__inner-container[\s|"][^>]*>/U' by @azaozz is acceptable, and given that
$inner_container_start = // above regex;
if ( 1 !== preg_match( $inner_container_start, $content, $matches, PREG_OFFSET_CAPTURE ) {
// handle this, we can't find our div, maybe another plugin changed it
return $content;
}
list( $fullmatch, $offset ) = $matches[0];
$content = substr( $content, 0, $offset ) . $image_tag . substr( $content, $offset );shuold be acceptable by @dmsnell
I'm sending the 4th PR to trim the coverblock (in a few hours)
(Sorry if a wasted your time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@navigatrum no wasted time at all, quite the reverse it will be a fruitful conversation :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this all seems good, though I should rephrase, as I never wanted to suggest we try and trap the <span>. here's the difference:
- finding the start of the
<div>with a RegExp is normal and doable and not problematic from a parsing standpoint. this is because it isn't parsing HTML at that point. it's only parsing HTML tokens, or lexing the HTML. - trying to identify the end of a specific
<span>before the<div>is parsing HTML and that's the problematic part with the RegExp and that's the kind of thing we want to avoid (and it's almost always the case to be suspicious when we find ourselves adding.*)
seems like a nuance or a small thing, but there's a profound impact. my comments were more like, "if you are going to match the <span> too here's what you should be aware of).
I still don't understand why we don't want to match on class, but that's not a big issue to me since wp-block-cover__inner-container is unlikely to appear outside of where we expect it. note though, this does match a few quirky possibilities:
<div title="Demonstrating how to use the wp-block-cover__inner-container styles"><div class="x-disregard-wp-block-cover__inner-container">
Feel free to ignore matching these, but if you are curious…
"~<div [^>]*\bclass=(\"|')((?!\1).)*\b{$CLASS}\b((?!\1).)*\1( |>)~"
This is almost the same as the one you gave above in your last comment. The main differences are that this:
- specifically looks for the
classattribute - makes sure we're only examining the content inside the
classattribute by matching on the leading quote and trailing quote, ensuring they are the same, and ensuring we don't scan past the matching one with((?!\1).)*though I think we could replace that with the simpler[^'"]*since I think quotes aren't allowed as class names. - requires that the class name we're searching for is not a part of a bigger string through the surrounding
\bboundary markers (that's wrong, should be[\"' ]since the hyphen counts as a word boundary) - while we're not matching a closing HTML tag I'm still using
~instead of/as the PCRE pattern delimiters so we wouldn't have to escape/as\/
with the simplification that would be this instead
"~<div [^>]*\bclass=([\"'])[^\"']*[\"' ]wp-block-cover__inner-container[\"' ][^\"']*\1~"
this is simpler than general HTML attribute parsing because of the rules of CSS class names. hopefully that fulfills your curiosity. feel free to use this or ask questions or ignore it.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the featured image have to be inserted there? Who decided that and for what purpose? Could you please comment to make sure we're on the same page (I'm probably missing some considerations there).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably you didn't read the first line of comment to this PR.
It says:
Fell free to take a look at the history of that file: the who and their purpose cannot be really hidden. My 2¢? A henchman of his. 😄
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@azaozz I decided that for not having a better way - as Gutenberg is missing a token system - to make the cover block get the featured image at render time.
If you look at the history of PRs in which I tried to solve the featured image in cover block problem you'll see there are a few constraints that arise from our lack of a token system which results in this kind of render string operations on the block markup.
It is not the only block where these operations occur and I think it also happens (e.g.
gutenberg_restore_group_inner_container,render_block_core_image,render_block_core_post_featured_image,render_block_core_template_part,render_block_core_site_logo).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, my bad, should have explained this better. Was thinking that the way this works may have been simpler if the featured image was appended to the end of the block's HTML. The difference in the featured image's node position would have probably been pretty easy to account for with a little of CSS. Even maybe that could have been preferred in some cases as would have made it possible to determine when an image was inserted there.