-
-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A different approach #105
Comments
I would be okay with changing approaches, lets just make sure to benchmark it and confirm that it won't break existing tests. |
Great! I've continued to work out the details, and it still looks promising. I faced a number of problems but they could be resolved. After the weekend I hope to be able to present a working version that passes the tests. I understand how to run the tests. What should I do to benchmark? The problem of removing whitespace is fascinating. It seems to me that to do it right, you would at least need to tokenize and parse the Javascript. But even then you could not just remove all whitespace tokens and serialize the parse tree. There are some very specific rules you need to follow to ensure that the resulting JS code still works. But these rules are not given, they need to be discovered by trial and error. Tokenizing and parsing JS in PHP is out of the question, just because it takes up a lot more time. But without it, you run the real chance to mess things up. For example, I ran across what I will call "the double division problem":
In my code, and your code as well, if I remember correctly, It freaked me out pretty bad. This is just an example of the things that may go wrong without proper tokenization and parsing. But we have to deal with it. What is also troubling me is that the existing JShrink code will probably create output that is just a little different than mine. An extra enter here and there. More to the point: why does JShrink add enters after every function?
If the output differs to much, or the users depend on the exceptions being thrown, consider the possibility of running in two modes: "classic mode" (existing code) and "preg_replace_callback mode" (or something). I see this in several tests, but I have not found a reason for it. They seem to be superfluous. I hesitate to add these enters just to make the tests pass; I would like to know why they are necessary. Finally I don't see the relevance for the IE 10- style conditional comments and the bizarre series of whitespace characters (mongolian vowel separator?) but they don't slow things much to I will put them in. These are my concerns. I hope you can find some time to look at them. |
Hello Robert, I can now present a fully working version. I changed the code many times and tried all sorts of techniques. This latest version is fast and extendable. The code needs a lot more testing on real-world examples. I could use some help with that. I pasted the code below for you to study and try. Let me know what you think of it, and how to proceed. Next thursday I will be away for holidays; I will not be able to respond for 2 weeks. [1] 4x on PHP5.6; 7x on PHP7; 15x on PHP8.0
|
Sorry to have to replace this code so soon; I hope you didn't start reading it. I thought of a better way to handle the comments, that didn't involve markers and allowed the resulting code to be even smaller, and simpler to reason about. This new version consists of 2 phases: in the first phase all non-essential comments are removed. Then, in phase 2, with all nasty comments out of the way, it is very simple to reason about whitespace, because there are no comments polluting it. I restricted whitespace to
|
Okay, I did another rewrite. It occurred to me that most replacements are simple whitespace deletions, and these don't require a callback, a simple replace will do. This version is again faster than the one before. I also added a solution for the double division problem. The check for a valid regexp is extended by matching the characters that follow it. I am off on a holiday now. Won't be able to answer any questions at this time. Hope you will have some time to look at the code and decide whether or not you would like to use it in JShrink, and which new form this will take. I realize this is quite a change to the library. If there's any way I can help to ease the change, I will be happy to help. And I will be available after the integration to fix the bugs that will pop up. Here are the main issues with this change, as I see it
And these are the advantages
|
very nice @garfix thank you for the code. Edit: I think there is an issue with semi colons. You mave have to automatically add semi colons after lines. |
Thanks, @yellow1912 ! Can you give an example of the problem with semicolons? |
Let me know if you need full code, but for example something like this:
You notice that if you minify using the given code, it will generate something like this:
The gerated code will not work because of a missing semi colon:
|
Yes, I understand. This probably the reason why JShrink adds newlines after functions. Thanks for the clear example. I will dive into it. |
Hi @yellow1912 , yes this was a big thing. In Javascript the semantics may depend on newlines, in pretty complicated ways. To tackle this problem, newlines are only removed when this is known to be safe. This is a very important when the code has no semicolons, but it is also present when there are semicolons. For readers new into this, the issue can best be explained with an example:
The newlines between the function and the assignment may be omitted:
But if we write
The newline may not be removed
I changed to code to deal with this.
This new code now passes the tests that require newlines to be present. Tests that are still not passed are:
I have prepared some new tests as well. |
After jQuery, I checked Lodash and Babel. This brought to light a JIT stack overflow problem, which troubled me quite a bit, but which proved to be solvable by using greedy quantifiers. Any problem in the execution of the regular expressions is now reported via a RuntimeException. I changed some of the expressions to have the output look a bit more like the old JShrink output. There remains an obvious difference in the use of newlines after each - and + by JShrink that I don't understand. Would like to have some feedback or further instructions from @tedivm ;)
|
Hello Robert, I worked out the regular-expressions-based approach. It tested it on some libraries and it seems to have reached an acceptable level of completeness and robustness. On PHP 7 it's about 10 times faster than the existing code. On PHP 8 (with JIT enabled!) it's about 5 times faster. It also solves a number of the open issues of the package. As mentioned, it does not pass all tests, and this is intentional. I think it's now up to you to decide if you accept the new code or not. I can imagine that you feel this code is just too different from the old code; or that for some reason you just don't feel comfortable with it. This is fine too. I will just start a new package under my own account and move this code into it. If you like it, but don't trust it enough to swap the new code in completely, we could add the option "mode" ("classic": the existing code, default; or "regexp": the new code) The users may then explicitly set the new "regexp" mode if they lack speed or run into some errors with the existing code. If you just think it's good, I suppose the package needs to jump to version 2. In any case, I would like you to tell me what to do next. It is not clear to me and I need your assistance. |
Since it has been three weeks since I placed my request, I assume you have neither the time nor the inclination to do this integration. And because the proposed code is quite different from the existing one I think it is probably a good idea to start a new package. I will start making preparations. |
@garfix sound good I think. Please make sure to put a link here. |
https://github.com/garfix/js-minify Still working on it. Feedback is welcome. Robert, I would like to add a thank-you-Robert-for-your-work-on-JShrink text in the README, but your license says I have to ask your permission. So if you don't mind you have to say so. Also, I copied some of your tests (changing some of the content and the names). If you don't agree with this, I will rewrite them. |
I have made the first release. It's on Packagist: |
The new package looks awesome! Thanks for your work on it! |
Hi,
I found out about this package because Magento 2 is using it in its build process. The build process is very time consuming, and part this time is spent compressing Javascript files with JShrink. When I looked at the code it occurred to me that a different approach might make it much faster. This approach would be based on the pivotal use of the PHP function preg_replace_callback, which would process the entire Javascript file at once.
Here's a bit of (very incomplete) code I used to test if this would work:
So, basically, the outermost loop is replaced by a single preg_replace_callback. It is much faster because it implements the inner loop with C code (the implementation of preg_replace_callback is written in C), rather than PHP code.
Before I work this out in detail, I was wondering if you are open to this complete rewrite. I am willing to perform the complete rewrite myself, but there are also some points in which the new code will not be backwards compatible with the existing code, like checking if a regular expression is well formed and throwing a RuntimeException. This new code will not do that. Doubtlessly there will be some other minor points that are not completely backwards compatible. So the question is: are you willing to loose some of the compatibility for a signification increase in processing speed?
If you are interested, I would like the help of some beta testers to debug my code. I just had this idea, it doesn't mean I make no mistakes working it out.
Of course I can start a repository in my own domain, but then it would have to start its user base from the start. By rewriting JShrink it may benefit all of your users.
What do you think?
The text was updated successfully, but these errors were encountered: