-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary data converted? #454
Comments
V8JS_ZSTR uses v8::String::NewFromUtf8, so I think that is the problem. |
Hmmm found something interesting... |
@chrisbckr This one is an old one but still a thorn in my side 😂 It's possibly a little down to my lack of knowledge around JS, too. Sounds like there's something in what you've found!! Very promising. I'm just not sure whether it's my approach that's wrong, or whether there is something already suited to doing this. |
Need more tests (and improvements), but... case IS_STRING:
value_str = Z_STR_P(value);
if (ZSTR_LEN(value_str) > std::numeric_limits<int>::max()) {
zend_throw_exception(php_ce_v8js_exception,
"String exceeds maximum string length", 0);
break;
}
zval fname, retval;
zval params[2];
ZVAL_STRING(&fname, "mb_check_encoding");
ZVAL_COPY_VALUE(¶ms[0], value);
ZVAL_STRING(¶ms[1], "UTF-8");
if ((SUCCESS == call_user_function(CG(function_table), NULL, &fname,
&retval, 2, params)) && (Z_TYPE(retval) == IS_TRUE)) {
jsValue = V8JS_ZSTR(value_str);
}
else {
jsValue = v8::String::NewFromOneByte(isolate, (unsigned char *)ZSTR_VAL(value_str), v8::NewStringType::kNormal, ZSTR_LEN(value_str)).ToLocalChecked();
}
break; And at the end of Makefile.frag
@stesie do you know any side effect of using NewFromOneByte instead of NewFromUtf8 in this case? |
And maybe we can enable this checking with a flag or something like this. |
1 similar comment
And maybe we can enable this checking with a flag or something like this. |
It feels like this is probably the safest option if the PR is a go, but I’ll be quite happy to give this one a good test! (The OP is me from my other account) |
i've been playing around with this test script, along with a PNG <?php
class Foo extends \V8Js
{
private static $file = __DIR__ . '/test.png';
public function read(): string
{
return file_get_contents(self::$file);
}
public function write(string $content)
{
$oldContent = file_get_contents(self::$file);
echo ($oldContent === $content) ? "Files are the same!" : "Files are different :(";
echo "\n" . strlen($oldContent) . ' vs ' . strlen($content) . "\n"; // 6339 vs 9362 :(
echo mb_detect_encoding($content) . "\n"; // utf-8 :(
}
}
$v = new Foo();
$result = $v->executeString('
const content = PHP.read();
print(typeof content + "\n"); // outputs: string
print(content.length + "\n"); // same as what the actual filesize is :)
PHP.write(content);
'); Things work great on read() !! Just struggling with the changes needed around |
Good point. I will investigate this. |
@chrisbckr ahh man, turns out what I was trying to do wasn’t far off ! Sure, will have a play in a little bit when I get a little time! |
Do you know if it’s recursive? e.g. if I have a PHP array and perhaps one string is regular (like a filename) and the other is binary (like the content), does it iterate into the array calling these conversions? Edit: Had a quick play, and it certainly does handle strings inside of arrays etc, too! My crude test (I used force array flag to keep it as an array going both ways, else it's a V8Object): <?php
class Foo extends \V8Js
{
private static $file = __DIR__ . '/test.png';
public function read(): string
{
echo "File: " . self::$file . "\n";
return file_get_contents(self::$file);
}
public function readArray(): array
{
return [
'file' => self::$file,
'content' => $this->read()
];
}
public function write(string $content)
{
$oldContent = file_get_contents(self::$file);
echo ($oldContent === $content) ? "Files are the same!" : "Files are different :(";
echo "\n" . strlen($oldContent) . ' vs ' . strlen($content) . "\n";
echo mb_detect_encoding($content) . "\n"; // utf-8 :(
}
public function writeArray($array)
{
$this->write($array['content']);
echo (strlen($array['file']) === strlen(self::$file) ? "YES!" : "NO") . "\n";
}
public function outLen(string $str)
{
echo "str is " . strlen($str) . "\n";
}
}
$v = new Foo();
$result = $v->executeString('
const content = PHP.read();
PHP.write(content);
const arr = PHP.readArray();
PHP.writeArray(arr);
PHP.outLen(arr.file);
PHP.outLen(arr.content);
', null, V8Js::FLAG_FORCE_ARRAY); |
Just added a flag V8Js::FLAG_CHECK_BINARY_STRING to enable this feature(?) hehe |
Awesome! One thing i picked up on testing (as mentioned above, in having to use the FORCE_ARRAY flag - and probably another topic/issue but more of a passing thought for now hehe) - is it possible that when an array (associative or otherwise) is passed FROM php to JS, it can also maintain its type when passed back? e.g. if i do this without force array flag: class Foo extends V8Js
{
public function getArray()
{
return ['foo' => 'bar'];
}
public function setArray(array $array)
{
}
}
$v = new Foo();
$v->executeString('
const arr = PHP.getArray();
PHP.setArray(arr);
'); Then you get:
I get why returning regular objects is useful, and it probably opens up a can of worms as to WHEN you might want a V8Object or an array (like, what if you constructed an "associative array" in JS, and passed it - how would you know whether to cast it to an array for PHP, or keep it as a V8Object? etc) |
Otherwise - I love that this one could potentially solve a lot of headaches here! Awesome stuff :) |
Hej, sorry for the kinda late reply, since you've already started coding. While I like the straight-forward solution and agree that it solves the raised issue, ... I still wonder if it's the right way to do it or the idiomatic solution. To generalize further ... In plain PHP you usually handle binary data in strings, since PHP doesn't care about the encoding at all. Which however has its limitations, since e.g. Contrary JavaScript land: strings "usually" are used for "real strings" only. Binary data is idiomatically handled via So far php-v8js bridges PHP strings to JS strings just fine as long as you stick to the de facto "standard" of encoding strings with utf-8. And this has been fine for quite a while. Hence from an idiomatic standpoint I think it'd be better that we'd allow to export a PHP string (i.e. zend_string) to V8 as a Uint8Array. We could even make it an external Uint8Array so data wouldn't have to be copied in memory from PHP to V8 (and likely vice versa) -- so as a benefit it'd also be suitable to pass around larger data sets efficiently. And in contiunuation of the UTF-16 example above, if we'd provide the plain ArrayBuffer to JS land, a v8js user could simply do something like From PHP side this could look like this: $v8 = new V8Js();
$binaryBlob = \file_get_contents('blob.bin');
$temp = new V8Uint8Array($binaryBlob); // this wraps the string
// (and ref-counts it, so it remains if user does "$binaryBlob = null")
// ... and it's also a marker for V8Js itself
$v8->arrayBufferInstance = $temp; // make ArrayBuffer available to JS land (`zval_to_v8js` detects it) ... that said, I've just tried passing back a Uint8Array from JS to PHP and it crashes (on my MacBook, haven't tried with Linux):
Aside from that, and contrary to the We could offer a $v8 = new V8Js();
$binaryBlob = \file_get_contents('blob.bin');
$v8->arrayBufferInstance = $v8->makeUint8Array($binaryBlob);
$v8->executeString('var_dump(PHP.arrayBufferInstance instanceof Uint8Array)'); // true Let me know what you think of this alternative approach. Looking forward to your replies. |
I had the same when playing with ArrayObject, Uint8Array, etc on Linux. Passing them to PHP segfaults. Anyways - if the coded approach isn't "JS-like" then any solutions would be welcome! I the thing I'd need to work out would be how to handle a PHP V8Js extension function potentially returning different things with different prototypes. e.g. if I had a function called
In the solution @chrisbckr produced, it was possible to get a binary (or other) string, do whatever with it, and pass the binary strings back to PHP completely in-tact to a function typehinted as a string. I don't know what the difference would be from his I think though, if it getting the
Then perhaps 99% of our issues are solved. |
@stesie I agree. My proposed solution keeps what PHP do with binary strings: it's not reliable and can (and probably will) give headaches some time in the future. Just need Indeed using Uint8Array will be the correct way to guarantee that the binary data will be 100% accurate all the time. And the control (and responsability) will be with the user, if something goes wrong, he/she knows where the problem is and how to fix. And having Uint8Array as a tools, this will be easy and reliable. Personally I don't care to have a little more lines of code and assure that my data will not corrupt. But I just didn't get how to use V8Object for this? Will create a property to set that it has a Uint8Array data? And maybe use this in the future for other cases? |
And in fact PHP has the flag |
@redbullmarky If V8Js had a "makeUint8Array" you could create a wrapper (maybe with __get) to return Uint8Array for some functions class V8Env extends \V8Js {
public function getUint8Array($fn, ...$args) {
return $this->makeUint8Array($fn(...$args));
}
}
$v8 = new V8Env;
$v8->file_get_contents = fn(...$args) => $v8->getUint8Array('file_get_contents', ...$args);
$v8->executeString('var_dump(PHP.file_get_contents("binaryFile.bin"));'); or class V8Env extends \V8Js {
public function makeUint8Array($bla) {
// just to simulate, since we really don't have this method yet
return $bla;
}
}
class Uint8Magic {
// here you can list what methods you will wrap to return Uint8Array
private $methodsWrapper = [
'file_get_contents'
];
private $v8;
public function __construct($v8) {
$this->v8 = $v8;
foreach ($this->methodsWrapper as $method) {
$v8->{$method} = $this->{$method};
}
}
public function __get($name) {
return fn(...$args) => $this->v8->makeUint8Array($name(...$args));
}
}
$v8 = new V8Env;
$uint = new Uint8Magic($v8);
$v8->executeString('var_dump(PHP.file_get_contents("binFile.bin"));'); |
Definitely a viable solution! Like I say, as long as there is a mechanism to read, manipulate and write binary contents across the PHP/JS boundary, all is good! Much of the inner workings can easily be abstracted away from the user and just..work! |
I'm not sure if it's possible. It's just an idea I didn't want to rule out ... Before thinking of how to pass Uint8Array (and others) from PHP to JS we (maybe?) should have a look at the JS -> PHP way first. I haven't yet found the time to investigate why it immediately crashes if you return a JS-side created It feels a bit strange to me, I would have expected that php-v8js just wraps it as a So first thing I'd strive for is keeping it from crashing and passing an instance from JS to PHP and back (without PHP being able to access the content). Then we could consider how PHP may get access to those TypedArray instances. Likely you can then already invoke To provide such a mechanism we might have a new V8ArrayBuffer (extending V8Object) that has a If the above works we can think of
So if you $v8 = new V8Js();
$blob = file_get_contents('blob.bin');
$buf = $v8->makeArrayBuffer( $blob ); ... you'll have a So JS side can then create a TypedArray view on it $v8->executeString('
const u8 = new Uint8Array( PHP.buf );
const result = u8.find/slice/replace( ... ) // do whatever
result; // pass back result, possible a UInt8Array (or an ArrayBuffer, or ...)
'); The more I think of it I feel like leaving all TypedArray instances opaque to PHP, i.e. pass them around with Hope that makes my thoughts a little clearer. Feel free to ask further questions if anything remains unclear. Upcoming week will be pretty busy. I won't be able to code anything before next weekend. But likely will find some time to possibly discuss this further |
It seems this is where the segfault occurs, though at this point I don't have many other details: https://github.com/phpv8/v8js/blob/php8/v8js_object_export.cc#L149 Z_ADDREF_P(&fci.params[i]); |
Yes, just tested with the same code that @stesie tested. $v8 = new V8Js();
echo "1" . PHP_EOL;
$a = $v8->executeString(' (new Uint8Array([65,66,67])); ');
var_dump($a); In this case it uses it calls
Then I changed the validation to
old school debugging... remembering when I was 11 using MS-DOS 6.22 and playing with QuickBasic haha |
What the heck!?
|
https://www.mail-archive.com/[email protected]/msg07066.html |
So for the moment it should suffice if we add And then let's see how to the actual ArrayBuffer <-> PHP integration |
Hey
Not sure whether this is a bug or a hole in knowledge...
I have a PHP function invoked from V8Js which returns raw binary data. The idea being that a user writing JS scripts in their sandbox can use this data with another V8Js-invoked PHP function to send it to an API.
The issue is the file appears to undergo a bit of conversion and i'm not sure of the best one-size-fits-all approach that'll work whether the file is binary or not. Ideally, no conversion at all would be best :)
We'll call the function called from JS
PHP.get_file()
for simplicity and assume that it does a basic file_get_contents() in PHP, returned directly.The only thing that worked in this case after randomly trying stuff was, instead of returning the value directly, doing a quick conversion first:
which seems a little odd, given that PHP tells me that the result of the mb_convert_encoding is 46013 but the variable in JS it's directly assigned to now reports it as 32404....the correct number. mb_detect_encoding at this stage gives me nothing, so i hard-coded "ASCII" for testing.
Until i pass the binary data back into another PHP function from my JS. Now it's back to 46013 and a quick mb_convert_encoding the other way (ASCII from UTF-8) gives me back my original 32404 file. mb_detect_encoding before that gives me UTF-8
Encodings give me headaches on good days, so apologies if that was a long-winded way to explain a simple thing. I'm hoping that there is a way this can be resolved WITHOUT the JS code writer having to change their stuff.
Is this a bug? Or something that's easily solvable? Hoping that there's:
The text was updated successfully, but these errors were encountered: