Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] evalSha not working the same as in Redis #1537

Open
PatNowak opened this issue Jan 9, 2025 · 11 comments
Open

[BUG] evalSha not working the same as in Redis #1537

PatNowak opened this issue Jan 9, 2025 · 11 comments
Assignees
Labels
bug Something isn't working

Comments

@PatNowak
Copy link

PatNowak commented Jan 9, 2025

Describe the bug

We have PHP application using symfony/lock. Version 6.4 works fine, as it's using eval. Version 7.2 and beyond use evalSha under the hood.
It works fine for Redis 6.2, 7.2 and so on. It didn't work well for Valkey 8 (7.2.4 redis version).

Lock::acquire attempt for Redis returns true, for Valkey false.
In the previous version (using eval) it works fine.

Is there any significant difference between evalSha of Valkey and Redis?

Reference: symfony/symfony#59387

To reproduce

I've made a repository that allows you to reproduce it: https://github.com/PatNowak/symfony-valkey-poc.
Long story short:

git clone [email protected]:PatNowak/symfony-valkey-poc.git
cd symfony-valkey-poc
./app.sh valkey

It will by default use symfony/lock in the version 7.2 that use evalSha under the hood.

To downgrade the version please do:

./app.sh 6.4
./app.sh down
./app.sh valkey

You can also check it works fine (before downgrading) on the Redis:

./app.sh down
./app.sh redis

I left the comments in the repository README.md file.

Expected behavior

The evalSha will work the same as in the Redis.

Additional information

I referenced the issue in the Symfony, as first I suspected something on their side. I also tested both PHP redis connectors - phpredis and predis. No difference.

@rjd15372 rjd15372 self-assigned this Jan 9, 2025
@zuiderkwast
Copy link
Contributor

There have been some changes to the script cache over the years. I don't remember exactly when each change happened, but both of changes below were made in Redis when it was still open source, before Valkey was forked, but one the second one was only released in Valkey 8 (and Redis 7.4 probably).

  1. Long ago, scripts used to be replicated and stored in RDB files, so a replica has the same scripts as a primary. This was removed in 7.0(?). I don't know exactly, but there is some discussion about this in Print the number of lua scripts if any when doing check-rdb #1448.
  2. There is a new feature to evict scripts, which protects against clients that create too many scripts. It was added in this commit: ad28d22. Some follow-ups were done in Valkey after that, such as Lua scripts promoted from eval to script load to avoid evict #637.

I don't know if these are reasons for your problems, but in any case, I think you should always handle the NOSCRIPT error by EVASHA and fallback to EVAL or SCRIPT LOAD. The script cache is just a cache. You shouldn't rely on scripts remaining cached forever.

@rjd15372
Copy link
Contributor

rjd15372 commented Jan 9, 2025

TL;DR The problem is caused by a change made to Valkey in https://github.com/valkey-io/valkey/pull/617/files#diff-1abc5651133d108c0c420d9411925373c711133e7748d9e4f4c97d5fb543fdd9L1819-R1819, which changed the error string returned by EVALSHA when the script does not exist.

I've debugged the issue and the problem is indeed in a change made to Valkey, which changed the error string returned by EVALSHA when the script does not exist.

What's happening is that symfony is trying to evaluate a Lua script, and the function that implements the execution of Lua scripts in redis/valkey first tries to run EVALSHA with the sha1 of the script body, and in the case that EVALSHA returns the NOSCRIPT error, it runs SCRIPT LOAD with the script body, and retries to run the EVALSHA command.

To check that EVALSHA returned the NOSCRIPT error, symphony uses the following condition present in https://github.com/symfony/symfony/blob/v7.2.2/src/Symfony/Component/Lock/Store/RedisStore.php:

if (self::NO_SCRIPT_ERROR_MESSAGE !== $e->getMessage()) {
    throw new LockStorageException($err);
}
// otherwise add the script with SCRIPT LOAD

And self::NO_SCRIPT_ERROR_MESSAGE is a constant defined as:

private const NO_SCRIPT_ERROR_MESSAGE = 'NOSCRIPT No matching script. Please use EVAL.';

The problem is that in Valkey 8, the PR #617 changed the error message for the NOSCRIPT error by removing the Please use EVAL. suffix from the original error message, which makes symphony to fail the above check and decide to throw an error instead of adding the script.

So basically we've changed a user-facing error message in 8.0, which is breaking other software that relies on it.

@zuiderkwast should we fix this by re-adding the removed suffix? And then backport it?

@rjd15372 rjd15372 added the bug Something isn't working label Jan 9, 2025
@madolson
Copy link
Member

madolson commented Jan 9, 2025

I think we should. Application compatibility is such a bugger to get right, it seems like we should continue to be as consistent as possible with Redis.

@zuiderkwast
Copy link
Contributor

I guess the reason we removed "Please use EVAL" is that "SCRIPT LOAD" can also be used for loading a script.

It's a small breaking change and it was done in a major release, right? Applications should
only match on the first word of the error message, i.e. "NOSCRIPT", not the verbatim text. I thought that's documented, but now I can only find this on https://valkey.io/topics/protocol/:

The error prefix allows the client to understand the type of error returned by the server without checking the exact error message.

So we can never change any formulation in any error message without considering it a breaking change then? We have replaced "Redis" with "Valkey" in a number of error messages in our first release....

I would hope we could just make the documentation more clear, such that clients shall match only the first uppercase word and MUST NOT match the full text.

@madolson
Copy link
Member

madolson commented Jan 9, 2025

It's a small breaking change and it was done in a major release, right? Applications should
only match on the first word of the error message, i.e. "NOSCRIPT", not the verbatim text. I thought that's documented, but now I can only find this on https://valkey.io/topics/protocol/:

I agree. I think we were within our right to do it. I guess my point is we didn't really need to change this one, the only reason we did it was to be generic. We could have introduced a new error message just for the new command.

@PingXie
Copy link
Member

PingXie commented Jan 10, 2025

"within our right" - yes
"should've checked error prefix" - agreed

That said, I agree with @madolson that it is in our best interests to eliminate migration frictions whenever possible.

@zuiderkwast
Copy link
Contributor

I think you're right. We can't rely on clients only checking the first word of the error. In many cases checking the first word is not enough, because many errors don't have a proper tag. Then the client needs to check the full text. Example: ERR This instance has cluster support disabled.

I agree we should avoid changing error messages. When we do, I think we shall list them as breaking changes from now on. Regarding reverting this particular error message, I don't have a strong opinion.

@johanwilfer
Copy link

johanwilfer commented Jan 10, 2025

We are considering to contribute a PR to Symfony Lock component to make it as robust as possible.

Would it be correct then from your perspective to check if the message begins with "NOSCRIPT " instead of as today check the full message? For all other cases an expection is thrown (like today if it isn't matching exactly).

Would this be a more robust implementation for both Valkey and Redis or do you see problems with that?

@zuiderkwast
Copy link
Contributor

@johanwilfer Yes, we recommended this way to check errors. It is more robust.

@PatNowak
Copy link
Author

Ok, I'll work on the PR next week then. Thanks a lot for your insights!

@PatNowak
Copy link
Author

FYI: The PR for symfony is there: symfony/symfony#59488

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants