Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: stats and keyspace notifications about lazy expiration #1547

Open
wants to merge 32 commits into
base: unstable
Choose a base branch
from

Conversation

proost
Copy link

@proost proost commented Jan 12, 2025

resolve: #1507

Copy link

codecov bot commented Jan 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.96%. Comparing base (8af35a1) to head (828550f).
Report is 15 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1547      +/-   ##
============================================
+ Coverage     70.85%   70.96%   +0.11%     
============================================
  Files           120      120              
  Lines         64991    65066      +75     
============================================
+ Hits          46050    46175     +125     
+ Misses        18941    18891      -50     
Files with missing lines Coverage Δ
src/db.c 89.51% <100.00%> (-0.05%) ⬇️
src/server.c 87.61% <100.00%> (+0.04%) ⬆️
src/server.h 100.00% <ø> (ø)

... and 21 files with indirect coverage changes

@proost proost changed the title feat: keyspace expiration misses feat: stats and keyspace notifications about lazy expiration Jan 17, 2025
proost and others added 28 commits January 17, 2025 23:29
This PR replaces dict with hashtable in the ZSET datatype. Instead of
mapping key to score as dict did, the hashtable maps key to a node in
the skiplist, which contains the score. This takes advantage of
hashtable performance improvements and saves 15 bytes per set item - 24
bytes overhead before, 9 bytes after.

Closes valkey-io#1096

---------

Signed-off-by: Rain Valentine <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
Signed-off-by: proost <[email protected]>
Skip logreqres on tests for the HELLO command

Signed-off-by: Rueian <[email protected]>
Signed-off-by: proost <[email protected]>
Resolves issue with valkey-cli not auto exiting from subscribed mode on
reaching zero pub/sub subscription (previously filed on Redis)
redis/redis#12592

---------

Signed-off-by: Nikhil Manglore <[email protected]>
Signed-off-by: proost <[email protected]>
- remove old ignores
- fix a "new" typo 🎁

Signed-off-by: Viktor Szépe <[email protected]>
Signed-off-by: proost <[email protected]>
This PR introduces improvements to the hashtable iterator, implementing
prefetching technique described in the blog post [Unlock One Million RPS
- Part 2](https://valkey.io/blog/unlock-one-million-rps-part2/) . The
changes lay the groundwork for further enhancements in use cases
involving iterators. Future PRs will build upon this foundation to
improve performance and functionality in various iterator-dependent
operations.

In the pursuit of maximizing iterator performance, I conducted a
comprehensive series of experiments. My tests encompassed a wide range
of approaches, including processing multiple bucket indices in parallel,
prefetching the next bucket upon completion of the current one, and
several other timing and quantity variations. Surprisingly, after
rigorous testing and performance analysis, the simplest implementation
presented in this PR consistently outperformed all other more complex
strategies.

## Implementation

Each time we start iterating over a bucket, we prefetch data for future
iterations:

- We prefetch the entries of the next bucket (if it exists).
- We prefetch the structure (but not the entries) of the bucket after
  the next.

This prefetching is done when we pick up a new bucket, increasing the
chance that the data will be in cache by the time we need it.

## Performance

The data below was taken by conducting keys command on 64cores Graviton
3 Amazon EC2 instance with 50 mil keys in size of 100 bytes each. The
results regarding the duration of “keys *” command was taken from “info
all” command.

```
+--------------------+------------------+-----------------------------+
| prefetching        | Time (seconds)   | Keys Processed per Second   |
+--------------------+------------------+-----------------------------+
| No                 | 11.112279        | 4,499,529                   |
| Yes                | 3.141916         | 15,913,862                  |
+--------------------+------------------+-----------------------------+
Improvement:
Comparing the iterator without prefetching to the one with prefetching,
we can see a speed improvement of 11.112279 / 3.141916 ≈ 3.54 times faster.
```

### Save command improvment

#### Setup:
- 64cores Graviton 3 Amazon EC2 instance.
-  50 mil keys in size of 100 bytes each.
-  Running valkey server over RAM file system.
-  crc checksum and comperssion off.

#### Results

```
+--------------------+------------------+-----------------------------+
| prefetching        | Time (seconds)   | Keys Processed per Second   |
+--------------------+------------------+-----------------------------+
| No                 | 28               | 1,785,700                   |
| Yes                | 19.6             | 2,550,000                   |
+--------------------+------------------+-----------------------------+
Improvement:
- Reduced SAVE time by 30% (8.4 seconds faster)
- Increased key processing rate by 42.8% (764,300 more keys/second)
```

Signed-off-by: NadavGigi <[email protected]>
Signed-off-by: proost <[email protected]>
This PR is to cleanup the `SERVER_TEST` compiler flag from cmake compile
definitions, as it is no longer required in the new unit test framework, see valkey-io#428.

Signed-off-by: Karthick Ariyaratnam <[email protected]>
Signed-off-by: proost <[email protected]>
We need to add a hash tag in cluster mode.
Fixes valkey-io#1531.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: proost <[email protected]>
The fix that Redis gave us for the CVE-2024-46981 was freeing lctx.lua,
and I didn't merge it correctly. We made some changes so that we are
able to async free the lua context, so we need to free the passed in
context. This was applied correctly on the two released versions (8.0
and 7.2) just not on unstable.

Signed-off-by: Madelyn Olson <[email protected]>
Signed-off-by: proost <[email protected]>
Fixes valkey-io#1538

Signed-off-by: Harkrishn Patro <[email protected]>
Signed-off-by: proost <[email protected]>
…lkey-io#1535)

In valkey-io#1441, we found a assert, and decided remove this assert and instead
just free the newly created node and close the link, since if we cannot
get the IP from the link it probably means the connection was closed.
```
=== VALKEY BUG REPORT START: Cut & paste starting from here ===
17847:M 19 Dec 2024 00:15:58.021 # === ASSERTION FAILED ===
17847:M 19 Dec 2024 00:15:58.021 # ==> cluster_legacy.c:3252 'nodeIp2String(node->ip, link, hdr->myip) == C_OK' is not true

------ STACK TRACE ------

17847 valkey-server *
src/valkey-server 127.0.0.1:27131 [cluster](clusterProcessPacket+0x1304) [0x4e5634]
src/valkey-server 127.0.0.1:27131 [cluster](clusterReadHandler+0x11e) [0x4e59de]
/__w/valkey/valkey/src/valkey-tls.so(+0x2f1e) [0x7f083983ff1e]
src/valkey-server 127.0.0.1:27131 [cluster](aeMain+0x8a) [0x41afea]
src/valkey-server 127.0.0.1:27131 [cluster](main+0x4d7) [0x40f547]
/lib64/libc.so.6(+0x40c8) [0x7f083985a0c8]
/lib64/libc.so.6(__libc_start_main+0x8b) [0x7f083985a18b]
src/valkey-server 127.0.0.1:27131 [cluster](_start+0x25) [0x410ef5]
```

But it also introduces another assert. The reason is that this new node
is not added to the cluster nodes dict.
```
17128:M 08 Jan 2025 10:51:44.061 # === ASSERTION FAILED ===
17128:M 08 Jan 2025 10:51:44.061 # ==> cluster_legacy.c:1693 'dictDelete(server.cluster->nodes, nodename) == DICT_OK' is not true

------ STACK TRACE ------

17128 valkey-server *
src/valkey-server 127.0.0.1:28627 [cluster][0x4ebdc4]
src/valkey-server 127.0.0.1:28627 [cluster][0x4e81d2]
src/valkey-server 127.0.0.1:28627 [cluster](clusterReadHandler+0x268)[0x4e8618]
/__w/valkey/valkey/src/valkey-tls.so(+0xb278)[0x7f109480b278]
src/valkey-server 127.0.0.1:28627 [cluster](aeMain+0x89)[0x592b09]
src/valkey-server 127.0.0.1:28627 [cluster](main+0x4b3)[0x453e23]
/lib64/libc.so.6(__libc_start_main+0xe5)[0x7f10958bf7e5]
src/valkey-server 127.0.0.1:28627 [cluster](_start+0x2e)[0x454a5e]
```

This closes valkey-io#1527.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: proost <[email protected]>
… is 0 (valkey-io#1541)

When latency-monitor-threshold is set to 0, it means the latency monitor
is disabled, and in VM_LatencyAddSample, we wrote the condition
incorrectly, causing us to record latency when latency was turned off.

This bug was introduced in the very first day, see e3b1d6d, it was merged
in 2019.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: proost <[email protected]>
…flicts (valkey-io#1018)

When multiple primary nodes fail simultaneously, the cluster can not recover
within the default effective time (data_age limit). The main reason is that
the vote is without ranking among multiple replica nodes, which case too many
epoch conflicts.

Therefore, we introduced into ranking based on the failed primary shard-id.
Introduced a new failed_primary_rank var, this var means the rank of this
myself instance in the context of all failed primary list. This var will be
used in failover and we will do the failover election packets in order based
on the rank, this can effectively avoid the voting conflicts.

If a single primary is down, the behavior is the same as before. If multiple
primaries are down, their replica election initiation time will be delayed
by 500ms according to the ranking.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: proost <[email protected]>
… the FAIL (valkey-io#1191)

Imagine we have a cluster, for example a three-shard cluster,
if shard 1 doing a CLUSTER RESET HARD, it will change the node
name, and then other nodes will mark it as NOADR since the node
name received by PONG has changed.

In the eyes of other nodes, there is one working primary node
left but with no address, and in this case, the address report
in MOVED will be invalid and will confuse the clients. And in
the same time, the replica will not failover since its primary
is not in the FAIL state. And the cluster looks OK to everyone.

This leaves a cluster that appears OK, but with no coverage for
shard 1, obviously we should do something like CLUSTER FORGET
to remove the node and fix the cluster before using it.

But the point in here, we can mark the NOADDR node as FAIL to
advance the cluster state. If a node is NOADDR means it does
not have a valid address, so we won't reconnect it, we won't
send PING, we won't gossip it, it seems reasonable to mark it
as FAIL.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: proost <[email protected]>
When the cluster changes, we need to persist the cluster configuration,
and these file IO operations may cause latency.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: proost <[email protected]>
The commands used in valkey-cli tests are not important the reply schema
validation. Skip them to avoid the problem if tests hanging. This has
failed lately in the daily job:

```
[TIMEOUT]: clients state report follows.
sock55fedcc19be0 => (IN PROGRESS) valkey-cli pubsub mode with single standard channel subscription
Killing still running Valkey server 33357
```

These test cases use a special valkey-cli command `:get pubsub` command,
which is an internal command to valkey-cli rather than a Valkey server
command. This command hangs when compiled with with logreqres enabled.
Easy solution is to skip the tests in this setup.

The test cases were introduced in valkey-io#1432.

Signed-off-by: Viktor Söderqvist <[email protected]>
Signed-off-by: proost <[email protected]>
After valkey-io#1545 disabled some tests for reply schema validation, we now have
another issue that ECHO is not covered.

```
WARNING! The following commands were not hit at all:
  echo
ERROR! at least one command was not hit by the tests
```

This patch adds a test case for ECHO in the unit/other test suite. I
haven't checked if there are more commands that aren't covered.

Signed-off-by: Viktor Söderqvist <[email protected]>
Signed-off-by: proost <[email protected]>
This PR replaces dict with the new hashtable data structure in the HASH
datatype. There is a new struct for hashtable items which contains a
pointer to value sds string and the embedded key sds string. These
values were previously stored in dictEntry. This structure is kept
opaque so we can easily add small value embedding or other optimizations
in the future.

closes valkey-io#1095

---------

Signed-off-by: Rain Valentine <[email protected]>
Signed-off-by: proost <[email protected]>
In some cases unix groups could have whitespace and/or `\` in them.
One example is my workstation. It's a MacOS in an Active Directory
domain. So my user has group `LD\Domain Users`.
Running `make test` on `unstable` and `8.0` branches fails with:

I'm not sure if we need to fix this in 8.0. But it seems that it should
be fixed in unstable.

Signed-off-by: secwall <[email protected]>
Signed-off-by: proost <[email protected]>
`sds` is a typedef of `char *`.

`const sds` means `char * const`, i.e. a const-pointer to non-const
content.

More often, you would want `const char *`, i.e. a pointer to
const-content. Until now, it's not possible to express that. This PR
adds `const_sds` which is a pointer to const-content sds.

To get a const-pointer to const-content sds, you can use `const
const_sds`.

In this PR, some uses of `const sds` are replaced by `const_sds`. We can
use it more later.

Fixes valkey-io#1542

---------

Signed-off-by: Viktor Söderqvist <[email protected]>
Signed-off-by: proost <[email protected]>
Add `paused_actions` and `paused_timeout_milliseconds` for INFO Clients
to inform users about if clients are paused.

---------

Signed-off-by: zhaozhao.zz <[email protected]>
Signed-off-by: proost <[email protected]>
Adds filter options to CLIENT LIST:

    * USER <username>
      Return clients authenticated by <username>.
    * ADDR <ip:port>
      Return clients connected from the specified address.
    * LADDR <ip:port>
      Return clients connected to the specified local address.
    * SKIPME (YES|NO)
      Exclude the current client from the list (default: no).
    * MAXAGE <maxage>
      Only list connections older than the specified age.

Modifies the ID filter to CLIENT KILL to allow multiple IDs

    * ID <client-id> [<client-id>...]
      Kill connections by client ids.

This makes CLIENT LIST and CLIENT KILL accept the same options.

For backward compatibility, the default value for SKIPME is NO for
CLIENT LIST and YES for CLIENT KILL.

The MAXAGE comes from CLIENT KILL, where it *keeps* clients with the
given max age and kills the older ones. This logic becomes weird for
CLIENT LIST, but is kept for similary with CLIENT KILL, for the use case
of first testing manually using CLIENT LIST, and then running CLIENT
KILL with the same filters.

The `ID client-id [client-id ...]` no longer needs to be the last
filter. The parsing logic determines if an argument is an ID or not
based on whether it can be parsed as an integer or not.

Partly addresses: valkey-io#668

---------

Signed-off-by: Sarthak Aggarwal <[email protected]>
Signed-off-by: proost <[email protected]>
Just like spell-check workflow, we should allow to trigger it
in push events, so that the forks repo can notice the format
thing way before submitting the PR.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: proost <[email protected]>
…d other commands(valkey-io#1517)

Some commands that use unix-time, such as `EXPIREAT` and `SET EXAT`, should include the deleted keys in the `expired_keys` statistics if the specified time has already expired, and notifications should be sent in the manner of expired.

---------

Signed-off-by: Ray Cao <[email protected]>
…1312)

This commit creates a new compilation unit for the scripting engine code
by extracting the existing code from the functions unit.
We're doing this refactor to prepare the code for running the `EVAL`
command using different scripting engines.

This PR has a module API change: we changed the type of error messages
returned by the callback
`ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc` to be a
`ValkeyModuleString` (aka `robj`);

This PR also fixes valkey-io#1470.

---------

Signed-off-by: Ricardo Dias <[email protected]>
Signed-off-by: proost <[email protected]>
hpatro and others added 3 commits January 17, 2025 23:30
…1563)

This issue affected only two message types (CLUSTERMSG_TYPE_PUBLISH and CLUSTERMSG_TYPE_PUBLISHSHARD) because they used a light message header, which caused the CLUSTER INFO stats to miss sent/received message information for those types.

---------

Signed-off-by: Harkrishn Patro <[email protected]>
Signed-off-by: Harkrishn Patro <[email protected]>
Co-authored-by: Binbin <[email protected]>
Signed-off-by: proost <[email protected]>
When processing a cluster bus PING extension, there is a memory leak
when adding a new key to the `nodes_black_list` dict. We now make sure
to free the key `sds` if the dict did not take ownership of it.

Signed-off-by: Pierre Turin <[email protected]>
Signed-off-by: proost <[email protected]>
Update comments and log message in `cluster_legacy.c`.

Follow-up from valkey-io#1441.

Signed-off-by: Pierre Turin <[email protected]>
Co-authored-by: Ping Xie <[email protected]>
Co-authored-by: Binbin <[email protected]>
Signed-off-by: proost <[email protected]>
@proost proost force-pushed the feat-keyspace-expiration-misses branch from e3452dd to e9a9894 Compare January 17, 2025 14:32
…eat-keyspace-expiration-misses

Signed-off-by: proost <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[NEW] keyspace_expiration_misses stats