@cloehle
Contributor

@cloehle cloehle commented Sep 4, 2025

The new scx_bpf_cpu_curr() is queued for v6.18.
Use it instead of the soon-to-be-deprecated scx_bpf_cpu_rq().

Error handling on a NULL return is kept as-is in each scheduler.

This depends on the queued series:
https://lore.kernel.org/lkml/[email protected]/
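For readers unfamiliar with the two kfuncs, the conversion this PR applies can be sketched in plain userspace C. This is a mock, not BPF code: mock_cpu_rq() and mock_cpu_curr() are hypothetical stand-ins for scx_bpf_cpu_rq() and scx_bpf_cpu_curr(), and the structs are cut down to the single field used here. The point is that the NULL check simply moves from the rq pointer to the returned task pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-ins, not the vmlinux.h types or the real
 * kfuncs. PF_IDLE uses the kernel's bit value (0x2) for illustration.
 */
#define PF_IDLE 0x00000002
#define NR_MOCK_CPUS 4

struct task_struct { unsigned int flags; };
struct rq { struct task_struct *curr; };

static struct task_struct mock_tasks[NR_MOCK_CPUS];
static struct rq mock_rqs[NR_MOCK_CPUS];

/* Stand-in for scx_bpf_cpu_rq(): returns the whole runqueue or NULL. */
static struct rq *mock_cpu_rq(int cpu)
{
	if (cpu < 0 || cpu >= NR_MOCK_CPUS)
		return NULL;
	return &mock_rqs[cpu];
}

/* Stand-in for scx_bpf_cpu_curr(): returns only the current task or NULL. */
static struct task_struct *mock_cpu_curr(int cpu)
{
	if (cpu < 0 || cpu >= NR_MOCK_CPUS)
		return NULL;
	return mock_rqs[cpu].curr;
}

/* Before: the scheduler dereferences rq->curr itself. */
static bool is_cpu_idle_old(int cpu)
{
	struct rq *rq = mock_cpu_rq(cpu);

	if (!rq)
		return false;
	return rq->curr->flags & PF_IDLE;
}

/* After: ask for the current task directly; the NULL path is unchanged. */
static bool is_cpu_idle_new(int cpu)
{
	struct task_struct *p = mock_cpu_curr(cpu);

	if (!p)
		return false;
	return p->flags & PF_IDLE;
}

/* Test fixture: CPU 0 runs an idle task, every other CPU a busy one. */
static void mock_setup(void)
{
	for (int i = 0; i < NR_MOCK_CPUS; i++) {
		mock_tasks[i].flags = (i == 0) ? PF_IDLE : 0;
		mock_rqs[i].curr = &mock_tasks[i];
	}
}
```

Both variants give the same answer; the new one just never hands the whole runqueue to the scheduler.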

Contributor

@htejun htejun left a comment

Oh, wait, you need to port the compat macro too.

 static inline bool is_cpu_idle(s32 cpu)
 {
-	struct rq *rq = scx_bpf_cpu_rq(cpu);
+	struct task_struct *p = __COMPAT_scx_bpf_cpu_curr(cpu);
Contributor

@arighi arighi Sep 4, 2025

We need to add __COMPAT_scx_bpf_cpu_curr() https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git/tree/tools/sched_ext/include/scx/compat.bpf.h?h=for-6.18#n238 to scheds/include/scx/compat.bpf.h or this won't build.
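Here is a rough userspace model of what such a compat helper does. It assumes the helper prefers scx_bpf_cpu_curr() when the running kernel provides it and otherwise falls back to reading rq->curr through scx_bpf_cpu_rq(). In BPF the availability check would be bpf_ksym_exists() on the __weak __ksym declaration; here a NULL function pointer plays that role. compat_cpu_curr(), old_cpu_rq() and new_cpu_curr are hypothetical names, and this is not the code in compat.bpf.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct task_struct { int pid; };
struct rq { struct task_struct *curr; };

static struct task_struct running = { .pid = 42 };
static struct rq cpu0_rq = { .curr = &running };

/* Old interface: available on every kernel in this mock. */
static struct rq *old_cpu_rq(int cpu)
{
	return cpu == 0 ? &cpu0_rq : NULL;
}

/* Implementation a newer kernel would provide. */
static struct task_struct *new_cpu_curr_impl(int cpu)
{
	return cpu == 0 ? cpu0_rq.curr : NULL;
}

/*
 * Models a __weak __ksym: NULL when the running kernel lacks the kfunc,
 * which is the case bpf_ksym_exists() reports as missing.
 */
static struct task_struct *(*new_cpu_curr)(int cpu);

/* Compat wrapper: prefer the new kfunc, otherwise fall back to rq->curr. */
static struct task_struct *compat_cpu_curr(int cpu)
{
	struct rq *rq;

	if (new_cpu_curr)
		return new_cpu_curr(cpu);

	rq = old_cpu_rq(cpu);
	return rq ? rq->curr : NULL;
}
```

Keeping the fallback inside one wrapper lets every scheduler call a single function regardless of kernel version, and callers only ever need one NULL check.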

Contributor Author

@cloehle cloehle Sep 4, 2025

Ah, I thought these were being pulled in from the kernel source (at some point)?
Do we update compat.bpf.h and common.h separately for the kernel and the scx repo?

Edit: Never mind, I misunderstood the process. I've now added the two commits (definitions and Andrea's compat helper) here, so this PR compiles and runs self-sufficiently as long as the kernel includes the mentioned series.

 		return false;
 	}
-	return rq->curr->flags & PF_IDLE;
+	return !!(p->flags & PF_IDLE);
Contributor

nit: we probably don't need !! here, since the result would be cast to bool anyway.
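The nit holds because is_cpu_idle() returns bool. In C, converting any nonzero scalar to bool yields true (1), so the flag test is already normalized without !!. A small standalone check follows. It assumes PF_IDLE has the kernel's value of 0x2, and the function names are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>

#define PF_IDLE 0x00000002 /* assumed: same bit value as the kernel's PF_IDLE */

/* Returning through bool already maps any nonzero result to true (1). */
static bool idle_without_bang(unsigned int flags)
{
	return flags & PF_IDLE;
}

/* Equivalent result; the !! is redundant when the return type is bool. */
static bool idle_with_bang(unsigned int flags)
{
	return !!(flags & PF_IDLE);
}

/* Only an int return type would leak the raw bit value (2 here). */
static int idle_raw(unsigned int flags)
{
	return flags & PF_IDLE;
}
```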

@cloehle cloehle force-pushed the cloehle/scx_bpf_cpu_rq-compat branch from 8d41e53 to 11bb34c September 4, 2025 22:50
@multics69
Contributor

Overall LGTM.

@cloehle cloehle force-pushed the cloehle/scx_bpf_cpu_rq-compat branch from 11bb34c to 9deff4b September 4, 2025 23:32
@arighi arighi added this pull request to the merge queue Sep 6, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 6, 2025
Contributor

@arighi arighi left a comment

Hm... I'm a bit confused.

We have:

BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_RET_NULL | KF_RCU)

I thought the verifier would complain if we try to use scx_bpf_cpu_curr() outside of a bpf_rcu_read_lock/unlock() section, which is exactly what we're doing here. But the verifier is still happy and the kernel is (correctly) complaining:

[    5.938735] =============================
[    5.939878] WARNING: suspicious RCU usage
[    5.941002] sched_ext: BPF scheduler "cosmos_1.0.2_gd0e71ca_x86_64_unknown_linux_gnu_debug" enabled
[    5.943254] 6.17.0-rc1 #1-NixOS Not tainted
[    5.944477] -----------------------------
[    5.945636] kernel/sched/ext.c:6415 suspicious rcu_dereference_check() usage!

So we can fix this PR adding the proper bpf_rcu_read_lock/unlock() around __COMPAT_scx_bpf_cpu_curr(), but I'd like to understand why the verifier is happy.

@etsal do you have any idea? Thanks.
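Here is a userspace model of the fix proposed above, which wraps the call in an RCU read-side section. Everything in it is hypothetical: rcu_depth stands in for the nesting maintained by bpf_rcu_read_lock()/bpf_rcu_read_unlock(), and rcu_warnings counts what the kernel would report as suspicious RCU usage, as in the log above. It models only the runtime requirement. It does not explain why the verifier accepted the unlocked call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_IDLE 0x00000002 /* assumed kernel value, for illustration */

struct task_struct { unsigned int flags; };

static struct task_struct cpu_task = { .flags = PF_IDLE };
static int rcu_depth;    /* mock RCU read-side nesting level */
static int rcu_warnings; /* mock "suspicious RCU usage" reports */

static void mock_rcu_read_lock(void) { rcu_depth++; }
static void mock_rcu_read_unlock(void) { rcu_depth--; }

/* Mock KF_RCU-style kfunc: the returned pointer is only valid under RCU. */
static struct task_struct *mock_cpu_curr(int cpu)
{
	if (rcu_depth == 0)
		rcu_warnings++; /* analogous to the rcu_dereference_check() splat */
	return cpu == 0 ? &cpu_task : NULL;
}

/* As in the PR before the fix: no read-side section around the call. */
static bool is_cpu_idle_unlocked(int cpu)
{
	struct task_struct *p = mock_cpu_curr(cpu);

	return p && (p->flags & PF_IDLE);
}

/* Fixed: lock, use p only inside the section, then unlock. */
static bool is_cpu_idle_locked(int cpu)
{
	struct task_struct *p;
	bool idle;

	mock_rcu_read_lock();
	p = mock_cpu_curr(cpu);
	idle = p && (p->flags & PF_IDLE);
	mock_rcu_read_unlock();

	return idle;
}
```

Reading p->flags before the unlock is the important part: the task pointer is only guaranteed to stay valid inside the read-side section.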

Contributor

@multics69 multics69 left a comment

Thanks! It looks good to me.

@cloehle cloehle force-pushed the cloehle/scx_bpf_cpu_rq-compat branch 2 times, most recently from eb700fc to 5cd7c5f September 7, 2025 21:12
@arighi
Contributor

arighi commented Oct 28, 2025

Hi @cloehle, we have all the required pieces upstream now, so I think we should definitely merge this. If you have time, can you update this PR and check that everything is still up to date? Thanks!

@cloehle cloehle force-pushed the cloehle/scx_bpf_cpu_rq-compat branch from 5cd7c5f to 9de9b92 October 29, 2025 11:40
@cloehle
Contributor Author

cloehle commented Oct 29, 2025

Andrea, the updated PR should be correct now, although for the compat layer to actually work (i.e. detect scx_bpf_cpu_curr()) we need the new vmlinux-v6.18.h. Should I add it?

@arighi
Contributor

arighi commented Oct 30, 2025

@etsal was looking at updating the vmlinux files, maybe sync with him?

@etsal
Contributor

etsal commented Oct 30, 2025

Yeah, I'll be regenerating vmlinux.h to include everything up to 6.18 and will push the diff today or tomorrow.

@cloehle
Contributor Author

cloehle commented Nov 3, 2025

Any update on this, @etsal?
FWIW, for testing it can obviously be hacked in with something like:

--- a/scheds/vmlinux/arch/arm64/vmlinux-v6.16-g038d61fd6422.h
+++ b/scheds/vmlinux/arch/arm64/vmlinux-v6.16-g038d61fd6422.h
@@ -228459,6 +228459,7 @@ extern const struct cpumask *scx_bpf_get_online_cpumask(void) __weak __ksym;
 extern const struct cpumask *scx_bpf_get_possible_cpumask(void) __weak __ksym;
 extern void scx_bpf_kick_cpu(s32 cpu, u64 flags) __weak __ksym;
 extern u64 scx_bpf_now(void) __weak __ksym;
+extern struct task_struct *scx_bpf_cpu_curr(s32 cpu) __weak __ksym;
 extern u32 scx_bpf_nr_cpu_ids(void) __weak __ksym;
 extern u32 scx_bpf_nr_node_ids(void) __weak __ksym;
 extern s32 scx_bpf_pick_any_cpu(const struct cpumask *cpus_allowed, u64 flags) __weak __ksym;

I tested the schedulers I've touched, although some of them never seemed to call scx_bpf_cpu_rq() / __COMPAT_scx_bpf_cpu_curr() in my testing!

@etsal
Contributor

etsal commented Nov 3, 2025

Sorry for the delay, updating right now.

I'm hitting an issue with our scripts not generating vmlinux.h properly for the arm target on Fedora, so I'll just upload the other archs for now.

The new functions provide a safer way to access an rq or a remote
rq->curr, respectively. Both are part of v6.18.

See kernel commits:
e0ca169638be ("sched_ext: Introduce scx_bpf_locked_rq()")
20b158094a1a ("sched_ext: Introduce scx_bpf_cpu_curr()")

Signed-off-by: Christian Loehle <[email protected]>
Introduce a compatibility helper that allows BPF schedulers to use
scx_bpf_cpu_curr() on older kernels.

See kernel commit:
20b158094a1a ("sched_ext: Introduce scx_bpf_cpu_curr()")

Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Christian Loehle <[email protected]>
@cloehle cloehle force-pushed the cloehle/scx_bpf_cpu_rq-compat branch from 9de9b92 to e3e914b November 4, 2025 09:26
@cloehle
Contributor Author

cloehle commented Nov 4, 2025

Thank you, Emil!
I've rebased the patches and retested them; everything should be good.
Note that in my testing I couldn't actually reach the compat-header scx_bpf_cpu_curr call in flash or layered, so only lavd, tickless, and cosmos are really tested.

@mati865
Contributor

mati865 commented Nov 5, 2025

#2734 (comment) (the one about !!) still hasn't been addressed or replied to.

@cloehle
Contributor Author

cloehle commented Nov 5, 2025

@mati865 Oops, sorry about that, I'll fix it right away.

Use the new scx_bpf_cpu_curr() introduced in v6.18 as a safer way to
access rq->curr instead of the deprecated scx_bpf_cpu_rq().

Signed-off-by: Christian Loehle <[email protected]>
@cloehle cloehle force-pushed the cloehle/scx_bpf_cpu_rq-compat branch from e3e914b to 6a9363a November 5, 2025 08:42
@cloehle
Contributor Author

cloehle commented Nov 14, 2025

Gentle ping on this!

Contributor

@arighi arighi left a comment

For my parts this looks good; I think we can merge it.

@multics69 @hodgesds what do you think about the changes to lavd and layered?
