-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
At/optimizations #135
base: 1.x
Are you sure you want to change the base?
At/optimizations #135
Conversation
// | ||
//go:linkname CompressCtx ZSTD_compressCCtx | ||
//go:noescape | ||
func CompressCtx(ctx *C.ZSTD_CCtx, dst unsafe.Pointer, dc C.size_t, src unsafe.Pointer, sc C.size_t, level C.int) C.size_t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea 😅 .
I haven't verified this, but if this completely bypasses the runtime's cgo machinery, I could imagine a number of bad things to happen:
- Long GC STW pauses because the runtime is unable to preempt goroutines that are currently in a zstd call.
- The runtime incorrectly trying to do async preemption on a goroutine running C code, which might cause data corruption, crashes, etc.
- All Ps (GOMAXPROCs) calling zstd, causing significant latency on other goroutines.
Generally speaking, I'd says there is no presumption of innocence here. This is code likely to by guilty until we can proof otherwise 🙈.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these specific concerns! I'd definitely like to understand this more - obviously CGO exists for a reason 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not specific to cgo, but this talk explains the P-M-G abstraction in the scheduler and why/how non-go function calls (syscalls/cgo - they are treated mostly the same) require special treatment: https://youtu.be/-K11rY57K7k?t=980&si=wFEQWUPe3NayCjBh
tl;dr: a syscall (or cgo call) might block off-cpu (network, sleep, etc.), so if a lot of goroutine would make such calls, it would cause a lot of wasted CPU. Therefor cgo/syscalls get special treatement by the runtime so they don't end up blocking other goroutines from executing.
I can try to explain his a bit more when I find time, but this might be a good starting point for finding articles on the subject.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just remembered that I had this bookmarked which you'll probably find interesting: https://words.filippo.io/rustgo/
I also realized I forgot to mention one of the most important issues (that might explain the crashes you get in CI) 😅:
C requires large immovable stacks. When using cgo, the Go runtime takes care of switching from a goroutine stack (which is small and movable) to a system stack (which is large and immovable).
Without this switch, you're running C code on a goroutine stack, which will lead to stack overflows and other nasty problems.
So yeah, I think this is likely a dead-end 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't verified this, but if this completely bypasses the runtime's cgo machinery
In addition to the issues you already mentioned, this is also bypassing the ABI translation. Go has an internal calling convention which is different than the C calling convention, and is not guaranteed to be stable across releases.
I can cook up examples that break on my laptop, e.g. make a C function which takes 9 parameters and call it this way. On my arm64 laptop, the 9th argument is junk since the C and Go ABIs use different registers after the 8th argument.
@@ -1,6 +1,7 @@ | |||
package zstd | |||
|
|||
/* | |||
#cgo CFLAGS: -O3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 If we aren't already compiling (our external?) zstd library with -O3, this is probably a good idea. But I don't know if that's the case.
Testing some ideas to reduce the overhead of the Go wrapper:
It looks like linkname may not be reliable (panics in some tests?), and -o 3 may not be relevant with external libzstd?
Benchmark results
M1 max (macbook pro)
Without -o 3
With -o 3