Skip to content

Fix - freeze when reading stdout from external process #268

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 25, 2019

Conversation

grungy-ado
Copy link
Contributor

In one project I encounter with random freezing behavior when reading stdout from spawned external process. I start to suspect uvloop because using pure asyncio loop does not cause this problem.
Read from stdout was provided by external library that spawn new process using low level popen() call that seems to create new process by forking parent process. Debugger reveals that forked child process is hanged by trying to acquire GIL (that was strange because external process has nothing to do with python). Uvloop seems to register __atfork_child interceptor that add some logic in forking behavior that require GIL to be acquired. That make sense when fork is provided by uvloop, but not so much when fork is provided by anybody else.

I manage to replicate the problem in test case and hopefully fix it.

@pvizeli
Copy link

pvizeli commented Sep 23, 2019

Should also fix the issue of #118

@grungy-ado
Copy link
Contributor Author

grungy-ado commented Sep 25, 2019

Should also fix the issue of #118

If you are willing to provide some test case I can look at it. Probably in separate PR.

/* Auxiliary function to call global fork handler if defined.

Note: Fork handler needs to be in C (not cython) otherwise it would require
GIL to be present, but some forks can exec non-python processes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow... The old handler had a nogil decoration, i.e. cdef void __atfork_child() nogil:. How's this indirection makes that different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is small but significant difference in case you compile it with cython and nogil traces enabled (-DCYTHON_TRACE=1 -DCYTHON_TRACE_NOGIL=1).
In this mode cython will insert trace calls during compilation and resulting compiled __atfork_child function will contain __Pyx_TraceCall. It is not a problem when fork contains python runtime, but it will freeze at this trace call in case forking child does not contains python runtime.

Traces are enabled in make debug, so I decide to write it in pure C where no such traces are inserted by cython.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so that I understand the scope: this PR is only needed to unbreak make debug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, on the contrary, this PR should fix freezing uvloop (under some conditions) and freezing occurs even when compiled with pure make (without debug), but for a different reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'll review the PR soon.

In the meantime, could you please squash all commits into 1 and write a commit message that explains the bug/problem in a detailed way? It would help me / future maintainers a lot.

Huge props for coming up with a functional test case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I squash it. Give me a note if there is something not clear enough.

@1st1 1st1 requested a review from elprans October 21, 2019 17:32
Copy link
Member

@elprans elprans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand why this change would fix the GIL issue. You are still calling the original __get_fork_handler which still grabs the GIL. The fact that you've wrapped the call into a C function shouldn't change that. What would make sense is to avoid installing __atfork_child globally, and only for the period when the loop is doing the fork.

@grungy-ado
Copy link
Contributor Author

I don't fully understand why this change would fix the GIL issue. You are still calling the original __get_fork_handler which still grabs the GIL. The fact that you've wrapped the call into a C function shouldn't change that.

The fact that I've wrapped the call into a C function is the crucial part because decision if __get_fork_handler should be called or not (uvloop/includes/fork_handler.h:12) is in C function where no GIL is even present. That means that if forking is done by uvloop it is save to acquire GIL and call __get_fork_handler. On the other hand if forking is NOT provided by uvloop this condition is false, no python/cython function handler is called and child process continues uninterrupted.

In current implementation this decision is located inside cython function __atfork_child that always assume some python runtime is present and so before it make this crucial decision it tries to acquire GIL ( uvloop/loop.pyx:3125). But there are situations where you can fork and execute child process that has no python (means no GIL) as shown in provided test case (tests/test_process_spawning.py) where /bin/echo is being forked and freeze because trying to call __atfork_child without any python runtime that can acquire GIL. You can notice this if you run provided test case in gdb.

Now you may assume that solution can be as simple as mark __atfork_child as nogil and remove all GIL acquisitions, but that is not true either because if you use cython function and compile it in debug mode, cython will insert trace calls in each cdef function. In this solution you get into the same situation as before because inserted trace calls also assume that there is python runtime and sometimes also acquire GIL.

I hope my description is now more clear why this implementation should avoid using any cython function as a fork handler and why C function is much better choice.

What would make sense is to avoid installing __atfork_child globally, and only for the period when the loop is doing the fork.

I totally agree but currently pthread_atfork system call is used to register handler globally and there is no way (known to me) how to remove this handler. So the solution is to use fork handler to decide when loop is doing the fork. If you know about some other suitable system call that is capable to add and remove fork handler let me know and I will change it.

…ork and execute non-python process

Problem:
Uvloop for each loop register `atFork` handler that is called after fork is executed by forked child.
This handler works fine when fork was invoked by uvloop. In case fork is invoked by something else (such as external library)
uvloop freeze in this handler because:
- GIL is acquired inside `atFork` handler -> in case forked child does not contain python runtime `atFork` handler freeze at obtaining GIL
- when compiled in debug mode (`make debug`) cython trace calls are inserted inside compiled `atFork` handler -> in case forked child does not contain python runtime `atFork` handler freeze at providing trace call

Solution:
This fix solve described problems by implementing `atFork` handler as C function so that forked child can call it safely whether or not contains python runtime.
@1st1
Copy link
Member

1st1 commented Oct 22, 2019

@grungy-ado I'm trying to run the unit test you added with unmodified uvloop on Linux and macOS and it doesn't fail. Any advice on how to make sure that the unit test actually works?

@grungy-ado
Copy link
Contributor Author

@1st1 That's strange because it fails for me on my ubuntu 18.04LTS machine. It even failed in travis when I push only the test case #283 : https://travis-ci.org/MagicStack/uvloop/builds/601677156?utm_source=github_status&utm_medium=notification . Though it seems like linux issue only. There is no fail for OSX.

@pvizeli
Copy link

pvizeli commented Oct 23, 2019

We had this issue also only on linux

@elprans
Copy link
Member

elprans commented Oct 23, 2019

there is no way (known to me) how to remove this handler

You're right, there seems to be no (portable) way to do this.

decision if __get_fork_handler should be called or not (uvloop/includes/fork_handler.h:12) is in C function where no GIL is even present

Yes, I see it now. I misread the diff initially. This makes sense, although it strikes me that those __forking_ globals should really be thread-local.

That's strange because it fails for me on my ubuntu 18.04LTS machine

I can reproduce on Ubuntu 18.04 and Debian Buster, but not on other distros like Gentoo or Alpine or even Ubuntu 19.10, so this definitely also depends on the version of libc.

@pvizeli
Copy link

pvizeli commented Oct 24, 2019

but not on other distros like Gentoo or Alpine

It's possible if you run ffmpeg and also only random on Alpine.

@1st1 1st1 merged commit fde5d14 into MagicStack:master Oct 25, 2019
@1st1
Copy link
Member

1st1 commented Oct 25, 2019

Thanks so much for your patience and help, @grungy-ado. Merged. I'll do some cosmetic changes & refactorings but your approach is good.

@grungy-ado
Copy link
Contributor Author

Great, thanks all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants