Enable ProactorEventLoop on windows for ipykernel #1469

Open
NewUserHa wants to merge 7 commits into ipython:main from NewUserHa:patch-2

Conversation

@NewUserHa

Contributor

see #1468

@ZupoLlask

Hello @NewUserHa,

Thanks for your work here!

Dear @ianthomas23,

Is it possible to get this merged on the next release?

It is really important for Windows users, as the current solution creates incompatibilities between libraries.

Thank you!

@ianthomas23

Collaborator

@ZupoLlask It is not passing CI, so of course it will not be merged.

@ZupoLlask

@NewUserHa Can you please have a look at this? TY

@NewUserHa

Contributor Author

I have no idea how this occurs. This modification works in previous versions of IPython.

NewUserHa force-pushed the patch-2 branch 3 times, most recently from 37bad2c to 903b031, on February 24, 2026 20:28

@NewUserHa

Contributor Author

The tests/test_debugger.py::test_stop_on_breakpoint failure seems to be related to "IOStream.flush timed out", the pyzmq ioloop, and ProactorEventLoop.
I may need help with it.

@ianthomas23

Collaborator

@NewUserHa There were some problems with the CI that I think I have now fixed, so can you rebase this on main to see what is still failing? I should be able to help next week with any further problems here.

@NewUserHa

Contributor Author

PR modified:

  1. test_async.py: asyncio > asynclib. Reason: it was a typo.
  2. zmqshell.py: warnings.filterwarnings("ignore",. Reason: test_async had no issue before the change to the Proactor event loop. When the user command is %autoawait ..., it is run within ipykernel's asyncio loop. After the change to ProactorEventLoop, %autoawait trio makes trio warn that it needs to run in guest mode. There is no good way to set that by dynamically detecting user commands, and everything works fine without guest mode apart from the warnings from trio, so the fix is to disable the warning.
  3. eventloops.py: blocking_poll(). Reason: ProactorEventLoop has no add_reader method and IOCP does not support ZMQ's FD directly, so a thread waits on poll() in a blocking way to make it work. The alternative is to wrap the FD in a handle and wait on it with the IocpProactor class's wait_for_handle method, but that needs win32api for the wrapping. The latter looks cleaner, but the former is more straightforward, so the former is used.
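
A minimal sketch of the thread-based approach described in item 3, using only the standard library. This is not the PR's actual blocking_poll(): a socket pair stands in for the ZMQ FD, select() stands in for ZMQ's poll(), and the helper names, signature, and timeout are assumptions made for illustration.

```python
import asyncio
import select
import socket
import threading


def blocking_poll(sock, loop, on_readable, stop_event, timeout=0.1):
    """Wait for `sock` to become readable in a plain thread, then wake `loop`.

    Hypothetical helper: it stands in for polling a ZMQ FD and is not the
    code from the PR.
    """
    while not stop_event.is_set():
        readable, _, _ = select.select([sock], [], [], timeout)
        if readable:
            # Hand the notification to the loop's own thread; the loop is
            # never touched directly from this polling thread.
            loop.call_soon_threadsafe(on_readable)
            return


async def main():
    loop = asyncio.get_running_loop()
    reader, writer = socket.socketpair()
    got_data = asyncio.Event()
    stop_event = threading.Event()

    poller = threading.Thread(
        target=blocking_poll,
        args=(reader, loop, got_data.set, stop_event),
        daemon=True,
    )
    poller.start()

    writer.send(b"ping")
    # The loop wakes only once the polling thread reports readability.
    await asyncio.wait_for(got_data.wait(), timeout=5)
    data = reader.recv(4)

    stop_event.set()
    poller.join()
    reader.close()
    writer.close()
    return data


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The sketch never calls add_reader, so it behaves the same whether the running loop is a Selector or a Proactor loop.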

current:

  1. The fix to def loop_asyncio(kernel): in eventloops.py seems to have no relation to the failed tests: with or without it, the tests still pass. loop_asyncio() seems to be unused or not covered by the tests.
  2. The remaining failed test is test_stop_on_breakpoint. It seems that debugpy did send the stopped msg, but ipykernel's main loop seems unable to receive it because the thread of ipykernel's main loop is paused by debugpy. "IOStream.flush timed out" can be seen after manually setting the debug flag of the ioPub class to True.
    The SelectorEventLoop does not have this issue.

@ZupoLlask

Thank you @NewUserHa!

Let's wait and see if @ianthomas23 can help a bit here... 🙏

@ianthomas23

Collaborator

> Let's wait and see if @ianthomas23 can help a bit here... 🙏

I can, but it will be next week now.

@ianthomas23

Collaborator

I've taken a look but I am not sure there is anything I can add here.

I notice that most of the eventloops testing is disabled on windows, e.g.

@windows_skip
def test_asyncio_loop(kernel):

so this has evidently been a problematic area in the past.

@NewUserHa

Contributor Author

> @windows_skip
> def test_asyncio_loop(kernel):

It should have no problem, since the difference between Proactor and Selector is mostly just add_reader().

However, debugpy can't debug ipykernel and the test; it reports No module named ipykernel_launcher. It may be an issue with the test code.
Can you take a look at it, so I can debug why the event loop is stuck and unable to receive the "stopped" msg from debugpy, which causes the CI test failure?

Also, ipykernel's IOPub class starts a thread for tornado's IOLoop class, tornado versions above 6.1 start a thread for Proactor on Windows as well, and pyzmq also uses tornado's AsyncIOLoop to start a thread.
Can you explain this a little? How many threads will ipykernel start, and does it start one thread for the IOLoop and another thread for zmq internally? Could ipykernel use tornado's newer AsyncIOLoop class, since the old IOLoop class is marked deprecated in its docstring, and the new class already addresses ProactorEventLoop's lack of add_reader() by running a selector in a thread?

@BoykoNeov

@NewUserHa I dug into the test_stop_on_breakpoint deadlock you flagged — here's the root cause and a fix. (This is the area you already suspected: "could ipykernel use … selector in a thread already" — yes, almost.)

Root cause — ipykernel's read side, not debugpy

On a failing run debugpy does hit the breakpoint and write the stopped event; ipykernel's idle control loop just never wakes to read it. Mechanism:

  • ProactorEventLoop has no native add_reader, so tornado drives a Proactor loop's zmq sockets through a helper "Tornado selector" thread that does select() then call_soon_threadsafe() to wake the loop. ipykernel's IOPub/control service loops each spawn one of these.
  • ipykernel marks its own service threads debugger-exempt (is_pydev_daemon_thread / pydev_do_not_trace) but not tornado's helper thread.
  • When debugpy suspends every thread at a breakpoint under sys.monitoring (Python ≥ 3.12, interpreter-global), that un-exempt helper freezes mid-wake — after select() returns but before call_soon_threadsafe() completes — so the control/debug read path never advances and the Proactor loop sits in the IOCP poll forever.
  • On 3.11 (sys.settrace, per-thread) the helper isn't frozen → no repro. Matches the CI matrix exactly: Win 3.10/3.11 pass, 3.12/3.13/3.14/pypy fail.
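
To make the exemption point in the second bullet concrete, here is a small sketch of flagging a thread with the two attribute names quoted above, plus a check for helper threads that lack the flag. The function names are hypothetical, and how a given debugpy/pydevd version honours these attributes is an assumption; this illustrates the mechanism and is not the fix proposed in this comment.

```python
import threading


def mark_debugger_exempt(thread):
    """Set the attributes quoted in this thread on `thread`.

    Assumption: pydevd reads these attributes to decide which threads to
    leave alone when it suspends the process.
    """
    thread.pydev_do_not_trace = True
    thread.is_pydev_daemon_thread = True
    return thread


def helper_threads_missing_exemption(name_fragment="Tornado selector"):
    """Return live threads whose name matches but that lack the exemption flag."""
    return [
        t
        for t in threading.enumerate()
        if name_fragment in t.name
        and not getattr(t, "pydev_do_not_trace", False)
    ]


if __name__ == "__main__":
    stop = threading.Event()
    # Stand-in for tornado's helper thread; only the name is borrowed.
    helper = threading.Thread(target=stop.wait, name="Tornado selector", daemon=True)
    helper.start()
    print(helper_threads_missing_exemption())  # lists the un-exempt helper
    mark_debugger_exempt(helper)
    print(helper_threads_missing_exemption())  # helper no longer listed
    stop.set()
    helper.join()
```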

(Proven on 3.13 with faulthandler: the wedged kernel's stack shows the "Tornado selector" thread frozen inside call_soon_threadsafe → do_wait_suspend; the 3.11 mirror shows it completing.)

Fix

Run the ipykernel service loops (control, IOPub, shell channel, subshells) on a SelectorEventLoop: it has a native add_reader, so no helper thread is spawned and there's nothing for the debugger to freeze. The main/user-code loop stays on Proactor, so your subprocess support (#1468) is preserved. Off-Windows it's a no-op (default loop is already selector-based). ~30 lines, 2 files, win32-guarded, no pydevd coupling.
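
The idea can be sketched with the standard library alone. This is not the code from commit b142cae: run_service_loop and demo are hypothetical names, and a socket pair stands in for a zmq socket. It only shows that a thread owning a SelectorEventLoop can watch a socket through the loop's native add_reader, with no extra helper thread.

```python
import asyncio
import socket
import threading


def run_service_loop(sock, received, ready):
    """Run a dedicated SelectorEventLoop in this thread and read from `sock`.

    Hypothetical stand-in for an ipykernel service thread (control, IOPub).
    """
    loop = asyncio.SelectorEventLoop()
    asyncio.set_event_loop(loop)

    def on_readable():
        received.append(sock.recv(1024))
        loop.remove_reader(sock)
        loop.stop()

    # Native add_reader: the loop itself selects on the socket.
    loop.add_reader(sock, on_readable)
    ready.set()
    try:
        loop.run_forever()
    finally:
        loop.close()


def demo():
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    received = []
    ready = threading.Event()
    service = threading.Thread(
        target=run_service_loop, args=(reader, received, ready), daemon=True
    )
    service.start()
    ready.wait(timeout=5)
    writer.send(b"stopped")
    service.join(timeout=5)
    reader.close()
    writer.close()
    return received


if __name__ == "__main__":
    print(demo())
```

The main thread is free to run any loop type here, which mirrors the split described above: service threads on Selector, user code on Proactor.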

Branch (ready to pull): https://github.com/BoykoNeov/ipykernel/tree/pr-1469 — single commit b142cae on top of your patch-2. Happy to open it as a PR into patch-2 if you'd like.

Result (Windows, local): full test_debugger.py on 3.13 Proactor goes from 5 breakpoint tests deadlocking (6 failed/5 passed, ~1014s) to 1 failed/10 passed, ~15s; identical on 3.11; test_kernel.py 24 passed/6 skipped; no new regressions.

One separate, minor thing (not fixed by the above — flagging it)

While bisecting I found test_attach_debug also flips under Proactor, but it's cosmetic, not a functional break. I verified the repl is fully functional under Proactor: a = 5, then a → '5', a + 100 → '105', and the namespace persists. The only difference is the first repl evaluate's DAP result field: empty under Selector, the actual value under Proactor (Proactor just surfaces it one evaluate earlier, which is arguably more correct). The main loop stays Proactor by design, so this fix doesn't change it. You'll probably want to update test_attach_debug's == "" assertion as part of this PR: it currently encodes the Selector-timing result, and editing it against current main would break CI under Selector, so it belongs with the Proactor switch.

BoykoNeov added a commit to BoykoNeov/steel-sim that referenced this pull request Jun 18, 2026
…rect test_attach_debug

#1469 batch (this session):
- Fix A committed b142cae on pr-1469 (= #1469 head + 1 commit) and pushed to fork;
  root-cause+fix comment posted on ipython/ipykernel#1469 (offers a PR into patch-2).
  Standalone PR held: current main is Selector-everywhere, so a PR-to-main is a near-no-op;
  the #1469 comment is the load-bearing artifact (NewUserHa had asked for exactly this help).
- CORRECTION: test_attach_debug was wrongly logged as "NOT loop-related / debugpy-version
  artifact". Bisected same-box (origin/main Selector PASS vs pr-1469 Proactor FAIL, pure-#1469
  too): it IS a second #1469 Proactor regression, but COSMETIC not functional — a stateful repl
  probe shows namespace/computation work under Proactor; only the first repl evaluate's DAP
  result field differs (empty Selector / value Proactor). Out of Fix A scope (main stays Proactor).

Carried forward (uncommitted from prior batches):
- upstream-pr-filed: #1529 stream-send fix + regression test pushed, reply posted, title fixed.
- yield-case-depth-inversion-built: minor touch-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@NewUserHa

Contributor Author

Thanks for your effort.
There are 3 issues:

  1. Your commit changes every self.io_loop = IOLoop(make_current=False) to IOLoop(make_current=False, asyncio_loop=asyncio.SelectorEventLoop()), but IOLoop has no asyncio_loop parameter; the argument is passed through implicitly into AsyncIOLoop. Is this a good way to implement it?

  2. The BaseAsyncIOLoop behind AsyncIOLoop in tornado already implements this:
     https://github.com/tornadoweb/tornado/blob/f491e4c1914be0ac6635a0eacb3c978d89eec4f1/tornado/platform/asyncio.py#L89-L106
     Can we reuse that?

  3. > Result (Windows, local): full test_debugger.py on 3.13 Proactor goes from 5 breakpoint tests deadlocking (6 failed/5 passed, ~1014s) to 1 failed/10 passed, ~15s; identical on 3.11; test_kernel.py 24 passed/6 skipped; no new regressions.

     There is still one failing test. Can you help take a look at that?

4 participants