# Budgeting with a conditional check against an always ready stream blocks the condition #2542
I guess
---

That doesn't fix any generic combinators from outside the Tokio ecosystem.
---

If you put an infinite loop into the async fn, there is not really much Tokio can do? If you block the executor, the executor is blocked. This is exactly the issue that coop was introduced to help with.
---

Hmm, I guess coop does interact here, but still, this is broken for the same reason that you should not use
---

I'm not 100% sure what the best strategy is here. Leaving

At this point, resources external to Tokio are expected to provide their own budgeting strategies.
---

For reference, the example is:

```rust
use tokio::time::{self, Duration};

async fn some_async_work() {
    // do work
}

#[tokio::main]
async fn main() {
    let mut delay = time::delay_for(Duration::from_millis(50));

    while !delay.is_elapsed() {
        tokio::select! {
            _ = &mut delay, if !delay.is_elapsed() => {
                println!("operation timed out");
            }
            _ = some_async_work() => {
                println!("operation completed");
            }
        }
    }
}
```
---

One option that I can think of is that combinators could check to ensure that if
---

I wonder how much of a problem this is in practice? How many futures running on a Tokio runtime contain no calls into Tokio resources? I think I agree with the analysis above that this is really a non-future-friendly busy loop in disguise, and is a disjoint problem from the one coop was added to solve.

That said, of course, it would be nice to be able to do the right thing in as many cases as we can, and maybe coop can be used to help with this particular case. It's a weird one though; the user is selecting in a loop over a future that is always ready and polls no resource. To me, this seems like something that wouldn't happen in real code. I feel like either the future would do something (in which case the problem goes away) or they wouldn't be using

Or, phrased differently, do we think this particular construct (
---

Well, if you want a more realistic case where this would happen, consider this:

```rust
use tokio::time::{self, Duration};
use futures::channel::mpsc::channel;
use futures::stream::StreamExt;

#[tokio::main]
async fn main() {
    let mut delay = time::delay_for(Duration::from_millis(50));

    let (send, recv) = channel::<()>(10);
    drop(send);
    let mut recv = recv.fuse();

    loop {
        tokio::select! {
            _ = &mut delay => {
                println!("operation timed out");
                break;
            }
            msg = recv.next() => {
                if let Some(()) = msg {
                    println!("operation completed");
                }
            }
        }
    }
}
```

I just had someone make a similar mistake over on Discord, albeit that was with futures'
---

Thanks, that example is helpful. I feel like it only goes to prove my point though. If the user uses
---

So, the context in which I noticed this was while trying to write an example of what would happen with some utilities during starvation for rust-lang/futures-rs#2135. That resulted in me noticing that this would hang:

```rust
let stream = futures::stream::select_all(vec![
    futures::stream::repeat(0).boxed(),
    tokio::time::interval(Duration::from_millis(10)).map(|_| 1).boxed(),
    tokio::time::interval(Duration::from_millis(100)).map(|_| 2).boxed(),
]);

let mut counts = [0, 0, 0];
stream
    .take_until(tokio::time::delay_for(Duration::from_secs(1)))
    .for_each(|val| { counts[val] += 1; async {} }).await;
```

Which then led to rust-lang/futures-rs#2157, and eventually this as I was trying to find an example using only Tokio utilities (
---

These would all run endlessly with and without budgeting, right? The idea is that budgeting improves fairness in some situations, but it can not fix all of these. In the synchronous world we would call those things deadlocks (or livelocks), and there is no automagical mitigation against those.

The examples could all be "fixed" by running the timer on a different thread than the current executor thread. But I don't think that's desirable, since it will decrease performance. async/await is about reaching the highest performance and thereby explicitly opting into a harder cooperative programming model. If you want that, you have to live within the constraints of the model.

I actually don't think the described issues are too bad: you will immediately observe them during a debug run. What would be far more painful is if an app would run into an endless blocking situation somewhere later during normal operation. Do we have any examples where this could happen? Obviously we can easily build a contrived example, but that isn't helpful to determine how serious the issue might be.

I also think there might be some potential for new runtime diagnostics in Tokio, which could e.g. detect blocked executor threads and tasks. That could at least help to point out these issues in a live application.
---

Without budgeting, polling the timer after the timeout has expired returns

I guess the idea is that budgeting would make it more comparable to a

```rust
while now() < timeout {
    tokio::task::yield_now().await;
}
```

so at least other tasks could run whenever it yields.
---

That will depend on how the timer is implemented. It might only change its state if the timer handler is running inside the event loop, which won't happen if the code never yields to the loop. It might be currently implemented to also check the time on each
---

Yes of course, and in some sense that is what we are experiencing here. If you run it with pre-coop Tokio, it does complete, because it happened to have such an implementation back then.
---

I'm definitely conflicted here. I'm not opposed to finding a way to have
---

No,
---

Ah, sorry, yes, you're right. I think the issue here then is actually with

I think it's important to stress here that I don't think we should place the onus on implementors of
---

Is there any action to take here?
---

Honestly, I think the action here is to standardize
---

I disagree that forcing periodic yields is the way this should be solved. Periodically yielding means the entire call frame has to unwind all the way to the executor and back every time. Currently I hit this problem when I was using a

```rust
fn poll(self, cx) {
    loop {
        if let Ready(()) = self.timeout.poll(cx) { // self.timeout is a Sleep
            self.timeout.reset(...);
            yield_now().poll(cx);
            return Poll::Pending;
        }
        let work = ready!(self.stream_of_work.poll_next(cx));
        process(work);
    }
}
```

This was in a single-threaded executor, and under program load I would like to have a way of being in control of when the cooperative yielding happens rather than letting Tokio's budget system handle it. Making the budget system public doesn't help either; my code can encode budgeting strategies that suit my needs rather than just a single number that Tokio's system uses.

Alternatively, if every timer checking for elapsed-ness on every poll is a perf concern, it would be nice to have some way the user can drive the timer wheel forward from within the user task. E.g. given some
---

@Arnavion It seems like your use-case would be better served by storing a
---

No. I simplified the example a bit too much perhaps, but there's more work being done when the timeout elapses than just yielding and resetting the timer. So there's a reason to have the timer register a waker when
---

**Version**
0.2.20

**Platform**
playground

**Description**

Taking the first correct example from the `tokio::select!` docs and running it in the playground results in an infinite hang (playground with the println commented out). The tested code includes an additional `time::delay_for(Duration::from_millis(10)).await;` bit of work in the `some_async_work`, which avoids this issue.

This has the same underlying problem as rust-lang/futures-rs#2157: the budgeting is only applied to the conditional and blocks it from ever becoming true, while the actual work continues running because it is not subject to the budgeting.