Re: PHP True Async RFC

From: Date: Sun, 02 Mar 2025 14:08:35 +0000
Subject: Re: PHP True Async RFC
References: 1  Groups: php.internals 


On Sat, Mar 1, 2025, at 10:11, Edmond Dantes wrote:
> Good day, everyone. I hope you're doing well.
> 
> I’d like to introduce a draft version of the RFC for the True Async component.
> 
> https://wiki.php.net/rfc/true_async
> 
> I believe this version is not perfect and requires analysis. And I strongly believe that things
> like this shouldn't be developed in isolation. So, if you think any important (or even minor)
> aspects have been overlooked, please bring them to attention.
> 
> The draft status also highlights the fact that it includes doubts about the implementation and
> criticism. The main global issue I see is the lack of "future experience" regarding how
> this API will be used—another reason to bring it up for public discussion.
> 
> Wishing you all a great day, and thank you for your feedback!
> 

Hey Edmond:

I find this feature quite exciting! I've got some feedback so far, though most of it is for
clarification or potential optimizations:

> A PHP developer *SHOULD NOT* make any assumptions about the order in which Fibers will be
> executed, as this order may change or be too complex to predict.

There should be a defined ordering (or at least, some guarantees). Being able to understand what
things run in what order can help with understanding a complex system. Even if it is just a vague
notion (user tasks are processed before events, or vice versa), it would still give developers more
confidence in the code they write. You actually mention a bit of the order later (microtasks happen
before fibers/events), so this sentence doesn't quite hold as written.

Personally, I feel as though an async task should run as though it were a function call until it
hits a suspension. This is mostly an optimization (C# does this), but it could reduce the
overhead of queueing a function that may never suspend (which you mention as a potential
problem much later on):

Async\run(function() {

   $fiber = Async\async(function() {
       sleep(1); // this gets enqueued now
       return "Fiber completed!";
   });

   // Execution is paused until the fiber completes
   $result = Async\await($fiber); // immediately enter $fiber without queuing

   echo $result . "\n";

   echo "Done!\n";
});
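
For what it's worth, native Fibers already start this way: Fiber::start() runs the callback synchronously up to its first suspension. A minimal sketch using only the existing Fiber class (PHP 8.1+), not the proposed Async\ API; the $log array is only there to record the order of events:

```php
<?php
// Records the order in which the main code and the fiber run.
$log = [];

$fiber = new Fiber(function () use (&$log): string {
    $log[] = 'fiber: started';   // runs immediately, inside start()
    Fiber::suspend();            // first suspension: control returns to the caller
    $log[] = 'fiber: resumed';
    return 'Fiber completed!';
});

$log[] = 'main: before start';
$fiber->start();                 // behaves like a plain call until the suspend above
$log[] = 'main: after start';
$fiber->resume();                // the fiber runs to completion
$log[] = 'main: ' . $fiber->getReturn();

echo implode("\n", $log), "\n";
```

A task that never suspends would therefore finish inside start() without ever touching a queue.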

> Until it is activated, PHP code behaves as before: calls to blocking functions will block the
> execution thread and will not switch the *Fiber* context. Thus, code written without the *Scheduler*
> component will function exactly the same way, without side effects. This ensures backward
> compatibility.

I'm not sure I understand this. Won't PHP code behave exactly the same as it did before
once the scheduler is enabled? Will libraries written before this feature existed suddenly behave
differently? Do we need to worry about the color of functions because it changes the behavior?

> True Async prohibits initializing the Scheduler twice.

How will a library take advantage of this feature if it cannot be certain the scheduler is running
or not? Do I need to write a library for async and another version for non-async? Or do all the
async functions with this feature work without the scheduler running, or do they throw a catchable
error?

> This is crucial because the process may handle an OS signal that imposes a time limit on
> execution (for example, as Windows does).

Will this change the way OS signals are handled then? Will it break compatibility if a library uses
pcntl traps and I'm using true async traps too? Note there are several different ways (timeout)
signals are handled in PHP -- so if (perchance) the scheduler could always be running, maybe we can
unify the way signals are handled in PHP.

> Code that uses *Resume* cannot rely on when exactly the *Fiber* will resume execution.

What if it never resumes at all? Will it run a finally block if the code is wrapped in try/catch,
or will execution just be abandoned? Is there some way to ensure cleanup of resources? The RFC
should probably mention this case and how abandoning execution works.
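
For comparison, this is how native Fibers behave today: an unfinished fiber that gets destroyed still runs its pending finally blocks. A minimal sketch using only the existing Fiber class (PHP 8.1+); whether a Fiber that is never resumed under True Async does the same is exactly the open question, and the $log array is just for illustration:

```php
<?php
$log = [];

$fiber = new Fiber(function () use (&$log): void {
    try {
        $log[] = 'acquired';
        Fiber::suspend();        // the fiber is never resumed after this point
        $log[] = 'never reached';
    } finally {
        $log[] = 'released';     // runs when the suspended fiber is destroyed
    }
});

$fiber->start();
unset($fiber);                   // last reference gone: the fiber is unwound

echo implode("\n", $log), "\n";
```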

> If an exception is thrown inside a fiber and not handled, it will stop the Scheduler and be
> thrown at the point where Async\launchScheduler() is called.

The RFC doesn't mention the stack trace. Will it throw away any information about the inner
exception?

> The *Graceful Shutdown* mode can also be triggered using the function:

What will calling exit or die do?

> A concurrent runtime allows handling requests using Fibers, where each Fiber can process its
> own request. In this case, storing request-associated data in global variables is no longer an
> option.

Why is this the case? Furthermore, if it inherits from the fiber that started its current fiber,
won't using Resume/Notifier potentially cause problems when used manually? There are examples
throughout the RFC using global variables in closures; so do these examples not actually work? Will
sharing instances of objects in scope of the functions break things? For example:

Async\run($obj->method1(...));
Async\run($obj->method2(...));

This is technically sharing global variables (well, global to that scope -- global is just a scope
after all) -- so what happens here? Would it make sense to delegate this fiber-local storage to
user-land libraries instead?

> Objects of the Future class are high-level patterns for handling deferred results.
> 

By this point we have covered FiberHandle, Resume, and Contexts. Now we have Futures? Can we
simplify this to just Futures? Why do we need all these different ways to handle execution?

> A channel is a primitive for message exchange between Fibers.

Why are there both isEmpty and isNotEmpty functions? Wouldn't
!$channel->isEmpty() suffice?

It's also not clear what the value of most of these functions is. For example:

if ($chan->isFull()) {
  doSomething(); // suspends at some point inside? We may not know when we write the code.
  // chan is no longer full, or maybe it is -- who knows, but the original assumption entering
  // this branch is no longer true.
  ...
}
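
The same check-then-act problem can be reproduced with native Fibers today. A minimal sketch, assuming a capacity-1 SplQueue as a stand-in for the channel (it is not the RFC's Channel class); $doSomething and $consumer are hypothetical names for "any call that lets another fiber run" and "whoever is reading":

```php
<?php
// Hypothetical stand-in for a channel with capacity 1.
$buffer = new SplQueue();
$capacity = 1;
$buffer->enqueue('pending message');

// Another fiber that drains the buffer as soon as it gets to run.
$consumer = new Fiber(function () use ($buffer): void {
    $buffer->dequeue();
});

// Stand-in for doSomething(): it hands control to another fiber.
$doSomething = function () use ($consumer): void {
    $consumer->start();
};

$observed = [];
if (count($buffer) >= $capacity) {   // plays the role of isFull()
    $observed[] = 'full on entry';
    $doSomething();
    // Re-checking shows the assumption made on entry no longer holds.
    $observed[] = count($buffer) >= $capacity ? 'still full' : 'no longer full';
}

echo implode("\n", $observed), "\n";
```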

Whether a channel is full or not is not really important, and if you rely on that information, this
is usually an architectural smell (at least in other languages). Same thing with empty or writable,
or many others of these functions. You basically just write to a channel and eventually (or not,
which is a bug and causes a deadlock) something will read it. The entire point is to use channels to
decouple async code, but most of the functions here allow for code to become strongly coupled.

As for the single producer method, I am not sure why you would use this. I can see some upside for
the built-in constraints (potentially in a dev-mode environment) but in a production system,
single-producer bottlenecks are a real thing that can cause serious performance issues. This is
usually something you explicitly want to avoid.

> In addition to the send/receive methods, which suspend the execution of a
> Fiber, the channel also provides non-blocking methods: trySend,
> tryReceive, and auxiliary explicit blocking methods: waitUntilWritable and
> waitUntilReadable. 

It isn't clear what happens when trySend fails. Is it an error, or does it do nothing?

Thinking through it, there may be cases where trySend is valid, but more often than
not, it is probably an antipattern. I cannot think of a valid reason for tryReceive, and
its usage is most likely guaranteed to cause a deadlock in real code. For true multi-threaded
applications, it makes more sense, but not for single-threaded concurrency like this.

In other words, the following code is likely to be more robust, and not depend on execution order
(which we are told at the beginning not to do):

Async\run(function() {
    $channel = new Async\Channel();

    $reader = Async\async(function() use ($channel) {
        // Parenthesize the assignment: && binds tighter than =
        while (($data = $channel->read()) !== null) {
            echo "receive: $data\n";
        }
    });

    for ($i = 0; $i < 4; $i++) {
        echo "send: event data $i\n";
        $channel->send("event data $i");
    }

    $reader->cancel(); // clean up our reader
    // or
    $channel->close(); // will receive NULL I believe?
});

A trySend is still useful when you want to send a message but don't want to block
if the channel is full. However, whether it succeeds is going to depend largely on how long it has
been since the developer last suspended the current fiber, and nothing else -- thus it is probably
an antipattern, since it depends on the literal structure of the code, not the structure of the
program -- if that makes sense.

> This means that trapSignal is not intended for “regular code” and should not
> be used “anywhere”.

Can you expand on what this means in the RFC? Why expose it if it shouldn't be used?

-----

I didn't go into the low-level API details yet -- this email is already pretty long. But I
would suggest maybe thinking about how to unify Notifiers/Resume/FiberHandle/Future into a single
thing. These things are pretty similar to one another (from a developer's standpoint) -- a way
to continue execution -- and they all offer a slightly different API.

I also noticed that you seem to be relying heavily on the current implementation to define behavior.
Ideally, the RFC should define behavior and the implementation implement that behavior as described
in the RFC. In other words, the RFC is used as a reference point as to whether something is a bug or
an enhancement in the future. More than once, the list has looked back at an old RFC
to determine the intent and decide whether something is working as intended or is a bug. RFCs
are also used to write documentation, so the more detailed the RFC, the better the documentation
will be for new users of PHP.

— Rob

