Re: [RFC] Pipe Operator (again)

Date: Fri, 04 Apr 2025 04:36:34 +0000
Subject: Re: [RFC] Pipe Operator (again)
Groups: php.internals

On Thu, Apr 3, 2025, at 4:06 PM, Rowan Tommins [IMSoP] wrote:
> On 03/04/2025 18:06, Larry Garfield wrote:
>> So if we expect higher order functions to be common (and I would probably mainly use them
>> myself), then it would be wise to figure out some way to make them more efficient.  Auto-first-arg
>> is one way. 
>
> From this angle, auto-first-arg is a very limited compiler optimisation 
> for partial application. 

I'd say it has the dual benefit of optimization and ergonomics.  (Though see discussion below.)

> With PFA and one-arg-callable pipes, you could add a parser rule that 
> matches this, with the same output:
>
> $foo |> bar(?, $baz);
>
> But you'd also be able to do this:
>
> $baz |> bar($foo, ?);
>
> And maybe the compiler could optimise that case too.

From what Arnaud has told me, any PFA that leaves exactly one argument open, at a fixed position,
should be optimizable.  (Though that's a task for whenever PFA is next worked on, if it is next
worked on.)
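
As a rough illustration of what such an optimization would collapse: a PFA with one open argument behaves like a one-argument closure, and calling that closure gives the same result as the direct call. The sketch below spells the closures out by hand, since the `?` placeholder is proposal syntax; `bar()` and the variable names are placeholders taken from the quoted example, not real functions.

```php
<?php
// Hypothetical function, used only for illustration.
function bar(string $a, string $b): string
{
    return "bar($a, $b)";
}

$foo = 'foo';
$baz = 'baz';

// Hand-written stand-in for the proposed `bar(?, $baz)`: first argument left open.
$firstOpen = fn(string $x): string => bar($x, $baz);

// Hand-written stand-in for the proposed `bar($foo, ?)`: second argument left open.
$secondOpen = fn(string $x): string => bar($foo, $x);

// What a unary-callable pipe would do with each closure...
$viaFirst  = $firstOpen($foo);
$viaSecond = $secondOpen($baz);

// ...and the direct call a compiler could emit instead in both cases.
$direct = bar($foo, $baz);

echo $viaFirst, "\n", $viaSecond, "\n", $direct, "\n";
```

All three lines print the same string, which is the property that makes the closure removable in principle.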

> Neither helps with the performance of higher order functions which are 
> doing more than partial application, like map and filter themselves. I 
> understand there's a high cost to context-switching between C and PHP; 
> presumably if there was an easy solution for that someone would have 
> done it already.

> On 03/04/2025 18:39, Ilija Tovilo wrote: 
>> To me, pipes improve readability when they behave like methods, i.e.
>> they perform some operation on a subject. This resembles Swift's
>> protocol extensions or Rust's trait default implementations, except
>> using a different "method" call operator. 
>> [...]
>> If we decide not to add an iterator API that works well with
>> first-arg, then I agree that this is not the right approach. But if we
>> do, then neither of your examples are problematic. 
>
>
> I guess those two things go together quite well as a mental model: 
> pipes as a way to implement extension methods, and new functions 
> designed for use as extension methods.
>
> I think I'd be more welcoming of it if we actually implemented 
> extension methods instead of pipes, and then the new iterator API was 
> extension-method-only. It feels less like "one of the arguments is 
> missing" if that argument is *always* expressed as the left-hand side 
> of an arrow or some sort.

As I've noted, classic pipes (current RFC, unary function only) and extension functions are not
mutually exclusive, and I see no reason we couldn't add both.  Auto-partialing first-arg pipes
and dedicated extension functions step on each other's toes a bit more, however.

To address both this and Ilija's email: I was toying with extension functions as a concept a
while back.  I also did extensive research into "collections" in other languages last year
with Derick.  (See the discussion in a previous PHP Foundation report[1].)  That led me to a number of
conclusions that I still hold to:

* A new iterable API is absolutely a good thing and we should do it.
* That said, we *need* to split Sequence, Set, and Dictionary into separate types.  PHP is the only
language I reviewed that doesn't have them as separate constructs with their own APIs.
* The use of the same construct (arrays and iterables) for all three types is a fundamental and core
flaw in PHP's design that we should not double down on.  It's ergonomically awful,
it's bad for performance, and it invites major security holes.  (The "Drupageddon"
remote exploit was caused by using an array and assuming it was sequential when it was actually a
map.)
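
To make that last hazard concrete, here is a simplified sketch of the pattern. It is not Drupal's actual code; the function names and the placeholder format are made up for illustration. The unsafe version assumes array keys are 0, 1, 2, ... and embeds them in a SQL fragment, so a caller who can supply a map controls part of the query text.

```php
<?php
// Illustration only: expand one named placeholder into one placeholder per value,
// naming each after its array key.
function expandPlaceholdersUnsafe(string $name, array $values): string
{
    $names = [];
    foreach ($values as $i => $_) {
        $names[] = ":{$name}_{$i}";   // trusts $i to be 0, 1, 2, ...
    }
    return implode(', ', $names);
}

// Same expansion, but the input is forced into sequence shape first.
function expandPlaceholdersSafe(string $name, array $values): string
{
    return expandPlaceholdersUnsafe($name, array_values($values));
}

$list = ['alice', 'bob'];
$map  = ['0); DROP TABLE users; --' => 'alice'];   // attacker-chosen key

echo expandPlaceholdersUnsafe('name', $list), "\n"; // :name_0, :name_1
echo expandPlaceholdersUnsafe('name', $map), "\n";  // the key text ends up in the fragment
echo expandPlaceholdersSafe('name', $map), "\n";    // :name_0
```

With distinct Sequence and Dictionary types, the unsafe function could declare that it only accepts a sequence, and the map would be rejected at the call boundary.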

So while I want a new iterable API, the more I think on it, the more I think a bunch of map(iterable
$it, callable $fn) style functions would not be the right way to do it.  That would be easy, but
also ineffective.

The behavior of even basic operations like map and filter is subtly different depending on which
type you're dealing with.  Whether the input is lazy or not is the least of the concerns.  The
bigger issue is when to pass keys to the $fn: probably always in Dict, probably never in Seq, and
certainly never in Set (as there are no meaningful keys).  Similarly, when filtering a Dict, you
would want keys preserved.  When filtering a Seq, you'd want the indexes re-zeroed.  (Or at least
to appear so, give or take implementation details.)  And then, yes, there's the laziness
question.
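
The filter difference is already visible with today's arrays. A minimal sketch using only existing functions: array_filter() preserves keys, which is what a Dict wants, while a Seq needs an extra array_values() pass to close the gaps.

```php
<?php
$seq  = [10, 15, 20, 25];
$dict = ['a' => 10, 'b' => 15, 'c' => 20];

$multipleOfTen = fn(int $v): bool => $v % 10 === 0;

// Dict-style filter: keys are part of the data, so keeping them is correct.
$dictResult = array_filter($dict, $multipleOfTen);   // ['a' => 10, 'c' => 20]

// The same call on a sequence leaves holes in the index...
$holes = array_filter($seq, $multipleOfTen);         // [0 => 10, 2 => 20]

// ...so a Seq-style filter has to re-index afterwards.
$seqResult = array_values($holes);                   // [10, 20]

var_dump($dictResult, $holes, $seqResult);
```

Nothing in the array itself says which of the two behaviors the caller wanted; that knowledge lives only in the caller's head.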

So we'd effectively want three different versions of map(), filter(), etc. if we didn't
want to perpetuate and further entrench the design flaw and security hole that is "sequences
and hashes are the same thing if you squint."  And... frankly I'd probably vote against an
iterable/collections API that didn't address that issue.

However, a simple "first arg" pipe wouldn't allow for that.  Or rather, we'd
need to implement seqMap(iterable $it, callable $fn), setMap(iterable $it, callable $fn), and
dictMap(iterable $it, callable $fn).  And the same split for filter, and probably a few other
things.  That seems ergonomically suspect, at best, and still wouldn't really address the issue
since you would have no way to ensure you're using the "right" version of each
function. Similarly, a dict version of implode() would likely need to take 2 separators, whereas the
other types would take only one.
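
A minimal sketch of what that split could look like as free functions. seqMap() and dictMap() use the names from the paragraph above; dictImplode() and its argument order are invented here for illustration. None of this is an existing or proposed API.

```php
<?php
// Hypothetical: Seq-flavoured map. Keys are ignored and the result is re-indexed.
function seqMap(iterable $it, callable $fn): array
{
    $out = [];
    foreach ($it as $value) {
        $out[] = $fn($value);
    }
    return $out;
}

// Hypothetical: Dict-flavoured map. Keys are passed to the callback and preserved.
function dictMap(iterable $it, callable $fn): array
{
    $out = [];
    foreach ($it as $key => $value) {
        $out[$key] = $fn($value, $key);
    }
    return $out;
}

// Hypothetical: Dict-flavoured implode() with two separators,
// one between key and value and one between pairs.
function dictImplode(string $kvSeparator, string $pairSeparator, iterable $it): string
{
    $pairs = [];
    foreach ($it as $key => $value) {
        $pairs[] = $key . $kvSeparator . $value;
    }
    return implode($pairSeparator, $pairs);
}

$prices = ['apple' => 2, 'pear' => 3];

$doubled = dictMap($prices, fn(int $v, string $k): int => $v * 2); // ['apple' => 4, 'pear' => 6]
$wrong   = seqMap($prices, fn(int $v): int => $v * 2);             // [4, 6], keys silently lost
$text    = dictImplode('=', ', ', $prices);                        // "apple=2, pear=3"

var_dump($doubled, $wrong, $text);
```

Both map functions accept the same `iterable`, so the seqMap() call on a dictionary is not an error. It just quietly discards the keys, which is the enforcement gap described above.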

So the more I think on it, the more I think the sort of iterable API that first-arg pipes would make
easy is... probably not the iterable API we want anyway.  There may well be other cases for
Elixir-style first-arg pipes, but a new iterable API isn't one of them, at least not in this
form.

Which brings us then to extension functions.  Pipes and higher order functions, or first-arg pipes,
can act as a sort of "junior" extension function, but for the reasons listed above they fall
short of being real extension functions.

For comparison, extension functions in Kotlin look like this:

fun SomeType.foo(a: Int) {
  // a is a variable. "this" is the SomeType the function was called on.
  // However, this is still "external" scope so only public members are usable.
}

val s = SomeType()
s.foo(5)

(Kotlin doesn't have a "new" keyword; the above is how you instantiate an object.)

Arguably, Go is entirely built as extension functions. It looks like this:

func (st SomeType) foo(a int) {
  // st and a are both variables here.  Do as you will.
}

Notably for us, the same function can be defined multiple times against different types.  That
allows the system to differentiate between A.foo() and B.foo().  You can also attach extension
functions to interfaces.  In fact, most of Kotlin's collections (list, set, map) API is
implemented as extension functions on interfaces, of which they have many.

However, both Go and Kotlin are compiled languages, which means the compiler has a complete view of
the code at compile time, and can sort out which extension function to use in a given situation
statically.  That is, of course, not the case in PHP.

That means even if we figure out a way to define multiple foo() functions that apply to different
types, and can agree that doing so is not evil (some have argued it's too close to
function/method overloading, which they claim is evil; I disagree with both points), there is still
a very non-trivial task of figuring out how to resolve the function to call at runtime, probably
somehow leveraging autoloading, which also then runs us up against function autoloading, etc.  I
hope that is a solvable problem, but I don't currently know how to solve it.
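
To give a feel for what runtime resolution involves, here is a purely userland sketch, assuming PHP 8.0 or later. ExtensionRegistry and everything in it is hypothetical; nothing like it exists in the engine, and a real design would look different. It only shows that picking foo() for a given value means walking that value's type hierarchy at call time, against a table that something must have populated first.

```php
<?php
// Hypothetical registry: method name => type name => implementation.
final class ExtensionRegistry
{
    /** @var array<string, array<string, callable>> */
    private array $table = [];

    public function register(string $type, string $method, callable $impl): void
    {
        $this->table[$method][$type] = $impl;
    }

    public function call(object $subject, string $method, mixed ...$args): mixed
    {
        // Candidate types: the class itself, then its parents, then its interfaces.
        $candidates = array_merge(
            [$subject::class],
            array_values(class_parents($subject)),
            array_values(class_implements($subject)),
        );
        foreach ($candidates as $type) {
            if (isset($this->table[$method][$type])) {
                return ($this->table[$method][$type])($subject, ...$args);
            }
        }
        throw new BadMethodCallException("No extension $method() for " . $subject::class);
    }
}

interface Shape {}
final class Circle implements Shape { public function __construct(public float $r) {} }
final class Square implements Shape { public function __construct(public float $side) {} }

$ext = new ExtensionRegistry();
$ext->register(Circle::class, 'area', fn(Circle $c): float => M_PI * $c->r ** 2);
$ext->register(Square::class, 'area', fn(Square $s): float => $s->side ** 2);
$ext->register(Shape::class, 'describe', fn(Shape $s): string => 'a ' . strtolower($s::class));

$area = $ext->call(new Square(3.0), 'area');      // resolved against Square
$desc = $ext->call(new Circle(1.0), 'describe');  // falls back to the Shape interface

var_dump($area, $desc);
```

The sketch dodges the hard part: the register() calls have to run before the first call(), which is exactly where autoloading would need to come in.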

So "real" extension functions are an epic unto themselves, even though I really, really
want them.  (They are fantastically ergonomic for converting from one representation to another,
like from an ORM entity to a minimal struct to serialize as JSON, and vice versa.  I quite miss them
from Kotlin.)

It would be really nice if we could follow Kotlin's example and build 3 different collection
types (likely via objects), and then build most of the API for them in extension functions rather
than as methods.  However, that sounds harder every time I dig into it.

As a side note to Yakov[2], a Uniform Function Call Syntax in PHP would have all the same problems
as extension functions, even before we get into the issue that Rowan, Tim, and others have brought
up: PHP is wildly inconsistent about putting the "subject" first in a function call.
Without that consistency, UFCS doesn't make much sense.  While I appreciate the elegance of it, in
practice, figuring out extension functions as a dedicated syntax (akin to Kotlin or Go above) is
probably the best we could do, if we can even do that.

All of which is to say... I think I may have talked myself back around to just using basic unary
function pipes and "suck it up" on the extra call for higher order functions for now,
unless someone can show a fair number of non-iterable use cases where it would be helpful.  That
then would unblock the other incremental improvements listed in the RFC (compose, PFA, and
$$->foo()).  True extension functions could then be explored later (likely by people with way
more engine knowledge than me) as their own thing, whether using ->, +>, or something else
entirely.  We just need to agree that the existence of pipes does not render extension functions
moot.
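
For reference, the "extra call" with unary pipes looks roughly like this. The sketch assumes PHP 8.0 or later and uses a userland pipe() helper as a stand-in for the operator; pipe(), itMap(), and itFilter() are hypothetical names, not part of the RFC.

```php
<?php
// Hypothetical stand-in for the pipe operator: feed $value through unary callables in order.
function pipe(mixed $value, callable ...$stages): mixed
{
    foreach ($stages as $stage) {
        $value = $stage($value);
    }
    return $value;
}

// Higher order functions: called once to capture $fn, returning the unary
// callable that the pipe then calls. That second call is the extra cost.
function itMap(callable $fn): Closure
{
    return fn(array $xs): array => array_map($fn, $xs);
}

function itFilter(callable $fn): Closure
{
    return fn(array $xs): array => array_values(array_filter($xs, $fn));
}

$result = pipe(
    [1, 2, 3, 4, 5],
    itFilter(fn(int $n): bool => $n % 2 === 1),  // keep odd numbers: [1, 3, 5]
    itMap(fn(int $n): int => $n * $n),           // square them: [1, 9, 25]
    'array_sum',                                 // already unary, no wrapper needed
);

var_dump($result); // int(35)
```

Each itMap()/itFilter() stage costs one call to build the closure and one to run it, whereas a first-arg pipe or an optimized PFA could call the underlying function directly.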

Thoughts?

--Larry Garfield

[1] https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/
[2] https://externals.io/message/127037

