Re: [RFC] Pipe Operator (again)

From: Ilija Tovilo Date: Tue, 08 Apr 2025 23:29:45 +0000

Subject: Re: [RFC] Pipe Operator (again)

References: 1 2 3 4 5 6 7 Groups: php.internals

Hi Larry

Sorry again for the delay.

On Fri, Apr 4, 2025 at 6:37 AM Larry Garfield <[email protected]> wrote:
>
> * A new iterable API is absolutely a good thing and we should do it.
> * That said, we *need* to split Sequence, Set, and Dictionary into separate types.  We are the
> only language I reviewed that didn't have them as separate constructs with their own APIs.
> * The use of the same construct (arrays and iterables) for all three types is a fundamental and
> core flaw in PHP's design that we should not double-down on.  It's ergonomically awful,
> it's bad for performance, and it invites major security holes.  (The "Drupageddon"
> remote exploit was caused by using an array and assuming it was sequential when it was actually a
> map.)
>
> So while I want a new iterable API, the more I think on it, the more I think a bunch of
> map(iterable $it, callable $fn) style functions would not be the right way to do it.  That would be
> easy, but also ineffective.
>
> The behavior of even basic operations like map and filter is subtly different depending on
> which type you're dealing with.  Whether the input is lazy or not is the least of the concerns.
>  The bigger issue is when to pass keys to the $fn; probably always in Dict, probably never in Seq,
> and certainly never in Set (as there are no meaningful keys).  Similarly, when filtering a Dict, you
> would want keys preserved.  When filtering a Seq, you'd want the indexes re-zeroed.  (Or to
> seem like it, give or take implementation details.)  And then, yes, there's the laziness
> question.
>
> So we'd effectively want three different versions of map(), filter(), etc. if we
> didn't want to perpetuate and further entrench the design flaw and security hole that is
> "sequences and hashes are the same thing if you squint."  And... frankly I'd probably
> vote against an iterable/collections API that didn't address that issue.

I fundamentally disagree with this assessment. In most languages,
including PHP, an iterator is simply a sequence of values that can be
consumed. The consumer usually should not be concerned with the data
structure being iterated; the iterator abstracts that away. In most
languages, both Sequences and Sets are translated 1:1 (i.e.
Sequence<T> => Iterator<T>, Set<T> => Iterator<T>). Dictionaries
usually yield tuples, combining the key and the value into a single
pair (Dict<T, U> => Iterator<(T, U)>). PHP is a bit different in that
all iterators require a key. Semantically, this makes sense for both
Sequences (which are logically indexed by the element's position in
the sequence, so Sequence<T> => Iterator<int, T>) and Dicts (which
have an explicit key, so Dict<T, U> => Iterator<T, U>). Sets don't
technically have a logical key, but IMO this is not enough of a reason
to fundamentally change how iterators work. A sequential number would
be fine, which is also what yield does when no key is provided. If we
really wanted to avoid it, we could make it return null, as this is
already allowed for generators. https://3v4l.org/LvIjP
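
To make the two key behaviours concrete, here is a minimal sketch that
runs on current PHP. The function names (autoKeys, nullKeys,
collectKeys) are invented for illustration and are not taken from the
linked snippet.

```php
<?php
// Yielding without a key: PHP assigns sequential integer keys (0, 1, 2, ...).
function autoKeys(): Generator
{
    yield 'a';
    yield 'b';
    yield 'c';
}

// Yielding an explicit null key: generators already allow this today.
function nullKeys(): Generator
{
    yield null => 'a';
    yield null => 'b';
}

// Hypothetical helper: collect the keys exactly as foreach sees them.
// iterator_to_array() is avoided on purpose, because array keys cannot
// be null.
function collectKeys(iterable $it): array
{
    $keys = [];
    foreach ($it as $key => $_) {
        $keys[] = $key;
    }
    return $keys;
}

$auto = collectKeys(autoKeys());
$null = collectKeys(nullKeys());
```

Either variant would give a Set iterator a usable key without changing
how iterators work in general.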

The big upsides of treating all iterators the same, regardless of
their data source, are: 1. The code becomes more generic; you don't
need three variants of a value map() function when one works on all
of them. 2. You can populate any of the data structures from a
generic iterator without any data shuffling.

$users
    |> Iter\mapKeys(fn($u) => $u->getId())
    |> Iter\toDict();

This will work if $users is a Sequence, Set or existing Dict with some
other key. Actually, it works for any Traversable. If mapKeys() only
applied to Dict iterators you would necessarily have to create a
temporary dictionary first, or just not use the iterator API at all.
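
A rough sketch of how this could behave, runnable on current PHP. The
global functions mapKeys() and toDict() are hypothetical stand-ins for
the Iter\mapKeys() and Iter\toDict() functions above (neither exists
today), a plain array stands in for the Dict type, the User class is
invented for the example, and ordinary function calls replace the pipe
so the code does not depend on the operator under discussion.

```php
<?php
// Illustrative class; only getId() matters for the example.
final class User
{
    public function __construct(private int $id, public string $name) {}

    public function getId(): int
    {
        return $this->id;
    }
}

// Hypothetical stand-in for Iter\mapKeys(): ignores the incoming key and
// lazily yields each value under a key computed from the value.
function mapKeys(iterable $it, callable $fn): Generator
{
    foreach ($it as $value) {
        yield $fn($value) => $value;
    }
}

// Hypothetical stand-in for Iter\toDict(): materializes any iterable
// into a key => value array.
function toDict(iterable $it): array
{
    $dict = [];
    foreach ($it as $key => $value) {
        $dict[$key] = $value;
    }
    return $dict;
}

$byId = fn(User $u) => $u->getId();

// Source 1: a sequence (list array).
$fromSeq = toDict(mapKeys([new User(7, 'Ann'), new User(3, 'Bob')], $byId));

// Source 2: an existing dictionary keyed by something else.
$dict = ['ann' => new User(7, 'Ann'), 'bob' => new User(3, 'Bob')];
$fromDict = toDict(mapKeys($dict, $byId));

// Source 3: any other Traversable, here a generator.
$lazy = (function () {
    yield new User(7, 'Ann');
    yield new User(3, 'Bob');
})();
$fromLazy = toDict(mapKeys($lazy, $byId));
```

All three sources go through the same two functions, and no
intermediate dictionary is built before the keys are replaced.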

> However, a simple "first arg" pipe wouldn't allow for that.  Or rather,
> we'd need to implement seqMap(iterable $it, callable $fn), setMap(iterable $it, callable $fn),
> and dictMap(iterable $it, callable $fn).  And the same split for filter, and probably a few other
> things.  That seems ergonomically suspect, at best, and still wouldn't really address the issue
> since you would have no way to ensure you're using the "right" version of each
> function. Similarly, a dict version of implode() would likely need to take 2 separators, whereas
> the other types would take only one.
>
> So the more I think on it, the more I think the sort of iterable API that first-arg pipes would
> make easy is... probably not the iterable API we want anyway.  There may well be other cases for
> Elixir-style first-arg pipes, but a new iterable API isn't one of them, at least not in this
> form.

After having talked to you directly, it seemed to me that there is
some confusion about the iterator API vs. the API offered by the data
structure itself. For example:

> $l = new List(1,2, 3);
> $l2 = $l |> map(fn($x) => $x*2);
>
> What is the type of $l2? I would expect it to be a List, but there's currently
> no way to write a map() that statically guarantees that. (And that's before we
> get into generics.)

$l2 wouldn't be a List (or Sequence, to stick with the same
terminology) but an iterator, specifically Iterator<int, int>. If you
want to get back a sequence, you need to populate a new sequence from
the iterator using Iter\toSeq(). We may also decide to introduce a
Sequence::map() method that maps directly to a new sequence, which may
be more efficient for single transformations. That said, the nice
thing about the iterator API is that it generically applies to all
data structures implementing Traversable. For example, an Iter\max()
function would not need to care about the implementation details of
the underlying data structure, nor do all data structures need to
reimplement their own versions of max().
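
As a sketch of that separation, runnable on current PHP: map(),
toSeq() and iterMax() below are hypothetical global stand-ins for
Iter\map(), Iter\toSeq() and Iter\max() (none of them exist today),
ArrayIterator stands in for the List, and a plain list array stands in
for the Sequence type.

```php
<?php
// Hypothetical stand-in for Iter\map(): lazily maps values, keeps keys.
function map(iterable $it, callable $fn): Generator
{
    foreach ($it as $key => $value) {
        yield $key => $fn($value);
    }
}

// Hypothetical stand-in for Iter\toSeq(): drops the keys and collects
// the values into a freshly indexed list.
function toSeq(iterable $it): array
{
    $seq = [];
    foreach ($it as $value) {
        $seq[] = $value;
    }
    return $seq;
}

// Hypothetical stand-in for Iter\max(): works on anything iterable and
// returns null for an empty input.
function iterMax(iterable $it): mixed
{
    $max = null;
    $first = true;
    foreach ($it as $value) {
        if ($first || $value > $max) {
            $max = $value;
            $first = false;
        }
    }
    return $max;
}

$l = new ArrayIterator([1, 2, 3]);
$l2 = map($l, fn($x) => $x * 2);

// $l2 is an iterator, not a sequence ...
$isLazy = $l2 instanceof Generator;
// ... and becomes a sequence again only when materialized explicitly.
$doubled = toSeq($l2);

// The same max implementation serves every data source.
$maxFromArray = iterMax([3, 9, 4]);
$maxFromIterator = iterMax(new ArrayIterator([3, 9, 4]));
$maxFromGenerator = iterMax(map([3, 9, 4], fn($x) => -$x));
```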

> Which brings us then to extension functions.

I have largely changed my mind on extension functions. Extension
functions that are exclusively local, static and detached from the
type system are rather useless. Looking at an example:

> function PointEntity.toMessage(): PointMessage {
>     return new PointMessage($this->x, $this->y);
> }
>
> $result = json_encode($point->toMessage());

If for some reason toMessage() cannot be implemented on PointEntity,
there's arguably no benefit of $point->toMessage() over `$point |>
PointEntityExtension\toMessage()` (with an optional import to make it
almost as short). All the extension really achieves is changing the
syntax, but we would already have the pipe operator for this.
Technically, you can use such extensions for untyped, local
polymorphism, but this does not seem like a good approach.

function PointEntity.toMessage(): PointMessage { ... }
function RectEntity.toMessage(): RectMessage { ... }

$entities = [new Point, new Rect];

foreach ($entities as $e) {
    // Technically works, but the type system is entirely unaware.
    $e->toMessage();
    // This breaks, because Point and Rect don't actually implement
    // the ToMessage interface.
    takesToMessage($e);
}
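
The same limitation can be reproduced in today's PHP with an ordinary
method instead of an extension function. This is only an analogy under
stated assumptions: the ToMessage interface, the string return type
and the Point class below are invented for the example.

```php
<?php
// Assumed shape of the interface; the original only names it.
interface ToMessage
{
    public function toMessage(): string;
}

// Point has a compatible toMessage() method but does NOT declare
// "implements ToMessage", mirroring a type that is outside your control.
final class Point
{
    public function __construct(public int $x, public int $y) {}

    public function toMessage(): string
    {
        return "point({$this->x}, {$this->y})";
    }
}

function takesToMessage(ToMessage $m): string
{
    return $m->toMessage();
}

$point = new Point(1, 2);

// Calling the method directly works.
$direct = $point->toMessage();

// Passing the object where the interface is required does not: PHP
// checks the declared type, not whether a matching method exists.
try {
    takesToMessage($point);
    $accepted = true;
} catch (TypeError $e) {
    $accepted = false;
}
```

Having a method with the right name is not enough; the type has to be
declared as implementing the interface.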

Where extensions would really shine is if they could hook into the
type system by implementing interfaces on types that aren't in your
control. Rust and Swift are two examples that take this approach.

implement ToMessage for Rect { ... }

takesToMessage(new Rect); // Now this actually works.

However, this would be even harder to implement than extension
functions already would be. I won't go into detail because this e-mail is
already too long, but I'm happy to discuss it further off-list. All
this to say, I don't think extensions will work well in PHP, but I
also don't think they are necessary for the iterator API.

Regards,
Ilija
