Re: [RFC] [Discussion] Add WHATWG compliant URL parsing API

From: Paul M. Jones Date: Tue, 29 Apr 2025 13:55:31 +0000

Subject: Re: [RFC] [Discussion] Add WHATWG compliant URL parsing API

Groups: php.internals

Hi Ignace & Máté and all,

tl;dr: I argue against Ignace's objections to splitting the URI class into two classes (one
that retains raw URI values and another that normalizes values as-it-goes). Jump to the very end for
a discussion regarding the with() methods (search for the word "asymmetry" herein).

* * *

> On Apr 28, 2025, at 15:47, ignace nyamagana butera <[email protected]> wrote:
> 
> The current approach in userland mixes both raw and half normalized components as well as
> RFC3986 and RFC3987 specification with ambiguity around normalization, input, constructor, what
> needs to be encoded where and when

Based on my research into existing URI projects <https://github.com/uri-interop/interface/blob/1.x/README-RESEARCH.md>
I don't think that's an accurate assessment of the ecosystem.

For example, can you point out which projects mix "raw and half-normalized components"?
Nette is the only one that comes to mind, in that (during parsing) it applies rawurldecode() to the
host, user, password, and fragment; but that's only one of the 18 projects.

Likewise, of the 15 URI-centric projects, only one of them (league/uri) offers both RFC3986 and 3987
parsing; the two IRI-centric projects (ml/iri and rmccue/requests) are explicitly IRIs; and rowbot
is clearly WHATWG-URL centric.  So I don't see much ambiguity in any projects there.

As far as normalization, only one project (opis) affords the ability to normalize at creation time,
though five of them offer a normalize() method with various effects (<https://github.com/uri-interop/interface/blob/1.x/README-RESEARCH.md#normalizing>).
So, again, I don't see much ambiguity there either; they don't do normalizing as-you-go,
it's something you have to apply explicitly.

Regarding inputs, they all presume "raw" inputs. Regarding constructors, they mostly side
with a full URI string. Regarding encoding, they mostly retain values in their encoded form (there
are three outliers, cf. <https://github.com/uri-interop/interface/blob/1.x/README-RESEARCH.md#component-encoding>).

With all that in mind, we can see that the various authors of userland projects have settled on
remarkably similar patterns of usage that they found valuable and useful for working with URIs.

> > - fulfill existing userland expectations;
> 
> Existing userland expectations are mostly built around parse_url

That's kind of true; 9 of the 18 projects use parse_url(), and 7/18 implement the RFC 3986
parsing algorithm ...

> which is one of the reasons the RFC exists to improve the status quo and to introduce in PHP
> valid parsers against recognizable URI specifications. Yes some adaptation will be needed to use
> them in userland but I believe this work is easy to do, talking from the POV of a URI package
> maintainer.

... but I don't imagine that replacing parse_url() in those projects with the RFC 3986 algo
would cause those projects to change any of their other design decisions. What adaptations do you
think would be needed around that replacement?

> > - replace the toString()/toRawString() with a single idiomatic __toString() in each class;
> 
> For all the reasons explained in the RFC, adding a __toString method is a bad
> architectural design for a URI. There are so many ways to represent a URI that having a
> __toString for string representation gives a false sense of "there can be only one
> true representation for a single URI" which is not true.

For Rfc3986\Uri, it looks like there are only two that are recognized: raw and normalized. Are there
other string representations you feel the Uri class should recognize?

(For Whatwg\Url, it looks like there are also only two: as-parsed, and as ASCII, but I'm not
addressing that part of the RFC here.)

> > - move normalization logic into the NormalizedUri class.
> 
> The classes follow specifications that describe how normalization should be. Why would you
> split the responsibilities in other classes? What would be the added value?

For one, unless I am missing something, there is an asymmetry between the get() methods and the
with() methods. What I'm seeing is that (e.g.) Uri::withPath() expects a raw path argument, but
getPath() returns the normalized version.  For symmetry, I would expect either:

- Uri::withPath(raw_value) : self and Uri::getPath() : raw_value, or
- Uri::withRawPath(raw_value) : self and Uri::getRawPath() : raw_value

Thus my first intuition was that the "main" values in the URI need to be the raw ones, and
that getting the normalized ones should be the more verbose case (e.g. getNormalizedPath() :
normalized_value).
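
To make the asymmetry concrete, here is a minimal sketch. It assumes a hypothetical SketchUri
class (not the RFC's implementation) that models only the path, and whose only
"normalization" is uppercasing the hex digits of percent-encoded octets:

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for illustration; NOT the class proposed in the RFC.
final class SketchUri
{
    public function __construct(private string $rawPath = '') {}

    // Accepts a raw path and stores it untouched.
    public function withPath(string $rawPath): self
    {
        return new self($rawPath);
    }

    // Returns a normalized path. As a simplifying assumption, "normalized"
    // here only means uppercase hex digits in percent-encoded octets.
    public function getPath(): string
    {
        return preg_replace_callback(
            '/%[0-9a-f]{2}/i',
            static fn (array $m): string => strtoupper($m[0]),
            $this->rawPath
        ) ?? $this->rawPath;
    }

    // Returns exactly what was passed to withPath().
    public function getRawPath(): string
    {
        return $this->rawPath;
    }
}

$uri = (new SketchUri())->withPath('/a%2fb');

var_dump($uri->getPath() === '/a%2fb');    // bool(false): the getter does not round-trip
var_dump($uri->getRawPath() === '/a%2fb'); // bool(true)
```

In this sketch, what goes into withPath() only comes back out through the more verbose
getRawPath().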

So, one value added by splitting the classes is to resolve that asymmetry. Consumers expecting to
get back from the URI what they put into it can use the raw Uri variation; "API clients or
signers fall in this category that want to avoid introducing any unnecessary changes to URIs, in
order to avoid causing subtle bugs." 

Other consumers, who want to do things this new and different way (normalized as-you-go, unlike
anything currently in userland) can use the NormalizedUri.

(Or you could flip it around and say that the normalized variation is the Uri class, and the raw
version is RawUri.)
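
As a rough sketch of what the split might look like: the class names SketchRawUri and
SketchNormalizedUri and the helper sketch_normalize() below are hypothetical, only the path is
modeled, and the normalization is a deliberately tiny stand-in (uppercasing percent-encoded
hex digits), not the RFC 3986 rules:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: a stand-in for real normalization logic.
// It only uppercases the hex digits of percent-encoded octets.
function sketch_normalize(string $path): string
{
    return preg_replace_callback(
        '/%[0-9a-f]{2}/i',
        static fn (array $m): string => strtoupper($m[0]),
        $path
    ) ?? $path;
}

// Hypothetical raw variation: what goes in is what comes out.
final class SketchRawUri
{
    public function __construct(private string $path = '') {}

    public function withPath(string $path): self
    {
        return new self($path);
    }

    public function getPath(): string
    {
        return $this->path;
    }

    public function __toString(): string
    {
        return $this->path; // only the path is modeled in this sketch
    }
}

// Hypothetical normalizing variation: values are normalized as they are set.
final class SketchNormalizedUri
{
    private string $path;

    public function __construct(string $path = '')
    {
        $this->path = sketch_normalize($path);
    }

    public function withPath(string $path): self
    {
        return new self($path);
    }

    public function getPath(): string
    {
        return $this->path;
    }

    public function __toString(): string
    {
        return $this->path; // only the path is modeled in this sketch
    }
}

$raw = (new SketchRawUri())->withPath('/a%2fb');
$normalized = (new SketchNormalizedUri())->withPath('/a%2fb');

var_dump($raw->getPath());        // string(6) "/a%2fb"
var_dump($normalized->getPath()); // string(6) "/a%2Fb"
```

Each class then has a symmetric with()/get() pair and a single __toString(), and the
choice between raw and normalized behavior is made once, by picking the class.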

-- pmj