Re: [RFC] [Discussion] Add WHATWG compliant URL parsing API

From: Date: Fri, 28 Jun 2024 22:14:19 +0000
Subject: Re: [RFC] [Discussion] Add WHATWG compliant URL parsing API
References: 1  Groups: php.internals 
Request: Send a blank email to [email protected] to get a copy of this message
On Fri, Jun 28, 2024, at 8:06 PM, Máté Kocsis wrote:
> Hi Everyone,
>
> I've been working on a new RFC for a while now, and time has come to 
> present it to a wider audience.
>
> Last year, I learnt that PHP doesn't have built-in support for parsing 
> URLs according to any well established standards (RFC 1738 or the 
> WHATWG URL living standard), since the parse_url() function is 
> optimized for performance instead of correctness.
>
> In order to improve compatibility with external tools consuming URLs 
> (like browsers), my new RFC would add a WHATWG compliant URL parser 
> functionality to the standard library. The API itself is not final by 
> any means, the RFC only represents how I imagined it first.
>
> You can find the RFC at the following link: 
> https://wiki.php.net/rfc/url_parsing_api
>
> Regards,
> Máté

I am all for proper data modeling of all the things, so I support this effort.

Comments:

* There's no need for UrlComponent to be backed.

* I don't understand why UrlParser is a static class.  We just had a whole big debate about
that. :-)

There's a couple of ways I could see it working, and I'm not sure which I prefer:

1. Better if we envision the parser getting options or configuration in the future.
$url = new UrlParser()->parseUrl(): Url;

2. The named-constructor pattern is quite common.
$url = Url::parseFromString()
$url = Url::parseToArray();

* I... do not understand the point of having public properties AND getters/withers.  A readonly
class with withers, OK, a bit clunky to implement but it would be your problem in C, not mine, so I
don't care. :-)  But why getters AND public properties?  If going that far, why not finish up
clone-with and then we don't need the withers, either? :-)

* Making all the parameters to Url required except port makes little sense to me.  User/pass is more
likely to be omitted 99% of the time than port.  In practice, most components are optional, in which
case it would be inaccurate to not make them nullable.  Empty string wouldn't be quite the
same, as that is still a value and code that knows to skip empty string when doing something is
basically the same as code that knows to skip nulls.  We should assume people are going to
instantiate this class themselves often, not just get it from the parser, so it should be designed
to support that.

* I would not make Url final.  "OMG but then people can extend it!"  Exactly.  I can
absolutely see a case for an HttpUrl subclass that enforces scheme as http/https, or an FtpUrl that
enforces a scheme of ftp, etc.  Or even an InternalUrl that assumes the host is one particular
company, or something.  (If this sounds like scope creep, it's because I am confident that
people will want to creep this direction and we should plan ahead for it.)

* If the intent of the withers is to mimic PSR-7, I don't think it does so effectively. 
Without the interface, it couldn't be a drop-in replacement for UriInterface anyway.  And we
cannot extend it to add the interface if it's final.  Widening the parameters in PSR-7
interfaces to support both wouldn't work, as that would be a hard-BC break for any existing
implementations.  So I don't really see what the goal is here.

* If we ever get "data classes", this would be a good candidate. :-)

* Crazy idea:

new UriParser(HttpUrl::class)->parse(string);

To allow a more restrictive set of rules.  Or even just to cast the object to that child class.

--Larry Garfield


Thread (152 messages)