Re: [RFC] Deprecations for PHP 8.4

From: Date: Thu, 27 Jun 2024 08:22:24 +0000
Subject: Re: [RFC] Deprecations for PHP 8.4
Groups: php.internals


> On 27 Jun 2024, at 12:31, Mike Schinkel <[email protected]> wrote:
> 
>> On Jun 26, 2024, at 8:14 AM, Gina P. Banyard <[email protected]> wrote:
>> 
>> 
>> On Wednesday, 26 June 2024 at 06:18, Mike Schinkel <[email protected]> wrote:
>>> https://3v4l.org/RDYFs#v8.3.8
>>> 
>>> Note that those seven use cases were found within roughly the first 25 results when searching
>>> GitHub for "strtok(". I could probably find more if I kept looking:
>>> 
>>> https://github.com/search?q=strtok%28+language%3APHP+&type=code
>>> 
>>> Regarding explode($delimiter, $str)[0]: unless it is special-cased during
>>> compilation, it is a very inefficient way to find the substring up to the first delimiter,
>>> especially for large strings and/or in a tight loop where the explode() is contained in a called
>>> function.
>> 
>> Then use a regex: https://3v4l.org/SGWL5
> 
> Using preg_match() instead of strtok() to process the ~4 KB file of comma-separated
> data takes, on average, about as long as using explode()[0], or roughly 10x as long as
> using strtok() (at times it got as low as 4.4x, but that was rare):
> 
> https://onlinephp.io/c/e1fad
> 
> Size of file:          3972
> Number of commas:      359
> Time taken for strtok: 0.003 seconds
> Time taken for regex:  0.0307 seconds
> Times strtok() faster: 10.25
> 
>> Or a combination of strpos and substr.
> 
> 
> Using strpos() + substr() instead of strtok() to process the ~4 KB file of
> comma-separated data took, on average, ~3x as long as using strtok(). I implemented
> a class for this and tried to optimize it by using only string positions and not copying the
> string repeatedly. It also took about half an hour to get the code working, versus about
> 15 seconds with strtok(). Which would most programmers prefer?
> 
> https://onlinephp.io/c/2a09f
> 
> Size of file:           3972
> Number of commas:       359
> Time for strtok:        0.0027 seconds
> Time for strpos/substr: 0.0089 seconds
> Times strtok() faster:  3.31
> 
> 
>> There are *plenty* of solutions to the specific problem you pose here, and different
>> ones will be more or less appropriate.
> 
> Yes, and all of the existing solutions are significantly slower, except one.
> 
> And the one solution that is not significantly slower is to not deprecate
> strtok(). Not deprecating it would also avoid a lot of BC breakage.
> 
> -Mike

Hi All,

I do appreciate that strtok() has a somewhat bizarre signature and usage pattern, with potential
for confusion due to how subsequent calls work. But to me, a better outcome for uses that need the
repeated-call functionality would be to introduce a built-in StringTokenizer class that
wraps the underlying strtok_r() C call and uses internal state to keep track of the string being
tokenized.
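To make the StringTokenizer idea concrete, here is a rough userland sketch. The class name comes from the suggestion above, but the next() method and everything else are assumptions for illustration only: a built-in version would wrap strtok_r() in C, whereas this one emulates strtok()'s token rules with strspn()/strcspn(). The point it shows is that the cursor lives in each instance, so nested tokenizing works, which it does not with strtok()'s single global state.

```php
<?php
declare(strict_types=1);

// Hypothetical userland sketch of a StringTokenizer class (not an existing
// PHP API). A built-in version would wrap strtok_r(); this one emulates
// strtok()'s token rules with strspn()/strcspn() and keeps its cursor per instance.
final class StringTokenizer
{
    // Byte offset where the next search starts.
    private int $offset = 0;

    public function __construct(private readonly string $subject)
    {
    }

    // Returns the next token delimited by any byte in $delimiters, or false
    // when no tokens remain. Like strtok(), runs of delimiters are skipped,
    // so empty tokens are never returned.
    public function next(string $delimiters): string|false
    {
        // Skip any delimiter bytes in front of the next token.
        $this->offset += strspn($this->subject, $delimiters, $this->offset);
        if ($this->offset >= strlen($this->subject)) {
            return false;
        }
        // Count the bytes up to the next delimiter (or the end of the string).
        $tokenLength = strcspn($this->subject, $delimiters, $this->offset);
        $token = substr($this->subject, $this->offset, $tokenLength);
        $this->offset += $tokenLength;
        return $token;
    }
}

// Nested tokenizing: with strtok() the inner calls would overwrite the shared
// global state of the outer loop; separate instances do not interfere.
$pairs = new StringTokenizer('k1=v1&k2=v2');
while (($pair = $pairs->next('&')) !== false) {
    $parts = new StringTokenizer($pair);
    echo $parts->next('='), ' => ', $parts->next('='), PHP_EOL;
}
```

This prints "k1 => v1" and "k2 => v2" on separate lines. Keeping the delimiter set as an argument to each next() call mirrors strtok(), which also allows the delimiters to change between calls.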


As a "works the same" solution for grabbing the first segment of a string up to any of the
delimiter characters, could the strpbrk() function be extended with a
$before_needle argument, like strstr() has? (strstr() matches an exact substring,
not any one of a list of characters.)
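The behaviour such a flag might have can be approximated in userland today. The function name strpbrk_before() below is hypothetical, and the semantics are only an assumption modelled on strstr()'s $before_needle: return the part before the first matching character, or false when none of the characters occur, as strpbrk() does.

```php
<?php
declare(strict_types=1);

// Hypothetical userland approximation of strpbrk() with the proposed
// $before_needle flag; the function name is illustrative only.
function strpbrk_before(string $string, string $characters, bool $before_needle = false): string|false
{
    // strcspn() returns the length of the leading run containing none of
    // $characters, i.e. the position of the first match.
    $pos = strcspn($string, $characters);
    if ($pos === strlen($string)) {
        return false; // no match, mirroring strpbrk()
    }
    // Like strstr(): the part before the match, or the match onwards.
    return $before_needle ? substr($string, 0, $pos) : substr($string, $pos);
}

echo strpbrk_before('key=value;rest', '=;', true), PHP_EOL; // key
echo strpbrk_before('key=value;rest', '=;'), PHP_EOL;       // =value;rest
```

Note that this only matches a single strtok() call in the common case: strtok() skips leading delimiters and returns the whole string when no delimiter is present, whereas this sketch returns "" for ",abc" and false for "abc".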




Cheers

Stephen 

