| 1 | # Version 0.05 alpha $Revision: 1.6 $ $Date: 2001/06/24 17:11:26 $
|
|---|
| 2 |
|
|---|
| 3 | =head1 TO DO
|
|---|
| 4 |
|
|---|
| 5 | =over 4
|
|---|
| 6 |
|
|---|
| 7 | =item *
|
|---|
| 8 |
|
|---|
| 9 | LIST_CACHE doesn't work with ties to most DBM implementations, because
|
|---|
| 10 | Memouze tries to save a listref, and DB_File etc. can only store
|
|---|
| 11 | strings. This should at least be documented. Maybe Memoize could
|
|---|
| 12 | detect the problem at TIE time and throw a fatal error.
|
|---|
| 13 |
|
|---|
| 14 | 20010623 This was added sometime prior to 20001025.
|
|---|
| 15 |
|
|---|
| 16 | Try out MLDBM here and document it if it works.
|
|---|
| 17 |
|
|---|
| 18 | =item *
|
|---|
| 19 |
|
|---|
| 20 | We should extend the benchmarking module to allow
|
|---|
| 21 |
|
|---|
| 22 | timethis(main, { MEMOIZED => [ suba, subb ] })
|
|---|
| 23 |
|
|---|
| 24 | What would this do? It would time C<main> three times, once with
|
|---|
| 25 | C<suba> and C<subb> unmemoized, twice with them memoized.
|
|---|
| 26 |
|
|---|
| 27 | Why would you want to do this? By the third set of runs, the memo
|
|---|
| 28 | tables would be fully populated, so all calls by C<main> to C<suba>
|
|---|
| 29 | and C<subb> would return immediately. You would be able to see how
|
|---|
| 30 | much of C<main>'s running time was due to time spent computing in
|
|---|
| 31 | C<suba> and C<subb>. If that was just a little time, you would know
|
|---|
| 32 | that optimizing or improving C<suba> and C<subb> would not have a
|
|---|
| 33 | large effect on the performance of C<main>. But if there was a big
|
|---|
| 34 | difference, you would know that C<suba> or C<subb> was a good
|
|---|
| 35 | candidate for optimization if you needed to make C<main> go faster.
|
|---|
| 36 |
|
|---|
| 37 | Done.
|
|---|
| 38 |
|
|---|
| 39 | =item *
|
|---|
| 40 |
|
|---|
| 41 | Perhaps C<memoize> should return a reference to the original function
|
|---|
| 42 | as well as one to the memoized version? But the programmer could
|
|---|
| 43 | always construct such a reference themselves, so perhaps it's not
|
|---|
| 44 | necessary. We save such a reference anyway, so a new package method
|
|---|
| 45 | could return it on demand even if it wasn't provided by C<memoize>.
|
|---|
| 46 | We could even bless the new function reference so that it could have
|
|---|
| 47 | accessor methods for getting to the original function, the options,
|
|---|
| 48 | the memo table, etc.
|
|---|
| 49 |
|
|---|
| 50 | Naah.
|
|---|
| 51 |
|
|---|
| 52 | =item *
|
|---|
| 53 |
|
|---|
| 54 | The TODISK feature is not ready yet. It will have to be rather
|
|---|
| 55 | complicated, providing options for which disk method to use (GDBM?
|
|---|
| 56 | DB_File? Flat file? Storable? User-supplied?) and which stringizing
|
|---|
| 57 | method to use (FreezeThaw? Marshal? User-supplied?)
|
|---|
| 58 |
|
|---|
| 59 | Done!
|
|---|
| 60 |
|
|---|
| 61 | =item *
|
|---|
| 62 |
|
|---|
| 63 | Maybe an option for automatic expiration of cache values? (`After one
|
|---|
| 64 | day,' `After five uses,' etc.) Also possibly an option to limit the
|
|---|
| 65 | number of active entries with automatic LRU expiration.
|
|---|
| 66 |
|
|---|
| 67 | You have a long note to Mike Cariaso that outlines a good approach
|
|---|
| 68 | that you sent on 9 April 1999.
|
|---|
| 69 |
|
|---|
| 70 | What's the timeout stuff going to look like?
|
|---|
| 71 |
|
|---|
| 72 | EXPIRE_TIME => time_in_sec
|
|---|
| 73 | EXPIRE_USES => num_uses
|
|---|
| 74 | MAXENTRIES => n
|
|---|
| 75 |
|
|---|
| 76 | perhaps? Is EXPIRE_USES actually useful?
|
|---|
| 77 |
|
|---|
| 78 | 19990916: Memoize::Expire does EXPIRE_TIME and EXPIRE_USES.
|
|---|
| 79 | MAXENTRIES can come later as a separate module.
|
|---|
| 80 |
|
|---|
| 81 | =item *
|
|---|
| 82 |
|
|---|
| 83 | Put in a better example than C<fibo>. Show an example of a
|
|---|
| 84 | nonrecursive function that simply takes a long time to run.
|
|---|
| 85 | C<getpwuid> for example? But this exposes the bug that you can't say
|
|---|
| 86 | C<memoize('getpwuid')>, so perhaps it's not a very good example.
|
|---|
| 87 |
|
|---|
| 88 | Well, I did add the ColorToRGB example, but it's still not so good.
|
|---|
| 89 | These examples need a lot of work. C<factorial> might be a better
|
|---|
| 90 | example than C<fibo>.
|
|---|
| 91 |
|
|---|
| 92 | =item *
|
|---|
| 93 |
|
|---|
| 94 | Add more regression tests for normalizers.
|
|---|
| 95 |
|
|---|
| 96 | =item *
|
|---|
| 97 |
|
|---|
| 98 | Maybe resolve normalizer function to code-ref at memoize time instead
|
|---|
| 99 | of at function call time for efficiency? I think there was some
|
|---|
| 100 | reason not to do this, but I can't remember what it was.
|
|---|
| 101 |
|
|---|
| 102 | =item *
|
|---|
| 103 |
|
|---|
| 104 | Add more array value tests to the test suite.
|
|---|
| 105 |
|
|---|
| 106 | Does it need more now?
|
|---|
| 107 |
|
|---|
| 108 | =item *
|
|---|
| 109 |
|
|---|
| 110 | Fix that `Subroutine u redefined ... line 484' message.
|
|---|
| 111 |
|
|---|
| 112 | Fixed, I think.
|
|---|
| 113 |
|
|---|
| 114 | =item *
|
|---|
| 115 |
|
|---|
| 116 | Get rid of any remaining *{$ref}{CODE} or similar magic hashes.
|
|---|
| 117 |
|
|---|
| 118 | =item *
|
|---|
| 119 |
|
|---|
| 120 | There should be an option to dump out the memoized values or to
|
|---|
| 121 | otherwise traverse them.
|
|---|
| 122 |
|
|---|
| 123 | What for?
|
|---|
| 124 |
|
|---|
| 125 | Maybe the tied hash interface taskes care of this anyway?
|
|---|
| 126 |
|
|---|
| 127 | =item *
|
|---|
| 128 |
|
|---|
| 129 | Include an example that caches DNS lookups.
|
|---|
| 130 |
|
|---|
| 131 | =item *
|
|---|
| 132 |
|
|---|
| 133 | Make tie for Storable (Memoize::Storable)
|
|---|
| 134 |
|
|---|
| 135 | A prototype of Memoize::Storable is finished. Test it and add to the
|
|---|
| 136 | test suite.
|
|---|
| 137 |
|
|---|
| 138 | Done.
|
|---|
| 139 |
|
|---|
| 140 | =item *
|
|---|
| 141 |
|
|---|
| 142 | Make tie for DBI (Memoize::DBI)
|
|---|
| 143 |
|
|---|
| 144 | =item *
|
|---|
| 145 |
|
|---|
| 146 | I think there's a bug. See `###BUG'.
|
|---|
| 147 |
|
|---|
| 148 | =item *
|
|---|
| 149 |
|
|---|
| 150 | Storable probably can't be done, because it doesn't allow updating.
|
|---|
| 151 | Maybe a different interface that supports readonly caches fronted by a
|
|---|
| 152 | writable in-memory cache? A generic tied hash maybe?
|
|---|
| 153 |
|
|---|
| 154 | FETCH {
|
|---|
| 155 | if (it's in the memory hash) {
|
|---|
| 156 | return it
|
|---|
| 157 | } elsif (it's in the readonly disk hash) {
|
|---|
| 158 | return it
|
|---|
| 159 | } else {
|
|---|
| 160 | not-there
|
|---|
| 161 | }
|
|---|
| 162 | }
|
|---|
| 163 |
|
|---|
| 164 | STORE {
|
|---|
| 165 | put it into the in-memory hash
|
|---|
| 166 | }
|
|---|
| 167 |
|
|---|
| 168 | Maybe `save' and `restore' methods?
|
|---|
| 169 |
|
|---|
| 170 | It isn't working right because the destructor doesn't get called at
|
|---|
| 171 | the right time.
|
|---|
| 172 |
|
|---|
| 173 | This is fixed. `use strict vars' would have caught it immediately. Duh.
|
|---|
| 174 |
|
|---|
| 175 | =item *
|
|---|
| 176 |
|
|---|
| 177 | Don't forget about generic interface to Storable-like packages
|
|---|
| 178 |
|
|---|
| 179 | 20010627 It would appear that you put this into 0.51.
|
|---|
| 180 |
|
|---|
| 181 | =item *
|
|---|
| 182 |
|
|---|
| 183 | Maybe add in TODISK after all, with TODISK => 'filename' equivalent to
|
|---|
| 184 |
|
|---|
| 185 | SCALAR_CACHE => [TIE, Memoize::SDBM_File, $filename, O_RDWR|O_CREAT, 0666],
|
|---|
| 186 | LIST_CACHE => MERGE
|
|---|
| 187 |
|
|---|
| 188 | =item *
|
|---|
| 189 |
|
|---|
| 190 | Maybe the default for LIST_CACHE should be MERGE anyway.
|
|---|
| 191 |
|
|---|
| 192 | =item *
|
|---|
| 193 |
|
|---|
| 194 | There's some terrible bug probably related to use under threaded perl,
|
|---|
| 195 | possibly connected with line 56:
|
|---|
| 196 |
|
|---|
| 197 | my $wrapper = eval "sub { unshift \@_, qq{$cref}; goto &_memoizer; }";
|
|---|
| 198 |
|
|---|
| 199 | I think becayse C<@_> is lexically scoped in threadperl, the effect of
|
|---|
| 200 | C<unshift> never makes it into C<_memoizer>. That's probably a bug in
|
|---|
| 201 | Perl, but maybe I should work around it. Can anyone provide more
|
|---|
| 202 | information here, or lend me a machine with threaded Perl where I can
|
|---|
| 203 | test this theory? Line 59, currently commented out, may fix the
|
|---|
| 204 | problem.
|
|---|
| 205 |
|
|---|
| 206 | 20010623 Working around this in 0.65, but it still blows.
|
|---|
| 207 |
|
|---|
| 208 | =item *
|
|---|
| 209 |
|
|---|
| 210 | Maybe if the original function has a prototype, the module can use
|
|---|
| 211 | that to select the most appropriate default normalizer. For example,
|
|---|
| 212 | if the prototype was C<($)>, there's no reason to use `join'. If it's
|
|---|
| 213 | C<(\@)> then it can use C<join $;,@$_[0];> instead of C<join $;,@_;>.
|
|---|
| 214 |
|
|---|
| 215 | =item *
|
|---|
| 216 |
|
|---|
| 217 | Ariel Scolnikov suggests using the change counting problem as an
|
|---|
| 218 | example. (How many ways to make change of a dollar?)
|
|---|
| 219 |
|
|---|
| 220 | =item *
|
|---|
| 221 |
|
|---|
| 222 | Jonathan Roy found a use for `unmemoize'. If you're using the
|
|---|
| 223 | Storable glue, and your program gets SIGINT, you find that the cache
|
|---|
| 224 | data is not in the cache, because Perl normally writes it all out at
|
|---|
| 225 | once from a DESTROY method, and signals skip DESTROY processing. So
|
|---|
| 226 | you could add
|
|---|
| 227 |
|
|---|
| 228 | $sig{INT} = sub { unmemoize ... };
|
|---|
| 229 |
|
|---|
| 230 |
|
|---|
| 231 | =item *
|
|---|
| 232 |
|
|---|
| 233 | This means it would be useful to have a method to return references to
|
|---|
| 234 | all the currently-memoized functions so that you could say
|
|---|
| 235 |
|
|---|
| 236 | $sig{INT} = sub { for $f (Memoize->all_memoized) {
|
|---|
| 237 | unmemoize $f;
|
|---|
| 238 | }
|
|---|
| 239 | }
|
|---|
| 240 |
|
|---|
| 241 |
|
|---|
| 242 | =item *
|
|---|
| 243 |
|
|---|
| 244 | 19990917 There should be a call you can make to get back the cache
|
|---|
| 245 | itself. If there were, then you could delete stuff from it to
|
|---|
| 246 | manually expire data items.
|
|---|
| 247 |
|
|---|
| 248 | =item *
|
|---|
| 249 |
|
|---|
| 250 | 19990925 Randal says that the docs for Memoize;:Expire should make it
|
|---|
| 251 | clear that the expired entries are never flushed all at once. He
|
|---|
| 252 | asked if you would need to do that manually. I said:
|
|---|
| 253 |
|
|---|
| 254 | Right, if that's what you want. If you have EXISTS return false,
|
|---|
| 255 | it'll throw away the old cached item and replace it in the cache
|
|---|
| 256 | with a new item. But if you want the cache to actually get smaller,
|
|---|
| 257 | you have to do that yourself.
|
|---|
| 258 |
|
|---|
| 259 | I was planning to build an Expire module that implemented an LRU
|
|---|
| 260 | queue and kept the cache at a constant fixed size, but I didn't get
|
|---|
| 261 | to it yet. It's not clear to me that the automatic exptynig-out
|
|---|
| 262 | behavior is very useful anyway. The whole point of a cache is to
|
|---|
| 263 | trade space for time, so why bother going through the cache to throw
|
|---|
| 264 | away old items before you need to?
|
|---|
| 265 |
|
|---|
| 266 | Randal then pointed out that it could discard expired items at DESTRoY
|
|---|
| 267 | or TIEHASH time, which seemed like a good idea, because if the cache
|
|---|
| 268 | is on disk you might like to keep it as small as possible.
|
|---|
| 269 |
|
|---|
| 270 | =item *
|
|---|
| 271 |
|
|---|
| 272 | 19991219 Philip Gwyn suggests this technique: You have a load_file
|
|---|
| 273 | function that memoizes the file contexts. But then if the file
|
|---|
| 274 | changes you get the old contents. So add a normalizer that does
|
|---|
| 275 |
|
|---|
| 276 | return join $;, (stat($_[0])[9]), $_[0];
|
|---|
| 277 |
|
|---|
| 278 | Now when the modification date changes, the true key returned by the
|
|---|
| 279 | normalizer is different, so you get a cache miss and it loads the new
|
|---|
| 280 | contents. Disadvantage: The old contents are still in the cache. I
|
|---|
| 281 | think it makes more sense to have a special expiration manager for
|
|---|
| 282 | this. Make one up and bundle it.
|
|---|
| 283 |
|
|---|
| 284 | 19991220 I have one written: Memoize::ExpireFile. But how can you
|
|---|
| 285 | make this work when the function might have several arguments, of
|
|---|
| 286 | which some are filenames and some aren't?
|
|---|
| 287 |
|
|---|
| 288 | =item *
|
|---|
| 289 |
|
|---|
| 290 | 19991219 There should be an inheritable TIEHASH method that does the
|
|---|
| 291 | argument processing properly.
|
|---|
| 292 |
|
|---|
| 293 | 19991220 Philip Gwyn contributed a patch for this.
|
|---|
| 294 |
|
|---|
| 295 | 20001231 You should really put this in. Jonathan Roy uncovered a
|
|---|
| 296 | problem that it will be needed to solve. Here's the problem: He has:
|
|---|
| 297 |
|
|---|
| 298 | memoize "get_items",
|
|---|
| 299 | LIST_CACHE => ["TIE", "Memoize::Expire",
|
|---|
| 300 | LIFETIME => 86400,
|
|---|
| 301 | TIE => ["DB_File", "debug.db", O_CREAT|O_RDWR, 0666]
|
|---|
| 302 | ];
|
|---|
| 303 |
|
|---|
| 304 | This won't work, because memoize is trying to store listrefs in a
|
|---|
| 305 | DB_File. He owuld have gotten a fatal error if he had done this:
|
|---|
| 306 |
|
|---|
| 307 | memoize "get_items",
|
|---|
| 308 | LIST_CACHE => ["TIE", "DB_File", "debug.db", O_CREAT|O_RDWR, 0666]'
|
|---|
| 309 |
|
|---|
| 310 |
|
|---|
| 311 | But in this case, he tied the cache to Memoize::Expire, which is *not*
|
|---|
| 312 | scalar-only, and the check for scalar-only ties is missing from
|
|---|
| 313 | Memoize::Expire. The inheritable method can take care of this.
|
|---|
| 314 |
|
|---|
| 315 | 20010623 I decided not to put it in. Instead, we avoid the problem by
|
|---|
| 316 | getting rid of TIE. The HASH option does the same thing, and HASH is
|
|---|
| 317 | so simple to support that a module is superfluous.
|
|---|
| 318 |
|
|---|
| 319 | =item *
|
|---|
| 320 |
|
|---|
| 321 | 20001130 Custom cache manager that checks to make sure the function
|
|---|
| 322 | return values actually match the memoized values.
|
|---|
| 323 |
|
|---|
| 324 | =item *
|
|---|
| 325 |
|
|---|
| 326 | 20001231 Expiration manager that watches cache performance and
|
|---|
| 327 | accumulates statistics. Variation: Have it automatically unmemoize
|
|---|
| 328 | the function if performance is bad.
|
|---|
| 329 |
|
|---|
| 330 | =item *
|
|---|
| 331 |
|
|---|
| 332 | 20010517 Option to have normalizer *modify* @_ for use by memoized
|
|---|
| 333 | function. This would save code and time in cases like the one in the
|
|---|
| 334 | manual under 'NORMALIZER', where both f() and normalize_f() do the
|
|---|
| 335 | same analysis and make the same adjustments to the hash. If the
|
|---|
| 336 | normalizer could make the adjustments and save the changes in @_, you
|
|---|
| 337 | wouldn't have to do it twice.
|
|---|
| 338 |
|
|---|
| 339 | =item *
|
|---|
| 340 | 20010623 Add CLEAR methods to tied hash modules.
|
|---|
| 341 |
|
|---|
| 342 | =item *
|
|---|
| 343 | 20010623 You get a warning if you try to use DB_File as LIST_CACHE,
|
|---|
| 344 | because it won't store lists. But if you use it as the underlying
|
|---|
| 345 | cache with an expiration manager in the middle, no warning---the
|
|---|
| 346 | expiration manager doesn't know it's managing a list cache, and
|
|---|
| 347 | memoize doesn't know that DB_File is underlying. Is this fixable?
|
|---|
| 348 | Probably not, but think about it.
|
|---|
| 349 |
|
|---|
| 350 | =item *
|
|---|
| 351 | There was probably some other stuff that I forgot.
|
|---|
| 352 |
|
|---|
| 353 |
|
|---|
| 354 |
|
|---|
| 355 | =back
|
|---|