| 1 | =head1 NAME
|
|---|
| 2 |
|
|---|
| 3 | perlthrtut - tutorial on threads in Perl
|
|---|
| 4 |
|
|---|
| 5 | =head1 DESCRIPTION
|
|---|
| 6 |
|
|---|
| 7 | B<NOTE>: this tutorial describes the new Perl threading flavour
|
|---|
| 8 | introduced in Perl 5.6.0 called interpreter threads, or B<ithreads>
|
|---|
| 9 | for short. In this model each thread runs in its own Perl interpreter,
|
|---|
| 10 | and any data sharing between threads must be explicit.
|
|---|
| 11 |
|
|---|
| 12 | There is another older Perl threading flavour called the 5.005 model,
|
|---|
| 13 | unsurprisingly for 5.005 versions of Perl. The old model is known to
|
|---|
| 14 | have problems, is deprecated, and will probably be removed around release
|
|---|
| 15 | 5.10. You are strongly encouraged to migrate any existing 5.005
|
|---|
| 16 | threads code to the new model as soon as possible.
|
|---|
| 17 |
|
|---|
| 18 | You can see which threading flavour you have (if any) by
|
|---|
| 19 | running C<perl -V> and looking at the C<Platform> section.
|
|---|
| 20 | If you have C<useithreads=define> you have ithreads; if you
|
|---|
| 21 | have C<use5005threads=define> you have 5.005 threads.
|
|---|
| 22 | If you have neither, you don't have any thread support built in.
|
|---|
| 23 | If you have both, you are in trouble.
|
|---|
| 24 |
|
|---|
| 25 | The user-level interface to the 5.005 threads was via the L<Threads>
|
|---|
| 26 | class, while ithreads uses the L<threads> class. Note the change in case.
|
|---|
| 27 |
|
|---|
| 28 | =head1 Status
|
|---|
| 29 |
|
|---|
| 30 | The ithreads code has been available since Perl 5.6.0, and is considered
|
|---|
| 31 | stable. The user-level interface to ithreads (the L<threads> classes)
|
|---|
| 32 | appeared in the 5.8.0 release, and as of this time is considered stable
|
|---|
| 33 | although it should be treated with caution as with all new features.
|
|---|
| 34 |
|
|---|
| 35 | =head1 What Is A Thread Anyway?
|
|---|
| 36 |
|
|---|
| 37 | A thread is a flow of control through a program with a single
|
|---|
| 38 | execution point.
|
|---|
| 39 |
|
|---|
| 40 | Sounds an awful lot like a process, doesn't it? Well, it should.
|
|---|
| 41 | Threads are one of the pieces of a process. Every process has at least
|
|---|
| 42 | one thread and, up until now, every process running Perl had only one
|
|---|
| 43 | thread. With 5.8, though, you can create extra threads. We're going
|
|---|
| 44 | to show you how, when, and why.
|
|---|
| 45 |
|
|---|
| 46 | =head1 Threaded Program Models
|
|---|
| 47 |
|
|---|
| 48 | There are three basic ways that you can structure a threaded
|
|---|
| 49 | program. Which model you choose depends on what you need your program
|
|---|
| 50 | to do. For many non-trivial threaded programs you'll need to choose
|
|---|
| 51 | different models for different pieces of your program.
|
|---|
| 52 |
|
|---|
| 53 | =head2 Boss/Worker
|
|---|
| 54 |
|
|---|
| 55 | The boss/worker model usually has one "boss" thread and one or more
|
|---|
| 56 | "worker" threads. The boss thread gathers or generates tasks that need
|
|---|
| 57 | to be done, then parcels those tasks out to the appropriate worker
|
|---|
| 58 | thread.
|
|---|
| 59 |
|
|---|
| 60 | This model is common in GUI and server programs, where a main thread
|
|---|
| 61 | waits for some event and then passes that event to the appropriate
|
|---|
| 62 | worker threads for processing. Once the event has been passed on, the
|
|---|
| 63 | boss thread goes back to waiting for another event.
|
|---|
| 64 |
|
|---|
| 65 | The boss thread does relatively little work. While tasks aren't
|
|---|
| 66 | necessarily performed faster than with any other method, it tends to
|
|---|
| 67 | have the best user-response times.
|
|---|
| 68 |
|
|---|
| 69 | =head2 Work Crew
|
|---|
| 70 |
|
|---|
| 71 | In the work crew model, several threads are created that do
|
|---|
| 72 | essentially the same thing to different pieces of data. It closely
|
|---|
| 73 | mirrors classical parallel processing and vector processors, where a
|
|---|
| 74 | large array of processors do the exact same thing to many pieces of
|
|---|
| 75 | data.
|
|---|
| 76 |
|
|---|
| 77 | This model is particularly useful if the system running the program
|
|---|
| 78 | will distribute multiple threads across different processors. It can
|
|---|
| 79 | also be useful in ray tracing or rendering engines, where the
|
|---|
| 80 | individual threads can pass on interim results to give the user visual
|
|---|
| 81 | feedback.
|
|---|
| 82 |
|
|---|
| 83 | =head2 Pipeline
|
|---|
| 84 |
|
|---|
| 85 | The pipeline model divides up a task into a series of steps, and
|
|---|
| 86 | passes the results of one step on to the thread processing the
|
|---|
| 87 | next. Each thread does one thing to each piece of data and passes the
|
|---|
| 88 | results to the next thread in line.
|
|---|
| 89 |
|
|---|
| 90 | This model makes the most sense if you have multiple processors so two
|
|---|
| 91 | or more threads will be executing in parallel, though it can often
|
|---|
| 92 | make sense in other contexts as well. It tends to keep the individual
|
|---|
| 93 | tasks small and simple, as well as allowing some parts of the pipeline
|
|---|
| 94 | to block (on I/O or system calls, for example) while other parts keep
|
|---|
| 95 | going. If you're running different parts of the pipeline on different
|
|---|
| 96 | processors you may also take advantage of the caches on each
|
|---|
| 97 | processor.
|
|---|
| 98 |
|
|---|
| 99 | This model is also handy for a form of recursive programming where,
|
|---|
| 100 | rather than having a subroutine call itself, it instead creates
|
|---|
| 101 | another thread. Prime and Fibonacci generators both map well to this
|
|---|
| 102 | form of the pipeline model. (A version of a prime number generator is
|
|---|
| 103 | presented later on.)
|
|---|
| 104 |
|
|---|
| 105 | =head1 What kind of threads are Perl threads?
|
|---|
| 106 |
|
|---|
| 107 | If you have experience with other thread implementations, you might
|
|---|
| 108 | find that things aren't quite what you expect. It's very important to
|
|---|
| 109 | remember when dealing with Perl threads that Perl Threads Are Not X
|
|---|
| 110 | Threads, for all values of X. They aren't POSIX threads, or
|
|---|
| 111 | DecThreads, or Java's Green threads, or Win32 threads. There are
|
|---|
| 112 | similarities, and the broad concepts are the same, but if you start
|
|---|
| 113 | looking for implementation details you're going to be either
|
|---|
| 114 | disappointed or confused. Possibly both.
|
|---|
| 115 |
|
|---|
| 116 | This is not to say that Perl threads are completely different from
|
|---|
| 117 | everything that's ever come before--they're not. Perl's threading
|
|---|
| 118 | model owes a lot to other thread models, especially POSIX. Just as
|
|---|
| 119 | Perl is not C, though, Perl threads are not POSIX threads. So if you
|
|---|
| 120 | find yourself looking for mutexes, or thread priorities, it's time to
|
|---|
| 121 | step back a bit and think about what you want to do and how Perl can
|
|---|
| 122 | do it.
|
|---|
| 123 |
|
|---|
| 124 | However it is important to remember that Perl threads cannot magically
|
|---|
| 125 | do things unless your operating system's threads allow it. So if your
|
|---|
| 126 | system blocks the entire process on sleep(), Perl usually will as well.
|
|---|
| 127 |
|
|---|
| 128 | Perl Threads Are Different.
|
|---|
| 129 |
|
|---|
| 130 | =head1 Thread-Safe Modules
|
|---|
| 131 |
|
|---|
| 132 | The addition of threads has changed Perl's internals
|
|---|
| 133 | substantially. There are implications for people who write
|
|---|
| 134 | modules with XS code or external libraries. However, since perl data is
|
|---|
| 135 | not shared among threads by default, Perl modules stand a high chance of
|
|---|
| 136 | being thread-safe or can be made thread-safe easily. Modules that are not
|
|---|
| 137 | tagged as thread-safe should be tested or code reviewed before being used
|
|---|
| 138 | in production code.
|
|---|
| 139 |
|
|---|
| 140 | Not all modules that you might use are thread-safe, and you should
|
|---|
| 141 | always assume a module is unsafe unless the documentation says
|
|---|
| 142 | otherwise. This includes modules that are distributed as part of the
|
|---|
| 143 | core. Threads are a new feature, and even some of the standard
|
|---|
| 144 | modules aren't thread-safe.
|
|---|
| 145 |
|
|---|
| 146 | Even if a module is thread-safe, it doesn't mean that the module is optimized
|
|---|
| 147 | to work well with threads. A module could possibly be rewritten to utilize
|
|---|
| 148 | the new features in threaded Perl to increase performance in a threaded
|
|---|
| 149 | environment.
|
|---|
| 150 |
|
|---|
| 151 | If you're using a module that's not thread-safe for some reason, you
|
|---|
| 152 | can protect yourself by using it from one, and only one, thread.
|
|---|
| 153 | If you need multiple threads to access such a module, you can use semaphores and
|
|---|
| 154 | lots of programming discipline to control access to it. Semaphores
|
|---|
| 155 | are covered in L</"Basic semaphores">.
|
|---|
| 156 |
|
|---|
| 157 | See also L</"Thread-Safety of System Libraries">.
|
|---|
| 158 |
|
|---|
| 159 | =head1 Thread Basics
|
|---|
| 160 |
|
|---|
| 161 | The core L<threads> module provides the basic functions you need to write
|
|---|
| 162 | threaded programs. In the following sections we'll cover the basics,
|
|---|
| 163 | showing you what you need to do to create a threaded program. After
|
|---|
| 164 | that, we'll go over some of the features of the L<threads> module that
|
|---|
| 165 | make threaded programming easier.
|
|---|
| 166 |
|
|---|
| 167 | =head2 Basic Thread Support
|
|---|
| 168 |
|
|---|
| 169 | Thread support is a Perl compile-time option - it's something that's
|
|---|
| 170 | turned on or off when Perl is built at your site, rather than when
|
|---|
| 171 | your programs are compiled. If your Perl wasn't compiled with thread
|
|---|
| 172 | support enabled, then any attempt to use threads will fail.
|
|---|
| 173 |
|
|---|
| 174 | Your programs can use the Config module to check whether threads are
|
|---|
| 175 | enabled. If your program can't run without them, you can say something
|
|---|
| 176 | like:
|
|---|
| 177 |
|
|---|
| 178 | use Config; $Config{useithreads} or die "Recompile Perl with threads to run this program.";
|
|---|
| 179 |
|
|---|
| 180 | A possibly-threaded program using a possibly-threaded module might
|
|---|
| 181 | have code like this:
|
|---|
| 182 |
|
|---|
| 183 | use Config;
|
|---|
| 184 | use MyMod;
|
|---|
| 185 |
|
|---|
| 186 | BEGIN {
|
|---|
| 187 | if ($Config{useithreads}) {
|
|---|
| 188 | # We have threads
|
|---|
| 189 | require MyMod_threaded;
|
|---|
| 190 | import MyMod_threaded;
|
|---|
| 191 | } else {
|
|---|
| 192 | require MyMod_unthreaded;
|
|---|
| 193 | import MyMod_unthreaded;
|
|---|
| 194 | }
|
|---|
| 195 | }
|
|---|
| 196 |
|
|---|
| 197 | Since code that runs both with and without threads is usually pretty
|
|---|
| 198 | messy, it's best to isolate the thread-specific code in its own
|
|---|
| 199 | module. In our example above, that's what MyMod_threaded is, and it's
|
|---|
| 200 | only imported if we're running on a threaded Perl.
|
|---|
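The check above can also be made at run time, so that one script works on both threaded and unthreaded builds. The following is a minimal sketch under stated assumptions: C<have_ithreads()> and C<run_task()> are hypothetical helper names, not part of any module, and C<threads> is loaded with C<require> only when C<%Config> reports ithreads support.

```perl
use strict;
use warnings;
use Config;

# Returns 1 when this perl was built with ithreads support, 0 otherwise.
sub have_ithreads {
    return $Config{useithreads} ? 1 : 0;
}

# Hypothetical helper: run $code in a separate thread when threads are
# available, otherwise call it inline. Loading threads with require keeps
# the script compilable on an unthreaded perl.
sub run_task {
    my ($code, @args) = @_;
    if (have_ithreads()) {
        require threads;
        # List-context creation so that join() returns the full list.
        my ($thr) = threads->create($code, @args);
        return $thr->join;
    }
    return $code->(@args);
}

my @result = run_task(sub { return map { $_ * 2 } @_ }, 1, 2, 3);
print "result: @result\n";
```

Keeping the decision inside one helper means the rest of the program never has to ask whether threads exist.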
| 201 |
|
|---|
| 202 | =head2 A Note about the Examples
|
|---|
| 203 |
|
|---|
| 204 | Although thread support is considered to be stable, there are still a number
|
|---|
| 205 | of quirks that may startle you when you try out any of the examples below.
|
|---|
| 206 | In a real situation, care should be taken that all threads are finished
|
|---|
| 207 | executing before the program exits. That care has B<not> been taken in these
|
|---|
| 208 | examples in the interest of simplicity. Running these examples "as is" will
|
|---|
| 209 | produce error messages, usually caused by the fact that there are still
|
|---|
| 210 | threads running when the program exits. You should not be alarmed by this.
|
|---|
| 211 | Future versions of Perl may fix this problem.
|
|---|
| 212 |
|
|---|
| 213 | =head2 Creating Threads
|
|---|
| 214 |
|
|---|
| 215 | The L<threads> package provides the tools you need to create new
|
|---|
| 216 | threads. Like any other module, you need to tell Perl that you want to use
|
|---|
| 217 | it; C<use threads> imports all the pieces you need to create basic
|
|---|
| 218 | threads.
|
|---|
| 219 |
|
|---|
| 220 | The simplest, most straightforward way to create a thread is with new():
|
|---|
| 221 |
|
|---|
| 222 | use threads;
|
|---|
| 223 |
|
|---|
| 224 | $thr = threads->new(\&sub1);
|
|---|
| 225 |
|
|---|
| 226 | sub sub1 {
|
|---|
| 227 | print "In the thread\n";
|
|---|
| 228 | }
|
|---|
| 229 |
|
|---|
| 230 | The new() method takes a reference to a subroutine and creates a new
|
|---|
| 231 | thread, which starts executing in the referenced subroutine. Control
|
|---|
| 232 | then passes both to the subroutine and the caller.
|
|---|
| 233 |
|
|---|
| 234 | If you need to, your program can pass parameters to the subroutine as
|
|---|
| 235 | part of the thread startup. Just include the list of parameters as
|
|---|
| 236 | part of the C<threads::new> call, like this:
|
|---|
| 237 |
|
|---|
| 238 | use threads;
|
|---|
| 239 |
|
|---|
| 240 | $Param3 = "foo";
|
|---|
| 241 | $thr = threads->new(\&sub1, "Param 1", "Param 2", $Param3);
|
|---|
| 242 | $thr = threads->new(\&sub1, @ParamList);
|
|---|
| 243 | $thr = threads->new(\&sub1, qw(Param1 Param2 Param3));
|
|---|
| 244 |
|
|---|
| 245 | sub sub1 {
|
|---|
| 246 | my @InboundParameters = @_;
|
|---|
| 247 | print "In the thread\n";
|
|---|
| 248 | print "got parameters >", join("<>", @InboundParameters), "<\n";
|
|---|
| 249 | }
|
|---|
| 250 |
|
|---|
| 251 |
|
|---|
| 252 | The last example illustrates another feature of threads. You can spawn
|
|---|
| 253 | off several threads using the same subroutine. Each thread executes
|
|---|
| 254 | the same subroutine, but in a separate thread with a separate
|
|---|
| 255 | environment and potentially separate arguments.
|
|---|
| 256 |
|
|---|
| 257 | C<create()> is a synonym for C<new()>.
|
|---|
| 258 |
|
|---|
| 259 | =head2 Waiting For A Thread To Exit
|
|---|
| 260 |
|
|---|
| 261 | Since threads are also subroutines, they can return values. To wait
|
|---|
| 262 | for a thread to exit and extract any values it might return, you can
|
|---|
| 263 | use the join() method:
|
|---|
| 264 |
|
|---|
| 265 | use threads;
|
|---|
| 266 |
|
|---|
| 267 | $thr = threads->new(\&sub1);
|
|---|
| 268 |
|
|---|
| 269 | @ReturnData = $thr->join;
|
|---|
| 270 | print "Thread returned @ReturnData";
|
|---|
| 271 |
|
|---|
| 272 | sub sub1 { return "Fifty-six", "foo", 2; }
|
|---|
| 273 |
|
|---|
| 274 | In the example above, the join() method returns as soon as the thread
|
|---|
| 275 | ends. In addition to waiting for a thread to finish and gathering up
|
|---|
| 276 | any values that the thread might have returned, join() also performs
|
|---|
| 277 | any OS cleanup necessary for the thread. That cleanup might be
|
|---|
| 278 | important, especially for long-running programs that spawn lots of
|
|---|
| 279 | threads. If you don't want the return values and don't want to wait
|
|---|
| 280 | for the thread to finish, you should call the detach() method
|
|---|
| 281 | instead, as described next.
|
|---|
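As an illustration of creating several threads from one subroutine and collecting their results with join(), here is a small sketch. The subroutine name C<square> and the hash C<%square_of> are made up for this example; the threads are created in list context so that join() returns every value the subroutine returns.

```perl
use strict;
use warnings;
use threads;

# Each thread receives one number and returns it together with its square.
sub square {
    my ($n) = @_;
    return ($n, $n * $n);
}

# Start one thread per input value. Creating the thread in list context
# means join() will hand back the whole returned list.
my @threads;
for my $n (1 .. 4) {
    my ($thr) = threads->create(\&square, $n);
    push @threads, $thr;
}

# join() waits for each thread, returns its values, and cleans up after it.
my %square_of;
for my $thr (@threads) {
    my ($n, $sq) = $thr->join;
    $square_of{$n} = $sq;
}

print "$_ squared is $square_of{$_}\n" for sort { $a <=> $b } keys %square_of;
```

Because every thread is joined before the program ends, this sketch exits without the "threads still running" messages mentioned earlier.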
| 282 |
|
|---|
| 283 | =head2 Ignoring A Thread
|
|---|
| 284 |
|
|---|
| 285 | join() does three things: it waits for a thread to exit, cleans up
|
|---|
| 286 | after it, and returns any data the thread may have produced. But what
|
|---|
| 287 | if you're not interested in the thread's return values, and you don't
|
|---|
| 288 | really care when the thread finishes? All you want is for the thread
|
|---|
| 289 | to be cleaned up once it's done.
|
|---|
| 290 |
|
|---|
| 291 | In this case, you use the detach() method. Once a thread is detached,
|
|---|
| 292 | it'll run until it's finished, then Perl will clean up after it
|
|---|
| 293 | automatically.
|
|---|
| 294 |
|
|---|
| 295 | use threads;
|
|---|
| 296 |
|
|---|
| 297 | $thr = threads->new(\&sub1); # Spawn the thread
|
|---|
| 298 |
|
|---|
| 299 | $thr->detach; # Now we officially don't care any more
|
|---|
| 300 |
|
|---|
| 301 | sub sub1 {
|
|---|
| 302 | $a = 0;
|
|---|
| 303 | while (1) {
|
|---|
| 304 | $a++;
|
|---|
| 305 | print "\$a is $a\n";
|
|---|
| 306 | sleep 1;
|
|---|
| 307 | }
|
|---|
| 308 | }
|
|---|
| 309 |
|
|---|
| 310 | Once a thread is detached, it may not be joined, and any return data
|
|---|
| 311 | that it might have produced (if it was done and waiting for a join) is
|
|---|
| 312 | lost.
|
|---|
| 313 |
|
|---|
| 314 | =head1 Threads And Data
|
|---|
| 315 |
|
|---|
| 316 | Now that we've covered the basics of threads, it's time for our next
|
|---|
| 317 | topic: data. Threading introduces a couple of complications to data
|
|---|
| 318 | access that non-threaded programs never need to worry about.
|
|---|
| 319 |
|
|---|
| 320 | =head2 Shared And Unshared Data
|
|---|
| 321 |
|
|---|
| 322 | The biggest difference between Perl ithreads and the old 5.005 style
|
|---|
| 323 | threading, or for that matter, most other threading systems out there,
|
|---|
| 324 | is that by default, no data is shared. When a new perl thread is created,
|
|---|
| 325 | all the data associated with the current thread is copied to the new
|
|---|
| 326 | thread, and is subsequently private to that new thread!
|
|---|
| 327 | This is similar in feel to what happens when a UNIX process forks,
|
|---|
| 328 | except that in this case, the data is just copied to a different part of
|
|---|
| 329 | memory within the same process rather than a real fork taking place.
|
|---|
| 330 |
|
|---|
| 331 | To make use of threading however, one usually wants the threads to share
|
|---|
| 332 | at least some data between themselves. This is done with the
|
|---|
| 333 | L<threads::shared> module and the C< : shared> attribute:
|
|---|
| 334 |
|
|---|
| 335 | use threads;
|
|---|
| 336 | use threads::shared;
|
|---|
| 337 |
|
|---|
| 338 | my $foo : shared = 1;
|
|---|
| 339 | my $bar = 1;
|
|---|
| 340 | threads->new(sub { $foo++; $bar++ })->join;
|
|---|
| 341 |
|
|---|
| 342 | print "$foo\n"; #prints 2 since $foo is shared
|
|---|
| 343 | print "$bar\n"; #prints 1 since $bar is not shared
|
|---|
| 344 |
|
|---|
| 345 | In the case of a shared array, all the array's elements are shared, and for
|
|---|
| 346 | a shared hash, all the keys and values are shared. This places
|
|---|
| 347 | restrictions on what may be assigned to shared array and hash elements: only
|
|---|
| 348 | simple values or references to shared variables are allowed - this is
|
|---|
| 349 | so that a private variable can't accidentally become shared. A bad
|
|---|
| 350 | assignment will cause the thread to die. For example:
|
|---|
| 351 |
|
|---|
| 352 | use threads;
|
|---|
| 353 | use threads::shared;
|
|---|
| 354 |
|
|---|
| 355 | my $var = 1;
|
|---|
| 356 | my $svar : shared = 2;
|
|---|
| 357 | my %hash : shared;
|
|---|
| 358 |
|
|---|
| 359 | # ... create some threads ...
|
|---|
| 360 |
|
|---|
| 361 | $hash{a} = 1; # all threads see exists($hash{a}) and $hash{a} == 1
|
|---|
| 362 | $hash{a} = $var; # okay - copy-by-value: same effect as previous
|
|---|
| 363 | $hash{a} = $svar; # okay - copy-by-value: same effect as previous
|
|---|
| 364 | $hash{a} = \$svar; # okay - a reference to a shared variable
|
|---|
| 365 | $hash{a} = \$var; # This will die
|
|---|
| 366 | delete $hash{a}; # okay - all threads will see !exists($hash{a})
|
|---|
| 367 |
|
|---|
| 368 | Note that a shared variable guarantees that if two or more threads try to
|
|---|
| 369 | modify it at the same time, the internal state of the variable will not
|
|---|
| 370 | become corrupted. However, there are no guarantees beyond this, as
|
|---|
| 371 | explained in the next section.
|
|---|
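The rules for shared containers can be tried out in a few lines. This is a sketch, not a complete treatment: the variable names are illustrative, the bad assignment is trapped with C<eval> so the program keeps running, and the nested hash is shared with the C<&share({})> idiom from L<threads::shared> before it is stored.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %stats :shared;      # shared hash: keys and values visible to all threads
my $private = 42;       # ordinary variable: each thread gets its own copy

# A reference to an unshared variable may not be stored in a shared hash.
# Trap the failure with eval instead of letting the program die.
my $ok = eval { $stats{bad} = \$private; 1 };
print "assignment rejected: $@" unless $ok;

# A nested structure has to be shared itself before it is stored.
$stats{counts} = &share({});

threads->create(sub {
    lock(%stats);
    $stats{counts}{seen} = 1;   # visible to the parent after join
    $private++;                 # changes only this thread's copy
})->join;

print "seen=$stats{counts}{seen} private=$private\n";
```

The parent still sees its own value of C<$private> after the join, because the child incremented a copy.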
| 372 |
|
|---|
| 373 | =head2 Thread Pitfalls: Races
|
|---|
| 374 |
|
|---|
| 375 | While threads bring a new set of useful tools, they also bring a
|
|---|
| 376 | number of pitfalls. One pitfall is the race condition:
|
|---|
| 377 |
|
|---|
| 378 | use threads;
|
|---|
| 379 | use threads::shared;
|
|---|
| 380 |
|
|---|
| 381 | my $a : shared = 1;
|
|---|
| 382 | $thr1 = threads->new(\&sub1);
|
|---|
| 383 | $thr2 = threads->new(\&sub2);
|
|---|
| 384 |
|
|---|
| 385 | $thr1->join;
|
|---|
| 386 | $thr2->join;
|
|---|
| 387 | print "$a\n";
|
|---|
| 388 |
|
|---|
| 389 | sub sub1 { my $foo = $a; $a = $foo + 1; }
|
|---|
| 390 | sub sub2 { my $bar = $a; $a = $bar + 1; }
|
|---|
| 391 |
|
|---|
| 392 | What do you think $a will be? The answer, unfortunately, is "it
|
|---|
| 393 | depends." Both sub1() and sub2() access the global variable $a, once
|
|---|
| 394 | to read and once to write. Depending on factors ranging from your
|
|---|
| 395 | thread implementation's scheduling algorithm to the phase of the moon,
|
|---|
| 396 | $a can be 2 or 3.
|
|---|
| 397 |
|
|---|
| 398 | Race conditions are caused by unsynchronized access to shared
|
|---|
| 399 | data. Without explicit synchronization, there's no way to be sure that
|
|---|
| 400 | nothing has happened to the shared data between the time you access it
|
|---|
| 401 | and the time you update it. Even this simple code fragment has the
|
|---|
| 402 | possibility of error:
|
|---|
| 403 |
|
|---|
| 404 | use threads; use threads::shared;
|
|---|
| 405 | my $a : shared = 2;
|
|---|
| 406 | my $b : shared;
|
|---|
| 407 | my $c : shared;
|
|---|
| 408 | my $thr1 = threads->create(sub { $b = $a; $a = $b + 1; });
|
|---|
| 409 | my $thr2 = threads->create(sub { $c = $a; $a = $c + 1; });
|
|---|
| 410 | $thr1->join;
|
|---|
| 411 | $thr2->join;
|
|---|
| 412 |
|
|---|
| 413 | Two threads both access $a. Each thread can potentially be interrupted
|
|---|
| 414 | at any point, or be executed in any order. At the end, $a could be 3
|
|---|
| 415 | or 4, and both $b and $c could be 2 or 3.
|
|---|
| 416 |
|
|---|
| 417 | Even C<$a += 5> or C<$a++> are not guaranteed to be atomic.
|
|---|
| 418 |
|
|---|
| 419 | Whenever your program accesses data or resources that can be accessed
|
|---|
| 420 | by other threads, you must take steps to coordinate access or risk
|
|---|
| 421 | data inconsistency and race conditions. Note that Perl will protect its
|
|---|
| 422 | internals from your race conditions, but it won't protect you from you.
|
|---|
| 423 |
|
|---|
| 424 | =head1 Synchronization and control
|
|---|
| 425 |
|
|---|
| 426 | Perl provides a number of mechanisms to coordinate the interactions
|
|---|
| 427 | between threads and their data, to avoid race conditions and the like.
|
|---|
| 428 | Some of these are designed to resemble the common techniques used in thread
|
|---|
| 429 | libraries such as C<pthreads>; others are Perl-specific. Often, the
|
|---|
| 430 | standard techniques are clumsy and difficult to get right (such as
|
|---|
| 431 | condition waits). Where possible, it is usually easier to use Perlish
|
|---|
| 432 | techniques such as queues, which remove some of the hard work involved.
|
|---|
| 433 |
|
|---|
| 434 | =head2 Controlling access: lock()
|
|---|
| 435 |
|
|---|
| 436 | The lock() function takes a shared variable and puts a lock on it.
|
|---|
| 437 | No other thread may lock the variable until the variable is unlocked
|
|---|
| 438 | by the thread holding the lock. Unlocking happens automatically
|
|---|
| 439 | when the locking thread exits the block that contains the call to
|
|---|
| 440 | the C<lock()> function. Using lock() is straightforward: this example has
|
|---|
| 441 | several threads doing some calculations in parallel, and occasionally
|
|---|
| 442 | updating a running total:
|
|---|
| 443 |
|
|---|
| 444 | use threads;
|
|---|
| 445 | use threads::shared;
|
|---|
| 446 |
|
|---|
| 447 | my $total : shared = 0;
|
|---|
| 448 |
|
|---|
| 449 | sub calc {
|
|---|
| 450 | for (;;) {
|
|---|
| 451 | my $result;
|
|---|
| 452 | # (... do some calculations and set $result ...)
|
|---|
| 453 | {
|
|---|
| 454 | lock($total); # block until we obtain the lock
|
|---|
| 455 | $total += $result;
|
|---|
| 456 | } # lock implicitly released at end of scope
|
|---|
| 457 | last if $result == 0;
|
|---|
| 458 | }
|
|---|
| 459 | }
|
|---|
| 460 |
|
|---|
| 461 | my $thr1 = threads->new(\&calc);
|
|---|
| 462 | my $thr2 = threads->new(\&calc);
|
|---|
| 463 | my $thr3 = threads->new(\&calc);
|
|---|
| 464 | $thr1->join;
|
|---|
| 465 | $thr2->join;
|
|---|
| 466 | $thr3->join;
|
|---|
| 467 | print "total=$total\n";
|
|---|
| 468 |
|
|---|
| 469 |
|
|---|
| 470 | lock() blocks the thread until the variable being locked is
|
|---|
| 471 | available. When lock() returns, your thread can be sure that no other
|
|---|
| 472 | thread can lock that variable until the block containing the
|
|---|
| 473 | lock exits.
|
|---|
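To see lock() remove a race, consider the following sketch. It assumes a hypothetical worker subroutine called C<bump> and a shared counter; each increment happens while the lock is held, so the final value does not depend on scheduling.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $counter :shared = 0;

# Each worker does its read-modify-write while holding the lock on $counter.
sub bump {
    my ($times) = @_;
    for (1 .. $times) {
        lock($counter);     # released at the end of each loop iteration
        $counter++;
    }
    return;
}

# Four workers, 1000 increments each.
my @workers = map { threads->create(\&bump, 1000) } 1 .. 4;
$_->join for @workers;

print "counter=$counter\n";
```

Taking the lock inside the loop body keeps each critical section short, at the cost of acquiring the lock once per increment.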
| 474 |
|
|---|
| 475 | It's important to note that locks don't prevent access to the variable
|
|---|
| 476 | in question, only lock attempts. This is in keeping with Perl's
|
|---|
| 477 | longstanding tradition of courteous programming, and the advisory file
|
|---|
| 478 | locking that flock() gives you.
|
|---|
| 479 |
|
|---|
| 480 | You may lock arrays and hashes as well as scalars. Locking an array,
|
|---|
| 481 | though, will not block subsequent locks on array elements, just lock
|
|---|
| 482 | attempts on the array itself.
|
|---|
| 483 |
|
|---|
| 484 | Locks are recursive, which means it's okay for a thread to
|
|---|
| 485 | lock a variable more than once. The lock will last until the outermost
|
|---|
| 486 | lock() on the variable goes out of scope. For example:
|
|---|
| 487 |
|
|---|
| 488 | my $x : shared;
|
|---|
| 489 | doit();
|
|---|
| 490 |
|
|---|
| 491 | sub doit {
|
|---|
| 492 | {
|
|---|
| 493 | {
|
|---|
| 494 | lock($x); # wait for lock
|
|---|
| 495 | lock($x); # NOOP - we already have the lock
|
|---|
| 496 | {
|
|---|
| 497 | lock($x); # NOOP
|
|---|
| 498 | {
|
|---|
| 499 | lock($x); # NOOP
|
|---|
| 500 | lockit_some_more();
|
|---|
| 501 | }
|
|---|
| 502 | }
|
|---|
| 503 | } # *** implicit unlock here ***
|
|---|
| 504 | }
|
|---|
| 505 | }
|
|---|
| 506 |
|
|---|
| 507 | sub lockit_some_more {
|
|---|
| 508 | lock($x); # NOOP
|
|---|
| 509 | } # nothing happens here
|
|---|
| 510 |
|
|---|
| 511 | Note that there is no unlock() function - the only way to unlock a
|
|---|
| 512 | variable is to allow it to go out of scope.
|
|---|
| 513 |
|
|---|
| 514 | A lock can either be used to guard the data contained within the variable
|
|---|
| 515 | being locked, or it can be used to guard something else, like a section
|
|---|
| 516 | of code. In this latter case, the variable in question does not hold any
|
|---|
| 517 | useful data, and exists only for the purpose of being locked. In this
|
|---|
| 518 | respect, the variable behaves like the mutexes and basic semaphores of
|
|---|
| 519 | traditional thread libraries.
|
|---|
| 520 |
|
|---|
| 521 | =head2 A Thread Pitfall: Deadlocks
|
|---|
| 522 |
|
|---|
| 523 | Locks are a handy tool to synchronize access to data, and using them
|
|---|
| 524 | properly is the key to safe shared data. Unfortunately, locks aren't
|
|---|
| 525 | without their dangers, especially when multiple locks are involved.
|
|---|
| 526 | Consider the following code:
|
|---|
| 527 |
|
|---|
| 528 | use threads; use threads::shared;
|
|---|
| 529 |
|
|---|
| 530 | my $a : shared = 4;
|
|---|
| 531 | my $b : shared = "foo";
|
|---|
| 532 | my $thr1 = threads->new(sub {
|
|---|
| 533 | lock($a);
|
|---|
| 534 | sleep 20;
|
|---|
| 535 | lock($b);
|
|---|
| 536 | });
|
|---|
| 537 | my $thr2 = threads->new(sub {
|
|---|
| 538 | lock($b);
|
|---|
| 539 | sleep 20;
|
|---|
| 540 | lock($a);
|
|---|
| 541 | });
|
|---|
| 542 |
|
|---|
| 543 | This program will probably hang until you kill it. The only way it
|
|---|
| 544 | won't hang is if one of the two threads acquires both locks
|
|---|
| 545 | first. A guaranteed-to-hang version is more complicated, but the
|
|---|
| 546 | principle is the same.
|
|---|
| 547 |
|
|---|
| 548 | The first thread will grab a lock on $a, then, after a pause during which
|
|---|
| 549 | the second thread has probably had time to do some work, try to grab a
|
|---|
| 550 | lock on $b. Meanwhile, the second thread grabs a lock on $b, then later
|
|---|
| 551 | tries to grab a lock on $a. The second lock attempt for both threads will
|
|---|
| 552 | block, each waiting for the other to release its lock.
|
|---|
| 553 |
|
|---|
| 554 | This condition is called a deadlock, and it occurs whenever two or
|
|---|
| 555 | more threads are trying to get locks on resources that the others
|
|---|
| 556 | own. Each thread will block, waiting for the other to release a lock
|
|---|
| 557 | on a resource. That never happens, though, since the thread with the
|
|---|
| 558 | resource is itself waiting for a lock to be released.
|
|---|
| 559 |
|
|---|
| 560 | There are a number of ways to handle this sort of problem. The best
|
|---|
| 561 | way is to always have all threads acquire locks in the exact same
|
|---|
| 562 | order. If, for example, you lock variables $a, $b, and $c, always lock
|
|---|
| 563 | $a before $b, and $b before $c. It's also best to hold on to locks for
|
|---|
| 564 | as short a period of time as possible to minimize the risk of deadlock.
|
|---|
| 565 |
|
|---|
| 566 | The other synchronization primitives described below can suffer from
|
|---|
| 567 | similar problems.
|
|---|
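The lock-ordering advice can be sketched as follows. The variables C<$from> and C<$to> and the subroutine C<transfer> are invented for illustration; the point is only that every thread takes the two locks in the same order, so no thread can hold one lock while waiting on a thread that holds the other.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $from :shared = 100;
my $to   :shared = 0;

# Every caller locks $from first and $to second, whatever the direction
# of the transfer, so the circular wait needed for a deadlock cannot form.
sub transfer {
    my ($amount) = @_;
    lock($from);
    lock($to);
    $from -= $amount;
    $to   += $amount;
    return;
}   # both locks are released here

my $t1 = threads->create(\&transfer, 30);
my $t2 = threads->create(\&transfer, -10);
$_->join for $t1, $t2;

print "from=$from to=$to\n";
```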
| 568 |
|
|---|
| 569 | =head2 Queues: Passing Data Around
|
|---|
| 570 |
|
|---|
| 571 | A queue is a special thread-safe object that lets you put data in one
|
|---|
| 572 | end and take it out the other without having to worry about
|
|---|
| 573 | synchronization issues. They're pretty straightforward, and look like
|
|---|
| 574 | this:
|
|---|
| 575 |
|
|---|
| 576 | use threads;
|
|---|
| 577 | use Thread::Queue;
|
|---|
| 578 |
|
|---|
| 579 | my $DataQueue = Thread::Queue->new;
|
|---|
| 580 | $thr = threads->new(sub {
|
|---|
| 581 | while (defined($DataElement = $DataQueue->dequeue)) {
|
|---|
| 582 | print "Popped $DataElement off the queue\n";
|
|---|
| 583 | }
|
|---|
| 584 | });
|
|---|
| 585 |
|
|---|
| 586 | $DataQueue->enqueue(12);
|
|---|
| 587 | $DataQueue->enqueue("A", "B", "C");
|
|---|
| 588 | $DataQueue->enqueue(\$thr);
|
|---|
| 589 | sleep 10;
|
|---|
| 590 | $DataQueue->enqueue(undef);
|
|---|
| 591 | $thr->join;
|
|---|
| 592 |
|
|---|
| 593 | You create the queue with C<< Thread::Queue->new >>. Then you can
|
|---|
| 594 | add lists of scalars onto the end with enqueue(), and pop scalars off
|
|---|
| 595 | the front of it with dequeue(). A queue has no fixed size, and can grow
|
|---|
| 596 | as needed to hold everything pushed on to it.
|
|---|
| 597 |
|
|---|
| 598 | If a queue is empty, dequeue() blocks until another thread enqueues
|
|---|
| 599 | something. This makes queues ideal for event loops and other
|
|---|
| 600 | communications between threads.
|
|---|
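Queues fit the boss/worker model described earlier. The sketch below is one possible arrangement, with assumed names (C<$tasks>, C<$results>, C<worker>): the boss enqueues numbers, three workers square them, and an C<undef> per worker serves as the agreed stop marker. Testing with C<defined> lets a worker handle false values such as 0 without stopping early.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $tasks   = Thread::Queue->new;   # boss -> workers
my $results = Thread::Queue->new;   # workers -> boss

# Each worker squares numbers until it dequeues undef, the stop marker.
sub worker {
    while (defined(my $n = $tasks->dequeue)) {
        $results->enqueue($n * $n);
    }
    return;
}

my @workers = map { threads->create(\&worker) } 1 .. 3;

$tasks->enqueue(1 .. 10);
$tasks->enqueue(undef) for @workers;    # one stop marker per worker
$_->join for @workers;

# All workers have finished, so drain the results without blocking.
my $sum = 0;
while (defined(my $sq = $results->dequeue_nb)) {
    $sum += $sq;
}
print "sum of squares: $sum\n";
```

No explicit lock() appears anywhere: the queues do all the synchronization.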
| 601 |
|
|---|
| 602 | =head2 Semaphores: Synchronizing Data Access
|
|---|
| 603 |
|
|---|
| 604 | Semaphores are a kind of generic locking mechanism. In their most basic
|
|---|
| 605 | form, they behave very much like lockable scalars, except that they
|
|---|
| 606 | can't hold data, and that they must be explicitly unlocked. In their
|
|---|
| 607 | advanced form, they act like a kind of counter, and can allow multiple
|
|---|
| 608 | threads to have the 'lock' at any one time.
|
|---|
| 609 |
|
|---|
| 610 | =head2 Basic semaphores
|
|---|
| 611 |
|
|---|
| 612 | Semaphores have two methods, down() and up(): down() decrements the resource
|
|---|
| 613 | count, while up() increments it. Calls to down() will block if the
|
|---|
| 614 | semaphore's current count would decrement below zero. This program
|
|---|
| 615 | gives a quick demonstration:
|
|---|
| 616 |
|
|---|
| 617 | use threads; use threads::shared;
|
|---|
| 618 | use Thread::Semaphore;
|
|---|
| 619 |
|
|---|
| 620 | my $semaphore = Thread::Semaphore->new;
|
|---|
| 621 | my $GlobalVariable : shared = 0;
|
|---|
| 622 |
|
|---|
| 623 | $thr1 = threads->new(\&sample_sub, 1);
|
|---|
| 624 | $thr2 = threads->new(\&sample_sub, 2);
|
|---|
| 625 | $thr3 = threads->new(\&sample_sub, 3);
|
|---|
| 626 |
|
|---|
| 627 | sub sample_sub {
|
|---|
| 628 | my $SubNumber = shift @_;
|
|---|
| 629 | my $TryCount = 10;
|
|---|
| 630 | my $LocalCopy;
|
|---|
| 631 | sleep 1;
|
|---|
| 632 | while ($TryCount--) {
|
|---|
| 633 | $semaphore->down;
|
|---|
| 634 | $LocalCopy = $GlobalVariable;
|
|---|
| 635 | print "$TryCount tries left for sub $SubNumber (\$GlobalVariable is $GlobalVariable)\n";
|
|---|
| 636 | sleep 2;
|
|---|
| 637 | $LocalCopy++;
|
|---|
| 638 | $GlobalVariable = $LocalCopy;
|
|---|
| 639 | $semaphore->up;
|
|---|
| 640 | }
|
|---|
| 641 | }
|
|---|
| 642 |
|
|---|
| 643 | $thr1->join;
|
|---|
| 644 | $thr2->join;
|
|---|
| 645 | $thr3->join;
|
|---|
| 646 |
|
|---|
| 647 | The three invocations of the subroutine all operate in sync. The
|
|---|
| 648 | semaphore, though, makes sure that only one thread is accessing the
|
|---|
| 649 | global variable at once.
|
|---|
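A shorter variant without the sleeps makes the effect easy to check. In this sketch (the names C<$mutex>, C<$total> and C<add_ten> are illustrative) a semaphore with the default count of one guards a read-modify-write on a shared variable, so the final total is the same on every run.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Semaphore;

my $mutex = Thread::Semaphore->new;     # count starts at 1
my $total :shared = 0;

sub add_ten {
    for (1 .. 10) {
        $mutex->down;           # blocks while another thread holds the semaphore
        my $copy = $total;      # safe between down() and up()
        $total = $copy + 1;
        $mutex->up;             # semaphores must be released explicitly
    }
    return;
}

my @threads = map { threads->create(\&add_ten) } 1 .. 3;
$_->join for @threads;

print "total=$total\n";
```

Unlike lock(), nothing releases the semaphore automatically, so every code path after down() has to reach up().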
| 650 |
|
|---|
| 651 | =head2 Advanced Semaphores
|
|---|
| 652 |
|
|---|
| 653 | By default, semaphores behave like locks, letting only one thread
|
|---|
| 654 | down() them at a time. However, there are other uses for semaphores.
|
|---|
| 655 |
|
|---|
| 656 | Each semaphore has a counter attached to it. By default, semaphores are
|
|---|
| 657 | created with the counter set to one, down() decrements the counter by
|
|---|
| 658 | one, and up() increments by one. However, we can override any or all
|
|---|
| 659 | of these defaults simply by passing in different values:
|
|---|
| 660 |
|
|---|
| 661 | use threads;
|
|---|
| 662 | use Thread::Semaphore;
|
|---|
| 663 | my $semaphore = Thread::Semaphore->new(5);
|
|---|
| 664 | # Creates a semaphore with the counter set to five
|
|---|
| 665 |
|
|---|
| 666 | $thr1 = threads->new(\&sub1);
|
|---|
| 667 | $thr2 = threads->new(\&sub1);
|
|---|
| 668 |
|
|---|
| 669 | sub sub1 {
|
|---|
| 670 | $semaphore->down(5); # Decrements the counter by five
|
|---|
| 671 | # Do stuff here
|
|---|
| 672 | $semaphore->up(5); # Increment the counter by five
|
|---|
| 673 | }
|
|---|
| 674 |
|
|---|
| 675 | $thr1->detach;
|
|---|
| 676 | $thr2->detach;
|
|---|
| 677 |
|
|---|
| 678 | If down() attempts to decrement the counter below zero, it blocks until
|
|---|
| 679 | the counter is large enough. Note that while a semaphore can be created
|
|---|
| 680 | with a starting count of zero, any up() or down() always changes the
|
|---|
| 681 | counter by at least one, and so $semaphore->down(0) is the same as
|
|---|
| 682 | $semaphore->down(1).
|
|---|
| 683 |
|
|---|
| 684 | The question, of course, is why would you do something like this? Why
|
|---|
|
|---|