=head1 NAME

perlothrtut - old tutorial on threads in Perl

=head1 DESCRIPTION

B<WARNING>:
This tutorial describes the old-style thread model that was introduced in
release 5.005. This model is now deprecated, and will be removed, probably
in version 5.10. The interfaces described here were considered
experimental, and are likely to be buggy.

For information about the new interpreter threads ("ithreads") model, see
the F<perlthrtut> tutorial, and the L<threads> and L<threads::shared>
modules.

You are strongly encouraged to migrate any existing threads code to the
new model as soon as possible.

=head1 What Is A Thread Anyway?

A thread is a flow of control through a program with a single
execution point.

Sounds an awful lot like a process, doesn't it? Well, it should.
Threads are one of the pieces of a process. Every process has at least
one thread and, up until now, every process running Perl had only one
thread. With 5.005, though, you can create extra threads. We're going
to show you how, when, and why.

=head1 Threaded Program Models

There are three basic ways that you can structure a threaded
program. Which model you choose depends on what you need your program
to do. For many non-trivial threaded programs you'll need to choose
different models for different pieces of your program.

=head2 Boss/Worker

The boss/worker model usually has one `boss' thread and one or more
`worker' threads. The boss thread gathers or generates tasks that need
to be done, then parcels those tasks out to the appropriate worker
thread.

This model is common in GUI and server programs, where a main thread
waits for some event and then passes that event to the appropriate
worker threads for processing. Once the event has been passed on, the
boss thread goes back to waiting for another event.

The boss thread does relatively little work. While tasks aren't
necessarily performed faster than with any other method, it tends to
have the best user-response times.

=head2 Work Crew

In the work crew model, several threads are created that do
essentially the same thing to different pieces of data. It closely
mirrors classical parallel processing and vector processors, where a
large array of processors do the exact same thing to many pieces of
data.

This model is particularly useful if the system running the program
will distribute multiple threads across different processors. It can
also be useful in ray tracing or rendering engines, where the
individual threads can pass on interim results to give the user visual
feedback.

=head2 Pipeline

The pipeline model divides up a task into a series of steps, and
passes the results of one step on to the thread processing the
next. Each thread does one thing to each piece of data and passes the
results to the next thread in line.

This model makes the most sense if you have multiple processors so two
or more threads will be executing in parallel, though it can often
make sense in other contexts as well. It tends to keep the individual
tasks small and simple, as well as allowing some parts of the pipeline
to block (on I/O or system calls, for example) while other parts keep
going. If you're running different parts of the pipeline on different
processors you may also take advantage of the caches on each
processor.

This model is also handy for a form of recursive programming where,
rather than having a subroutine call itself, it instead creates
another thread. Prime and Fibonacci generators both map well to this
form of the pipeline model. (A version of a prime number generator is
presented later on.)
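
The shape of a pipeline can be sketched without any threads at all. In the minimal, single-threaded illustration below, each stage is an ordinary code reference, and run_pipeline() (a hypothetical helper, not part of any module) hands the output of one stage to the next. In a threaded version each stage would presumably run in its own thread, with the hand-off going through some thread-safe channel.

```perl
use strict;
use warnings;

# Hypothetical helper: push one item through every stage in order.
sub run_pipeline {
    my ($item, @stages) = @_;
    $item = $_->($item) for @stages;
    return $item;
}

# Three illustrative stages, each doing one small thing to the data.
my @stages = (
    sub { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s },  # trim whitespace
    sub { return uc shift },                                  # upper-case
    sub { return "[" . shift() . "]" },                       # wrap in brackets
);

print run_pipeline("  hello  ", @stages), "\n";
```

Keeping each stage to a single small job is what makes it easy to move a stage into its own thread later.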

=head1 Native threads

There are several different ways to implement threads on a system. How
threads are implemented depends both on the vendor and, in some cases,
the version of the operating system. Often the first implementation
will be relatively simple, but later versions of the OS will be more
sophisticated.

While the information in this section is useful, it's not necessary,
so you can skip it if you don't feel up to it.

There are three basic categories of threads: user-mode threads, kernel
threads, and multiprocessor kernel threads.

User-mode threads are threads that live entirely within a program and
its libraries. In this model, the OS knows nothing about threads. As
far as it's concerned, your process is just a process.

This is the easiest way to implement threads, and the way most OSes
start. The big disadvantage is that, since the OS knows nothing about
threads, if one thread blocks they all do. Typical blocking activities
include most system calls, most I/O, and things like sleep().

Kernel threads are the next step in thread evolution. The OS knows
about kernel threads, and makes allowances for them. The main
difference between a kernel thread and a user-mode thread is
blocking. With kernel threads, things that block a single thread don't
block other threads. This is not the case with user-mode threads,
where the kernel blocks at the process level and not the thread level.

This is a big step forward, and can give a threaded program quite a
performance boost over non-threaded programs. Threads that block
performing I/O, for example, won't block threads that are doing other
things. Each process still has only one thread running at once,
though, regardless of how many CPUs a system might have.

Since kernel threading can interrupt a thread at any time, it will
uncover some of the implicit locking assumptions you may make in your
program. For example, something as simple as C<$a = $a + 2> can behave
unpredictably with kernel threads if $a is visible to other
threads, as another thread may have changed $a between the time it
was fetched on the right hand side and the time the new value is
stored.

Multiprocessor Kernel Threads are the final step in thread
support. With multiprocessor kernel threads on a machine with multiple
CPUs, the OS may schedule two or more threads to run simultaneously on
different CPUs.

This can give a serious performance boost to your threaded program,
since more than one thread will be executing at the same time. As a
tradeoff, though, any of those nagging synchronization issues that
might not have shown with basic kernel threads will appear with a
vengeance.

In addition to the different levels of OS involvement in threads,
different OSes (and different thread implementations for a particular
OS) allocate CPU cycles to threads in different ways.

Cooperative multitasking systems have running threads give up control
if one of two things happens. If a thread calls a yield function, it
gives up control. It also gives up control if the thread does
something that would cause it to block, such as perform I/O. In a
cooperative multitasking implementation, one thread can starve all the
others for CPU time if it so chooses.

Preemptive multitasking systems interrupt threads at regular intervals
while the system decides which thread should run next. In a preemptive
multitasking system, one thread usually won't monopolize the CPU.

On some systems, there can be cooperative and preemptive threads
running simultaneously. (Threads running with realtime priorities
often behave cooperatively, for example, while threads running at
normal priorities behave preemptively.)

=head1 What kind of threads are Perl threads?

If you have experience with other thread implementations, you might
find that things aren't quite what you expect. It's very important to
remember when dealing with Perl threads that Perl Threads Are Not X
Threads, for all values of X. They aren't POSIX threads, or
DecThreads, or Java's Green threads, or Win32 threads. There are
similarities, and the broad concepts are the same, but if you start
looking for implementation details you're going to be either
disappointed or confused. Possibly both.

This is not to say that Perl threads are completely different from
everything that's ever come before--they're not. Perl's threading
model owes a lot to other thread models, especially POSIX. Just as
Perl is not C, though, Perl threads are not POSIX threads. So if you
find yourself looking for mutexes, or thread priorities, it's time to
step back a bit and think about what you want to do and how Perl can
do it.

=head1 Threadsafe Modules

The addition of threads has changed Perl's internals
substantially. There are implications for people who write
modules--especially modules with XS code or external libraries. While
most modules won't encounter any problems, modules that aren't
explicitly tagged as thread-safe should be tested before being used in
production code.

Not all modules that you might use are thread-safe, and you should
always assume a module is unsafe unless the documentation says
otherwise. This includes modules that are distributed as part of the
core. Threads are a beta feature, and even some of the standard
modules aren't thread-safe.

If you're using a module that's not thread-safe for some reason, you
can protect yourself by using semaphores and lots of programming
discipline to control access to the module. Semaphores are covered
later in the article.

=head1 Thread Basics

The core Thread module provides the basic functions you need to write
threaded programs. In the following sections we'll cover the basics,
showing you what you need to do to create a threaded program. After
that, we'll go over some of the features of the Thread module that
make threaded programming easier.

=head2 Basic Thread Support

Thread support is a Perl compile-time option: it's something that's
turned on or off when Perl is built at your site, rather than when
your programs are compiled. If your Perl wasn't compiled with thread
support enabled, then any attempt to use threads will fail.

Remember that the threading support in 5.005 is in beta release, and
should be treated as such. You should expect that it may not function
entirely properly, and the thread interface may well change somewhat
before it is a fully supported, production release. The beta version
shouldn't be used for mission-critical projects. Having said that,
threaded Perl is pretty nifty, and worth a look.

Your programs can use the Config module to check whether threads are
enabled. If your program can't run without them, you can say something
like:

    use Config;
    $Config{usethreads} or die "Recompile Perl with threads to run this program.";

A possibly-threaded program using a possibly-threaded module might
have code like this:

    use Config;
    use MyMod;

    if ($Config{usethreads}) {
        # We have threads
        require MyMod_threaded;
        MyMod_threaded->import;
    } else {
        require MyMod_unthreaded;
        MyMod_unthreaded->import;
    }

Since code that runs both with and without threads is usually pretty
messy, it's best to isolate the thread-specific code in its own
module. In our example above, that's what MyMod_threaded is, and it's
only imported if we're running on a threaded Perl.
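
If a program needs to know which thread model a perl was built with, rather than only whether one is present, the same Config hash can be asked a slightly finer question. The sketch below assumes the C<use5005threads> and C<useithreads> Config entries are available on your perl (a missing entry simply reads as undef); the helper name thread_flavor() is made up for this example.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: report which thread model, if any, this perl has.
sub thread_flavor {
    return '5005threads' if $Config{use5005threads};
    return 'ithreads'    if $Config{useithreads};
    return 'none';
}

print "Thread support: ", thread_flavor(), "\n";
```

A check like this belongs near the top of a program, before any thread-specific module is loaded.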

=head2 Creating Threads

The Thread package provides the tools you need to create new
threads. Like any other module, you need to tell Perl you want to use
it; C<use Thread> imports all the pieces you need to create basic
threads.

The simplest, most straightforward way to create a thread is with new():

    use Thread;

    $thr = new Thread \&sub1;

    sub sub1 {
        print "In the thread\n";
    }

The new() method takes a reference to a subroutine and creates a new
thread, which starts executing in the referenced subroutine. Control
then passes to both the subroutine and the caller.

If you need to, your program can pass parameters to the subroutine as
part of the thread startup. Just include the list of parameters as
part of the C<Thread::new> call, like this:

    use Thread;
    $Param3 = "foo";
    $thr = new Thread \&sub1, "Param 1", "Param 2", $Param3;
    $thr = new Thread \&sub1, @ParamList;
    $thr = new Thread \&sub1, qw(Param1 Param2 Param3);

    sub sub1 {
        my @InboundParameters = @_;
        print "In the thread\n";
        print "got parameters >", join("<>", @InboundParameters), "<\n";
    }

The subroutine runs like a normal Perl subroutine. The call to
C<new Thread> itself returns a thread object right away; whatever the
subroutine returns is collected later with join(), described below.

The last example illustrates another feature of threads. You can spawn
off several threads using the same subroutine. Each thread executes
the same subroutine, but in a separate thread with a separate
environment and potentially separate arguments.
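
A side note on building the parameter list: qw() splits its contents on whitespace and never interpolates variables, so a variable name written inside it reaches the thread as a literal string. The short check below is plain Perl with no threads involved, and the array names are made up for illustration.

```perl
use strict;
use warnings;

my $Param3 = "foo";

# A plain list interpolates the variable as usual.
my @with_list = ("Param1", "Param2", $Param3);

# qw() treats everything as literal words, including the dollar sign.
my @with_qw = qw(Param1 Param2 $Param3);

print "plain list passes: $with_list[2]\n";
print "qw() passes:       $with_qw[2]\n";
```

Use a plain list whenever a parameter comes from a variable.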

The other way to spawn a new thread is with async(), which is a way to
spin off a chunk of code like eval(), but into its own thread:

    use Thread qw(async);

    $LineCount = 0;

    $thr = async {
        while(<>) {$LineCount++}
        print "Got $LineCount lines\n";
    };

    print "Waiting for the linecount to end\n";
    $thr->join;
    print "All done\n";

You'll notice we did a C<use Thread qw(async)> in that example. async is
not exported by default, so if you want it, you'll either need to
import it before you use it or fully qualify it as
C<Thread::async>. You'll also note that there's a semicolon after the
closing brace. That's because async() treats the following block as an
anonymous subroutine, so the semicolon is necessary.

Like eval(), the code executes in the same context as it would if it
weren't spun off. Since both the code inside and after the async start
executing, you need to be careful with any shared resources. Locking
and other synchronization techniques are covered later.

=head2 Giving up control

There are times when you may find it useful to have a thread
explicitly give up the CPU to another thread. Your threading package
might not support preemptive multitasking for threads, for example, or
you may be doing something compute-intensive and want to make sure
that the user-interface thread gets called frequently. Regardless,
there are times that you might want a thread to give up the processor.

Perl's threading package provides the yield() function that does
this. yield() is pretty straightforward, and works like this:

    use Thread qw(yield async);
    async {
        my $foo = 50;
        while ($foo--) { print "first async\n" }
        yield;
        $foo = 50;
        while ($foo--) { print "first async\n" }
    };
    async {
        my $foo = 50;
        while ($foo--) { print "second async\n" }
        yield;
        $foo = 50;
        while ($foo--) { print "second async\n" }
    };

=head2 Waiting For A Thread To Exit

Since threads are also subroutines, they can return values. To wait
for a thread to exit and extract any scalars it might return, you can
use the join() method.

    use Thread;
    $thr = new Thread \&sub1;

    @ReturnData = $thr->join;
    print "Thread returned @ReturnData\n";

    sub sub1 { return "Fifty-six", "foo", 2; }

In the example above, the join() method returns as soon as the thread
ends. In addition to waiting for a thread to finish and gathering up
any values that the thread might have returned, join() also performs
any OS cleanup necessary for the thread. That cleanup might be
important, especially for long-running programs that spawn lots of
threads. If you don't want the return values and don't want to wait
for the thread to finish, you should call the detach() method
instead. detach() is covered later in the article.

=head2 Errors In Threads

So what happens when an error occurs in a thread? Any errors that
could be caught with eval() are postponed until the thread is
joined. If your program never joins, the errors appear when your
program exits.

Errors deferred until a join() can be caught with eval():

    use Thread qw(async);
    $thr = async {$b = 3/0};   # Divide by zero error
    $foo = eval {$thr->join};
    if ($@) {
        print "died with error $@\n";
    } else {
        print "Hey, why aren't you dead?\n";
    }

eval() passes any results from the joined thread back unmodified, so
if you want the return values of the thread, this is your only chance
to get them.
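
The eval-and-check idiom itself does not depend on threads. The sketch below shows the same pattern around an ordinary subroutine call, keeping both the success flag and the result; risky_divide() and try_divide() are hypothetical helpers written for this illustration, standing in for the join() call above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for work that may die.
sub risky_divide {
    my ($num, $den) = @_;
    return $num / $den;    # dies with "Illegal division by zero" if $den is 0
}

# Returns (1, result) on success or (0, error message) on failure.
sub try_divide {
    my ($num, $den) = @_;
    my $result = eval { risky_divide($num, $den) };
    return (0, $@) if $@;
    return (1, $result);
}

for my $pair ([6, 3], [3, 0]) {
    my ($ok, $value) = try_divide(@$pair);
    print $ok ? "result: $value\n" : "died with error: $value";
}
```

Capturing the value of the eval block in the same statement is what keeps the result from being lost.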

=head2 Ignoring A Thread

join() does three things: it waits for a thread to exit, cleans up
after it, and returns any data the thread may have produced. But what
if you're not interested in the thread's return values, and you don't
really care when the thread finishes? All you want is for the thread
to be cleaned up when it's done.

In this case, you use the detach() method. Once a thread is detached,
it'll run until it's finished, then Perl will clean up after it
automatically.

    use Thread;
    $thr = new Thread \&sub1;   # Spawn the thread

    $thr->detach;   # Now we officially don't care any more

    sub sub1 {
        $a = 0;
        while (1) {
            $a++;
            print "\$a is $a\n";
            sleep 1;
        }
    }

Once a thread is detached, it may not be joined, and any output that
it might have produced (if it was done and waiting for a join) is
lost.

=head1 Threads And Data

Now that we've covered the basics of threads, it's time for our next
topic: data. Threading introduces a couple of complications to data
access that non-threaded programs never need to worry about.

=head2 Shared And Unshared Data

The single most important thing to remember when using threads is that
all threads potentially have access to all the data anywhere in your
program. While this is true with a nonthreaded Perl program as well,
it's especially important to remember with a threaded program, since
more than one thread can be accessing this data at once.

Perl's scoping rules don't change because you're using threads. If a
subroutine (or block, in the case of async()) could see a variable if
you weren't running with threads, it can see it if you are. This is
especially important for the subroutines that create threads, and makes
C<my> variables even more important. Remember--if your variables aren't
lexically scoped (declared with C<my>) you're probably sharing them
between threads.

=head2 Thread Pitfall: Races

While threads bring a new set of useful tools, they also bring a
number of pitfalls. One pitfall is the race condition:

    use Thread;
    $a = 1;
    $thr1 = Thread->new(\&sub1);
    $thr2 = Thread->new(\&sub2);

    sleep 10;
    print "$a\n";

    sub sub1 { $foo = $a; $a = $foo + 1; }
    sub sub2 { $bar = $a; $a = $bar + 1; }

What do you think $a will be? The answer, unfortunately, is "it
depends." Both sub1() and sub2() access the global variable $a, once
to read and once to write. Depending on factors ranging from your
thread implementation's scheduling algorithm to the phase of the moon,
$a can be 2 or 3.

Race conditions are caused by unsynchronized access to shared
data. Without explicit synchronization, there's no way to be sure that
nothing has happened to the shared data between the time you access it
and the time you update it. Even this simple code fragment has the
possibility of error:

    use Thread qw(async);
    $a = 2;
    async{ $b = $a; $a = $b + 1; };
    async{ $c = $a; $a = $c + 1; };

Two threads both access $a. Each thread can potentially be interrupted
at any point, or be executed in any order. At the end, $a could be 3
or 4, and both $b and $c could be 2 or 3.

Whenever your program accesses data or resources that can be accessed
by other threads, you must take steps to coordinate access or risk
data corruption and race conditions.
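
Why the answer "depends" can be made concrete without any threads. The sketch below is a deterministic, single-threaded simulation (run_interleaving() is a hypothetical helper, not a Thread function). Each simulated thread performs a fetch step and then a store step, as in the fragment above that starts with $a = 2, and the program tries every order in which those four steps can be interleaved.

```perl
use strict;
use warnings;

# Run one interleaving. $order is a string such as "1212": each digit names
# the simulated thread that takes its next step. A thread's first step reads
# the shared value into a private copy; its second step writes copy + 1 back.
sub run_interleaving {
    my ($order, $shared) = @_;
    my (%copy, %steps_done);
    for my $t (split //, $order) {
        if (!$steps_done{$t}++) {
            $copy{$t} = $shared;        # fetch
        } else {
            $shared = $copy{$t} + 1;    # store
        }
    }
    return $shared;
}

# All six ways to interleave two threads of two steps each.
my @orders = qw(1122 1212 1221 2112 2121 2211);
my %finals;
for my $order (@orders) {
    my $final = run_interleaving($order, 2);
    $finals{$final}++;
    print "order $order: final value $final\n";
}
print "possible results: ", join(", ", sort keys %finals), "\n";
```

Only the two orders in which one thread finishes before the other starts give 4; every overlapping order loses an update and gives 3.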

=head2 Controlling access: lock()

The lock() function takes a variable (or subroutine, but we'll get to
that later) and puts a lock on it. No other thread may lock the
variable until the locking thread exits the innermost block containing
the lock. Using lock() is straightforward:

    use Thread qw(async);
    $a = 4;
    $thr1 = async {
        $foo = 12;
        {
            lock ($a);   # Block until we get access to $a
            $b = $a;
            $a = $b * $foo;
        }
        print "\$foo was $foo\n";
    };
    $thr2 = async {
        $bar = 7;
        {
            lock ($a);   # Block until we can get access to $a
            $c = $a;
            $a = $c * $bar;
        }
        print "\$bar was $bar\n";
    };
    $thr1->join;
    $thr2->join;
    print "\$a is $a\n";

lock() blocks the thread until the variable being locked is
available. When lock() returns, your thread can be sure that no other
thread can lock that variable until the innermost block containing the
lock exits.

It's important to note that locks don't prevent access to the variable
in question, only lock attempts. This is in keeping with Perl's
longstanding tradition of courteous programming, and the advisory file
locking that flock() gives you. Locked subroutines behave differently,
however. We'll cover that later in the article.

You may lock arrays and hashes as well as scalars. Locking an array,
though, will not block subsequent locks on array elements, just lock
attempts on the array itself.

Finally, locks are recursive, which means it's okay for a thread to
lock a variable more than once. The lock will last until the outermost
lock() on the variable goes out of scope.

=head2 Thread Pitfall: Deadlocks

Locks are a handy tool to synchronize access to data. Using them
properly is the key to safe shared data. Unfortunately, locks aren't
without their dangers. Consider the following code:

    use Thread qw(async yield);
    $a = 4;
    $b = "foo";
    async {
        lock($a);
        yield;
        sleep 20;
        lock ($b);
    };
    async {
        lock($b);
        yield;
        sleep 20;
        lock ($a);
    };

This program will probably hang until you kill it. The only way it
won't hang is if one of the two async() routines acquires both locks
first. A guaranteed-to-hang version is more complicated, but the
principle is the same.

The first thread spawned by async() will grab a lock on $a then, a
second or two later, try to grab a lock on $b. Meanwhile, the second
thread grabs a lock on $b, then later tries to grab a lock on $a. The
second lock attempt for both threads will block, each waiting for the
other to release its lock.

This condition is called a deadlock, and it occurs whenever two or
more threads are trying to get locks on resources that the others
own. Each thread will block, waiting for the other to release a lock
on a resource. That never happens, though, since the thread with the
resource is itself waiting for a lock to be released.

There are a number of ways to handle this sort of problem. The best
way is to always have all threads acquire locks in the exact same
order. If, for example, you lock variables $a, $b, and $c, always lock
$a before $b, and $b before $c. It's also best to hold on to locks for
as short a period of time as possible to minimize the risks of deadlock.
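
One way to make a fixed locking order hard to get wrong is to compute it in one place. The sketch below is only an illustration of that idea in plain Perl: lock_order() is a hypothetical helper that takes the names of the resources a piece of code needs and returns them in a single agreed-upon order (here, sorted by name). It does not take any locks itself.

```perl
use strict;
use warnings;

# Hypothetical helper: return resource names in the one agreed-upon order.
# Call it in list context.
sub lock_order {
    my @names = @_;
    return sort @names;
}

# Two callers ask for the same resources in opposite orders...
my @first  = lock_order(qw(b a));
my @second = lock_order(qw(a b));

# ...but both would acquire them in the same sequence.
print "first caller locks:  @first\n";
print "second caller locks: @second\n";
```

Any total order works, as long as every thread uses the same one.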

=head2 Queues: Passing Data Around

A queue is a special thread-safe object that lets you put data in one
end and take it out the other without having to worry about
synchronization issues. They're pretty straightforward, and look like
this:

    use Thread qw(async);
    use Thread::Queue;

    my $DataQueue = new Thread::Queue;
    $thr = async {
        while ($DataElement = $DataQueue->dequeue) {
            print "Popped $DataElement off the queue\n";
        }
    };

    $DataQueue->enqueue(12);
    $DataQueue->enqueue("A", "B", "C");
    $DataQueue->enqueue(\$thr);
    sleep 10;
    $DataQueue->enqueue(undef);

You create the queue with new Thread::Queue. Then you can add lists of
scalars onto the end with enqueue(), and pop scalars off the front of
it with dequeue(). A queue has no fixed size, and can grow as needed
to hold everything pushed on to it.

If a queue is empty, dequeue() blocks until another thread enqueues
something. This makes queues ideal for event loops and other
communications between threads.
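
One detail of the reader loop above is easy to miss: it tests the dequeued value for truth, so a queued 0 or empty string ends the loop just as the final undef does. The single-threaded sketch below models the queue with a plain array (push for enqueue, shift for dequeue; it is not Thread::Queue itself) and compares a truth test with a defined() test. Both function names are made up for the example.

```perl
use strict;
use warnings;

# Drain the items the way the loop above does: stop at the first false value.
sub drain_while_true {
    my @queue = @_;
    my @seen;
    while (my $item = shift @queue) {
        push @seen, $item;
    }
    return @seen;
}

# Drain the same items, stopping only at undef.
sub drain_while_defined {
    my @queue = @_;
    my @seen;
    while (defined(my $item = shift @queue)) {
        push @seen, $item;
    }
    return @seen;
}

my @items = (12, 0, "A", undef);
print "truth test saw:   @{[ drain_while_true(@items) ]}\n";
print "defined test saw: @{[ drain_while_defined(@items) ]}\n";
```

If false values are legitimate data, test with defined() and keep undef as the end marker.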

=head1 Threads And Code

In addition to providing thread-safe access to data via locks and
queues, threaded Perl also provides general-purpose semaphores for
coarser synchronization than locks provide and thread-safe access to
entire subroutines.

=head2 Semaphores: Synchronizing Data Access

Semaphores are a kind of generic locking mechanism. Unlike lock, which
gets a lock on a particular scalar, Perl doesn't associate any
particular thing with a semaphore so you can use them to control
access to anything you like. In addition, semaphores can allow more
than one thread to access a resource at once, though by default
semaphores only allow one thread access at a time.

=over 4

=item Basic semaphores

Semaphores have two methods, down and up. down decrements the resource
count, while up increments it. down calls will block if the
semaphore's current count would decrement below zero. This program
gives a quick demonstration:

    use Thread qw(yield);
    use Thread::Semaphore;
    my $semaphore = new Thread::Semaphore;
    $GlobalVariable = 0;

    $thr1 = new Thread \&sample_sub, 1;
    $thr2 = new Thread \&sample_sub, 2;
    $thr3 = new Thread \&sample_sub, 3;

    sub sample_sub {
        my $SubNumber = shift @_;
        my $TryCount = 10;
        my $LocalCopy;
        sleep 1;
        while ($TryCount--) {
            $semaphore->down;
            $LocalCopy = $GlobalVariable;
            print "$TryCount tries left for sub $SubNumber (\$GlobalVariable is $GlobalVariable)\n";
            yield;
            sleep 2;
            $LocalCopy++;
            $GlobalVariable = $LocalCopy;
            $semaphore->up;
        }
    }

The three invocations of the subroutine all operate in sync. The
semaphore, though, makes sure that only one thread is accessing the
global variable at once.
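
The counting rule can be seen in isolation with a small single-threaded model. The package name CountModel and its try_down() method are made up for this sketch; unlike the real down(), try_down() reports failure instead of blocking, so the example can run without threads.

```perl
use strict;
use warnings;

package CountModel;

sub new {
    my ($class, $count) = @_;
    $count = 1 unless defined $count;    # default: one holder at a time
    return bless { count => $count }, $class;
}

# Take one unit if available; return false where a real down() would block.
sub try_down {
    my $self = shift;
    return 0 if $self->{count} < 1;
    $self->{count}--;
    return 1;
}

# Give one unit back.
sub up {
    my $self = shift;
    $self->{count}++;
    return;
}

package main;

my $sem = CountModel->new;
print "first down:  ", ($sem->try_down ? "acquired" : "would block"), "\n";
print "second down: ", ($sem->try_down ? "acquired" : "would block"), "\n";
$sem->up;
print "after up:    ", ($sem->try_down ? "acquired" : "would block"), "\n";
```

Starting the count above one is what would let several holders in at once.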