| 1 | $Id: Fork.os2 1281 2004-03-06 21:48:26Z bird $
|
|---|
| 2 |
|
|---|
| 3 | Fork Design Draft
|
|---|
| 4 | --------------------
|
|---|
| 5 |
|
|---|
| 6 | 1.0 Intro
|
|---|
| 7 | ----------
|
|---|
| 8 |
|
|---|
| 9 | blah.
|
|---|
| 10 |
|
|---|
| 11 |
|
|---|
| 12 | 1.1 The SuS fork() Description
|
|---|
| 13 | ------------------------------
|
|---|
| 14 |
|
|---|
| 15 | NAME
|
|---|
| 16 |
|
|---|
| 17 | fork - create a new process
|
|---|
| 18 |
|
|---|
| 19 | SYNOPSIS
|
|---|
| 20 |
|
|---|
| 21 | #include <unistd.h>
|
|---|
| 22 |
|
|---|
| 23 | pid_t fork(void);
|
|---|
| 24 |
|
|---|
| 25 | DESCRIPTION
|
|---|
| 26 |
|
|---|
| 27 | The fork() function shall create a new process. The new process (child process) shall be an exact copy of the calling process (parent process) except as detailed below:
|
|---|
| 28 |
|
|---|
| 29 | * The child process shall have a unique process ID.
|
|---|
| 30 | * The child process ID also shall not match any active process
|
|---|
| 31 | group ID.
|
|---|
| 32 | * The child process shall have a different parent process ID,
|
|---|
| 33 | which shall be the process ID of the calling process.
|
|---|
| 34 | * The child process shall have its own copy of the parent's file
|
|---|
| 35 | descriptors. Each of the child's file descriptors shall refer
|
|---|
| 36 | to the same open file description with the corresponding file
|
|---|
| 37 | descriptor of the parent.
|
|---|
| 38 | * The child process shall have its own copy of the parent's open
|
|---|
| 39 | directory streams. Each open directory stream in the child process
|
|---|
| 40 | may share directory stream positioning with the corresponding
|
|---|
| 41 | directory stream of the parent.
|
|---|
| 42 | * [XSI] The child process shall have its own copy of the parent's
|
|---|
| 43 | message catalog descriptors.
|
|---|
| 44 | * The child process' values of tms_utime, tms_stime, tms_cutime, and
|
|---|
| 45 | tms_cstime shall be set to 0.
|
|---|
| 46 | * The time left until an alarm clock signal shall be reset to zero,
|
|---|
| 47 | and the alarm, if any, shall be canceled; see alarm() .
|
|---|
| 48 | * [XSI] All semadj values shall be cleared.
|
|---|
| 49 | * File locks set by the parent process shall not be inherited by
|
|---|
| 50 | the child process.
|
|---|
| 51 | * The set of signals pending for the child process shall be
|
|---|
| 52 | initialized to the empty set.
|
|---|
| 53 | * [XSI] Interval timers shall be reset in the child process.
|
|---|
| 54 | * [SEM] Any semaphores that are open in the parent process shall
|
|---|
| 55 | also be open in the child process.
|
|---|
| 56 | * [ML] The child process shall not inherit any address space memory
|
|---|
| 57 | locks established by the parent process via calls to mlockall()
|
|---|
| 58 | or mlock().
|
|---|
| 59 | * [MF|SHM] Memory mappings created in the parent shall be retained
|
|---|
| 60 | in the child process. MAP_PRIVATE mappings inherited from the
|
|---|
| 61 | parent shall also be MAP_PRIVATE mappings in the child, and any
|
|---|
| 62 | modifications to the data in these mappings made by the parent
|
|---|
| 63 | prior to calling fork() shall be visible to the child. Any
|
|---|
| 64 | modifications to the data in MAP_PRIVATE mappings made by the
|
|---|
| 65 | parent after fork() returns shall be visible only to the parent.
|
|---|
| 66 | Modifications to the data in MAP_PRIVATE mappings made by the
|
|---|
| 67 | child shall be visible only to the child.
|
|---|
| 68 | * [PS] For the SCHED_FIFO and SCHED_RR scheduling policies, the
|
|---|
| 69 | child process shall inherit the policy and priority settings
|
|---|
| 70 | of the parent process during a fork() function. For other
|
|---|
| 71 | scheduling policies, the policy and priority settings on fork()
|
|---|
| 72 | are implementation-defined.
|
|---|
| 73 | * [TMR] Per-process timers created by the parent shall not be
|
|---|
| 74 | inherited by the child process.
|
|---|
| 75 | * [MSG] The child process shall have its own copy of the message
|
|---|
| 76 | queue descriptors of the parent. Each of the message descriptors
|
|---|
| 77 | of the child shall refer to the same open message queue
|
|---|
| 78 | description as the corresponding message descriptor of the parent.
|
|---|
| 79 | * [AIO] No asynchronous input or asynchronous output operations
|
|---|
| 80 | shall be inherited by the child process.
|
|---|
| 81 | * A process shall be created with a single thread. If a
|
|---|
| 82 | multi-threaded process calls fork(), the new process shall contain
|
|---|
| 83 | a replica of the calling thread and its entire address space,
|
|---|
| 84 | possibly including the states of mutexes and other resources.
|
|---|
| 85 | Consequently, to avoid errors, the child process may only execute
|
|---|
| 86 | async-signal-safe operations until such time as one of the exec
|
|---|
| 87 | functions is called. [THR] Fork handlers may be established by
|
|---|
| 88 | means of the pthread_atfork() function in order to maintain
|
|---|
| 89 | application invariants across fork() calls.
|
|---|
| 90 |
|
|---|
| 91 | When the application calls fork() from a signal handler and any of
|
|---|
| 92 | the fork handlers registered by pthread_atfork() calls a function
|
|---|
| 93 | that is not asynch-signal-safe, the behavior is undefined.
|
|---|
| 94 | * [TRC TRI] If the Trace option and the Trace Inherit option are
|
|---|
| 95 | both supported:
|
|---|
| 96 | If the calling process was being traced in a trace stream that
|
|---|
| 97 | had its inheritance policy set to POSIX_TRACE_INHERITED, the
|
|---|
| 98 | child process shall be traced into that trace stream, and the
|
|---|
| 99 | child process shall inherit the parent's mapping of trace event
|
|---|
| 100 | names to trace event type identifiers. If the trace stream in
|
|---|
| 101 | which the calling process was being traced had its inheritance
|
|---|
| 102 | policy set to POSIX_TRACE_CLOSE_FOR_CHILD, the child process
|
|---|
| 103 | shall not be traced into that trace stream. The inheritance
|
|---|
| 104 | policy is set by a call to the posix_trace_attr_setinherited()
|
|---|
| 105 | function.
|
|---|
| 106 | * [TRC] If the Trace option is supported, but the Trace Inherit
|
|---|
| 107 | option is not supported:
|
|---|
| 108 | The child process shall not be traced into any of the trace
|
|---|
| 109 | streams of its parent process.
|
|---|
| 110 | * [TRC] If the Trace option is supported, the child process of
|
|---|
| 111 | a trace controller process shall not control the trace streams
|
|---|
| 112 | controlled by its parent process.
|
|---|
| 113 | * [CPT] The initial value of the CPU-time clock of the child
|
|---|
| 114 | process shall be set to zero.
|
|---|
| 115 | * [TCT] The initial value of the CPU-time clock of the single
|
|---|
| 116 | thread of the child process shall be set to zero.
|
|---|
| 117 |
|
|---|
| 118 | All other process characteristics defined by IEEE Std 1003.1-2001 shall
|
|---|
| 119 | be the same in the parent and child processes. The inheritance of
|
|---|
| 120 | process characteristics not defined by IEEE Std 1003.1-2001 is
|
|---|
| 121 | unspecified by IEEE Std 1003.1-2001.
|
|---|
| 122 |
|
|---|
| 123 | After fork(), both the parent and the child processes shall be capable
|
|---|
| 124 | of executing independently before either one terminates.
|
|---|
| 125 |
|
|---|
| 126 | RETURN VALUE
|
|---|
| 127 |
|
|---|
| 128 | Upon successful completion, fork() shall return 0 to the child process
|
|---|
| 129 | and shall return the process ID of the child process to the parent
|
|---|
| 130 | process. Both processes shall continue to execute from the fork()
|
|---|
| 131 | function. Otherwise, -1 shall be returned to the parent process, no
|
|---|
| 132 | child process shall be created, and errno shall be set to indicate
|
|---|
| 133 | the error.
|
|---|
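A minimal sketch of the three-way return value check described above, written against a generic POSIX system rather than the OS/2 implementation this draft designs. The helper name run_child_and_wait is hypothetical and exists only for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, wait for it,
 * and return the exit status seen by the parent (-1 on any failure). */
int run_child_and_wait(int code)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;      /* no child was created; errno holds EAGAIN or ENOMEM */
    if (pid == 0)
        _exit(code);    /* child: fork() returned 0; _exit is async-signal-safe */

    /* parent: fork() returned the process ID of the child */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child only calls _exit(), which keeps it within the async-signal-safe restriction quoted in the DESCRIPTION.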
| 134 |
|
|---|
| 135 | ERRORS
|
|---|
| 136 |
|
|---|
| 137 | The fork() function shall fail if:
|
|---|
| 138 |
|
|---|
| 139 | [EAGAIN]
|
|---|
| 140 | The system lacked the necessary resources to create another
|
|---|
| 141 | process, or the system-imposed limit on the total number of
|
|---|
| 142 | processes under execution system-wide or by a single user
|
|---|
| 143 | {CHILD_MAX} would be exceeded.
|
|---|
| 144 |
|
|---|
| 145 | The fork() function may fail if:
|
|---|
| 146 |
|
|---|
| 147 | [ENOMEM]
|
|---|
| 148 | Insufficient storage space is available.
|
|---|
| 149 |
|
|---|
| 150 |
|
|---|
| 151 |
|
|---|
| 152 |
|
|---|
| 153 | 2.0 Requirements and Assumptions Of The Implementation
|
|---|
| 154 | ------------------------------------------------------
|
|---|
| 155 |
|
|---|
| 156 | The Innotek LIBC fork() implementation will require the following features
|
|---|
| 157 | in LIBC to work:
|
|---|
| 158 | 1. A shared process management internal to LIBC for communication to the
|
|---|
| 159 | child that a fork() is in progress.
|
|---|
| 160 | 2. A very generalized and varied set of fork helper functions to achieve
|
|---|
| 161 | maximum flexibility of the implementation.
|
|---|
| 162 | 3. Extended versions of some memory related OS/2 APIs must be implemented.
|
|---|
| 163 |
|
|---|
| 164 | The implementation will further make the following assumption about the
|
|---|
| 165 | operation of OS/2:
|
|---|
| 166 | 1. DosExecPgm will not return until all DLLs are initialized successfully.
|
|---|
| 167 | 2. DosQueryMemState() is broken if more than one page is specified.
|
|---|
| 168 | (no idea why/how/where it's broken, but testcase shows it is :/ )
|
|---|
| 169 |
|
|---|
| 170 |
|
|---|
| 171 | 3.0 The Shared Process Management
|
|---|
| 172 | ---------------------------------
|
|---|
| 173 |
|
|---|
| 174 | The fork() implementation requires a method for telling the child process
|
|---|
| 175 | that it's being forked and must take a very different startup route. For
|
|---|
| 176 | some other LIBC APIs there is a need for parent -> child and child -> parent
|
|---|
| 177 | information exchange. More specifically, the inheritance of sockets,
|
|---|
| 178 | signals, the different scheduler actions of a posix_spawn[p]() call, and
|
|---|
| 179 | possibly some process group stuff related to posix_spawn too if we get it
|
|---|
| 180 | figured out eventually. All this was parent -> child during spawn/fork. A
|
|---|
| 181 | need also exists for child -> parent notification and possibly exchange for
|
|---|
| 182 | process termination. It might be necessary to reimplement the different
|
|---|
| 183 | wait APIs and implement SIGCHLD; it's likely that those tasks will make
|
|---|
| 184 | such demands.
|
|---|
| 185 |
|
|---|
| 186 | The choice is now whether or not to make this shared process management
|
|---|
| 187 | specific to each LIBC version as a shared segment or try to make it
|
|---|
| 188 | survive normal LIBC updates. Making it specific has advantages in code
|
|---|
| 189 | size and memory footprint (no reserved fields), however it has certain
|
|---|
| 190 | disadvantages when LIBC is updated. The other option is to use a named
|
|---|
| 191 | shared memory object, defining the content with reserved space for later
|
|---|
| 192 | extensions so several versions of LIBC with more or less features
|
|---|
| 193 | implemented can share the memory space.
|
|---|
| 194 |
|
|---|
| 195 | The latter option is preferred since it allows more applications to
|
|---|
| 196 | interoperate, it causes less shared memory waste, the shared memory
|
|---|
| 197 | can be located in high memory and it would be possible to fork
|
|---|
| 198 | processes using multiple versions of LIBC.
|
|---|
| 199 |
|
|---|
| 200 | The shared memory shall be named \SHAREMEM\INNOTEKLIBC.V01, the version
|
|---|
| 201 | number being the one of the shared memory layout and contents, it will
|
|---|
| 202 | only be increased when incompatible changes are made.
|
|---|
| 203 |
|
|---|
| 204 | The shared memory shall be protected by a standard OS/2 mutex semaphore.
|
|---|
| 205 | It shall not use any fast R3 semaphore since the usage frequency is
|
|---|
| 206 | low and the result of a mess-up may be disastrous. Care must be taken to
|
|---|
| 207 | avoid creation races and owner-died scenarios.
|
|---|
| 208 |
|
|---|
| 209 | The memory shall have a fixed size, since adding segments is very hard.
|
|---|
| 210 | Thus the size must be large enough to cope with a great deal of
|
|---|
| 211 | processes, while bearing in mind that OS/2 normally doesn't support more
|
|---|
| 212 | than 1000 processes, with a theoretical max of some 4000 (being the
|
|---|
| 213 | max thread count). A very simplistic allocation scheme will be
|
|---|
| 214 | implemented. Practically speaking a fixed block size pool would do fine
|
|---|
| 215 | for the process structure, while for the misc structures like socket
|
|---|
| 216 | lists a linked list based heap would do fine.
|
|---|
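The fixed block size pool mentioned above could look roughly like the following sketch. The names (Pool, poolInit, poolAlloc, poolFree) and the sizes are hypothetical and not part of this draft; the caller is assumed to hold the shared memory mutex, and the free list uses block indexes only to keep the example short.

```c
#include <stddef.h>
#include <string.h>

#define POOL_BLOCK_SIZE  256   /* hypothetical size of one process block */
#define POOL_BLOCK_COUNT 8     /* hypothetical capacity, tiny for illustration */

typedef struct Pool
{
    int           iFirstFree;  /* index of the first free block, -1 if none */
    unsigned char abBlocks[POOL_BLOCK_COUNT][POOL_BLOCK_SIZE];
} Pool;

/* Chain all blocks into the free list; a free block stores the index of
 * the next free block in its first bytes. */
void poolInit(Pool *pPool)
{
    for (int i = 0; i < POOL_BLOCK_COUNT; i++)
    {
        int iNext = (i + 1 < POOL_BLOCK_COUNT) ? i + 1 : -1;
        memcpy(pPool->abBlocks[i], &iNext, sizeof(iNext));
    }
    pPool->iFirstFree = 0;
}

/* Pop the first free block and hand it out fully zeroed. */
void *poolAlloc(Pool *pPool)
{
    int i = pPool->iFirstFree;
    if (i < 0)
        return NULL;                            /* pool exhausted */
    memcpy(&pPool->iFirstFree, pPool->abBlocks[i], sizeof(int));
    memset(pPool->abBlocks[i], 0, POOL_BLOCK_SIZE);
    return pPool->abBlocks[i];
}

/* Push a block back onto the head of the free list. */
void poolFree(Pool *pPool, void *pv)
{
    int i = (int)(((unsigned char *)pv - &pPool->abBlocks[0][0]) / POOL_BLOCK_SIZE);
    memcpy(pPool->abBlocks[i], &pPool->iFirstFree, sizeof(int));
    pPool->iFirstFree = i;
}
```

Allocation and release are both constant time, which fits the "very simplistic" scheme the paragraph asks for.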
| 217 |
|
|---|
| 218 | The process blocks shall be rounded up in size, adding a reasonable
|
|---|
| 219 | amount of space reserved for future extensions. Reserved space must be
|
|---|
| 220 | all zeroed.
|
|---|
| 221 |
|
|---|
| 222 | The fork() specific members of the process block shall be a pointer to
|
|---|
| 223 | the shared memory object for the fork operation (the fork handle) and
|
|---|
| 224 | a list of forkable modules. The fork handle will itself contain
|
|---|
| 225 | information indicating whether or not another LIBC version has already
|
|---|
| 226 | started fork() handling in the child. The presence of the fork handle
|
|---|
| 227 | means that the child is being forked and normal dll init and startup
|
|---|
| 228 | will not be executed, but a registered callback will be called to do
|
|---|
| 229 | the forking of each module. (more details in section 4.0)
|
|---|
| 230 |
|
|---|
| 231 | The parent shall before spawn, fork and exec (essentially before DosExecPgm
|
|---|
| 232 | or DosStartSession) create a process block for the child to be born and
|
|---|
| 233 | link it into an embryo list in the shared memory block. The child shall
|
|---|
| 234 | find its process block by searching the embryo list using the parent pid
|
|---|
| 235 | as key. All DosExecPgm and DosStartSession calls shall be serialized within
|
|---|
| 236 | one LIBC version. (If some empty headed programmer manages to link together
|
|---|
| 237 | a program which may end up using two or more LIBC versions and having two
|
|---|
| 238 | or more threads doing DosExecPgm at the very same time, well then he really
|
|---|
| 239 | deserves whatever trouble he gets! At least don't blame me!)
|
|---|
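The embryo list handshake described above might be sketched as follows. The structure layout and the function names (ProcBlock, embryoLink, embryoClaim) are hypothetical; the draft does not define the real process block. Locking of the shared memory is assumed to be done by the caller.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced process block. */
typedef struct ProcBlock
{
    int               pidParent;  /* pid of the process that created the block */
    int               pid;        /* filled in by the child when it claims the block */
    struct ProcBlock *pNext;      /* next block in the embryo list */
} ProcBlock;

/* Parent side: link a new block at the head of the embryo list. */
void embryoLink(ProcBlock **ppHead, ProcBlock *pBlock, int pidParent)
{
    pBlock->pidParent = pidParent;
    pBlock->pid       = 0;
    pBlock->pNext     = *ppHead;
    *ppHead           = pBlock;
}

/* Child side: find the block created by our parent, unlink and return it.
 * Returns NULL if the parent did not create an embryo. */
ProcBlock *embryoClaim(ProcBlock **ppHead, int pidParent, int pidSelf)
{
    for (ProcBlock **pp = ppHead; *pp != NULL; pp = &(*pp)->pNext)
    {
        if ((*pp)->pidParent == pidParent)
        {
            ProcBlock *pBlock = *pp;
            *pp           = pBlock->pNext;
            pBlock->pNext = NULL;
            pBlock->pid   = pidSelf;
            return pBlock;
        }
    }
    return NULL;
}
```

Because the parent pid is the only key, two embryos from the same parent would be indistinguishable to the children, which is presumably why the DosExecPgm and DosStartSession calls have to be serialized.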
| 240 |
|
|---|
| 241 | Process blocks shall have to stay around after the process terminated
|
|---|
| 242 | (for child -> parent term exchange), a cleanup mechanism will be invoked
|
|---|
| 243 | whenever a free memory threshold is reached. All processes will register
|
|---|
| 244 | exit list handlers to mark the process block as zombie (and later
|
|---|
| 245 | perhaps setting error codes and notifying waiters/child-listeners).
|
|---|
| 246 |
|
|---|
| 247 |
|
|---|
| 248 |
|
|---|
| 249 | 4.0 The fork() Implementation
|
|---|
| 250 | -----------------------------
|
|---|
| 251 |
|
|---|
| 252 |
|
|---|
| 253 | The implementation is based on a fork handle and a set of primitives.
|
|---|
| 254 | The fork handle is a pointer to a shared memory object allocated for the
|
|---|
| 255 | occasion and which will be freed before fork() returns. The primitives
|
|---|
| 256 | all operate on this handle and will be provided using a callback table
|
|---|
| 257 | in order to fully support multiple LIBC versions.
|
|---|
| 258 |
|
|---|
| 259 |
|
|---|
| 260 | 4.1 Forkable Executable and DLLs
|
|---|
| 261 | --------------------------------
|
|---|
| 262 |
|
|---|
| 263 | The support for fork() is an optional feature of LIBC. The default
|
|---|
| 264 | executable produced with LIBC and GCC is not forkable. The fork
|
|---|
| 265 | support will be based on registration of the DLLs and EXEs in their
|
|---|
| 266 | LIBC supplied startup code (crt0/dll0). A set of fork versions of these
|
|---|
| 267 | modules exist with the suffix 'fork.o'.
|
|---|
| 268 |
|
|---|
| 269 | The big difference between the ordinary crt0/dll0 and the forkable
|
|---|
| 270 | crt0/dll0 is a per module structure, a call to register this, and the
|
|---|
| 271 | handling of the return code of that call.
|
|---|
| 272 |
|
|---|
| 273 | The fork module structure:
|
|---|
| 274 | typedef struct __libc_ForkModule
|
|---|
| 275 | {
|
|---|
| 276 | /** Structure version. (Initially 'FMO1' as viewed in hex editor.) */
|
|---|
| 277 | unsigned int iMagic;
|
|---|
| 278 | /** Fork callback function */
|
|---|
| 279 | int (*pfnAtFork)(struct __libc_ForkModule *pModule,
|
|---|
| 280 | __LIBC_FORKHANDLE *pForkHandle, enum __LIBC_CALLBACKOPERATION enmOperation);
|
|---|
| 281 | /** Pointer to the _CRT_FORK_PARENT1 set vector.
|
|---|
| 282 | * It's formatted as {priority,callback}. */
|
|---|
| 283 | void *pvParentVector1;
|
|---|
| 284 | /** Pointer to the _CRT_FORK_CHILD1 set vector.
|
|---|
| 285 | * It's formatted as {priority,callback}. */
|
|---|
| 286 | void *pvChildVector1;
|
|---|
| 287 | /** Data segment base address. */
|
|---|
| 288 | void *pvDataSegBase;
|
|---|
| 289 | /** Data segment end address (exclusive). */
|
|---|
| 290 | void *pvDataSegEnd;
|
|---|
| 291 | /** Reserved - must be zero. */
|
|---|
| 292 | int iReserved1;
|
|---|
| 293 | } __LIBC_FORKMODULE, *__LIBC_PFORKMODULE; /* urg! conventions */
|
|---|
| 294 |
|
|---|
| 295 |
|
|---|
| 296 | The fork callback function which crt0/dll0 references when initializing
|
|---|
| 297 | the fork module structure is called _atfork_callback. It takes the module
|
|---|
| 298 | structure, fork handle, and an operation enum as arguments. LIBC will
|
|---|
| 299 | contain a default implementation of _atfork_callback() which simply
|
|---|
| 300 | duplicates the data segment, and processes the two set vectors
|
|---|
| 301 | (_CRT_FORK_*1).
|
|---|
| 302 |
|
|---|
| 303 | crt0/dll0 will register the fork module structure and detect a forked
|
|---|
| 304 | child by calling __libc_ForkRegisterModule().
|
|---|
| 305 |
|
|---|
| 306 | Prototypes:
|
|---|
| 307 | /**
|
|---|
| 308 | * Register a forkable module. Called by crt0 and dll0.
|
|---|
| 309 | *
|
|---|
| 310 | * The call links pModule into the list of forkable modules
|
|---|
| 311 | * which is maintained in the process block.
|
|---|
| 312 | *
|
|---|
| 313 | * @returns 0 on normal process startup.
|
|---|
| 314 | * @returns 1 on forked child process startup.
|
|---|
| 315 | * The caller should respond by not calling any _DLL_InitTerm
|
|---|
| 316 | * or similar constructs.
|
|---|
| 317 | * @returns negative on failure.
|
|---|
| 318 | * The caller should return from the dll init returning FALSE
|
|---|
| 319 | * or DosExit in case of crt0. _atfork_callback() will take
|
|---|
| 320 | * care of necessary module initialization.
|
|---|
| 321 | * @param pModule Pointer to the fork module structure for the
|
|---|
| 322 | * module which is to be registered.
|
|---|
| 323 | */
|
|---|
| 324 | int __libc_ForkRegisterModule(__LIBC_FORKMODULE *pModule);
|
|---|
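How crt0/dll0 might act on the three documented return values can be sketched as a small mapping function. The enum and the function name are hypothetical, and treating any value other than 0 or 1 as a failure is an assumption; the draft only specifies 0, 1 and negative.

```c
/* Hypothetical outcome codes for the startup code; not part of the draft. */
typedef enum StartupAction
{
    STARTUP_NORMAL_INIT,  /* rc == 0: normal startup, run _DLL_InitTerm etc.      */
    STARTUP_SKIP_INIT,    /* rc == 1: forked child, skip _DLL_InitTerm and similar */
    STARTUP_FAIL          /* otherwise: dll0 returns FALSE, crt0 calls DosExit     */
} StartupAction;

/* Map the return code of __libc_ForkRegisterModule() to a startup action. */
StartupAction startupActionFromRc(int rc)
{
    if (rc == 0)
        return STARTUP_NORMAL_INIT;
    if (rc == 1)
        return STARTUP_SKIP_INIT;
    return STARTUP_FAIL;  /* negative per the draft; other values by assumption */
}
```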
| 325 |
|
|---|
| 326 |
|
|---|
| 327 |
|
|---|
| 328 |
|
|---|
| 329 |
|
|---|
| 330 | 4.2 Fork Primitives
|
|---|
| 331 | -------------------
|
|---|
| 332 |
|
|---|
| 333 | These primitives are provided by the fork implementation in the fork
|
|---|
| 334 | handle structure. We define a set of these primitives now; if new ones
|
|---|
| 335 | are added later, the users of these must check that they are
|
|---|
| 336 | actually present.
|
|---|
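The draft does not say how a user should check that a primitive is present. One possible approach, shown here purely as an assumption, is a size field at the start of the callback table combined with a NULL check. All names below (ForkOps, cb, pfnLater, FORKOPS_HAS, stubDuplicatePages) are hypothetical.

```c
#include <stddef.h>

typedef struct ForkHandle ForkHandle;  /* stand-in for __LIBC_FORKHANDLE */

typedef struct ForkOps
{
    size_t cb;  /* hypothetical: size of the table as filled in by its provider */
    int (*pfnDuplicatePages)(ForkHandle *, void *, void *, unsigned);
    int (*pfnInvoke)(ForkHandle *, int (*)(ForkHandle *, void *, size_t), void *, size_t);
    int (*pfnLater)(ForkHandle *);  /* hypothetical primitive added by a later LIBC */
} ForkOps;

/* A primitive is usable if it lies inside the provider's table and is set. */
#define FORKOPS_HAS(pOps, Member) \
    (   (pOps)->cb >= offsetof(ForkOps, Member) + sizeof((pOps)->Member) \
     && (pOps)->Member != NULL)

/* Stand-in primitive so that a table can be populated in an example. */
int stubDuplicatePages(ForkHandle *pForkHandle, void *pvStart, void *pvEnd, unsigned fFlags)
{
    (void)pForkHandle; (void)pvStart; (void)pvEnd; (void)fFlags;
    return 0;
}
```

A caller built against a newer LIBC would test FORKOPS_HAS(pOps, pfnLater) before using the newer primitive, and fall back or fail cleanly when an older LIBC provided the table.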
| 337 |
|
|---|
| 338 | Example:
|
|---|
| 339 | rc = pForkHandle->pOps->pfnDuplicatePages(pForkHandle, pModule->pvDataSegBase, pModule->pvDataSegEnd, __LIBC_FORK_ONLY_DIRTY);
|
|---|
| 340 | if (rc)
|
|---|
| 341 | return rc; /* failure */
|
|---|
| 342 |
|
|---|
| 343 | Prototypes:
|
|---|
| 344 | /**
|
|---|
| 345 | * Duplicate the pages from pvStart to pvEnd.
|
|---|
| 346 | * @returns 0 on success.
|
|---|
| 347 | * @returns appropriate non-zero error code on failure.
|
|---|
| 348 | * @param pForkHandle Handle of the current fork operation.
|
|---|
| 349 | * @param pvStart Pointer to start of the pages. Rounded down.
|
|---|
| 350 | * @param pvEnd Pointer to end of the pages. Rounded up.
|
|---|
| 351 | * @param fFlags __LIBC_FORK_ONLY_DIRTY means checking whether the
|
|---|
| 352 | * pages are actually dirty before bothering touching
|
|---|
| 353 | * and copying them. (Using the partially broken
|
|---|
| 354 | * DosQueryMemState() API.)
|
|---|
| 355 | * __LIBC_FORK_ALL means not to bother checking, but
|
|---|
| 356 | * just go ahead copying all the pages.
|
|---|
| 357 | */
|
|---|
| 358 | int pfnDuplicatePages(__LIBC_FORKHANDLE *pForkHandle, void *pvStart, void *pvEnd, unsigned fFlags);
|
|---|
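The rounding of pvStart and pvEnd documented above can be sketched like this. The 4 KiB page size is an assumption (the usual x86 value), and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u  /* assumption: 4 KiB pages */

/* Round an address down to the start of its page. */
uintptr_t pageRoundDown(uintptr_t u)
{
    return u & ~(uintptr_t)(PAGE_SIZE - 1);
}

/* Round an address up to the next page boundary (unchanged if aligned). */
uintptr_t pageRoundUp(uintptr_t u)
{
    return (u + PAGE_SIZE - 1) & ~(uintptr_t)(PAGE_SIZE - 1);
}

/* Number of pages covered by [pvStart, pvEnd) after rounding. */
size_t pageCount(const void *pvStart, const void *pvEnd)
{
    uintptr_t uStart = pageRoundDown((uintptr_t)pvStart);
    uintptr_t uEnd   = pageRoundUp((uintptr_t)pvEnd);
    return (size_t)((uEnd - uStart) / PAGE_SIZE);
}
```

Given the assumption in section 2.0 that DosQueryMemState() misbehaves for more than one page, an implementation of __LIBC_FORK_ONLY_DIRTY would presumably walk this range one page at a time.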
| 359 |
|
|---|
| 360 | /**
|
|---|
| 361 | * Invoke a function in the child process giving it a chunk of input.
|
|---|
| 362 | * The function is invoked the next time the fork buffer is flushed;
|
|---|
| 363 | * call pfnFlush() if the return code is desired.
|
|---|
| 364 | *
|
|---|
| 365 | * @returns 0 on success.
|
|---|
| 366 | * @returns appropriate non-zero error code on failure.
|
|---|
| 367 | * @param pForkHandle Handle of the current fork operation.
|
|---|
| 368 | * @param pfn Pointer to the function to invoke in the child.
|
|---|
| 369 | * The function gets the fork handle, pointer to
|
|---|
| 370 | * the argument memory chunk and the size of that.
|
|---|
| 371 | * The function must return 0 on success, and non-zero
|
|---|
| 372 | * on failure.
|
|---|
| 373 | * @param pvArg Pointer to a block of memory of size cbArg containing
|
|---|
| 374 | * input to be copied to the child and given to pfn upon
|
|---|
| 375 | * invocation.
|
|---|
| 376 | */
|
|---|
| 377 | int pfnInvoke(__LIBC_FORKHANDLE *pForkHandle, int (*pfn)(__LIBC_FORKHANDLE *pForkHandle, void *pvArg, size_t cbArg), void *pvArg, size_t cbArg);
|
|---|
| 378 |
|
|---|
|
|---|