==================================
Mini Garbage Collection in Girocco
==================================

What is "mini" garbage collection?

A "mini" garbage collection makes the repository more efficient by cleaning up
any "crud" files, combining small packs and joining per-push reflogs.  For
"git-svn" mirrors, loose objects from the most recent `git svn fetch` operation
are migrated into a pack.

All of these tasks are also performed during a "full" garbage collection.  The
difference is that a "mini" garbage collection never removes any objects nor
does it ever perform a reachability trace.  It also will not operate on any
overly large packs.  As a result it's typically a relatively speedy operation
(at least when compared to full garbage collection).

The `combine-packs.sh` script provides the ability to combine packs and/or pack
loose objects into a pack.  It produces packs that are just as efficient as Git
packs (within +/- 1%) but does not require connected objects to do so.

--------------------
Triggering a Mini GC
--------------------

There are two places a "mini" garbage collection can be triggered from:

1. The `pre-receive` hook scripts
2. The `update.sh` mirror update script

(Technically the toolbox `perform-pre-gc-linking.sh` script can also trigger a
"mini" garbage collection but it's not something that normally runs.)

Requesting a Mini GC
~~~~~~~~~~~~~~~~~~~~

Once a "mini" garbage collection becomes desirable, it's requested simply by
making sure the `.needsgc` file exists in the repository's git directory (i.e.
alongside the `config` file) -- the contents of the `.needsgc` file are ignored.

When the `jobd.pl` process checks to see if a repository needs to have garbage
collection run on it (by examining its `gitweb.lastgc` value) it will disregard
the `gitweb.lastgc` value and always run the `gc.sh` script if it sees a
`.needsgc` file is present.

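
The request mechanism described above can be sketched in a few lines.  This is
an illustrative Python sketch only, not Girocco's actual code (Girocco itself
is written in shell and Perl).  The function names `request_mini_gc` and
`should_run_gc_script`, and the `lastgc_says_due` parameter, are hypothetical
names introduced for this example.

```python
from pathlib import Path


def request_mini_gc(git_dir):
    """Request a "mini" gc by making sure .needsgc exists (contents ignored)."""
    # The marker lives in the git directory, alongside the `config` file.
    (Path(git_dir) / ".needsgc").touch(exist_ok=True)


def should_run_gc_script(git_dir, lastgc_says_due):
    """Hypothetical stand-in for the check described for jobd.pl.

    `lastgc_says_due` represents whatever conclusion would be drawn from the
    `gitweb.lastgc` value; a present .needsgc file overrides that value.
    """
    if (Path(git_dir) / ".needsgc").exists():
        return True
    return bool(lastgc_says_due)
```

Because only the existence of the file matters, requesting a "mini" garbage
collection more than once before it runs has no additional effect.
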

When the `gc.sh` script is run, if the time for a full garbage collection has
not yet arrived _and_ a `.needsgc` file exists then it will perform a "mini"
garbage collection on the repository instead.

Conditions causing a Mini GC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since Girocco uses transfer.unpackLimit=1, it's possible for more packs than
usual to build up more quickly since every successful push that is not strictly
a rewind will leave a new pack behind.  Once 20 (or more) packs are detected a
"mini" garbage collection will be requested.

Every ref change fed to the `pre-receive` hook or caused by running the
`update.sh` mirror update script gets recorded in a "reflogs" file located in
the `reflogs` directory (located alongside the repository's git directory
`config` file).  Each push operation generates a new "reflogs" file while each
mirror update operation that receives new updates just appends to the current
day's file.  Once 50 (or more) reflogs files are detected a "mini" garbage
collection will be requested.

Any "git-svn" fetch operation that fetches any new loose objects will also
immediately cause a "mini" garbage collection to be requested.

--------------------
Escalating a Mini GC
--------------------

The primary purpose of a "mini" garbage collection is to reduce the number of
packs.  To remain efficient, a "mini" garbage collection ignores overly large
packs and does not perform any object reachability traces nor does it ever
remove any objects.

If a "mini" garbage collection cannot reduce the total number of "extra" packs
to less than 10 it will trigger a full garbage collection the next time that
`jobd.pl` services the repository (it does this by unsetting the
`gitweb.lastgc` value).

When counting "extra" packs, any `.keep` packs or `.bitmap` or `.bndl` packs are
excluded from the count to avoid disrupting incoming pushes, use of bitmaps or
downloadable bundles.

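
The thresholds for requesting and escalating a "mini" garbage collection can be
summarized in a short sketch.  This is illustrative Python, not Girocco's
actual implementation.  All function and constant names are hypothetical, and
the sketch assumes that a `.keep`, `.bitmap` or `.bndl` marker is a sibling
file sharing the pack's base name (e.g. `pack-X.keep` next to `pack-X.pack`),
which the text above does not spell out.

```python
from pathlib import Path

PACK_LIMIT = 20         # request a mini gc at 20 (or more) packs
REFLOG_LIMIT = 50       # request a mini gc at 50 (or more) reflogs files
EXTRA_PACK_LIMIT = 10   # escalate unless "extra" packs drop below this

# Assumed sibling-file suffixes that exclude a pack from the "extra" count.
EXCLUDING_SUFFIXES = (".keep", ".bitmap", ".bndl")


def mini_gc_wanted(pack_count, reflog_file_count, svn_fetched_loose=False):
    """Return True if any of the three described conditions holds."""
    return (pack_count >= PACK_LIMIT
            or reflog_file_count >= REFLOG_LIMIT
            or svn_fetched_loose)


def count_extra_packs(pack_dir):
    """Count *.pack files that have no .keep/.bitmap/.bndl sibling."""
    count = 0
    for pack in Path(pack_dir).glob("*.pack"):
        if any(pack.with_suffix(s).exists() for s in EXCLUDING_SUFFIXES):
            continue  # excluded: in use by a push, a bitmap or a bundle
        count += 1
    return count


def must_escalate(extra_pack_count):
    """A full gc is triggered unless fewer than 10 extra packs remain."""
    return extra_pack_count >= EXTRA_PACK_LIMIT
```

For example, a pack directory holding 12 plain packs after a "mini" garbage
collection would give `must_escalate(12)` a true result, whereas 9 remaining
extra packs would not escalate.
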

----------------------------
On Demand Garbage Collection
----------------------------

The end result of the `.needsgc` facility and the ability to escalate to a full
garbage collection when necessary effectively provides an "on-demand" garbage
collection capability for Girocco.

As long as `jobd.pl` runs continuously (or often enough in `--all-once` mode),
Girocco's repositories will generally be kept in a well-maintained state.
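
Taken together, the choice between a full and a "mini" garbage collection
reduces to a small decision.  The following Python sketch is only a model of
that decision, not Girocco's code.  The function name `choose_gc_action` and
its return values are hypothetical, and the `"none"` outcome (neither condition
holds) is an assumption, since this document does not describe that case.

```python
def choose_gc_action(full_gc_due, needsgc_present):
    """Model which kind of garbage collection would be performed."""
    if full_gc_due:
        # A full gc also performs every task of a mini gc.
        return "full"
    if needsgc_present:
        # Full gc not yet due, but a mini gc has been requested.
        return "mini"
    # Assumption: with nothing due and nothing requested, nothing is done.
    return "none"
```

In this model, escalation corresponds to making `full_gc_due` true for the next
run, which is the effect the text attributes to unsetting `gitweb.lastgc`.
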