/* zran.c -- example of zlib/gzip stream indexing and random access
 * Copyright (C) 2005 Mark Adler
 * For conditions of distribution and use, see copyright notice in zlib.h
   Version 1.0  29 May 2005  Mark Adler */

/* Illustrate the use of Z_BLOCK, inflatePrime(), and inflateSetDictionary()
   for random access of a compressed file. A file containing a zlib or gzip
   stream is provided on the command line. The compressed stream is decoded in
   its entirety, and an index is built with access points about every SPAN
   bytes in the uncompressed output. The compressed file is left open, and can
   then be read randomly, having to decompress on average SPAN/2 uncompressed
   bytes before getting to the desired block of data.

   An access point can be created at the start of any deflate block, by saving
   the starting file offset and bit of that block, and the 32K bytes of
   uncompressed data that precede that block. The uncompressed offset of that
   block is also saved to provide a reference for locating a desired starting
   point in the uncompressed stream. build_index() works by decompressing the
   input zlib or gzip stream a block at a time, and at the end of each block
   deciding if enough uncompressed data has gone by to justify the creation of
   a new access point. If so, that point is saved in a data structure that
   grows as needed to accommodate the points.
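
   As a rough illustration of such a growable structure, the sketch below
   stores one entry per access point and doubles its allocation when full.
   Every name in it (sketch_point, sketch_index, sketch_add, WINSIZE) is
   hypothetical and chosen for this comment only; it is not necessarily how
   the code in this file lays out its index.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WINSIZE 32768U      // 32K of history kept for each access point

// One access point (hypothetical layout, assumed for illustration).
struct sketch_point {
    long long out;          // offset of the block in the uncompressed data
    long long in;           // offset in the compressed file where the block starts
    int bits;               // starting bit of the block (0 if byte-aligned)
    unsigned char window[WINSIZE];  // uncompressed data preceding the block
};

// Growable list of access points.
struct sketch_index {
    size_t have;            // number of entries filled in
    size_t size;            // number of entries allocated
    struct sketch_point *list;
};

// Append a point, doubling the allocation when the list is full.
// Returns 0 on success, or -1 if memory runs out (the index is unchanged).
static int sketch_add(struct sketch_index *idx, int bits, long long in,
                      long long out, const unsigned char *window)
{
    assert(bits >= 0 && bits <= 7);
    if (idx->have == idx->size) {
        size_t size = idx->size ? idx->size * 2 : 8;
        struct sketch_point *list = realloc(idx->list, size * sizeof(*list));
        if (list == NULL)
            return -1;
        idx->list = list;
        idx->size = size;
    }
    struct sketch_point *next = &idx->list[idx->have++];
    next->bits = bits;
    next->in = in;
    next->out = out;
    memcpy(next->window, window, WINSIZE);
    return 0;
}
```

   Starting from an index initialized to {0, 0, NULL}, a caller would invoke
   sketch_add() at the end of a deflate block once enough uncompressed data
   has gone by, and free(idx.list) when done.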

   To use the index, an offset in the uncompressed data is provided, for which
   the latest access point at or preceding that offset is located in the index.
   The input file is positioned to the location recorded for that access
   point, and if necessary the first few bits of the compressed data are read
   from the file. inflate is initialized with those bits and the 32K of
   uncompressed data, and the decompression then proceeds until the desired
   offset in the file is reached. Then the decompression continues to read the
   desired uncompressed data from the file.
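
   Locating the latest access point at or preceding an offset can be done
   with a binary search, assuming the access points are kept in increasing
   order of uncompressed offset. The sketch below is one possible way to do
   it; sketch_find and the starts[] array are hypothetical names, and the
   array stands in for the uncompressed offsets stored in the index.

```c
#include <assert.h>
#include <stddef.h>

// Return the position of the last entry in starts[0..n-1] that is less than
// or equal to offset. starts[] must be in increasing order, and starts[0]
// must not exceed offset (assumed here: the first access point sits at the
// very start of the uncompressed data).
static size_t sketch_find(const long long *starts, size_t n, long long offset)
{
    assert(n > 0 && starts[0] <= offset);
    size_t lo = 0, hi = n;      // invariant: starts[lo] <= offset, answer < hi
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (starts[mid] <= offset)
            lo = mid;           // mid is still at or before the offset
        else
            hi = mid;           // mid is past the offset
    }
    return lo;
}
```

   With i = sketch_find(starts, n, offset), decompression would resume at
   access point i and discard offset - starts[i] uncompressed bytes before
   reaching the requested data.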

   Another approach would be to generate the index on demand. In that case,
   requests for random access reads from the compressed data would try to use
   the index, but if a read far enough past the end of the index is required,
   then further index entries would be generated and added.

   There is a fair bit of overhead to starting inflation for the random
   access, mainly copying the 32K byte dictionary. So if small pieces of the
   file are being accessed, it would make sense to implement a cache to hold
   some lookahead and avoid many calls to extract() for small lengths.
