CLucene README
==============

------------------------------------------------------
CLucene is a C++ port of Lucene.
It is a high-performance, full-featured text search
engine written in C++. In the rough benchmarks below,
CLucene indexes and searches faster than Java Lucene.
------------------------------------------------------

CLucene has contributions from many people; see AUTHORS.

CLucene is distributed under the GNU Lesser General Public License (LGPL)
	*or*
the Apache License, Version 2.0.
See the LGPL.license and APACHE.license files for the respective license information.
Read COPYING for more about the license.

Installation
------------
* For Linux, Mac OS X, Cygwin and MinGW build information, read INSTALL.
* Boost.Jam files are provided in the root directory and subdirectories.
* Microsoft Visual Studio (6 and 7) project files are provided in the win32 folder.

Mailing List
------------
Questions and discussion should be directed to the CLucene mailing list,
clucene-developers. Find subscription instructions at
http://lists.sourceforge.net/lists/listinfo/clucene-developers

Suggestions and bug reports can be submitted to our bug tracking database
(http://sourceforge.net/tracker/?group_id=80013&atid=558446)

The latest version
------------------
Details of the latest version can be found on the CLucene SourceForge project
web site: http://www.sourceforge.net/projects/clucene

Documentation
-------------
Documentation is provided at http://clucene.sourceforge.net/doc/doxygen/html/
You can also build your own documentation by running doxygen from the root
directory of the CLucene source tree.
CLucene is a very close port of Java Lucene, so you can also consult the
Javadocs at http://lucene.apache.org/java/

Performance
-----------
Very little benchmarking has been done on CLucene. Andi Vajda posted some
limited statistics on the CLucene list a while ago, with the following results.

There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
6108kb of HTML text.
org.apache.lucene.demo.IndexFiles with java and gcj:
on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
  . running with java 1.4.1_01-99 : 20379 ms
  . running with gcj 3.3.2 -O2    : 17842 ms
  . running clucene 0.8.9's demo  :  9930 ms

I recently did some more tests and came up with these rough results:
663mb (797 files) of Gutenberg texts
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields
  Java Lucene: 646453 ms. Peak mem usage ~72mb, avg ~14mb RAM
  CLucene:     232141 ms. Peak mem usage ~60mb, avg ~4mb RAM

Searching the index using 10,000 single-word queries
  Java Lucene: ~60078 ms and used ~13mb RAM
  CLucene:     ~48359 ms and used ~4.2mb RAM

Platform notes
--------------

'Too many open files'
Some platforms don't provide enough file handles to run CLucene properly.
To solve this, increase the open file limit:

On Solaris:
  ulimit -n 1024           (raises the limit for the current shell)
  set rlim_fd_cur=1024     (system-wide default, set in /etc/system)

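A program that embeds CLucene can also try to raise the limit from inside
its own process. The following is a minimal sketch, assuming a POSIX system
that provides getrlimit/setrlimit. The helper name raise_open_file_limit is
hypothetical and is not part of the CLucene API; the value 1024 simply
mirrors the Solaris example above.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_NOFILE (POSIX)
#include <cstdio>          // std::perror

// Hypothetical helper (not part of CLucene): tries to raise the soft limit
// on open file descriptors for the current process to at least `wanted`.
// Returns the soft limit in effect afterwards, or 0 if it cannot be read.
rlim_t raise_open_file_limit(rlim_t wanted) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        std::perror("getrlimit");
        return 0;
    }

    // Nothing to do if the current soft limit is already high enough.
    if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= wanted) {
        return lim.rlim_cur;
    }

    // An unprivileged process may not raise the soft limit above the hard
    // limit, so clamp the request to the hard limit.
    rlim_t target = wanted;
    if (lim.rlim_max != RLIM_INFINITY && target > lim.rlim_max) {
        target = lim.rlim_max;
    }

    const rlim_t previous = lim.rlim_cur;
    lim.rlim_cur = target;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0) {
        std::perror("setrlimit");
        return previous;  // the limit is unchanged
    }
    return target;
}
```

A caller might invoke raise_open_file_limit(1024) once at startup, before
opening any index, and warn the user if the returned value is lower than
requested.
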
Acknowledgments
---------------

The Apache Lucene project is the basis for this software, so the biggest
acknowledgment goes to that project.

We wish to acknowledge the following copyrighted works that
make up portions of the CLucene software:

CLucene relies heavily on the use of autoconf and libtool to provide
a build environment.