3 Constraints
The following list shows all the constraints imposed by mifluz.
It can also be seen as a list of functions provided by mifluz
that is more general than the API specification.
- `Now Available'
- In-place dynamic update of the index.
- Uses an in-memory cache to perform heavy index updates without stressing
the disk too much.
- The library can be linked into a C or C++ application, dynamically or
statically.
- The memory usage is completely controlled. The application can specify
the maximum total memory usage. The application can specify that the
memory cache will be shared among processes.
- The library is thread safe.
- `Future'
- Transaction logs for backup recovery.
- Index integrity check and repair function.
- Indexing up to 500 million documents and supporting up to 18 million
document updates per 24 hours, assuming that the average document is
4 kilobytes in size and contains 200 indexable words.
- `Constraints and Limitations'
- No atomic datum is bigger than a size known in advance.
This postulate is essential for disk storage optimization.
If an atomic datum may be as large as 10 MB, it is impossible to guarantee
that a query or indexing process controls the memory it is using.
An atomic datum is something that must be manipulated as a whole, with
no possibility of splitting it into smaller parts. For instance, a posting
(word, document identifier and position) is an atomic datum:
to manipulate it in memory it has to reside completely in memory.
By contrast, a postings list is not atomic: it can be manipulated
without loading the whole list in memory.
- The cost of an update is O(log_m(N)), where m is the average number of
entries in a page and N is the total number of pages. This figure has to
be taken into account whether the pages are in memory or on disk.
- The inverted index data is sorted to fit the most typical search
pattern. The structure of the inverted index key can be defined at
run time to fit a usage pattern.
- No lock mechanism is provided beyond an individual word occurrence. It is
assumed that the library is linked in a central server that serializes
all the requests or in a program that provides its own lock mechanism.
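
To give a feel for the figures quoted in the lists above, here is a minimal
C++ sketch. It is not part of the mifluz API: the function names are
hypothetical, and it assumes a paged tree in which every page holds about m
entries, so that the number of pages touched by an update is roughly
ceil(log_m(N)). It also works out the total number of postings and the
sustained update rate implied by the `Future' target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not a mifluz function.
// Returns the smallest h such that m^h >= n, i.e. ceil(log_m(n)), using
// integer arithmetic only.
//   m: average number of entries in a page
//   n: total number of pages
int page_levels(std::uint64_t m, std::uint64_t n) {
    assert(m >= 2 && "a page must hold at least two entries");
    int h = 0;
    std::uint64_t reach = 1;        // pages reachable with h levels
    while (reach < n) {
        ++h;
        if (reach > n / m) break;   // reach * m already exceeds n
        reach *= m;
    }
    return h;
}

// Figures taken from the `Future' item of the list.
constexpr std::uint64_t kDocuments     = 500000000ULL;  // 500 million
constexpr std::uint64_t kWordsPerDoc   = 200;
constexpr std::uint64_t kUpdatesPerDay = 18000000ULL;   // 18 million
constexpr std::uint64_t kSecondsPerDay = 24 * 60 * 60;

// One posting per indexable word occurrence.
std::uint64_t total_postings() {
    return kDocuments * kWordsPerDoc;
}

// Sustained document updates per second, rounded down.
std::uint64_t updates_per_second() {
    return kUpdatesPerDay / kSecondsPerDay;
}
```

With an assumed m of 100, page_levels(100, 1000000) is 3: an update in an
index of one million pages touches about three pages. The sizing figures
work out to 100 billion postings and a little over 200 document updates
per second, each of which touches about 200 postings.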