

10.9.3 WordList DESCRIPTION

WordList is the mifluz equivalent of a database handler. Each WordList object is bound to an inverted index file and implements the operations to create it, fill it with word occurrences and search for an entry matching a given criterion.

WordList is an abstract class and cannot be instantiated. The List method of the class WordContext will create an instance using the appropriate derived class, either WordListOne or WordListMulti. Refer to the corresponding manual pages for more information on their specific semantics.

When doing bulk insertions, mifluz creates temporary files that contain the entries to be inserted in the index. These files typically have names such as indexC00000000. The maximum size of a temporary file is wordlist_cache_size / 2. When a temporary file reaches this size, mifluz creates another temporary file named indexC00000001. The process continues until mifluz has created 50 temporary files. At this point it merges all the temporary files into one that replaces the first, indexC00000000. It then resumes creating temporary files and repeats this cycle until the bulk insertion is finished. At that point, mifluz has one big file named indexC00000000 that contains all the entries to be inserted in the index. mifluz inserts all the entries from indexC00000000 into the index and deletes the temporary file when done. The insertion is fast because all the entries in indexC00000000 are already sorted.
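The temporary-file cycle described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not mifluz code: in-memory lists stand in for the indexC* files, the size limit is counted in entries rather than bytes, and the function name build_sorted_batch and its parameters are hypothetical.

```python
import heapq


def build_sorted_batch(entries, cache_size, max_runs=50):
    """Simulate the bulk-insertion scheme with lists standing in for files.

    Assumptions (not taken from mifluz itself): sizes are counted in
    entries, and each "temporary file" is a sorted Python list.
    Returns the final sorted batch and the number of intermediate merges.
    """
    run_limit = max(1, cache_size // 2)  # analogue of wordlist_cache_size / 2
    runs = []      # runs[0] plays the role of indexC00000000
    buffer = []    # entries waiting to be written to the next temporary file
    merges = 0

    for entry in entries:
        buffer.append(entry)
        if len(buffer) >= run_limit:
            # The current temporary file is full: store it in sorted order.
            runs.append(sorted(buffer))
            buffer = []
            if len(runs) >= max_runs:
                # 50 files exist: merge them into one that replaces the first.
                runs = [list(heapq.merge(*runs))]
                merges += 1

    if buffer:
        runs.append(sorted(buffer))

    # End of the bulk insertion: one big sorted batch, ready to be
    # inserted into the index sequentially.
    return list(heapq.merge(*runs)), merges


batch, merges = build_sorted_batch(range(1000, 0, -1), cache_size=4)
print(batch[:5], merges)
```

Because every temporary file is already sorted, each merge is a single sequential pass over its inputs, which is why the final insertion into the index can proceed in key order.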

The parameter wordlist_cache_max can be used to prevent the temporary files from growing indefinitely. If the total cumulative size of the indexC* files grows beyond this parameter, they are merged into the main index and deleted. For instance, setting this parameter to 500Mb guarantees that the total size of the indexC* files will not grow above 500Mb.
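The size check behind wordlist_cache_max can be sketched as follows. This is a hypothetical illustration rather than the mifluz implementation: the function name must_flush_to_index is invented, sizes are assumed to be in bytes, and 500Mb is assumed to mean 500 * 1024 * 1024 bytes.

```python
def must_flush_to_index(temp_file_sizes, cache_max):
    """Return True when the indexC* files should be merged into the main index.

    temp_file_sizes: sizes of the current temporary files (assumed bytes).
    cache_max: analogue of wordlist_cache_max (assumed bytes).
    """
    return sum(temp_file_sizes) > cache_max


CACHE_MAX = 500 * 1024 * 1024  # assumed byte value for "500Mb"

# Three temporary files of 200 MiB each exceed the limit together.
print(must_flush_to_index([200 * 1024 * 1024] * 3, CACHE_MAX))
```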