jbroll/fts
fts is a command line interface exposing the full text search capabilities of sqlite3. It indexes a directory tree of documentation and can extract text by executing external filter programs. Open source filters for extracting text from .doc, .docx, .xls, .xlsx, .ppt and .pdf are included in the distribution.

fts check - check that all the documents in the index still exist. Remove any that do not exist.

fts index [<files>] - create or update the search index.

If no additional arguments are given, index the set of directories indicated in the configuration file by the index-path directives. Otherwise index the files (and directories) given on the command line. The files given must lie within the paths covered by index-path directives in the configuration file.

The full text index includes three columns of text: a title, a description and the body of text extracted from the file itself. By default the title is the name of the file with "+" and "_" replaced with space, and the description is empty. Alternate values for these columns can be provided by calling a group proc declared in the index-path directive of the config file.

fts excludes - display the exclude patterns from <conf>.

fts filters - display the filter patterns from <conf>.

fts list - display a table of documents in the index.

fts search [-t tmpl] <query> - search the index for query.

The search command produces a table of search results... The -t option selects an optional template. Two templates are included in the source, text and html. The default is text.

fts rm docid <docids ..> - remove documents by docid.

fts rm file <files ..> - remove documents by file path.

Finding the config file:

The full path to the configuration file may be specified on the command line as the first argument, prefixed with the "@" symbol. If it is not specified, the name of the executable is used as the name of the config file, with ".conf" appended to it.
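The config file lookup described above can be sketched in a few lines of Tcl. This is only an illustration of the stated rule, not the code fts itself uses; the proc name find_config and the example paths are hypothetical.

```tcl
# Hypothetical helper illustrating the lookup rule; not taken from the fts source.
#   exe  - name of the executable (for example $argv0)
#   argv - list of command line arguments
proc find_config {exe argv} {
    set first [lindex $argv 0]
    if {[string index $first 0] eq "@"} {
        # First argument is "@<path>": strip the "@" and use the rest.
        return [string range $first 1 end]
    }
    # No "@" argument: suffix ".conf" to the executable name.
    return $exe.conf
}

puts [find_config /usr/local/bin/fts {@/etc/docs.conf search tcl}]
puts [find_config /usr/local/bin/fts {search tcl}]
```

Under this reading, the first call yields /etc/docs.conf and the second yields /usr/local/bin/fts.conf.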
Config file commands:

set tmp <temporary-directory>
set wTitle <weight of title match>          # Weight is a positive real number
set wDescip <weight of description match>
set wBody <weight of body match>

database <sqlite3-database-file>

stopwords <stop-words-file>

filter <pattern> <extraction-command>

Any indexing file candidate that matches the glob style pattern will have text extracted from it by executing the extraction command. The "%f" and "@F" tokens in the extraction command string are replaced with the name of the file matching the pattern. The extraction command is executed and its standard output is used as the text to index.

If the replacement token "@F" is found in the extraction command string and the pattern is of the form "*.xxx", the rule is chained: the matched extension is removed from the file name and the result is matched against the list of extraction filters again.

File extension patterns of the form "*.xxx" are not case sensitive.

exclude <glob-pattern> ...

index-path <tag> <path> [url] [regexp]

Index all files in the path, recursing into subdirectories. A url entry for the database is generated by calling:

set url [regsub $regexp $filepath $url]

The default values for url and regexp are {\1} and {%p(.*)}, where %p is substituted with the indexed path. This generates the file tail as the default url entry in the search results.

The tag can be used to associate different directory trees of documents with a proc that provides title and description text to index. If a proc with the same name as the tag is found, it is called with two args to retrieve this text:

$tag title <file>
$tag descrip <file>

The result of each call is indexed in the associated column. The value of the title column is available for use in the search results template.

template name { header rows footer }

Declare a template whose name may be used with the -t option to search.

The template is a list of three strings that are expanded with subst to produce the results of the search. The first string is expanded before the search and represents the header of the result. When the header string is expanded with subst, the value $query is available.

The second string is expanded once for each row. The values $rowid, $tag, $mtime, $fsize, $url, $file, and $title associated with the search result document are available when the string is expanded.

The third string is expanded after the search results have been generated and represents the footer of the search results.

If the result of any individual template expansion is an empty string, the result is ignored. If the search results need to be returned in an order different from the search ranking, the parts of a template can be used as callbacks: results are accumulated in calls to the row template, then transformed and returned in the footer.
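As a rough illustration of how these directives fit together, a config file might look like the sketch below. Every path, weight, URL, and extraction command here is a placeholder chosen for the example, and the tag name manuals and the template name plain are made up; the filters actually shipped with the distribution may differ. How the output of a chained "@F" rule is handed to the next filter is not described above, so the comment on that line is only one reading of it.

```tcl
# Sample config sketch. All values are assumptions for illustration only.
set tmp     /tmp
set wTitle  10.0
set wDescip 5.0
set wBody   1.0

database  /var/lib/fts/docs.db
stopwords /var/lib/fts/stopwords.txt

# Extraction filters: the command's standard output becomes the indexed text.
filter *.txt {cat %f}
filter *.pdf {pdftotext %f -}
# "@F" with a "*.xxx" pattern chains the rule: for notes.txt.gz the ".gz"
# is removed and "notes.txt" is matched against the filters again.
filter *.gz  {gzip -dc @F}

exclude *~ *.bak

# Tag proc: fts calls "manuals title <file>" and "manuals descrip <file>".
proc manuals {what file} {
    switch -- $what {
        title   { return [string map {+ " " _ " "} [file rootname [file tail $file]]] }
        descrip { return "Manual: [file tail $file]" }
    }
}

# url and regexp are optional; with the default regexp {%p(.*)} the part of
# the file path after /srv/docs/manuals is substituted for \1.
index-path manuals /srv/docs/manuals {http://docs.example.com/manuals\1}

# Three strings: header, row, footer. An empty expansion is ignored.
template plain {
    {Results for "$query":}
    {$rowid  $title  $url}
    {}
}
```

With such a file in place, a search using this template would be invoked as fts search -t plain <query>.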
About
Full text search w/sqlite fts-v4