aports/testing/manticoresearch/0001-Documentation-Create-go-md2man-pages-for-indexer.1-i.patch
2022-03-22 16:45:54 +01:00

345 lines
27 KiB
Diff
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

From 5537fdb9cc5560ce49a9e5ec26c691775b7488ba Mon Sep 17 00:00:00 2001
From: Zach DeCook <zachdecook@librem.one>
Date: Tue, 4 Jan 2022 13:48:15 -0500
Subject: [PATCH] Documentation: Create go-md2man pages for indexer.1 indexer.5
index_converter.1 indextool.1 spelldump.1 wordbreaker.1 searchd.1
---
.../Plain_indexes_creation.md | 23 +++++--
manual/Installation/Migration_from_Sphinx.md | 24 ++++---
manual/Miscellaneous_tools.md | 22 +++---
manual/Starting_the_server/Manually.md | 67 +++++++------------
4 files changed, 70 insertions(+), 66 deletions(-)
diff --git a/manual/Adding_data_from_external_storages/Plain_indexes_creation.md b/manual/Adding_data_from_external_storages/Plain_indexes_creation.md
index 84142f40..d5e5dbe9 100755
--- a/manual/Adding_data_from_external_storages/Plain_indexes_creation.md
+++ b/manual/Adding_data_from_external_storages/Plain_indexes_creation.md
@@ -1,20 +1,26 @@
-# Plain indexes creation
+# indexer 1 "October 2021" manticoresearch-tools
+
+## NAME
+indexer - auxiliary tool to create plain indexes for Manticore Search
+
+## BACKGROUND INFORMATION
Plain indexes are indexes that are created one-time by fetching data at creation from one or several sources. A plain index is immutable as documents cannot be added or deleted during it's lifespan. It is only possible to update values of numeric attributes (including MVA). Refreshing the data is only possible by recreating the whole index.
Plain indexes are available only in [Plain mode](../Creating_an_index/Local_indexes.md#Defining-index-schema-in-config-%28Plain mode%29) and their definition is made of an index declaration and one or several source declarations. The data gathering and index creation is not made by the `searchd` server, but by the auxiliary tool `indexer`.
-**Indexer** is a command line tool that can be called directly from the command line or from shell scripts.
+## DESCRIPTION
+indexer is a command line tool that can be called directly from the command line or from shell scripts.
It can accept a number of arguments when called, but there are also several settings of it's own in the Manticore configuration file.
In the typical scenario, indexer does the following:
+
* fetches the data from the source
* builds the plain index
* writes the index files
* (optional) informs search server about the new index which triggers index rotation
-## Indexer tool
General syntax for `indexer` is as follows:
```shell
@@ -36,6 +42,7 @@ sudo -u manticore indexer mysmallindex mybigindex
```
Wildcarding on index names is also supported. The following wildcard tokens can be used:
+
* `?` matches any single character
* `*` matches any count of any characters
* `%` matches none or any single character
@@ -44,12 +51,12 @@ Wildcarding on index names is also supported. The following wildcard tokens can
sudo -u manticore indexer indexpart*main --rotate
```
-The exit codes are as follows:
+## EXIT STATUS
* 0 - everything went ok
* 1 - there was a problem while indexing (and if -rotate was specified, it was skipped) or an operation emitted a warning
* 2 - indexing went ok, but -rotate attempt failed
-### Indexer command line arguments
+## OPTIONS
* `--config <file>` (`-c <file>` for short) tells `indexer` to use the given file as its configuration. Normally, it will look for `manticore.conf` in the installation directory (e.g. `/etc/manticoresearch/manticore.conf`), followed by the current directory you are in when calling `indexer` from the shell. This is most of use in shared environments where the binary files are installed in a global folder, e.g. `/usr/bin/`, but you want to provide users with the ability to make their own custom Manticore set-ups, or if you want to run multiple instances on a single server. In cases like those you could allow them to create their own `manticore.conf` files and pass them to `indexer` with this option. For example:
```shell
@@ -87,6 +94,7 @@ sudo -u manticore indexer myindex --buildstops word_freq.txt 1000
```
This would produce a document in the current directory, `word_freq.txt` with the 1,000 most common words in 'myindex', ordered by most common first. Note that the file will pertain to the last index indexed when specified with multiple indexes or `--all` (i.e. the last one listed in the configuration file)
+
* `--buildfreqs` works with `--buildstops` (and is ignored if `--buildstops` is not specified). As `--buildstops` provides the list of words used within the index, `--buildfreqs` adds the quantity present in the index, which would be useful in establishing whether certain words should be considered stopwords if they are too prevalent. It will also help with developing "Did you mean…" features where you need to know how much more common a given word compared to another, similar one. Example:
```shell
@@ -94,6 +102,7 @@ sudo -u manticore indexer myindex --buildstops word_freq.txt 1000 --buildfreqs
```
This would produce the `word_freq.txt` as above, however after each word would be the number of times it occurred in the index in question.
+
* `--merge <dst-index> <src-index>` is used for physically merging indexes together, for example if you have a [main+delta scheme](../Creating_an_index/Local_indexes/Plain_index.md#Main+delta), where the main index rarely changes, but the delta index is rebuilt frequently, and `--merge` would be used to combine the two. The operation moves from right to left - the contents of `src-index` get examined and physically combined with the contents of `dst-index` and the result is left in `dst-index`. In pseudo-code, it might be expressed as: `dst-index += src-index` An example:
```shell
@@ -101,6 +110,7 @@ sudo -u manticore indexer --merge main delta --rotate
```
In the above example, where the main is the master, rarely modified index, and the delta is more frequently modified one, you might use the above to call `indexer` to combine the contents of the delta into the main index and rotate the indexes.
+
* `--merge-dst-range <attr> <min> <max>` runs the filter range given upon merging. Specifically, as the merge is applied to the destination index (as part of `--merge`, and is ignored if `--merge` is not specified), `indexer` will also filter the documents ending up in the destination index, and only documents will pass through the filter given will end up in the final index. This could be used for example, in an index where there is a 'deleted' attribute, where 0 means 'not deleted'. Such an index could be merged with:
```shell
@@ -108,6 +118,7 @@ sudo -u manticore indexer --merge main delta --merge-dst-range deleted 0 0
```
Any documents marked as deleted (value 1) would be removed from the newly-merged destination index. It can be added several times to the command line, to add successive filters to the merge, all of which must be met in order for a document to become part of the final index.
+
* `--merge-killlists` (and its shorter alias `--merge-klists`) changes the way kill lists are processed when merging indexes. By default, both kill lists get discarded after a merge. That supports the most typical main+delta merge scenario. With this option enabled, however, kill lists from both indexes get concatenated and stored into the destination index. Note that a source (delta) index kill list will be used to suppress rows from a destination (main) index at all times.
* `--keep-attrs` allows to reuse existing attributes on reindexing. Whenever the index is rebuilt, each new document id is checked for presence in the "old" index, and if it already exists, its attributes are transferred to the "new" index; if not found, attributes from the new index are used. If the user has updated attributes in the index, but not in the actual source used for the index, all updates will be lost when reindexing; using keep-attrs enables saving the updated attribute values from the previous index. It is possible to specify a path for index files to used instead of reference path from config:
@@ -135,6 +146,8 @@ sudo -u manticore indextool --rotate --check myindex
* `--help` (`-h` for short) lists all of the parameters that can be called in `indexer`.
* `-v` shows `indexer` version.
+# indexer 5 "October 2021" manticoresearch-tools
+
### Indexer configuration settings
You can also configure indexer behaviour in Manticore configuration file in section `indexer`:
diff --git a/manual/Installation/Migration_from_Sphinx.md b/manual/Installation/Migration_from_Sphinx.md
index 52f30a6d..4507ee52 100755
--- a/manual/Installation/Migration_from_Sphinx.md
+++ b/manual/Installation/Migration_from_Sphinx.md
@@ -1,3 +1,4 @@
+# null 4 "This section of documentation not converted to man page"
# Migration from Sphinx Search
## Sphinx 2.x -> Manticore 2.x
@@ -72,9 +73,12 @@ Manticore 3.x recognizes and parses special suffixes which makes easier to use n
* Size suffixes - can be used in parameters that define size of something (memory buffer, disk file, limit of RAM, etc. ) in bytes. "Naked" numbers in that places mean literally size in bytes (octets). Size values take suffix `k` for kilobytes (1k=1024), `m` for megabytes (1m=1024k), `g` for gigabytes (1g=1024m) and `t` for terabytes (1t=1024g).
* Time suffixes - can be used in parameters defining some time interval values like delays, timeouts, etc. "Naked" values for those parameters usually have documented scale, and you must know if their numbers, say, 100, means '100 seconds' or '100 milliseconds'. However instead of guessing you just can write suffixed value and it will be fully determined by it's suffix. Time values take suffix `us` for useconds (microseconds), `ms` for milliseconds, `s` for seconds, `m` for minutes, `h` for hours, `d` for days and `w` for weeks.
-## index_converter
+# index_converter 1 "May 2021" manticoresearch-converter
-`index_converter` is a tool for converting indexes created with Sphinx/Manticore Search 2.x to Manticore Search 3.x index format. The tool can be used in several different ways:
+## NAME
+index_converter - tool for converting indexes created with Sphinx/Manticore Search 2.x to Manticore Search 3.x index format.
+
+## DESCRIPTION
#### Convert one index at a time
@@ -104,20 +108,20 @@ $ index_converter --config /home/myuser/manticore.conf --all --output-dir /new/p
#### Convert kill lists
-A special case is for indexes containing kill-lists. As the behaviour of how kill-lists works has changed (see [killlist_target](../Creating_an_index/Local_indexes/Plain_and_real-time_index_settings.md#killlist_target)), the delta index should know which are the target indexes for applying the kill-lists. There are 3 ways to have a converted index ready for setting targeted indexes for applying kill-lists:
+A special case is for indexes containing kill-lists. As the behaviour of how kill-lists works has changed, the delta index should know which are the target indexes for applying the kill-lists. There are 3 ways to have a converted index ready for setting targeted indexes for applying kill-lists:
* Use `-killlist-target` when converting an index
- ```ini
- $ index_converter --config /home/myuser/manticore.conf --index deltaindex --killlist-target mainindex:kl
- ```
+
+```ini
+$ index_converter --config /home/myuser/manticore.conf --index deltaindex --killlist-target mainindex:kl
+```
+
* Add killlist_target in the configuration before doing the conversion
* use [ALTER ... KILLIST_TARGET](../Adding_data_from_external_storages/Adding_data_from_indexes/Killlist_in_plain_indexes.md#killlist_target) command after conversion
-#### Complete list of index_converter options
-
-Here's the complete list of `index_converter` options:
+## OPTIONS
-* `--config <file>` (`-c <file>` for short) tells index_converter to use the given file as its configuration. Normally, it will look for manticore.conf in the installation directory (e.g. `/usr/local/manticore/etc/manticore.conf` if installed into `/usr/local/sphinx`), followed by the current directory you are in when calling index_converter from the shell.
+* `--config <file>` (`-c <file>` for short) tells index_converter to use the given file as its configuration. Normally, it will look for manticore.conf in the installation directory (e.g. `/etc/manticoresearch/manticore.conf`), followed by the current directory you are in when calling index_converter from the shell.
* `--index` specifies which index should be converted
* `--path` - instead of using a config file, a path containing index(es) can be used
* `--strip-path` - strips path from filenames referenced by index: stopwords, exceptions and wordforms
diff --git a/manual/Miscellaneous_tools.md b/manual/Miscellaneous_tools.md
index 99fcd442..1ee9cb58 100755
--- a/manual/Miscellaneous_tools.md
+++ b/manual/Miscellaneous_tools.md
@@ -1,14 +1,15 @@
-# Miscellaneous tools
+# indextool 1 "February 2021" manticoresearch-tools
-## indextool
+## NAME
+indextool - helper tool used to dump miscellaneous information about a physical index.
-`indextool` is a helper tool used to dump miscellaneous information about a physical index. The general usage is:
+## SYNOPSIS
```sql
indextool <command> [options]
```
-Options effective for all commands:
+## OPTIONS
* `--config <file>` (`-c <file>` for short) overrides the built-in config file names.
* `--quiet` (`-q` for short) keep indextool quiet - it will not output banner, etc.
@@ -37,16 +38,19 @@ The commands are as follows:
* `--rotate` works only with `--check` and defines whether to check index waiting for rotation, i.e. with .new extension. This is useful when you want to check your index before actually using it.
* `--apply-killlists` loads and applies kill-lists for all indexes listed in the config file. Changes are saved in .SPM files. Kill-list files (.SPK) are deleted. This can be useful if you want to move applying indexes from server startup to indexing stage.
-## spelldump
+# spelldump 1 "February 2021" manticoresearch-tools
-`spelldump` is used to extract contents of a dictionary file that uses `ispell` or `MySpell` format, which can help build word lists for *wordforms* - all of the possible forms are pre-built for you.
+## NAME
+spelldump - extract contents of a dictionary file that uses `ispell` or `MySpell` format, which can help build word lists for *wordforms* - all of the possible forms are pre-built for you.
-The general usage is:
+## SYNOPSIS
```bash
spelldump [options] <dictionary> <affix> [result] [locale-name]
```
+## DESCRIPTION
+
The two main parameters are the dictionary's main file and its affix file; usually these are named as `[language-prefix].dict` and `[language-prefix].aff` and will be available with most common Linux distributions, as well as various places online.
`[result]` specifies where the dictionary data should be output to, and `[locale-name]` additionally specifies the locale details you wish to use.
@@ -69,7 +73,9 @@ zoned > zoned
zoning > zoning
```
-## wordbreaker
+# wordbreaker 1 "February 2021" manticoresearch-tools
+
+## DESCRIPTION
`wordbreaker` is used to split compound words, as usual in URLs, into its component words. For example, this tool can split "lordoftherings" into its four component words, or `http://manofsteel.warnerbros.com` into "man of steel warner bros". This helps searching, without requiring prefixes or infixes: searching for "sphinx" wouldn't match "sphinxsearch" but if you break the compound word and index the separate components, you'll get a match without the costs of prefix and infix larger index files.
diff --git a/manual/Starting_the_server/Manually.md b/manual/Starting_the_server/Manually.md
index aa5c337a..4c1ba572 100755
--- a/manual/Starting_the_server/Manually.md
+++ b/manual/Starting_the_server/Manually.md
@@ -1,15 +1,6 @@
-# Starting Manticore manually
-
-You can also start Manticore Search by calling `searchd` (Manticore Search server binary) directly:
-
-```shell
-searchd [OPTIONS]
-```
-
-Note that without setting a path to the configuration file `searchd` will try to find it in several locations, depending on the operation system.
-
-## searchd command line options
+# searchd 1 "September 2021" manticoresearch
+## OPTIONS
The options available to `searchd` in all operation systems are:
@@ -36,32 +27,44 @@ Possible exit codes are as follows:
* 3 if server crashed during shutdown
* `--status` command is used to query running `searchd` instance status using the connection details from the (optionally) provided configuration file. It will try to connect to running instance using the first found UNIX socket or TCP port from the configuration file. On success it will query for a number of status and performance counter values and print them. You can also use [SHOW STATUS](../Profiling_and_monitoring/Node_status.md#SHOW-STATUS) command to access the very same counters via SQL protocol. Examples:
+
```bash
$ searchd --status
$ searchd --config /etc/manticoresearch/manticore.conf --status
```
+
* `--pidfile` is used to explicitly force using a PID file (where the `searchd` process identification number is stored) despite any other debugging options that say otherwise (for instance, `--console`). This is a debugging option.
+
```bash
$ searchd --console --pidfile
```
+
* `--console` is used to force `searchd` into console mode; typically Manticore runs as a conventional server application and logs information into log files (as specified in configuration file). Sometimes though, when debugging issues in the configuration or the server itself or trying to diagnose hard-to-track-down problems it may be easier to force it to dump information directly to the console/command line from which it is being called. Running in console mode also means that the process will not be forked (so searches are done in sequence) and logs will not be written to. (It should be noted that console mode is not the intended method for running `searchd`.) You can invoke so:
+
```bash
$ searchd --config /etc/manticoresearch/manticore.conf --console
```
+
* `--logdebug`, `--logreplication`, `--logdebugv`, and `--logdebugvv` options enable additional debug output in the server log. They differ by the logging verboseness level. These are debugging options, they pollute the log a lot, and thus they should *not* be normally enabled. (The normal use case for these is to enable them temporarily on request, to assist with some particularly complicated debugging session.)
* `--iostats` is used in conjunction with the logging options (the `query_log` will need to have been activated in `manticore.conf`) to provide more detailed information on a per-query basis as to the input/output operations carried out in the course of that query, with a slight performance hit and a little bit bigger logs. The IO statistics don't include information about IO operations for attributes, as these are loaded with mmap. To enable it you can start `searchd` so:
+
```bash
$ searchd --config /etc/manticoresearch/manticore.conf --iostats
```
+
* `--cpustats` is used to provide actual CPU time report (in addition to wall time) in both query log file (for every given query) and status report (aggregated). It depends on `clock_gettime()` Linux system call or falls back to less precise call on certain systems. You might start `searchd` thus:
+
```bash
$ searchd --config /etc/manticoresearch/manticore.conf --cpustats
```
+
* `--port portnumber` (`-p` for short) is used to specify the port that Manticore should listen on to accept binary protocol requests, usually for debugging purposes. This will usually default to 9312, but sometimes you need to run it on a different port. Specifying it on the command line will override anything specified in the configuration file. The valid range is 0 to 65535, but ports numbered 1024 and below usually require a privileged account in order to run.
An example of usage:
- ```bash
- $ searchd --port 9313
- ```
+
+```bash
+$ searchd --port 9313
+```
+
* `--listen ( address ":" port | port | path ) [ ":" protocol ]` (or `-l` for short) Works as `--port`, but allow you to specify not only the port, but full path, as IP address and port, or Unix-domain socket path, that `searchd` will listen on. In other words, you can specify either an IP address (or hostname) and port number or just a port number or Unix socket path. If you specify port number, but not the address searchd will listen on all network interfaces. Unix path is identified by a leading slash. As the last param you can also specify a protocol handler (listener) to be used for connections on this socket. Supported protocol values are 'sphinx' and 'mysql' (MySQL protocol used since 4.1).
* `--force-preread` forbids the server to serve any incoming connection until prereading of index files completes. By default, at startup the server accepts connections while index files are lazy loaded into memory. This opens extends the behavior and makes it wait until the files are loaded.
* `--index <index>` (or `-i <index>` for short) forces this instance of `searchd` to only serve the specified index. Like `--port`, above, this is usually for debugging purposes; more long-term changes would generally be applied to the configuration file itself.
@@ -72,41 +75,19 @@ $ searchd --config /etc/manticoresearch/manticore.conf --cpustats
* `ignore-trx-errors`, ignore any transaction errors and skip current binlog file (the default behavior is to exit with an error).
* `ignore-all-errors`, ignore any errors described above (the default behavior is to exit with an error).
Example:
- ```bash
- $ searchd --replay-flags=accept-desc-timestamp
- ```
-* `--coredump` is used to enable saving a core file or a minidump of the server on crash. Disabled by default to speed up of server restart on crash. This is useful for debugging purposes.
+
```bash
-$ searchd --config /etc/manticoresearch/manticore.conf --coredump
+$ searchd --replay-flags=accept-desc-timestamp
```
-* `--new-cluster` bootstraps a replication cluster and makes the server a reference node with [cluster restart](../Creating_a_cluster/Setting_up_replication/Restarting_a_cluster.md) protection. On Linux you can also run `manticore_new_cluster`. It will start Manticore in `--new-cluster` mode via systemd.
-* `--new-cluster-force` bootstraps a replication cluster and makes the server a reference node bypassing [cluster restart](../Creating_a_cluster/Setting_up_replication/Restarting_a_cluster.md) protection. On Linux you can also run `manticore_new_cluster --force`. It will start Manticore in `--new-cluster-force` mode via systemd.
-There are some options for `searchd` that are specific to Windows platforms, concerning handling as a service, and are only available in Windows binaries.
+* `--coredump` is used to enable saving a core file or a minidump of the server on crash. Disabled by default to speed up of server restart on crash. This is useful for debugging purposes.
-Note that in Windows searchd will default to `--console` mode, unless you install it as a service.
-* `--install` installs `searchd` as a service into the Microsoft Management Console (Control Panel / Administrative Tools / Services). Any other parameters specified on the command line, where `--install` is specified will also become part of the command line on future starts of the service. For example, as a part of calling `searchd`, you will likely also need to specify the configuration file with `--config`, and you would do that as well as specifying `--install`. Once called, the usual start/stop facilities will become available via the management console, so any methods you could use for starting, stopping and restarting services would also apply to `searchd`. Example:
-```bat
-C:\WINDOWS\system32> C:\Manticore\bin\searchd.exe --install
- --config C:\Manticore\manticore.conf
+```bash
+$ searchd --config /etc/manticoresearch/manticore.conf --coredump
```
-If you want to have the I/O stats every time you start `searchd`, you need to specify its option on the same line as the `--install` command thus:
-
-```bat
-C:\WINDOWS\system32> C:\Manticore\bin\searchd.exe --install
- --config C:\Manticore\manticore.conf --iostats
-```
-* `--delete` removes the service from the Microsoft Management Console and other places where services are registered, after previously installed with `--install`. Note, this does not uninstall the software or delete the indexes. It means the service will not be called from the services systems, and will not be started on the machine's next start. If currently running as a service, the current instance will not be terminated (until the next reboot, or until `searchd` is called with `--stop`). If the service was installed with a custom name (with `--servicename`), the same name will need to be specified with `--servicename` when calling to uninstall. Example:
-```bat
-C:\WINDOWS\system32> C:\Manticore\bin\searchd.exe --delete
-```
-* `--servicename <name>` applies the given name to `searchd` when installing or deleting the service, as would appear in the Management Console; this will default to searchd, but if being deployed on servers where multiple administrators may log into the system, or a system with multiple `searchd` instances, a more descriptive name may be applicable. Note that unless combined with `--install` or `--delete`, this option does not do anything. Example:
-```bat
-C:\WINDOWS\system32> C:\Manticore\bin\searchd.exe --install
- --config C:\Manticore\manticore.conf --servicename ManticoreSearch
-```
-* `--ntservice` is the option that is passed by the Management Console to `searchd` to invoke it as a service on Windows platforms. It would not normally be necessary to call this directly; this would normally be called by Windows when the service would be started, although if you wanted to call this as a regular service from the command-line (as the complement to `--console`) you could do so in theory.
+* `--new-cluster` bootstraps a replication cluster and makes the server a reference node with [cluster restart](../Creating_a_cluster/Setting_up_replication/Restarting_a_cluster.md) protection. On Linux you can also run `manticore_new_cluster`. It will start Manticore in `--new-cluster` mode via systemd.
+* `--new-cluster-force` bootstraps a replication cluster and makes the server a reference node bypassing [cluster restart](../Creating_a_cluster/Setting_up_replication/Restarting_a_cluster.md) protection. On Linux you can also run `manticore_new_cluster --force`. It will start Manticore in `--new-cluster-force` mode via systemd.
* `--safetrace` forces `searchd` to only use system backtrace() call in crash reports. In certain (rare) scenarios, this might be a "safer" way to get that report. This is a debugging option.
* `--nodetach` switch (Linux only) tells `searchd` not to detach into background. This will also cause log entry to be printed out to console. Query processing operates as usual. This is a debugging option and might also be useful when you run Manticore in a docker container to capture its output.
--
2.34.1