Document bolt metrics (#12107)

Josh Black 2021-07-16 11:44:30 -07:00 committed by GitHub
parent e354722f9e
commit 1e68be76e4

@@ -377,54 +377,72 @@ These metrics relate to the supported [storage backends][storage-backends].
These metrics relate to Raft-based [integrated storage][integrated-storage].
| Metric | Description | Unit | Type |
| :--------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-------------------------------- | :------ |
| `vault.raft.apply` | Number of Raft transactions occurring over the interval, which is a general indicator of the write load on the Raft servers. | raft transactions / interval | counter |
| `vault.raft.barrier`                                                   | Number of times the node has started the barrier, i.e., the number of times it has issued a blocking call to ensure that all pending queued operations have been applied to the node's FSM.                       | blocks / interval                 | counter |
| `vault.raft.candidate.electSelf`                                       | Time to request a vote from a peer.                                                                                                                                                                               | ms                                | summary |
| `vault.raft.commitNumLogs` | Number of logs processed for application to the FSM in a single batch. | logs | gauge |
| `vault.raft.commitTime` | Time to commit a new entry to the Raft log on the leader. | ms | timer |
| `vault.raft.compactLogs` | Time to trim the logs that are no longer needed. | ms | summary |
| `vault.raft.delete` | Time to delete file from raft's underlying storage. | ms | summary |
| `vault.raft.delete_prefix` | Time to delete files under a prefix from raft's underlying storage. | ms | summary |
| `vault.raft.fsm.apply` | Number of logs committed since the last interval. | commit logs / interval | summary |
| `vault.raft.fsm.applyBatch` | Time to apply batch of logs. | ms | summary |
| `vault.raft.fsm.applyBatchNum` | Number of logs applied in batch. | ms | summary |
| `vault.raft.fsm.enqueue` | Time to enqueue a batch of logs for the FSM to apply. | ms | timer |
| `vault.raft.fsm.restore` | Time taken by the FSM to restore its state from a snapshot. | ms | summary |
| `vault.raft.fsm.snapshot` | Time taken by the FSM to record the current state for the snapshot. | ms | summary |
| `vault.raft.fsm.store_config` | Time to store the configuration. | ms | summary |
| `vault.raft.get` | Time to retrieve file from raft's underlying storage. | ms | summary |
| `vault.raft.leader.dispatchLog` | Time for the leader to write log entries to disk. | ms | timer |
| `vault.raft.leader.dispatchNumLogs` | Number of logs committed to disk in a batch. | logs | gauge |
| `vault.raft.list` | Time to retrieve list of keys from raft's underlying storage. | ms | summary |
| `vault.raft.peers` | Number of peers in the raft cluster configuration. | peers | gauge |
| `vault.raft.put` | Time to persist key in raft's underlying storage. | ms | summary |
| `vault.raft.replication.appendEntries.log` | Number of logs replicated to a node, to bring it up to speed with the leader's logs. | logs appended / interval | counter |
| `vault.raft.replication.appendEntries.rpc`                             | Time taken by the append entries RPC, to replicate the log entries of a leader node onto its follower node(s).                                                                                                    | ms                                | timer   |
| `vault.raft.replication.heartbeat`                                     | Time taken to invoke appendEntries on a peer periodically, so that it does not time out.                                                                                                                          | ms                                | timer   |
| `vault.raft.replication.installSnapshot` | Time taken to process the installSnapshot RPC call. This metric should only be seen on nodes which are currently in the follower state. | ms | timer |
| `vault.raft.restore` | Number of times the restore operation has been performed by the node. Here, restore refers to the action of raft consuming an external snapshot to restore its state. | operation invoked / interval | counter |
| `vault.raft.restoreUserSnapshot` | Time taken by the node to restore the FSM state from a user's snapshot. | ms | timer |
| `vault.raft.rpc.appendEntries` | Time taken to process an append entries RPC call from a node. | ms | timer |
| `vault.raft.rpc.appendEntries.processLogs` | Time taken to process the outstanding log entries of a node. | ms | timer |
| `vault.raft.rpc.appendEntries.storeLogs` | Time taken to add any outstanding logs for a node, since the last appendEntries was invoked. | ms | timer |
| `vault.raft.rpc.installSnapshot` | Time taken to process the installSnapshot RPC call. This metric should only be seen on nodes which are currently in the follower state. | ms | timer |
| `vault.raft.rpc.processHeartbeat` | Time taken to process a heartbeat request. | ms | timer |
| `vault.raft.rpc.requestVote` | Time taken to complete requestVote RPC call. | ms | summary |
| `vault.raft.snapshot.create` | Time taken to initialize the snapshot process. | ms | timer |
| `vault.raft.snapshot.persist` | Time taken to dump the current snapshot taken by the node to the disk. | ms | timer |
| `vault.raft.snapshot.takeSnapshot` | Total time involved in taking the current snapshot (creating one and persisting it) by the node. | ms | timer |
| `vault.raft.state.follower`                                            | Number of times the node has entered follower mode. This happens when a new node joins the cluster or after the end of a leader election.                                                                         | follower state entered / interval | counter |
| `vault.raft.transition.heartbeat_timeout`                              | Number of times the node has transitioned to the Candidate state after receiving no heartbeat messages from the last known leader.                                                                                | timeouts / interval               | counter |
| `vault.raft.transition.leader_lease_timeout`                           | Number of times a quorum of nodes could not be contacted.                                                                                                                                                         | contact failures                  | counter |
| `vault.raft.verify_leader`                                             | Number of times the node checks whether it is still the leader.                                                                                                                                                   | checks / interval                 | counter |
| `vault.raft-storage.delete` | Time to insert log entry to delete path. | ms | timer |
| `vault.raft-storage.get` | Time to retrieve value for path from FSM. | ms | timer |
| `vault.raft-storage.put` | Time to insert log entry to persist path. | ms | timer |
| `vault.raft-storage.list` | Time to list all entries under the prefix from the FSM. | ms | timer |
| `vault.raft-storage.transaction` | Time to insert operations into a single log. | ms | timer |
| `vault.raft-storage.entry_size` | The total size of a Raft entry during log application in bytes. | bytes | sample |
| `vault.raft_storage.bolt.freelist.`<br/>`free_pages` | Number of free pages in the freelist. | pages | gauge |
| `vault.raft_storage.bolt.freelist.`<br/>`pending_pages` | Number of pending pages in the freelist. | pages | gauge |
| `vault.raft_storage.bolt.freelist.`<br/>`allocated_bytes` | Total bytes allocated in free pages. | bytes | gauge |
| `vault.raft_storage.bolt.freelist.`<br/>`used_bytes` | Total bytes used by the freelist. | bytes | gauge |
| `vault.raft_storage.bolt.transaction.`<br/>`started_read_transactions` | Number of started read transactions. | transactions | gauge |
| `vault.raft_storage.bolt.transaction.`<br/>`currently_open_read_transactions` | Number of currently open read transactions. | transactions | gauge |
| `vault.raft_storage.bolt.page.count` | Number of page allocations. | allocations | gauge |
| `vault.raft_storage.bolt.page.`<br/>`bytes_allocated` | Total bytes allocated. | bytes | gauge |
| `vault.raft_storage.bolt.cursor.count` | Number of cursors created. | cursors | gauge |
| `vault.raft_storage.bolt.node.count` | Number of node allocations. | nodes | gauge |
| `vault.raft_storage.bolt.node.dereferences` | Number of node dereferences. | dereferences | gauge |
| `vault.raft_storage.bolt.rebalance.count` | Number of node rebalances. | rebalances | gauge |
| `vault.raft_storage.bolt.rebalance.time` | Time taken rebalancing. | ms | sample |
| `vault.raft_storage.bolt.split.count` | Number of nodes split. | nodes | gauge |
| `vault.raft_storage.bolt.spill.count` | Number of nodes spilled. | nodes | gauge |
| `vault.raft_storage.bolt.spill.time` | Time taken spilling. | ms | sample |
| `vault.raft_storage.bolt.write.count` | Number of writes performed. | writes | gauge |
| `vault.raft_storage.bolt.write.time` | Time taken writing to disk. | ms | sample |
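
The bolt metrics listed above can be picked out of a metrics scrape for dashboards or alerts. The following is a minimal sketch and is not part of this commit. It assumes the metrics were collected in Prometheus text format and that the sink rewrites dots in metric names to underscores, so that `vault.raft_storage.bolt.freelist.free_pages` would appear as `vault_raft_storage_bolt_freelist_free_pages`. The function name `parse_bolt_metrics`, the `BOLT_PREFIX` constant, and the sample values are hypothetical.

```python
from typing import Dict

# Assumed Prometheus-style prefix for the bolt metrics in the table above.
BOLT_PREFIX = "vault_raft_storage_bolt_"


def parse_bolt_metrics(exposition: str) -> Dict[str, float]:
    """Return {metric_name: value} for bolt metrics in Prometheus text format.

    Labels are dropped; if a metric appears with several label sets, the
    last value seen wins. Comment lines (# HELP / # TYPE) are skipped.
    """
    result: Dict[str, float] = {}
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{labels} value [timestamp]
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]
        else:
            name, _, rest = line.partition(" ")
        if not name.startswith(BOLT_PREFIX):
            continue
        fields = rest.split()
        if not fields:
            continue
        try:
            result[name] = float(fields[0])
        except ValueError:
            # Skip lines whose value is not a number.
            continue
    return result


# Made-up sample scrape, for illustration only.
SAMPLE = """\
# TYPE vault_raft_storage_bolt_freelist_free_pages gauge
vault_raft_storage_bolt_freelist_free_pages 12
vault_raft_storage_bolt_freelist_pending_pages{cluster="demo"} 3
vault_raft_apply 7
"""

if __name__ == "__main__":
    for name, value in sorted(parse_bolt_metrics(SAMPLE).items()):
        print(f"{name} = {value}")
```

Only names starting with the assumed prefix are kept, so other Raft metrics such as `vault_raft_apply` in the sample are ignored.
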
## Integrated Raft Storage Leadership Changes