Background
Observability is a key part of application development. Apart from logs, it can be achieved with other tools such as telemetry.
Problem
The main question for now is which telemetry we want to collect. A second question is which metrics we can expose through Prometheus and which of them cannot be represented by Prometheus metric types.
Solution
There are two types of telemetry: peer-local info and global info. Both are listed below, together with the Prometheus metric types that could be used for them.
Peer info
- Peer name
- Peer location
- Iroha version (counter?)
- Device operating system (kernel version, etc)
- Peer uptime (counter?)
- Peer current role (gauge with integer as role?)
- Network speed (gauge)
- Network latency (gauge)
- Last available block on peer (gauge)
- Last finalized block by peer (gauge)
- Block time (gauge)
- Block propagation time (gauge)
- Number of pending transactions (gauge)
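Prometheus sample values are always floating-point numbers, so the string-valued items above (name, location, version, operating system) cannot be stored as metric values directly. One common convention is an "info"-style gauge whose value is always 1 and whose labels carry the strings. The role can be mapped to an integer code. The sketch below renders both in the Prometheus text exposition format using only the Python standard library. The metric names, label names, example values, and role encoding are all hypothetical, chosen for illustration only.

```python
# Sketch only: metric names, label names, and the role encoding are assumptions.

# Hypothetical integer encoding for the peer role gauge.
ROLE_CODES = {"leader": 0, "validating_peer": 1, "observing_peer": 2}


def escape_label_value(value):
    """Escape backslash, double quote, and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, value, labels=None):
    """Render a single gauge sample with its HELP and TYPE lines."""
    label_str = ""
    if labels:
        pairs = ",".join(
            '{}="{}"'.format(key, escape_label_value(val))
            for key, val in labels.items()
        )
        label_str = "{" + pairs + "}"
    return "# HELP {0} {1}\n# TYPE {0} gauge\n{0}{2} {3}\n".format(
        name, help_text, label_str, value
    )


# String-valued peer metadata goes into labels; the sample value stays at 1.
info = render_gauge(
    "iroha_peer_info",
    "Static peer metadata; the value is always 1.",
    1,
    {"name": "peer-0", "location": "eu-west", "version": "2.0.0", "os": "linux"},
)

# The current role is exposed as an integer code.
role = render_gauge(
    "iroha_peer_role",
    "Current peer role as an integer code.",
    ROLE_CODES["leader"],
)

print(info + role)
```

With this layout a version or location change shows up as a new label set rather than a new value, which is why a counter is a poor fit for these items.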
Global info
- Finalized block (gauge)
- Average block time (histogram/summary)
- Time since last block (gauge)
- Block propagation time (histogram/summary)
- Number of transactions in block (gauge)
- Info about gas
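For the histogram-typed items above, it may help to recall how a Prometheus histogram stores data: cumulative buckets keyed by an upper bound (`le`), plus a running sum and count of observations. An average is then derived from sum and count rather than stored. The following is a minimal standard-library sketch of that bookkeeping, not the Prometheus client implementation; the bucket bounds and block times are made-up values.

```python
import bisect


class Histogram:
    """Minimal Prometheus-style histogram with cumulative `le` buckets."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One slot per finite bound plus a final slot for the +Inf bucket.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.total = 0.0
        self.count = 0

    def observe(self, value):
        # bisect_left returns the index of the first bound >= value,
        # which is the smallest bucket satisfying value <= le.
        index = bisect.bisect_left(self.upper_bounds, value)
        self.bucket_counts[index] += 1
        self.total += value
        self.count += 1

    def cumulative(self):
        """Return (le, cumulative_count) pairs, ending with the +Inf bucket."""
        running = 0
        result = []
        bounds = self.upper_bounds + [float("inf")]
        for bound, n in zip(bounds, self.bucket_counts):
            running += n
            result.append((bound, running))
        return result

    def mean(self):
        """Average observation, derived from sum and count."""
        return self.total / self.count if self.count else 0.0


# Hypothetical block times in seconds with assumed bucket bounds.
block_time = Histogram([1.0, 2.0, 5.0])
for seconds in (0.8, 1.9, 2.0, 7.5):
    block_time.observe(seconds)

for le, n in block_time.cumulative():
    print(le, n)
print(block_time.mean())
```

Because only bucket counts, sum, and count are kept, the average block time is exact while percentiles are only as precise as the chosen bucket bounds.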
Decisions
Alternatives
Concerns
Assumptions
Risks
Additional Information
Prometheus metric types: https://prometheus.io/docs/concepts/metric_types/