Comment: Added reporting and monitoring thoughts

...

  • Pushgateway part
    • What kind of data does it need: aggregated TX data since the last update, so it essentially needs all TX data since the last update (regardless of which component actually does the aggregation)
    • When does it need it: the data is aggregated and the output is produced at specified (configurable) intervals
    • What output does it produce: pushes the following client-specific metrics (calculated for the current update interval) to the gateway: caliper_tps, caliper_latency, caliper_txn_submit_rate, caliper_txn_success, caliper_txn_failure, caliper_txn_pending
    • Remarks:
      • Calculating such derived metrics on the Caliper side is not desirable. Prometheus is probably better at this, and users can specify whatever time window they want.
      • The following per-client metrics (i.e., metrics that have a client label) should suffice for deriving every other metric: caliper_txn_submitted, caliper_txn_finished{status="success|failed"}, caliper_txn_e2e_latency (histogram/summary). This requires only two counters and a histogram/summary, which are trivial to update; no other calculations are needed.
        • Goodput: caliper_tps_success = sum(rate(caliper_txn_finished{status="success"}[1s]))
        • Failure rate: caliper_tps_failed = sum(rate(caliper_txn_finished{status="failed"}[1s]))
        • Total TPS: caliper_tps = sum(rate(caliper_txn_finished[1s]))
        • Send rate: caliper_txn_send_rate = sum(rate(caliper_txn_submitted[1s]))
        • Pending: caliper_txn_pending = sum(caliper_txn_submitted) - sum(caliper_txn_finished)
      • Additional labels, such as channel or chaincode, can be used depending on the SUT.
  • Updates to the master process
    • What kind of data does it need: aggregated TX data since the last update, so it essentially needs all TX data since the last update (regardless of which component actually does the aggregation)
    • When does it need it: the data is aggregated and the output is produced at specified (configurable) intervals
    • What output does it produce: sends an inter-process txUpdate message to the master process
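The Pushgateway remarks above can be illustrated with a small sketch. It is written in Python purely to model the idea (Caliper itself is a Node.js project), and the names `ClientTxCounters`, `rate_per_second` and `pending` are hypothetical, not part of Caliper. It assumes that a client only maintains the two raw counters and the latency observations, and that rates and the pending count are derived afterwards from counter snapshots, roughly the way Prometheus would derive them over a time window.

```python
from dataclasses import dataclass, field


@dataclass
class ClientTxCounters:
    """Hypothetical per-client raw metrics: two counters plus latency samples."""
    submitted: int = 0  # models caliper_txn_submitted
    finished: dict = field(
        default_factory=lambda: {"success": 0, "failed": 0}
    )  # models caliper_txn_finished{status=...}
    latencies: list = field(default_factory=list)  # models caliper_txn_e2e_latency

    def on_submit(self, count=1):
        self.submitted += count

    def on_finish(self, status, latency_seconds):
        self.finished[status] += 1
        self.latencies.append(latency_seconds)

    def snapshot(self):
        """Return the current counter values, as a scrape would see them."""
        return {
            "submitted": self.submitted,
            "success": self.finished["success"],
            "failed": self.finished["failed"],
        }


def rate_per_second(earlier, later, key, window_seconds):
    """Average per-second increase of a counter between two snapshots."""
    return (later[key] - earlier[key]) / window_seconds


def pending(snapshot):
    """Transactions that were submitted but have not finished yet."""
    return snapshot["submitted"] - snapshot["success"] - snapshot["failed"]


counters = ClientTxCounters()
before = counters.snapshot()
counters.on_submit(10)
for _ in range(6):
    counters.on_finish("success", 0.2)
counters.on_finish("failed", 0.5)
after = counters.snapshot()

# Derived metrics over an assumed 2-second window between the snapshots.
goodput = rate_per_second(before, after, "success", window_seconds=2.0)
send_rate = rate_per_second(before, after, "submitted", window_seconds=2.0)
```

The client-side update path only increments counters and records a latency, which is the point of the remark: everything else is computed later, with whatever window the user prefers.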

...

  1. As of now, these modules only produce progress reports and are usually mutually exclusive (although enabling both would be a nice validation of the progress report, since they should show similar numbers during a round and should definitely be equal at the end of a round).
  2. They report data based on some data source (worker messages, Prometheus timelines, etc.). If we have multiple TxObservers in the workers and multiple ProgressReporters in the master process, then data format compatibility might be an issue. The source data should include some correlation ID (i.e., a target specifier). For example, the WorkerTxObserver would add the target: "worker-progress-reporter" field to its TX updates, and ProgressReporters could filter data based on this.
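A minimal sketch of the target-based filtering from point 2, in Python for illustration only. The only element taken from the notes is the target field and its "worker-progress-reporter" value; the other message fields, the second target name and the function `updates_for` are assumptions made up for this example.

```python
def updates_for(target, updates):
    """Keep only the TX updates addressed to the given reporter target."""
    return [u for u in updates if u.get("target") == target]


# Hypothetical update messages; only the "target" field comes from the notes.
updates = [
    {"target": "worker-progress-reporter", "submitted": 50, "finished": 40},
    {"target": "some-other-reporter", "submitted": 7, "finished": 7},
    {"target": "worker-progress-reporter", "submitted": 30, "finished": 30},
]

mine = updates_for("worker-progress-reporter", updates)
total_submitted = sum(u["submitted"] for u in mine)
total_finished = sum(u["finished"] for u in mine)
```

Messages without a target field are simply ignored here; whether they should instead be treated as broadcasts is a design decision the notes leave open.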

Monitors: TBD. These modules monitor different resources throughout the benchmark run and aggregate them. They expose different aggregated metrics, usually for each round and for the total benchmark execution. Some require periodic metric scraping (process and container monitoring), while others acquire the metrics at the end of rounds or at the end of the benchmark run (like the Prometheus-based monitor).

Remarks: the monitor "output" format should be unified. This is almost the case today, but report generation, for example, still depends on whether Prometheus is used. A unified format would decouple the reporters from the monitors.
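One possible shape for such a unified format, sketched in Python for illustration only. The record type `MonitorRecord`, its fields, and the sample monitor, resource and metric names are all assumptions; the notes do not define the actual format. The point is that the consumer (`to_rows`) never branches on which monitor produced a record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MonitorRecord:
    """Hypothetical unified record emitted by every monitor."""
    monitor: str                 # producer, e.g. "process" or "prometheus"
    resource: str                # monitored resource, e.g. a container name
    metric: str                  # aggregated metric name, e.g. "cpu_avg"
    round_index: Optional[int]   # None means the whole benchmark run
    value: float


def to_rows(records):
    """Build report rows without caring which monitor produced a record."""
    rows = []
    for r in records:
        scope = "total" if r.round_index is None else f"round {r.round_index}"
        rows.append((r.resource, r.metric, scope, r.value))
    return rows


records = [
    MonitorRecord("process", "worker-0", "cpu_avg", 0, 41.5),
    MonitorRecord("prometheus", "peer0", "memory_max", None, 512.0),
]
rows = to_rows(records)
```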

Reporters: they generate some kind of report, usually at the end of the benchmark run. The current reporter prints to the console (in a table format that is not really log-friendly) and also generates an HTML report. This could be split into two reporters. The reporters get their data from the monitors (hopefully in a unified format). Pluggable reporters would make it easy to add, for example, a Grafana-based (or any other plotting library-based) reporter that could visualize time-series data (and ignore the monitor's aggregated data).
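The split into separate, pluggable reporters could look roughly like the following Python sketch. All names (`Reporter`, `ConsoleReporter`, `HtmlReporter`, `REPORTERS`, `run_reporters`) and the sample result values are hypothetical and only model the idea; they do not describe Caliper's actual reporter code.

```python
from abc import ABC, abstractmethod


class Reporter(ABC):
    @abstractmethod
    def report(self, results):
        """Turn monitor results into some output and return it as a string."""


class ConsoleReporter(Reporter):
    def report(self, results):
        # One "name=value" pair per line is easier to grep in logs than a table.
        return "\n".join(
            f"{name}={value}" for name, value in sorted(results.items())
        )


class HtmlReporter(Reporter):
    def report(self, results):
        rows = "".join(
            f"<tr><td>{name}</td><td>{value}</td></tr>"
            for name, value in sorted(results.items())
        )
        return f"<table>{rows}</table>"


# Hypothetical registry: adding a reporter means adding one entry here.
REPORTERS = {"console": ConsoleReporter, "html": HtmlReporter}


def run_reporters(names, results):
    """Run every configured reporter on the same results."""
    return {name: REPORTERS[name]().report(results) for name in names}


outputs = run_reporters(["console", "html"], {"tps": 120.0, "latency_avg": 0.35})
```

A time-series reporter would fit the same interface, although it would probably need access to the raw timelines rather than the aggregated results passed in here.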