AlgoTrader Documentation

Chapter 12. Performance Monitoring

12.1. Metrics Configuration
12.2. Grafana Configuration
12.3. Metrics, Dashboards, Panels
12.3.1. Environment Metrics
12.3.2. Esper Metrics
12.3.3. Hazelcast Metrics
12.3.4. Market Data Metrics
12.3.5. Order Metrics
12.3.6. Tick-to-Trade
12.3.7. Event Dispatch Metrics
12.3.8. Influx Metrics
12.3.9. UI Metrics

AlgoTrader provides built-in tracking of performance metrics. The main components are meters, registries and Grafana.

By default, all performance metrics are enabled, collected and persisted (persistence requires InfluxDB to be enabled via the influx profile). The default settings, together with a description of how to use them, are shown below (conf-metrics.properties):

    #List of metrics enabled and disable switches delineated by |. +{switch} enables -{switch} disables.
    #The switches can be found in the MeterSwitch enum. The order of switches matter, and gets evaluated left to right. e.g.:
    #  -to enable all but esper: +All|-Esper
    #  -this only enables all due to ordering: -Esper|+All
    #  -a more complex one: +AlgoTrader|+Platform|-Hazelcast
    #{"type":"String","label":"Enable metrics with +, disable with - (if not enabled doesn't need to disable)"}
    metrics.filter=+All

    #Additional tags (key:value pairs) added to the specified metric, delineated by |.
    # Defined such as metrics.tags.{metric}={tag1}:{value1}|{tag2}:{value2}|...
    #The metrics can be found in the MeterSwitch enum.
    #e.g.:
    #metrics.tags.Esper=type:event|source:AT|color:red
    #metrics.tags.EventDispatch=type:event|source:AT|color:blue
    #metrics.tags.Order=type:business|source:AT|color:blue
    #metrics.tags.MarketData=type:business|source:AT|color:orange

    #Log level for metrics (only matters if the LOGGING registry is enabled)
    #{"type":"String","label":"Log level"}
    metrics.log.level=Debug

    # List of registries (possible values that can be added INFLUX,JMX,LOGGING)
    #{"type":"String","label":"List of registries to send metrics to for further processing, persistence or presentation"}
    metrics.registries=INFLUX,JMX,LOGGING

    #{"type":"int","label":"Esper statement metrics logging interval (seconds)"}
    metrics.engine.logging.intervalSeconds=10

    #{"type":"int","label":"Esper statement metrics logging threshold - min average CPU time (millis)"}
    metrics.engine.logging.thresholdMs=50
            

To tweak monitoring behavior, configuration parameters can be set or overridden via program arguments, see Section 2.4, “VM Options”. To disable metrics, set the metrics.filter parameter to empty (i.e. -Dmetrics.filter=""). This can be useful when the hardware resources available to AlgoTrader are limited and the goal is to minimize memory consumption and maximize performance.
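The left-to-right evaluation of metrics.filter described in the configuration comments can be illustrated with a minimal Python sketch. It is not AlgoTrader's implementation: the function name evaluate_filter is hypothetical, the switch names are only the ones that appear in the example configuration (the authoritative list is the MeterSwitch enum), and treating All as "every known switch" is an assumption.

```python
# Hypothetical subset of switch names; the real list is the MeterSwitch enum.
KNOWN_SWITCHES = {"Esper", "Hazelcast", "MarketData", "Order", "EventDispatch"}


def evaluate_filter(spec: str) -> set[str]:
    """Return the set of enabled switches for a filter such as '+All|-Esper'."""
    enabled: set[str] = set()
    # Tokens are applied strictly left to right, so later tokens win.
    for token in spec.split("|"):
        token = token.strip()
        if not token:
            continue  # an empty filter enables nothing
        sign, name = token[0], token[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"switch must start with + or -: {token!r}")
        if name != "All" and name not in KNOWN_SWITCHES:
            raise ValueError(f"unknown switch: {name!r}")
        # Assumption: 'All' stands for every known switch.
        targets = KNOWN_SWITCHES if name == "All" else {name}
        if sign == "+":
            enabled |= targets
        else:
            enabled -= targets
    return enabled


print(sorted(evaluate_filter("+All|-Esper")))   # everything except Esper
print(sorted(evaluate_filter("-Esper|+All")))   # everything, +All comes last
print(sorted(evaluate_filter("")))              # nothing enabled
```

The second call shows why ordering matters: disabling Esper first has no effect because the later +All re-enables it.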

AlgoTrader provides out-of-the-box dashboards and a data-source configuration for Grafana under the /algotrader/bootstrap/conf/src/main/resources/grafana/conf directory. Grafana loads configuration and dashboards on startup, so this configuration can be copied to the Grafana installation's /conf directory, and the dashboards should load after a start or restart. Please note that there are a few requirements for this:

For a specific dashboard to work, the corresponding metric has to be enabled on the AlgoTrader side as well. Otherwise AlgoTrader does not collect or persist that data, and Grafana has no way of reading it from Influx. For example, if Esper metrics are not enabled on the AlgoTrader side (see Section 12.1, “Metrics Configuration”), all the Esper statement and engine dashboards will stop displaying data.

AlgoTrader has a wide range of metrics that can be monitored. All of them are fed to the enabled registries, see Section 12.1, “Metrics Configuration”.

Esper statement and engine metrics.

See engine metrics.

Statement Metrics can be seen on the bottom panel and can be set to display only selected statements.

The shown attributes are the following as per Esper:

  • wallTime - Statement processing wall time in nanoseconds (based on System.nanoTime). This is the most useful Esper metric, as it shows how much time is spent on a specific Esper statement. A high value can indicate either a bottleneck in the code called by the statement, or that the statement itself is too resource heavy, complicated or overloaded.

  • numInput - Number of input events to the statement.

  • numOutputIStream - Number of insert stream rows output to listeners or the subscriber, if any.

  • numOutputRStream - Number of remove stream rows output to listeners or the subscriber, if any.
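As an illustration of how these attributes can be combined when looking for a bottleneck, the following Python sketch ranks statements by average wall time per input event. The statement names and sample values are made up, and dividing wallTime by numInput is an assumption about how one might read the figures, not something the dashboards are confirmed to do.

```python
from dataclasses import dataclass


@dataclass
class StatementMetric:
    name: str          # statement name (hypothetical sample names below)
    wall_time_ns: int  # wallTime, in nanoseconds
    num_input: int     # numInput

    def avg_wall_time_ms(self) -> float:
        """Average wall time per input event, converted to milliseconds."""
        if self.num_input == 0:
            return 0.0  # no events processed, nothing to average
        return self.wall_time_ns / self.num_input / 1_000_000


def slowest(metrics: list[StatementMetric], top: int = 3) -> list[StatementMetric]:
    """Return the statements with the highest average wall time first."""
    return sorted(metrics, key=lambda m: m.avg_wall_time_ms(), reverse=True)[:top]


# Made-up sample data for illustration only.
samples = [
    StatementMetric("INSERT_TICK", 120_000_000, 4_000),
    StatementMetric("CHECK_SIGNAL", 900_000_000, 300),
    StatementMetric("LOG_BAR", 5_000_000, 0),
]

for metric in slowest(samples):
    print(f"{metric.name}: {metric.avg_wall_time_ms():.3f} ms per input event")
```

A statement with few input events but a large wallTime points at expensive per-event work, while a large wallTime spread over many events points at load.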

See statement metrics.

Used to monitor exchange and internal AlgoTrader latency and event count, measuring how long it takes for specific types of market data (including ticks) to arrive from different market data providers, and how long it takes to propagate them internally. This is not available for all providers, as not all venues provide the necessary data (sending-time).

There are two blocks of panels. The upper block has the internal count and latency figures, while the bottom block contains the exchange figures. The connection between the two lies in the adapter layer, so internal really means Adapters-to-Strategies and exchange means Exchanges-to-Adapters, describing the external (network) and internal path market data takes. These metrics are useful for monitoring the network connection to the exchanges, for detecting outages and judging general responsiveness, and for determining whether an adapter issue is internal or external and whether it is related to load, network or latency.
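The split between exchange and internal latency can be pictured with a small Python sketch. It assumes that the exchange figure is derived from the venue's sending-time and the adapter's receive time, and the internal figure from the adapter and strategy receive times. The class, field names and sample timestamps are hypothetical, and this is one plausible reading of the panels rather than AlgoTrader's actual calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TickTimestamps:
    exchange_sent_us: int      # sending-time reported by the venue (microseconds)
    adapter_received_us: int   # when the adapter received the tick
    strategy_received_us: int  # when the strategy received the tick


def split_latency(ts: TickTimestamps) -> tuple[int, int]:
    """Return (exchange latency, internal latency) in microseconds."""
    exchange_latency = ts.adapter_received_us - ts.exchange_sent_us       # Exchanges-to-Adapters
    internal_latency = ts.strategy_received_us - ts.adapter_received_us  # Adapters-to-Strategies
    return exchange_latency, internal_latency


# Made-up timestamps for illustration only.
tick = TickTimestamps(
    exchange_sent_us=1_000_000,
    adapter_received_us=1_012_500,
    strategy_received_us=1_012_900,
)
exchange_us, internal_us = split_latency(tick)
print(f"exchange: {exchange_us} us, internal: {internal_us} us")
```

Note that a figure computed this way for the exchange leg compares a remote clock with a local one, so it is only as good as the clock synchronization between the venue and the host.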

These metrics track exchange and internal AlgoTrader latency and event count, measuring the delay between sending an order and receiving a response/status (ack, fill, reject, cancel) from the target trading venue. This is not available for all providers, as not all venues provide the necessary data (sending-time).

Note: the numbers on this screenshot are not representative, as they were recorded on a demo account with order limitations in place that inflate the numbers.

There are two blocks of panels. The upper block has the internal count and latency figures, while the bottom block contains the exchange figures. The connection between the two lies in the adapter layer, so internal really means Adapters-to-Strategies and exchange means Exchanges-to-Adapters, describing the external (network) and internal path the data takes. These metrics are useful for monitoring the network connection to the exchanges, for detecting outages and judging general responsiveness, and for determining whether an adapter issue is internal or external and whether it is related to load, network or latency.

Unlike the other metrics, this one is not calculated and readily available, and some additional math and estimation is needed to obtain it. All the data necessary to calculate the Tick-to-Trade metric is available as smaller parts of other metrics and panels on the Section 12.3.4, “Market Data Metrics” and Section 12.3.5, “Order Metrics” dashboards. For a clearer picture, let's look at how Tick-to-Trade is built and which components we need from those two dashboards to calculate it. If the whole process is automated via a strategy, the data path is the following:

Market Data Path:

  • Exchange ----(ExternalMD/Tick)-------> Adapter

  • Adapter ----(InternalMD/Tick)-------> Strategy

Order Path:

  • Strategy ----(InternalOrder)---------> Adapter

  • Adapter ----(ExternalOrder)---------> Exchange

Order Status/Response Path:

  • Exchange ----(ExternalStatus/Resp)---> Adapter

  • Adapter ----(InternalStatus/Resp)---> Strategy

Adding up the latencies on these paths gives an approximation of the Tick-to-Trade metric. Depending on the requirements it can be calculated in a variety of ways, of which the simplest is to take the Delay Average panel values and add them up by path. Be aware that this is only an average, so outliers are possible. The paths map to the dashboards as follows: the first three points correspond directly to values displayed on the Market Data and Order dashboards, while the last three can be seen as a single path that cannot be measured separately. It is only available as a cumulative network latency value, which is displayed on the Order dashboard.
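The summation can be written down as a short Python sketch. The function and parameter names are hypothetical and the millisecond values are made up; in practice each input would be read from the corresponding Delay Average panel.

```python
def tick_to_trade_ms(
    external_md_ms: float,       # Exchange -> Adapter (Market Data dashboard)
    internal_md_ms: float,       # Adapter -> Strategy (Market Data dashboard)
    internal_order_ms: float,    # Strategy -> Adapter (Order dashboard)
    order_round_trip_ms: float,  # last three paths combined (Order dashboard)
) -> float:
    """Approximate Tick-to-Trade as the sum of the average delays per path."""
    return external_md_ms + internal_md_ms + internal_order_ms + order_round_trip_ms


# Made-up average delays for illustration only.
estimate = tick_to_trade_ms(
    external_md_ms=12.5,
    internal_md_ms=0.25,
    internal_order_ms=0.5,
    order_round_trip_ms=25.0,
)
print(f"approximate Tick-to-Trade: {estimate} ms")
```

Because every input is an average, the result is an average as well; substituting percentile or maximum values from the same panels would give a more conservative estimate.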

The panels also contain more specific data history, filtered and grouped by exchange, order type and market data type. Repeating the calculation above with carefully selected data therefore yields the Tick-to-Trade metric at a certain time for a fixed order type, market data type and exchange, if desired.

In addition to making it possible to calculate the Tick-to-Trade metric, the sub-component panels help to track down, narrow down and potentially solve the issue when the value becomes inflated.