Metrics are an important source of information for assessing the following:
- System health
- API-level monitoring
- API latency
- Success and failure
- Which APIs are significant for the business
The main goal of this design is to capture accurate metrics for XYZ services so that the above inferences can be made from metric streams.
Metric Recording and Reporting Design Overview:
Metrics captured within the XYZ service will be reported to SignalFx in the PROD environment and to a log file in the QA environment. The following block diagram shows the design.
The components of the design, in brief:
- Metric Registry: A container in which metrics of all types are stored. Each metric that samples values, such as a timer, has an associated reservoir, which also keeps discarding old data. The metric registry used in XYZ will come from metrics-core, part of the Dropwizard (formerly Codahale) Metrics library.
- Metric Reporter: Metrics captured in the metric registry are reported to their end destination at regular intervals. metrics-core comes with built-in reporters such as ConsoleReporter, Slf4jReporter and JmxReporter.
- Slf4jReporter: We will use Slf4jReporter to report metrics in the QA environment.
- SignalFxReporter: We will use this reporter to report metrics to SignalFx.
- SignalFx: A metric streaming platform. All metrics picked up for reporting by SignalFxReporter are published to the v2/datapoint endpoint, which is a public endpoint of SignalFx.
- Log file: All metrics picked up for reporting by Slf4jReporter are written to an application log file.
Metrics To be Reported:
- An incremental counter over a rolling window of 5 minutes will be captured for:
- API success or failure
- Cumulative count (to be switched off later)
- Incremental request count (to be switched off later)
- An API timer metric will also be captured to record API latency. Of all the values a Timer exposes, the following will be reported to SignalFx:
- Rate 5min metric: requests per second, averaged over a 5-minute window
- Median metric: the median (50th percentile) latency over a sliding window of 5 minutes
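The median over a sliding window is not the same thing as the mean, and the difference matters when a few slow requests skew the distribution. The following is a minimal, dependency-free sketch of the idea, not the Dropwizard reservoir itself; the class name SlidingWindowLatency and its methods are hypothetical, and the caller passes the current time explicitly so the behavior is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a sliding-time-window reservoir; not the Dropwizard class. */
final class SlidingWindowLatency {
    private final long windowMillis;
    // Each entry is {timestampMillis, latencyMillis}, oldest first.
    // Assumes callers pass non-decreasing timestamps.
    private final Deque<long[]> samples = new ArrayDeque<>();

    SlidingWindowLatency(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Records one latency observation taken at the given time. */
    void update(long nowMillis, long latencyMillis) {
        samples.addLast(new long[] {nowMillis, latencyMillis});
        evict(nowMillis);
    }

    /** Median of the latencies still inside the window, or 0 if there are none. */
    double median(long nowMillis) {
        evict(nowMillis);
        long[] sorted = samples.stream().mapToLong(s -> s[1]).sorted().toArray();
        if (sorted.length == 0) {
            return 0.0;
        }
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Arithmetic mean of the same samples, for comparison with the median. */
    double mean(long nowMillis) {
        evict(nowMillis);
        return samples.stream().mapToLong(s -> s[1]).average().orElse(0.0);
    }

    /** Drops samples whose timestamp is at least one full window old. */
    private void evict(long nowMillis) {
        while (!samples.isEmpty() && samples.peekFirst()[0] <= nowMillis - windowMillis) {
            samples.removeFirst();
        }
    }
}
```

With latencies of 10, 20 and 300 ms in the window, the median stays at 20 ms while the mean is pulled up to 110 ms by the single slow request. The library's own snapshot may interpolate quantiles differently from this simple midpoint rule.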
Requirements:
- Metric reporting from any method, chosen by its criticality in terms of performance and business importance.
- Derivation of metric names by a standard convention.
- Phase-wise rollout of metrics reporting.
- Ability to switch metrics on or off as needed.
The following components address these requirements:
- Annotation-based pointcut:
- CounterRecorder:
Will have one field, metricName. This field can be used when a custom name is preferred over the name derived by convention.
- LatencyRecorder:
Will have one field, metricName. This field can be used when a custom name is preferred over the name derived by convention.
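As an illustration, the two annotations could be declared roughly as follows. This is a sketch under assumptions: the empty-string default meaning "derive the name by convention", the OrderService class, the metric name orders.place.latency and the AnnotationLookup helper are all hypothetical and only serve to show the annotations in use.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Marks a method whose invocations should be counted. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface CounterRecorder {
    /** Optional custom name; empty (an assumed default) means "derive by convention". */
    String metricName() default "";
}

/** Marks a method whose latency should be recorded. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface LatencyRecorder {
    /** Optional custom name; empty (an assumed default) means "derive by convention". */
    String metricName() default "";
}

/** Hypothetical service, present only to show the annotations in place. */
class OrderService {
    @CounterRecorder
    @LatencyRecorder(metricName = "orders.place.latency")
    void placeOrder() {
        // business logic would go here
    }
}

/** Hypothetical helper that reads the annotation values back via reflection. */
final class AnnotationLookup {
    /** Returns the counter metricName for the named method, or null if not annotated. */
    static String counterName(Class<?> type, String methodName) {
        for (Method m : type.getDeclaredMethods()) {
            CounterRecorder a = m.getAnnotation(CounterRecorder.class);
            if (a != null && m.getName().equals(methodName)) {
                return a.metricName();
            }
        }
        return null;
    }

    /** Returns the latency metricName for the named method, or null if not annotated. */
    static String latencyName(Class<?> type, String methodName) {
        for (Method m : type.getDeclaredMethods()) {
            LatencyRecorder a = m.getAnnotation(LatencyRecorder.class);
            if (a != null && m.getName().equals(methodName)) {
                return a.metricName();
            }
        }
        return null;
    }
}
```

Runtime retention is what lets an aspect (or, here, plain reflection) see the annotation and its metricName when the method is invoked.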
- Interface MetricRecorder: This interface will be used to record a metric derived from the supplied metadata. It will provide the following method:
void record(RecordMetadata recordMetadata);
- RecordMetadata: An information-rich model holding the metric-specific details used to record a metric. It will have the following fields:
- MetricName: Final name of the metric to be used
- MetricType: An enum with the following constants:
- INCREMENTAL_COUNT
- CUMULATIVE_COUNT
- SUCCESS_COUNT
- FAILURE_COUNT
- LATENCY
Each enum constant has an associated metricSuffix field, which is appended to the end of the metric name.
- Value: The numerical value of the metric. For now this will be populated only for the LATENCY metric.
- latencyUnit: Used only for the LATENCY metric. The default value is MILLISECONDS.
- Builder: A static inner class used to build RecordMetadata from:
- Class<?>: Class from which metric recording was initiated
- Prefix:
- Infix: Typically the name of the method against which the metric is being recorded
- AbsoluteMetricName
- Value:
- timeUnit
Derivation of metricName:
- If AbsoluteMetricName is present, it is the final name of the metric to be recorded.
- Else if prefix is present, the final name of the metric is
- AbsoluteMetricName = prefix + "." + metricType.metricSuffix
- Else
- AbsoluteMetricName = clazz + "." + infix + "." + metricType.metricSuffix
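The builder and the name derivation rules above can be sketched as follows. Several details are assumptions made only for illustration: the suffix strings on each enum constant, the use of Class.getName() for the clazz part of the name, and the rule that infix is required when neither an absolute name nor a prefix is given.

```java
import java.util.Objects;
import java.util.concurrent.TimeUnit;

enum MetricType {
    // Suffix strings are illustrative assumptions; the design does not list them.
    INCREMENTAL_COUNT("incremental.count"),
    CUMULATIVE_COUNT("cumulative.count"),
    SUCCESS_COUNT("success.count"),
    FAILURE_COUNT("failure.count"),
    LATENCY("latency");

    final String metricSuffix;

    MetricType(String metricSuffix) {
        this.metricSuffix = metricSuffix;
    }
}

final class RecordMetadata {
    final String metricName;
    final MetricType metricType;
    final long value;
    final TimeUnit latencyUnit;

    private RecordMetadata(Builder b) {
        this.metricName = b.deriveMetricName();
        this.metricType = b.metricType;
        this.value = b.value;
        this.latencyUnit = b.timeUnit;
    }

    static Builder builder(Class<?> clazz, MetricType metricType) {
        return new Builder(clazz, metricType);
    }

    static final class Builder {
        private final Class<?> clazz;
        private final MetricType metricType;
        private String prefix;
        private String infix;
        private String absoluteMetricName;
        private long value;
        private TimeUnit timeUnit = TimeUnit.MILLISECONDS; // default stated in the design

        private Builder(Class<?> clazz, MetricType metricType) {
            this.clazz = Objects.requireNonNull(clazz, "clazz");
            this.metricType = Objects.requireNonNull(metricType, "metricType");
        }

        Builder prefix(String prefix) { this.prefix = prefix; return this; }
        Builder infix(String infix) { this.infix = infix; return this; }
        Builder absoluteMetricName(String name) { this.absoluteMetricName = name; return this; }
        Builder value(long value) { this.value = value; return this; }
        Builder timeUnit(TimeUnit timeUnit) { this.timeUnit = timeUnit; return this; }

        RecordMetadata build() {
            return new RecordMetadata(this);
        }

        /** Applies the three derivation rules in order of precedence. */
        private String deriveMetricName() {
            if (absoluteMetricName != null) {
                return absoluteMetricName;
            }
            if (prefix != null) {
                return prefix + "." + metricType.metricSuffix;
            }
            // Assumption: infix is mandatory on this branch; the design does not say.
            Objects.requireNonNull(infix, "infix is required without prefix or absolute name");
            return clazz.getName() + "." + infix + "." + metricType.metricSuffix;
        }
    }
}
```

For example, building with String.class, LATENCY and an infix of "length" yields the name java.lang.String.length.latency under these assumed suffixes, while setting an absolute name overrides both prefix and infix.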
- MetricRecorderDefault: This class implements the MetricRecorder interface. It will be responsible for interpreting recordMetadata and creating or updating the corresponding metric in the MetricRegistry.
- Set<MetricType> metricsToRecord: This field allows metric recording to be switched on or off per type. The set is consulted whenever a record request is received, and the request is honored only if its metricType is present in the set.
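The on/off switch can be pictured with the following sketch. It is not the actual implementation: a plain map of counters stands in for the Dropwizard MetricRegistry so the example has no dependencies, RecordMetadata is trimmed to the two fields the sketch needs, and the count accessor is a hypothetical helper for inspection.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

enum MetricType { INCREMENTAL_COUNT, CUMULATIVE_COUNT, SUCCESS_COUNT, FAILURE_COUNT, LATENCY }

/** Trimmed-down metadata carrying only what this sketch needs. */
final class RecordMetadata {
    final String metricName;
    final MetricType metricType;

    RecordMetadata(String metricName, MetricType metricType) {
        this.metricName = metricName;
        this.metricType = metricType;
    }
}

interface MetricRecorder {
    void record(RecordMetadata recordMetadata);
}

final class MetricRecorderDefault implements MetricRecorder {
    private final Set<MetricType> metricsToRecord;
    // Stand-in for the metric registry (an assumption made to keep the sketch self-contained).
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    MetricRecorderDefault(Set<MetricType> metricsToRecord) {
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case explicitly.
        this.metricsToRecord = metricsToRecord.isEmpty()
                ? EnumSet.noneOf(MetricType.class)
                : EnumSet.copyOf(metricsToRecord);
    }

    @Override
    public void record(RecordMetadata recordMetadata) {
        if (!metricsToRecord.contains(recordMetadata.metricType)) {
            return; // this metric type is switched off
        }
        // A fuller version would update a timer for LATENCY; this sketch only counts.
        counters.computeIfAbsent(recordMetadata.metricName, k -> new LongAdder()).increment();
    }

    /** Hypothetical accessor used to inspect what was recorded. */
    long count(String metricName) {
        LongAdder adder = counters.get(metricName);
        return adder == null ? 0L : adder.sum();
    }
}
```

Because the check happens inside record, callers and annotated methods need no changes when a metric type is switched off; only the configured set changes.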
- MetricRecorderAspect: This will be an @Aspect-annotated (AspectJ) class with two around advices, one for latency recording and one for count recording.
- MetricRecorderUtils: This will be a utility class providing a static method to record a metric. The method can be used where @CounterRecorder does not serve the purpose.
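What an around advice does for counting and timing can be shown without AspectJ by wrapping a call by hand. The sketch below is only an analogy for the aspect's behavior: the class AroundAdviceSketch, its method names and the ".success" / ".failure" name endings are hypothetical, and results are collected in lists instead of being sent to a recorder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hand-written analogue of the two around advices; not AspectJ code. */
final class AroundAdviceSketch {
    final List<String> recordedCounts = new ArrayList<>();
    final List<Long> recordedLatenciesNanos = new ArrayList<>();

    /** Mirrors the count advice: run the call, then note success or failure. */
    <T> T countAround(String metricName, Supplier<T> call) {
        try {
            T result = call.get();
            recordedCounts.add(metricName + ".success");
            return result;
        } catch (RuntimeException e) {
            recordedCounts.add(metricName + ".failure");
            throw e; // the wrapper must not swallow the caller's exception
        }
    }

    /** Mirrors the latency advice: measure elapsed time even when the call throws. */
    <T> T timeAround(Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            recordedLatenciesNanos.add(System.nanoTime() - start);
        }
    }
}
```

The same wrapping shape is what makes a static utility method useful in places an annotation cannot reach, such as a block of code inside a larger method.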
- MetricsConfiguration: This will be a Spring configuration class that creates the beans required for metric recording. It will create the following beans:
- MetricRegistry:
- SignalFxReporter: This bean will be created only with the prod profile, with the following settings:
- detailsToAdd: a field of SignalFxReporter that is consulted before metrics are published to SignalFx. We will add the MEDIAN and RATE_5_MIN details of Timer metrics.
- authToken: secret token used to connect to SignalFx. Configurable with the property key signalFx.auth.token
- reportingInterval: Reporting interval for the SignalFx reporting scheduler. Configurable with the property key metric.reporting.interval.in.mins
- Slf4jReporter: This bean will be created with any non-prod profile, with the following settings:
- reportingInterval: Reporting interval for the Slf4j reporting scheduler. Configurable with the property key metric.reporting.interval.in.mins
- MetricRecorderDefault: This bean will be created with the following settings:
- metricsToBeRecorded: Configurable field specifying the metric types to be recorded. Configurable with the property key metrics.to.be.recorded
- timerMetricWindowSizeInMins: Configurable field for the size of the timer metric's sliding-window reservoir. Configurable with the property key timer.metric.window.size.in.mins
- MetricRecorderAspect
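To make the configurable settings concrete, the following sketch reads the property keys named above from a java.util.Properties object. It is an illustration only, independent of Spring: the MetricSettings class is hypothetical, the comma-separated format for metrics.to.be.recorded is an assumption, and so are the fallback values (1 minute for the reporting interval, 5 minutes for the timer window).

```java
import java.time.Duration;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Properties;
import java.util.Set;

enum MetricType { INCREMENTAL_COUNT, CUMULATIVE_COUNT, SUCCESS_COUNT, FAILURE_COUNT, LATENCY }

/** Hypothetical holder for the settings the configuration class would inject. */
final class MetricSettings {
    final Set<MetricType> metricsToBeRecorded;
    final Duration reportingInterval;
    final Duration timerMetricWindowSize;

    private MetricSettings(Set<MetricType> types, Duration interval, Duration window) {
        this.metricsToBeRecorded = types;
        this.reportingInterval = interval;
        this.timerMetricWindowSize = window;
    }

    static MetricSettings from(Properties props) {
        // Assumed format: comma-separated enum names, e.g. "SUCCESS_COUNT, LATENCY".
        EnumSet<MetricType> types = EnumSet.noneOf(MetricType.class);
        for (String token : props.getProperty("metrics.to.be.recorded", "").split(",")) {
            String name = token.trim();
            if (!name.isEmpty()) {
                types.add(MetricType.valueOf(name)); // IllegalArgumentException on unknown names
            }
        }
        // Fallback values below are assumptions, not part of the design.
        long intervalMins = Long.parseLong(
                props.getProperty("metric.reporting.interval.in.mins", "1"));
        long windowMins = Long.parseLong(
                props.getProperty("timer.metric.window.size.in.mins", "5"));
        return new MetricSettings(
                Collections.unmodifiableSet(types),
                Duration.ofMinutes(intervalMins),
                Duration.ofMinutes(windowMins));
    }
}
```

Leaving metrics.to.be.recorded empty produces an empty set, which under the switch described for MetricRecorderDefault would turn all recording off.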