Logging

Overview

We use Elasticsearch and Kibana for logging. Elasticsearch is the database storing the logs, and Kibana provides a query interface. They are hosted and managed in Elastic Cloud.

Many logs are emitted automatically by the platform code. These logs are roughly comparable to the "Trace" concept in OpenTelemetry and contain a similar amount of information.

Writing to the log

  • Think through what information is useful in the log. Remember that a lot of information is already logged automatically (DB, EIN, etc.) and does not need to be logged again.
  • Inject a Serilog.ILogger in the constructor and call the relevant methods.
    • The platform uses ILogger from Serilog. Do not use the ILogger interface from Microsoft.Extensions.Logging!
  • Do not use string interpolation in the log message. The message string should stay constant so that a search for it finds all instances of the same type of log message. Variable values (like a user ID or status code) should be passed as separate arguments.

Examples:

private readonly Serilog.ILogger _logger;
...
_logger.Information("Updated {@count} ACH users", count);
_logger.Warning("Can't find address for {@userId}", userId);
_logger.Warning(ex, "Failed to get coordinate for {@addressRequest}", addressRequest);
_logger.TechMail(ex, "Error generating Bar code");
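
The guidelines above can be put together in a minimal sketch like the following. The class AddressLookupService and its method are hypothetical and exist only for illustration; the Serilog calls used (ForContext, Information, Warning) are standard Serilog.ILogger members, while platform-specific extensions such as TechMail are left out because their signatures are not shown here.

```csharp
using Serilog;

// Hypothetical service, used only to illustrate the logging guidelines.
public sealed class AddressLookupService
{
    // Serilog.ILogger, not Microsoft.Extensions.Logging.ILogger.
    private readonly ILogger _logger;

    public AddressLookupService(ILogger logger)
    {
        // ForContext<T>() tags every event from this class with a SourceContext property.
        _logger = logger.ForContext<AddressLookupService>();
    }

    public bool TryLookup(string userId)
    {
        if (string.IsNullOrWhiteSpace(userId))
        {
            // Constant message template; the variable part is a separate argument.
            // Avoid: _logger.Warning($"Can't find address for {userId}");
            _logger.Warning("Can't find address for {@userId}", userId);
            return false;
        }

        _logger.Information("Found address for {@userId}", userId);
        return true;
    }
}
```

Because the template text never changes, searching for "Can't find address for" matches every occurrence, and the user ID is stored as its own field instead of being baked into the message.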

Log structure

Common log fields

Most fields logged are self-explanatory and not listed here.

| Field | Description |
| --- | --- |
| ruid | Also known as Correlation ID. All logs written for a request share the same ruid. |
| logLevel | See Log Levels |
| SpanId | (inside of the object field) |
| ParentSpanId | (inside of the object field) |

Log Levels

In addition to the traditional log levels, many logs use custom log levels that follow the format {type: Event | Request}{direction: IN | OUT}{subtype: Request | Response | Result}. All log levels are listed below:

| LogLevel | Explanation | Additional Explanation |
| --- | --- | --- |
| EIN | Event INcoming | |
| EINR | Event INcoming Result | |
| EOUT | Event OUTgoing | |
| EOUTR | Event OUTgoing Result | Not currently used |
| RINRQ | Request INcoming ReQuest | |
| RINRP | Request INcoming ResPonse | |
| ROUTRQ | Request OUTgoing ReQuest | |
| ROUTRP | Request OUTgoing ResPonse | |
| DB | DataBase | |
| CACHEGET | CACHEGET | Retrieve value from shared cache (Redis) |
| CACHEPUT | CACHEPUT | Save value in shared cache (Redis) |
| TEKM | TechMail | Something is wrong. These are monitored and alerted on at fairly low log volumes. Issues that a developer should investigate. Often these are issues that an engineer can fix (e.g. a bug in the code, bad infra, etc.), but they can also be issues with third parties or other parts such as the apps. |
| ERR | ERRor | Something is wrong. These are monitored and alerted on at medium log volumes. They can be bad issues that are still part of "normal" operations, and some cannot be fixed. A developer does not need to monitor or investigate these daily, but they should be reviewed, for example, during deploys or when volumes get high. |
| WARN | WARNing | A bad state, failed operation, etc. that is not very serious in itself and can be recovered from. Over time, however, an upward trend or massive spikes indicate a problem. Often helpful when troubleshooting issues. These are typically added manually in the code and are normally not emitted by platform code. It is a good idea to monitor these while deploying. |
| INFO | INFOrmation | For informational data. Things that can be helpful when troubleshooting, but are not in themselves an indication of a problem or issue. |
| DBG | DeBuG | For pure debugging purposes. Note that this log level is ignored by default. |
| CRIT | CRITical | Should not be used. Use TEKM instead. |
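
To make the naming scheme of the event and request levels concrete, here is a small decoding sketch. LogLevelCodes and TryDescribe are hypothetical names that are not part of the platform code, and the parsing rules are inferred only from the table above: E or R for the type, IN or OUT for the direction, then an optional R for events or RQ/RP for requests.

```csharp
using System;

// Hypothetical helper: decodes the custom event/request log levels from the table.
public static class LogLevelCodes
{
    // Returns true and a readable description for codes such as "EIN" or "ROUTRP";
    // returns false for every other level (DB, TEKM, ERR, WARN, ...).
    public static bool TryDescribe(string code, out string description)
    {
        description = string.Empty;
        if (string.IsNullOrEmpty(code)) return false;

        // {type}: E = Event, R = Request.
        string type;
        if (code[0] == 'E') type = "Event";
        else if (code[0] == 'R') type = "Request";
        else return false;

        // {direction}: IN or OUT.
        string rest = code.Substring(1);
        string direction;
        if (rest.StartsWith("IN", StringComparison.Ordinal))
        {
            direction = "incoming";
            rest = rest.Substring(2);
        }
        else if (rest.StartsWith("OUT", StringComparison.Ordinal))
        {
            direction = "outgoing";
            rest = rest.Substring(3);
        }
        else return false;

        // {subtype}: events take none or R (Result); requests take RQ or RP.
        string subtype;
        if (type == "Event" && rest == "") subtype = "";
        else if (type == "Event" && rest == "R") subtype = " result";
        else if (type == "Request" && rest == "RQ") subtype = " request";
        else if (type == "Request" && rest == "RP") subtype = " response";
        else return false;

        description = type + " " + direction + subtype;
        return true;
    }
}
```

For example, TryDescribe("EINR", out var text) yields "Event incoming result", while "ERR" is rejected because "RR" is not a valid direction.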

Elastic index setup

All areas log to a specific "index" in Elasticsearch. An index can be thought of as a traditional database table. Two indexes are used: log_retained and log_temp. These indexes are further split by date, with the date appended as a suffix to the prefixes log_retained_ and log_temp_.

  • log_temp_: Used for all dev and stage logging, and for some unimportant production logging.
  • log_retained_: Used for production logging, excluding some specific unimportant logs.

Each service logs to these indexes (using the shared logging platform code) based on the current UTC date, e.g. on 2023-01-01 the logs are written to log_retained_230101, and so on. The indexes have different life cycle policies, so data in the log_retained indexes is kept longer than data in the log_temp indexes. The indexes use dynamic mapping, meaning they accept all kinds of data and dynamically map it to fields that can be queried in Kibana.
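
As an illustration of the date-based naming, the sketch below derives an index name from a timestamp. LogIndexNames is a hypothetical helper and not the shared platform code. The yyMMdd suffix format is an assumption inferred from the single example log_retained_230101.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: builds the daily index name for a given timestamp.
public static class LogIndexNames
{
    // prefix is "log_retained" or "log_temp"; the date is always taken in UTC.
    public static string ForDate(string prefix, DateTimeOffset timestamp)
    {
        // Assumed suffix format: two-digit year, month, day (e.g. 230101).
        string suffix = timestamp.UtcDateTime.ToString("yyMMdd", CultureInfo.InvariantCulture);
        return prefix + "_" + suffix;
    }
}
```

Note that the UTC conversion matters around midnight: a log written at 01:00 on 2023-01-01 in a UTC+2 time zone belongs to the index for 2022-12-31.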