Logging
Overview
We use Elasticsearch and Kibana for logging. Elasticsearch is the database storing the logs, and Kibana provides a query interface. They are hosted and managed in Elastic Cloud.
Many logs are emitted automatically by the platform code. They are roughly similar to the "Trace" concept from OpenTelemetry and contain a similar amount of information.
Writing to the log
- Think through what information is useful in the log, and remember that a lot of information is already logged automatically (DB, EIN, etc.) and does not need to be logged again.
- Inject a `Serilog.ILogger` in the constructor and call the relevant methods.
    - The platform uses `ILogger` from Serilog. Make sure not to use the interface in `Microsoft.Extensions.Logging`!
- Do not use string interpolation. The log message string should be easily searchable so that all instances of the same type of log message can be found. Variable arguments (such as a user ID or status code) should be passed as separate arguments.
Examples:
private readonly Serilog.ILogger _logger;
...
_logger.Information("Updated {@count} ACH users", count);
_logger.Warning("Can't find address for {@userId}", userId);
_logger.Warning(ex, "Failed to get coordinate for {@addressRequest}", addressRequest);
_logger.TechMail(ex, "Error generating Bar code");
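To illustrate why interpolation hurts searchability, here is a minimal sketch in plain C# (no Serilog required). The user IDs are made up for illustration. It only counts how many distinct message texts each approach produces; it does not call the logger.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical user IDs, for illustration only.
int[] userIds = { 17, 42, 99 };

var interpolatedMessages = new HashSet<string>();
var templateMessages = new HashSet<string>();

foreach (int userId in userIds)
{
    // Avoid: interpolation bakes the value into the message text,
    // so every user produces a different string.
    interpolatedMessages.Add($"Can't find address for {userId}");

    // Prefer: the template stays constant; the value would be passed
    // separately, as in
    // _logger.Warning("Can't find address for {@userId}", userId);
    templateMessages.Add("Can't find address for {@userId}");
}

Console.WriteLine(interpolatedMessages.Count); // 3 distinct message texts
Console.WriteLine(templateMessages.Count);     // 1 message text to search for
```

With the template form, a single search for the message text finds every occurrence, and the user ID remains available as a separate value.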
Log structure
Common log fields
Most fields logged are self-explanatory and not listed here.
| Field | Description |
|---|---|
| ruid | Also known as Correlation ID. All logs emitted for a request share the same ruid. |
| logLevel | See Log Levels |
| SpanId | (inside of the object field) |
| ParentSpanId | (inside of the object field) |
Log Levels
In addition to the traditional levels, many logs use custom log levels that follow the format {type: Event | Request}{direction: IN | OUT}{subtype: Request | Response | Result}. All log levels are listed below:
| LogLevel | Explanation | Additional Explanation |
|---|---|---|
| EIN | Event INcoming | |
| EINR | Event INcoming Result | |
| EOUT | Event OUTgoing | |
| EOUTR | Event OUTgoing Result | Not currently used |
| RINRQ | Request INcoming ReQuest | |
| RINRP | Request INcoming ResPonse | |
| ROUTRQ | Request OUTgoing ReQuest | |
| ROUTRP | Request OUTgoing ResPonse | |
| DB | DataBase | |
| CACHEGET | CACHEGET | Retrieve value from shared cache (Redis) |
| CACHEPUT | CACHEPUT | Save value in shared cache (Redis) |
| TEKM | TechMail | Something is wrong. These are monitored, and alerts trigger at fairly low log volumes. Issues that a developer should investigate. Often something an engineer can improve (e.g. a bug in the code or bad infrastructure), but can also be issues with third parties or other parts of the system, such as the apps. |
| ERR | ERRor | Something is wrong. These are monitored, and alerts trigger at medium log volumes. These issues can be bad but are part of "normal" operations, and some of them cannot be fixed. A developer does not need to monitor or investigate them daily, but they should be reviewed during deploys, for example, or when volumes get high. |
| WARN | WARNing | A bad state, failed operation, etc. that is not very serious in itself and can be recovered from. Over time, however, an upward trend or massive spikes indicate a problem. Probably helpful when troubleshooting issues. These are typically added manually in the code and are normally not emitted by platform code. It is a good idea to monitor these while deploying. |
| INFO | INFOrmation | Informational data. Things that can be helpful when troubleshooting but are not in themselves an indication of a problem. |
| DBG | DeBuG | For pure debugging purposes. Note that by default this log level is ignored. |
| CRIT | CRITical | Use TEKM instead. Should not be used. |
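The naming format for the custom levels can be sketched as a small decoder. `TryDescribe` is a hypothetical helper written for illustration, not part of the platform code. It assumes the suffixes seen in the table (`R` for Result, `RQ` for Request, `RP` for Response) and does not check whether a given combination is actually in use.

```csharp
using System;

// Hypothetical helper: decodes a custom level such as ROUTRQ into its
// {type}{direction}{subtype} parts. Returns false for levels that do not
// follow the format (DB, ERR, WARN, ...).
static bool TryDescribe(string code, out string description)
{
    description = "";

    // {type}: E = Event, R = Request.
    string type;
    if (code.StartsWith("E", StringComparison.Ordinal)) type = "Event";
    else if (code.StartsWith("R", StringComparison.Ordinal)) type = "Request";
    else return false;

    // {direction}: IN or OUT.
    string rest = code.Substring(1);
    string direction;
    if (rest.StartsWith("IN", StringComparison.Ordinal)) direction = "incoming";
    else if (rest.StartsWith("OUT", StringComparison.Ordinal)) direction = "outgoing";
    else return false;

    // {subtype}: optional suffix after the direction.
    string suffix = rest.Substring(direction == "incoming" ? 2 : 3);
    string subtype;
    switch (suffix)
    {
        case "": subtype = ""; break;
        case "R": subtype = " result"; break;
        case "RQ": subtype = " request"; break;
        case "RP": subtype = " response"; break;
        default: return false;
    }

    description = type + " " + direction + subtype;
    return true;
}

Console.WriteLine(TryDescribe("ROUTRQ", out var decoded) ? decoded : "not a custom-format level");
Console.WriteLine(TryDescribe("ERR", out var other) ? other : "not a custom-format level");
```

Note that `ERR` starts with `E` but is rejected, because no `IN` or `OUT` direction follows the type letter.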
Elastic index setup
All areas log to a specific "index" in Elasticsearch. An index can be thought of as a traditional database table. Two indexes are used, log_retained and log_temp. These indexes are further split by date, with the date appended to the index name:
- log_temp_: Used for all dev and stage logging, and for some unimportant production logging.
- log_retained_: Used for production logging, excluding some specific unimportant logs.
Each service logs to these indexes (using the shared logging platform code) based on the current UTC date, e.g. on 2023-01-01 the logs are written to log_retained_230101, and so on. The indexes have different life cycle policies, so data in the log_retained indexes is kept longer than data in the log_temp indexes. The indexes use dynamic mapping, meaning they accept all kinds of data and dynamically map it to fields that can be queried in Kibana.
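The dated index name can be sketched as follows. `IndexName` is a hypothetical helper, not the shared platform code, and the `yyMMdd` suffix format is an assumption inferred from the log_retained_230101 example.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: appends the UTC date to the index prefix.
// Assumes a yyMMdd suffix, which matches the 230101 example.
static string IndexName(string prefix, DateTime utcDate) =>
    prefix + "_" + utcDate.ToString("yyMMdd", CultureInfo.InvariantCulture);

// The example date from the text: 2023-01-01.
Console.WriteLine(IndexName("log_retained", new DateTime(2023, 1, 1, 0, 0, 0, DateTimeKind.Utc)));
// prints log_retained_230101

// Index for the current UTC date.
Console.WriteLine(IndexName("log_temp", DateTime.UtcNow));
```

Using the UTC date rather than local time means all services switch to the next index at the same moment, regardless of where they run.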