Arch Forum 2025-04-17
Participants: Backend devs & Victor
Agenda
- Observability 2.0, part 2
  * Log levels discussion
  * Observability improvements prioritization
Summary
Log levels
The log level hierarchy (INFO, WARN, ERR and TEKM) was discussed. It became clear that we need a stricter definition of each level. Additionally, the choice of log level goes hand in hand with alerting.
Observability improvements prioritization
We walked through the issues identified in the previous Arch Forum and ended with a vote on which issues the devs think are worth spending time on (3 votes per dev).
The outcome can be seen below.
| Area | Topic | Votes |
|---|---|---|
| Logs | Raw logs of external req/resp | 4 |
| Logs | Ruid handling for bigger jobs | 3 |
| Metrics | Ops and Warning Slack channels are spammy and unclear | 3 |
| General | OpenTelemetry for everything | 3 |
| New devs | TEKM is not a standard log level | 3 |
| Query | Kibana limited/unfamiliar query language | 2 |
| New devs | Loglevel is two concepts in one (level and type) | 2 |
| Logs | Log Sanitizing | 1 |
| Logs | Serilog vs MS ILogger | 1 |
| Logs | Don't use string interpolation for logs | 1 |
| Logs | Increased log retention | 1 |
| Metrics | Custom metrics guidelines | 1 |
| Metrics | Datadog dashboards and monitors are messy | 1 |
| Logs | Span/Parentspans are cumbersome to use | |
| Logs | Fields not indexed in Elastic | |
| Query | DW/BigQuery for troubleshooting/analysis | |
| Metrics | Trial Elastic/Kibana for more alerts | |
| General | Servicemap / application map for system overview | |
| General | Real tracing | |
Original sheet used: https://docs.google.com/spreadsheets/d/1aFs_Ic1yI2lYD0Wk6hsOExOC0WAqbNeNv50JdKPg0-Y/edit?gid=0#gid=0
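As a rough illustration of the "Don't use string interpolation for logs" item in the table: the sketch below uses Python's standard `logging` module as a stand-in. It is not taken from our codebase or from the Serilog / MS ILogger setup discussed in the meeting. The logger name, the `ListHandler` helper and the `order_id` value are made up for the example. It only shows that an interpolated message loses the separation between template and values, while a templated call keeps both on the log record.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical helper: keeps log records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("archforum.example")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

order_id = 1234  # hypothetical value

# Interpolated: the value is baked into the string before logging sees it,
# so every distinct order id produces a distinct message text.
logger.info(f"Processed order {order_id}")

# Templated: the template and its arguments stay separate on the record,
# so messages can still be grouped or searched by template.
logger.info("Processed order %s", order_id)

interpolated, templated = handler.records
```

Both records render to the same text, but only the templated one still carries the template (`templated.msg`) and the value (`templated.args`) as separate fields. This is the same idea that message templates in structured logging libraries build on.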