Arch Forum 2024-05-02

Participants: Backend devs, EMs, Andy and Victor

Agenda

The DNT tool
0 Deadletters

Notes

The DNT Tool: Victor demoed how the DNT tool can swap package references (NuGet packages) for project references. This makes it easy to run a service locally against the locally checked-out platform code instead of whatever platform NuGet packages are imported. This is often useful when there's an issue in platform code, or when you want to try new, not yet committed platform code in a different service...

The basic steps are:
1. Create a switcher.json file and fill it in with the referenced platform libraries
2. Run dnt switch-to-projects
3. Done!
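
As an illustration of step 1, a switcher.json could look roughly like the sketch below. The solution name, package name and path are hypothetical placeholders, and the exact schema should be checked against the DNT documentation; as far as we know, the file names the solution to modify and maps each package name to the project file that should replace it.

```json
{
  "solution": "MyService.sln",
  "mappings": {
    "Platform.Messaging": "../platform/src/Platform.Messaging/Platform.Messaging.csproj"
  }
}
```

When done testing, dnt switch-to-packages should switch the project references back to package references.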

0 Deadletters: A discussion around the "0 deadletter vision" that we have already started working on. Several subtopics were discussed.

The basic steps when a deadletter occurs are:
1. Investigate why it happened.
2. Try to fix the underlying issue
3. It's important to also propagate any state, e.g. by re-queueing the deadletter to let the handlers process it again.
4. If the deadletter is not resent, it should be removed.

Idempotency:
- In an ideal world, every event handler would be idempotent, meaning it can safely handle an event that is resent. (We are not there yet, but this is the ideal.)
- We should think about and try hard to implement idempotency for new code.
- Not everything is idempotent today.
- In some cases, replaying events does not make sense. E.g. a message to send a push notification because of a good FX rate is likely outdated and should not be processed after a short time (like 15 minutes).
- It's unknown if DW handles duplicate events.
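
To make the idempotency points above concrete, here is a minimal sketch in Python (not our actual handler code). It assumes every event carries a unique ID and a creation timestamp; the Event and IdempotentHandler names, the in-memory set of seen IDs, and the returned status strings are all hypothetical. It skips events that were already processed and drops events older than a configurable maximum age, as in the FX rate push example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Set


@dataclass(frozen=True)
class Event:
    event_id: str          # assumed: a unique ID carried by every event
    created_at: datetime   # assumed: timestamp set by the publisher


class IdempotentHandler:
    """Wraps an action so that resent or outdated events are not processed again."""

    def __init__(self, action: Callable[[Event], None],
                 max_age: Optional[timedelta] = None) -> None:
        self._action = action
        self._max_age = max_age
        # In-memory only for illustration; a real service would persist this.
        self._seen: Set[str] = set()

    def handle(self, event: Event, now: Optional[datetime] = None) -> str:
        now = now or datetime.now(timezone.utc)
        # Outdated events (e.g. an old FX rate push) are dropped, not replayed.
        if self._max_age is not None and now - event.created_at > self._max_age:
            return "expired"
        # A resent event that was already processed is a no-op.
        if event.event_id in self._seen:
            return "duplicate"
        self._action(event)
        # Record the ID only after the action succeeded, so a failure can be retried.
        self._seen.add(event.event_id)
        return "processed"
```

With a handler like this, re-queueing a deadletter is safe even if the original event was partly or fully processed before it failed.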

Two questions were asked by Victor:
Should we have a single deadletter queue instead of one per topic? If we want to change this, the time is now, since it will be much more work once the RabbitMQ migration is done. However, we could not see any compelling reason to change to a single deadletter queue. It could make it easier to resend (or remove) all deadletters, but given our current state it's unlikely we want to do that. It's more important that it's easy to attribute a deadletter to a single consumer/area, both for troubleshooting and for cleaning up. We also don't foresee a huge number of deadletters.

Should/could we have a backoff for event retry? Yes, this seems like a good idea. It would prevent some deadletters, for example when a third party is down for a short period. It should be reasonably easy to implement. We just have to be aware of how this backoff time interacts with the backoff Polly already does.
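
To illustrate how the two backoffs can stack, here is a small Python sketch of the arithmetic only (Polly itself is a .NET library and is not used here). The function names and all delay values are hypothetical. It assumes an exponential backoff for event retries (the outer schedule) and that every handler attempt also runs through its own inner retry schedule, like the one Polly applies.

```python
from typing import List


def backoff_delays(attempts: int, base_seconds: float = 1.0,
                   factor: float = 2.0, cap_seconds: float = 60.0) -> List[float]:
    """Delay before each retry: base * factor**n, capped at cap_seconds."""
    return [min(base_seconds * factor ** n, cap_seconds) for n in range(attempts)]


def worst_case_seconds(outer: List[float], inner: List[float]) -> float:
    """Total waiting time if every outer attempt also exhausts all inner retries.

    There is one initial attempt plus one attempt per outer retry, and each
    of those attempts waits through the full inner schedule.
    """
    outer_attempts = len(outer) + 1
    return sum(outer) + outer_attempts * sum(inner)


# Example: three event retries (30 s, 60 s, 120 s) around an inner
# retry schedule of 1 s, 2 s and 4 s per attempt.
event_retries = backoff_delays(3, base_seconds=30.0, cap_seconds=300.0)
inner_retries = backoff_delays(3)
print(worst_case_seconds(event_retries, inner_retries))  # 238.0
```

The point of the sketch is that the inner delays are multiplied by the number of outer attempts, so the time before a message finally ends up as a deadletter can be noticeably longer than the event retry schedule alone suggests.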