Arch Forum 2023-10-05

Participants: JD, Liangxiong, Shakib, Thani, Victor, Zak

Agenda

  • Concurrency Issue
  • RabbitMQ vs ServiceBus

Notes

Concurrency Issue: We recently had an issue in Remittance where a user made two remittances at the same time, most likely because of an App bug. We agreed that each area is responsible for handling duplicates in a way that is reasonable for that area. At the same time, several areas face this problem, for example MPay, Remittance and Wallet. Therefore it would be good, if possible, to have a generic way to handle it. The ideal could be an Attribute from platform that can be added to API contracts. A discussion about solutions followed, but without a clear outcome or winner. We could, however, conclude that while it does happen from time to time, it's not a huge problem. For example, in Remittance the internal state is still consistent even when multiple remittances are made at the same time. It is therefore not a high priority to fix. (Apps should, however, fix their issue to minimize the problem. A ticket already exists for this.)
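To make the "Attribute on API contracts" idea more concrete, here is a minimal sketch of what such a generic guard could look like, written as a Python decorator. This is not a solution the forum agreed on. All names (`reject_concurrent_duplicates`, `DuplicateRequestError`, `create_remittance`) are hypothetical, and the in-memory set is an assumption that only works inside a single process. A real platform version would need a shared store across service instances.

```python
import threading
from functools import wraps


class DuplicateRequestError(Exception):
    """Raised when a request with the same key is already in flight (hypothetical)."""


def reject_concurrent_duplicates(key_func):
    """Allow only one in-flight call per key.

    key_func receives the handler's arguments and returns a hashable key,
    for example (user_id, amount). Assumption: single process only.
    """
    in_flight = set()
    lock = threading.Lock()

    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            key = key_func(*args, **kwargs)
            with lock:
                if key in in_flight:
                    raise DuplicateRequestError(key)
                in_flight.add(key)
            try:
                return handler(*args, **kwargs)
            finally:
                # Release the key so later (non-concurrent) requests succeed.
                with lock:
                    in_flight.discard(key)

        return wrapper

    return decorator


# Hypothetical handler standing in for an area's API endpoint.
@reject_concurrent_duplicates(lambda user_id, amount: (user_id, amount))
def create_remittance(user_id, amount):
    return {"user_id": user_id, "amount": amount, "status": "created"}
```

The guard only rejects requests that overlap in time. Two identical requests made one after the other both go through, which leaves it to each area to decide what counts as a duplicate.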

RabbitMQ vs ServiceBus: It is finally time to end the discussions about Azure Service Bus vs RabbitMQ. Last time, we looked at performance numbers for Service Bus (it takes 30-40 ms from when a message is sent on Service Bus until another service can receive it). Since then we have also got numbers for RabbitMQ, which show < 1 ms latency. With this information, Service Bus does not seem suitable for the kind of event-driven architecture we have. For example, the overhead of Service Bus means it will take around 100 ms from when we receive a transaction from Galileo until an area can receive a TransactionCompletedEvent for that transaction (30-40 ms latency times 3 events, i.e. 90-120 ms). And this is without any processing time in the Galileo or Transaction services!
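The latency estimate can be checked with a few lines of arithmetic. The sketch below uses only the numbers from the notes (30-40 ms per event on Service Bus, < 1 ms on RabbitMQ, 3 events in the chain). The function and variable names are made up for illustration, and the model assumes the events are delivered strictly one after another.

```python
def end_to_end_latency_ms(per_event_ms, events):
    """Messaging overhead for a chain of sequential events, in milliseconds."""
    return per_event_ms * events


EVENTS = 3  # events between receiving the transaction and TransactionCompletedEvent

# Service Bus: 30-40 ms per event (measured range from the notes).
service_bus_min = end_to_end_latency_ms(30, EVENTS)
service_bus_max = end_to_end_latency_ms(40, EVENTS)

# RabbitMQ: below 1 ms per event, so 1 ms is used as an upper bound.
rabbitmq_upper = end_to_end_latency_ms(1, EVENTS)

print(f"Service Bus: {service_bus_min}-{service_bus_max} ms")
print(f"RabbitMQ: < {rabbitmq_upper} ms")
```

Processing time in the services themselves comes on top of these figures for both options.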

Given this and our previous discussions, we decided that we should migrate over to RabbitMQ everywhere.

Additional notes regarding RabbitMQ vs ServiceBus:
- We should, if possible, use a single messaging technology.
- Service Bus is not cheap; it costs about the same as all the VMs in the K8s cluster.
- RabbitMQ is quite cheap to run; it uses only a small fraction of the K8s cluster's capacity.
- If maintaining RabbitMQ becomes too much work, we could consider a hosted third-party version. However, we seem to handle the cluster quite well.
- Rebtel uses RabbitMQ with much higher load than we do, and manages fine.
- RabbitMQ is an old and very mature and stable technology.
- No other options have been evaluated in depth. However, some light evaluation below:
  - Kafka is much more complex to host than RabbitMQ.
  - Kafka has been mentioned several times, but not with any compelling arguments.
  - Our needs match RabbitMQ's features better than Kafka's (or Kafka-like) features.
  - NATS is an interesting option, but we lack experience with it, and it's not clear why it would be better than RabbitMQ.
  - RabbitMQ is a much more established technology than NATS.
- If RabbitMQ can handle our use case well, then why "go over the bridge to get water"?