Data Warehouse Events

The backend both publishes events to, and receives events from, the Data Warehouse (DW).

The DW is built on BigQuery, is available in the BigQuery Console, and is managed by the Data team.

Guideline

Whenever a new event is created in the backend, evaluate whether it should also be forwarded to the DW:

  • The overhead of sending an additional event to the DW is very small.
  • It is better to send too many events than too few, so default to sending them.

Sending Events to the DW

Events are sent to the DW over Amazon Kinesis. (The long-term plan is to migrate this to Google Pub/Sub.)

flowchart LR
    DW[Datawarehouse]@{ shape: database }
    Kinesis@{shape: horizontal-cylinder}
    Auditing
    RabbitMq@{shape: horizontal-cylinder}
    RabbitMq --> Auditing
    Auditing --> Kinesis
    Kinesis --> DW

To send an event, register a handler for it with RegisterMetadataBasedMessageHandler in the AuditingServiceModule in the Auditing area, similar to the existing registrations.
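A minimal sketch of what such a registration might look like is shown below. Only the names RegisterMetadataBasedMessageHandler and AuditingServiceModule come from this page; the registry interface, the method shape, and the event type are hypothetical stand-ins, so in practice copy one of the existing registrations rather than this snippet.

```csharp
// Sketch only: the registry interface, the RegisterHandlers shape, and the event type
// are hypothetical; RegisterMetadataBasedMessageHandler and AuditingServiceModule are
// the names referenced by this page.
public interface IMessageHandlerRegistry
{
    void RegisterMetadataBasedMessageHandler<TEvent>();
}

public sealed class LoanApplicationSubmitted { } // hypothetical backend event to forward

public class AuditingServiceModule
{
    public void RegisterHandlers(IMessageHandlerRegistry registry)
    {
        // The existing registrations for other forwarded events sit next to this one;
        // add one line per event that should end up in the DW.
        registry.RegisterMetadataBasedMessageHandler<LoanApplicationSubmitted>();
    }
}
```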

Query Events from the DW

Events sent to the DW first end up in the kinesis.backend_events table.

You can query this table to look up any event sent to the DW, but it is often easier to use one of the derived tables under the dbt_stage dataset, if available. How events are processed from the kinesis.backend_events table is specified in the dt-dbt repository.

When querying, always restrict the query on the partition column (event_ingested_at in the backend_events table; other tables may be partitioned differently). This lets BigQuery scan fewer partitions, which makes the query faster and reduces cost.
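For example, a lookup of the last day's events of a single type could look like the sketch below. It is wrapped in the Google.Cloud.BigQuery.V2 client so the example stays in C#, but the same SQL can be pasted straight into the BigQuery Console. The project ID, the event_type column, and the assumption that event_ingested_at is a timestamp partition column are guesses; kinesis.backend_events and event_ingested_at are the names from this page.

```csharp
using System;
using Google.Cloud.BigQuery.V2;

// Sketch only: "my-dw-project" and the event_type column are assumptions.
var client = BigQueryClient.Create("my-dw-project");

// Filtering on the partition column (event_ingested_at) lets BigQuery prune partitions,
// which makes the query faster and cheaper.
const string sql = @"
    SELECT *
    FROM kinesis.backend_events
    WHERE event_ingested_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
      AND event_type = 'LoanApplicationSubmitted'
    LIMIT 100";

foreach (BigQueryRow row in client.ExecuteQuery(sql, parameters: null))
{
    Console.WriteLine(row["event_type"]);
}
```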

Receiving Events from the DW

Events are received from the DW over Google Pub/Sub. They are read by the Auditing area, and then published as standard events on RabbitMq. Any other area can subscribe to them like any other event.
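As a sketch of the consuming side, handling a DW-originated event looks the same as handling any other backend event. Every type below is a hypothetical stand-in: the real contract lives in be-bank-platform as described further down, and IEventHandler<T> represents whatever message-handling abstraction your area already uses for RabbitMq events.

```csharp
using System.Threading.Tasks;

// Hypothetical stand-ins for the real contract type and handler abstraction.
public sealed record DataCustomerScoreUpdated(string CustomerId, int Score);

public interface IEventHandler<TEvent>
{
    Task HandleAsync(TEvent @event);
}

public sealed class CustomerScoreUpdatedHandler : IEventHandler<DataCustomerScoreUpdated>
{
    public Task HandleAsync(DataCustomerScoreUpdated @event)
    {
        // React to the DW-originated event exactly like any other event received over RabbitMq.
        return Task.CompletedTask;
    }
}
```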

Notes

  • This feature should be used with caution; in general, the backend should not depend on data from the data warehouse.
  • For an event to be forwarded to Risk, Auditing must be subscribed to it, like any other event.

flowchart LR
    DW[Datawarehouse]@{ shape: database }
    PubSub@{shape: horizontal-cylinder}
    Auditing
    RabbitMq@{shape: horizontal-cylinder}
    BE1[BE service 1]
    BE2[BE service 2]
    BE3[BE service n..]
    DW --> PubSub
    PubSub --> Auditing
    Auditing --> RabbitMq
    RabbitMq --> BE1
    RabbitMq --> BE2
    RabbitMq --> BE3

Any event received from the DW should have a definition in be-bank-platform, under the Minority.Data.Domain.Contract/Events namespace.

Events should use the Data owner prefix and otherwise be named appropriately.
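A minimal sketch of such a contract is shown below; only the namespace and the Data prefix come from this page, while the event name and properties are invented for illustration.

```csharp
namespace Minority.Data.Domain.Contract.Events
{
    // Hypothetical DW-originated event: "Data" is the owner prefix mentioned above;
    // the rest of the name and the properties are invented.
    public sealed class DataCustomerScoreUpdated
    {
        public string CustomerId { get; init; }
        public int Score { get; init; }
    }
}
```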

In Pub/Sub, the events are published on these subscriptions:

The subscription has a dead-letter topic; in prod it is at projects/event-hub-dev-4fyee2/topics/events-backend-dlq-topic. If a message is dead-lettered, a notification is sent to the Slack channel #dt-event-hub-dlq.