
Document monitoring, metrics, tracing, observability and alerting #4797

@AgaDufrat

As an engineer on GOV.UK,
I want to know how to configure monitoring, metrics, tracing, observability and alerting for my applications,
so that we can detect and resolve issues proactively, keep performance and reliability high through real-time insight into application health and behaviour, and inform product decisions.

Current documentation

Logging

How logging works on GOV.UK
Request tracing

Monitoring

Debug underperforming search - I've asked the Search team to review this and probably remove it
How we handle errors
Pingdom
Sentry

Alerting

Pingdom Bouncer canary check
Router error ratio too high
Travel Advice or Drug and Medical Device email alerts not sent
Signon API user token expires soon
PagerDuty
Things that may contact on-call - I suggest taking the specifics out of this page and linking to the relevant pages instead

[WIP] Missing documentation

[WIP] Documentation that could do with a refresh
