On-Call Duty
On-Call is a learning track that teaches us how to handle incidents occurring outside normal working hours.
Issues and incidents that arise outside regular working hours in production environments on Gjensidige Application Platform (GAP) are handled by the On-Call Duty.
The on-call engineer mainly responds to incidents that the platform triggers automatically.
On-Call schedule
Day | Time (GMT+1) |
---|---|
Weekdays | 07:00 - 08:00 & 16:00 - 23:00 |
Saturdays | 10:00 - 18:00 |
Sundays | 12:00 - 20:00 |
Public holidays | 12:00 - 20:00 |
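The schedule above can be expressed as a small lookup. This is a minimal sketch, not an official tool: the function name `is_on_call_hours` is hypothetical, windows are assumed to be half-open (start inclusive, end exclusive), GMT+1 is treated as a fixed offset, public holidays must be supplied by the caller, and a public holiday is assumed to take precedence over the day of the week.

```python
from __future__ import annotations

from datetime import date, datetime, time, timedelta, timezone

# Fixed GMT+1 offset, as stated in the schedule table (no daylight-saving handling).
ON_CALL_TZ = timezone(timedelta(hours=1))

# Windows copied from the schedule table; each is (start, end) in GMT+1.
WEEKDAY_WINDOWS = [(time(7, 0), time(8, 0)), (time(16, 0), time(23, 0))]
SATURDAY_WINDOWS = [(time(10, 0), time(18, 0))]
SUNDAY_WINDOWS = [(time(12, 0), time(20, 0))]
HOLIDAY_WINDOWS = [(time(12, 0), time(20, 0))]


def is_on_call_hours(moment: datetime,
                     public_holidays: frozenset[date] = frozenset()) -> bool:
    """Return True if `moment` falls inside an on-call window."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(ON_CALL_TZ)

    # Assumption: a public holiday overrides the normal weekday/weekend window.
    if local.date() in public_holidays:
        windows = HOLIDAY_WINDOWS
    elif local.weekday() == 5:  # Saturday
        windows = SATURDAY_WINDOWS
    elif local.weekday() == 6:  # Sunday
        windows = SUNDAY_WINDOWS
    else:
        windows = WEEKDAY_WINDOWS

    # Assumption: start is inclusive, end is exclusive.
    return any(start <= local.time() < end for start, end in windows)
```

For example, `is_on_call_hours(datetime(2024, 1, 3, 7, 30, tzinfo=ON_CALL_TZ))` checks a Wednesday morning inside the 07:00 - 08:00 window.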
Learning track
The On-Call Duty is currently in a testing stage as part of Gjensidige's Cloud Journey.
We're learning about our needs for readiness in Modern Zone. This includes:
- resources
- competences
- tools
- processes
- governance
What happens next?
Members of the Platform Team take turns on duty, one week at a time. The on-call engineer should respond within 60 minutes.
When an incident occurs
- start an incident using incident-bot (if the incident affects prod and has critical severity, the on-call engineer is alerted)
- tag all GAP Champions that might be affected
- create an SN ticket
- troubleshoot and fix the issue
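The alerting condition in the first step and the 60-minute response target can be sketched as follows. This only illustrates the rule as described here; it is not how incident-bot is implemented or configured. The names `Incident`, `should_page_on_call` and `response_deadline`, and the string values `"prod"` and `"critical"`, are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Response target stated in this document.
RESPONSE_TIME_LIMIT = timedelta(minutes=60)


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    environment: str  # e.g. "prod" or "test" (assumed labels)
    severity: str     # e.g. "critical" or "minor" (assumed labels)


def should_page_on_call(incident: Incident) -> bool:
    """Page the on-call engineer only for critical incidents affecting prod."""
    return (incident.environment.lower() == "prod"
            and incident.severity.lower() == "critical")


def response_deadline(alerted_at: datetime) -> datetime:
    """Latest time by which the on-call engineer should have responded."""
    return alerted_at + RESPONSE_TIME_LIMIT
```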
Incidents
The On-Call Duty is only responsible for incidents related to GAP. This does not include, for instance, applications running on AKS, Azure services provisioned by other teams, or on-premise infrastructure.
Part of GAP
- Application Gateway
- Azure Kubernetes Service (AKS)
- Tools
  - ArgoCD
  - Linkerd
  - Grafana
  - Prometheus
  - AlertManager
  - CertManager
  - NGINX Ingress Controller
- Application Logs from AKS to Splunk Cloud
- Application traces from AKS to Splunk Observability
Not part of GAP
- Applications running on AKS
- Azure resources/services provisioned by development teams/projects
  - Databases
  - Redis Cache
  - Key Vault
  - Storage Account
- Azure Firewall
- ExpressRoute
- On-premise infrastructure
  - ISAM
  - Firewall
  - Network
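The scope lists above can be turned into a quick lookup when triaging an alert. This is a minimal sketch under stated assumptions: the function name `on_call_owns` is hypothetical, only a subset of the listed components is included, and anything not listed returns `None` so that a person decides rather than the code guessing.

```python
from typing import Optional

# Subset of the "Part of GAP" list, compared case-insensitively.
IN_SCOPE = frozenset(name.casefold() for name in (
    "Application Gateway",
    "Azure Kubernetes Service (AKS)",
    "ArgoCD",
    "Linkerd",
    "Grafana",
    "Prometheus",
    "AlertManager",
    "CertManager",
    "NGINX Ingress Controller",
))

# Subset of the "Not part of GAP" list.
OUT_OF_SCOPE = frozenset(name.casefold() for name in (
    "Applications running on AKS",
    "Databases",
    "Redis Cache",
    "Storage Account",
    "Azure Firewall",
    "ISAM",
))


def on_call_owns(component: str) -> Optional[bool]:
    """Return True if in scope, False if out of scope, None if not listed."""
    key = component.strip().casefold()
    if key in IN_SCOPE:
        return True
    if key in OUT_OF_SCOPE:
        return False
    return None  # Not listed: needs a human decision.
```

Note that the lookup is by component name only: `"Azure Kubernetes Service (AKS)"` is in scope, while `"Applications running on AKS"` is not.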