At a glance
- Detection: monitoring and alerts surface reliability and security issues.
- Triage: we assess severity, scope, and customer impact.
- Containment: we limit blast radius and prevent further impact.
- Remediation: we address the root cause and validate recovery.
- Communication: we notify customers when warranted by scope and impact.
What we consider an incident
An incident is an event that affects the confidentiality, integrity, or availability of the service. Examples include unauthorized access,
credential exposure, data integrity issues, and service outages that materially affect customers.
Detection and intake
We use operational monitoring and security-relevant logging to detect abnormal behavior and failures. Incidents may also be reported
by customers or external researchers.
- Operational alerts: error spikes, processing delays, and availability issues.
- Security signals: unusual authentication activity and privileged action patterns.
- External reports: vulnerability disclosures and customer-reported issues.
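As an illustration only, the "error spikes" operational alert could be sketched as a sliding-window error-rate check. The class name, window size, threshold, and minimum sample count below are hypothetical and do not describe our actual monitoring configuration.

```python
from collections import deque


class ErrorSpikeDetector:
    """Hypothetical sketch: flag a spike when the error rate over the
    most recent `window` requests exceeds `threshold`."""

    def __init__(self, window: int = 100, threshold: float = 0.2, min_samples: int = 20):
        # Assumed parameters, chosen only for illustration.
        self.outcomes = deque(maxlen=window)  # True means the request failed
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, is_error: bool) -> bool:
        """Record one request outcome and return True if an alert should fire."""
        self.outcomes.append(is_error)
        if len(self.outcomes) < self.min_samples:
            # Too little data to judge; avoid alerting on the first few requests.
            return False
        error_rate = sum(self.outcomes) / len(self.outcomes)
        return error_rate > self.threshold
```

A bounded window keeps the check responsive to recent behavior while the minimum sample count reduces false alerts during low traffic.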
Triage
When an event is detected, we triage to determine severity, affected systems, and customer impact. We prioritize response based on
potential risk and breadth of impact.
- Severity assessment: impact to confidentiality, integrity, or availability.
- Scope: which components are affected (application, integrations, inference workloads).
- Customer impact: whether access, processing, or reporting is impaired.
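The three triage inputs above (severity, scope, customer impact) can be combined into a single rating. The sketch below is a simplified assumption of how that might look; the severity levels, field names, and escalation rules are hypothetical and are not our actual triage criteria.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class TriageEvent:
    confidentiality_impact: bool
    integrity_impact: bool
    availability_impact: bool
    affected_components: frozenset  # e.g. {"application", "integrations"}
    customers_impaired: bool        # access, processing, or reporting impaired


def assess_severity(event: TriageEvent) -> Severity:
    """Illustrative rules only: rate by CIA impact, then raise by breadth."""
    data_at_risk = event.confidentiality_impact or event.integrity_impact
    if data_at_risk and event.customers_impaired:
        base = Severity.CRITICAL
    elif data_at_risk:
        base = Severity.HIGH
    elif event.availability_impact and event.customers_impaired:
        base = Severity.HIGH
    elif event.availability_impact:
        base = Severity.MEDIUM
    else:
        base = Severity.LOW
    # Breadth of impact: more than one affected component raises the rating one level.
    if len(event.affected_components) > 1 and base < Severity.CRITICAL:
        base = Severity(base + 1)
    return base
```

For example, under these assumed rules an availability issue that impairs customers across both the application and integrations would be rated CRITICAL.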
Containment
We take steps to limit blast radius and prevent further impact. Containment measures depend on incident type and may include disabling
integrations, restricting access paths, rotating credentials, or isolating affected components.
- Access restriction: limit privileged access if needed.
- Credential actions: rotate or invalidate credentials where warranted.
- Component isolation: isolate malfunctioning or suspicious components.
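Because containment depends on incident type, one way to picture it is a lookup from incident type to an ordered list of measures. The incident type names, action names, and mapping below are hypothetical examples, not our actual playbook.

```python
# Hypothetical mapping from incident type to ordered containment measures.
CONTAINMENT_PLAYBOOK = {
    "credential_exposure": ["rotate_credentials", "restrict_privileged_access"],
    "unauthorized_access": ["restrict_privileged_access", "rotate_credentials", "isolate_component"],
    "integration_fault": ["disable_integration"],
    "component_malfunction": ["isolate_component"],
}


def plan_containment(incident_type: str) -> list:
    """Return the containment measures for an incident type.

    Unknown types fall back to manual review rather than guessing at a measure.
    """
    # Return a copy so callers cannot mutate the shared playbook.
    return list(CONTAINMENT_PLAYBOOK.get(incident_type, ["escalate_for_manual_review"]))
```

The fallback for unrecognized incident types is a design assumption: an unclassified event is safer routed to a person than handled by a default automated action.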
Remediation and recovery
We remediate the root cause and validate recovery. Recovery may include fixes, configuration changes, and verification through monitoring and
targeted testing.
- Root cause analysis: identify what happened and why.
- Fix and validate: deploy corrections and verify normal operation.
- Hardening: implement control improvements to reduce recurrence risk.
Customer communication
When warranted by scope and impact, we notify customers with actionable information. Notifications include what happened, what was affected,
what we did to remediate, and recommended customer actions (if any).
- Timeliness: we time communications according to severity and share facts once they are confirmed.
- Actionability: recommended customer actions are included when applicable.
- Follow-up: we provide updates as remediation progresses.
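The four elements of a notification listed above can be represented as a small structured record. This is a minimal sketch; the class name, field names, and rendered layout are assumptions for illustration and do not reflect the format of our actual notices.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IncidentNotification:
    what_happened: str
    what_was_affected: str
    remediation: str
    # Optional: included only when there is something for customers to do.
    recommended_actions: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the notification as plain text."""
        lines = [
            f"What happened: {self.what_happened}",
            f"What was affected: {self.what_was_affected}",
            f"What we did to remediate: {self.remediation}",
        ]
        if self.recommended_actions:
            lines.append("Recommended customer actions:")
            lines.extend(f"- {action}" for action in self.recommended_actions)
        return "\n".join(lines)
```

Keeping recommended actions optional mirrors the "(if any)" qualifier: the section is omitted entirely when no customer action is needed.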
Post-incident review
We review incidents to improve controls and reduce recurrence. This includes remediation tracking, monitoring improvements, and updates to
operational procedures where appropriate.