Skybitra
Internship

SRE Intern

New York City / Remote

Description

What you will learn

Learn how SREs keep services reliable through observability, SLOs, and disciplined incident response.

You’ll help instrument apps, keep runbooks current, and practice structured on-call handoffs with mentors.

Responsibilities

  • Assist with dashboards, alerts, and log queries for critical services
  • Keep incident timelines, action items, and runbooks accurate and discoverable
  • Shadow on-call, help separate noisy alerts from actionable ones, and document learnings
  • Support SLO/error-budget reviews by gathering metrics and building simple charts
  • Help test failover or game-day drills alongside senior SREs

Requirements

What makes you a fit

Interest in systems, observability, and a calm, methodical approach to problems.

  • Basic scripting (Python/Bash) and Linux fundamentals
  • Understanding of why metrics, logs, and traces matter for debugging
  • Comfort writing clear notes and summarizing complex situations
  • Based in New York City or remote

Nice to have

  • Experience with Prometheus/Grafana, Datadog, New Relic, or similar
  • Local experiments with chaos or failure-injection tutorials
  • Exposure to Kubernetes basics (pods, services, events)
  • Familiarity with incident roles (commander, comms, ops) and handoff basics
  • Curiosity about reliability patterns like circuit breakers and backoff
