HiTakeJob

Senior SRE Engineer - Gong.io

  • Company: Gong.io
  • Location: Tel Aviv District, Israel
  • Technologies: AWS, CI/CD tools, Kubernetes

Job Description

As a Senior SRE Engineer, you will design, build, and maintain scalable, fault-tolerant systems, and define and enforce reliability processes, SLOs, SLIs, and SLAs. You will lead complex incident responses, including on-call rotations and postmortems, and take on challenges related to observability, testing, production stability, and development productivity. The role involves driving reliability improvements through data-driven decisions, handling complex production incidents, and building automation, tooling, and self-service capabilities. You will collaborate with engineering, product, and support teams to embed reliability into everything we do, and mentor engineers to promote operational excellence across the organization.

You have 7+ years of experience in SRE, DevOps, or Production Engineering roles, ideally in SaaS environments. You have a deep understanding of distributed systems, failure modes, resiliency patterns, observability, and operating large-scale production services running on Kubernetes. You are hands-on with building and owning monitoring tools, experienced with CI/CD tools, and proficient with infrastructure-as-code tools. You have solid experience with cloud platforms (AWS preferred). Advantage: experience with Java.

Responsibilities

  • Design, build, and maintain scalable, fault-tolerant systems.
  • Define and enforce reliability processes, SLOs, SLIs, and SLAs.
  • Lead complex incident responses, including on-call rotations and postmortems.
  • Tackle challenges related to observability, testing, production stability, and development productivity.
  • Drive reliability improvements through data-driven decisions.
  • Handle complex production incidents.
  • Build automation, tooling, and self-service capabilities.
  • Collaborate with engineering, product, and support teams to embed reliability into everything we do.
  • Mentor engineers and promote operational excellence across the organization.

Requirements

  • 7+ years of experience in SRE, DevOps, or Production Engineering roles, ideally in SaaS environments.
  • Deep understanding of distributed systems, failure modes, resiliency patterns, observability, and operating large-scale production services running on Kubernetes.
  • Hands-on experience building and owning monitoring tools.
  • Experience with CI/CD tools.
  • Proficiency with infrastructure-as-code tools.
  • Solid experience with cloud platforms (AWS preferred).
  • Advantage: experience with Java.