Senior SRE Engineer - Gong.io
- Company: Gong.io
- Location: Tel Aviv District, Israel
- Technologies: AWS, CI/CD tools, Kubernetes
Responsibilities
Design, build, and maintain scalable, fault-tolerant systems.
Define and enforce reliability processes, SLOs, SLIs, and SLAs.
Lead complex incident responses, including on-call rotations and postmortems.
Tackle challenges related to observability, testing, production stability, and development productivity.
Drive reliability improvements through data-driven decisions.
Troubleshoot complex production incidents.
Build automation, tooling, and self-service capabilities.
Collaborate with engineering, product, and support teams to embed reliability into everything we do.
Mentor engineers and promote operational excellence across the organization.
Requirements
You have 7+ years of experience in SRE, DevOps, or Production Engineering roles, ideally in SaaS environments.
You have a deep understanding of distributed systems, failure modes, resiliency patterns, observability, and operating large-scale production services running on Kubernetes.
You are hands-on with building and owning monitoring tools.
You are experienced with CI/CD tools.
You are proficient with infrastructure-as-code tools.
You have solid experience with cloud platforms (AWS preferred).
Advantage: Experience with Java.