Senior DevOps Engineer, Platform Engineering - DriveNets

  • Company: DriveNets
  • Location: Tel Aviv
  • Job type: Hybrid
  • Technologies: Python, GitHub Actions, Kubernetes, Terraform, Helm, Docker, AWS

Job Description

About the Company

DriveNets is a leader in large-scale networking solutions for AI infrastructure and service providers. The company's disaggregated networking architecture transforms the economics of large-scale infrastructures while maximizing performance, utilization, and operational efficiency. Its high-performance AI fabric maximizes GPU utilization and accelerates deployments by optimizing the AI stack end-to-end, resulting in higher tokens-per-second and lower cost-per-token. DriveNets' solutions power production networks for global tier-1 operators like AT&T and Comcast, and scale multi-vendor AI infrastructures at foundation model labs, NeoClouds, and enterprises.


Responsibilities

- Design, build, and operate the internal engineering platform powering DriveNets' build, test, deployment, and security validation workflows at scale

- Write and maintain production-grade Python and shell tooling that drives platform automation — this is a hands-on coding role, not just pipeline configuration

- Architect and manage hybrid cloud/on-prem execution infrastructure, including large-scale Kubernetes runner pools across multiple AWS regions

- Own and evolve CI/CD pipelines at scale using GitHub Actions, including reusable workflows, ARC-based runner orchestration, and build caching strategies (BuildKit, sccache, Valkey)

- Operate and tune DinD environments (Sysbox, EBS/NVMe, overlay storage, MTU/networking) for build, test, and release workloads

- Connect and manage self-hosted and on-prem runners, routing physical device (wbox) test jobs by site and device type

- Implement DevSecOps controls including least-privilege IAM, OIDC, isolated runner groups, container signing, and automated security scans

- Drive platform observability, cost optimization, and reliability improvements across the engineering infrastructure

- Collaborate cross-functionally with hundreds of engineers to improve engineering velocity and release confidence

- Take end-to-end ownership of complex infrastructure problems and drive them to resolution

Requirements

Technical Skills

- 5+ years of hands-on DevOps experience with a strong software development background — prior development experience is a must

- B.Sc. in Computer Science or equivalent practical experience

- Strong programming skills in Python (or a similar high-level language); ability to write and own production tooling

- Proven experience designing and building scalable systems, automation frameworks, and infrastructure as code using Terraform and Helm

- Solid understanding of Linux, containers (Docker), and Git-based workflows

- Hands-on experience with CI/CD at scale using GitHub Actions or similar — including reusable actions, workflow design, and automation frameworks

- Deep experience with hybrid cloud infrastructure (AWS and on-prem), including EKS, ARC, Karpenter, ECR, S3, Direct Connect, VPC endpoints, IAM/OIDC, and Secrets Manager

- Experience operating spot and on-demand runner pools for builds, DinD tests, releases, and security scans across multiple AWS regions

- Experience with DinD environments (Sysbox, EBS/NVMe, memory limits, overlay storage, MTU/networking) and build caching (BuildKit, sccache, Valkey)

- Experience connecting on-prem/self-hosted runners and routing physical device (wbox) test jobs by site and device type

- Experience implementing DevSecOps controls and improving platform observability, cost efficiency, and reliability

- Platform & tooling familiarity: Kubernetes (EKS, on-prem) · GitHub Actions · ARC · Karpenter · Terraform · Helm · Docker/DinD · Sysbox · containerd · BuildKit · ECR · S3 · ElastiCache (Valkey) · sccache · Direct Connect · VPC endpoints · IAM/OIDC · Secrets Manager · self-hosted runners


Soft Skills

- Strong system-level thinking and troubleshooting skills; able to diagnose and resolve complex infrastructure issues independently

- Takes end-to-end ownership and drives problems to resolution without hand-holding

- Excellent communication and cross-team collaboration skills; comfortable working alongside large engineering organizations


Nice to Have / Advantage

- Experience with Jenkins

- Familiarity with GitHub merge queue

- Experience with MinIO or on-prem S3 caching

- Hardware-in-the-loop CI experience

- MTU/VPC networking tuning expertise

- Monorepo CI optimization experience