HiTakeJob

Senior Software Developer: Models Team (Token Factory) - Nebius

  • Company: Nebius
  • Location: Amsterdam, Netherlands
  • Technologies: Python and/or Go, Kubernetes

Responsibilities

  • Cache-aware routing
  • NUMA-aware deployments
  • KV-cache offloading
  • Disaggregated serving architectures
  • Autoscaling with high-speed model loading over InfiniBand / RoCE
  • Performance, quality, and smoke-testing frameworks
  • Hyperparameter optimization for inference framework configurations
  • Gibberish detection systems
  • Automated rollout pipelines for inference framework upgrades
  • Diagnostics and observability tooling
  • Traffic replay systems
  • Automated search for optimal serverless deployment configurations

Requirements

  • Experience serving LLMs in production
  • Strong Python and/or Go programming skills
  • Experience designing and operating highly scalable, highly available distributed services
  • Contributions to vLLM, SGLang, TRT-LLM, or NVIDIA ecosystem open-source projects
  • Deep understanding of KV cache management, speculative decoding, and quantization
  • Experience with LLM evaluation frameworks
  • Hands-on experience with performance benchmarking and optimization
  • Deep understanding of Kubernetes
  • Familiarity with distributed serving architectures and autoscaling
  • Knowledge of InfiniBand, RoCE, or high-performance networking

Benefits

  • Competitive compensation
  • Career growth and learning opportunities
  • Flexibility and ownership
  • Collaborative and innovative culture
  • Opportunity to work on impactful AI projects
  • International environment and talented teams