Senior Software Developer: Models Team (Token Factory) - Nebius
- Company: Nebius
- Location: Amsterdam, Netherlands
- Technologies: Python and/or Go programming skills, deep understanding of Kubernetes
Job description
Cache-aware routing
NUMA-aware deployments
KV-cache offloading
Disaggregated serving architectures
Autoscaling with high-speed model loading over InfiniBand / RoCE
Performance, quality, and smoke-testing frameworks
Hyperparameter optimization for inference framework configurations
Gibberish detection systems
Automated rollout pipelines for inference framework upgrades
Diagnostics and observability tooling
Traffic replay systems
Automated search for optimal serverless deployment configurations
Experience serving LLMs in production
Strong Python and/or Go programming skills
Experience designing and operating highly scalable, highly available distributed services
Contributions to vLLM, SGLang, TRT-LLM, or NVIDIA ecosystem open-source projects
Deep understanding of KV cache management, speculative decoding, and quantization
Experience with LLM evaluation frameworks
Hands-on experience with performance benchmarking and optimization
Deep understanding of Kubernetes
Familiarity with distributed serving architectures and autoscaling
Knowledge of InfiniBand, RoCE, or high-performance networking
Competitive compensation
Career growth and learning opportunities
Flexibility and ownership
Collaborative and innovative culture
Opportunity to work on impactful AI projects
International environment and talented teams