Kubernetes Orchestration Engineer – GPU Hypercomputing & AI Workloads

NETS-International Group


Date: 2 weeks ago
City: Riyadh, Saudi Arabia
Contract type: Contractor

Company Description

NETS is a leading global Solutions Provider and Systems Integrator dedicated to empowering the future through an integrated approach and a commitment to delivering Innovative, Intelligent, and Integrated Solutions (NETS 3 I’s) Effectively, Efficiently, and Economically (NETS 3 E’s). Our service portfolio covers three verticals: Infrastructure, Digital, and Managed Solutions. NETS Services include Access Networks (Fixed and Wireless), Enterprise Data Networks, Cloud Solutions, Cyber Security, Automation, Resource Outsourcing, and Managed Services. NETS brings over four decades of proven domain expertise, service specialization, and industry leadership, with more than 3,000 successful projects delivered. Our 1,000+ highly skilled professional staff, collaboration with over 50 leading global technology partners, 100+ NETS OEM Partners, and NETS Reach, with offices in the UK, UAE, USA, Saudi Arabia, and Pakistan, have made us the preferred trusted partner of over 200 long-standing customers, including Fortune 500 companies, across 25+ countries.

Job Description

Role Overview:

We are seeking a highly skilled Kubernetes Orchestration Engineer to lead the deployment and management of GPU-optimized Kubernetes environments that power AI/ML and hypercomputing workloads. This role is critical to ensuring scalable, reliable, and high-performance infrastructure across on-premises and hybrid cloud environments.

As a core member of our infrastructure engineering team, you will work at the intersection of container orchestration, GPU resource management, and AI application scaling, enabling large-scale distributed training and inference across GPU clusters.

Key Responsibilities

  • Deploy and manage production-grade Kubernetes clusters tailored for GPU-intensive workloads in AI/ML environments.
  • Build and maintain GPU node pools using the NVIDIA device plugin, CRI-O, or other container runtimes that support GPU scheduling.
  • Orchestrate containerized distributed model training using Kubernetes and frameworks like PyTorch, TensorFlow, or Hugging Face.
  • Design and implement Kubernetes Operators to automate scaling, health monitoring, and lifecycle management of AI/ML services.
  • Use Helm charts to deploy and scale GPU-accelerated applications across clusters.
  • Integrate observability tools such as Prometheus, Grafana, and Kibana to monitor GPU, memory, and pod resource utilization.
  • Configure and troubleshoot CNI plugins (e.g., Calico, Flannel, Cilium) to ensure high-performance pod networking.
  • Collaborate with DevOps, AI, and infrastructure teams to support hybrid and multi-cloud Kubernetes deployments.

Requirements

Required Skills & Expertise:

  • Strong experience with Kubernetes (K8s) and container orchestration in production environments.
  • Expertise in managing GPU workloads in Kubernetes using NVIDIA GPU Operator, vGPU, and device plugin configurations.
  • Proficiency with container runtimes such as Docker and CRI-O, and orchestration tools like Helm and Kubernetes Operators.
  • Solid understanding of networking within Kubernetes and service mesh integration (e.g., Istio, Linkerd).
  • Familiarity with hybrid/multi-cloud Kubernetes platforms (e.g., GKE, EKS, AKS).
  • Strong scripting and automation skills (e.g., YAML, Helm templating, Bash, Python).

Certifications

  • Certified Kubernetes Administrator (CKA) – Required
  • Certified Kubernetes Application Developer (CKAD) – Preferred
  • NVIDIA Certified Kubernetes Specialist – Nice to have
