Senior AI Infrastructure & Platform Engineer - Riyadh, KSA

DeepSource Technologies


Date: 2 weeks ago
City: Riyadh
Contract type: Full time

Role Overview

We are seeking a highly skilled Senior AI Infrastructure & Platform Engineer to join our client’s team in Riyadh. In this role, you’ll be responsible for building, managing, and optimizing scalable AI infrastructure and compute environments that support high-performance workloads, including GPU-accelerated AI/ML pipelines, cluster scheduling, and orchestration.

Key Responsibilities

  • Deploy, maintain, and optimize GPU-based compute clusters and infrastructure.
  • Manage and operate GPU orchestration tools and platforms such as:
    • Nvidia Base Command Manager (critical)
    • Nvidia AI Enterprise Suite
    • Nvidia GPU and Network Operators
    • Nvidia NIMs and Blueprints
  • Configure, deploy, and maintain compute workloads using scheduling and orchestration tools including:
    • Slurm (critical)
    • Vanilla Kubernetes
  • Install, configure, and maintain the underlying OS (e.g. Canonical Ubuntu) and supporting system software.
  • Monitor and troubleshoot infrastructure performance, availability, and reliability; ensure high uptime for AI/ML workloads.
  • Work with data scientists, ML engineers, and dev teams to define infrastructure requirements, resource allocation, and deployment workflows.
  • Develop automation scripts, CI/CD pipelines, and best practices for infrastructure provisioning and management.
  • Document architecture, configurations, and operational procedures; enforce security, compliance, and backup policies.

Requirements

Required Skills & Experience

  • Proven experience managing GPU-based AI/ML infrastructure and compute clusters.
  • Hands-on experience with:
    • Nvidia Base Command Manager
    • Nvidia AI Enterprise Suite
    • Nvidia GPU and Network Operators, NIMs, and Blueprints
  • Strong experience with Slurm and/or Kubernetes orchestration.
  • Solid Linux system administration skills, preferably on Ubuntu or similar distributions.
  • Strong scripting/automation ability (e.g. Bash, Python, or relevant tooling) for provisioning, deployment, and maintenance.
  • Excellent troubleshooting and performance-tuning skills.
  • Experience collaborating with ML/data science teams and integrating infrastructure with their workflows.
  • Strong understanding of networking, security, resource allocation, and cluster management best practices.

Preferred Qualifications

  • Previous experience working in a high-performance computing (HPC) or AI-focused infrastructure team.
  • Knowledge of containerization, container orchestration, and GPUs in cloud or on-prem environments.
  • Experience with CI/CD, infrastructure-as-code (e.g. Terraform, Ansible), monitoring tools, and logging setups.
  • Familiarity with workload scheduling, job queuing, resource quotas, and shared-GPU environments.
