Senior AI Infrastructure & Platform Engineer - Riyadh, KSA
DeepSource Technologies
Date: 5 hours ago
City: Riyadh
Contract type: Full time
Role Overview
We are seeking a highly skilled Senior AI Infrastructure & Platform Engineer to join our client’s team in Riyadh. In this role, you’ll be responsible for building, managing, and optimizing scalable AI infrastructure and compute environments that support high-performance workloads, including GPU-accelerated AI/ML pipelines, cluster scheduling, and orchestration.
Key Responsibilities
- Deploy, maintain, and optimize GPU-based compute clusters and infrastructure.
- Manage and operate GPU orchestration tools and platforms such as:
  - Nvidia Base Command Manager (critical)
  - Nvidia AI Enterprise Suite
  - Nvidia GPU and Network Operators
  - Nvidia NIMs and Blueprints
- Configure, deploy, and maintain compute workloads using scheduling and orchestration tools including:
  - Slurm (critical)
  - Vanilla Kubernetes
- Install, configure, and maintain the underlying OS (e.g. Canonical Ubuntu) and supporting system software.
- Monitor and troubleshoot infrastructure performance, availability, and reliability; ensure high uptime for AI/ML workloads.
- Work with data scientists, ML engineers, and dev teams to define infrastructure requirements, resource allocation, and deployment workflows.
- Develop automation scripts, CI/CD pipelines, and best practices for infrastructure provisioning and management.
- Document architecture, configurations, and operational procedures; enforce security, compliance, and backup policies.
Requirements
Required Skills & Experience
- Proven experience managing GPU-based AI/ML infrastructure and compute clusters.
- Hands-on experience with:
  - Nvidia Base Command Manager
  - Nvidia AI Enterprise Suite
  - Nvidia GPU/Network Operators, NIMs, Blueprints
- Strong experience with Slurm and/or Kubernetes orchestration.
- Solid Linux system administration skills, preferably on Ubuntu or similar distributions.
- Strong scripting/automation ability (e.g. Bash, Python, or relevant tooling) for provisioning, deployment, and maintenance.
- Excellent troubleshooting and performance-tuning skills.
- Experience collaborating with ML/data science teams and integrating infrastructure with their workflows.
- Strong understanding of networking, security, resource allocation, and cluster management best practices.
Preferred Qualifications
- Previous experience working in a high-performance computing (HPC) or AI-focused infrastructure team.
- Knowledge of containerization, container orchestration, and GPUs in cloud or on-prem environments.
- Experience with CI/CD, infrastructure-as-code (e.g. Terraform, Ansible), monitoring tools, and logging setups.
- Familiarity with workload scheduling, job queuing, resource quotas, and shared-GPU environments.
How to apply
To apply for this job, you need to log in on our website. If you don't have an account yet, please register.