Senior AI Infrastructure & Platform Engineer - Riyadh, KSA
DeepSource Technologies
Date: 2 weeks ago
City: Riyadh
Contract type: Full time
Role Overview
We are seeking a highly skilled Senior AI Infrastructure & Platform Engineer to join our client’s team in Riyadh. In this role, you’ll be responsible for building, managing, and optimizing scalable AI infrastructure and compute environments that support high-performance workloads, including GPU-accelerated AI/ML pipelines, cluster scheduling, and orchestration.
Key Responsibilities
- Deploy, maintain, and optimize GPU-based compute clusters and infrastructure.
- Manage and operate GPU orchestration tools and platforms such as:
  - Nvidia Base Command Manager (critical)
  - Nvidia AI Enterprise Suite
  - Nvidia GPU and Network Operators
  - Nvidia NIMs and Blueprints
- Configure, deploy, and maintain compute workloads using scheduling and orchestration tools including:
  - Slurm (critical)
  - Vanilla Kubernetes
- Install, configure, and maintain the underlying OS (e.g. Canonical Ubuntu) and supporting system software.
- Monitor and troubleshoot infrastructure performance, availability, and reliability; ensure high uptime for AI/ML workloads.
- Work with data scientists, ML engineers, and dev teams to define infrastructure requirements, resource allocation, and deployment workflows.
- Develop automation scripts, CI/CD pipelines, and best practices for infrastructure provisioning and management.
- Document architecture, configurations, and operational procedures; enforce security, compliance, and backup policies.
Requirements
Required Skills & Experience
- Proven experience managing GPU-based AI/ML infrastructure and compute clusters.
- Hands-on experience with:
  - Nvidia Base Command Manager
  - Nvidia AI Enterprise Suite
  - Nvidia GPU/Network Operators, NIMs, Blueprints
- Strong experience with Slurm and/or Kubernetes orchestration.
- Solid Linux system administration skills — preferably on Ubuntu or similar distributions.
- Strong scripting/automation ability (e.g. Bash, Python, or relevant tooling) for provisioning, deployment, and maintenance.
- Excellent troubleshooting and performance-tuning skills.
- Experience collaborating with ML/data science teams and integrating infrastructure with their workflows.
- Strong understanding of networking, security, resource allocation, and cluster management best practices.
Preferred Qualifications
- Previous experience working in a high-performance computing (HPC) or AI-focused infrastructure team.
- Knowledge of containerization, container orchestration, and GPUs in cloud or on-prem environments.
- Experience with CI/CD, infrastructure-as-code (e.g. Terraform, Ansible), monitoring tools, and logging setups.
- Familiarity with workload scheduling, job queuing, resource quotas, and GPU-shared environments.