OCI GPU Specialist

Oracle


Date: 13 hours ago
City: Riyadh
Contract type: Full time
Job Description

Oracle is seeking an OCI GPU Black Belt to drive customer success in designing, deploying, and optimizing large-scale AI and HPC workloads on Oracle Cloud Infrastructure (OCI). This role combines deep technical expertise in NVIDIA GPUs, distributed training and inference frameworks, benchmarking and performance tuning, RLHF pipelines, and end-to-end solution delivery.

The OCI GPU Black Belt will work in close collaboration with our sales, marketing, and technical teams to drive revenue growth and accelerate market penetration for our NVIDIA GPU compute services.

The role is also critical to progressing business opportunities: delivering technical workshops and demonstrations and providing support during proofs-of-concept to drive cloud consumption and overall revenue growth.

What You’ll Do

  • You will engage directly with the customer to understand their requirements for NVIDIA GPUs on Oracle Cloud to run their AI infrastructure, graphics, and HPC workloads. With your understanding of the customer's requirements, you will build a comprehensive and effective solution design.
  • You will lead the solution design within a highly collaborative virtual team that includes engineering/product management, capacity planning, and external partners. With the team, you will map the requirements onto Oracle Cloud services, define a solution design, and, if requested by the customer, offer hands-on support during the proof-of-concept phase to deploy and test the proposed solution. After the PoC, you will advise and guide the customer/partner on running and maintaining their NVIDIA GPU workload on Oracle Cloud according to our best practices.
  • Deliver technical workshops, proofs-of-concept (PoCs), and demos, collaborating closely with sales, engineering, and customer teams to validate end-to-end solutions and accelerate cloud adoption.
  • Optimize end-to-end AI workloads by analyzing hardware bottlenecks (GPU utilization, memory bandwidth, network latency), applying NVLink/InfiniBand interconnects, RDMA storage solutions, and tuning parallel libraries (MPI, CUDA) for peak efficiency.
  • Deploy and scale HPC clusters for engineering, scientific, and financial simulations, configuring compute nodes, high-speed networking, and shared file systems to meet performance SLAs.
  • Lead the architecture and deployment of scalable inference platforms, leveraging containerized microservices on Kubernetes and OCI GPU instances to meet low-latency, high-throughput requirements.
  • Design and implement distributed training pipelines using frameworks such as DeepSpeed and Fully Sharded Data Parallel (FSDP) to accelerate model convergence at scale.
  • Develop benchmarking and profiling solutions to measure training and inference performance, using mixed-precision, model- and data-parallel strategies, and generate actionable insights through dashboards and automated reports.
  • Guide customers in model selection and evaluation, comparing architectures (e.g., Transformers, CNNs) against workload requirements and resource constraints to optimize cost and performance.
  • Contribute to Oracle’s internal expert community, documenting best practices, co-authoring solution blueprints, and mentoring peers on AI infrastructure design.
  • Stay current with emerging AI infrastructure technologies, present at industry events, and represent Oracle as a technology evangelist at conferences and in customer forums.


Required Skills & Experience

  • Excellent communication and presentation skills, with a high degree of comfort speaking across all levels of management (e.g., IT management, architects, administrators, and executives).
  • 5+ years of hands-on experience in AI/ML infrastructure or HPC, architecting and operating large-scale GPU-accelerated environments for training and inference.
  • Deep proficiency with NVIDIA GPU technologies (CUDA, cuDNN), high-speed interconnects and RDMA networking (NVLink, InfiniBand), and cluster orchestration tools.
  • Expertise in distributed training and inference frameworks: PyTorch, TensorFlow, DeepSpeed, FSDP, and model parallel toolkits.
  • Strong background in performance optimization techniques: mixed-precision training, gradient compression, asynchronous updates, and communication overlap to maximize throughput.
  • Familiarity with cloud-native practices: Docker, Kubernetes, Terraform, monitoring stacks (Prometheus, Grafana), and CI/CD for infrastructure.
  • Solid understanding of cloud architecture principles: networking, security, resilience, and cost optimization on OCI or comparable public clouds.
  • Excellent communication and presentation skills, with proven ability to engage technical and executive audiences, lead virtual teams, and influence senior stakeholders.
  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related technical field.


This role offers the opportunity to shape Oracle’s AI/ML portfolio, drive revenue growth through technical leadership, and collaborate with customers to unlock the full potential of GPU-accelerated AI and HPC solutions.

At Oracle, we don’t just respect differences—we celebrate them. We believe that innovation starts with inclusion and to create the future we need people with diverse backgrounds, perspectives, and abilities. That’s why we’re committed to creating a workplace where all kinds of people can do their best work. It’s when everyone’s voice is heard and valued that we’re inspired to go beyond what’s been done before.

We expressly encourage disabled candidates to apply for this position. Please therefore feel free to voluntarily inform us in your application about any severe disability (degree of disability of at least 50%) or any equal status (degree of disability of at least 30% together with official decision on equality) in accordance with the German SGB IX.

Qualifications

Career Level - IC4

About Us

As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s challenges. We’ve partnered with industry leaders in almost every sector, and we continue to thrive after 40+ years of change by operating with integrity.

We know that true innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing an inclusive workforce that promotes opportunities for all.

Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing [email protected] or by calling +1 888 404 2494 in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
