Site Reliability Engineer
Thales
We are seeking a skilled DevSecOps / Sovereign Cloud Engineer with 5-10 years of experience to design, operate, and secure regulated cloud environments across AWS and GCP. The role combines cloud operations, security engineering, and SRE practices, including Kubernetes management, Infrastructure as Code (Terraform), CI/CD automation (Git/GitLab), and secrets management (HashiCorp Vault). The ideal candidate will drive security incident response, platform reliability, observability (Datadog), and 24x7 incident management while embedding security-by-design principles, automation-first practices, and continuous improvement into sovereign cloud platforms. Strong expertise in cloud security, networking fundamentals, compliance-driven environments, and cross-functional incident leadership is essential.
A week in the life of a DevSecOps Engineer:
- Deploy, manage and maintain the organization’s Sovereign Cloud strategy, ensuring compliance with regulatory and data residency requirements on AWS/GCP public cloud.
- Manage and operate Kubernetes clusters, including upgrades, scaling, and workload optimization.
- Security Incident Response & Risk Mitigation – Investigate, analyze, and remediate cloud security incidents, proactively identifying and mitigating vulnerabilities within AWS environments.
- Secure Cloud Strategy & Continuous Improvement – Support and enhance the organization’s AWS cloud strategy by embedding security best practices and continuously improving the cloud security posture.
- Security Automation & DevSecOps Enablement – Implement and tune security tools, automate policy-driven responses, and advocate DevSecOps practices to ensure secure-by-design cloud operations.
- Implement and manage Infrastructure as Code (Terraform) to provision, modify, and secure cloud resources.
- Maintain and optimize CI/CD pipelines using Git/GitLab, ensuring secure and automated deployments.
- Manage secrets and secure access using HashiCorp Vault, including token lifecycle, access policies, and secrets rotation.
- Troubleshoot complex infrastructure, networking, container, and performance issues across distributed systems.
- Observability & Reliability Engineering – Monitor system health and performance using Datadog, define and manage SLIs/SLOs, and drive continuous reliability improvements aligned with SRE principles.
- Incident Management & Operational Governance – Manage 24x7 alerting and incident response through PagerDuty, perform root cause analysis (RCA), and actively contribute to incident, problem, and change management processes.
- Cloud Security & Performance Optimization – Conduct proactive system hardening, vulnerability remediation, performance tuning, and capacity planning across cloud environments.
- Automation & Continuous Improvement – Develop automation using Python/Bash/Terraform/Ansible to reduce manual effort, improve operational efficiency, and strengthen platform resilience.
Knowledge, Skills and Experience:
To succeed in this role, you must have:
- 4–5 years of hands-on experience in DevSecOps or SRE roles.
- Proven hands-on expertise managing Kubernetes clusters.
- Experience with Terraform (IaC) deployments and with Git/GitLab for CI/CD and version control, following best practices.
- Experience working with HashiCorp Vault, Terraform, Datadog, PagerDuty, Confluence, and similar tools.
- Strong understanding of Cloud Security principles (IAM, encryption, network security, container security, vulnerability management).
- Experience in incident management, change management, and root cause analysis processes.
- SRE & Automation Expertise – Strong understanding of SRE principles (SLIs, SLOs, error budgets, reliability metrics) with hands-on scripting experience in Python and/or Bash to drive automation-first practices.
- Networking & Compliance Knowledge – Solid grasp of networking fundamentals (TCP/IP, DNS, load balancing, firewalls, VPNs, private endpoints) with experience or exposure to regulated and compliance-driven environments.
- Ability to lead incident bridges during P1/P2 outages, coordinating cross-functional teams and driving timely resolution with clear RCA.
- A strong customer- and business-focused mindset when prioritizing tasks.
- Cloud certifications are a plus.
- Knowledge of Linux, Kubernetes, Terraform, Ansible, and Google Cloud is mandatory.