Senior Big Data Engineer
BBI
Date: 2 weeks ago
City: Riyadh
Contract type: Full time
Responsibilities:
- Design, implement, and optimize data pipelines for batch and real-time data processing using Cloudera (Hadoop, Hive, Spark, Impala) and Informatica (PowerCenter, Cloud Data Integration).
- Build data extraction, transformation, and loading (ETL) workflows using Informatica PowerCenter for large-scale data integration from source systems (e.g., relational databases, flat files, APIs) into Cloudera Data Lake or data warehouse environments.
- Implement Spark jobs on Cloudera for distributed data processing and optimization of data workflows.
- Leverage Informatica for orchestrating ETL workflows, including data extraction, cleansing, transformation, and loading into data repositories (HDFS, Hive, SQL databases, etc.).
- Optimize Informatica workflows to minimize runtime, ensure smooth data integration, and maintain high data quality.
- Utilize Hadoop and Spark on Cloudera to process large datasets and implement data transformations using MapReduce, Spark SQL, and PySpark.
- Leverage Impala for low-latency SQL queries on Hadoop, ensuring real-time access to processed data.
- Implement partitioning, bucketing, and indexing strategies in Hive and HBase to improve query performance on large datasets.
- Implement and enforce data quality rules within Informatica workflows, ensuring that all transformations meet the required standards for completeness, consistency, and accuracy.
- Ensure compliance with data governance and security protocols (e.g., encryption, masking, access control) in accordance with industry best practices.
- Automate ETL workflows using Informatica Server, integrating with Airflow, NiFi, or other workflow orchestration tools for scheduling and monitoring jobs.
- Utilize Cloudera Navigator for monitoring and auditing data processes within the Hadoop ecosystem.
- Perform regular tuning of the ETL pipelines, data flows, and SQL queries to ensure optimal performance.
Requirements:
- Bachelor’s degree in Computer Science, Engineering, or related field.
- 6+ years of experience in the same field.
- Proven experience with the Cloudera Distribution of Hadoop (CDH), including expertise in HDFS, Hive, Impala, Spark, and HBase.
- Strong hands-on experience with Informatica PowerCenter (ETL), EDC, IDQ, B2B, and Axon.
- Deep understanding of ETL best practices, data pipelines, and distributed computing technologies such as Spark, MapReduce, PySpark, and Hadoop ecosystem components.
- Advanced SQL skills for data manipulation, aggregation, optimization, and reporting across relational and non-relational data stores (e.g., SQL Server, MySQL, PostgreSQL, Hive, Impala).
- Experience in Python and SQL.
- Strong background in data warehousing principles and data modeling, including dimensional modeling (star schema, snowflake schema) and OLAP/OLTP considerations.
How to apply
To apply for this job, you need to log in on our website. If you don't have an account yet, please register.