Data Engineer

  • Gurugram
  • Exl

Job Location: Bangalore/Gurgaon

Shift Timing: 12:00 PM IST – 10:30 PM IST

Experience: 3+ years


Job Summary:

The Data Engineer (DE) is responsible for designing, developing, and maintaining data assets and data-related products by liaising with multiple stakeholders.

Responsibilities:

  • Collaborate with project stakeholders (clients) to identify product and technical requirements, and conduct analysis to determine integration needs.
  • Apply data warehousing concepts to build a data warehouse for reporting purposes.
  • Build data pipelines to ingest and transform data into the data platform.
  • Apply best practices for large-scale data movement, change data capture, and incremental data load strategies (see the sketch after this list).
  • Develop, implement, and tune large-scale distributed systems and pipelines that process large volumes of data.
  • Assist Data Science / Modelling teams in setting up data pipelines and monitoring daily jobs.
  • Develop and test ETL components to high standards of data quality and act as hands-on development lead.
  • Oversee and contribute to the creation and maintenance of relevant data artifacts (data lineages, source-to-target mappings, high-level designs, interface agreements, etc.).
  • Ensure developer responsibilities are met by mentoring, reviewing code and test plans, and verifying adherence to design best practices as well as coding and architectural guidelines, standards, and frameworks.
  • Work with stakeholders to understand data requirements and design, develop, and maintain complex ETL processes.
  • Create data integration and data diagram documentation.
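To illustrate the incremental load and change-data-capture responsibility above, here is a minimal PySpark and Delta Lake sketch. All table and column names (`raw.orders_staging`, `analytics.orders`, `etl_control.load_watermarks`, `order_id`, `last_updated_ts`) are hypothetical, and this shows just one common watermark-plus-merge pattern, not a prescribed implementation.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical names: a staging table of newly arrived records and a curated target table.
STAGING_TABLE = "raw.orders_staging"
TARGET_TABLE = "analytics.orders"

# Read only records that changed since the last successful load
# (watermark kept in a hypothetical control table).
last_watermark = (
    spark.table("etl_control.load_watermarks")
    .filter(F.col("table_name") == TARGET_TABLE)
    .agg(F.max("last_loaded_ts"))
    .collect()[0][0]
)
if last_watermark is None:
    # First run: take everything currently in staging.
    last_watermark = "1900-01-01 00:00:00"

changes = spark.table(STAGING_TABLE).filter(F.col("last_updated_ts") > F.lit(last_watermark))

# Merge (upsert) the captured changes into the target Delta table, keyed on order_id.
target = DeltaTable.forName(spark, TARGET_TABLE)
(
    target.alias("t")
    .merge(changes.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```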

Qualifications (Must have):

  • 3+ years as a Data Engineer with proficiency in SQL, Python, and PySpark programming.
  • Strong knowledge of Databricks and its related services and functionality, and how to utilize them across the data engineering and analytics spectrum.
  • Strong knowledge of Hadoop, Hive, Databricks, and RDBMSs such as Oracle, Teradata, and SQL Server.
  • Ability to write SQL to query metadata and tables across different data management systems such as Oracle, Hive, Databricks, and Greenplum (see the example after this list).
  • Familiarity with big data technologies like Hadoop, Spark, and distributed computing frameworks.
  • Ability to use Hue to run Hive SQL queries and to schedule Apache Oozie jobs that automate data workflows.
  • Degree in Data Science, Statistics, Computer Science, or a related field, or an equivalent combination of education and experience.
  • Proficiency in at least one cloud platform (AWS, Azure, or GCP) and in developing ETL processes using ETL tools, big data processing, and analytics with Databricks.
  • Expertise in building data pipelines on big data platforms; good understanding of data warehousing concepts.
  • Good working experience communicating with stakeholders and collaborating effectively with business teams on data testing.
  • Strong problem-solving and troubleshooting skills.
  • Ability to establish comprehensive data quality test cases and procedures, and to implement automated data validation processes.
  • Strong communication, problem-solving, and analytical skills, with the ability to manage time and multitask with attention to detail and accuracy.
  • Strong business acumen and a demonstrated aptitude for analytics that drive action.
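To make the metadata-querying expectation concrete, here is a minimal PySpark sketch, assuming a Spark or Databricks session attached to a Hive metastore; the schema `sales_db` and table `orders` are hypothetical. The same SQL statements could equally be run against Hive from Hue.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# List the tables registered under a (hypothetical) schema via the metastore.
spark.sql("SHOW TABLES IN sales_db").show(truncate=False)

# Inspect column-level metadata for one table.
spark.sql("DESCRIBE TABLE sales_db.orders").show(truncate=False)

# Query the table itself; the same SQL runs on Hive (e.g. from Hue) or Databricks.
daily_counts = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count
    FROM sales_db.orders
    GROUP BY order_date
    ORDER BY order_date
""")
daily_counts.show()
```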

Qualifications (Preferred):

  • Good experience building real-time streaming data pipelines with Kafka, Kinesis, etc. (see the sketch after this list).
  • Knowledge of Jinja/YAML templating in Python.
  • Knowledge and experience in designing and developing RESTful services.
  • Working knowledge of DevOps methodologies, including designing CI/CD pipelines.
  • Experience building distributed architecture-based systems, especially handling large data volumes and real-time distribution.
  • Initiative and problem-solving skills when working independently.
  • Familiarity with Big Data Design Patterns, modelling, and architecture.
  • Exposure to NoSQL databases and cloud-based data transformation technologies.
  • Understanding of object-oriented design principles.
  • Knowledge of enterprise integration patterns.
  • Experience with messaging middleware, including queues, pub-sub channels, and streaming technologies.
  • Expertise in building high-performance, highly scalable, cloud-based applications.
  • Experience with SQL and NoSQL databases.
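As a small illustration of the preferred streaming experience, below is a minimal Spark Structured Streaming sketch that reads from Kafka and appends to a Delta table. The broker address, topic name, and paths are hypothetical, and a production pipeline would add schema enforcement, error handling, and monitoring.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read a (hypothetical) Kafka topic as a streaming DataFrame.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "orders_events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the value to string for downstream parsing.
parsed = events.select(
    F.col("key").cast("string").alias("event_key"),
    F.col("value").cast("string").alias("event_payload"),
    F.col("timestamp").alias("event_time"),
)

# Continuously append parsed events to a Delta table, with checkpointing for fault tolerance.
query = (
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/orders_events")
    .outputMode("append")
    .start("/tmp/tables/orders_events")
)

query.awaitTermination()
```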