Data Engineer

  • Gurugram
  • Avrio
Overview

Avrio is an international climate tech startup leveraging advances in AI to decarbonize the real estate industry, which is responsible for around 39% of global carbon emissions. As a Data Engineer at Avrio, you will play a crucial role in developing AI-powered solutions that enhance our product offerings and improve our clients' business outcomes. You will work closely with data scientists and cross-functional teams to build and operate the scalable data pipelines and machine learning infrastructure that are critical to our products' success.

Responsibilities

  • Design, develop, and maintain scalable data pipelines and ETL processes to ingest, transform, and store large volumes of structured and unstructured data from diverse sources.
  • Implement optimal storage solutions for time series data to ensure efficient access and query performance (a storage sketch follows this list).
  • Develop and maintain a workflow orchestration system for efficiently running ETL and other machine learning pipelines. Implement scheduling, monitoring, and error-handling mechanisms to ensure the reliability and robustness of data processing workflows (see the DAG sketch after this list).
  • Manage infrastructure for alerting and recommendation systems, ensuring timely delivery of actionable insights to clients. Design and deploy scalable solutions for real-time monitoring, anomaly detection, and predictive analytics.
  • Collaborate with cross-functional teams, including data scientists, software engineers, and domain experts, to understand business requirements and translate them into scalable data engineering solutions.
  • Stay abreast of emerging technologies and best practices in data engineering, workflow orchestration, and infrastructure management. Evaluate and adopt new tools and frameworks to enhance efficiency, scalability, and reliability.
  • Provide technical guidance and mentorship to junior team members, fostering a culture of collaboration, innovation, and continuous learning.
  • Document data engineering processes, workflows, and infrastructure configurations. Ensure compliance with data governance policies, security standards, and regulatory requirements.
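To make the time-series storage responsibility concrete, here is a minimal sketch of date-partitioned Parquet storage using PySpark (Apache Spark is one of the frameworks listed in the qualifications). The bucket paths, the `ts` column, and the sensor-reading dataset are hypothetical illustrations, not an actual Avrio schema.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sensor-ingest").getOrCreate()

# Hypothetical source: raw building-sensor readings with a `ts` timestamp column.
readings = spark.read.json("s3://example-bucket/raw/sensor-readings/")

# Derive a date column so time-range queries can prune most files.
partitioned = readings.withColumn("event_date", F.to_date("ts"))

# Write as Parquet partitioned by date; readers filtering on event_date
# only scan the matching partitions, which keeps time-window queries fast.
(
    partitioned.write
    .mode("append")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/curated/sensor-readings/")
)
```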
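Similarly, a minimal sketch of the scheduling, retry, and alerting mechanics described above, assuming a recent Apache Airflow (one of the orchestration tools listed in the qualifications). The DAG id, task bodies, and notification hook are placeholders, not a prescribed implementation.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_on_failure(context):
    # Placeholder alert hook; a real deployment might post to Slack or page
    # an on-call engineer. `context` carries task-instance metadata.
    print(f"Task {context['task_instance'].task_id} failed")


def extract():
    ...  # pull raw data from source systems (hypothetical step)


def transform():
    ...  # clean and reshape the extracted data (hypothetical step)


def load():
    ...  # persist the transformed data to storage (hypothetical step)


with DAG(
    dag_id="sensor_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={
        "retries": 3,                       # retry transient failures
        "retry_delay": timedelta(minutes=5),
        "on_failure_callback": notify_on_failure,
    },
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Linear dependency chain: extract, then transform, then load.
    t_extract >> t_transform >> t_load
```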
Qualifications

  • Bachelor's degree or higher in Computer Science, Engineering, or a related field.
  • Proven experience in data engineering, ETL development, and infrastructure management, preferably in a cloud-based environment.
  • Proficiency in programming languages such as Python, SQL, Java, or Scala, and experience with data processing frameworks like Apache Spark, Apache Beam, or Apache Flink.
  • Strong understanding of distributed computing principles, containerization technologies (e.g., Docker, Kubernetes), and cloud services (e.g., AWS, Azure, GCP).
  • Experience with workflow orchestration tools such as Apache Airflow, Luigi, or Prefect.
  • Knowledge of alerting systems (e.g., Prometheus, Grafana) and recommendation engines.
  • Excellent problem-solving skills, attention to detail, and the ability to work independently and collaboratively in a fast-paced environment.
  • Strong communication and interpersonal skills, with the ability to explain technical concepts to both technical and non-technical stakeholders.

Culture Fit

  • Autonomy: We think your time is sacred, whether it's at work or outside of work.
  • Ownership: We have a high-ownership, high-autonomy culture. We hope that you'll come in, help us, and over the course of many years do the best work of your life. When we bring you on board, we expect you to change the company.
  • New hard problems: We are at the frontier of our industry, building cutting-edge solutions. You will constantly have new, interesting, and hard problems to solve.
  • Growth: We are committed to your professional development. The dynamic nature of our work provides abundant opportunities for personal and career growth.