DevOps Engineer

  • Bengaluru
  • Tata Communications

DevOps - MLOps (Chatbot / iCPaaS)


Location: Open


About the role:

We are seeking a highly skilled MLOps engineer to take on a crucial role in our innovative AI-as-a-Service (AIaaS) product team. Our AIaaS platform is designed to empower our suite of products with cutting-edge AI capabilities, and we're looking for an experienced person to support the rest of the engineering team in implementing and optimizing this new AI platform on AWS and Kubernetes.


Given the product's critical nature, we set a high bar for performance, scalability, and reliability. We expect our AI Developers to have the maturity and foresight to prioritize a production-ready product that scales seamlessly while maintaining high standards of observability and user experience.


Responsibilities:

Design, implement, and maintain robust MLOps infrastructure and pipelines to support AI model training, deployment, and inference at scale.

Collaborate closely with AI Developers to integrate AI technologies seamlessly into our AIaaS platform, ensuring the platform's architecture supports innovative AI features efficiently.

Apply your extensive knowledge of AWS, Kubernetes, and microservices architecture to architect and optimize our cloud infrastructure for high availability, fault tolerance, and auto-scaling.

Implement best practices in continuous integration and continuous deployment (CI/CD) pipelines, ensuring rapid and reliable delivery of features.

Spearhead initiatives to improve system observability, including monitoring, logging, and alerting, to preemptively address potential system issues.

Conduct in-depth performance tuning and related optimization work, including but not limited to cost optimization, security compliance, and network architecture refinement.

Engage in proactive research and adoption of emerging technologies and methodologies that can enhance the performance and capabilities of our AIaaS offerings.

Ensure compliance with security best practices and data protection laws, particularly in configuring and managing cloud resources.

Foster a culture of excellence and continuous improvement within the team, mentoring junior team members and sharing knowledge on MLOps best practices.


Qualifications:

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience in SRE and DevOps roles.

At least 7 years of experience in SRE and DevOps, with a strong emphasis on cloud technologies, containerization, and microservices architectures.

Deep expertise in AWS, Kubernetes, and microservices architecture is essential.

Proficiency in implementing CI/CD pipelines, with extensive hands-on experience in tools like Jenkins and GitHub Actions.

Solid understanding of the AI development lifecycle and familiarity with AI technologies, including machine learning, NLP, and generative AI models.

Demonstrable experience with system monitoring, logging, and alerting tools, with a keen focus on maintaining high system reliability and performance.

Strong analytical and problem-solving skills, capable of handling complex technical challenges and driving performance optimization initiatives.

Excellent communication skills, able to articulate technical concepts clearly to both technical and non-technical stakeholders.

Self-motivated and adaptable, with a startup mentality and a strong desire to contribute to a dynamic and innovative team environment.