Senior Site Reliability Engineer - SDE III (5-10 years / AWS / Kubernetes)

  • Gurugram
  • Peoplegene

Responsibilities:


  • Design and develop processes to support continuous integration and delivery
  • Lead, architect, and develop one or more major areas of the platform, which comprises multiple components; oversee project teams as required
  • Maintain an obsessive focus on automation, building repeatable solutions that work within and across development and production environments
  • Build automation tools for creating, provisioning, and deploying servers
  • Configure and monitor AWS/GCP infrastructure (load balancers, firewalls, VPCs, instances, etc.)
  • Help automate systems for 24x7 monitoring and failure recovery
  • Ensure proper security, monitoring, alerting, and reporting for the infrastructure and services
  • Bring expertise in troubleshooting application, database, and networking performance issues and failures
  • Continually identify areas for process improvement in the production environment and develop appropriate resolutions
  • Implement release deployment standards


Good to have:

  • 5+ years of work experience in a Linux-based DevOps role
  • Proficiency in leveraging CI/CD tools to automate testing and deployment
  • Deployment experience with Capistrano, Chef, Ansible, Fabric, Packer, or Terraform
  • Strong proficiency with the Unix command line, shell scripting, and configuring systems monitoring tools
  • Knowledge of networking and software-defined networking in cloud environments
  • Experience automating code deployment on AWS and/or other cloud providers such as Google Cloud or Microsoft Azure
  • Experience with configuration management, monitoring, and automation tools
  • Experience working in a big data setting is a plus
  • Experience with Docker containers and orchestration platforms is a plus
  • Experience setting up and managing Elasticsearch and Cassandra clusters is a plus