DevOps Engineer/SRE

  • Pune
  • LTIMindtree

Mandatory Skills:

Application monitoring and automation with tools such as Splunk, Grafana, New Relic, Dynatrace, etc.

Troubleshooting Java applications (performance issues and degradation)

CI/CD

Docker/Kubernetes

Experience with any cloud solutions

Hands-on coding experience - Python, Go, Unix/Shell

Responsibilities:

Ensure site performance, reliability, scalability and resiliency. As a Site Reliability Engineer, engage with and improve the whole lifecycle of services, from inception and design through deployment, operations and refinement.

Partner with development squad teams in designing and architecting complex business solutions in the Cart, Checkout and Account areas, adopting Google Cloud Platform best practices.

Analyze system and application issues using tools such as Stackdriver, BigQuery, Grafana, Akamai, New Relic, Cloud Armor and other Google Cloud Platform resources.

Monitor and analyze application traffic to distinguish organic traffic from bot, crawler and malicious traffic.

Automate custom monitoring and reporting to enable proactive monitoring and to reduce manual effort, application impact and analysis time.

Build alerts to proactively monitor parameters that cannot be monitored using installed monitoring tools.

Analyze and discover the requirements, dependencies and architecture of the cloud platform.

Assess network security and policies, and take necessary actions using Cloud Armor, VPC networks, firewall rules, etc.

Design and create static caching solutions using Google Cloud CDN and Akamai CDN

Maintain services once they are live by measuring and monitoring availability, latency and overall system health.

Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.

Perform destructive testing and performance testing to find underlying application performance issues and build resiliency.

Create reporting, alerting and monitoring for the services, and perform cloud deployments.

Set up CI/CD processes using GitHub, Concourse and Jenkins to create jobs, run Maven builds, create Docker images and deploy those images to Google Cloud clusters.

Monitor application health and Service Level Objectives (SLOs) after deployments and troubleshoot any real-time issues.

Create and update reporting, alerting, monitoring and deployments using Python and other scripting languages.

Coordinate multiple teams and stakeholders to perform blameless postmortems for major incidents