Site Reliability Engineer

  • Delhi
  • System Soft Technologies
Title: Site Reliability Engineer (100% remote)

The Site Reliability Engineer (SRE) is a technician who uses an array of skills to enhance the reliability of critical customer-facing digital assets. The SRE is responsible for maintaining the availability and performance of relevant systems by supporting, building, and enhancing applications and tools, and by engaging with infrastructure teams. The SRE designs and configures systems to monitor and alert on critical applications and to automate issue resolution. This role includes a focus on providing solutions that are robust, scalable, and highly available. For internal processes and technologies, the SRE will build systems to streamline operations and reduce friction. The SRE will be part of an on-call rotation to troubleshoot production issues, with the specific goal of building resilient mitigation processes.

Skills:
  • BA or BS degree in Computer Science or a related field required.
  • Master's degree in Technology or a related field desired.
  • Certification(s) specific to the Architecture discipline.
  • 5+ years of experience working with technical teams.
  • Strong emphasis on SRE as an engineering discipline with a focus on automation.
  • Ability to write detailed solution specifications, diagrams, best practices/standards documentation, operating procedures, test plans/test reports, etc.
  • Experience supporting infrastructure and services in public cloud environments (AWS, GCP, etc.).
  • Experience building and supporting containerized application technologies, including Docker and Kubernetes.
  • Experience with public cloud cost management.
  • Experience in performance engineering and capacity planning.
  • Prior success in automating a real-world production environment.
  • Knowledge of IP networking, VPNs, DNS, load balancing, and firewalls.
  • Expertise in monitoring tools such as Splunk, AppDynamics, Nagios, or New Relic.
  • Experience with software development and testing processes in an agile environment.
  • Excellent problem-solving, analytical, and decision-making skills.
  • Ability to work in a collaborative environment.
  • Excellent verbal and written communication skills.
  • Experience with deployments and operations of 24x7 high-volume, highly available systems.
  • Cloud scaling experience and the ability to drive automation/modernization initiatives.
  • Enjoyment of working with a large variety of services and new technologies.
  • Solid understanding of development, debugging, administration, and automation frameworks: C#/.NET, PowerShell, Python, Ansible, etc.
  • Experience with logging platforms and application performance metrics: Datadog, New Relic, Splunk, ELK, Dynatrace, App Insights Analytics, etc.