Site Reliability Engineer

  • Hyderabad
  • Oracle
As an SRE for the State & Local GBU, you will play a critical role in solving complex problems related to infrastructure cloud services and in building automation to prevent problem recurrence. As an SRE, you not only follow the best practices, standards, and processes employed by the team, but also feel comfortable contributing to them. You have strong communication skills with a penchant for radiating information. You work as a member of our cross-discipline development team, continuously collaborating with the Software Developers and Software Developers in Test on your team. You are responsible for ensuring our services and systems are designed and built with reliability, scalability, and observability as critical features. You author and maintain operational run books to reduce Time of Incidents (TOI), and you manage and triage operational tickets pertaining to the data platform services. You keep a strong pulse on the cloud development community, keeping up with news and changes as they come.

Who We Are:
The Oracle State & Local Global Business Unit (SLGBU) is a brand-new organization that is designing and developing a suite of SaaS applications for the state and local government markets using cutting-edge technologies. These are exciting times in our space: we are growing fast, still at an early stage, and working on ambitious new initiatives. Do you want to design, build, and run innovative SaaS services that will transform the Public Safety industry and have the potential to save lives? Come join us on this critical mission!

Preferred Qualifications:
  • Minimum of 6 years of experience operating a production environment at high scale, with emphasis on availability, latency, and a healthy customer experience
  • Development experience is preferred
  • Minimum of 4 years of experience working with the following:
      • Linux/Unix development (Oracle Linux preferred)
      • Containers and orchestration (Docker, Kubernetes, Mesos)
      • Infrastructure Automation (Terraform, Chef, Ansible, Puppet)
      • Cloud computing platform (Oracle Cloud Infrastructure Services or other cloud platforms)
      • Programming and scripting languages (Java, JavaScript, TypeScript, Python, Bash, and/or Go is a plus)
      • Oracle database, MySQL (experience with MS SQL and/or NoSQL is a plus)
      • CI/CD (Jenkins and GitLab CI)
      • Git version-control and collaboration (GitLab)
      • Issue tracking and collaboration (Jira and Confluence)
      • Product/Service ownership or Project Management experience a plus
      • System Administration (Linux internals, TCP/IP, DNS, Load Balancing)
  • Experience working with fault-tolerant, highly available, high-throughput, distributed, scalable systems
  • Experience with Monitoring and Observability technologies like Prometheus, Grafana, Fluent, Elasticsearch, Kibana, or equivalent
  • Excellent written and verbal communication skills and the ability to communicate with individuals across the organization
  • A systems thinker, able to move fluidly between high-level abstract thinking and detail-oriented implementation; open-minded to new ideas and approaches, with the technical ability to implement them
  • Experience working with Agile development frameworks
  • A self-starter who is naturally inquisitive, requiring only small pieces of the puzzle, across many technologies, new and legacy