Site Reliability Engineer

  • Gurugram
  • Acefone

Key Responsibilities:

1. Telephony Infrastructure Management:

Design, implement, and maintain internet telephony systems to ensure high availability and call quality.

Manage and optimize cloud telephony services to scale with our growing user base.

Troubleshoot and resolve telephony-related issues to minimize downtime and disruptions.

2. Cloud Expertise:

Utilize AWS services and resources to architect and maintain a resilient telephony platform.

Implement autoscaling and load balancing strategies to handle varying call volumes.

3. Linux Administration:

Manage Linux-based servers and containers that support our telephony infrastructure.

Perform system upgrades, patch management, and security enhancements.

4. Monitoring and Alerting:

Set up robust monitoring and alerting systems to proactively identify and address telephony performance issues.

Continuously optimize monitoring tools to ensure efficient resource utilization.

5. Incident Response:

Participate in on-call rotations to respond to telephony-related incidents and outages.

Collaborate with cross-functional teams to resolve issues and prevent recurrences.

6. Documentation and Best Practices:

Document telephony configurations, procedures, and best practices for reference and knowledge sharing.

Stay up to date with industry trends and emerging technologies to recommend improvements.

Requirements:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience).
  • Proven experience as an SRE or in a similar role, with a focus on telephony systems.
  • Strong expertise in internet telephony technologies (e.g., SIP, VoIP) and cloud telephony solutions.
  • Hands-on experience with AWS services and cloud infrastructure.
  • Proficiency in Linux system administration and troubleshooting.
  • Excellent problem-solving skills and a proactive approach to identifying and addressing issues.
  • Experience with monitoring and alerting tools (e.g., Zabbix, New Relic, Prometheus, Grafana).
  • Strong scripting and automation skills (e.g., Python, Bash).
  • Knowledge of containerization technologies (e.g., Docker, Kubernetes) is a plus.
  • Effective communication skills and the ability to work collaboratively in a team-oriented environment.
  • AWS or other relevant certifications are a plus.