Principal Site Reliability Engineer

  • Gurugram

Cvent is a leading meetings, events and hospitality technology provider with more than 4,800 employees and nearly 22,000 customers worldwide. Founded in 1999, the company delivers a comprehensive event marketing and management platform for event professionals and offers software solutions to hotels, special event venues and destinations to help them grow their group/MICE and corporate travel business.


The DNA of Cvent is our people, and our culture emphasizes fostering intrapreneurship, a system that encourages Cventers to think and act like individual entrepreneurs and empowers them to take action, embrace risk, and make decisions as if they had founded the company themselves. We foster an environment that promotes agility, which means we don’t have the luxury of waiting for perfection. At Cvent, we value the diverse perspectives that each individual brings. Whether working with a team of colleagues or with clients, we foster a culture that celebrates differences and builds on shared connections.


Job Description:

Cvent is looking for a Principal Site Reliability Engineer to help us scale our systems and ensure the stability, reliability, performance, and rapid deployment of our platform. We build teams that are inclusive, collaborative, and have a strong sense of ownership for the things they build. If you have a passion and track record for solving problems, along with strong leadership skills, this is a great fit for you.

As a Principal Engineer, you will demonstrate command of both emerging and current technologies, methods, and processes, contributing to the evolution of software deployment processes, enhancing security, reducing risk, and improving the overall end-user experience. As part of the Technology R&D Team, you will play an integral part in advancing DevOps maturity and be part of a new culture of quality and site reliability. You will continually improve the reliability, resiliency, and scalability of our products, processes, and procedures. In this position, you will also be expected to ramp up to managing and mentoring engineers and ensuring their technical growth.


What You Will Be Doing:

• Set the direction and strategy for tackling increasingly complex problems, and solve them with a longer time horizon in mind.

• Keep abreast of emerging cloud technologies and leverage them when appropriate. Perform proofs of concept (POCs) to determine their suitability.

• Identify recurring problems and anti-patterns in cloud implementations. Evangelize correct practices and help teams improve their platforms.

• Own site stability, performance, and capacity planning. Drive awareness of reliability and performance concerns across your area of responsibility and beyond.

• Actively engage with other SRE teams, product development teams, and other groups to work toward a holistic understanding of SDLC problems and solutions.

• Foster a learning and ownership culture within the team and the larger Cvent organization.

• Ensure best engineering practices through automation, infrastructure as code, robust system monitoring, alerting, auto scaling, self-healing, etc.

• Represent the technology perspective and priorities to leadership and other stakeholders by continuously communicating timelines, scope, risks, and the technical roadmap.


What You Need for this Position:

• 10+ years of hands-on technical experience within the realm of Site Reliability Engineering

• Architect-level understanding of one or more of the major public cloud services (AWS, GCP, or Azure), and experience using them to design secure and scalable services.

• Strong understanding of SRE concepts and the DevOps culture, with a focus on leveraging software engineering tools, methodologies, and concepts

• In-depth understanding of automation and CI/CD processes, along with excellent reasoning and problem-solving skills.

• Experience with Unix/Linux environments, with a deep grasp of system internals

• Experience working on large-scale distributed systems, including multi-tiered architectures.

• Strong knowledge of modern platforms like Kubernetes, Docker, etc.

• Experience working with monitoring tools (Datadog, ELK stack, etc.) and database technologies (SQL Server, Postgres, and Couchbase preferred)

• Demonstrated breadth of understanding and experience developing solutions based on multiple technologies, including networking, cloud, database, and scripting languages.

• Strong leadership, communication, and interpersonal skills geared toward getting things done.