Cloud Architect

  • Hyderabad
  • Grid Dynamics
Senior Cloud Consultant/Architect (SRE / Observability / AIOps)

Job Description

Professionals who specialize in cloud operations with a focus on observability (logging, tracing, alerting), a vision for AIOps, and a strong practical understanding of site reliability.

  • A consultant with a mix of knowledge and skills in software development and cloud platforms, with the experience to advise clients on how to analyze their challenges and to design, build, test, and deploy changes while maintaining a cloud operating model (e.g., DevOps and ITSM processes and tools)
  • Seasoned, client-facing senior consultant who has advised IT executives on cloud operating models (e.g., IT processes and tools)
  • Experienced in reducing the day-to-day noise and toil of IT support and improving the availability of the client's application suite through new support methods, scripting automation, and advanced tooling
  • Experience working closely with production operations, application developers, and system, network, middleware, and database administrators to streamline development, operations, and support processes
  • Adept at analysis and problem solving, preferably with a blend of platform, middleware, network, and software development skills
  • Experience with consulting methodologies, knowledge management, and service offering development (to assist in building cloud practice offerings from sales through delivery)

Apply consulting and engineering skills to solve operations problems by:

  • Defining and driving initiatives to increase the client's overall application availability
  • Building the tooling needed to improve observability of performance and operational efficiency
  • Enhancing monitoring and management tooling to better detect, diagnose, and correct problems
  • Resolving problems in code for an incident, when applicable
  • Documenting defects to communicate back to the Service Owner(s)
  • Participating with application developers to develop new features and automation to solve operational challenges
  • Driving the transformation of delivery methods into operational teams such as network, database, system administration, and incident management
  • Enabling an AIOps strategy and roadmap to drive more predictive and automated response
  • Investigating root causes (RCA) to find and correct the source of issues and outages

Key Skills

  • Ideally a former developer who knows how to troubleshoot application transactions end to end and identify critical points of failure or bottlenecks
  • DevOps/GitOps mindset with a vision for AIOps (how AI can automate analysis, assignments, decisions, and actions to support and operate a platform and application)
  • Cloud-native dashboarding and alerting (at minimum familiar with AWS, GCP, and Azure, with depth in at least one)
  • Experience with scalable architectures and performance tuning
  • Enjoys solving difficult engineering problems, approaches troubleshooting systematically, and is comfortable getting hands-on to guide engineers and operators
  • Great communication and planning experience, ideally with a large consultancy background
  • Ability to own all or part of an assessment to develop recommendations and a roadmap
  • Solid understanding of ITSM and ITIL principles with a focus on Event, Incident, Problem, Change, and Configuration Management, and the ability to lead maturity assessments
  • Nice to have: software engineering skills, ideally with experience in Python, Go, and/or Java
  • Understanding of large-scale complex systems from a reliability perspective
  • Passion for resolving reliability issues and identifying strategies to mitigate them going forward
  • Has implemented High Availability and Disaster Recovery infrastructure in the cloud
  • Experience with self-healing infrastructure
  • Has aligned infrastructure with business SLAs and SLOs and managed error budgets

Tech Stack

  • Must have (hands-on with at least one): Dynatrace, BigPanda, Datadog, or New Relic
  • Highly desired (hands-on): Grafana, ELK Stack, Prometheus, Splunk, and cloud-native tools for alerting and logging
  • Knowledge required, with some hands-on experience preferred: Kubernetes, Terraform, Python, GCP/AWS/Azure

About Grid Dynamics

Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, and advanced analytics services. Fusing technical vision with business acumen, we enable positive business outcomes for enterprise companies undergoing business transformation by solving their most pressing technical challenges. A key differentiator for Grid Dynamics is our 7+ years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization, and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India. Follow us on LinkedIn.