AIOps Architect

  • Hyderabad
  • ADP

Global Enterprise Technology and Solutions – AIOps Architect Monitoring

Job Location: India (HYD)

Qualification: B.Tech/M.Tech (Computer Science or IT preferred)

Experience: 10 - 20 years

Job Profile: ADP is looking for a Monitoring Architect and Netcool OMNIbus and Impact Engineer who is passionate about event management engineering and automation to join our growing Monitoring Engineering, Tools and Automation team. You will be part of the Monitoring Engineering team, which is responsible for setting up monitors in ADP cloud and AWS and providing an AIOps solution. You should be able to multitask under pressure and are expected to be flexible in providing after-hours support as needed. This role requires frequent interaction with functional and project teams to ensure clients experience World Class Service when engaged.

Responsibilities:

  • As the AIOps Architect for Monitoring, design a dynamic monitoring solution using the IBM AIOps tool that helps correlate issues, predict outages, and pull topologies from different sources and merge them together.
  • Netcool Agile Service Manager (ASM): create observers to build topologies, and create generic templates of file-based observers for use across all domains (Compute, Network, Storage and Application).
  • Leverage the AI-based event grouping in IBM AIOps to show immediate business value by reducing MTTR and quickly identifying root cause through sound alert correlation.
  • Partner with the respective product owners and SREs to bring all alert data into AIOps and show immediate value for their products.
  • Hands-on experience with the Netcool Event Management product suite, especially OMNIbus and Impact, along with good knowledge of scripting languages.
  • Adhere to standards, processes, procedures and audit compliance controls. Estimate effectively and escalate issues in a timely manner. Be flexible in adjusting your work schedule to BU needs.
  • Spin up IBM Watson AIOps components as Docker containers inside an OCP cluster. This requires a good understanding of OCP or another container orchestration technology (Kubernetes) and a fair understanding of Kubernetes resource management commands via OCP.
  • Resize OCP workloads based on requirements and take backups of the various AIOps components (Cassandra, PostgreSQL, ASM). Work with platform teams to provision the required resources and assign the persistent volume class to the different operators.

Skillset Requirements:

  • B.Tech/M.Tech with 10-20 years of experience in the IT infrastructure monitoring domain, with an architect's mindset for converting problem statements into solutions. Hands-on experience with Netcool OMNIbus 8.1 and Netcool Impact, including the ability to install, configure, upgrade, patch and administer the products.
  • Deep understanding of Netcool OMNIbus, Netcool Impact, probes, etc. Experience with IBM NOI and IBM Watson AIOps AI Manager. Experience with IBM Tivoli Agile Service Manager (ASM), including troubleshooting and managing topology event grouping services.
  • Experience configuring the event manager gateway, cloud native analytics and discovery, and integrating with other tools. Experience configuring the localization service to calculate the blast radius of an incident.
  • Deep knowledge of developing snmp.rules, writing triggers and coding tools, and good knowledge of the Impact Policy Language. Hands-on experience in event management from multiple sources and in configuring various probes, plus proficiency in the Netcool OMNIbus data structure and good debugging skills.
  • Strong understanding of OCP development and administration, with the ability to navigate OCP workloads via oc commands and to read through various YAML files in order to design a fit-for-purpose workload.
  • Very good understanding of Kubernetes and the ability to navigate ConfigMaps, Secrets, etc. to manage the overall AIOps application.
  • Working knowledge of either Splunk or Spectrum is an added advantage.
  • Splunk – a data aggregation tool (primarily for log aggregation)
  • Spectrum – a monitoring tool for network devices
  • A developer mindset and openness to any kind of automation challenge, building tools that ease regular work and help other operations teams achieve a better event management solution. Experience with OMNIbus 8.1 and OMNIbus core sub-products (gateways and probes), WebGUI/DASH, Netcool Impact, developing operator views using Impact, and the Oracle gateway.
  • Experience developing a web service interface between Netcool and a ticketing system, and with Linux OS, Perl CGI/shell scripting and batch SQL. Able to analyze and research options. Configuration experience with HA environments for both OMNIbus and Impact.
  • Experience integrating Impact with other tools and configuring data analysis and correlation of Netcool alerts with Impact. Able to readily write Impact policies for a given problem statement using the Event Integration & Correlation and Maintenance Window Management features.
  • Good understanding of Python or another programming language, with coding experience at a mid-level Python developer standard. Strong knowledge of writing complex but efficient SQL queries, with the ability to retrieve results from the various tables of the Netcool database.
  • Deeper knowledge of Agile methodologies, DevOps, CI/CD tools, and AI and ML techniques is an added advantage. Good team player with a mentoring attitude and exceptional written and verbal communication.

Knowledge

  • Kubernetes, OCP clusters, oc commands
  • Agile methodologies, DevOps, CI/CD tools
  • Python programming, Ansible
  • Linux OS, DB (Cassandra, PostgreSQL, Oracle)
  • GenAI, AIOps, AI & ML techniques

Education: B.Tech / M.Tech