Infrastructure Performance Administrator

Business Unit:  Discovery Central Services
Function:  Information Technology
Date:  22 Apr 2024

Discovery – Technology Infrastructure

 

About Discovery

Discovery’s core purpose is to make people healthier and to enhance and protect their lives. We seek out and invest in exceptional individuals who understand and support our core purpose, and whose own values align with those of Discovery. Our fast-paced and dynamic environment enables smart, self-driven people to be their best. As global thought leaders, Discovery is passionate about innovating in order to not only achieve financial success, but to ignite positive and meaningful change within our society.

 

About Infrastructure Performance Management

Infrastructure Performance Management is a term coined within Discovery for the function that installs, configures, maintains, and monitors the IT infrastructure and application monitoring stack. The function serves Discovery’s local and international companies and resides within the Technology Infrastructure department.

 

The framework is based on ITIL principles and the best practices of Event Management. The team works closely with the Event Operations team, who escalate IT threshold breaches and notify back-office teams of them, helping to prevent outages and business disruption.

 

Key Purpose

The Infrastructure Performance Administrator reports to the IT Operations Manager and is responsible for supporting, servicing, configuring, and monitoring IT infrastructure performance and availability, including maintenance of the monitoring environment. The role also provides infrastructure and application performance management support to Back Office and System teams, and engages closely with the Event Operations and Incident Management teams through updates to configurations and procedures. Interaction with product vendors is also required. The key focus is on uptime and on timely resolution in the event of failures. The role requires troubleshooting and investigating the root cause of application and infrastructure operational failures, and providing recommendations for remediation controls.

 

Areas of responsibility may include, but are not limited to:

  • Ensuring that all IT monitoring product suites and infrastructure are maintained, focusing on uptime and serviceability
  • Creating, maintaining, and contributing to team processes and documentation, including monthly reporting
  • Preparing availability monitoring dashboards for teams as well as management, to provide varying levels of visibility into the issues encountered within the environment, both in real time and over an extended period
  • Creating and maintaining custom monitors, notifications, and dashboards required by all relevant teams
  • Focusing on the enhancement and application of current and new core monitoring systems
  • Configuring and maintaining monitoring, correlation, and alerting solutions to ensure that only relevant issues are identified and alerted upon
  • Gathering business and technical information regarding solutions used within the environment to determine the specific services and functions that need to be monitored
  • Tuning the monitoring, correlation, and alerting solutions to maximize the number of pertinent events identified, while continuing to minimize false positives
  • Identifying technical issue trends that point towards an underlying problem, and working with Problem Management and Major Incident leads to ensure that correct notifications and thresholds are set up
  • Contributing to the troubleshooting and resolution of potentially significant infrastructure issues occurring within the environment, to address complex and underlying issues
  • Resolving incidents, requests, and workflows assigned to the team within the specified SLA

 

Personal Attributes and Skills

  • Working with People
  • Adhering to Principles and Values
  • Planning & Organising
  • Delivering Results and Meeting Customer Expectations
  • Deciding and Initiating Action
  • Presenting and Communicating Information
  • Applying Expertise and Technology
  • Adapting and Responding to Change
  • Coping with Pressure and Overcoming Setbacks
  • Problem Solving
  • Attention to Detail

 

Education and Experience

  • Matric – Essential
  • IT Diploma or Degree – Essential
  • 4 years of IT Operations/Services experience, with 3 years in performance monitoring and administration of an IT environment – Essential
  • VMware Portal – Essential
  • Ansible Tower/AWX – Essential
  • ITIL Foundation – Essential
  • Monitoring tools (Dynatrace, SCOM, AppDynamics, etc.) – Essential
  • Cloud deployment – Advantageous
  • GitHub – Advantageous
  • SailPoint – Advantageous
  • ServiceNow – Advantageous
  • CA – Advantageous
  • Power BI – Advantageous
  • BeyondTrust – Advantageous
  • SaltStack – Advantageous
  • Terraform – Advantageous
  • Unix or Linux certification – Advantageous
  • Microsoft OS – Advantageous
  • Networking knowledge – Advantageous

 

 

Employment Equity

The Company’s approved Employment Equity Plan and Targets will be considered as part of the recruitment process. As an Equal Opportunities employer, we actively encourage and welcome people with various disabilities to apply.