NICE Actimize - Site Reliability Engineer

Job Location

Bogota - Colombia

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Sutherland is seeking an Application and System Monitoring Engineer to take our existing CloudOps monitoring to the next level.

In this position, you will work with a multitude of modern tools and technologies to build the next generation of our monitoring system properly and efficiently, and to troubleshoot and resolve issues in our development, test, and production environments.

The ideal candidate can work in a dynamic and complex software build environment and is an energetic self-starter with a passion to build, innovate, and achieve excellence.

Subject matter expertise:

  • Experience implementing predictive and detailed monitoring.
  • Expert in the Linux command line.
  • Design, architect, and implement secure and highly available monitoring infrastructure.
  • Enhanced monitoring capabilities, including:
      • Auto-detection of brute-force attacks in logs.
      • Detection of password attacks in logs.
  • Implement a next-gen predictive monitoring solution to:
      • Detect and alert on capacity utilization of compute resources.
      • Detect and alert on any network-related issues and choke points.
  • Ability to design, implement, and improve Grafana, Prometheus, Loki, Promtail, and Node Exporter deployments.
  • Log parsing and management.
  • Configuration of alerting push notifications to VictorOps (now Splunk On-Call) and email notifications.
  • Architect, design, and implement Icinga 2 monitoring and alerting.
  • Ability to monitor system metrics and log parsing.
  • Ability to automate tasks using Bash and/or Python scripting.
  • Predictive monitoring of systems and applications.
  • Familiarity with JVM internals and the use of JMX and REST for monitoring.
  • Familiarity with AWS infrastructure.
  • Deep understanding of Java applications, TLS, and Apache.
  • Automated checks of performance of system metrics in Grafana.
  • Automated checks of performance of Web Applications.
  • Problem-solving and troubleshooting, including performing root cause analysis to design preventative activities.
  • Crafting and maintaining dashboards and reports that pull together monitoring data across multiple platforms within the same tool as well as across multiple tools.
  • Assisting with writing scripts and queries that can provide environment self-healing capabilities.
  • Written, verbal, interpersonal, and presentation skills.
  • Communication between technical and non-technical employees.
  • A customer-driven approach and good customer management skills.
  • Staying abreast of the latest monitoring technology and trends.
  • Adhering to configuration, release, and change management protocols.

Qualifications:

  • Experience with using monitoring tools in a production environment.
  • 5 years of production cloud operations experience.
  • 5 years of expertise in the Linux command line.
  • 5 years of using Terraform in AWS for automation. Hands-on with automation and seeking out opportunities to automate manual processes.
  • 5 years of strong hands-on experience building production services in AWS. (Must Have)
  • 4 years of experience with scripting using Python and Bash.
  • Ability to participate in an on-call rotation.
  • Considerable knowledge of IT equipment and diagnostic tools.
  • Considerable knowledge of principles and techniques of systems analysis, design, development, and programming.
  • Considerable knowledge of principles of information systems.
  • Considerable knowledge of capabilities of computer technology.
  • Knowledge of methods and procedures used to conduct detailed analysis and design of computer systems.
  • Knowledge of practices and issues of systems security and disaster recovery.
  • Knowledge of computer operating systems.
  • Considerable problem-solving skills.
  • Considerable logic and analytical skills.
  • Considerable oral and written communication skills; interpersonal skills; considerable ability to analyze, troubleshoot, and resolve data communications problems.
  • Considerable ability to prepare manuals, reports, documentation, and other written materials; considerable ability to identify, analyze, and resolve complex business and technical problems.


Additional Information:

This is a hybrid position in Bogotá, Colombia. Enjoy the benefits of joining a Great Place to Work company working for the world's biggest technology companies.


Remote Work:

No


Employment Type:

Full-time
