Sr System Reliability Engineer (Application Support + Automation)

Fulcrum Digital

Job Location: Dublin, Ireland
Monthly Salary: Not Disclosed
Experience Required: 4 years
Posted on: 14 hours ago
Vacancies: 1

Job Summary


Who are we
Fulcrum Digital is an agile, next-generation digital acceleration company providing digital transformation and technology services, from ideation to implementation. These services apply across a variety of industries, including banking & financial services, insurance, retail, higher education, food, healthcare, and manufacturing.


The Role

  • Plan, manage, and oversee all aspects of a production environment.
  • Define strategies for application performance monitoring and optimization in the production environment.
  • Respond to incidents, improve the platform based on feedback, and measure the reduction of incidents over time.
  • Support deployment of code into multiple lower environments, and support current processes with an emphasis on automating everything as soon as possible.
  • Design, develop, and standardize monitoring and alerting mechanisms for the supported applications.
  • Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recover.
  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
  • Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
  • Support services before they go live through activities such as system design consulting, capacity planning, and launch reviews.
  • Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Work with a global team spread across tech hubs in multiple geographies and time zones.
  • Ability to share knowledge and explain processes and procedures to others.
  • Share knowledge with and mentor junior team members.
  • Able to perform on-call duties on a rotational basis.
  • Occasional off hours work required.




Requirements


L8 Positions

NGFT key skills (must have):

Jenkins
Chef
Bash
Splunk
Dynatrace
Linux
Bitbucket
Problem Management
ITIL
Remedy

Good to have:

Python
AWS (migrating to AWS)


Key Responsibilities
What You'll Do:
Demonstrate and innovate SRE practices by collaborating with stakeholders to implement important SRE principles and objectives, and create new practices where applicable.
Partner with product and platform teams to define and track service level objectives (SLOs) and service level indicators (SLIs).
Monitor and manage system reliability and performance, ensuring systems meet SLOs.
Communicate reliability concerns and their potential impact to key stakeholders.
Promote the prioritization of reliability throughout the software development life cycle.
Design, code, test, and deliver solutions to automate manual operations.
Participate in on-call rotations, provide support for SRE systems, and lead or participate in post-mortem incident analysis.
Engage in system design, capacity planning, and architecture discussions to ensure operational requirements are met.
Share lessons learned and best practices regarding reliability and performance with stakeholders and team members.
Assist in training and mentoring junior SREs to ensure best practices are followed and scaled within the organization.
Pursue continuous improvement opportunities to stay up to date on SRE methods and trends, and participate in organizational learning initiatives.
Support governance and ensure compliance with policies by collaborating with security, compliance, and other teams.
Respond promptly to requests for assistance from technical customers, providing engineering support and best-practice guidance.
Adhere to and suggest improvements to standard operating procedures, and advocate for automation and workflow optimization.

Team Specific Skills
No single candidate is expected to have expertise across all these areas, but a Biz Ops engineer will spend time throughout their career on various aspects of the role:

Operational Resiliency Architect:
Support application health, performance, and capacity.
Assist in system design consulting, capacity planning, and launch reviews.
Collaborate with development and product teams to establish monitoring and alerting strategies.

DevOps/Automation:
Engage in development, automation, and business process improvement.
Support CI/CD pipelines and promote software into higher environments.
Increase automation and tooling to reduce manual intervention.

ITSM Practices:
Analyze ITSM activities and provide feedback to development teams on operational gaps or resiliency concerns.
Perform root cause analysis of incidents and work with development teams to resolve issues.

Preferred Skills and Experience:
Coding experience in one or more programming languages such as Java, Python, or Go.
Familiarity with cloud platforms like AWS, Azure, or GCP.
Experience with message queue (MQ) technologies such as RabbitMQ or Kafka.
Experience with observability tools like Splunk, Dynatrace, Prometheus, or Datadog.
Knowledge of industry-standard CI/CD tools like Git/Bitbucket, Jenkins, Maven, and Artifactory.
Understanding of client-server relationships, network concepts, and operating system navigation.
Familiarity with Kubernetes and configuration management tools.

General Skills and Competencies:
Ability to work with development, operations, and product teams.
Strong verbal and written communication skills, including the ability to explain technical issues to non-technical audiences.
Critical thinking skills and a proactive approach to problem-solving.
A mindset geared towards continuous improvement and learning.
Ability to work effectively in a team and share best practices.



Required Skills:

SRE skills


Required Education:

MBA

Company Industry

IT Services and IT Consulting

Key Skills

  • Splunk
  • IIS
  • SQL
  • .NET
  • Perl
  • Shell Scripting
  • WebLogic
  • Java
  • Sybase
  • Scripting
  • Oracle
  • Application Support