Job Description: LEAD ADMINISTRATOR L1
Dublin, 3 days a week / 1 year
Job Description: SRE (Observability & Database Reliability Engineer)
Mandatory Skills: Oracle Database Admin.
Experience: 5-8 Years.
Role Summary:
We are seeking a Site Reliability Engineer (SRE) with strong Database Reliability and Observability expertise to ensure high availability, performance, and operational visibility of business-critical platforms. This role has a strong emphasis on dashboards, observability, Splunk, and operational reporting, along with hands-on database operations in complex production environments.
Key Responsibilities:
SRE & Reliability Engineering
- Own end-to-end reliability, availability, and performance of applications and database platforms.
- Define, implement, and track SLIs, SLOs, and error budgets.
- Proactively identify reliability risks using metrics, trends, and capacity analysis.
- Lead production incident management, root cause analysis (RCA), and post-incident reviews.
- Drive automation to reduce operational toil and improve MTTR.
- Participate in on-call rotations and support 24x7 production environments.
Observability, Dashboards & Reporting (Primary Focus)
- Design and maintain end-to-end observability covering metrics, logs, alerts, and traces.
- Build and manage real-time operational and executive dashboards for system health, availability, latency, and database performance.
- Strong hands-on experience with Splunk, including log ingestion, SPL queries, dashboards, alerts, and reports.
- Correlate application, infrastructure, and database events to detect issues proactively.
- Create and publish operational reports (daily / weekly / monthly) covering availability, incidents, SLO compliance, performance KPIs, and capacity trends.
- Translate technical metrics into actionable insights for engineering and leadership teams.
Database Reliability & Operations
- Support and operate enterprise databases such as PostgreSQL or Oracle (mandatory experience in at least one).
- Monitor and tune database performance, including queries, indexes, and resource utilization.
- Design and support high availability, replication, backup, and disaster recovery solutions.
- Perform database upgrades, patching, migrations, and routine health checks.
- Integrate database monitoring and logs with observability platforms.
Required Skills & Experience
- 10 years of experience in SRE, Production Support, DevOps, or Reliability Engineering roles.
- Strong expertise in observability and monitoring tools, with mandatory hands-on experience in Splunk.
- Proven experience in dashboard building and operational reporting.
- Strong hands-on experience with PostgreSQL or Oracle databases.
- Solid Linux/Unix administration and troubleshooting skills.
- Experience with incident response, RCA, and production on-call support.
- Proficiency in scripting using Python, Shell, or Bash.
- Strong analytical and communication skills.
Preferred Skills
- Experience with cloud platforms such as AWS or Azure.
- Exposure to Kubernetes, Docker, and containerized environments.
- Experience with Infrastructure as Code tools such as Terraform or Ansible.
- Knowledge of capacity planning, forecasting, and performance baselining.
- Experience supporting regulated or high-availability systems.
Deliver
No | Performance Parameter | Measure
1 | Operations of the tower | SLA adherence; knowledge management; CSAT / customer experience; identification of risk issues and mitigation plans
2 | New projects | Timely delivery; avoid unauthorised changes; no formal escalations