Site Reliability Engineer - Product
Irvine, CA - USA
Job Summary
MatchPoint Solutions is a fast-growing, young, energetic global IT-Engineering services company with clients across the US. We provide technology solutions to various clients like Uber, Robinhood, Netflix, Airbnb, Google, Sephora, and more! More recently, we have expanded to working internationally in Canada, China, Ireland, the UK, Brazil, and India. Through our culture of innovation, we inspire, build, and deliver business results from idea to outcome. We keep our clients on the cutting edge of the latest technologies and provide solutions by using industry-specific best practices and expertise.
We are excited to be continuously expanding our team. If you are interested in this position, please send over your updated resume. We look forward to hearing from you!
Title: Site Reliability Engineer - Product
- Develop and maintain CI/CD and GitOps workflows (Bitbucket Pipelines, ArgoCD) as shared platform capabilities.
- Automate infrastructure, application configuration, and database deployments (Ansible, Liquibase).
- Build automated health checks, self-healing, and zero-downtime deployment mechanisms for platform services.
- Provide technical guidance and best practices to product engineering teams using the platform.
Reliability, Security & Observability
- Design and operate platform-wide monitoring and observability solutions using Prometheus, Grafana, and OpenTelemetry.
- Build dashboards and standardized alerting for all platform and product services.
- Enforce platform security standards, including container scanning, secrets management, RBAC, and secure network policies.
- Ensure platform compliance with HIPAA, SOC 2, and ISO 27001 through automated controls and secure communication.
- Reduce operational toil through automation, cost visibility, and continuous reliability improvements.
- Experience with event streaming platforms and data services (Kafka/MSK, Flink, Debezium).
- Exposure to MLOps or AI platform infrastructure (MLflow, Kubeflow, GenAI/RAG workloads).
- Familiarity with FinOps and cost optimization for shared platforms.
- Bachelor's degree in Computer Science or a related field and 4 years of relevant experience (or equivalent).
- Experience operating production-grade cloud infrastructure and Kubernetes platforms.
- Strong Infrastructure-as-Code, CI/CD, GitOps, and observability experience.
- Proven troubleshooting skills across distributed systems and platform reliability issues.
- Proficiency in at least one scripting or programming language (Python, Go, Bash).
- Experience with on-call rotations, incident response, and root cause analysis.
- Experience building internal platforms or shared developer services.
- Multi-cloud experience and service mesh familiarity (Istio).
- Experience in regulated environments and cloud-native security practices.
This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.