We back our colleagues and their loved ones with benefits and programs that support their holistic well-being. That means we prioritize their physical, financial, and mental health through each stage of life. Benefits include:
- Competitive base salaries
- Bonus incentives
- Support for financial well-being and retirement
- Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location)
- Flexible working model with hybrid, onsite, or virtual arrangements depending on role and business need
- Generous paid parental leave policies (depending on your location)
- Free access to global onsite wellness centers staffed with nurses and doctors (depending on location)
- Free and confidential counseling support through our Healthy Minds program
- Career development and training opportunities
Offer of employment with American Express is conditioned upon the successful completion of a background verification check, subject to applicable laws and regulations.
At American Express, our culture is built on a 175-year history of innovation, shared values and leadership behaviors, and an unwavering commitment to back our customers, communities, and colleagues. As part of Team Amex, you'll experience this powerful backing with comprehensive support for your holistic well-being and many opportunities to learn new skills, develop as a leader, and grow your career.
Here, your voice and ideas matter, your work makes an impact, and together, you will help us define the future of American Express.
We are seeking a versatile and highly skilled Full Stack Infrastructure Engineer with expertise in Compute, Storage, Network, and Cloud technologies. The ideal candidate will design, implement, and manage robust infrastructure solutions, ensuring reliability, scalability, and performance.
How will you make an impact in this role?
- Ensure the reliability, availability, and performance of the entire infrastructure stack, including compute, storage, network, and cloud components.
- Lead incident response efforts across the infrastructure stack, coordinating with Application Support, SRE, and Engineering teams to minimize MTTD and MTTR.
- Perform root cause analysis for infrastructure-related incidents and implement corrective actions.
- Develop and maintain automation tools for managing infrastructure resources.
- Collaborate with Engineering teams to plan and execute system upgrades and maintenance.
- Conduct capacity planning and resource management for all infrastructure components.
- Participate in on-call rotations to provide 24x7 support for all critical infrastructure issues.
- Design and implement disaster recovery plans and business continuity strategies.
- Implement best practices for monitoring, logging, and alerting across the infrastructure.
- Foster a culture of continuous improvement and operational excellence.
- Analyze complex infrastructure problems, design scalable and resilient solutions, and lead the implementation of these solutions.
- Collaborate with architects and other engineers to design and enhance the architecture of infrastructure systems, ensuring alignment with business needs and technology standards.
Minimum qualifications
- Proven experience managing and optimizing a diverse infrastructure stack.
- Extensive knowledge of cloud platforms (AWS, Azure, GCP) and infrastructure as code (Terraform, CloudFormation).
- Familiarity with service mesh technologies (Istio, Linkerd).
- Solid understanding of virtualization (VMware, Hyper-V) and of containerization and orchestration (Docker, Kubernetes).
- Understanding of storage solutions (SAN, NAS, cloud storage) and backup systems.
- Strong understanding of network protocols, routing, switching, and firewalls.
- Experience with load balancers (F5, HAProxy, Nginx) and network monitoring tools.
- Experience in DNS management and troubleshooting.
- Experience in network security best practices.
- Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk).
- Proficiency in at least one scripting language (Python, Bash) for automation.
- Experience with CI/CD pipeline management and DevOps practices.
- Strong understanding of disaster recovery and business continuity planning.
- Experience with performance tuning and capacity planning.
- Understanding of chaos engineering principles and practices.
- Skills in cost optimization for cloud infrastructure.
Preferred qualifications
- Experience in using cloud-native monitoring tools like AWS CloudWatch, Azure Monitor, and Google Cloud Operations Suite.
- Experience with packet capture tools like Wireshark for troubleshooting network issues.
- Experience in using traceroute utilities and performance analysis tools like perf for identifying and resolving bottlenecks.
- Familiarity with tools such as ipconfig/ifconfig for viewing network configurations, flushing DNS, and diagnosing network issues.
- Experience with SNMP-based tools for network device monitoring and performance management.
- Experience in using NetFlow for network traffic analysis.
- Experience with tools like iostat, vmstat, and dstat for monitoring storage and system performance.
- Experience with tools like df, du, lsblk, and fdisk for managing and troubleshooting file systems and disk partitions.
- Familiarity with tools like Prometheus and Grafana for monitoring and observability.
Required Experience:
Senior IC