Role: Lead Infrastructure Engineer
Contract
Location: Charlotte, NC or Phoenix, AZ
In this role you will:
- Lead complex initiatives to develop infrastructure to provide solutions for business applications
- Architect products to effectively utilize infrastructure platforms in a scalable, reliable manner
- Debug reliability and scalability issues across all stack layers, including the products built using our infrastructure platforms
- Ensure monitoring and alerting trigger on symptoms rather than on outages
- Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it
- Have a desire to solve everyday challenges facing software engineers and automate their toil away
- Have an excellent ability to manage multiple tasks and expectations at once
- Participate in various projects intended to continually improve or upgrade the infrastructure
- Evaluate internal and external software solutions which could be leveraged to meet target state architecture goals
- Review and analyze high-impact outages to ensure the proper processes and procedures are in place to avoid problems in the future
- Design, build, deploy, and maintain infrastructure solutions through collaborative efforts with the team and third-party vendors
- Design, code, test, debug, and document programs using Agile development practices
- Make decisions on technical designs and implementation plans, and identify project risks and resource requirements
- Direct the daily risk and control flow of operations, focusing on policies, procedures, and work standards to ensure success
- Recommend courses of action to maintain cost effectiveness and achieve results
- Collaborate and consult with peers, colleagues, and managers to resolve issues and achieve goals
- Interact with customers and vendors
- Lead small to medium cross-organizational transformational efforts in the Platform space
- Provide expertise in Kafka brokers, ZooKeeper, Kafka Connect, Schema Registry, KSQL, REST Proxy, and Kafka Control Center
- Use automation and provisioning tools such as BladeLogic, Ansible, Chef, Jenkins, and GitLab
- Deliver results in loosely defined and constantly changing environments
- Communicate with a broad and diverse audience, including technology and business leaders, and simplify complex messages for consumption
- As an application support specialist, lead support functions and drive the execution and maturity of multiple application support services, including incident triage, root cause analysis, change evaluation, execution, and validation, deployment management, and risk and vulnerability management. Work closely with development and infrastructure partners such as middleware, NAS, database, and network teams
- Partner with CIO technology partners and managers to influence and support innovation and the continued drive toward automation and touchless operational sustainment as a design and architecture construct
- Sustain operations and reduce risk in the ecosystem by aggressively pursuing safety and soundness actions, including but not limited to vulnerability patching, end-of-life items, and resiliency
- Engage hands-on in all Production environment RunOps and DevOps support activities needed for the platform and applications
- Drive operational management through incident response, communication, and tracking, along with root cause identification and closure.
- Manage and coordinate Production change requests and release management.
- Provide operational continuity through the development, management, measurement, analysis, and reporting of key service-level metrics as required by management
- Maintain a sustained focus on continuous service improvement and innovation to design, implement, and ensure SLAs, KPIs, and OLAs for critical business processes, applications, and partner interfaces
- Regularly present Production performance, incident root causes, preventative actions, and trend analysis to technical and business management teams.
- Maintain and update all Production-related documentation (e.g. game plans, run books, procedures, processes).
- Ensure effective Production systems monitoring, alarming, and notification response/maintenance.
- Provide general oversight and direction to virtual teams.
Required Qualifications US:
- 5 years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
- 5 years of experience troubleshooting environments across the entire architecture (i.e. applications to infrastructure)
- 3 years of hands-on Linux administration experience
Desired Qualifications:
- 1 year of experience in Artificial Intelligence, Natural Language Processing, Machine Learning, Distributed Computing, Chatbot, and Virtual Assistant technologies
- 1 year of experience supporting and monitoring Apache Flink solutions for real-time data processing
- 1 year of experience supporting and monitoring service load balancing architectures, including F5 and VMware AVI
- 1 year of experience with Big Data or Hadoop tools such as Spark, Hive, Kafka, and Map
- Cloud Architect or Engineer Certification (e.g. GCP, Azure, AWS)
- A BS/BA degree or higher in information technology
- Competent working in one or more environments highly integrated with an operating system.
- Have experience with VMware Pivotal Cloud Foundry (PCF) and Tanzu Application Service (TAS) technologies
- Have experience with Docker, OpenShift Container Platform (OCP), Kubernetes, Terraform, or similar IaC technologies
- Have experience with MongoDB, Redis, Kafka, Postgres, or similar data technologies
- Experience implementing and administering/managing technical solutions in major large-scale system implementations.
- Strong critical thinking skills to evaluate alternatives and present solutions that are consistent with business objectives and strategy.
- Ability to lead projects/initiatives with high risk and complexity
- Ability to manage to production goals (SLAs, SLOs, KPIs), deadlines, and operational metrics
- Ability to manage tasks independently and take ownership of responsibilities
- Ability to learn from mistakes and apply constructive feedback to improve performance
- Ability to adapt to a rapidly changing environment.
- Proven leadership abilities, including effective knowledge sharing, conflict resolution, facilitation of open discussions, fairness, and displaying appropriate levels of assertiveness.
- Ability to communicate highly complex technical information clearly and articulately for all levels and audiences.
- Willingness to learn new technologies and tools and train your peers.
- Ability to identify root-cause issues, articulate improvement opportunities, and design approaches/programs/products to improve overall quality assurance
- Strong knowledge of monitoring tools and their application (Glassbox, AppDynamics, Splunk, BigPanda AIOps, etc.)
- Understanding of system performance and how load drives utilization and customer experiences.
- Experience with Business Continuity Planning and Disaster Recovery, Application Resiliency/Highly Available Architecture, and Site Resiliency
- Knowledge and understanding of Conversational Artificial Intelligence, Machine Learning, Deep Learning, and Linear Regression Models