Lead complex initiatives to develop infrastructure to provide solutions for business applications
Architect products to use infrastructure platforms effectively in a scalable, reliable manner
Debug reliability and scalability issues across all stack layers, including the products built using our infrastructure platforms
Make monitoring and alerting fire on symptoms and not on outages
Have an enthusiastic go-for-it attitude. When you see something broken, you can't help but fix it
Have a desire to solve everyday challenges facing software engineers and automate their toil away
Have an excellent ability to manage multiple tasks and expectations at once
Participate in various projects intended to continually improve or upgrade the infrastructure
Evaluate internal and external software solutions which could be leveraged to meet target state architecture goals
Review and analyze high-impact outages to ensure the proper processes and procedures are in place to avoid problems in the future
Design, build, deploy, and maintain infrastructure solutions through collaborative efforts with the team and third-party vendors
Design, code, test, debug, and document programs using Agile development practices
Make decisions on technical designs and implementation plans, and identify project risks and resource requirements
Direct the daily risk and control flow of operations, focusing on policies, procedures, and work standards to ensure success
Recommend courses of action to maintain cost effectiveness and achieve results
Collaborate and consult with peers, colleagues, and managers to resolve issues and achieve goals
Interact with customers and vendors
Lead small to medium cross-organizational transformational efforts in the Platform space
Provide expertise in Kafka brokers, ZooKeeper, Kafka Connect, Schema Registry, KSQL, REST Proxy, and Kafka Control Center
Use automation and provisioning tools such as BladeLogic, Ansible, Chef, Jenkins, and GitLab
Deliver results in less-defined and constantly changing environments
Communicate with a broad and diverse audience, including technology and business leaders; simplify complex messages for consumption
As an application support specialist, this position is responsible for leading support functions and driving the execution and maturity of multiple application support services, including incident triage; root cause analysis; change evaluation, execution, and validation; deployment management; and risk and vulnerability management. Works closely with development and infrastructure partners such as middleware, NAS, database, and network teams.
Partner with CIO technology partners and managers to influence and support innovation and the continued drive towards automation and touchless operational sustainment as a design/architecture construct
Sustain operations and reduce risks in the ecosystem by aggressively pursuing safety and soundness actions, including but not limited to vulnerability patching, end of life, and resiliency
Hands-on engagement in all Production environment RunOps and DevOps support activities needed for the platform and applications
Drive operational management via incident response, communication, and tracking, along with root cause identification and closure.
Manage and coordinate Production change requests and release management.
Provides operational continuity through the development, management, measurement, analysis, and reporting of key service-level metrics as required by management
Sustained focus on driving continuous service improvements and innovation to design, implement, and ensure SLAs, KPIs, and OLAs for the critical business processes, applications, and partner interfaces
Regular presentation of Production performance, incident root causes, preventative actions, and trend analysis to technical and business management teams.
Maintain and update all Production-related documentation (e.g., game plans, run books, procedures, processes).
Ensure effective Production systems monitoring, alarming, and notification response/maintenance.
Provides general oversight and direction to virtual teams.
Required Qualifications US:
5 years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
5 years of experience troubleshooting environments across the entire architecture (i.e. applications to infrastructure)
3 years of hands-on Linux administration experience
Desired Qualifications:
1 year of experience in Artificial Intelligence, Natural Language Processing, Machine Learning, Distributed Computing, Chatbot, and Virtual Assistant technologies
1 year of experience supporting and monitoring Apache Flink solutions for real-time data processing
1 year of experience supporting and monitoring service load balancing architectures, including F5 and VMware AVI
1 year of experience with Big Data or Hadoop tools such as Spark, Hive, Kafka, and Map
Cloud Architect or Engineer Certification (e.g., GCP, Azure, AWS)
A BS/BA degree or higher in information technology
Competent working in one or more environments highly integrated with an operating system.
Have experience with VMware Pivotal Cloud Foundry (PCF) and Tanzu Application Service (TAS) technologies
Have experience with Docker, OpenShift Container Platform (OCP), Kubernetes, Terraform, or similar IaC technologies
Have experience with MongoDB, Redis, Kafka, Postgres, or similar data technologies
Experience implementing and administering/managing technical solutions in major large-scale system implementations.
Strong critical thinking skills to evaluate alternatives and present solutions that are consistent with business objectives and strategy.
Ability to lead projects/initiatives with high risk and complexity
Ability to manage to production goals, SLAs, SLOs, KPIs, deadlines, and operational metrics
Ability to manage tasks independently and take ownership of responsibilities
Ability to learn from mistakes and apply constructive feedback to improve performance
Ability to adapt to a rapidly changing environment.
Proven leadership abilities, including effective knowledge sharing, conflict resolution, facilitation of open discussions, fairness, and displaying appropriate levels of assertiveness.
Ability to communicate highly complex technical information clearly and articulately for all levels and audiences.
Willingness to learn new technologies and tools and train your peers.
Ability to identify root-cause issues, articulate improvement opportunities, and design approaches/programs/products to improve overall quality assurance
Strong knowledge of monitoring tools and their application (Glassbox, AppDynamics, Splunk, BigPanda AIOps, etc.)
Understanding of system performance and how load drives utilization and customer experiences.
Experience with Business Continuity Planning and Disaster Recovery, Application Resiliency/Highly Available Architecture, and Site Resiliency
Knowledge and understanding of Conversational Artificial Intelligence, Machine Learning, Deep Learning, and Linear Regression Models