Title: Cloud Operations Lead
Locations: Rockville, MD; Princeton, NJ; New York City, NY
Number of positions: 3
Job Type: Full-Time (FTE)
Primary focus is on:
AWS Control Tower and AWS Organizations policies and management
Multi-account deployment and management
AWS Backup and AWS Systems Manager (SSM) patching processes, in detail
AMI deployments and pushing configuration to multiple accounts
AWS services: EC2, ECS, EKS, RDS, S3, SageMaker, CloudFront, Lambda, etc.
AWS S3, SFTP, and site externalization methods
IaC: Terraform, CloudFormation templates, and Python
IAM policies, access management, and restrictions
Responsibilities:
Oversee the management and maintenance of cloud infrastructure, ensuring high availability and reliability. Act as the primary point of contact for all cloud infrastructure-related issues and escalations.
Ensure cloud resources are optimally configured and managed to meet performance and cost objectives.
Implement and maintain monitoring solutions to track the health and performance of cloud infrastructure.
Drive major incidents and potential incidents end to end, providing periodic updates to client stakeholders for approvals and recommendations.
Ensure due diligence and impact analysis for all changes implemented on the cloud platforms.
Lead and mentor a team of cloud engineers and administrators, fostering a collaborative and high-performing work environment.
Provide guidance and support to team members, facilitating their professional development and growth.
Coordinate and manage the team's daily activities, ensuring alignment with organizational goals and priorities.
Lead the response to cloud-related incidents, ensuring timely resolution and minimal impact on business operations.
Develop and implement incident management processes and procedures.
Perform root cause analysis and implement preventive measures to avoid recurrence of issues.
Identify opportunities to automate repetitive tasks and processes to improve efficiency and reduce operational overhead.
Develop and implement automation scripts and tools leveraging Infrastructure as Code (IaC) practices.
Continuously evaluate and improve cloud operations processes and procedures.
Ensure cloud infrastructure adheres to security policies, standards, and best practices.
Implement and maintain security controls to protect cloud resources and data.
Ensure compliance with regulatory requirements and industry standards (e.g., GDPR, HIPAA).
Monitor and analyze cloud resource usage, ensuring efficient utilization and avoiding over-provisioning.
Conduct capacity planning to support future growth and demand.
Implement cost management strategies to optimize cloud spending.
Develop and implement disaster recovery and business continuity plans for cloud infrastructure.
Ensure regular testing and validation of disaster recovery procedures.
Ensure cloud infrastructure is resilient and can recover quickly from failures or disruptions.
Work closely with other IT teams, business units, and stakeholders to understand requirements and deliver cloud solutions that meet their needs.
Collaborate with vendors and service providers to evaluate and integrate new cloud technologies and services.
Communicate effectively with stakeholders, providing regular updates on cloud operations and performance.
Maintain comprehensive documentation of cloud infrastructure configurations, processes, and procedures.
Generate regular reports on cloud performance, incidents, and operational metrics.
Ensure documentation is up-to-date and accessible to relevant stakeholders.
The responsibilities above apply primarily to the AWS environment, followed by the Azure and OCI environments.
Qualifications:
Bachelor's degree in Computer Science, Information Technology, Electrical Engineering, or a related field. Advanced degrees or relevant professional training are a plus.
Solid experience in system administration and in cloud operations, including a leadership or senior technical role.
Proficiency with the AWS cloud platform. Strong working experience with the Azure and OCI cloud platforms.
Strong understanding of cloud architecture, services, and best practices.
Experience with cloud management and monitoring tools.
Proficiency in scripting and automation (e.g., PowerShell, Python, Terraform, Ansible Playbooks/Ansible Tower, CloudFormation, Puppet, Chef).
Strong knowledge of cloud security principles and practices.
Proficiency in Windows and Linux server administration and management.
Proficiency and working experience with VMware, Active Directory (AD), and Azure AD SSO platforms.
Strong networking skills, including DNS, DHCP, PKI, and an understanding of LAN/WAN protocols.
Effective communication and interpersonal skills with the ability to interact with stakeholders at all levels.
Experience in vendor management and contract negotiations.
A proactive approach to continuous improvement and innovation in data center operations.
Additional Information:
Remote Work: No
Employment Type: Full-time