Senior IT Analyst Technical Infrastructure (Lead Site Reliability Engineer)

Caterpillar

Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed
Posted on: 14 hours ago
Vacancies: 1 Vacancy

Job Summary

Career Area:

Technology Digital and Data

Job Description:

Your Work Shapes the World at Caterpillar Inc.

When you join Caterpillar, you're joining a global team who cares not just about the work we do but also about each other. We are the makers, problem solvers, and future world builders who are creating stronger, more sustainable communities. We don't just talk about progress and innovation here; we make it happen, with our customers, where we work and live. Together, we are building a better world, so we can all enjoy living in it.

Job Summary


We are seeking a skilled Senior IT Analyst Technical Infrastructure (Lead Site Reliability Engineer) to join the Cat Technology GCIO IT Division. Come work on the Caterpillar IT Team as a Senior IT Analyst Technical Infrastructure supporting Caterpillar's Autonomy & Automation Business Unit.

The Autonomy and Automation (A&A) team is focused on scaling technology solutions in mining, construction, quarry and aggregates, and beyond to support customer safety and productivity goals. A&A is responsible for technology solutions including autonomy, semi-autonomy, remote control, and other technologies. The goal is to address key customer problems, including safety, productivity, labor shortage, energy transition, and process.

In this role as Lead Site Reliability Engineer, you will provide end-to-end operational ownership of Kubernetes-based platform environments deployed on on-premises hardware and in AWS. You will ensure reliable provisioning, configuration, monitoring, and continuous improvement of clusters and workloads; perform bug triage and incident response; drive observability and automation; and partner with platform, networking, and application teams to meet reliability objectives and business needs.

The preference for this role is to be based out of the Whitefield PSN Office, Bangalore, KA, or the Chennai WTC Centre, TN, India.

What you will do

  • Provision, configure, and maintain Kubernetes clusters on on-premises infrastructure (bare metal or virtualized) and in AWS (e.g., EKS).
  • Implement and manage Infrastructure as Code (IaC) and automated workflows for cluster creation, upgrades, and application deployments (e.g., Terraform, Ansible, Helm, Git-based pipelines).
  • Establish and operate comprehensive observability (metrics, logs, traces), including SLI/SLO definitions, alerting, dashboards, and runbooks for platform and key services.
  • Monitor environment health (control plane and node components), capacity, performance, and cost; perform tuning and right-sizing across on-prem and cloud.
  • Execute bug triage: reproduce issues, collect diagnostics, perform root-cause analysis, and coordinate fixes with platform/application teams and vendors.
  • Lead incident response for reliability events (degradations, outages), post-incident reviews, and preventive actions.
  • Administer Kubernetes security controls (RBAC, network policies, secrets management, image signing/scanning), certificate management, and compliance control implementation.
  • Manage platform services (container registry, ingress/controllers, CNI, storage classes/CSI, service mesh where applicable).
  • Implement backup/restore and disaster recovery strategies for clusters and stateful workloads (e.g., Velero), and validate them regularly.
  • Maintain and improve CI/CD workflows, integrating testing, policy checks, and progressive delivery for platform and shared services.
  • Create and maintain operational documentation: standards, diagrams, runbooks, automation playbooks, and knowledge base articles.
  • Collaborate with networking, security, and application teams to ensure reliability, performance, and secure connectivity across data centers and AWS.
  • Drive continuous improvement: reliability engineering practices, toil reduction, automation, and change management processes.

What you will have

  • Kubernetes administration and operations on on-premises and AWS environments (cluster lifecycle, upgrades, node management, workload scheduling).
  • Infrastructure as Code, automation, and Git-based CI/CD.
  • Observability stacks and tooling (e.g., Prometheus, Grafana, Alertmanager, OpenTelemetry; ELK/Loki-class logging).
  • Linux systems administration (container runtime, networking, storage).
  • Networking fundamentals applied to Kubernetes (CNI, DNS, Ingress/Load Balancing, TLS/cert management, basic L3/L4 concepts).
  • Security best practices (RBAC, pod security standards, network policies, image scanning, secrets management).
  • Experience with incident response, on-call participation, and root-cause analysis in production environments.
  • Strong documentation and communication skills; ability to work effectively with geographically distributed teams.

Top Candidates Will Also Have:

  • Experience with service mesh (e.g., Istio/Linkerd) and advanced container networking (e.g., eBPF-based data paths, network policy engines).
  • Familiarity with backup/DR tooling for Kubernetes (e.g., Velero) and stateful workload recovery.
  • Exposure to Operational Technology (OT) or edge/remote site constraints and ruggedized deployments.
  • Experience with configuration compliance, policy-as-code (e.g., Open Policy Agent), and supply-chain security.
  • Knowledge of platform registry operations, image lifecycle, and vulnerability management.
  • This position requires the candidate to work a 5-day-a-week schedule in the office.

Skills desired:


Technical Excellence: Knowledge of a given technology and various application methods; ability to develop and provide solutions to significant technical challenges.
Level: Extensive Experience

  • Advises others on the assessment and provision of all technical solutions.
  • Engages appropriate subject matter resources to effectively resolve technical issues.
  • Mentors others to enhance their technical competence and its application to achieve more effective technical solutions.
  • Coaches others in promoting, defining, analyzing, and providing superior technical solutions to business problems.
  • Provides effective solutions to moderate technical challenges through strong technical competence, effectively examining implications of events and issues.
  • Assumes accountability for personal technical performance and holds others responsible for theirs.

Technology Advising: Knowledge of effective advisory methods and ability to provide valued information and advice to clients regarding products, technologies, services, and solutions for a specific technology domain.
Level: Working Knowledge

  • Assesses the current technology environment, expressed needs, and initiatives of client organizations.
  • Uses an effective consulting method to present technology solutions that resolve stated client business issues.
  • Advises clients regarding a family of specific products, technologies, or services in a technology domain.
  • Demonstrates basic competence and sound business knowledge regarding specific products, technologies, or services within a domain of technology expertise.
  • Achieves consulting relationship rating of professional by delivering timely, meaningful advice meeting client needs in a narrow set of specific technologies.

Hardware Infrastructure: Knowledge of computer architecture and systems programming; ability to design, build, and integrate IT hardware into multi-platforms for the organization.
Level: Extensive Experience

  • Evaluates IT hardware vendors in the market and selects the most suitable products for the organization.
  • Guides employees on the integration of IT hardware throughout other organization-wide platforms.
  • Supervises the implementation process of IT hardware, ensuring consistency in productivity and overall effectiveness.
  • Advises others on business standards and practices for IT hardware in order to meet designer requirements.
  • Evaluates the advantages and disadvantages of an organization's hardware components.
  • Diagnoses IT hardware problems and recommends dynamic solutions.

Requirements Analysis: Knowledge of tools, methods, and techniques of requirement analysis; ability to elicit, analyze, and record required business functionality and non-functionality requirements to ensure the success of a system or software development project.
Level: Working Knowledge

  • Follows policies, practices, and standards for determining functional and informational requirements.
  • Confirms deliverables associated with requirements analysis.
  • Communicates with customers and users to elicit and gather client requirements.
  • Participates in the preparation of detailed documentation and requirements.
  • Utilizes specific organizational methods, tools, and techniques for requirements analysis.

System Testing: Knowledge of system and software testing; ability to design, plan, and execute system testing strategies and tactics to ensure the quality of software at all stages of the system life cycle.
Level: Extensive Experience

  • Verifies the proper flow of transactions across all input, output, and storage channels or devices.
  • Evaluates interoperability of new systems with existing systems during the beta testing phase.
  • Supervises the testing of complex, multi-platform, and distributed applications.
  • Designs processes to ensure that the system meets and maintains requirements and expectations.
  • Coaches end users on the development of test data and test scenarios for system validation.
  • Manages the execution of test plans, including resources, strategies, schedules, processes, and tools.

Systems Software Infrastructure: Knowledge of computer architecture and system software interaction; ability to design and build a fundamental architecture of operating systems, database management systems, communications protocols, compilers, and other development tools.
Level: Working Knowledge

  • Reports software connectivity and integration issues.
  • Demonstrates planned software changes on the local environment.
  • Administers software migration and contingency plans related to own function.
  • Analyzes the local software architecture, components, and products.
  • Tests key features for the entire software infrastructure environment.

Technical Troubleshooting: Knowledge of technical troubleshooting approaches, tools, and techniques; ability to anticipate, recognize, and resolve technical issues on hardware, software, application, or operation.
Level: Extensive Experience

  • Emphasizes the business impact of failure and the criticality and timing of needed resolution so that problems can be avoided in the future.
  • Creates trouble reports for all issues found and reviews solutions for completeness and correctness.
  • Directs the resolution of communications problems in multi-vendor environments.
  • Resolves a variety of hardware, software, and communications malfunctions.
  • Coaches others on advanced diagnostic techniques and tools for unusual or performance-related problems.
  • Facilitates the distribution of releases, reports, and correction packages to departments or clients.

Technical Writing/Documentation: Knowledge of technical writing; ability to write technical documents such as manuals, reports, guidelines, or documents on standards, processes, and applications.
Level: Extensive Experience

  • Conducts training on alternative documentation delivery mechanisms, tools, and techniques.
  • Manages cost items in producing and maintaining documentation.
  • Designs and implements formal methodologies for producing documentation.
  • Collaborates with support function managers, the product management team, and design engineers on writing projects.
  • Supervises the analysis, design, and data collation on large documentation initiatives.
  • Establishes and references best practices for existing and planned tools and delivery vehicles for proper documentation.

What you will get:

Work Life Harmony

  • Earned and medical leave
  • Relocation assistance

Holistic Development

  • Personal and professional development through Caterpillar's employee resource groups across the globe
  • Career development opportunities with global prospects

Health and Wellness

  • Medical coverage: medical, life, and personal accident coverage
  • Employee mental wellness assistance program

Financial Wellness

  • Employee investment plan
  • Pay for performance: annual incentive bonus plan

Additional Information:

Caterpillar is not currently hiring individuals for this position who now or in the future require sponsorship for employment visa status; however, as a global company, Caterpillar offers many job opportunities outside of the U.S., which can be found through our employment website.

Posting Dates:

February 19, 2026 - March 4, 2026

Caterpillar is an Equal Opportunity Employer. Qualified applicants of any age are encouraged to apply.

Not ready to apply? Join our Talent Community.


Required Experience:

Senior IC

Key Skills

  • CSS
  • C++
  • ABAP
  • Bank Reconciliation
  • Information Technology Sales
  • JavaScript

About Company

Caterpillar is the world’s leading manufacturer of construction and mining equipment, diesel and natural gas engines, industrial turbines and diesel-electric locomotives.
