Technology Architect

Job Location:

Pune - India

Monthly Salary: Not Disclosed
Posted on: 16 hours ago
Vacancies: 1

Job Summary

1. Platform Operations & Technical Ownership

3rd-Level Technical Support & Troubleshooting as key knowledge resource

  • Acts as the primary 3rd-level contact for:
    • Wazuh SIEM
    • PostgreSQL
    • S3 / MinIO Object Storage
    • DNS Infrastructure
    • Remote platform access / bastion systems
    • Linux OS (SUSE, RHEL, Ubuntu)
    • NSX-T networking and firewalling
    • SUSE Manager
  • Performs deep root-cause analyses, including multi-system debugging.
  • Handles cross-team, business-critical incidents requiring broad platform knowledge.

 

Capacity & Performance Management

  • End-to-end responsibility for FCI and Kubernetes cluster capacity management.
  • Continuous assessment of resource utilization trends and scaling requirements.

 

Platform Stability & Reliability

  • Drives improvements in platform stability and deployment reliability.
  • Optimizes operational models and CI/CD processes.
  • Ensures smooth transitions from project delivery to stable operations.

 

2. Platform Engineering & Automation

  • Prepares, designs, and executes proofs of concept (PoCs) for:
    • Ansible / AWX, to enable automated deployments and configuration management.
    • Oracle-related technologies, including integration and migration scenarios.
  • Develops automation strategies and contributes reusable modules and deployment templates.
  • Defines technical standards for automated operations.

 

3. Security, Compliance & Governance

Audit Management & Collaboration with Auditors

  • Designs, reviews, and explains technical audit controls to internal and external auditors.
  • Coordinates audit activities for both platform and application-related topics.

Security-Driven Engineering

  • Embeds security controls into automated deployment workflows.
  • Creates and maintains compliance policies and technical guardrails.

Wazuh SIEM Responsibility

  • Designs, maintains, and operates the Wazuh security platform.
  • Develops use cases, alerts, dashboards, and security incident processes.
  • Troubleshoots performance issues, agent behavior, and platform scalability.

 

4. Collaboration, Stakeholder Management & Enablement

  • Coordinates work packages across AO teams, development teams, and infrastructure units.
  • Works closely with software teams to onboard applications onto the platform.
  • Supports service portfolio development and provides technical input for presales activities.
  • Shares best practices and mentors engineers regarding platform processes and tools.

 

5. Architecture Design & Technology Evaluation

  • Executes PoCs and evaluates new platform components.
  • Defines integration strategies for new technologies in alignment with architecture standards.
  • Creates reference architectures, deployment blueprints, and operational concepts.
  • Evaluates solutions based on scalability, resilience, security, and cost efficiency.

 

6. Project Involvement

Project: Icinga Replacement

  • Coordinates work and dependencies with classic AO teams.
  • Supports AO teams in deploying and configuring exporters/agents on legacy VMs.
  • Standardizes client-side configurations and data mappings.
  • Implements standardized dashboards for platform service observability.
  • Defines monitoring and alerting for existing components and applications.
  • Performs advanced troubleshooting, including:
    • missing or incomplete metrics
    • high scrape latency
    • time-series cardinality challenges
    • Kubernetes monitoring (Prometheus Operator, ServiceMonitor/PodMonitor resources)

Project: MIF

  • Analyzes the existing application architecture and its components.
  • Conducts a PoC for Cognos.
  • Supports the DB2-to-PostgreSQL migration, including data validation, performance assessment, and migration tooling.

 

7. Technical Skills & Competencies

Linux Platform Engineering & Operations

  • Advanced administration of enterprise-grade Linux systems (RHEL, Ubuntu, hardened distributions).
  • Deep OS-level troubleshooting (CPU, memory, IO bottlenecks, process diagnostics).
  • Service lifecycle management using systemd, including journald log analysis.
  • Kernel parameter tuning, optimization, and performance diagnostics.
  • Host-level incident investigation and forensic log analysis.
  • Definition and execution of patching and lifecycle management strategies.
  • Filesystem operations and troubleshooting (LVM, XFS, ext4, mount and IO issues).
  • User and remote access configuration, including SSH hardening and bastion host concepts.

 

Kubernetes Platform Operations

  • Operational support for Kubernetes clusters across control plane and worker nodes.
  • Troubleshooting pod failures, scheduling issues, container crashes, and resource exhaustion.
  • Debugging of networking-related problems (CNI layers, service routing, DNS resolution).
  • Management of persistent volumes, storage classes, and dynamic provisioning behaviors.
  • Resource forecasting and capacity planning for cluster growth (CPU, memory, storage).
  • Execution and validation of Kubernetes cluster upgrades.
  • Operational support for multi-cluster and multi-environment setups.
  • Analysis of Kubernetes system logs (kube-apiserver, kubelet, controller-manager).
  • Maintenance and enhancement of the Kubernetes stack, including version upgrades and feature adoption.

 

Observability & Security Platform (Wazuh)

  • Design, deployment, and operational management of the Wazuh SIEM platform.
  • Full lifecycle management of Wazuh agents, including policy enforcement and tuning.
  • Troubleshooting log ingestion pipelines, decoders, enrichment rules, and alert logic.
  • Integration of Wazuh with platform services and infrastructure.
  • Analysis of security alerts and support of incident investigations.
  • Performance optimization of SIEM components to ensure reliable event processing.
  • Maintenance of compliance dashboards and generation of audit-relevant evidence.
  • Continuous improvement of the Wazuh stack via upgrades, new features, and configuration optimization.

 

Observability & Monitoring Platform (Prometheus / Grafana / Alerting)

  • Deployment, configuration, and operations of Prometheus-based monitoring stacks (standalone and Kubernetes-integrated).
  • Administration of scraping configurations, service discovery rules, and target troubleshooting.
  • Design and maintenance of recording rules and alert rules for platform components.
  • Alert noise reduction through tuning and improved signal quality.
  • Integration and troubleshooting of exporters (node, database, Kubernetes, etc.).
  • Resolution of metric gaps, scrape latency issues, and cardinality-related performance problems.
  • Capacity planning for Prometheus TSDB retention, storage requirements, and query performance.
  • Development and lifecycle management of Grafana dashboards for platform and infrastructure services.
  • Troubleshooting dashboard performance, data source connectivity, and visualization accuracy.
  • Implementation of standardized dashboard templates across platform services.
  • Integration of alerting workflows into incident management systems.
  • Definition of platform SLIs/SLOs and reliability indicators.
  • Correlation of metrics and logs (including Wazuh and OS logs) for root-cause analysis.
  • Support and lifecycle management of Kubernetes monitoring components (Prometheus Operator, ServiceMonitor/PodMonitor).
  • Validation of monitoring coverage for newly onboarded components and applications.

 

Database Platform Operations (PostgreSQL / Oracle PoC)

  • Operational management of PostgreSQL clusters across environments.
  • Monitoring key metrics (connections, locks, long-running queries, replication lag).
  • Backup, restore, and disaster recovery validation.
  • Growth and capacity planning for compute and storage layers.
  • Support for database failover scenarios and resilience testing.
  • Preparation and execution of Oracle-related proofs of concept.
  • Evaluation of database deployment models (VM-based, containerized, or managed).
  • Maintenance and enhancement of the database stack, including upgrades and feature adoption.

 

Object Storage Platform (MinIO / S3 APIs)

  • Deployment and operations of MinIO-based object storage clusters.
  • Troubleshooting of S3 API access, authentication, and compatibility issues.
  • Monitoring capacity usage, planning storage expansions, and scaling clusters.
  • Configuration of lifecycle policies, data retention, and archival strategies.
  • Integration of MinIO with platform workloads, CI/CD, and backup systems.
  • Performance analysis and troubleshooting of replication and erasure coding.

 

Networking & Firewall Operations (VMware NSX-T)

  • Operational support of software-defined networking environments using NSX-T.
  • Troubleshooting of routing issues, overlay networking, and cross-segment connectivity.
  • Management of distributed firewall policies and micro-segmentation rules.
  • Support for load balancers, service exposure, and virtual networking components.
  • Administration of DNS infrastructure (zones, records, service discovery).
  • Throughput, latency, and capacity analysis for critical network paths.

 

Remote Platform Access & Identity Integration

  • Design and support of secure remote access solutions using Apache Guacamole and Entra ID.
  • Troubleshooting identity flows, authentication chains, and access control policies.
  • Integration with enterprise identity providers using OIDC and directory services.
  • Implementation of secure access patterns for administrators and application teams.

 

Automation & Platform Engineering (Ansible / AWX)

  • Preparation and execution of Ansible and AWX proofs of concept.
  • Development of automation playbooks for platform configuration, provisioning, and lifecycle tasks.
  • Integration of configuration management workflows into operational routines.
  • Evaluation and optimization of automated operational processes.
  • Automated deployment validation and configuration compliance checks.

 

Incident Management & Reliability Engineering

  • 3rd-level escalation point for complex incidents across infrastructure and platform services.
  • Root cause analysis using logs, metrics, and system-level diagnostics.
  • Coordination of incident response across multiple technical domains.
  • Identification and remediation of recurring incident patterns.
  • Implementation of platform stabilization and hardening measures.
  • Transition of engineered solutions into long-term operational models.

 

Security, Compliance & Audit Support

  • Design and discussion of audit controls with internal and external auditors.
  • Preparation of audit evidence for platform and application compliance.
  • Integration of security controls and guardrails into automated deployment workflows.
  • Maintenance of compliance-sensitive configuration baselines.
  • Support for remediation of audit findings and compliance gaps.

 

Architecture & Technology Evaluation

  • Execution of proofs of concept for emerging technologies and platform components.
  • Assessment of scalability, resilience, operational complexity, and security posture.
  • Creation of technical blueprints and reference architectures.
  • Definition of integration strategies for new services within existing platform ecosystems.
  • Evaluation of cost efficiency, maintainability, and operational impact of architectural decisions.

 

Collaboration & Platform Enablement

  • Coordination of cross-team technical work packages across operations and engineering units.
  • Support for application onboarding to shared platform services.
  • Documentation of platform standards, operational procedures, and best practices.
  • Contribution to presales discussions and service portfolio evolution.

  • Delivery of knowledge transfer and enablement sessions for operations and development teams.


Additional Information:

Please Note: Fraudulent job postings/job scams are increasingly common. Beware of misleading advertisements and fraudulent communication issuing offer letters on behalf of T-Systems in exchange for a fee. Please look for an authentic T-Systems email id - .

Stay vigilant. Protect yourself from recruitment fraud!

To learn more, please visit: Fraud Alert


Remote Work:

No


Employment Type:

Full-time

About Company

T-Systems Information and Communication Technology India Private Limited (T-Systems ICT India Pvt. Ltd.) is a proud recipient of the prestigious Great Place To Work® Certification™. As a wholly owned subsidiary of T-Systems International GmbH, T-Systems India operates across Pune, Ban ...
