Technology Architect
Job Summary
1. Platform Operations & Technical Ownership
3rd-Level Technical Support & Troubleshooting as key knowledge resource
- Acts as the primary 3rd-level contact for:
  - Wazuh SIEM
  - PostgreSQL
  - S3 / MinIO Object Storage
  - DNS Infrastructure
  - Remote platform access / bastion systems
  - Linux OS (SUSE, RHEL, Ubuntu)
  - NSX-T networking and firewalling
  - SUSE Manager
- Performs deep root-cause analyses, including multi-system debugging.
- Handles cross-team, business-critical incidents requiring broad platform knowledge.
Capacity & Performance Management
- End-to-end responsibility for FCI and Kubernetes cluster capacity management.
- Continuous assessment of resource utilization trends and scaling requirements.
Platform Stability & Reliability
- Drives improvements in platform stability and deployment reliability.
- Optimizes operational models and CI/CD processes.
- Ensures smooth transitions from project delivery to stable operations.
2. Platform Engineering & Automation
- Prepares, designs, and executes proofs of concept (PoCs) for:
  - Ansible / AWX to enable automated deployments and configuration management.
  - Oracle-related technologies, including integration and migration scenarios.
- Develops automation strategies and contributes reusable modules and deployment templates.
- Defines technical standards for automated operations.
3. Security, Compliance & Governance
Audit Management & Collaboration with Auditors
- Designs, reviews, and explains technical audit controls to internal and external auditors.
- Coordinates audit activities for both platform and application-related topics.
Security-Driven Engineering
- Embeds security controls into automated deployment workflows.
- Creates and maintains compliance policies and technical guardrails.
Wazuh SIEM Responsibility
- Designs, maintains, and operates the Wazuh security platform.
- Develops use cases, alerts, dashboards, and security incident processes.
- Troubleshoots performance issues, agent behavior, and platform scalability.
4. Collaboration, Stakeholder Management & Enablement
- Coordinates work packages across AO teams, development teams, and infrastructure units.
- Works closely with software teams to onboard applications onto the platform.
- Supports service portfolio development and provides technical input for presales activities.
- Shares best practices and mentors engineers regarding platform processes and tools.
5. Architecture Design & Technology Evaluation
- Executes PoCs and evaluates new platform components.
- Defines integration strategies for new technologies in alignment with architecture standards.
- Creates reference architectures, deployment blueprints, and operational concepts.
- Evaluates solutions based on scalability, resilience, security, and cost efficiency.
6. Project Involvement
Project: Icinga Replacement
- Coordinates work and dependencies with classic AO teams.
- Supports AO teams in deploying and configuring exporters/agents on legacy VMs.
- Standardizes client-side configurations and data mappings.
- Implements standardized dashboards for platform service observability.
- Defines monitoring and alerting for existing components and applications.
- Performs advanced troubleshooting, including:
  - missing or incomplete metrics
  - high scrape latency
  - time-series cardinality challenges
  - Kubernetes monitoring (Prometheus Operator, ServiceMonitor/PodMonitor resources)
Project: MIF
- Analyzes the existing application architecture and its components.
- Conducts a PoC for Cognos.
- Supports the DB2-to-PostgreSQL migration, including data validation, performance assessment, and migration tooling.
7. Technical Skills & Competencies
Linux Platform Engineering & Operations
- Advanced administration of enterprise-grade Linux systems (RHEL, Ubuntu, hardened distributions).
- Deep OS-level troubleshooting (CPU, memory, and I/O bottlenecks; process diagnostics).
- Service lifecycle management using systemd, including journald log analysis.
- Kernel parameter tuning, optimization, and performance diagnostics.
- Host-level incident investigation and forensic log analysis.
- Definition and execution of patching and lifecycle management strategies.
- Filesystem operations and troubleshooting (LVM, XFS, ext4, mount and I/O issues).
- User and remote access configuration, including SSH hardening and bastion host concepts.
Kubernetes Platform Operations
- Operational support for Kubernetes clusters across control plane and worker nodes.
- Troubleshooting of pod failures, scheduling issues, container crashes, and resource exhaustion.
- Debugging of networking-related problems (CNI layers, service routing, DNS resolution).
- Management of persistent volumes, storage classes, and dynamic provisioning behaviors.
- Resource forecasting and capacity planning for cluster growth (CPU, memory, storage).
- Execution and validation of Kubernetes cluster upgrades.
- Operational support for multi-cluster and multi-environment setups.
- Analysis of Kubernetes system logs (kube-apiserver, kubelet, controller-manager).
- Maintenance and enhancement of the Kubernetes stack, including version upgrades and feature adoption.
Observability & Security Platform (Wazuh)
- Design, deployment, and operational management of the Wazuh SIEM platform.
- Full lifecycle management of Wazuh agents, including policy enforcement and tuning.
- Troubleshooting of log ingestion pipelines, decoders, enrichment rules, and alert logic.
- Integration of Wazuh with platform services and infrastructure.
- Analysis of security alerts and support of incident investigations.
- Performance optimization of SIEM components to ensure reliable event processing.
- Maintenance of compliance dashboards and generation of audit-relevant evidence.
- Continuous improvement of the Wazuh stack via upgrades, new features, and configuration optimization.
Observability & Monitoring Platform (Prometheus / Grafana / Alerting)
- Deployment, configuration, and operation of Prometheus-based monitoring stacks (standalone and Kubernetes-integrated).
- Administration of scraping configurations, service discovery rules, and target troubleshooting.
- Design and maintenance of recording rules and alert rules for platform components.
- Alert noise reduction through tuning and improved signal quality.
- Integration and troubleshooting of exporters (node, database, Kubernetes, etc.).
- Resolution of metric gaps, scrape latency issues, and cardinality-related performance problems.
- Capacity planning for Prometheus TSDB retention, storage requirements, and query performance.
- Development and lifecycle management of Grafana dashboards for platform and infrastructure services.
- Troubleshooting of dashboard performance, data source connectivity, and visualization accuracy.
- Implementation of standardized dashboard templates across platform services.
- Integration of alerting workflows into incident management systems.
- Definition of platform SLIs/SLOs and reliability indicators.
- Correlation of metrics and logs (including Wazuh and OS logs) for root-cause analysis.
- Support and lifecycle management of Kubernetes monitoring components (Prometheus Operator, ServiceMonitor/PodMonitor).
- Validation of monitoring coverage for newly onboarded components and applications.
Database Platform Operations (PostgreSQL / Oracle PoC)
- Operational management of PostgreSQL clusters across environments.
- Monitoring of key metrics (connections, locks, long-running queries, replication lag).
- Backup, restore, and disaster recovery validation.
- Growth and capacity planning for compute and storage layers.
- Support for database failover scenarios and resilience testing.
- Preparation and execution of Oracle-related proofs of concept.
- Evaluation of database deployment models (VM-based, containerized, or managed).
- Maintenance and enhancement of the database stack, including upgrades and feature adoption.
Object Storage Platform (MinIO / S3 APIs)
- Deployment and operations of MinIO-based object storage clusters.
- Troubleshooting of S3 API access, authentication, and compatibility issues.
- Monitoring capacity usage, planning storage expansions, and scaling clusters.
- Configuration of lifecycle policies, data retention, and archival strategies.
- Integration of MinIO with platform workloads, CI/CD, and backup systems.
- Performance analysis and troubleshooting of replication and erasure coding.
Networking & Firewall Operations (VMware NSX-T)
- Operational support of software-defined networking environments using NSX-T.
- Troubleshooting of routing issues, overlay networking, and cross-segment connectivity.
- Management of distributed firewall policies and micro-segmentation rules.
- Support for load balancers, service exposure, and virtual networking components.
- Administration of DNS infrastructure (zones, records, service discovery).
- Throughput, latency, and capacity analysis for critical network paths.
Remote Platform Access & Identity Integration
- Design and support of secure remote access solutions using Apache Guacamole and Entra ID.
- Troubleshooting of identity flows, authentication chains, and access control policies.
- Integration with enterprise identity providers using OIDC and directory services.
- Implementation of secure access patterns for administrators and application teams.
Automation & Platform Engineering (Ansible / AWX)
- Preparation and execution of Ansible and AWX proofs of concept.
- Development of automation playbooks for platform configuration, provisioning, and lifecycle tasks.
- Integration of configuration management workflows into operational routines.
- Evaluation and optimization of automated operational processes.
- Automated deployment validation and configuration compliance checks.
Incident Management & Reliability Engineering
- 3rd-level escalation point for complex incidents across infrastructure and platform services.
- Root-cause analysis using logs, metrics, and system-level diagnostics.
- Coordination of incident response across multiple technical domains.
- Identification and remediation of recurring incident patterns.
- Implementation of platform stabilization and hardening measures.
- Transition of engineered solutions into long-term operational models.
Security, Compliance & Audit Support
- Design and discussion of audit controls with internal and external auditors.
- Preparation of audit evidence for platform and application compliance.
- Integration of security controls and guardrails into automated deployment workflows.
- Maintenance of compliance-sensitive configuration baselines.
- Support for remediation of audit findings and compliance gaps.
Architecture & Technology Evaluation
- Execution of proofs of concept for emerging technologies and platform components.
- Assessment of scalability, resilience, operational complexity, and security posture.
- Creation of technical blueprints and reference architectures.
- Definition of integration strategies for new services within existing platform ecosystems.
- Evaluation of cost efficiency, maintainability, and operational impact of architectural decisions.
Collaboration & Platform Enablement
- Coordination of cross-team technical work packages across operations and engineering units.
- Support for application onboarding to shared platform services.
- Documentation of platform standards, operational procedures, and best practices.
- Contribution to presales discussions and service portfolio evolution.
- Delivery of knowledge transfer and enablement sessions for operations and development teams.
Additional Information :
Please Note: Fraudulent job postings/job scams are increasingly common. Beware of misleading advertisements and fraudulent communication issuing offer letters on behalf of T-Systems in exchange for a fee. Please look for an authentic T-Systems email id - .
Stay vigilant. Protect yourself from recruitment fraud!
To learn more, please visit: Fraud Alert
Remote Work: No
Employment Type: Full-time
About Company
T-Systems Information and Communication Technology India Private Limited (T-Systems ICT India Pvt. Ltd.) is a proud recipient of the prestigious Great Place To Work® Certification. As a wholly owned subsidiary of T-Systems International GmbH, T-Systems India operates across Pune, Ban ...