Role: IBM Workload Scheduler Administrator / Infrastructure Engineer
Location: Riverwoods, IL (3 days a week onsite); remote may be considered for an exceptional candidate
Job Type: Contract (W2)
Expected Working Hours: Monday-Friday, 9:00 am-5:00 pm US/Central, full time, with flexible hours for occasional weekend change control; rotating on-call with two other team members
Reports To: Senior Manager Software Engineering
Description:
- We are seeking a highly skilled IBM Workload Scheduler (IWS) Administrator (3-5 years of dedicated experience administering IWS) to manage, maintain, and optimize our enterprise batch scheduling infrastructure.
- The successful candidate will be responsible for the end-to-end administration of the IWS environment hosted primarily on Red Hat Enterprise Linux (RHEL).
- This role requires a strong blend of IWS expertise, Linux system administration, and scripting to ensure high availability and seamless execution of critical business workloads.
Responsibilities:
- Administer the Production IBM Workload Scheduler (aka Tivoli Workload Scheduler) environment with 28,000 unique daily jobs across 350,000 daily job runs, 44 servers, and three other change control environments.
- Administer, install, configure, and patch/upgrade IWS components (Master Domain Manager, Dynamic Agents, Dynamic Pool, Dynamic Workload Console).
- Work with Product Owner on communicating work streams in Jira.
- Manage job promotions using Workload Application Template-based processes, ensuring platform stability checks for each promotion.
- Manage change control across four separate environments, enforcing standards and policies.
- Maintain and promote 99.17% Production platform uptime per calendar month (excluding planned outages and maintenance windows) using SOPs, DevOps tools, and disciplined change control.
- Communicate platform improvements to a user community of 500 developers and data engineers.
- Production consists of 44 servers across MDM, DWC, and dynamic agents.
- Resolve complex job failures, performance bottlenecks, agent issues, and infrastructure issues.
- Advise on complex job scheduling design questions for the scheduling support team.
- Monitor scheduler health, manage database maintenance, perform backup/disaster recovery, and conduct monthly failovers.
- Define and maintain security policies, user authorizations, and authentication for the DWC.
- Respond to cybersecurity vulnerability assessments and regulatory audit inquiries (including PCI).
- Design and implement Ansible automation and self-healing mechanisms to reduce unplanned outages.
- Coordinate with offshore teams performing SOPs during non-working hours.
- Script in Python using the IWS REST API.
Required Technical Skills:
- Strong experience with IBM Workload Scheduler architecture, especially Dynamic Workload Broker V10.1, high availability of MDMs, and managing Fault Tolerant Agent and Dynamic Agent architectures.
- Strong conceptual understanding of Master Domain Manager (MDM), Backup MDM (BMDM), Dynamic Workload Console (DWC), Fault Tolerant Agent (FTA), and Dynamic Agent (DA).
- Strong grasp of the conman CLI to monitor and control the production plan and check job/job stream/resource status.
- Strong grasp of the composer CLI to define, modify, and extract scheduling objects.
- Strong grasp of planman CLI to control pre-production plan and GUI mirroring.
- Strong grasp of the lifecycle of the daily production planning process and the phases of JNextPlan/FINAL.
- Proficiency in navigating the DWC web-based GUI to monitor workloads, manage user access security, and define scheduling objects.
- Experience installing IWS components and applying Fix Packs and Interim Fixes.
- Troubleshooting with logs under TWSDATA/stdlist and adjusting trace levels for netman, batchman, writer, mailman, etc.
- Strong experience with IBM WebSphere Liberty.
- Strong grasp of reading FFDC logs.
- Strong grasp of configuring JVM heap sizes.
- Strong grasp of configuring tracing scope, tracing levels, tracing retention, and trace strings.
- Strong experience with Red Hat Enterprise Linux 8.
- Deep familiarity with bash/shell commands for text processing (grep, awk, sed), file manipulation, and system navigation.
- Ability to manage, start, stop, and troubleshoot systemd services using systemctl and journalctl for IWS agents and MDM.
- Managing user accounts, groups, and service accounts, with deep knowledge of Linux file permissions (chmod, chown, ACLs on local filesystems and NFS).
- Ability to monitor system performance using top, htop, vmstat, iostat, and sar to troubleshoot bottlenecks and platform unresponsiveness.
- Understanding of Logical Volume Manager (LVM) and filesystem usage.
- Checking TCP port availability, firewall rules (firewalld/iptables), and connectivity between MDM and Dynamic Agents using netstat, ss, ping, curl, etc.
- Managing SSL/TLS certificates, private keystores, and public truststores, and working with a Certificate Authority.
- Strong experience with scripting (Bash, Shell, Python, etc.) for automation.
- Understanding of networking principles.
- Understanding of basic Oracle database administration, enough to troubleshoot with DBAs and prove when an issue is in Oracle.
- Understanding of basic SQL to query job metadata.
- Understanding of checking database connectivity.
- Understanding of AWS cloud infrastructure.
- Experience using a secrets manager (CyberArk PPM, HashiCorp Vault, or similar).