This role owns the end-to-end discovery, acquisition, and ingestion pipeline for continuously discovering, crawling, extracting, indexing, and normalizing millions of new artifacts daily, including documents, chats, forums, leaked datasets, repositories, threat actor communications, hacker marketplaces, unsecured infrastructure, and decentralized networks across the surface web, deep web, dark web, and anonymized networks.
Our Threat Research Team's mission is aggressive: achieve near-total coverage of global breach and leak data with 99% automation. Your work directly enables HEROIC's ability to identify exposures before they are weaponized.
What You Will Do:
Automated Intelligence Collection & Discovery
Architect and operate large-scale distributed crawling and discovery systems across:
Surface web, deep web, and dark web
Hacker forums, underground marketplaces, and breach communities
Chat platforms (Telegram, Discord, IRC, WhatsApp, etc.)
Paste sites, code repositories, and social platforms used for breach disclosure
Continuously discover, archive, and download newly released datasets, logs, credentials, and artifacts the moment they appear
Dark Web, Anonymized & Decentralized Networks
Build automated collectors and archivers for anonymized and decentralized networks including:
Design resilient workflows for unreliable, adversarial, or ephemeral data sources
Normalize and index data from non-traditional network protocols and formats
Infrastructure & Exposure Discovery
Pipeline Engineering & Operations
Build ETL pipelines to clean, normalize, enrich, and index structured and unstructured data
Integrate collected intelligence into centralized databases and search systems
Design APIs and internal tooling to support downstream analysis and AI/ML workflows
Implement advanced anti-bot evasion and resiliency techniques (proxy rotation, fingerprinting, CAPTCHA mitigation, session handling)
Automate deployment, scaling, and monitoring using Docker, Kubernetes, and cloud infrastructure
Continuously optimize the performance, reliability, and cost efficiency of crawler clusters
Requirements
Minimum of 4 years of hands-on experience in data engineering, intelligence collection, crawling, or distributed data pipelines
Strong Python expertise and experience with frameworks such as Scrapy, Playwright, Selenium, or custom async systems
Proven experience operating high-volume automated data collection systems in production
Deep understanding of web protocols, HTTP, DOM parsing, and adversarial scraping environments
Experience with asynchronous, concurrent, and distributed architectures
Familiarity with SQL and NoSQL databases (PostgreSQL, MongoDB, Elasticsearch, Cassandra)
Strong Linux/Unix shell scripting and Git-based workflows
Experience deploying and operating systems using Docker, Kubernetes, AWS, or GCP
Excellent analytical, debugging, and problem-solving skills
Strong written and verbal communication skills
Preferred / High-Value Experience
Direct experience with dark web intelligence, breach data, OSINT, or threat research
Familiarity with Tor, I2P, underground forums, stealer logs, or credential ecosystems
Experience processing large breach datasets or stealer logs
Background working in adversarial data environments
Exposure to AI/ML-driven intelligence platforms
Benefits
- Position Type: Full-time
- Location: Remote in India. Work from wherever you please! Your home, the beach, our offices, etc.
- Compensation: USD 1,300-2,000 per month
- Professional Growth: Amazing upward mobility in a rapidly expanding company.
- Innovative Culture: Be part of a team that leverages AI and cutting-edge technologies.
About Us: HEROIC Cybersecurity is building the future of cybersecurity. Unlike traditional cybersecurity solutions, HEROIC takes a predictive and proactive approach to intelligently securing our users before an attack or threat occurs. Our work environment is fast-paced, challenging, and exciting. At HEROIC, you'll work with a team of passionate, engaged individuals dedicated to intelligently securing the technology of people all over the world.
Required Skills:
BS/BA degree in Information Technology, Computer Science, or comparable experience/certifications in an IT-related field
Minimum of 8 years of experience in IT support, system administration, or related technical roles, including at least 2 years in a leadership or client management capacity
Well versed in a broad set of technologies, including networking, storage, service desk, cybersecurity, virtualization, and cloud infrastructure (IaaS, PaaS)
Microsoft Windows, Linux, Active Directory, Office365, enterprise networking, virtual environments, storage technologies, endpoint management, and computer imaging
Experience deploying and supporting cloud-based applications
Knowledgeable about cybersecurity best practices and HIPAA and/or HITECH requirements
Ability to work independently and manage multiple priorities in a fast-paced environment
Excellent English verbal/written communication and interpersonal skills
Proven ability to manage multiple projects and prioritize effectively
Excellent problem-solving, communication, and leadership skills
Ability to work 100% onsite in Watsonville, CA
Required Education:
BS/BA degree in Information Technology, Computer Science, or comparable experience/certifications in an IT-related field.
About the Role: HEROIC Cybersecurity is seeking a senior-level Threat Intelligence Data Engineer - Automated Collection & Dark Web Intelligence to design, build, and operate fully automated intelligence collection systems that power our AI-driven cybersecurity and breach intelligence platforms.