About the Job: HEROIC Cybersecurity is seeking a Web Scraper Developer with deep expertise in building scalable, automated web data collection systems to power our AI-driven cybersecurity intelligence platforms.
You will be responsible for developing, deploying, and maintaining high-performance web crawlers and data extraction pipelines that source threat intelligence, leaked datasets, and cybersecurity-related data from the surface, deep, and dark web.
This role requires strong technical knowledge of Python-based scraping frameworks, distributed data pipelines, and automation systems to collect and normalize large-scale datasets with minimal manual intervention. Your work will directly support HEROIC's mission to make the internet safer through intelligent, data-driven cybersecurity insights.
What you will do:
- Design, develop, and maintain large-scale, distributed web crawlers and data extraction pipelines.
- Build automated systems to scrape, clean, and normalize structured and unstructured data from multiple web sources (surface, deep, and dark web).
- Develop resilient scraping solutions using frameworks like Scrapy, Selenium, Playwright, or custom Python-based tools.
- Implement strategies to overcome anti-bot challenges (e.g., proxy rotation, CAPTCHA handling, user-agent management).
- Integrate scraped data into centralized databases (PostgreSQL, MySQL, etc.).
- Collaborate with the backend team to design ingestion workflows that feed into HEROIC's cybersecurity intelligence platform.
- Monitor and optimize scraping performance, reliability, and compliance with data usage policies.
- Automate deployment and scaling of crawler clusters using Docker, Kubernetes, or cloud infrastructure (AWS/GCP).
- Write and maintain APIs, scripts, and ETL components for downstream data processing.
- Collaborate closely with the software development team to ensure seamless data flow and usability.
Requirements
- Bachelor's Degree in Computer Science, Information Technology, or a related field.
- Minimum 4 years of hands-on experience in web scraping, data crawling, or data pipeline development.
- Strong proficiency in Python and scraping frameworks such as Scrapy, Selenium, Playwright, or BeautifulSoup.
- Proven experience building scalable crawlers capable of handling high-volume, dynamic, or JavaScript-rendered sites.
- Deep understanding of HTTP, DOM structures, XPath/CSS selectors, and data parsing.
- Experience managing asynchronous/concurrent scraping tasks and distributed crawling architectures.
- Knowledge of data pipelines, ETL workflows, and API integrations.
- Familiarity with NoSQL and SQL databases (e.g., MongoDB, PostgreSQL, Elasticsearch, Cassandra).
- Strong command of Linux/Unix systems, shell scripting, and version control (Git).
- Experience with containerization and cloud-based deployments (Docker, Kubernetes, AWS, or GCP).
- Excellent problem-solving, analytical, and debugging skills.
- Strong written and verbal communication in English.
- Prior experience in cybersecurity, data intelligence, or dark web data collection (preferred but not required).
Benefits
About Us: HEROIC Cybersecurity is building the future of cybersecurity. Unlike traditional solutions, HEROIC takes a predictive and proactive approach to intelligently secure users before an attack or threat occurs. Our work environment is fast-paced, challenging, and exciting. At HEROIC, you'll collaborate with a team of passionate, driven individuals dedicated to making the world a safer digital place.
Required Education:
Bachelor's Degree in Computer Science, Engineering, or equivalent hands-on experience.