We are seeking a data scraping specialist to help collect, organize, and normalize data from public and government sources into a consistent, structured format. This role focuses on solving complex data acquisition challenges: researching unfamiliar sources, extracting information from websites and feeds, and transforming it into predefined formats that can be consumed by downstream systems. The ideal candidate enjoys working with messy datasets, investigating how websites and data sources are structured, and creating reusable solutions that can be executed repeatedly with consistent results. This position requires strong problem-solving skills, attention to detail, and the ability to work independently while documenting findings and processes clearly.
Schedule: Monday to Friday, 12:00 PM to 8:00 PM CST
Responsibilities:
Research and identify public and government data sources. Extract and normalize data from websites, APIs, feeds, and online repositories. Build reusable, maintainable, and re-runnable scripts and scraping workflows. Deliver structured outputs in predefined formats. Provide sample outputs for review before processing larger datasets. Document data sources, extraction methodologies, challenges encountered, and re-run procedures. Capture and report any relevant information discovered during extraction, including inconsistencies, amendments, effective dates, repeal notes, or related metadata. Troubleshoot data acquisition issues and propose alternative approaches when needed. Collaborate with stakeholders through regular check-ins and written communication. Maintain version-controlled code repositories and follow standard development practices.
Requirements
Strong experience with web scraping and data extraction. Practical programming experience using Python or similar scripting languages. Experience working with HTML parsing, APIs, HTTP requests, FTP sources, and structured or unstructured data. Ability to evaluate, debug, and improve scraping solutions. Strong analytical and problem-solving skills. Experience building reusable automation workflows rather than one-off scripts. Familiarity with relational databases (PostgreSQL preferred) and a standard Git workflow. Strong documentation and communication skills. Ability to work independently and take ownership of technical challenges. High attention to detail and commitment to data accuracy.
Nice to Have:
Experience working with government, regulatory, compliance, or public-sector datasets. Experience with Playwright, Selenium, Puppeteer, Scrapy, or similar scraping frameworks. Experience with data versioning, change detection, or document lineage. Familiarity with AI-assisted development tools and workflows.
Required Education:
Data Engineer
This is a remote position.