Senior Data Engineer
Vienna, VA - USA
Job Summary
SteerBridge Strategies is a modern technology company delivering innovative, mission-focused solutions to the U.S. Government and private sector. Leveraging deep expertise in federal acquisition, digital transformation, and emerging technologies, we deliver agile, commercial-grade capabilities that accelerate operational effectiveness and drive measurable mission success.
At the core of SteerBridge is our people, especially the veterans whose leadership, problem-solving mindset, and commitment to excellence elevate every project we support. We don't simply hire exceptional talent; we cultivate it, creating meaningful career pathways for veterans, military spouses, and professionals who share our passion for advancing technology and strengthening the missions we serve.
SteerBridge seeks a highly skilled and motivated individual to join our team as a Senior Data Engineer who will align data solutions with business requirements by planning and managing data infrastructure and strategy for our AI/ML Maintenance Sustainment and Deployment Planning Project. Our team is dedicated to harnessing the power of AI/ML to increase parts availability and reduce maintenance wait times, ultimately maximizing aircraft availability.
In this role, you will be responsible for performing data engineering tasks within the existing systems of record with multiple databases. Your mission will be to enhance and optimize data entry, management, and extraction within this database to ensure its usability within our proprietary system. Data management activities include performing data quality checks, analyzing and presenting data, and documenting the process. The ideal candidate is a quick learner who is curious, innovative, and results-oriented, and has strong interpersonal skills.
Benefits
- Health insurance
- Dental insurance
- Vision insurance
- Life insurance
- 401(k) Retirement Plan with matching
- Paid Time Off
- Paid Federal Holidays
Required
- Must be a U.S. Citizen.
- Bachelor's degree or above in Systems Engineering, Computer Science, or a related field.
- An active security clearance or the ability to obtain one is required.
- Minimum of 6 years of experience, including:
- Experience with data pipelines that use advanced analytics tools, platforms, and Python.
- Experience in scripting, tooling, and automating large-scale computing environments.
- Extensive experience with major tools such as Python, Pandas, PySpark, NumPy, SciPy, SQL, and Git; some experience with TensorFlow, PyTorch, and Scikit-learn.
- Location: Preferably local to the Vienna, VA area and able to work on-site at our Vienna, VA office (3 or more days/week). Hybrid opportunities at supervisor's discretion.
Experience
Data Modeling and Design
- Advanced data modeling (conceptual, logical, and physical) with emphasis on scalability and maintainability.
- Strong understanding of database paradigms (relational, NoSQL, graph, time-series, and document-based).
- Expertise with modern data warehousing platforms (Redshift, Snowflake, BigQuery).
- Deep understanding of dimensional modeling (star/snowflake schemas) and data vault techniques.
- Experience designing for both OLTP and OLAP workloads.
- Proficiency with schema evolution, metadata-driven pipelines, and data versioning strategies.
- Implementing data retention, archival, and lifecycle policies.
- Project Experience:
- Delivered optimized, production-grade data models supporting analytics, reporting, and ML workflows, aligning with established architecture and performance standards.
Data Pipeline Development
- Hands-on experience with distributed processing tools (Apache Kafka, Airflow, Spark, Flink, NiFi).
- Skilled in building and orchestrating batch and real-time pipelines on cloud platforms (AWS Glue, GCP Dataflow, Azure Data Factory).
- Deep understanding of incremental processing, idempotency, schema evolution, and backfill logic.
- Proficient in pipeline automation, observability, and monitoring (metrics, logging, alerting).
- Strong Python development for ETL: modular, testable, reusable, and performance-optimized.
- Knowledge of workflow dependency management, retries, and failure recovery strategies.
- Project Experience:
- Owned the end-to-end design and implementation of fault-tolerant, high-throughput pipelines integrating diverse data sources while maintaining data quality and SLAs.
Cloud Platforms and Services
- Deep expertise in AWS, GCP, or Azure data ecosystems.
- Experience building and managing cloud-native data solutions (Data Lakes, Data Warehouses, Data Mesh).
- Strong understanding of cloud storage (S3, Blob), managed databases (RDS, DynamoDB), and compute (EMR, Dataproc, ECS).
- Cost governance and performance optimization for large-scale data workloads.
- Knowledge of serverless data patterns (AWS Lambda, Athena, GCF, BigQuery).
- Experience with hybrid/multi-cloud architecture and inter-cloud data movement.
- Project Experience:
- Led migration of legacy ETL workflows and data systems to cloud-native architectures, delivering measurable cost, scalability, and performance improvements.
Big Data Technologies
- Hands-on experience with distributed computing frameworks (Hadoop, Spark, Hive, Presto).
- Proficiency with data lake and lakehouse architectures (Delta Lake, Apache Iceberg, Apache Hudi).
- Understanding of partitioning, data compaction, schema evolution, and ACID compliance.
- Strong knowledge of query optimization on massive datasets (Athena, Trino, Presto).
- Performance tuning in petabyte-scale distributed systems.
- Project Experience:
- Built and maintained data platforms capable of processing structured and unstructured data at scale, enabling advanced analytics and data science workloads.
Database Administration and Optimization
- Advanced SQL/NoSQL query tuning, indexing, sharding, and partitioning strategies.
- Proficient with replication, backups, and disaster recovery across distributed systems.
- Skilled in analyzing query execution plans and applying cost-based optimization.
- Experience optimizing data-intensive application code and database interfaces.
- Familiarity with temporal tables, data versioning, and caching strategies.
- Project Experience:
- Improved query performance and system scalability through advanced indexing, schema refactoring, and distributed database optimization.
Data Governance and Security
- Implementing data privacy and compliance frameworks (GDPR, CCPA).
- Experience with data cataloging, lineage, and metadata management (DataHub, Collibra, Alation).
- Role-based access control and sensitive data protection across multi-tenant systems.
- Integration of data quality validation and data contract testing within CI/CD pipelines.
- Automation of governance and security policies using cloud-native tools.
- Project Experience:
- Implemented enterprise-grade data governance and access control frameworks, ensuring compliance, lineage visibility, and trust in analytics.
Programming and Software Engineering
- Strong proficiency in Python and SQL for data processing, automation, and API integration.
- Expertise in object-oriented programming (OOP) and design patterns in Python.
- Deep understanding of algorithmic complexity (Big O) and code performance optimization.
- Familiarity with parallel and distributed computing frameworks (Spark, Dask, Ray).
- Skilled with version control (Git) and CI/CD tools (GitLab, Jenkins, CircleCI).
- Proficient in software engineering best practices: testing (pytest/unittest), documentation, type hinting, and linting.
- Ability to debug, profile, and optimize large-scale data workflows.
- Project Experience:
- Developed performant, maintainable Python-based data frameworks, automated ETL systems, and optimized code for distributed workloads.
AI/ML Pipeline Enablement
- Collaboration with data scientists on feature engineering, data preparation, and model deployment.
- Knowledge of ML orchestration and experiment tracking (MLflow, Kubeflow).
- Familiarity with feature stores and data lineage for ML.
- Integration of batch and streaming data pipelines for real-time inference.
- Hands-on experience with analytics and visualization tools (Tableau, Power BI).
- Project Experience:
- Built and maintained ML-ready data pipelines and infrastructure supporting training, experimentation, and real-time inference.
Leadership and Collaboration
- Mentoring and guiding junior engineers in data design, coding standards, and performance optimization.
- Leading cross-functional projects with data scientists, analysts, and business partners.
- Promoting best practices for data engineering and governance within the organization.
- Effective stakeholder communication, documentation, and Agile project management.
- Ability to conduct technical reviews and enforce design and scalability standards.
- Project Experience:
- Provided technical leadership for cross-domain data initiatives, fostering best practices in data engineering and enabling scalable, maintainable systems.
Additional Skills
- DevOps/DataOps: Infrastructure as code, Docker/Kubernetes, and automated deployment of data infrastructure.
- Testing & CI/CD: Git-based workflows, automated integration testing, and continuous delivery for data pipelines.
- Performance & Cost Optimization: Tuning query execution, pipeline efficiency, and resource utilization.
- Automation: Building self-healing data pipelines with retry logic, monitoring, and alerting.
- Documentation: Strong communication of technical architecture using tools such as Lucidchart and PlantUML.
A salary commensurate with background and experience will be offered.
Required Experience:
Senior IC
About Company
SteerBridge Strategies is proud to be an Equal Opportunity Employer. We are committed to creating a diverse and inclusive workplace where all qualified applicants and employees are treated with respect and dignity, regardless of race, color, gender, age, religion, national origin, ance ...