This is a remote position.
The developer will be responsible for building a lightweight, self-healing, auto-scaling, multi-platform APM (Application Performance Monitoring) agent that can:
Instrumentation & Data Collection:
Automatically instrument applications to collect transaction traces, logs, and performance metrics.
Capture distributed tracing across microservices.
Track response times, error rates, resource usage, and database query performance.
Collect and forward application, system, and security logs.
Performance & Efficiency Optimization:
Implement adaptive sampling to reduce overhead.
Ensure asynchronous, non-blocking data collection.
Optimize CPU, memory, and network utilization to minimize application impact.
Distributed Tracing & Database Monitoring:
Assign and propagate trace IDs across microservices.
Monitor slow queries and database calls with minimal overhead.
Log Collection & Security Monitoring:
Collect, filter, and forward application/system logs.
Detect security anomalies and unusual resource usage patterns.
Communication & Data Transmission:
Efficiently batch and compress data before sending to the APM platform.
Use lightweight protocols and serialization formats (gRPC, Protobuf, etc.) for communication.
Self-Healing & Auto-Scaling Mechanisms:
Implement triggers for auto-scaling based on CPU, memory, and latency thresholds.
Enable self-healing by restarting services upon failure or excessive resource usage.
Delivery Timeline & Reporting:
Develop the agent within 2 to 3 weeks.
Provide technical documentation and performance benchmarks.
Programming Expertise: Proficiency in languages commonly used for APM agents, such as Java, Python, Go, .NET, or C.
Instrumentation & Monitoring Experience: Hands-on experience with code profiling, distributed tracing (OpenTelemetry), and application instrumentation.
Performance Optimization: Knowledge of efficient data collection strategies, asynchronous programming, and low-latency data transmission.
Logging & Security: Experience integrating with logging pipelines (ELK, Splunk, Loki) and implementing basic security anomaly detection.
Scalability & Resilience: Familiarity with auto-scaling, self-healing mechanisms, and cloud-native architectures.
APM & Observability Tools: Experience with tools like Prometheus, OpenTelemetry, Datadog, New Relic, or Dynatrace is a plus.
Networking & Communication Protocols: Proficiency in gRPC, Protobuf, or HTTP-based telemetry data transfer.
Agile Development & Fast-Paced: Ability to deliver a functional prototype within 2 to 3 weeks and iterate based on feedback.
Strong Debugging & Problem-Solving Skills: Ability to analyze performance bottlenecks and optimize agent behavior.
Full Time