Staff DevOps Engineer
Remote (United States)
This opportunity is for a Staff DevOps Engineer focused on developer infrastructure, platform engineering, and internal tooling. The role is responsible for building the infrastructure that supports modern development pipelines, pre-production environments, automated quality gates, and AI-assisted engineering workflows.
This position sits at the intersection of DevOps, platform engineering, and developer tooling. The role helps create reliable systems that allow engineers and AI agents to validate, test, and ship production-ready code with greater speed, visibility, and confidence.
Employment Type: Full-Time
Compensation: $96.15 – $120.19 per hour
What You’ll Do
- Design and operate ephemeral, pre-warmed development environments that engineers and AI agents can launch on demand
- Extend internal CLI tooling so new engineers or AI agents can start a validated local environment within minutes
- Automate service discovery, dependency management, and local configuration for development environments
- Build environment parity monitoring to ensure development environments closely match production behavior
- Own infrastructure-level pre-production quality gates that validate deployments before they reach production
- Build and operate automated load testing, performance benchmarking, and security scanning gates within CI/CD pipelines
- Partner with QA and engineering teams to expand quality gate coverage across services
- Build containerized mock services generated from OpenAPI specifications for local integration testing
- Support realistic third-party dependency validation before pull requests are opened
- Stand up Playwright-based UI validation within AI-agent and CI workflows
- Create infrastructure that supports iterative self-refinement, allowing engineers or agents to run, test, fix, and improve output before review
- Build review tooling, metrics dashboards, and operational controls that make development pipelines observable and improvable
- Surface scoring signals, approval rate trends, gate pass rates, and common failure patterns
- Create policy layers that define approval requirements by component, task type, or delivery workflow
- Work across EKS, AWS services, and CI/CD systems to improve delivery reliability and engineering productivity
- Collaborate closely with architects, product teams, QA, infrastructure, developer experience, and engineering teams
- Use AI development tools as part of infrastructure operations, debugging, automation, and delivery workflows
Qualifications
- 6-10+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering roles
- Strong experience building and operating production systems at scale
- Active experience using AI development tools, such as Claude Code, Codex, or similar, in infrastructure workflows
- Ability to use AI tools for Terraform changes, Kubernetes debugging, automation scripting, and operational investigations
- Expertise with Kubernetes, especially EKS
- Strong hands-on experience with AWS services including IAM, VPC, ECR, SSM, Secrets Manager, S3, SQS, Lambda, and RDS/Aurora
- Strong Infrastructure as Code experience, with Terraform preferred
- Experience with GitOps workflows using Argo CD or similar tools
- Proven experience building ephemeral environments, developer tooling, internal platforms, CLIs, scaffolding tools, or developer portals
- Experience with load testing frameworks such as k6, Locust, Gatling, or similar tools
- Experience automating performance gates within CI/CD pipelines
- Experience building mock or stub infrastructure for large-scale integration testing
- Experience with containerized services, API mocking, and dependency isolation
- Deep CI/CD experience with CircleCI, GitHub Actions, or similar platforms
- Experience with caching, parallelism, artifact management, test reliability, and pipeline observability
- Experience with release strategies such as canary releases, blue-green deployments, automated rollbacks, and progressive delivery
- Strong observability fundamentals using tools such as Datadog and OpenTelemetry
- Ability to define SLIs and SLOs and connect them to delivery decisions
- Excellent cross-team communication skills
- Ability to translate platform constraints into developer-friendly tools, solutions, and documentation
Technology Stack
- Infrastructure: AWS, EKS, Terraform, Argo CD, Docker, Vault
- CI/CD: CircleCI, Argo CD, GitHub Actions
- Messaging: Kafka and Confluent Cloud
- Observability: Datadog and OpenTelemetry
- Languages and Applications: Node.js, TypeScript microservices, Python jobs, and React front ends
How You’ll Succeed
- Treat infrastructure as a product by understanding engineer needs, measuring adoption, and improving tools based on feedback
- Build systems that multiply team output rather than only maintaining existing infrastructure
- Prioritize automation, reproducibility, and measurable outcomes
- Create tools or gates when repeated manual work appears in the delivery process
- Operate with high ownership across infrastructure, developer experience, QA, and product engineering teams
- Use AI tools to move faster while maintaining strong verification, rigor, and engineering judgment
- Help develop better team patterns for AI-assisted infrastructure work
If you notice a problem with this job, email us at contact@7seventy.net.