Senior Software Engineer - Platform
Remote (United States)
Job Details
Status: Full-Time
Pay: $170,000 – $300,000 per year
About the Role
This role is for a Senior Software Engineer focused on platform infrastructure, internal developer tooling, shared libraries, automation, and reliability. Its purpose is to help product engineers ship features quickly while maintaining strong performance, uptime, security, and scalability.
You will build the systems, tools, and paved paths that make development and deployment safer, faster, and more reliable. This includes infrastructure as code, CI/CD pipelines, self-service developer workflows, observability systems, security defaults, data platform reliability, and platform automation.
What You’ll Do
Platform Infrastructure & Developer Tooling
- Own and evolve AWS infrastructure using Terraform and Pulumi Cloud
- Treat infrastructure as a product that engineering teams depend on
- Design and maintain internal developer tooling, shared libraries, SDKs, code generation, data access patterns, and service scaffolding
- Create golden paths for common workflows such as new service setup, background jobs, event streams, and APIs
- Build platform defaults that make security, observability, and consistency easier for engineering teams to adopt
- Reduce engineering toil through automation and self-service tooling
CI/CD, Environments & Delivery
- Design and build CI/CD systems that help engineers deploy dozens of times per day with confidence
- Maintain Buildkite pipelines and TypeScript pipeline-as-code workflows
- Build and support per-PR ephemeral environments
- Improve local development tooling, templates, and CI-integrated workflows
- Make deployments safer, more reproducible, and easier to roll out
Reliability, Observability & Security
- Drive reliability through SLOs, autoscaling, incident response, and postmortems
- Build systems that reduce the likelihood and impact of future incidents
- Create observability tooling and shared instrumentation libraries that give engineers real-time service insight
- Improve alerting so signals are useful and actionable rather than noisy
- Enforce security best practices across IAM, secrets management, encryption, audit logging, and DDoS protection
- Build secure platform defaults without slowing engineering velocity
Data Platform, Scaling & AI-First Engineering
- Own the reliability and performance of Aurora PostgreSQL, including provisioning, backups, failover, tuning, and safe usage patterns
- Solve scaling problems across Aurora PostgreSQL, Kafka throughput, and cost-efficient compute autoscaling
- Extend internal TypeScript tooling, including schema and code generation, service templates, shared data access libraries, and instrumentation
- Build tooling and best practices for AI-first software engineering
- Support background agents that can autonomously make code changes
- Design frameworks that allow AI product experiences to self-optimize based on user interactions
- Contribute to architecture decisions, documentation, and shared engineering standards
- Participate in the shared engineering on-call rotation
Qualifications
- Experience building internal platforms or developer tooling such as code generation, CLIs, templates, shared SDKs, or frameworks
- Strong TypeScript skills and strong API design judgment
- Experience building stable primitives that other engineers rely on
- Deep AWS experience across compute, networking, storage, and security, with services such as ECS, Lambda, VPC, ALB, IAM, RDS, ElastiCache, MSK, OpenSearch, S3, CloudWatch, CloudTrail, and GuardDuty
- Strong Terraform and/or Pulumi experience, including modules, workspaces, and CI-driven plan/apply workflows
- Experience designing and operating CI/CD systems that help engineering teams ship frequently and confidently
- Experience building production observability stacks using tools such as Datadog, CloudWatch, Sentry, distributed tracing, and SLOs
- Experience building secure and reliable paved paths, including instrumentation helpers, policy-as-code, and safe-by-default deployment patterns
- Security-first mindset with the ability to harden infrastructure while preserving team velocity
- Experience operating Aurora PostgreSQL at scale, including backups, point-in-time recovery, failover, read replicas, and query tuning
- Comfort working with Docker and container orchestration environments
- Reliability engineering mindset, including SLOs, error budgets, and incident response
- Curiosity about the infrastructure demands of AI and LLM workloads
- Strong written communication skills and ability to document decisions clearly
Tech Environment
- AWS infrastructure managed with Terraform and Pulumi Cloud
- Application services running in Docker on ECS, using the EC2 or Fargate launch types
- Aurora PostgreSQL, ElastiCache Redis, MSK Kafka, and OpenSearch
- Buildkite CI/CD with TypeScript pipeline-as-code
- Internal TypeScript platform libraries for code generation, service templates, shared data access, and instrumentation
- Datadog, CloudWatch, and Sentry for observability
- Cloudflare for DNS and CDN
- TypeScript monorepo with Node/Express services, a React frontend, and a GraphQL/Apollo API layer
- GitHub for source control
Benefits
- Robust medical coverage with 100% of employee and family premiums covered
- Vision and dental coverage
- 401(k)
- HSA and FSA options
- Access to an employee loan program offered through a lending partner
- Remote-first culture
- Flexible time off