Research Engineer
Remote (United States)
About the Role
This role focuses on applying reinforcement learning to large-scale production systems for web data extraction, structured output generation, and intelligent agent workflows. It combines reinforcement learning, large language model systems, training infrastructure, experimentation, and production deployment.
This is a highly hands-on engineering and research position that includes owning training systems, reward pipelines, evaluation frameworks, and production-ready model infrastructure, all built from the ground up.
The role is designed for someone who moves quickly, runs rapid experiments, and bridges classical reinforcement learning approaches with modern LLM agent systems.
Employment Type: Full-Time
Compensation: $180,000 - $290,000 per year
Equity: Up to 0.15%
What You’ll Do
- Build training infrastructure and reward pipelines from scratch for model training and evaluation
- Own the complete model lifecycle including data collection, reward modeling, training runs, evaluation, and deployment
- Design and maintain custom infrastructure required for reinforcement learning workflows
- Fine-tune foundation models for advanced web data extraction and structured content generation
- Improve model quality through rigorous experimentation and optimization techniques
- Apply reinforcement learning methods to multi-step LLM agent workflows
- Design reward signals and policy optimization strategies for agent behavior improvement
- Identify where classical reinforcement learning approaches outperform prompting methods and vice versa
- Run fast, iterative experiments based on meaningful hypotheses and measurable outcomes
- Interpret experiment results quickly and make rapid technical decisions
- Communicate reinforcement learning concepts clearly to engineers, product teams, and leadership stakeholders
- Explain reward functions, model behavior, and optimization strategies in practical business terms
- Collaborate closely with research engineers and engineering teams on search, ranking, and product improvements
- Translate reinforcement learning improvements directly into production systems
- Deploy and maintain models serving real production traffic
- Balance tradeoffs between model quality, latency, scalability, and infrastructure cost
Qualifications
- 3+ years of experience in applied reinforcement learning, machine learning engineering, or production model training
- Strong experience building custom training infrastructure and reward pipelines independently
- Experience designing and operating training loops, reward models, data pipelines, and evaluation frameworks
- Hands-on experience managing GPU clusters and large-scale training runs
- Ability to debug convergence issues and production model behavior
- Proven experience fine-tuning models to achieve high-performance results
- Deep understanding of data curation, training dynamics, hyperparameter optimization, and evaluation methodology
- Strong knowledge of PPO, RLHF, reward modeling, policy optimization, and reinforcement learning systems
- Experience working with modern large language model agents and agent workflows
- Ability to integrate reinforcement learning techniques with LLM-based systems
- Production mindset with experience deploying models into real-world applications
- Ability to make practical tradeoffs between quality, latency, and operational cost
- Strong experimentation skills with rapid iteration cycles
- Excellent written and verbal communication skills
- Ability to explain complex technical findings clearly to non-specialists
- Experience collaborating in fast-moving engineering and research environments
Preferred Backgrounds
- Experience as a reinforcement learning engineer at an AI lab or on an applied machine learning team
- Background in RLHF or reward modeling for language model systems
- Experience building machine learning training infrastructure at startups
- Experience combining reinforcement learning with language model systems
- Background in applied research environments with strong production focus
- Experience working on intelligent agent systems
Who This Role Is Not For
- Purely theoretical researchers without production experience
- Candidates who rely on dedicated platform teams for infrastructure setup
- Candidates with experience in only reinforcement learning or only large language models, without overlap between the two domains
- Candidates who prefer slow experimentation cycles with lengthy iteration timelines
- Communicators who rely heavily on technical jargon without practical clarity
Work Environment & Pace
This role operates in a fast-paced environment with a strong emphasis on rapid iteration, technical ownership, experimentation speed, and production impact.
Benefits & Perks
- Competitive compensation based on impact and contribution
- Equity participation up to 0.15%
- Generous paid time off including 15 mandatory PTO days
- Additional PTO available through a flexible approval process
- 12 weeks of fully paid parental leave
- $100 monthly wellness stipend for health and wellness expenses
- Up to $1,000 annually for learning and professional development
- Company-sponsored team offsites
- Three-month paid sabbatical after four years of employment
Benefits for US-Based Employees
- Medical, dental, and vision plans covering 100% for employees and 50% for dependents
- Employer-paid life insurance and disability coverage
- Optional accident, critical illness, hospital indemnity, and voluntary life insurance plans
- Telehealth access through Doctegrity
- 401(k) retirement plan
- Pre-tax FSAs and commuter benefits
- Pet insurance coverage
Additional Perks for San Francisco Employees
- Office snacks, beverages, and team lunches
- Collaborative startup office environment
- Access to a loaner electric bike for city transportation
Interview Process
- Application review focused on experience with production systems, training infrastructure, and models
- Introductory conversation covering technical background and career goals
- Technical deep dive discussing reinforcement learning systems, reward design, and production deployment
- Founder discussion focused on culture, ownership, and work style
- Paid work trial lasting 1-2 weeks involving a real production-focused reinforcement learning problem
- Fast final decision following the trial period