Senior/Staff Reinforcement Learning Engineer - Machine Learning Research & Development

Kaiko · Amsterdam, Netherlands

Posted 2 hours ago

About us

Kaiko is actively developing a next-generation autonomous clinical AI assistant that aids clinicians in reasoning across patient data, guidelines, and diagnostics. The company works closely with leading hospitals and research centers, including the Netherlands Cancer Institute, to develop specialized diagnostic agents and ensure safe operation in clinical settings.

Job description

As a Senior/Staff RL Engineer in ML R&D, you will own the RL training infrastructure end-to-end: the distributed training stack, reward pipelines, and experiment infrastructure. You will tackle challenges such as reward hacking and objective-level instability, explore new algorithms, and bring the effective ones into production. Your work strengthens the alignment and reasoning capabilities of our AI systems and so directly affects healthcare outcomes.

Build and maintain reward pipelines: verifiable reward signals, LLM-based reward models, and reward shaping strategies for complex clinical reasoning tasks
Own the RL training stack end-to-end and keep it scaling cleanly across large MoE models and long contexts
Debug training instabilities at root cause — reward hacking, entropy collapse, credit assignment failures, gradient issues — and ship fixes, not workarounds
Explore new RL algorithms and reward designs; run controlled experiments and translate promising results into the main training stack
Scale runs across more nodes, longer contexts, and more complex parallelism as models and tasks grow
Contribute upstream to open-source frameworks when you find bugs or missing features

Relevant work experience

Deep hands-on experience with RL training systems: you have shipped and scaled RL or post-training runs, not just run tutorials
Fluent in at least one distributed training framework at a level where you can read the source and debug silent failures
Strong understanding of core RL challenges: reward hacking, credit assignment, exploration, entropy collapse, sample efficiency — and practical ways to address them
Comfortable at the intersection of research and engineering: you read papers, implement ideas, and know when something is worth productionising
Excellent software engineering: clean Python, typed code, reproducible experiments, good test coverage
Independent operator: you don't need prescribed task lists; you take a system from 'running' to 'stable, fast, and understood.'

Benefits

Autonomy to do your work the way that works best for you, whether you have children to care for or simply prefer early mornings

An attractive and competitive salary, a good pension plan, and 25 vacation days per year

Great offsites and team events to strengthen the team and celebrate successes together

A EUR 1000 learning and development budget to help you grow

An annual commuting subsidy

Skills required for the job

Experimentation, Reinforcement Learning, Distributed Training, Python, Software Engineering, Debugging, Collaboration, Problem Solving, Research, Scalability, AI Systems, Clinical Reasoning