This software engineering practical focuses on developing a benchmark for reinforcement learning from human feedback (RLHF), with an emphasis on control applications. Inspired by the B-Pref benchmark, this project aims to create a comprehensive toolbox for evaluating preference-based reinforcement learning algorithms, which derive preferred behaviors from human feedback in simulated environments.
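To make the idea of learning from preferences concrete, here is a minimal sketch of the Bradley-Terry style preference model that is commonly used in the preference-based RL literature to train a reward model from pairwise comparisons of trajectory segments. The function names and the use of plain Python lists for per-step predicted rewards are assumptions made for illustration; they are not taken from B-Pref or from any of the algorithms named below.

```python
import math
from typing import Sequence


def preference_probability(rewards_a: Sequence[float],
                           rewards_b: Sequence[float]) -> float:
    """Probability that segment A is preferred over segment B.

    Applies a logistic function to the difference of the summed
    predicted rewards (Bradley-Terry model on segment returns).
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable logistic function.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    exp_diff = math.exp(diff)
    return exp_diff / (1.0 + exp_diff)


def preference_loss(rewards_a: Sequence[float],
                    rewards_b: Sequence[float],
                    label: float) -> float:
    """Cross-entropy between the predicted and the given preference.

    label = 1.0 means A was preferred, 0.0 means B was preferred,
    and 0.5 encodes "equally preferable" (an assumed convention).
    """
    eps = 1e-12  # avoids log(0)
    p = preference_probability(rewards_a, rewards_b)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

In a real implementation the per-step rewards would come from a learned reward network and the loss would be minimized with a framework such as PyTorch or JAX; the sketch only shows the shape of the objective.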
The exact goals will be determined in the first weeks of the project, but you may work on:
- Algorithm Re-implementation: Re-implement or adapt RLHF algorithms such as PrefPPO, PEBBLE, SURF, Meta-Reward-Net, and RUNE. This includes reading and understanding the original research papers, and implementing the algorithms in a clear and efficient manner, ensuring they are easily maintainable and readable for future research use.
- Validation: Validate the re-implemented algorithms against the performance reported in their original papers. This may involve re-running original experiments, requiring some system administration skills to manage unmaintained codebases.
- Tools and Interface Development: Implement tools for logging and visualizing algorithm performance, and develop a command-line interface for running benchmark experiments.
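As an illustration of what a command-line interface for benchmark experiments might look like, the following sketch uses Python's standard `argparse` module. The flag names (`--algorithm`, `--env`, `--seed`), the default values, and the environment name are hypothetical; the actual interface will be decided during the project.

```python
import argparse
from typing import Dict, List, Optional

# Algorithm names taken from the project description; lowercase keys are an assumption.
ALGORITHMS = ["prefppo", "pebble", "surf", "meta-reward-net", "rune"]


def build_parser() -> argparse.ArgumentParser:
    """Create the argument parser for a hypothetical benchmark runner."""
    parser = argparse.ArgumentParser(description="Run an RLHF benchmark experiment.")
    parser.add_argument("--algorithm", choices=ALGORITHMS, default="pebble",
                        help="Preference-based RL algorithm to run.")
    parser.add_argument("--env", default="walker_walk",
                        help="Name of the simulated environment (hypothetical default).")
    parser.add_argument("--seed", type=int, default=0,
                        help="Random seed for reproducibility.")
    return parser


def main(argv: Optional[List[str]] = None) -> Dict[str, object]:
    """Parse arguments and return the experiment configuration.

    A real runner would start training here; this sketch only
    returns the parsed configuration as a dictionary.
    """
    args = build_parser().parse_args(argv)
    return {"algorithm": args.algorithm, "env": args.env, "seed": args.seed}
```

Calling `main(["--algorithm", "surf", "--seed", "3"])` returns the configuration dictionary for that run, which a logging tool could store alongside the results.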
What You'll Learn:
- Python for Research Applications: Enhance your Python programming skills, focusing on its use in machine learning with libraries like PyTorch and JAX.
- Version Control with Git: Develop expertise in using Git for collaborative software development and project management.
- Code Quality & Testing: Learn to write clean, maintainable code and implement robust testing strategies, crucial for research integrity.
The course starts with a practical exercise that each student completes individually; it serves as a check of the necessary prerequisites. Available places will be allocated based on capacity and the results of this exercise. After that, you will work in teams for the remainder of the course.
For a deeper understanding of the course's content and expectations, we recommend reviewing at least the first two of the following foundational papers:
[1] https://openreview.net/forum?id=ps95-mkHF_
[2] https://papers.nips.cc/paper_files/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html
[3] https://proceedings.mlr.press/v139/lee21i/lee21i.pdf
[4] https://openreview.net/forum?id=TfhfZLQ2EJO
[5] https://proceedings.neurips.cc/paper_files/paper/2022/hash/8be9c134bb193d8bd3827d4df8488228-Abstract-Conference.html
[6] https://openreview.net/forum?id=OWZVD-l-ZrC
- Instructor: Eyke Hüllermeier
- Instructor: Timo Kaufmann
- Instructor: Konstantinos Kotsakis or Cotsaces
- Instructor: Maximilian Muschalik
- Instructor: Tobias Oberkofler
- Instructor: Evi Berchtold
- Instructor: Armin Hadziahmetovic
- Instructor: Samuel Klein
- Instructor: Felix Offensperger
- Instructor: Ralf Zimmer
- Instructor: Evi Berchtold
- Instructor: Caroline Friedel
- Instructor: Volker Heun
- Instructor: Samuel Klein
- Instructor: Ralf Zimmer
- Instructor: Yahav Bar
- Instructor: Evi Berchtold
- Instructor: Luis Heinzlmeier
- Instructor: Luis Heinzlmeier
- Instructor: Ralf Zimmer
Location: B U101 Oettingenstr. 67 (B)
Time: Mon 16:00-18:00 (weekly)
- Instructor: Daniel Diefenthaler
- Instructor: Fabian Dreer
- Instructor: Karl Fürlinger
Location: A 125 Geschw.-Scholl-Pl. 1 (A)
Time: Mon 12:00-14:00 (weekly)
- Instructor: Simon Rauch
- Instructor: Matthias Schubert
- Instructor: Ludwig Zellner
Location: 105 Akademiestr. 7
Time: Thu 10:00-12:00 (weekly)
- Instructor: Ursula Fantauzzo
- Instructor: Ming Gui
- Instructor: Johannes Schusterbauer