This software engineering practical focuses on developing a benchmark for reinforcement learning from human feedback (RLHF), with an emphasis on control applications. Inspired by the B-Pref benchmark, this project aims to create a comprehensive toolbox for evaluating preference-based reinforcement learning algorithms, which derive preferred behaviors from human feedback in simulated environments.
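As a rough illustration of how preference-based methods turn pairwise feedback into a training signal, the sketch below uses the Bradley-Terry preference model over summed segment rewards, with a cross-entropy loss on the human label. It is a minimal stdlib-only sketch, not the course's or any paper's reference implementation; the function names and the label convention (1.0 means segment A is preferred, 0.0 means segment B, 0.5 means no preference) are assumptions made for this example.

```python
import math
from typing import Sequence


def preference_probability(rewards_a: Sequence[float], rewards_b: Sequence[float]) -> float:
    """Bradley-Terry probability that segment A is preferred over segment B.

    Each argument holds the predicted per-step rewards of one trajectory segment.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable sigmoid of the return difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a: Sequence[float], rewards_b: Sequence[float], label: float) -> float:
    """Cross-entropy between the predicted preference and the human label.

    Assumed label convention: 1.0 = A preferred, 0.0 = B preferred, 0.5 = equal.
    """
    p_a = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p_a + eps) + (1.0 - label) * math.log(1.0 - p_a + eps))
```

In a real implementation the per-step rewards would come from a learned reward model, and the loss would be minimized with an autodiff library such as PyTorch or JAX.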
The exact goals will be determined in the first weeks of the project, but you may work on:
- Algorithm Re-implementation: Re-implement or adapt RLHF algorithms such as PrefPPO, PEBBLE, SURF, Meta-Reward-Net, and RUNE. This includes reading and understanding the original research papers, and implementing the algorithms in a clear and efficient manner, ensuring they are easily maintainable and readable for future research use.
- Validation: Validate the re-implemented algorithms against the performance reported in their original papers. This may involve re-running original experiments, requiring some system administration skills to manage unmaintained codebases.
- Tools and Interface Development: Implement tools for logging and visualizing algorithm performance, and develop a command-line interface for running benchmark experiments.
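To give a feel for the command-line interface mentioned in the last item, here is a minimal sketch built on Python's standard `argparse` module. The program name `rlhf-bench`, all flag names, and the default values are hypothetical and chosen only for illustration; the actual interface will be decided during the project. The algorithm choices mirror the names listed above.

```python
import argparse
from typing import Optional, Sequence

# Algorithm names taken from the list above; the lowercase spellings are an assumption.
ALGORITHMS = ("prefppo", "pebble", "surf", "meta-reward-net", "rune")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical benchmark runner."""
    parser = argparse.ArgumentParser(
        prog="rlhf-bench",  # hypothetical program name
        description="Run one preference-based RL benchmark experiment.",
    )
    parser.add_argument("--algorithm", required=True, choices=ALGORITHMS,
                        help="algorithm to evaluate")
    parser.add_argument("--env", required=True,
                        help="name of the simulated environment")
    parser.add_argument("--seed", type=int, default=0,
                        help="random seed for reproducibility")
    parser.add_argument("--num-feedback", type=int, default=1000,
                        help="total number of preference queries (assumed default)")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> dict:
    """Parse arguments and return the run configuration as a plain dict."""
    args = build_parser().parse_args(argv)
    config = dict(vars(args))
    # A real runner would start training here; this sketch only reports the config.
    print(f"Would run {config['algorithm']} on {config['env']} "
          f"(seed={config['seed']}, feedback budget={config['num_feedback']})")
    return config
```

Calling `main(["--algorithm", "pebble", "--env", "my_env"])` returns the parsed configuration, where `my_env` is a placeholder environment name. Returning a plain dict keeps the run configuration easy to log alongside the results.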
What You'll Learn:
- Python for Research Applications: Enhance your Python programming skills, focusing on its use in machine learning with libraries like PyTorch and JAX.
- Version Control with Git: Develop expertise in using Git for collaborative software development and project management.
- Code Quality & Testing: Learn to write clean, maintainable code and implement robust testing strategies, crucial for research integrity.
The course starts with a practical exercise that each student completes individually; it serves as a check of the necessary prerequisites. Available places will be filled based on capacity and the results of this exercise. After that, you will work in teams for the remainder of the course.
For a deeper understanding of the course's content and expectations, we recommend reviewing at least the first two of the following foundational papers:
[1] https://openreview.net/forum?id=ps95-mkHF_
[2] https://papers.nips.cc/paper_files/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html
[3] https://proceedings.mlr.press/v139/lee21i/lee21i.pdf
[4] https://openreview.net/forum?id=TfhfZLQ2EJO
[5] https://proceedings.neurips.cc/paper_files/paper/2022/hash/8be9c134bb193d8bd3827d4df8488228-Abstract-Conference.html
[6] https://openreview.net/forum?id=OWZVD-l-ZrC
- Instructor: Hüllermeier Eyke
- Instructor: Kaufmann Timo
- Instructor: Kotsakis or Cotsaces Konstantinos
- Instructor: Muschalik Maximilian
- Instructor: Oberkofler Tobias
- Instructor: Berchtold Evi
- Instructor: Hadziahmetovic Armin
- Instructor: Klein Samuel
- Instructor: Offensperger Felix
- Instructor: Zimmer Ralf
- Instructor: Berchtold Evi
- Instructor: Friedel Caroline
- Instructor: Heun Volker
- Instructor: Klein Samuel
- Instructor: Zimmer Ralf
- Instructor: Bar Yahav
- Instructor: Berchtold Evi
- Instructor: Heinzlmeier Luis
- Instructor: Heinzlmeier Luis
- Instructor: Zimmer Ralf
Location: B U101 Oettingenstr. 67 (B)
Time: Mon 16:00-18:00 (weekly)
- Instructor: Diefenthaler Daniel
- Instructor: Dreer Fabian
- Instructor: Fürlinger Karl
Location: A 125 Geschw.-Scholl-Pl. 1 (A)
Time: Mon 12:00-14:00 (weekly)
- Instructor: Rauch Simon
- Instructor: Schubert Matthias
- Instructor: Zellner Ludwig
Location: 105 Akademiestr. 7
Time: Thu 10:00-12:00 (weekly)
- Instructor: Fantauzzo Ursula
- Instructor: Gui Ming
- Instructor: Schusterbauer Johannes