This software engineering practical focuses on developing a benchmark for reinforcement learning from human feedback (RLHF), with an emphasis on control applications. Inspired by the B-Pref benchmark, this project aims to create a comprehensive toolbox for evaluating preference-based reinforcement learning algorithms, which derive preferred behaviors from human feedback in simulated environments.
The exact goals will be determined in the first weeks of the project, but you may work on:
- Algorithm Re-implementation: Re-implement or adapt RLHF algorithms such as PrefPPO, PEBBLE, SURF, Meta-Reward-Net, and RUNE. This includes reading and understanding the original research papers and implementing the algorithms clearly and efficiently, so that the code stays readable and maintainable for future research use.
- Validation: Validate the re-implemented algorithms against the performance reported in their original papers. This may involve re-running original experiments, requiring some system administration skills to manage unmaintained codebases.
- Tools and Interface Development: Implement tools for logging and visualizing algorithm performance, and develop a command-line interface for running benchmark experiments.
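To give a flavor of the algorithmic part: preference-based methods commonly fit a reward model to pairwise comparisons of trajectory segments using a Bradley-Terry model. The following is a minimal sketch of that idea, not the implementation of any of the algorithms named above. The function names and the plain-list representation of per-step predicted rewards are assumptions made for illustration.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over segment B.

    Each argument is a list of per-step rewards predicted by a (hypothetical)
    reward model for one trajectory segment.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable logistic function of the return difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and a feedback label.

    label is 1.0 if A was preferred, 0.0 if B was preferred,
    and 0.5 if both segments were judged equally good.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

In a full implementation the per-step rewards would come from a trainable network (for example in PyTorch or JAX), and this loss would be minimized over a dataset of labeled segment pairs.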
What You'll Learn:
- Python for Research Applications: Enhance your Python programming skills, focusing on its use in machine learning with libraries like PyTorch and JAX.
- Version Control with Git: Develop expertise in using Git for collaborative software development and project management.
- Code Quality & Testing: Learn to write clean, maintainable code and implement robust testing strategies, crucial for research integrity.
We will start the course with a practical exercise that each student completes individually; it serves as a check of the necessary prerequisites. Available places will be filled based on capacity and the results of this exercise. After that, you will work in teams for the remainder of the course.
For a deeper understanding of the course's content and expectations, we recommend reviewing at least the first two of the following foundational papers:
[1] https://openreview.net/forum?id=ps95-mkHF_
[2] https://papers.nips.cc/paper_files/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html
[3] https://proceedings.mlr.press/v139/lee21i/lee21i.pdf
[4] https://openreview.net/forum?id=TfhfZLQ2EJO
[5] https://proceedings.neurips.cc/paper_files/paper/2022/hash/8be9c134bb193d8bd3827d4df8488228-Abstract-Conference.html
[6] https://openreview.net/forum?id=OWZVD-l-ZrC
- Teacher: Eyke Hüllermeier
- Teacher: Timo Kaufmann
- Teacher: Konstantinos Kotsakis or Cotsaces
- Teacher: Maximilian Muschalik
- Teacher: Tobias Oberkofler
- Teacher: Evi Berchtold
- Teacher: Armin Hadziahmetovic
- Teacher: Samuel Klein
- Teacher: Felix Offensperger
- Teacher: Ralf Zimmer
- Teacher: Evi Berchtold
- Teacher: Caroline Friedel
- Teacher: Volker Heun
- Teacher: Samuel Klein
- Teacher: Ralf Zimmer
- Teacher: Yahav Bar
- Teacher: Evi Berchtold
- Teacher: Luis Heinzlmeier
- Teacher: Luis Heinzlmeier
- Teacher: Ralf Zimmer
Location: B U101 Oettingenstr. 67 (B)
Time: Mon 16:00-18:00 (weekly)
- Teacher: Daniel Diefenthaler
- Teacher: Fabian Dreer
- Teacher: Karl Fürlinger
Location: A 125 Geschw.-Scholl-Pl. 1 (A)
Time: Mon 12:00-14:00 (weekly)
- Teacher: Simon Rauch
- Teacher: Matthias Schubert
- Teacher: Ludwig Zellner
Location: 105 Akademiestr. 7
Time: Thu 10:00-12:00 (weekly)
- Teacher: Ursula Fantauzzo
- Teacher: Ming Gui
- Teacher: Johannes Schusterbauer