This software engineering practical focuses on developing a benchmark for reinforcement learning from human feedback (RLHF), with an emphasis on control applications. Inspired by the B-Pref benchmark, this project aims to create a comprehensive toolbox for evaluating preference-based reinforcement learning algorithms, which derive preferred behaviors from human feedback in simulated environments.
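As a rough illustration of how preference-based methods turn pairwise human feedback into a learning signal, the sketch below computes a Bradley-Terry style preference loss, a formulation commonly used in preference-based reinforcement learning. It is only a minimal sketch under stated assumptions: the function names are hypothetical, the reward lists stand in for the per-step outputs of a learned reward model, and none of this is the course's prescribed implementation.

```python
import math


def segment_return(rewards):
    """Sum of predicted per-step rewards over one trajectory segment."""
    return sum(rewards)


def preference_probability(rewards_a, rewards_b):
    """Probability that segment A is preferred over B (Bradley-Terry model).

    Equivalent to a softmax over the two segment returns, written as a
    numerically stable logistic function of the return difference.
    """
    diff = segment_return(rewards_a) - segment_return(rewards_b)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and a human label.

    label: 1.0 if the human preferred A, 0.0 if B, 0.5 if judged equal
    (this label encoding is an assumption made for illustration).
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

Minimizing this loss over many labeled segment pairs pushes the reward model to assign higher return to the segments humans preferred; a policy can then be trained against the learned reward.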
The exact goals will be determined in the first weeks of the project, but you may work on:
- Algorithm Re-implementation: Re-implement or adapt RLHF algorithms such as PrefPPO, PEBBLE, SURF, Meta-Reward-Net, and RUNE. This includes reading and understanding the original research papers and implementing the algorithms clearly and efficiently, so that they remain readable and maintainable for future research use.
- Validation: Validate the re-implemented algorithms against the performance reported in their original papers. This may involve re-running original experiments, requiring some system administration skills to manage unmaintained codebases.
- Tools and Interface Development: Implement tools for logging and visualizing algorithm performance, and develop a command-line interface for running benchmark experiments.
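To give a feel for the command-line interface mentioned in the last item, here is a minimal sketch built on Python's standard `argparse` module. Everything specific in it is an assumption made for illustration: the program name `rlhf-bench`, the flag names, the defaults, and the lowercase algorithm identifiers are hypothetical, and the actual interface will be designed during the project.

```python
import argparse

# Lowercase identifiers for the algorithms named in the list above (hypothetical spelling).
ALGORITHMS = ["prefppo", "pebble", "surf", "meta-reward-net", "rune"]


def build_parser():
    """Create the argument parser for a hypothetical benchmark runner."""
    parser = argparse.ArgumentParser(
        prog="rlhf-bench",
        description="Run a preference-based RL benchmark experiment (sketch).",
    )
    parser.add_argument("--algorithm", choices=ALGORITHMS, required=True,
                        help="which algorithm to run")
    parser.add_argument("--env", required=True,
                        help="name of the simulated environment")
    parser.add_argument("--seed", type=int, default=0,
                        help="random seed for reproducibility")
    parser.add_argument("--num-feedback", type=int, default=1000,
                        help="total number of preference queries (assumed budget)")
    return parser


def main(argv=None):
    """Parse arguments and return the experiment configuration as a dict."""
    args = build_parser().parse_args(argv)
    config = vars(args)
    # A real runner would dispatch to the chosen algorithm here.
    print(f"Would run {config['algorithm']} on {config['env']} "
          f"(seed={config['seed']}, feedback budget={config['num_feedback']})")
    return config
```

For example, `main(["--algorithm", "pebble", "--env", "some-env", "--seed", "3"])` returns a configuration dictionary, where `some-env` is a placeholder environment name. Returning the parsed configuration rather than acting on it directly keeps the parser easy to unit-test.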
What You'll Learn:
- Python for Research Applications: Enhance your Python programming skills, focusing on its use in machine learning with libraries like PyTorch and JAX.
- Version Control with Git: Develop expertise in using Git for collaborative software development and project management.
- Code Quality & Testing: Learn to write clean, maintainable code and implement robust testing strategies, crucial for research integrity.
The course begins with a practical exercise that each student completes individually; it serves as a check of the necessary prerequisites. Places are allocated based on capacity and the results of this exercise. After that, you will work in teams for the remainder of the course.
For a deeper understanding of the course's content and expectations, we recommend reviewing at least the first two of the following foundational papers:
[1] https://openreview.net/forum?id=ps95-mkHF_
[2] https://papers.nips.cc/paper_files/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html
[3] https://proceedings.mlr.press/v139/lee21i/lee21i.pdf
[4] https://openreview.net/forum?id=TfhfZLQ2EJO
[5] https://proceedings.neurips.cc/paper_files/paper/2022/hash/8be9c134bb193d8bd3827d4df8488228-Abstract-Conference.html
[6] https://openreview.net/forum?id=OWZVD-l-ZrC
- Instructor: Eyke Hüllermeier
- Instructor: Timo Kaufmann
- Instructor: Konstantinos Kotsakis or Cotsaces
- Instructor: Maximilian Muschalik
- Instructor: Tobias Oberkofler
- Instructor: Evi Berchtold
- Instructor: Armin Hadziahmetovic
- Instructor: Samuel Klein
- Instructor: Felix Offensperger
- Instructor: Ralf Zimmer
- Instructor: Evi Berchtold
- Instructor: Caroline Friedel
- Instructor: Volker Heun
- Instructor: Samuel Klein
- Instructor: Ralf Zimmer
- Instructor: Yahav Bar
- Instructor: Evi Berchtold
- Instructor: Luis Heinzlmeier
- Instructor: Luis Heinzlmeier
- Instructor: Ralf Zimmer
Location: B U101 Oettingenstr. 67 (B)
Time: Mon 16:00-18:00 (weekly)
- Instructor: Daniel Diefenthaler
- Instructor: Fabian Dreer
- Instructor: Karl Fürlinger
Location: A 125 Geschw.-Scholl-Pl. 1 (A)
Time: Mon 12:00-14:00 (weekly)
- Instructor: Simon Rauch
- Instructor: Matthias Schubert
- Instructor: Ludwig Zellner
Location: 105 Akademiestr. 7
Time: Thu 10:00-12:00 (weekly)
- Instructor: Ursula Fantauzzo
- Instructor: Ming Gui
- Instructor: Johannes Schusterbauer