This software engineering practical focuses on developing a benchmark for reinforcement learning from human feedback (RLHF), with an emphasis on control applications. Inspired by the B-Pref benchmark, this project aims to create a comprehensive toolbox for evaluating preference-based reinforcement learning algorithms, which derive preferred behaviors from human feedback in simulated environments.
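
To make "deriving preferred behaviors from human feedback" concrete, here is a minimal sketch of the Bradley-Terry preference model that is commonly used to train a reward model from pairwise comparisons of trajectory segments. It is an illustration under stated assumptions, not the project's prescribed implementation: the function names are hypothetical, and the per-step rewards are assumed to come from some learned reward model that is not shown here.

```python
import math


def segment_return(rewards):
    """Sum of predicted per-step rewards over one trajectory segment."""
    return sum(rewards)


def preference_probability(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over segment B.

    Computed as a numerically stable sigmoid of the difference in returns.
    """
    diff = segment_return(rewards_a) - segment_return(rewards_b)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and a human label.

    label is 1.0 if A was preferred, 0.0 if B was preferred,
    and 0.5 if the annotator judged both segments equally good.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

Minimizing this loss over many labeled segment pairs pushes the reward model to assign higher returns to the segments that annotators prefer; a standard RL algorithm can then optimize the learned reward.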

The exact goals will be determined in the first weeks of the project, but you may work on:

  • Algorithm Re-implementation: Re-implement or adapt RLHF algorithms such as PrefPPO, PEBBLE, SURF, Meta-Reward-Net, and RUNE. This includes reading and understanding the original research papers, and implementing the algorithms in a clear and efficient manner, ensuring they are easily maintainable and readable for future research use.
  • Validation: Validate the re-implemented algorithms against the performance reported in their original papers. This may involve re-running the original experiments, which requires some system administration skills to get unmaintained codebases running.
  • Tools and Interface Development: Implement tools for logging and visualizing algorithm performance, and develop a command-line interface for running benchmark experiments.

The project will be carried out in Python, utilizing GitHub or GitLab for version control and issue tracking. Teams will work collaboratively to ensure a cohesive final product.
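
As a rough idea of what a command-line interface for running benchmark experiments might look like, the following sketch uses Python's standard `argparse` module. Everything specific in it is an assumption made for illustration: the program name `rlhf-bench`, the flag names, the default values, and the algorithm identifiers (derived from the algorithm names listed above) are hypothetical, and the actual interface will be designed during the project.

```python
import argparse

# Hypothetical identifiers, derived from the algorithm names listed above.
ALGORITHMS = ["prefppo", "pebble", "surf", "meta-reward-net", "rune"]


def build_parser():
    """Build the argument parser for a hypothetical benchmark runner."""
    parser = argparse.ArgumentParser(
        prog="rlhf-bench",
        description="Run a preference-based RL benchmark experiment.",
    )
    parser.add_argument("--algorithm", choices=ALGORITHMS, required=True,
                        help="algorithm to evaluate")
    parser.add_argument("--env", required=True,
                        help="name of the simulated environment")
    parser.add_argument("--seed", type=int, default=0,
                        help="random seed for reproducibility")
    parser.add_argument("--num-queries", type=int, default=1000,
                        help="total number of preference queries")
    return parser


def main(argv=None):
    """Parse arguments and report the requested run (placeholder only)."""
    args = build_parser().parse_args(argv)
    # A real benchmark would dispatch to a training loop here.
    print(f"Would run {args.algorithm} on {args.env} "
          f"(seed={args.seed}, queries={args.num_queries})")
    return args
```

Calling `main(["--algorithm", "pebble", "--env", "walker_walk"])` parses the arguments and prints the planned run; the environment name `walker_walk` is likewise only a placeholder.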

What You'll Learn:
  • Python for Research Applications: Enhance your Python programming skills, focusing on its use in machine learning with libraries like PyTorch and JAX.
  • Version Control with Git: Develop expertise in using Git for collaborative software development and project management.
  • Code Quality & Testing: Learn to write clean, maintainable code and implement robust testing strategies, crucial for research integrity.

Applicants should have a basic understanding of machine learning concepts and some experience with ML models. Most important is the intrinsic motivation to learn the concepts listed above, and beyond, on your own. Prior familiarity with Python and version control systems like Git is recommended.

We will start the course with a practical exercise that each student has to complete individually, which will serve as a check for the necessary prerequisites. Available places will be allocated based on capacity and the results of this exercise. After that, you will work in teams for the remainder of the course.

For a deeper understanding of the course's content and expectations, we recommend reviewing at least the first two of the following foundational papers:
[1] https://openreview.net/forum?id=ps95-mkHF_
[2] https://papers.nips.cc/paper_files/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html
[3] https://proceedings.mlr.press/v139/lee21i/lee21i.pdf
[4] https://openreview.net/forum?id=TfhfZLQ2EJO
[5] https://proceedings.neurips.cc/paper_files/paper/2022/hash/8be9c134bb193d8bd3827d4df8488228-Abstract-Conference.html
[6] https://openreview.net/forum?id=OWZVD-l-ZrC

Location: B U101 Oettingenstr. 67 (B)
Time: Mon 16:00-18:00 (weekly)

Central registration

Location: A 125 Geschw.-Scholl-Pl. 1 (A)
Time: Mon 12:00-14:00 (weekly)

Central registration

Location: 105 Akademiestr. 7
Time: Thu 10:00-12:00 (weekly)

Central registration