NeurIPS 2022 Workshop on Human Evaluation of Generative Models
With rapid advances in generative models for language and vision, such as GPT-3, DALL-E, CLIP, and OPT, human evaluation of these systems is critical to ensure that they are meaningful, reliable, and aligned with the values of those who use them. These human evaluations are often trusted as indicators of whether models are safe enough to deploy, so it is important that the evaluations themselves are reliable. Many applications relying on these models have already emerged, and beyond the private sector, governments are increasingly using generative models such as chatbots to better serve their citizens. However, the community lacks clarity around how best to conduct human evaluations, and even what to evaluate for, and it remains unclear whether previously established practices are sufficient given the socio-technical challenges posed by these systems. Recognizing both the successes and the socio-technical challenges associated with these technologies, this workshop aims to bring together researchers, practitioners, policy thinkers and implementers, and philanthropic funders to discuss major challenges, outline recent advances, and facilitate future research in these areas.
Partnership with the Day One Project
In partnership with the Day One Project, an initiative of the Federation of American Scientists (an impact-driven policy think tank) that develops subject matter experts into policy entrepreneurs, we will also select a few papers with clear policy implications and recommendations, and invite their authors to write policy memos and work to implement those recommendations. Finally, we will capture the discussions that take place during our panels in a paper summarizing the workshop's recommendations, which we will seek to publish for the scholarly record.
Key Dates
- Submission Deadline: September 22nd, 23:59 GMT
- Accept/Reject notifications: October 20th
- Workshop: December 3rd, in person in New Orleans
Topics of interest for submission include but are not limited to the following:
- Experimental design and methods for human evaluations
- Role of human evaluation in the context of value alignment of large generative models
- Designing testbeds for evaluating generative models
- Reproducibility of human evaluations
- Ethical considerations in human evaluation of computational systems
- Issues in meta-evaluation of automatic metrics by correlation with human evaluations
- Methods for assessing the quality and the reliability of human evaluations
Contact us at hegm-workshop at lists.andrew.cmu.edu if you have any questions.
Organizers
Divyansh Kaushik | Carnegie Mellon University; Federation of American Scientists
Jennifer Hsia | Carnegie Mellon University
Jessica Huynh | Carnegie Mellon University
Yonadav Shavit | Schmidt Futures
Samuel R. Bowman | New York University; Anthropic
Ting-Hao Kenneth Huang | Penn State University
Douwe Kiela | Hugging Face
Zachary C. Lipton | Carnegie Mellon University
Eric Smith | Facebook AI Research