The Case for Evaluating Model Behaviors

Most evaluations of AI systems focus on their capabilities: how good they are at coding tasks, how effectively they can answer complex scientific questions, and so on. From a safety perspective, capability evaluations have a place: by understanding how close we are to different capabilities, and the rate of progress on them, we can forecast when different risks are likely to occur, as well as the broad shape of AI development. These capability evaluations were very useful to me when writing GPT-2030, and more recently I've found the METR time horizon graph useful for extrapolating the likely degree of autonomy of future agents. However, these evaluations also have pretty significant externalities: accurate capability measurements speed up capability research, and the work needed to fully elicit model capabilities involves developing agent scaffolds and other artifacts that directly advance model capabilities. This also means that AI labs are already highly incentivized to produce such

Source

https://www.alignmentforum.org/posts/J5KkwYnnaeNX7hL2s/the-case-for-evaluating-model-behaviors