Power-seeking agents will likely be developed

I am going to argue that we will likely eventually get AIs that are strongly power-seeking, much more so than current SOTA LLMs. [1]

TLDR

Right now SOTA LLMs are still largely in a simulator regime. This buffers against power-seeking. Long-horizon RL or similar methods (applied to LLMs or otherwise) will turn AIs into consequentialists, motivating power-seeking. It will likely be difficult to prevent other actors from building consequentialist AI without leading labs being prepared to do so themselves.

Instrumental convergence does not apply to pretraining

LLM pretraining and SFT can be understood as creating a simulator. The model learns to imitate the continuation of the training distribution conditioned on the prompt. Note that a simulator, in this sense, does not optimize for simulation [2]; for example, it will not be inclined to harvest compute to improve its simulations. This is because simulators are consequence-blind: they don’t take into account the effects of their action

Source

https://www.lesswrong.com/posts/CtnHpECuoq6eLL8fu/power-seeking-agents-will-likely-be-developed