PSA: Almost nobody is directly working on superintelligent alignment

Edit: The original title was unnecessarily provocative. This was a very quick post inspired by talking to someone who assumed that a large fraction of the safety community is working directly on figuring out how to align superintelligent AIs. Obviously much (all?) of what the rest of the safety community is doing is also ultimately aimed at bringing about a future where superintelligent AIs are aligned, just more indirectly, and we wanted to create common knowledge about that. (We are neutral about whether this is good or bad. As mentioned below, we both work on AI safety and neither of us works on alignment.) There's also lots of work where it's debatable whether it counts as directly working on alignment, but that's kind of the point of the post: there's not much work that unarguably tries to figure out superintelligent alignment directly. We are leaving the list below as is for now, even though we don't have strong confidence or opinions about exactly where to draw the line, since that doesn't seem important for the core message of this post.

People often assume that a large fraction of the AI safety community works on alignment. As far as we're aware, this is not true. Most people are not working on making sure superintelligent AIs are aligned with human values or follow human instructions.

Currently, the people we know of who work on alignment are roughly:

- The Alignment Research Center, which works on a research bet by Paul Christiano
- Probably Sequent, which was announced just yesterday
- Parts of GDM (agent foundations work, some debate work)
- Some scattered people who work at universities or independently, some of whom hang around Berkeley
- ??

A lot of the remainder of the AI safety community does indirect work like capability evaluations, risk assessments, control, policy, AI science, understanding misalignment (which maybe should partially count as alignment work), demos, and so on.

Some production alignment work (i.e., making current models behave well) might help with more ambitious alignment, too (e.g., some CoT monitoring). Many people also work on aligning current and next-generation models so that these models can help with aligning future models, and hope this scales to superintelligence.

We are not necessarily saying that this is bad or that people are making a big mistake (e.g., neither of us works on alignment), but it's a notable fact that seems good to make known to those who don't know about it.

Source

https://www.lesswrong.com/posts/kJo2qsEdib8RZLvW6/psa-almost-nobody-is-directly-working-on-superintelligent