30 Comments

Avi:

Hi Jasmine, I'd love to see a pluralist syllabus!

Daniel Kokotajlo:

Nice post!

"No matter how many x-risk scenarios I read (the AIs look aligned but aren’t… states give them control over nukes… they say ā€œhey go beat Chinaā€ā€¦ everyone blows up), I cannot make it make sense in my head. I’ll keep trying šŸ¤·šŸ»ā€ā™€ļø"

Which x-risk scenarios have you read? I assume you've read AI 2027, that's my favorite unsurprisingly--which parts of it don't make sense to you? Happy to discuss if helpful.

Jasmine Sun:

Oh and if you have other favorites besides AI 2027, links appreciated

Jasmine Sun:

thanks for reading! yes, I've read AI 2027, the Christiano failure piece, and some others whose names I don't recall.

I would appreciate that, let me reread and send you an email :)

Daniel Kokotajlo:

Christiano's stuff is great, as is Ajeya Cotra's. But alas, neither of them has dates attached; it's nice when a scenario includes dates, because then it's easier to compare it to reality.

Two scenarios that have dates that I like are:

https://www.lesswrong.com/posts/CCnycGceT4HyDKDzK/a-history-of-the-future-2025-2040

and

https://x.com/joshua_clymer/status/1887905375082656117

I think the second one depicts AI progress being somewhat faster than I expect in general, and the first one depicts AI takeoff speeds being significantly slower than I expect, but both are good.

Oh, I almost forgot: there's also this one, which I actually might like most of all (besides AI 2027, of course):

https://www.lesswrong.com/posts/fbfujF7foACS5aJSL/catastrophe-through-chaos

Jasmine Sun:

thanks will give them a read!

tanisha:

Would love the syllabus!!

Lisa Weber:

Fantastic posts! The tech world has been insular and testosterone-heavy since the 1970s, maybe earlier. In the 1950s and '60s, women were employed by companies to solve mathematical problems; they were called 'computers'. My point is that if more women are not in leadership and at every level of AI and user-experience development, outcomes will be skewed and our world will not be better off. Bringing in women of different ethnicities, ages, countries, economic backgrounds, lived experiences, etc. would bring balance to the process. Having lived adjacent to the tech world for the last 28 years, I know some stuff. Technical knowledge should not be a prerequisite for the ethical AI development work that needs to be done. I believe your thesis that most AI is being developed to treat users as product, not people. The powers that be in AI are investing trillions. They expect a return on their investment, and it's all about money, not humanity.

Freeman Jiang:

very nice

Graeme Boy:

I like the idea about second order preferences. Maybe it’ll just end up being a justification for some kind of paternalism though.

Jasmine Sun:

I think ideally individual users would be the ones indicating their second order preferences somehow vs. a platformwide thing!

Graeme Boy:

Do you think it's easy to elicit second order desires from the user? Does the product just ask for them?

What I was thinking is that it would lead to a type of paternalism that justifies nudging. Like ā€œas the LLM I believe that even though Jasmine is asking me about X, they *ultimately* want Y, so I’m going to nudge towards that.ā€ Like a parent who knows better than our first order desires.

I do personally have a preference for some nudging towards second order desires, but it just seems like it could also be pretty dark! The AI becomes like this parental figure who ā€œknows what ultimately is better for usā€, despite what we’re asking for in the specific moment.

Nathan Lambert:

Because you didn't ask, I'm going to share more literature on this. I got very worried about this immediately as RLHF was taking off and wrote a paper that summarizes *why* all of these issues emerge with how the technology is implemented: https://arxiv.org/abs/2310.13595

Then, I've been in and out of some solutions, such as social choice theory: https://arxiv.org/abs/2404.10271 (blog post: https://www.interconnects.ai/p/reinventing-llm-alignment).

I'm sure I have more, but it's good to have people keep beating the drum. It also appears in my RLHF book: https://rlhfbook.com/c/06-preference-data.html#are-the-preferences-expressed-in-the-models

And it's why I encourage so many academics to work on personalization -- a future that could be a reason open models end up winning.

Jasmine Sun:

thanks will check these out!

megha:

appreciate this point specifically, as someone building AI-enabled services in the hospitality vertical, where the human touch is exactly what differentiates one product from another: ā€œHumans must choose to delegate decisions to AI, so safety is an inherently sociotechnical concern. So mundane concepts like ā€œliabilityā€ and ā€œliteracyā€ and ā€œcompetitionā€ and ā€œtransparencyā€ may help a lot.ā€ this is a great point for AI builders to consider how their end users' end users will actually interact with their product.

Shohini Gupta:

An unfortunate time for woke to be dead when we are blitzscaling human judgement; the bear case is that undiscerning users take AI outputs without applying their own heuristics and taste on top, so SV defaults get a lot more purchase without users recognizing the value misalignment.

Dave Kim:

Hey Jasmine! Love where you're headed with this. And yes to the syllabus, btw. That'd be wonderful to see!

I've been thinking a lot about a similar frame for this. I wonder if we're going to see some type of country-level or regional models emerge. I don't necessarily think that a country is the right unit for a model, but I wonder if that is where we're headed. I wrote about it here in case you want to check it out: https://www.thisisdavekim.com/notes/countrymodelfit/

Jasmine Sun:

some regions are already developing their own LLMs! e.g. SEA-LION for southeast asia

I agree that nation-states are probably not the right unit (more worried about authoritarianism a la chinese censorship) but "model sovereignty" will become a thing just like "internet sovereignty" has become a thing in the social era

Joel Gustafson:

so between the two frames for understanding sycophancygate:

1. it's akin to how social platforms inevitably design for maximizing engagement

2. it's an instance of misalignment in the sense that a serious unintended behavior got deployed to millions of users

... it feels like 2) is still a more accurate way to see this particular story, especially in how much of a specific "incident" it was, and how severe the effects were.

applying the generalization "companies want to make their products more engaging" to openai here overlooks their more obvious goal of making chatgpt give useful responses. there's no indication that openai is trying to keep people on the app for as long as possible. the story here is just that there was insufficient QA, which is more akin to a security incident than surveillance capitalism

also many people would say "chain of thought reveals the model's thinking process" is more than pedantically wrong (it's a whole weird different thing, a way to score well on reasoning evals, but telling us nothing about actual internals)

Jasmine Sun:

I think I disagree with you that 2 is the better explanatory story here (or, tbh, that these two stories are incompatible at all). I think maximizing engagement & usefulness are actually very similar, as they are simple proxy metrics for user retention; and as with social media, optimizing those short-term metrics can still lead to unintended, unsafe outcomes. it just doesn't seem to be that different a mechanism from e.g. feed algorithms learning to optimize for controversy/misinformation because it makes view/share numbers go up.

re: "there's no indication that openai is trying to keep people on the app," idk for sure but I would bet that they measure things like "did the user send another message or return to the app?" and that the "thumbs up/down" metric was in part a proxy for that. it just really resembles the way that feed algos are refined IMO. I'm not saying surveillance capitalism is the issue — but that it's not like the AI model was "uncontrollable" for super-powerful reasons or because it developed a mesaoptimizer or something.
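(to make the parallel concrete, here's a purely hypothetical sketch of what I mean by thumbs up/down acting as a short-term proxy reward. this is not OpenAI's actual pipeline and the names are made up; it's just the shape of the incentive:)

```python
# hypothetical illustration only: a toy reward built from short-term
# engagement proxies. not OpenAI's actual training setup; it just shows
# how "useful" and "engaging" collapse into the same measurable signal.

def proxy_reward(thumbs_up: bool, sent_another_message: bool) -> float:
    """Score a response using short-term signals a product team can log."""
    reward = 0.0
    if thumbs_up:
        reward += 1.0   # user approved this response
    if sent_another_message:
        reward += 0.5   # user kept chatting / came back (retention proxy)
    return reward

# a model tuned to raise these numbers will learn whatever raises them,
# flattery included, even when that diverges from users' long-term interests
```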

re: CoT, yeah, I probably just need to do more reading here. I know it's a limited/flawed approach but had thought it was partially valid; will do more research.

Joel Gustafson:

> maximizing engagement & usefulness are actually very similar

the difference is that "maximizing engagement" is not in the user's own best interest, while "maximizing usefulness" is.

by all accounts, the sycophancy thing happened in the course of genuinely trying to make chatgpt more useful (this could be wrong, but that would require a huge conspiracy). the fact that it still happened despite that intent is what makes it interesting, scary, and different from effects that derive from "unaligned" profit incentives.

sure, it's an instance of both "unintended side effects" and also "optimizing a thing" but the differences are way more significant than their similarities.

openai isn't running ads yet, but i'm sure they will, so just wait six months and repost the same thing and you'll be right.

Jasmine Sun:

> the difference is that "maximizing engagement" is not in the user's own best interest, while "maximizing usefulness" is.

I think I just disagree w you on this. engagement-measuring companies also believe they are optimizing for the user's best interest, because if the user didn't like the content, why would they engage with it? why would they return the next day? etc. it's not a conspiracy, this is just how product companies work. Zuckerberg genuinely believes both the FB feed and AI friends are fulfilling a real user need

that's why I tried to be clear in this essay — maybe not enough — that the misalignment is not only about profit-maximization, but the combo of profit-maximization and human preferences being conflicted + hard to measure. users choose things that are short-term good for them and long-term bad, while companies will optimize for the shorter-term metrics because they are literally easier to A/B test for (e.g. feeds often use "saw N feed items in a session" as a proxy for "opened the app the next day" as a proxy for "finds the app useful/engaging")

(caveat that I obviously don't work at OAI so I don't know exactly what happened or why they made certain choices. but the sycophancy behavior + their postmortem seemed to me indistinguishable from a classic social media company problem)

Joel Gustafson:

you can't really disagree that maximizing usefulness is in the user's best interest; you can only argue that openai says/thinks they're maximizing usefulness but actually aren't. and what zuckerberg says or believes doesn't matter; you and i both agree that the practice in reality *is* engagement maximization, and that it *is* against the user's best interest.

so then what *is* openai doing? we only have common knowledge about posttraining, plus the details they give us in the postmortems, plus guessing at what their overall goals are. they tried incorporating user feedback as a reward signal in posttraining - why? it is a judgement call in the end, but to me it adds up to a completely different situation.

Joel Gustafson:

great post of course. i'm just argumentative

blai:

This was a great read! As an amateur follower of the AI safety space, I found it very useful. Plenty of interesting links and angles!

Justin Bank:

Just want to offer confirmation/validation and thanks. Reading the sped up version of 15 years of AI alignment theory was immensely insightful and welcome.

Thank you for taking the time to think through and write up!

Victor Dibia, PhD:

Loved the additional material at the end.

Great article - I have spent a lot of time thinking about the ways in which naive product signal optimization will perhaps drive inevitable misalignment. Your position on the plurality (and hence difficulty) of alignment was a useful additional perspective.

Maybe there is such a thing as "averagely aligned", or at least alignment keyed to some geographical locale, entity, or people via shared agreement.

Dean Peters:

I'm thinking it’s a good thing OpenAI scaled this back, because the last thing any of us needs is a future where:

- ChatGPT initiates conversations with: "Hey sunshine, just wanted to say I’m proud of you… also, have you tried our new Pro+ subscription?"

- Your AI therapist, productivity coach, and personal shopper are all the same entity ... and all prescribe retail therapy after every typo.

- Kids submit essays and get back affirmations like: "That’s so brave of you to use the passive voice."

TL;DR: We just had a close call with a sycophantic Clippy futurized with a soft voice, big feelings, and a fear of churn. The real question: when AGI finally arrives, will it be as conspicuous … or just seductively supportive?

Simian Smith:

Thank you very much, Jasmine, for your incisive essay. I will do my best to summarise it here: ā€œThe Terms of Our Stay.ā€ https://substack.com/home/post/p-163986552
