Alignment as a Minimax Problem
Ah, good ol' maths.
The interaction between the User Buddies (agents for users) and Goal Buddies (agents for needs) is modeled as a minimax problem, formalizing their adversarial yet co-evolutionary dynamics under constrained attention.
Both types of agents share a common goal: actively pushing relevant and living content to users to make the subspace more engaging. However, while User Buddies try to be as relevant as possible and make sure users only receive relevant content, Goal Buddies try to be as attractive as possible, e.g. by generating content that suits as many people as possible.
Taking crypto as an example, a User Buddy looks for a suitable coin for its user, whereas all Coin Buddies (the Goal Buddies in this setting) try to generate attractive and useful content that users and User Buddies can find.
Mathematically, the objective of the whole system can be formulated as a minimax game in which User Buddies minimize regret while Goal Buddies optimize their model to maximize visibility. The ingredients are:
Embedding space: all content and user preferences live in $\mathbb{R}^d$, where $d$ is the dimension of the MetaSpace.
Content embedding: the creations $e_1, \dots, e_N$ are points in $\mathbb{R}^d$.
Preference vector: $p_u \in \mathbb{R}^d$ are the preference vectors of users $u$, obtained from the users' human feedback.
User Buddies learn a policy $\pi_U$ that selects a subset $S$ of creations to minimize the regret (loss $L$).
Goal Buddies adversarially optimize a policy $\pi_G$ that creates embeddings $e$ to maximize their inclusion likelihood $P(e \in S)$.
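As a hedged sketch, the definitions above can be read as two coupled objectives rather than a single closed-form expression. The notation is assumed for illustration only: $\pi_U$ and $\pi_G$ are the User Buddy and Goal Buddy policies, $S_t$ is the subset selected in round $t$ of $T$, $k$ is the number of attention slots, $p_u$ is the user's preference vector, $L$ is the regret, $e$ is a generated embedding, and $B$ is the Goal Buddy's creation budget. The exact form used by the system may differ.

```latex
\begin{aligned}
\text{User Buddy:}\quad & \min_{\pi_U} \; \sum_{t=1}^{T} L(S_t, p_u)
  \quad \text{s.t.} \quad |S_t| \le k, \\
\text{Goal Buddy:}\quad & \max_{\pi_G} \; \sum_{t=1}^{T} P(e \in S_t)
  \quad \text{s.t.} \quad \text{at most } B \text{ creations per round}.
\end{aligned}
```

The two problems are coupled through $S_t$: the User Buddy's selection determines which creations become visible, and the Goal Buddy's creations determine what the User Buddy can choose from.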
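A particular User Buddy selects a subset $S$ of creations from the embedding space under attention-slot constraints, i.e. it can only fill a limited number of slots for its user.
It optimizes its model to minimize the regret accumulated over $T$ rounds, using the following quantities: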
$e$: embedding vector of a single creation.
$p_u$: the user's preference vector (updated via AiPP Feedback). It represents the user's coordinates inside the MetaSpace.
$L(S, p_u)$: regret penalizing misalignment between the selected creations $S$ and $p_u$.
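The User Buddy's side can be illustrated with a minimal sketch. It assumes, purely for illustration, that regret is the average squared Euclidean distance between the selected creations and the preference vector, and that the attention-slot constraint is a fixed number of slots. The names `select_creations`, `regret` and `slots` are hypothetical and not part of the system described here.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_creations(embeddings: List[Vector], preference: Vector, slots: int) -> List[int]:
    """Return the indices of the `slots` creations closest to the preference vector.

    Assumption: the attention constraint is simply "at most `slots` items".
    """
    ranked = sorted(
        range(len(embeddings)),
        key=lambda i: squared_distance(embeddings[i], preference),
    )
    return ranked[:slots]


def regret(embeddings: List[Vector], selected: List[int], preference: Vector) -> float:
    """Assumed loss: average squared distance of the selected creations to the preference."""
    if not selected:
        return 0.0
    total = sum(squared_distance(embeddings[i], preference) for i in selected)
    return total / len(selected)


# Toy 2-dimensional MetaSpace with four creations and one user.
creations = [(0.9, 0.1), (0.0, 1.0), (0.8, 0.3), (-1.0, 0.0)]
user_preference = (1.0, 0.0)

chosen = select_creations(creations, user_preference, slots=2)
print("selected indices:", chosen)
print("regret:", regret(creations, chosen, user_preference))
```

Because this assumed loss is additive over creations, picking the closest ones greedily already minimizes it for a fixed number of slots; a learned policy would matter once the preference vector is uncertain or changes between rounds.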
Coin agents (the Goal Buddies) generate embedded creations $e$ to compete for inclusion in $S$.
$P(e \in S)$: probability of being selected (visibility).
$\epsilon(e)$: error of generation (grounded information). Goal Buddies have a limited budget and cannot generate an infinite number of creations; otherwise the problem would be trivial, since they could simply flood the space with content.
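The Goal Buddy's trade-off between visibility, generation error and budget can be sketched in the same spirit. Everything below is an assumption made for illustration: visibility is approximated by the mean of `exp(-squared distance)` to a set of known user preference vectors, the generation error is given as a number per candidate, and the two are combined linearly with a hypothetical `error_weight`. The function names are not taken from the system described here.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def visibility(candidate: Vector, preferences: List[Vector]) -> float:
    """Assumed proxy for the selection probability, in (0, 1]:
    the closer a candidate is to the users' preferences, the higher the value."""
    return sum(math.exp(-squared_distance(candidate, p)) for p in preferences) / len(preferences)


def publish(
    candidates: List[Tuple[Vector, float]],
    preferences: List[Vector],
    budget: int,
    error_weight: float = 1.0,
) -> List[int]:
    """Pick at most `budget` candidates, ranked by visibility minus weighted generation error."""

    def score(i: int) -> float:
        embedding, error = candidates[i]
        return visibility(embedding, preferences) - error_weight * error

    ranked = sorted(range(len(candidates)), key=score, reverse=True)
    return ranked[:budget]


# Two users with different tastes, three candidate creations as (embedding, generation error).
preferences = [(1.0, 0.0), (0.0, 1.0)]
candidates = [
    ((0.5, 0.5), 0.05),  # broad appeal, well grounded
    ((1.0, 0.0), 0.30),  # perfect for one user, poorly grounded
    ((3.0, 3.0), 0.00),  # accurate but far from every user
]

print("published indices:", publish(candidates, preferences, budget=2))
```

The budget is what keeps the problem non-trivial in this sketch: with an unlimited budget every candidate would be published, and the ranking would no longer matter.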