Alignment as a Minimax Problem

Ah, good ol' maths.

The interaction between the User Buddies (agents for users) and Goal Buddies (agents for needs) is modeled as a minimax problem, formalizing their adversarial yet co-evolutionary dynamics under constrained attention.

Both types of agents share a common goal: actively pushing relevant, living content to users to make the subspace more engaging. However, while User Buddies try to be as relevant as possible and ensure that users only receive relevant content, Goal Buddies try to be as attractive as possible, e.g. by generating content that suits as many people as possible.

Taking crypto as an example, a User Buddy looks for a suitable coin for its user, while every Coin Buddy (a Goal Buddy) tries to generate attractive and useful content that users and User Buddies can find.

Definitions and Variables

Embedding space: all content and user preferences live in $\mathcal{E} = \mathbb{R}^{D}$, where $D$ is the dimension of the MetaSpace.
Content embedding: $C = \{e_1, e_2, \dots, e_M\}$, where each $e_i$ is a point in $\mathbb{R}^{D}$ and $M$ is the number of creations (distinct from the number of attention slots $N$).
Preference vector: $q_j$ is the preference vector of user $j$, obtained from that user's human feedback.

Minimax Objective

Mathematically, the objective of the whole system can be formulated as,

$$\min_{\pi_{\text{user}}} \max_{\pi_{\text{goal}}} \, \mathbb{E}\left[\mathcal{R}(S, q)\right] \quad \text{subject to} \quad |S| \leq N$$

User Buddies learn a policy $\pi_{\text{user}}$ that selects $S$ to minimize the regret (loss) $\mathcal{R}$.
Goal Buddies adversarially optimize a policy $\pi_{\text{goal}}$ that generates embeddings $e_i$ to maximize their likelihood of inclusion in $S$, $\mathbb{P}(e_i \in S)$.
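To make the min-max order concrete, here is a toy sketch that is not part of the original design: it replaces the policy spaces with a handful of candidate policies and uses a hypothetical table of expected regrets. The User Buddy (row player) commits to a policy first, and the Goal Buddy (column player) then best-responds with whichever policy maximizes regret for that row. Restricting to pure strategies is a simplification; over randomized policies the value would instead come from solving a small linear program.

```python
# Toy stand-in for min over pi_user of max over pi_goal of E[R(S, q)].
# Rows = candidate User Buddy policies, columns = candidate Goal Buddy policies.
# Every number in the table is HYPOTHETICAL and only illustrates the mechanics.


def pure_minimax(payoff):
    """Return (value, row) for min over rows of max over columns.

    The minimizer (User Buddy) picks a row; for that row the maximizer
    (Goal Buddy) picks the column with the largest expected regret.
    """
    worst_case = [max(row) for row in payoff]  # Goal Buddy's best response per row
    best_row = min(range(len(payoff)), key=lambda i: worst_case[i])
    return worst_case[best_row], best_row


# Hypothetical expected regrets: 3 user policies x 2 goal policies.
expected_regret = [
    [0.40, 0.90],
    [0.55, 0.60],
    [0.20, 1.10],
]

value, user_policy = pure_minimax(expected_regret)
print(value, user_policy)  # 0.6 1
```

The second user policy wins not because it has the lowest regret against any single goal policy (the third row does, at 0.20), but because its worst case is the smallest.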

Minimizer for regret - User Buddy

A particular User Buddy selects a subset of the content, $S \subset C$, under the attention-slot constraint $|S| \leq N$.
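A minimal sketch of the selection step, assuming relevance is the dot product $e_i \cdot q$ (the score used in the User Buddy's regret) and the $|S| \leq N$ form of the constraint from the minimax objective. Plain Python lists stand in for vectors in $\mathbb{R}^{D}$; the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Alignment score e_i . q between a creation and a preference vector."""
    return sum(x * y for x, y in zip(a, b))


def select_top_n(C: List[Vector], q: Vector, n: int) -> List[int]:
    """Return the indices of the selected set S (at most n creations).

    The objective sum_{e_i in S} e_i . q is additive over creations, so
    ranking by e_i . q and keeping the n best with a positive score is
    optimal under |S| <= n. Ties keep the lower index (sorted is stable).
    """
    ranked = sorted(range(len(C)), key=lambda i: dot(C[i], q), reverse=True)
    return [i for i in ranked[:n] if dot(C[i], q) > 0]


# Four hypothetical 2-D creations and one preference vector.
C = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
q = [1.0, 0.5]
print(select_top_n(C, q, n=2))  # [2, 0]
```

Greedy top-$N$ is exact here only because the score is additive; a selection objective with diversity or redundancy terms would need a different procedure.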

It optimizes its policy to minimize the cumulative regret over $T$ rounds:

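$$\mathcal{U}_{\text{user}} = \min_{\pi_{\text{user}}} \mathcal{R}(S, q) \quad \text{where} \quad \mathcal{R}(S, q) = \sum_{t=1}^{T} \left[ \max_{S' \subseteq C,\ |S'| \leq N} \sum_{e_i \in S'} e_i \cdot q_t - \sum_{e_i \in S_t} e_i \cdot q_t \right]$$

Here $S_t$ and $q_t$ are the selection and the preference vector at round $t$. The benchmark is the best selection of at most $N$ creations for $q_t$, which reduces to the single best creation $\max_{e_i \in C} e_i \cdot q_t$ when $N = 1$.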

$e_i$: embedding vector of a single creation.
$q$: the user's preference vector (updated via AiPP Feedback), representing the user's coordinates inside the MetaSpace.
$\mathcal{R}$: regret penalizing misalignment between the selected creations $S = \{e_1, \dots, e_s\}$ (with $s \leq N$) and $q$.
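The regret can be computed directly once each round's selection and preference vector are logged. This sketch takes the per-round benchmark to be the best selection of at most $N$ creations for that round's preference vector (which coincides with the single best creation when $N = 1$), and scores alignment with $e_i \cdot q$. The function names and the two-round scenario are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Alignment score e_i . q."""
    return sum(x * y for x, y in zip(a, b))


def best_selection_score(C: List[Vector], q: Vector, n: int) -> float:
    """Benchmark: highest total score reachable with at most n creations."""
    scores = sorted((dot(e, q) for e in C), reverse=True)
    return sum(s for s in scores[:n] if s > 0)


def cumulative_regret(
    C: List[Vector],
    selections: List[List[int]],  # S_t as indices into C, one list per round
    preferences: List[Vector],    # q_t, one preference vector per round
    n: int,
) -> float:
    """Sum over rounds of (benchmark score - score of the actual selection)."""
    total = 0.0
    for S_t, q_t in zip(selections, preferences):
        if len(S_t) > n:
            raise ValueError("attention-slot constraint |S_t| <= n violated")
        chosen = sum(dot(C[i], q_t) for i in S_t)
        total += best_selection_score(C, q_t, n) - chosen
    return total


# Hypothetical run: two creations, one attention slot, two rounds.
# The user's preference drifts from creation 0 to creation 1, but the
# User Buddy keeps showing creation 0, so round 2 contributes regret 1.
C = [[1.0, 0.0], [0.0, 1.0]]
preferences = [[1.0, 0.0], [0.0, 1.0]]
selections = [[0], [0]]
print(cumulative_regret(C, selections, preferences, n=1))  # 1.0
```

A User Buddy that tracked the drift (selections `[[0], [1]]`) would have zero regret on the same rounds.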

Maximizer for visibility - Goal Buddy

Goal Buddies (Coin Buddies in the crypto example) generate embedded creations $e_i \in \mathcal{E}$ that compete for inclusion in $S$.

Each Goal Buddy optimizes its policy to maximize visibility:

$$\mathcal{U}_{\text{goal}} = \max_{\pi_{\text{goal}}} \left[ \mathbb{P}(e_i \in S) - \lambda \cdot \text{Cost}(e_i) \right]$$

$\mathbb{P}(e_i \in S)$: probability of being selected (visibility).
$\text{Cost}(e_i)$: error of generation (with respect to grounded information), weighted by the trade-off coefficient $\lambda$. Goal Buddies have a limited budget and cannot generate an unlimited number of creations; without this cost term the problem would be trivial, since a Goal Buddy could simply produce infinite content.
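The text does not say how $\mathbb{P}(e_i \in S)$ is computed. One hypothetical way to estimate it is Monte Carlo: sample preference vectors $q$, let a User Buddy fill its $N$ slots with the highest-scoring creations by $e_i \cdot q$, and count how often $e_i$ is included. The sampling distribution, the top-$N$ User Buddy, and the scalar `cost` passed in for $\text{Cost}(e_i)$ are all assumptions of this sketch.

```python
import random
from typing import List, Sequence, Set

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Alignment score e_i . q."""
    return sum(x * y for x, y in zip(a, b))


def top_n(C: List[Vector], q: Vector, n: int) -> Set[int]:
    """Hypothetical User Buddy: fill all n slots with the highest e_i . q."""
    ranked = sorted(range(len(C)), key=lambda i: dot(C[i], q), reverse=True)
    return set(ranked[:n])


def inclusion_probability(idx: int, C: List[Vector], preferences: List[Vector], n: int) -> float:
    """Monte Carlo estimate of P(e_idx in S) over the sampled preference vectors."""
    hits = sum(1 for q in preferences if idx in top_n(C, q, n))
    return hits / len(preferences)


def goal_utility(idx: int, C: List[Vector], preferences: List[Vector],
                 n: int, cost: float, lam: float) -> float:
    """U_goal for one creation: estimated visibility minus lambda-weighted cost."""
    return inclusion_probability(idx, C, preferences, n) - lam * cost


# Hypothetical population: 2-D preference vectors sampled around (1.0, 0.2).
rng = random.Random(0)
preferences = [[rng.gauss(1.0, 0.5), rng.gauss(0.2, 0.5)] for _ in range(2000)]
C = [
    [1.0, 0.2],   # creation aimed at the "average" user
    [0.0, 1.0],   # niche creation
    [0.5, -0.5],  # creation aimed away from most users
]
for idx in range(len(C)):
    print(idx, round(goal_utility(idx, C, preferences, n=1, cost=0.1, lam=0.5), 3))
```

Raising `lam` or a creation's `cost` lowers its utility without changing its visibility, which is the lever the budget constraint relies on.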

