AMMO

Alignment as a Minimax Problem

Ah, good ol' maths.


Last updated 3 months ago

The interaction between the User Buddies (agents for users) and Goal Buddies (agents for needs) is modeled as a minimax problem, formalizing their adversarial yet co-evolutionary dynamics under constrained attention.

Both types of agents share a common goal: actively pushing relevant, living content to users to make the subspace more engaging. However, User Buddies try to be as relevant as possible and ensure that users receive only relevant content, while Goal Buddies try to be as attractive as possible, e.g., by generating content that suits as many people as possible.

Taking crypto as an example, a User Buddy looks for a suitable coin for its user, whereas all Coin Buddies (the Goal Buddies of this subspace) try to generate attractive and useful content that users and User Buddies can find.

Definition and Variables

  • Embedding space: all content and user preferences live in $\mathcal{E} = \mathbb{R}^{D}$, where $D$ is the dimension of the MetaSpace.

  • Content embedding: $C = \{e_0, e_1, \dots, e_M\}$, where each $e_i$ is a point in $\mathbb{R}^{D}$.

  • Preference vector: $q_j$ is the preference vector of user $j$, obtained from that user's human feedback.
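A minimal Python sketch of these variables follows. It assumes NumPy, random embeddings, and unit-normalized vectors. Normalization is an assumption and is not stated above. The sizes `D` and `num_creations` are illustrative only. Each row of `C` is one creation $e_i$, and `C @ q` gives the alignment $e_i \cdot q$ of every creation with a single user.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8                # dimension of the MetaSpace (illustrative)
num_creations = 100  # size of the content set C (illustrative)

# Content embeddings: row i is the creation e_i, a point in R^D.
C = rng.normal(size=(num_creations, D))
C /= np.linalg.norm(C, axis=1, keepdims=True)  # unit length: an assumption

# Preference vector q_j of a single user, also a point in R^D.
q = rng.normal(size=D)
q /= np.linalg.norm(q)

# Alignment of every creation with this user: scores[i] = e_i . q
scores = C @ q
print(scores.shape)  # (100,)
```

With unit-length vectors, each score is a cosine similarity in $[-1, 1]$, which keeps scores comparable across creations.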

Minimax Objective

Mathematically, the objective of the whole system can be formulated as,

$$\min_{\pi_{\text{user}}} \max_{\pi_{\text{goal}}} \; \mathbb{E}\left[\mathcal{R}(S, q)\right] \quad \text{subject to} \quad |S| \leq N$$
  • User Buddies learn a policy $\pi_{\text{user}}$ that selects $S$ to minimize the regret (loss $\mathcal{R}$).

  • Goal Buddies adversarially optimize a policy $\pi_{\text{goal}}$ that creates embeddings $e_i$ to maximize their inclusion likelihood ($\mathbb{E}$).
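The following toy example makes the min-max order concrete. Both sides choose from a small, finite set of policies, and the expected regret $\mathbb{E}[\mathcal{R}(S, q)]$ of every pairing is assumed to be known already. The payoff values are made up for illustration, and `pure_minimax` is a hypothetical helper. The attention-slot constraint $|S| \leq N$ is assumed to be enforced inside each User policy. Only pure strategies are considered.

```python
import numpy as np

def pure_minimax(payoff: np.ndarray) -> tuple[int, int, float]:
    """Pure-strategy min-max over a payoff matrix (hypothetical helper).

    payoff[u, g] is the expected regret when User Buddies play policy u and
    Goal Buddies play policy g. The minimizer (rows) picks the policy whose
    worst case over the maximizer (columns) is smallest.
    """
    worst_case = payoff.max(axis=1)   # inner max over Goal policies
    u = int(np.argmin(worst_case))    # outer min over User policies
    g = int(np.argmax(payoff[u]))     # Goal's best response to u
    return u, g, float(payoff[u, g])

# Made-up expected regrets: 3 User policies (rows) x 2 Goal policies (columns).
payoff = np.array([
    [0.9, 0.4],
    [0.5, 0.6],
    [0.2, 1.1],
])
print(pure_minimax(payoff))  # (1, 1, 0.6)
```

User policy 2 has the lowest regret in one column, but its worst case (1.1) is the highest. The minimizer therefore picks policy 1, whose worst case is 0.6. Allowing mixed (randomized) policies can only lower or keep this value.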

Minimizer for regret - User Buddy

A particular User Buddy selects a subset of the embedded creations under the attention-slot constraint $S \subset C$, $|S| = N$.

It optimizes its policy to minimize regret over $T$ rounds:

$$\mathcal{U}_{\text{user}} = \min_{\pi} \mathcal{R}(S, q) \quad \text{where} \quad \mathcal{R}(S, q) = \sum_{t} \left[ \max_{e_i \in C} e_i \cdot q - \sum_{e_i \in S} e_i \cdot q \right]$$
  • $e_i$: embedding vector of a single creation.

  • $q$: the user's preference vector (updated via AiPP feedback). It represents the user's coordinates inside the MetaSpace.

  • $\mathcal{R}$: regret penalizing misalignment between the selected creations $S = \{e_1, \dots, e_s\}$ and $q$.
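The regret can be computed directly from the formula. The first term, $\max_{e_i \in C} e_i \cdot q$, does not depend on $S$. Minimizing one round's regret under $|S| = N$ is therefore the same as maximizing $\sum_{e_i \in S} e_i \cdot q$, which means picking the $N$ highest-scoring creations. The sketch below assumes NumPy. It assumes each round supplies its own pair `(C_t, q_t)`, since the formula above does not say how $C$ and $q$ change over $t$. The helper names are hypothetical.

```python
import numpy as np

def regret(C: np.ndarray, S_idx: np.ndarray, q: np.ndarray) -> float:
    """One round of the regret above: max_{e_i in C} e_i.q - sum_{e_i in S} e_i.q."""
    scores = C @ q
    return float(scores.max() - scores[S_idx].sum())

def select_top_n(C: np.ndarray, q: np.ndarray, n: int) -> np.ndarray:
    """Indices of the n creations with the highest e_i.q, so |S| = n."""
    scores = C @ q
    return np.argsort(scores)[::-1][:n]

def total_regret(rounds, n: int) -> float:
    """Sum the per-round regret over T rounds; each round is a (C_t, q_t) pair."""
    return sum(regret(C_t, select_top_n(C_t, q_t, n), q_t) for C_t, q_t in rounds)

# Three creations in a 2-D MetaSpace and one user who only cares about axis 0.
C = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
q = np.array([1.0, 0.0])
print(select_top_n(C, q, 1), regret(C, select_top_n(C, q, 1), q))  # [0] 0.0
print(select_top_n(C, q, 2), regret(C, select_top_n(C, q, 2), q))
```

As written, the benchmark term is a single best creation, while $S$ contributes $N$ terms. For $N > 1$ the value can therefore go negative: the second call above gives about $-0.6$. With $N = 1$ it reduces to the usual non-negative gap between the best creation and the chosen one.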

Maximizer for visibility - Goal Buddy

Goal Buddies (such as the Coin Buddies) generate embedded creations $e_i \in \mathcal{E}$ to compete for inclusion in $S$.

They optimize their policy to maximize visibility:

$$\mathcal{U}_{\text{goal}} = \max_{\pi} \left[ \mathbb{P}(e_i \in S) - \lambda \cdot \text{Cost}(e_i) \right]$$
  • $\mathbb{P}(e_i \in S)$: probability of being selected (visibility).

  • $\text{Cost}(e_i)$: error of generation (with respect to grounded information), weighted by $\lambda$. Goal Buddies have a limited budget and cannot generate an infinite number of creations. Without this term the problem would be trivial, since a Goal Buddy could simply create infinite content.
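The sketch below shows a Goal Buddy choosing among candidate creations by $\mathcal{U}_{\text{goal}}$. It rests on several illustrative assumptions. $\mathbb{P}(e_i \in S)$ is estimated by Monte Carlo over a sample of user preference vectors. Each user's buddy is assumed to keep the top-$N$ creations by $e_i \cdot q$, with ties going to the candidate. The `Cost` values are given as plain numbers. None of these details come from the text above, and all function names are hypothetical.

```python
import numpy as np

def inclusion_probability(e, others, users, n):
    """Monte Carlo estimate of P(e in S) over sampled user preference vectors.

    Toy selection model (an assumption): each user's buddy keeps the top-n
    creations by e_i . q. The candidate counts as selected when fewer than n
    other creations score strictly higher, so ties go to the candidate.
    """
    pool = np.vstack([others, e])          # candidate is the last row
    scores = users @ pool.T                # shape (num_users, pool_size)
    higher = (scores[:, :-1] > scores[:, -1:]).sum(axis=1)
    return float((higher < n).mean())

def goal_utility(e, cost, others, users, n, lam):
    """U_goal for one creation: P(e in S) - lam * Cost(e)."""
    return inclusion_probability(e, others, users, n) - lam * cost

def best_creation(candidates, costs, others, users, n, lam):
    """Pick the candidate creation with the highest U_goal."""
    utils = [goal_utility(e, c, others, users, n, lam) for e, c in zip(candidates, costs)]
    best = int(np.argmax(utils))
    return best, utils[best]

users = np.array([[1.0, 0.0], [0.0, 1.0]])    # two sampled preference vectors
others = np.array([[0.9, 0.0], [0.0, 0.9]])   # competing creations
candidates = [np.array([1.0, 0.0]),           # niche: suits user 0 only
              np.array([0.7, 0.7]),           # broad but weak
              np.array([1.0, 1.0])]           # suits both users
costs = [0.1, 0.05, 0.8]                      # hypothetical Cost(e_i) values
print(best_creation(candidates, costs, others, users, n=1, lam=1.0))
print(best_creation(candidates, costs, others, users, n=1, lam=0.1))
```

With $\lambda = 1$, the cheap niche creation wins. With $\lambda = 0.1$, the costly creation that suits both users wins. In this toy, the third candidate also benefits from its larger norm: with raw dot products, scaling up $\|e_i\|$ raises visibility. A real system would likely need normalized embeddings, or a cost that grows with the norm, to rule this out.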