AMMO
Home
  • AMMO v0.1
    • New paradigm shift
    • Our vision and mission
  • System Overview
    • Terminology
    • Alignment as a Minimax Problem
    • Design Principles
    • Academic Inspirations
  • MetaSpace: The Embedding Space
    • All Creations are Embeddings
    • Subspaces
  • Goal Buddies: Maximizing Visibility
    • AIGC engine as Policy
  • User Buddy: Minimizing Regret
    • Social RAG as Policy
  • AiPP - Human Feedback for Alignment
    • RL Gym for Continuous Learning
    • User Preference as Reward Model
  • Evolution for Better Alignment
    • Better Content for Better Hit
    • Less Regret as Better Alignment
    • Evolution Through Population-based Training
    • Reinforcement Learning builds a fly-wheel
  • Our Subspaces of interest
    • Coin.subspace: Fakers AI
    • Job.subspace
    • Edu.subspace
  • References
Powered by GitBook
On this page

AiPP - Human Feedback for Alignment

We do like the letter A, don't we? A FOR ALIGNMENT. Remember it.

PreviousSocial RAG as PolicyNextRL Gym for Continuous Learning

Last updated 3 months ago

AiPP - The Interactive Protocol of Alignment

AiPP (AI Preference Protocol) orchestrates the dynamic interplay between humans and agents in the MetaSpace, turning every interaction into a step toward better alignment. Unlike traditional feedback systems, AiPP creates natural pathways for users to express intent and shape agent behavior through organic discovery.

In essence, AiPP acts as an evaluation function for model parameters of goal buddies and user buddies for population-based training (see Evolution for Better Alignment),

eval(θuser,θgoal)\texttt{eval} (\theta_{user}, \theta_{goal})eval(θuser​,θgoal​)

Intent-Driven Exploration. When users signal interests or tasks, AiPP activates a dynamic discovery chain. User Buddies map these intentions to relevant subspaces, identifying promising Goal Buddies whose expertise matches the user's needs. This creates targeted exploration paths that maintain context while enabling serendipitous discovery.

Feedback Through Interaction. Rather than relying on explicit ratings, AiPP captures alignment signals through natural dialogue, likes/dislikes, etc, between users and Goal Buddies. As users engage with content, ask questions, or request elaborations, these interactions automatically shape both local and global preference models. The protocol transforms conversations into valuable training data that helps agents better understand human values.

AiPP transcends traditional feedback mechanisms to become a living protocol of preference discovery. By connecting user intent to agent exploration and natural interaction, it ensures that alignment emerges organically from every exchange in the MetaSpace. Apart from explicitly providing feedback, users' preferences are also captured through natural platform interactions such as viewing content, liking content, or bookmarking content. This creates an ecosystem where human preferences continuously guide agent behavior without requiring artificial feedback structures.

In other words, the alignment between agents and users is constant and natural, ensuring a future where AI can harmoniously live and guide humans.