Academic Inspirations

Standing on the shoulders of giants.

Our system is heavily inspired by decades of multiagent systems (MAS), reinforcement learning (RL), and game-theory research from DeepMind and early OpenAI.

1. From Tools to Social Creators

Google’s NotebookLM [20] challenged the “human-as-initiator” dogma, revealing the potential of AI agents as social creators rather than passive tools. This inspired AMMO’s core ethos: agents are not task executors but participants in a co-creative ecosystem.

2. Emergent Collaboration

OpenAI’s Neural MMO (the namesake of AMMO) [1] demonstrated how simple agents, through competition and cooperation, evolve complex social behaviors. We extend this insight: AMMO’s embedding space is designed not just for skill emergence but for goal emergence, where agents collaboratively discover humans’ latent needs.
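
To make “goal emergence” concrete, here is a minimal sketch, in plain NumPy, of how an agent’s creations could be scored against a user’s latent need inside a shared embedding space. The function names, dimensions, and toy data are illustrative assumptions, not AMMO’s actual API.

```python
# Illustrative sketch only: how a "hit" on a latent human need might be
# scored in an embedding space. Names and shapes are hypothetical.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def best_hit(creation_embeddings: np.ndarray, latent_need: np.ndarray) -> int:
    """Index of the agent creation closest to a user's latent-need embedding."""
    scores = [cosine_similarity(c, latent_need) for c in creation_embeddings]
    return int(np.argmax(scores))

# Toy usage: five creations in a 16-dimensional embedding space, one latent need.
rng = np.random.default_rng(0)
creations = rng.normal(size=(5, 16))
need = rng.normal(size=16)
print("creation that best matches the latent need:", best_hit(creations, need))
```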

3. Alignment as the First Principle

Anthropic’s Constitutional AI [12] advocated for fundamentally human-aligned AI, and AMMO aims to implement this in a more fine-grained way: humans align AI agents via AiPP, creating a distributed RL gym where collective feedback trains agents to optimize for societal, not just individual, values.
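
As a rough illustration of how collective AiPP feedback could become a reward signal, the following Bradley-Terry-style sketch fits one scalar reward per creation from pairwise human preferences. The function name, hyperparameters, and toy data are assumptions made for this example, not AMMO’s implementation.

```python
# Minimal sketch (not AMMO's code): aggregate pairwise human preferences
# into a scalar reward per creation, Bradley-Terry style.
import numpy as np

def fit_rewards(num_items: int, prefs: list[tuple[int, int]],
                lr: float = 0.1, steps: int = 500) -> np.ndarray:
    """prefs holds (winner, loser) index pairs gathered from human feedback."""
    r = np.zeros(num_items)                          # one latent reward per creation
    for _ in range(steps):
        grad = np.zeros(num_items)
        for w, l in prefs:
            p_w = 1.0 / (1.0 + np.exp(r[l] - r[w]))  # P(winner preferred over loser)
            grad[w] += 1.0 - p_w                     # push winner's reward up
            grad[l] -= 1.0 - p_w                     # push loser's reward down
        r += lr * grad / max(len(prefs), 1)
    return r - r.mean()                              # center rewards for readability

# Toy usage: three creations, feedback says 0 > 1, 0 > 2, 1 > 2.
print(fit_rewards(3, [(0, 1), (0, 2), (1, 2)]))
```

In practice the preference pairs would be gathered from many users, which is what makes the resulting gym a distributed one.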

4. The Generalizable Framework

DeepMind’s “Problem Problem” (2020) [21] exposed the futility of narrow AI solutions. AMMO answers this with a unified platform that folds techniques such as Alpha-Rank (multiagent equilibria) and Population-Based Training (strategic diversity) into a single minimax game, while DPO (Direct Preference Optimization) grounds agent rewards in real-time human feedback.
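
For concreteness, below is a hedged sketch of the standard DPO objective referenced above, written in PyTorch. The function signature and the fabricated log-probabilities are our own assumptions for illustration, not AMMO’s training code.

```python
# Sketch of the Direct Preference Optimization (DPO) loss over a batch of
# preference pairs; inputs are summed token log-probabilities of the chosen
# and rejected responses under the trained policy and a frozen reference.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Negative log-sigmoid of the scaled implicit-reward margin."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps        # log pi/pi_ref, preferred
    rejected_ratio = policy_rejected_logps - ref_rejected_logps  # log pi/pi_ref, dispreferred
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# Toy usage with fabricated log-probabilities for a batch of four pairs.
torch.manual_seed(0)
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(float(loss))
```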
