References
I dare you to read all of them. If you have, apply to work with us here: [email protected].
[1] J. Suárez et al., Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning. 2023. [Online]. Available: https://arxiv.org/abs/2311.03736
[2] J. Kaplan et al., Scaling Laws for Neural Language Models. 2020. [Online]. Available: https://arxiv.org/abs/2001.08361
[3] L. Ouyang et al., Training language models to follow instructions with human feedback. 2022. [Online]. Available: https://arxiv.org/abs/2203.02155
[4] A. Radford et al., Learning Transferable Visual Models From Natural Language Supervision. 2021. [Online]. Available: https://arxiv.org/abs/2103.00020
[5] S. Sukhbaatar, A. Szlam, J. Weston, and R. Fergus, End-To-End Memory Networks. 2015. [Online]. Available: https://arxiv.org/abs/1503.08895
[6] J. Johnson, M. Douze, and H. Jégou, “Billion-Scale Similarity Search with GPUs,” IEEE Transactions on Big Data, vol. 7, no. 3, 2021, doi: 10.1109/TBDATA.2019.2921572.
[7] G. Li, H. A. A. K. Hammoud, H. Itani, D. Khizbullin, and B. Ghanem, CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society. 2023. [Online]. Available: https://arxiv.org/abs/2303.17760
[8] S. Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models. 2023. [Online]. Available: https://arxiv.org/abs/2210.03629
[9] R. Rafailov, A. Sharma, E. Mitchell, S. Ermon, C. D. Manning, and C. Finn, Direct Preference Optimization: Your Language Model is Secretly a Reward Model. 2024. [Online]. Available: https://arxiv.org/abs/2305.18290
[10] J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, “Counterfactual multi-agent policy gradients,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2018, doi: 10.1609/aaai.v32i1.11794.
[11] Z. Yang et al., OASIS: Open Agent Social Interaction Simulations with One Million Agents. 2024. [Online]. Available: https://arxiv.org/abs/2411.11581
[12] Y. Bai et al., Constitutional AI: Harmlessness from AI Feedback. 2022. [Online]. Available: https://arxiv.org/abs/2212.08073
[13] M. Jaderberg et al., Population Based Training of Neural Networks. 2017. [Online]. Available: https://arxiv.org/abs/1711.09846
[14] D. Silver et al., “A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play,” Science, vol. 362, no. 6419, 2018, doi: 10.1126/science.aar6404.
[15] S. Yao et al., Tree of Thoughts: Deliberate Problem Solving with Large Language Models. 2023. [Online]. Available: https://arxiv.org/abs/2305.10601
[16] DeepSeek-AI et al., DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025. [Online]. Available: https://arxiv.org/abs/2501.12948
[17] T. B. Brown et al., “Language models are few-shot learners,” in Advances in Neural Information Processing Systems, 2020, vol. 33.
[18] X. Y. Liu et al., “FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning,” in Advances in Neural Information Processing Systems, 2022, vol. 35. doi: 10.2139/ssrn.4253139.
[19] P. Lewis et al., “Retrieval-augmented generation for knowledge-intensive NLP tasks,” in Advances in Neural Information Processing Systems, 2020, vol. 33.
[20] Google, NotebookLM. 2023. [Online]. Available: https://notebooklm.google.com/
[21] S. Omidshafiei et al., “Navigating the landscape of multiplayer games,” Nature Communications, vol. 11, no. 1, 2020, doi: 10.1038/s41467-020-19244-4.
[22] S. Omidshafiei et al., “α-Rank: Multi-Agent Evaluation by Evolution,” Scientific Reports, vol. 9, no. 1, 2019, doi: 10.1038/s41598-019-45619-9.
[23] J. Leike, D. Krueger, T. Everitt, M. Martic, V. Maini, and S. Legg, Scalable agent alignment via reward modeling: a research direction. 2018. [Online]. Available: https://arxiv.org/abs/1811.07871