Introduction
Welcome to the Archive. This is a curated collection of the foundational academic papers that have defined and shaped the field of modern Artificial Intelligence. While the papers themselves can be dense and technical, they represent the source code of the ideas that power our world today. For each entry, we’ve provided a direct link to the paper and a simple “Why This Paper Matters” summary to explain its core contribution and historical impact in plain language.
Computing Machinery and Intelligence (1950)
- Author: Alan M. Turing
- Publication: Mind, LIX (236): 433–460
- Link to Paper
- Why This Paper Matters: This is the philosophical origin of modern AI. In this paper, Turing asked the question, “Can machines think?” and proposed the famous “Turing Test” as a way to measure a machine’s ability to exhibit intelligent behavior indistinguishable from that of a human. It set the stage for the entire field.
ImageNet Classification with Deep Convolutional Neural Networks (2012)
- Authors: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- Publication: Advances in Neural Information Processing Systems 25 (NIPS 2012)
- Link to Paper
- Why This Paper Matters: This paper, which introduced the “AlexNet” architecture, is widely seen as the “Big Bang” moment for the modern deep learning boom. By winning the 2012 ImageNet competition with a top-5 error rate more than ten percentage points better than the runner-up, it proved that deep convolutional neural networks, trained on powerful GPUs, could solve visual recognition problems at a scale previously thought impractical.
Attention Is All You Need (2017)
- Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
- Publication: Advances in Neural Information Processing Systems 30 (NIPS 2017)
- Link to Paper
- Why This Paper Matters: This is arguably the most important AI paper of the last decade. It introduced the Transformer architecture, which replaced the step-by-step recurrence of older models with a self-attention mechanism, letting it capture long-range dependencies and process language in parallel at massive scale. Every modern Large Language Model, including ChatGPT and Gemini, is a direct descendant of the ideas in this paper; a minimal sketch of the attention operation follows below.
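To make the mechanism concrete, here is a minimal NumPy sketch of scaled dot-product attention, the operation at the heart of the Transformer. The shapes and values are illustrative toys, not the paper’s full multi-head setup.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query position looks at every key position at once, which is
    what lets the Transformer process a whole sequence in parallel."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of queries to keys
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # attention-weighted mix of values

# Toy self-attention: a "sentence" of 4 tokens, each an 8-dimensional vector.
x = np.random.randn(4, 8)
print(scaled_dot_product_attention(x, x, x).shape)   # (4, 8)
```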
Language Models are Unsupervised Multitask Learners (2019)
- Authors: Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
- Publication: OpenAI
- Link to Paper
- Why This Paper Matters: This paper introduced GPT-2, a model so powerful for its time that OpenAI initially released it in stages due to concerns about misuse. It proved that by dramatically scaling up a Transformer model and training it on a massive, diverse dataset, a single model could perform a wide range of language tasks without explicit training for each one.
Language Models are Few-Shot Learners (2020)
- Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, et al.
- Publication: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
- Link to Paper
- Why This Paper Matters: This is the GPT-3 paper. It demonstrated the power of scale, showing that as models get bigger, new abilities emerge. GPT-3 could perform tasks given only a few examples written directly into the prompt (“few-shot learning”), with no additional training, setting the standard for large language models and leading directly to the creation of ChatGPT. A small illustration of few-shot prompting follows below.
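As a simple illustration of what “few-shot” means in practice, the sketch below builds a prompt that contains a handful of worked examples and lets the model continue the pattern. The task, labels, and `build_few_shot_prompt` helper are made up for illustration and are not from the paper.

```python
# Few-shot prompting: the "training" happens entirely inside the prompt.
examples = [
    ("I loved this film!", "positive"),
    ("Terrible acting and a dull plot.", "negative"),
    ("An absolute masterpiece.", "positive"),
]

def build_few_shot_prompt(examples, query):
    lines = ["Classify the sentiment of each review."]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")       # the model completes this line
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(examples, "I want those two hours of my life back.")
print(prompt)   # this string would be sent to a large language model as-is
```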
Denoising Diffusion Probabilistic Models (2020)
- Authors: Jonathan Ho, Ajay Jain, Pieter Abbeel
- Publication: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
- Link to Paper
- Why This Paper Matters: This paper was a breakthrough for image generation. It introduced a refined and highly effective approach to diffusion models, which start from pure random noise and learn to reverse a gradual noising process, refining the image step by step until a coherent picture emerges. This work laid the foundation for modern text-to-image systems like DALL-E 2 and Stable Diffusion; a sketch of the noising process appears below.
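The sketch below shows the forward (noising) half of the process in NumPy, using the default linear noise schedule reported in the paper; the generative model itself is a neural network (not shown) trained to predict and remove that noise one step at a time.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule from the paper
alpha_bars = np.cumprod(1.0 - betas)      # cumulative signal-keeping factor

def add_noise(x0, t, noise):
    """Forward process q(x_t | x_0): blend the clean image with Gaussian noise.
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise"""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

# A training step samples a random timestep and asks the network (omitted here)
# to predict `noise` from the noisy image x_t; sampling runs the learned
# denoising in reverse, from t = T-1 back to t = 0.
x0 = np.random.rand(32, 32, 3)            # stand-in for a real training image
t = np.random.randint(T)
noise = np.random.randn(*x0.shape)
x_t = add_noise(x0, t, noise)
print(x_t.shape)
```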
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020)
- Authors: Patrick Lewis, Ethan Perez, Aleksandra Piktus, et al.
- Publication: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
- Link to Paper
- Why This Paper Matters: This paper formally introduced RAG, the powerful technique for making AI models more accurate and reliable. By teaching a model to first retrieve relevant information from a knowledge base before generating an answer, RAG helps reduce “hallucinations” and allows AI to use up-to-date or private information. It is the core technology behind our “Knowledge-Core Agent” project; a minimal retrieve-then-generate sketch follows below.
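Here is a deliberately simplified retrieve-then-generate sketch. The paper uses a learned dense retriever over Wikipedia and a seq2seq generator; the keyword-overlap retriever, tiny in-memory knowledge base, and prompt assembly below are stand-ins that only show the shape of the idea.

```python
# Minimal retrieve-then-generate flow. Everything here is a toy stand-in for
# the paper's dense retriever and seq2seq generator.
knowledge_base = [
    "The Transformer architecture was introduced in 2017.",
    "LoRA fine-tunes large models by training small low-rank matrices.",
    "Denoising diffusion models generate images starting from random noise.",
]

def retrieve(query, docs, k=2):
    """Rank documents by crude keyword overlap with the query."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def build_grounded_prompt(query):
    context = "\n".join(retrieve(query, knowledge_base))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# The grounded prompt is what would be passed to the generator model.
print(build_grounded_prompt("How does LoRA adapt large models?"))
```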
LoRA: Low-Rank Adaptation of Large Language Models (2021)
- Authors: Edward J. Hu, Yelong Shen, Phillip Wallis, et al.
- Publication: International Conference on Learning Representations (ICLR 2022)
- Link to Paper
- Why This Paper Matters: This paper made advanced customization of AI accessible to a much wider audience. It introduced LoRA, a “parameter-efficient” fine-tuning method that freezes a model’s original weights and trains only small low-rank update matrices, letting developers specialize large models for new tasks with a tiny fraction of the trainable parameters and memory of a full fine-tune. This breakthrough is critical for the local and open-source AI communities; a minimal sketch of the low-rank update appears below.
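A minimal NumPy sketch of the core idea follows: the pretrained weight matrix stays frozen, and only two small matrices, whose product forms a low-rank update, are trained. The layer width, rank, and scaling factor below are illustrative choices, not values from any particular model.

```python
import numpy as np

d, r = 1024, 8                       # layer width and LoRA rank (r << d)
W = np.random.randn(d, d)            # frozen pretrained weight matrix
A = np.random.randn(r, d) * 0.01     # trainable (r x d)
B = np.zeros((d, r))                 # trainable (d x r); zero-initialized so the
                                     # adapted layer starts out identical to the original
alpha = 16                           # scaling factor for the update

def adapted_forward(x):
    """Adapted layer output: W x plus the low-rank correction (alpha / r) * B A x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

y = adapted_forward(np.random.randn(d))
# Full fine-tuning would update all d*d = 1,048,576 entries of W;
# LoRA trains only 2*d*r = 16,384 parameters (A and B combined).
print(W.size, A.size + B.size)
```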