So what does “generative AI” have to do with AGI?  We’ll cover universal compression (Kolmogorov complexity) and universal induction (Solomonoff induction), and why the two are essentially equivalent. We’ll then look at AIXI, a simple model of superintelligent AGI that combines universal induction with optimal planning. The idea that compression and sequential decision making are all you need for AGI was arguably the bet DeepMind was making in its early days. From an alignment perspective, AIXI is a constructive proof of the Orthogonality Thesis: despite being extremely smart, it keeps optimizing whatever silly objective you gave it. AIXI also illustrates why specifying goals directly is hard: its internal representations are opaque.
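The prediction–compression equivalence can be made concrete: any sequential predictor turns into a compressor via arithmetic coding, spending about -log2 p(symbol) bits per symbol, so better prediction means shorter codes. Here is a minimal sketch; the `laplace_unigram` predictor is a toy stand-in for the Solomonoff mixture (which instead weights all programs by 2^-length), and the function names are my own:

```python
import math
from collections import Counter

def codelength_bits(text, predictor):
    """Ideal codelength of `text` under `predictor`: an arithmetic coder
    driven by the predictor emits about -log2 p(next symbol) bits each step."""
    total = 0.0
    for i, ch in enumerate(text):
        p = predictor(text[:i], ch)   # predictive probability of the next symbol
        total += -math.log2(p)
    return total

def laplace_unigram(prefix, ch, alphabet_size=256):
    """Laplace-smoothed frequency estimator over a 256-symbol alphabet
    (a toy stand-in for a universal predictor)."""
    counts = Counter(prefix)          # missing symbols count as 0
    return (counts[ch] + 1) / (len(prefix) + alphabet_size)

text = "abababababababab"
bits = codelength_bits(text, laplace_unigram)
# As the predictor picks up the pattern, the cost per symbol falls well
# below the 8 bits/char of the raw encoding.
```

The same identity runs in reverse: a good compressor's codelengths define a predictive distribution, which is why compression and induction are two views of one problem.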

LLMs are really good compressors, though (fortunately) we don’t yet have good enough sequential decision-making algorithms to build an AIXI-like agent. How does LLM compression relate to universal induction?  Is it a good model for the limit of GPT-k for large k?  There are some suggestive similarities, but also some important differences: LLMs come with lots of background knowledge, have limited sequential depth, lack weight sharing across layers (making iterative computations hard to express), and don’t direct their own information acquisition.
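One way to quantify “LLMs are really good compressors”: an LM’s cross-entropy loss is, up to small overhead, the rate of the arithmetic coder it induces. A short sketch of the unit conversion; the 2.0 nats/token and 4 bytes/token figures are illustrative assumptions, not measurements of any particular model:

```python
import math

def bits_per_byte(nats_per_token, bytes_per_token):
    """Convert an LM's cross-entropy (nats per token) into the compression
    rate of the induced arithmetic coder (bits per raw byte of text)."""
    bits_per_token = nats_per_token / math.log(2)  # nats -> bits
    return bits_per_token / bytes_per_token

# Hypothetical numbers: 2.0 nats/token at ~4 bytes/token gives roughly
# 0.72 bits per byte, versus 8 bits per byte for uncompressed text.
rate = bits_per_byte(2.0, 4.0)
```

This is why lower loss on the scaling-law curves translates directly into better compression: the two axes are the same quantity in different units.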

Slides

L06_Compression.pdf

Scaling Laws

Supplemental Readings