AI Data Crisis and Failed Markets

This is a short, readable paper on how (relatively) small the pool of training data for AI systems is, and how data moats and hoarding may start to limit advances in AI models.

When a data owner shares a piece of data, the owner loses all control over how it will be used, copied, and shared further. When the owner sells a piece of data, they don't sell the original; they sell a copy. Each time a dataset is copied, the global supply of that data goes up, its price goes down, and every customer becomes a competitor for the future sale and use of that data.

The paper covers this dynamic along with other elements of the AI data crisis.