AI Data Crisis and Failed Markets

This is a nice short paper on how (relatively) small the pool of training data for AI systems is, and on how data moats and hoarding may start to limit advances in AI models.

When a data owner shares a piece of data, the owner loses all control over how it will be used, copied, and shared further. When the owner sells a piece of data, they don’t sell the original data — they sell a copy. When a dataset is copied, the global supply goes up, the price goes down, and every customer becomes a competitor for the future sale and use of that data.

Beyond this data-sharing problem, the paper covers other elements of the AI data crisis as well.


Bookmark: Open Source Charting Library

Happy to have stumbled across a new open source JavaScript charting library from the Apache Software Foundation. Having used everything from the paid version of Highcharts.com, Chart.js, and Plotly to D3.js, it’s always a pleasure to find such a broad, well-supported, and extensive collection.

Bookmark: Emerging Architectures for Modern Data Infrastructure

As an industry, we’ve gotten exceptionally good at building large, complex software systems. We’re now starting to see the rise of massive, complex systems built around data – where the primary business value of the system comes from the analysis of data, rather than the software directly. We’re seeing quick-moving impacts of this trend across the industry, including the emergence of new roles, shifts in customer spending, and the emergence of new startups providing infrastructure and tooling around data.

In fact, many of today’s fastest growing infrastructure startups build products to manage data. These systems enable data-driven decision making (analytic systems) and drive data-powered products, including with machine learning (operational systems). They range from the pipes that carry data, to storage solutions that house data, to SQL engines that analyze data, to dashboards that make data easy to understand – from data science and machine learning libraries, to automated data pipelines, to data catalogs, and beyond.
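To make the layers in that description a bit more concrete, here is a toy sketch in Python of a tiny analytic system. It is only an illustration under stated assumptions, not the architecture the article lays out: a generator stands in for the "pipe", an in-memory SQLite database stands in for both the storage layer and the SQL engine, and a printed text bar chart stands in for the dashboard. The sample events and the function names (`ingest_events`, `load`, `daily_totals`, `render`) are hypothetical.

```python
import sqlite3
from typing import Iterable, Iterator

# Hypothetical raw events; in a real system these might arrive from
# application logs, a message queue, or a third-party API.
RAW_EVENTS = [
    "2021-06-01,signup,1",
    "2021-06-01,purchase,3",
    "2021-06-02,signup,2",
    "2021-06-02,purchase,5",
]


def ingest_events(lines: Iterable[str]) -> Iterator[tuple[str, str, int]]:
    """Pipe layer (toy): parse raw comma-separated lines into typed tuples."""
    for line in lines:
        day, kind, count = line.split(",")
        yield day, kind, int(count)


def load(conn: sqlite3.Connection, rows: Iterable[tuple[str, str, int]]) -> None:
    """Storage layer (toy): persist the parsed rows in a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (day TEXT, kind TEXT, n INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()


def daily_totals(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Query layer (toy): aggregate event counts per day with SQL."""
    return conn.execute(
        "SELECT day, SUM(n) FROM events GROUP BY day ORDER BY day"
    ).fetchall()


def render(totals: list[tuple[str, int]]) -> str:
    """Presentation layer (toy): a text 'dashboard' with one bar per day."""
    return "\n".join(f"{day} {'#' * total} {total}" for day, total in totals)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, ingest_events(RAW_EVENTS))
    print(render(daily_totals(conn)))
```

Running it prints one bar per day (totals of 4 and 7 for the sample data). In practice each of these four functions would be its own product category, which is the point the article makes about the growing number of data infrastructure startups.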