Data is the New Oil, Curation is the New Refinery

Jun 15

The future of data isn’t brute-force compute; it is careful, niche specialization.

Everyone knows the value of good data.

In my days in food service, data informed our labor decisions, inventory, drop times, and process improvements.

As an educator, I saw data inform everything we did in our classrooms, from lesson plans to teaching strategies to seating charts, and even how the physical room was arranged.

Across years of pastoral ministry, I found that some of the greatest obstacles facing the local church were leaders’ refusal to collect quantifiable data and to act on it by making meaningful adjustments for the benefit of the community and congregation.

If data is so important in such qualitative, quintessentially human fields as hospitality, education, and even spirituality (there is, after all, a Book of Numbers), imagine the impact data can have in competitive, cutthroat industries such as corporate finance, insurance, “discovery AI”, and manufacturing. There, data arbitrage isn’t a nicety; it becomes the primary means of survival and competition.

The big three cloud providers realize this. There is a reason Google, Amazon, and Microsoft have invested so heavily in datacenter and cloud compute infrastructure. In the oil analogy, they have effectively bought a majority stake in the digital mineral rights, which they lease to the highest bidder. Nvidia and AMD supply the drills; the photonics companies supply the pumps.

The problem is that data is, if anything, too plentiful. It is correct, useful data that is hard to come by.

We’ve all experienced this through the phenomenon of AI hallucination. If the dataset is missing, incorrect, or entered incorrectly, we risk the AI model manufacturing a false result where no real answer exists.

For the household user, this is at worst a minor annoyance. For a corporation, it can mean anywhere from tens of thousands to billions of dollars in lost stakeholder value or future capex runway.

In cloud compute, winners and losers will be decided by how well they optimize their data and tasking.

There is also obscene waste in AI cloud compute usage. Many companies are doing the data equivalent of throwing buckets of paint at the wall until it eventually becomes the color they want. It is messy, wasteful, and costly, and everyone would be happier if they just hired a painter.

Many corporations are using the wrong types of compute for their tasks, which slows them down and degrades their results. The data they feed their models is uncurated, making it harder for the models to refine into usable strategy. This burns through valuable compute budget that could be spent on more carefully selected tasks.

The future of AI is no longer in broad generalized datasets or workflows. The future of AI is machine learning on carefully curated datasets.

One example I’ve been watching carefully is Recursion Pharmaceuticals (NASDAQ: RXRX). They are currently using AI to discover molecules to treat rare diseases and cancer. Instead of having their models search the entire universe of medical and biomedical literature, they curate the data fed to the model so that it focuses on one thing and one thing only. In doing so, they’ve already advanced some molecules into early-stage trials.

Another example, the privately held platform Harvey.ai, limits itself to legal case law and argument, allowing for more finely tuned arguments and reducing the risk of hallucinated case law being mistakenly cited at trial. These are the types of “discovery AI” platforms that will give the bigger players a run for their money: not broad swaths of generalized tasking, but niche, high-margin, targeted machine learning. For a small-cap company like Recursion, being first could be the difference between billions and bankruptcy.

At the other end of the spectrum is another company I’ve been watching carefully: Intuit (NASDAQ: INTU). You may know them as the brains behind TurboTax, QuickBooks, Mint, Credit Karma, and others.

From a fintech data inventory standpoint, they are sitting on a goldmine: arguably the most comprehensive databases of tax, accounting, and personal finance data anywhere, all right at their fingertips.

The problem? It’s horrifically messy. Accountants use different methods. 1099s come from a broad array of industries. To make use of all this information, it must be synthesized, organized, and repackaged into chunks that are easier to process. The proprietary upside? Nearly infinite. The task at hand? Momentous.

The quality of AI tasking will always be determined by human beings.

For those who fear AI taking over jobs, know that AI is only as valuable as the information it’s fed and the results it produces. The quality of that input and output will, in my opinion, always be judged by an expert human being.

Thus, AI is not a destroyer of jobs or industries. It is, and always will be, a maximization tool that makes every person willing to adapt to it all the more efficient in their expert craft.

As investors, we should begin taking companies that invest in data curation more seriously. Curation may not be flashy, but it is the infrastructure of the future.

By:

Brennon McFarlane

Posted in:

data-arbitrage, data-driven-improvement, data-in-finance, data-in-insurance, data-in-manufacturing, data-informed-decision-making, data-management-in-education, data-specialization
