A video is inappropriate. Essentially I'm scraping large volumes of open data from government and public sources, centralising it, and identifying regular expressions from random samples of data points across each table and row, alongside selected statistical pattern recognition tools (clustering table data together for initial matching, and fluctuating granularity based on a decision framework), to produce decision trees for linking relationships between large quantities of datasets in a logarithmic fashion, without having to process every individual data point.
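To make the sampling idea concrete, here's a minimal sketch (all names and the threshold are my own illustrative assumptions, not the actual implementation): sample a handful of values per column, collapse each to a coarse character-class "shape", and propose join candidates between columns whose sampled shapes overlap, instead of comparing every value pairwise.

```python
import random
import re
from collections import Counter
from itertools import combinations

def shape(value: str) -> str:
    """Collapse a value to a coarse regex-like shape: letter runs -> 'A', digit runs -> '9'."""
    s = re.sub(r"[A-Za-z]+", "A", value)
    return re.sub(r"[0-9]+", "9", s)

def column_signature(values, k=20, seed=0):
    """Sample up to k values from a column and return the multiset of their shapes."""
    rng = random.Random(seed)
    sample = rng.sample(values, min(k, len(values)))
    return Counter(shape(v) for v in sample)

def similarity(sig_a, sig_b):
    """Jaccard similarity over the shape sets of two column signatures."""
    a, b = set(sig_a), set(sig_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def candidate_links(tables, threshold=0.5):
    """Propose cross-table (table, column) join candidates by signature overlap."""
    sigs = {
        (t, c): column_signature(vals)
        for t, cols in tables.items()
        for c, vals in cols.items()
    }
    return [
        (ka, kb, round(similarity(sa, sb), 2))
        for (ka, sa), (kb, sb) in combinations(sigs.items(), 2)
        if ka[0] != kb[0] and similarity(sa, sb) >= threshold
    ]

# Toy data: two tables that share an identifier format but not a column name.
tables = {
    "births": {"nhs_number": ["485 777 3456", "943 476 5919"], "year": ["1999", "2004"]},
    "deaths": {"nhs_no": ["624 589 4800", "485 777 3456"], "cause": ["unknown", "flu"]},
}
print(candidate_links(tables))
```

The cost scales with the number of columns rather than the number of rows, which is the point: the samples decide *which* columns are worth a full comparison before any row-level matching happens.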
The end result will be the centralisation and unification of all publicly available data, with relationships established at time of entry and query through a multidimensional pointer index for associations. This is theoretically possible but a very complex implementation. It's kind of like how Google learns things with its knowledge graph rollout (structured data and context building through statistics). It wasn't really my first choice of project (databases are super uninspiring); it was more of an assumed-to-exist dependency. It doesn't exist, so someone's going to have to build it. If you have a discrete maths/information theory background or would otherwise like to contribute, hit me up by email.
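The "relationships at time of entry" idea can be sketched as an inverted index (this is my own toy illustration of the concept, with hypothetical names, not the actual design): every field of a record is indexed when it's inserted, so querying for associations later is a lookup, not a scan over every dataset.

```python
from collections import defaultdict

class AssociationIndex:
    """Toy association index: relationships are recorded at insertion time."""

    def __init__(self):
        self.records = {}              # record_id -> payload dict
        self.index = defaultdict(set)  # (field, value) -> set of record_ids

    def insert(self, record_id, payload):
        """Store a record and index every (field, value) pair at time of entry."""
        self.records[record_id] = payload
        for field, value in payload.items():
            self.index[(field, value)].add(record_id)

    def associated(self, record_id):
        """Return records linked to this one through any shared (field, value)."""
        linked = set()
        for field, value in self.records[record_id].items():
            linked |= self.index[(field, value)]
        linked.discard(record_id)
        return {rid: self.records[rid] for rid in linked}

idx = AssociationIndex()
idx.insert("a1", {"postcode": "SW1A 1AA", "surname": "Smith"})
idx.insert("b7", {"postcode": "SW1A 1AA", "surname": "Jones"})
print(idx.associated("a1"))  # record b7 comes back, linked via the shared postcode
```

A real multidimensional version would need fuzzy matching and weighting rather than exact (field, value) equality, but the query-time behaviour is the same: associations are precomputed pointers, not joins run from scratch.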
I've been using notebooks, so my code isn't ready for git/is atrociously shocking. I'll create and share a repo in due course. This is super far outside my skill set from a software implementation perspective, so I could do with a hand on the non-maths front if anyone is available. This capability isn't a luxury; it needed to be established yesterday. How the hell is this not even on the radar? Everyone should lose sleep over the fact that we don't already employ large-scale data matching and prediction. We have no centralised data repository adhering to schematic and formatting standards, and that's just outright apathy towards our most critical infrastructure. Our data is useless if we can't use it at scale. If universal structured data standards were implemented alongside efficient relationship matching/context building algorithms, we would be able to harness artificial intelligence on a wide array of datasets from a diverse range of sources, in a manner that best addresses the curse of dimensionality.
Cute data visualisations and basic foundational statistical functions on repeat (slightly tweaked and rebranded as machine learning) aren't the data revolution; that stuff has literally existed for decades. Exponential intelligence and the ability to predict the future with an ever-increasing degree of certainty is the data revolution. MIT has already demonstrated this to an academic standard. How the fuck are we going to harness this incredible technology when we don't even have a proper relational data repository?