Automated schema understanding for data lakes and data warehouses
Friday, June 7, 2024
Data discovery and understanding is the second step in the machine learning development cycle, right after problem identification. Meanwhile, data lakes keep growing in size and scope, which makes it harder to identify the local neighborhood of tables relevant to a given problem. For consultants, the problem recurs with every new customer engagement. To address this bottleneck, we present a new, fully automated solution to data discovery and understanding in data lakes and data warehouses.