A crucial requirement before using a dataset in any application is to understand the dataset at hand; this task is known as data profiling. Profiling activities range from ad-hoc approaches, such as eyeballing random subsets of the data or formulating aggregation queries, to the systematic inference of structural information and statistics about a dataset using dedicated profiling tools.

In collaboration with Felix Naumann (HPI) and Lukasz Golab (University of Waterloo), we surveyed relevant algorithms and systems in the area of data profiling and classified them based on their computational complexity. The survey was published in VLDBJ, and we gave a tutorial based on the survey at ICDE 2016.
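To make the notion of single-column profiling statistics concrete, here is a minimal sketch in Python. It is an illustration only and not one of the surveyed algorithms or tools: the function names `profile_column` and `profile_table` and the toy rows are hypothetical, and the statistics chosen (null count, distinct count, uniqueness, most frequent value) are just a few simple examples of what a profiler might report.

```python
from collections import Counter


def profile_column(values):
    """Return basic statistics for one column; None marks a missing value."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        # Unique if every non-null value occurs exactly once.
        "is_unique": len(non_null) > 0 and len(counts) == len(non_null),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical toy data, used only to exercise the sketch.
rows = [
    {"id": 1, "city": "Berlin", "zip": "10115"},
    {"id": 2, "city": "Potsdam", "zip": None},
    {"id": 3, "city": "Berlin", "zip": "10117"},
]

for column, stats in profile_table(rows).items():
    print(column, stats)
```

Statistics like these cost a single pass per column. Structural metadata that spans several columns, such as dependencies between column combinations, is typically far more expensive to discover, which is one reason a classification by computational complexity is useful.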