Algorithms for the Discovery of Embedded Functional Dependencies
Author: Wei, Z; Hartmann, S; Link, S
Publisher: Department of Computer Science, The University of Auckland, New Zealand
Link to this item using this URL: https://hdl.handle.net/2292/58005
Embedded functional dependencies (eFDs) were recently introduced to tailor relational schema design to data completeness requirements of applications. They also facilitate data cleaning and data integration. A problem that is essential to unlocking these applications is the discovery of all eFDs that hold on a given data set. We show that the discovery problem of eFDs is NP-complete, W-complete in the output, and has a minimum solution space that is larger than the maximum solution space for functional dependencies. Despite these computational challenges, we use novel data structures and search strategies to develop row-efficient, columnefficient, and hybrid algorithms that can efficiently solve the discovery problem for eFDs on large real-world benchmark data sets. Our experiments also demonstrate that the algorithms scale well in terms of their design targets, and that ranking the eFDs by the number of redundant data values they cause can provide useful guidance in identifying meaningful eFDs for applications. Finally, we demonstrate the benefits of introducing completeness requirements and ranking by the number of redundant data values for approximate and genuine functional dependencies.
Subjects: Fields of Research
Citation: ["CDMTCS Research Reports CDMTCS-542 (2020)"]
Copyright: Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher.