Context:
Data quality should be at the core of many Artificial Intelligence initiatives from the very first moment in which data is required for a successful analysis. Measurement and evaluation of the level of quality are crucial to determining whether data can be used for the tasks at hand. Conscientious of this importance, industry and academia have proposed several data quality measurements and assessment frameworks over the last two decades. Unfortunately, there is no common and shared vocabulary for data quality terms. Thus, it is difficult and time-consuming to integrate data quality analysis within a (Big) Data workflow for performing Artificial Intelligence tasks. One of the main reasons is that, except for a reduced number of proposals, the presented vocabularies are neither machine-readable nor processable, needing human processing to be incorporated.
Objective:
This paper proposes a unified data quality measurement and assessment information model. This model can be used in different environments and contexts to describe data quality measurement and evaluation concerns.
Method:
The model has been developed as an ontology to make it interoperable and machine-readable. For better interoperability and applicability, this ontology, BIGOWL4DQ, has been developed as an extension of a previously developed ontology for describing knowledge management in Big Data analytics.
Conclusions:
This extended ontology provides a data quality measurement and assessment framework required when designing Artificial Intelligence workflows and integrated reasoning capacities. Thus, BIGOWL4DQ can be used to describe Big Data analysis and assess the data quality before the analysis.
Result:
Our proposal has been validated with two use cases. First, the semantic proposal has been assessed using an academic use case. And second, a real-world case study within an Artificial Intelligence workflow has been conducted to endorse our work.