Supplementary Update | July 2023
Data and Data Governance (Art 10)
High-risk AI systems which make use of techniques involving the training of models with data must be developed on the basis of training, validation and testing data sets that meet the quality criteria referred to in Art 10(2) to Art 10(5), as far as this is technically feasible according to the specific market segment or scope of application. Techniques that do not require labelled input data, such as unsupervised learning and reinforcement learning, shall be developed on the basis of data sets, such as for testing and verification, that meet the quality criteria referred to in Art 10(2) to Art 10(5) (Art 10(1)).
Training, validation and testing data sets shall be subject to data governance appropriate for the context of use as well as the intended purpose of the AI system. Those measures shall concern in particular:
- relevant design choices (Art 10(2)(a));
- transparency as regards the original purpose of data collection (Art 10(2)(aa));
- data collection processes (Art 10(2)(b));
- data preparation processing operations, such as annotation, labelling, cleaning, updating, enrichment and aggregation (Art 10(2)(c));
- the formulation of assumptions, notably with respect to the information that the data are supposed to measure and represent (Art 10(2)(d));
- an assessment of the availability, quantity and suitability of the data sets that are needed (Art 10(2)(e));
- examination in view of possible biases that are likely to affect the health and safety of persons, negatively impact fundamental rights or lead to discrimination prohibited under Union law, especially where data outputs influence inputs for future operations (‘feedback loops’) and appropriate measures to detect, prevent and mitigate possible biases (Art 10(2)(f));
- appropriate measures to detect, prevent and mitigate possible biases (Art 10(2)(fa));
- the identification of any relevant data gaps or shortcomings that prevent compliance with the EU AI Act, and how those gaps and shortcomings can be addressed (Art 10(2)(g)).
Training datasets, and where they are used, validation and testing datasets, including the labels, shall be relevant, sufficiently representative, appropriately vetted for errors and be as complete as possible in view of the intended purpose. They shall have the appropriate statistical properties, including, where applicable, as regards the persons or groups of persons in relation to whom the high-risk AI system is intended to be used. These characteristics of the data sets shall be met at the level of individual datasets or a combination thereof (Art 10(3)).
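As an illustration only, the "sufficiently representative" requirement could be checked by comparing each group's share in a data set with a reference share for the population the system is intended to serve. The sketch below is not a method prescribed by Art 10(3). The function names, the `age_band` field, the reference shares and the 5-percentage-point tolerance are all hypothetical assumptions chosen for the example.

```python
from collections import Counter


def group_share_gaps(records, group_key, reference_shares):
    """Return, per group, (share in the data set) minus (reference share).

    `reference_shares` is an assumed mapping of group -> expected share
    in the intended population; it is not defined by the regulation.
    """
    counts = Counter(record[group_key] for record in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("data set is empty")
    groups = set(counts) | set(reference_shares)
    return {
        group: counts.get(group, 0) / total - reference_shares.get(group, 0.0)
        for group in groups
    }


def underrepresented(gaps, tolerance=0.05):
    """List groups whose share falls short of the reference by more than
    `tolerance` (a hypothetical threshold, to be set per use case)."""
    return sorted(group for group, gap in gaps.items() if gap < -tolerance)


# Hypothetical data: ten records labelled with an age band.
records = (
    [{"age_band": "18-39"}] * 6
    + [{"age_band": "40-64"}] * 3
    + [{"age_band": "65+"}] * 1
)
reference = {"18-39": 0.4, "40-64": 0.4, "65+": 0.2}

gaps = group_share_gaps(records, "age_band", reference)
flagged = underrepresented(gaps)
```

A check of this kind covers only one statistical property. Whether a data set is relevant, vetted for errors and complete for the intended purpose requires separate assessment.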
Datasets shall take into account, to the extent required by the intended purpose or reasonably foreseeable misuses of the AI system, the characteristics or elements that are particular to the specific geographical, behavioural or functional setting within which the high-risk AI system is intended to be used (Art 10(4)).
To the extent that it is strictly necessary for the purposes of ensuring negative bias detection and correction in relation to the high-risk AI systems, the providers of such systems may exceptionally process special categories of personal data referred to in Article 9(1) of Regulation (EU) 2016/679, Article 10 of Directive (EU) 2016/680 and Article 10(1) of Regulation (EU) 2018/1725, subject to appropriate safeguards for the fundamental rights and freedoms of natural persons, including technical limitations on the reuse and use of state-of-the-art security and privacy-preserving measures.
In particular, all the following conditions shall apply in order for this processing to occur:
- the bias detection and correction cannot be effectively fulfilled by processing synthetic or anonymised data;
- the data are pseudonymised;
- the provider takes appropriate technical and organisational measures to ensure that the data processed for the purpose of this paragraph are secured, protected, subject to suitable safeguards and only authorised persons have access to those data with appropriate confidentiality obligations;
- the data processed for the purpose of this paragraph are not to be transmitted, transferred or otherwise accessed by other parties;
- the data processed for the purpose of this paragraph are protected by means of appropriate technical and organisational measures and deleted once the bias has been corrected or the personal data has reached the end of its retention period;
- effective and appropriate measures are in place to ensure availability, security and resilience of processing systems and services against technical or physical incidents;
- effective and appropriate measures are in place to ensure physical security of locations where the data are stored and processed, internal IT and IT security governance and management, certification of processes and products.
Providers having recourse to this provision shall draw up documentation explaining why the processing of special categories of personal data was necessary to detect and correct biases (Art 10(5)).
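One of the conditions above is that the data are pseudonymised. A minimal sketch of one common technique, keyed hashing with HMAC-SHA256 from the Python standard library, is shown below. It is an assumption that this technique suits a given system; whether it satisfies Art 10(5) is a legal assessment, and the function names, field names and key are hypothetical. The key would have to be stored separately from the data and restricted to authorised persons.

```python
import hashlib
import hmac


def pseudonymise_id(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same token, so records
    can still be linked for bias analysis without exposing the identifier.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_records(records, id_field, secret_key):
    """Return copies of the records with `id_field` replaced by its token.

    The input records are left unchanged.
    """
    result = []
    for record in records:
        copy = dict(record)
        copy[id_field] = pseudonymise_id(str(record[id_field]), secret_key)
        result.append(copy)
    return result


# Hypothetical example data and key (never hard-code a real key).
key = b"example-key-kept-in-a-separate-secure-store"
source = [
    {"person_id": "A-1001", "ethnicity": "group_x"},
    {"person_id": "A-1002", "ethnicity": "group_y"},
]
protected = pseudonymise_records(source, "person_id", key)
```

Pseudonymised data remain personal data, so the other conditions listed above (access control, no transfer to other parties, deletion once the bias has been corrected) still apply to the output.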
Appropriate data governance and management practices shall apply for the development of high-risk AI systems other than those which make use of techniques involving the training of models in order to ensure that those high-risk AI systems comply with Art 10(2) (Art 10(6)).
Where the provider cannot comply with the obligations laid down in this Article because that provider does not have access to the data and the data is held exclusively by the deployer, the deployer may, on the basis of a contract, be made responsible for any infringement of this Article (Art 10(6a)).