Three key points on the Delegated Act: How to preserve researcher autonomy under Article 40 DSA?

Sophie Stalla-Bourdillon

Co-director of the Brussels Privacy Hub, LSTS, VUB

29 November 2024

 

Despite its obvious merits, it remains unclear whether the Commission’s draft delegated act on access to platform data strikes the right balance between data providers’ commercial interests and the public interest. This contribution highlights three points from the draft with recommendations to help preserve researcher autonomy.

 


 

On 29 October 2024, the European Commission released its long-awaited draft delegated regulation intended to supplement the complex rules of the Digital Services Act (DSA) on researcher access to data. This so-called delegated act lays down the specific technical conditions and procedures to govern data sharing between the largest online platforms and vetted researchers under Article 40 DSA. The draft is now open to public consultation until December.

What should we think of the draft delegated act? Is there room for improvement? The short answer is yes.

Early reactions to the draft (e.g., here, here) have pointed out its various strengths, noting that it helpfully refers to a wide range of data types that should be available to vetted researchers (Recital 12), and acknowledges the role of independent advisory mechanisms to support the work of Digital Services Coordinators (DSCs) in Article 14.

For this blog post, the question is whether and how the Commission has managed to preserve autonomy for vetted researchers in relation to data providers, i.e., the Very Large Online Platforms and Very Large Online Search Engines (VLOs) of this world. Given the power asymmetries that exist between VLOs and researchers (see e.g., here, here, here and here for examples), it is vital to preserve researchers’ autonomy as much as possible.

This post highlights three key points aimed at enhancing researcher autonomy under the DSA’s data access framework. First, platforms should be mandated to disclose comprehensive metadata to improve data discoverability. Second, research organizations should have the option to build and manage their own secure processing environments for accessing platform data, reducing dependence on platform-provided environments. Third, data providers should not be tasked with evaluating the merit of research proposals.

Before grabbing the highlighter, let’s restate the obvious goal of the delegated act, which is “to lay down technical conditions and procedures necessary to enable such access” (Recital 1) for scientific research purposes. “Enabling” should also mean accelerating data access (to use the terminology of data analysts), as an overly long timeline to actual data access would undermine the entire framework.

1. Data discovery (Article 6)

To enable data access, the first step is to facilitate data discovery. This is intended to be achieved through a requirement, set forth in Article 6(4), for VLOs to “make available and easily accessible on their online interfaces (…) an overview of the data inventory of their services, including examples of available datasets and suggested modalities to access them.” Article 6(4) thus stays at the dataset level.

The corresponding recital adopts a slightly different formulation and states that “data providers should provide an overview of the data inventory of their services easily accessible online, including indications on the data and data structures available, and where possible, indicate suggested modalities for accessing them.”

As with any data overview, actual data discovery capabilities lie in the details. Data can indeed be categorized and classified at varying levels of granularity. Because these data providers already leverage sophisticated data cataloguing capabilities for their analytic environments, why not be a little bit more specific and use a terminology that is closer to what one finds, for example, in the Data Act, but tailored to meet the needs of researchers before they submit a data access request?

VLOs could then be asked to “make available and easily accessible on their online interfaces (…) an overview of the data inventory of their services, including [relevant metadata necessary to interpret and use the data that they hold,] examples of available datasets, and suggested modalities to access them”, even if there is no requirement that the metadata be exhaustive.

In their current versions, Article 15(2) and Recital 26 don’t seem to cover the discovery phase—although once the request is made, data providers must facilitate the navigation and usability of the accessed data through the sharing of “relevant metadata and documentation describing the data made available, such as codebooks, changelogs and architectural documentation.”
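To make the suggestion concrete, here is a minimal sketch of what a discovery-stage inventory entry with field-level metadata could look like. Everything in it is an assumption made for illustration: the draft delegated act prescribes no schema, and the class names, field names and example dataset are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FieldDescription:
    """One variable in a dataset, described without exposing any values."""
    name: str
    data_type: str
    description: str
    contains_personal_data: bool


@dataclass
class InventoryEntry:
    """Hypothetical public inventory record for one dataset (illustrative schema)."""
    dataset_id: str
    title: str
    time_coverage: str
    access_modalities: list[str]
    fields: list[FieldDescription] = field(default_factory=list)

    def personal_data_fields(self) -> list[str]:
        """Names of fields flagged as personal data, useful when scoping a request."""
        return [f.name for f in self.fields if f.contains_personal_data]


def search_inventory(entries: list[InventoryEntry], keyword: str) -> list[InventoryEntry]:
    """Return entries whose title or field descriptions mention the keyword."""
    needle = keyword.lower()
    return [
        e for e in entries
        if needle in e.title.lower()
        or any(needle in f.description.lower() for f in e.fields)
    ]


# Invented example entry; it does not describe any real platform dataset.
example_entry = InventoryEntry(
    dataset_id="moderation-decisions",
    title="Content moderation decisions",
    time_coverage="rolling 12 months",
    access_modalities=["secure processing environment"],
    fields=[
        FieldDescription("account_id", "string",
                         "Pseudonymous identifier of the affected account", True),
        FieldDescription("decision_type", "string",
                         "Removal, demotion or label applied to the content", False),
    ],
)
```

With field-level descriptions of this kind, a researcher could search the inventory by keyword and see which variables are flagged as personal data before drafting a data access request, which a dataset-level overview alone would not allow.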

2. Secure processing environments (Article 9)

Article 9 requires that the Digital Services Coordinator (DSC) “determine in the reasoned request the modalities according to which access to the data is to be granted by the data provider.” Within the list of in-scope modalities one finds ‘secure processing environments.’

Importantly, the draft delegated act seems to assume that the secure processing environment will be operated by the data provider, or by a certified third-party provider that arguably would act on behalf of the data provider as per Article 2(10). This would seem to exclude the possibility of using a secure processing environment provided by the research organization with which a potential vetted researcher is affiliated.

Within the DSA’s data access framework, why shouldn’t universities be allowed to provide secure processing environments? There is already a trend towards the building of trusted research environments within universities (see e.g., here, here), which should be further encouraged.

The use of secure processing environments to process personal data should be a key consideration for many access requests, as the definition of personal data is context dependent (see the recent Court of Justice of the European Union (CJEU) case law on the concept of personal data such as the IAB Europe judgment), and such environments are likely to be a prerequisite for a claim of functional anonymization or acceptable residual re-identification risk, whatever the level of sensitivity of the personal data.

The moment personal data is being processed, the principle of purpose limitation is applicable and purpose-based access control (with its audit trail) should be put in place. In any case, secure processing environments are relevant for all types of sensitive data beyond special category data governed by Article 9 of the General Data Protection Regulation (GDPR).
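As a rough illustration of purpose-based access control with an audit trail, the sketch below checks each access attempt against the purpose approved for a given researcher and dataset, and logs every attempt whether or not it succeeds. It is a simplified assumption of how such a control might work, not a description of any platform's or university's environment; all names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical record of an approved purpose for one researcher and dataset."""
    researcher_id: str
    dataset_id: str
    approved_purpose: str


@dataclass
class PurposeBasedGate:
    """Allows access only for the approved purpose and logs every attempt."""
    grants: list[AccessGrant]
    audit_trail: list[dict] = field(default_factory=list)

    def request_access(self, researcher_id: str, dataset_id: str,
                       declared_purpose: str) -> bool:
        # Access is allowed only if a grant matches researcher, dataset and purpose.
        allowed = AccessGrant(researcher_id, dataset_id, declared_purpose) in self.grants
        # Every attempt is recorded, including refused ones.
        self.audit_trail.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "researcher_id": researcher_id,
            "dataset_id": dataset_id,
            "declared_purpose": declared_purpose,
            "allowed": allowed,
        })
        return allowed


gate = PurposeBasedGate(
    grants=[AccessGrant("researcher-1", "moderation-decisions", "systemic-risk-study")]
)
granted = gate.request_access("researcher-1", "moderation-decisions", "systemic-risk-study")
refused = gate.request_access("researcher-1", "moderation-decisions", "marketing-analysis")
```

Note that whoever operates such a gate also holds the audit trail, which records who queried what and for which declared purpose. That is why the question of who runs the secure processing environment matters for researcher autonomy.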

Looking in more detail at the framing of the role to be played by DSCs, the delegated act lists several key components secure processing environments should comprise. Under Article 9(4), the operator of a secure processing environment should be in a position to demonstrate that it has the capabilities to secure, minimize, monitor, and audit access to data, and that researchers are given the computing power that they need.

Therefore, when the secure processing environment is operated by the data provider, the latter will have access to the audit trail and most likely the very content of queries made against the data. I question whether this constitutes the most enabling environment for research, especially considering platforms’ historical reluctance to share data with researchers.

3. Who decides what is substantial public interest, and who is a data controller?

Going further, the draft delegated act specifies in Recital 14 key steps to make sure the data access framework is compliant with the GDPR. In particular, the DSC of establishment should make sure that the researcher identifies its legal basis within the research application.

This makes sense, because if a researcher wants to access personal data, she clearly needs a legal basis under Article 6 GDPR; if the data is sensitive, she also needs to meet the requirements set out in Article 9 GDPR.

However, when the data is of a special category within the meaning of Article 9 GDPR, Recital 28 seems to be imposing upon data providers an obligation to demonstrate that the processing is necessary for reasons of substantial public interest under Article 9(2)(g) GDPR.

If the goal is to preserve researchers’ autonomy, a data provider’s determination as to whether the research is in the public interest should not be conclusive. And we should question whether we want data providers to be asked to make such a determination in the first place. It should be for the DSC of establishment to make or endorse such a determination, potentially once having received the advice of independent experts.

Article 4(3) of the draft delegated act states that “Digital Services Coordinators shall be separate controllers within the meaning of Article 4(7) of Regulation (EU) 2016/679 with respect to the processing of personal data they carry out to manage the data access process.”

In data protection parlance, the role played by DSCs is not so dissimilar from the role played by Health Data Access Bodies, as foreseen by the proposed Health Data Space Regulation (HDSR)—although, unlike the Health Data Access Bodies, DSCs themselves do not directly provide researchers with access to the data in question.

In its Wirtschaftsakademie Schleswig-Holstein judgment and subsequent case law, the CJEU made it clear that an entity with decision-making power over the data processing can become a data controller, regardless of whether or not it has direct access to that data. Under Article 9 of the delegated act, there is an argument that the DSC of establishment determines the means of the access, as it shall “determine in the reasoned request the modalities according to which access to the data is to be granted by the data provider.”

In addition, Recital 16 specifies that the DSC of establishment should assess the appropriateness of the data access modalities to achieve data security, data confidentiality and protect personal data while meeting the research objectives. Furthermore, the DSC of establishment obliges the data provider to share data with vetted researchers when it issues a reasoned request under Article 40(4) DSA.

While Recitals 49 and 55, read in conjunction with Article 51 of the compromise text of the HDSR, seem to view Health Data Access Bodies as data controllers for the specific segment of the data flow (pertaining to the provided data) they oversee, there may be an argument that DSCs should also be viewed as data controllers for a similar or shorter segment. Some additional considerations on the role of DSCs in this regard would be welcome, in particular to clarify why the DSCs’ decision-making power is considered irrelevant for assigning data protection roles.

Despite its obvious merits, it thus remains unclear whether the draft delegated act has struck the right balance between the commercial interests of data providers and the public interest. The role of DSCs is not fully developed and raises questions. Given the analytical capabilities at the disposal of data providers, it is crucial that the details in the delegated act be more clearly defined to avoid uncertainty, as uncertainty may ultimately favor VLOs.

 

Acknowledgements: The author would like to thank Paddy Leerssen and John Albert for their comments on the draft. Any errors remain her own.