Researcher access to platform data: Experts weigh in on the Delegated Act

by John Albert, DSA Observatory

A watercolour illustration in two strong colours showing the silhouettes of four people, two of whom have dogs on leads. They all cast shadows, and vary between realistic representations and those formed by representations of algorithms, data points or networks. The people and their data become indistinguishable from each other.

This post shares insights from a DSA Observatory workshop held on 18 November 2024, where researchers and legal experts met to discuss what’s new in the draft delegated act, what’s missing, and how to approach the Commission’s call for feedback.  

On 29 October 2024, and after many months of delay, the European Commission published its long-awaited draft delegated act on access to online platform data for vetted researchers under Article 40 of the Digital Services Act (DSA). The Commission also launched a public consultation (the deadline was recently extended to 10 December 2024) to gather feedback on the draft; a final version is scheduled to be adopted in the first quarter of 2025.  

The delegated act helps clarify the process by which a researcher can apply for access to non-public data held by “Very Large” online platforms and search engines (VLOs) operating in Europe (Article 40.12, which concerns researcher access to publicly available data, was out of scope for the delegated act). It establishes a “DSA data access portal” and provides guidance on the types of data researchers can request, data formats and access modalities, and the mediation and independent advisory mechanisms under Article 40.  

In response to the publication of the draft delegated act, the DSA Observatory recently hosted an online workshop in which researchers, legal experts, and civil society advocates gathered to critically discuss the strengths and weaknesses of the draft and help prepare feedback for the public consultation. The workshop included presentations from Paddy Leerssen (DSA Observatory), Mathias Vermeulen (AWO), and Rebekah Tromble (George Washington University), followed by questions and discussion with workshop participants held under the Chatham House Rule.  

This post summarizes insights shared during the workshop presentations and discussion. These have been edited for clarity and are not attributed to individual speakers. Following a brief account of how we arrived at this version of the delegated act, it highlights some of the draft’s key provisions—noting what should be celebrated, what could be improved, and areas for concern.  

While there is much to welcome in the draft delegated act, workshop participants pointed out significant unanswered questions in the DSA’s data access framework that would benefit from additional clarity. The post concludes with a discussion of these open questions, which demand further regulatory guidance.  

How did we get here, what took so long, and when can researchers actually submit access requests?  

Before diving into the text of the delegated act, it’s worth briefly revisiting the process that led us to this moment and refreshing the timeline for implementing the DSA’s complex data access rules, whereby researchers can apply to access data held by VLOs to contribute to our understanding of so-called systemic risks and assess VLOs’ risk mitigation measures.   

The Commission launched its first public consultation on the delegated act in spring 2023. This call for evidence yielded 133 responses from a range of European and non-European stakeholders, including academic institutions, NGOs, and individuals, as well as a handful of companies, business associations, and public authorities (a summary report analyzing the submissions was written by the DSA Observatory’s Paddy Leerssen). 

Following the call for evidence, the Commission publicly stated its ambition to publish a draft by early 2024. That ambition proved overly optimistic: the draft did not arrive until late October.  

This timeline means the delegated act will likely be adopted in the first quarter of 2025 (according to the Commission’s plan); once that’s done, researchers eager to take advantage of the DSA’s data access regime can seriously think about submitting their own access requests. The delegated act must be adopted before Digital Services Coordinators (DSCs) can begin processing access requests and vetting researchers and their applications.  

What’s new in the delegated act, and what could be improved? 

The following synthesizes discussions from the DSA Observatory’s workshop and highlights specific provisions in the draft delegated act along with pressing issues to be taken up in the Commission’s public consultation.  

Quick win: NGOs and non-EU organizations in scope 

Workshop participants with legal expertise agreed that there is much to celebrate in the delegated act. Foremost among these points: the draft seems to confirm that researchers from NGOs and non-EU organizations are within the scope of the regulation and may therefore be eligible to apply for data access. 

Good, with room for tweaks: Data inventories 

In the first call for evidence, many researchers noted that it’s difficult to formulate data access requests without first knowing what data platforms have available or how the data is structured. The delegated act responds to this concern by making clear that a wide variety of data will potentially be available to vetted researchers, and that these are to be specified by VLOs via data inventories.  

Workshop participants highlighted Recital 12, which provides a very long list of the kinds of data that might be eligible for access under DSA Article 40. This list is far more comprehensive and detailed than a similar list in the DSA, which is a significant win for researchers. For empirical researchers, however, questions remain about how the wording of the recital will apply in practice, and whether the list could be expanded, refined, or otherwise tweaked.

One approach to feedback on data inventories might be to ask: “What makes a data inventory useful?” In other words, when a researcher receives a data codebook, what information does it need to contain for them to assess whether a given data point is useful? 

Making such an inventory comprehensive may pose practical challenges, given the diversity of data that different platforms collect. Researchers should nevertheless demand that data inventories be as thorough as possible, and that they be empowered to request data not listed in the inventory if they believe it falls within the scope of Article 40.  

Good, with moderate concerns: Access modalities 

The delegated act also includes guidance on a range of potential data access modalities for researchers and acknowledges that relevant data will vary over time. The list of modalities is not exhaustive but includes, at minimum, transmission of data to the researcher (typically for less sensitive data), and access via a secure processing environment (for highly sensitive data).  

When access to higher-risk or highly sensitive data is requested, the delegated act refers to the use of secure processing environments, to be operated either by data providers themselves or by a third party in line with the most recent technological standards.  

Workshop participants raised legitimate concerns about relying on data providers (or third parties handpicked by those providers) to build and maintain such secure processing environments. One fear is that platforms will create environments in which analysis is seriously limited; another is that platforms operating these environments could generate an audit trail revealing what a researcher is doing (for more on this, see Sophie Stalla-Bourdillon’s recent analysis of the draft delegated act).  

What about using secure processing environments that are already established at European universities? Workshop participants questioned whether these university-operated environments are currently fit for purpose. Even so, there is an argument for building their capacity to act as third parties within the access framework, as this could sidestep the problems associated with relying on platform-operated environments when such access modalities are warranted. 

Good, with moderate concerns: DSA data access portal 

The delegated act establishes a new “DSA data access portal” to be maintained by the Commission, which describes it as “a one-stop-shop for researchers, data providers, and DSCs to exchange information on data access requests.” The portal does not transmit data itself; rather, it helps manage the bureaucracy of the application process, for example by informing applicants of the status of their requests (one can imagine how much more useful this might be than, for instance, receiving emails with PDFs). 

The public side of the portal will provide relevant parties with an overview of where applications stand in the pipeline, as well as general information about the applications themselves (e.g. the systemic risk being studied, the data and access modality requested, and the legal and technical safeguards the research team has put in place).  

While this portal could potentially constrain researchers in how they formulate requests, the Commission hopes it will provide a learning function for applicants by offering them a baseline for comparison. However, researchers raised concerns that specifying what systemic risk they are researching could problematically tip off VLOs, giving platforms an opportunity to preemptively alter their systems or otherwise obstruct the research.  

On the other hand, VLOs may need relatively detailed guidance to disclose data in a way that meets the researcher’s needs. Article 10 of the delegated act, which governs the information provided in the DSC’s official request to the platform, must now strike a balance between these two objectives. 

Good, and should keep developing: Independent advisory mechanisms  

Workshop participants were optimistic about how the delegated act emphasizes independent expertise and advisory mechanisms within the data access framework. Although no such experts or advisory bodies have been formally recognized yet, their inclusion in this framework has the potential to increase DSC capacities, for example by advising DSCs on vetting procedures and evaluating amendment requests, helping to set standards, and otherwise helping facilitate meaningful researcher access to platform data. Participants also noted that independent experts and advisory bodies, if effectively integrated into the data access framework, could reduce the likelihood of regulatory capture. 

Red flag: Dispute settlement procedures (Article 13) 

For cases in which a platform wishes to challenge a reasoned request for data access from a DSC and is dissatisfied with the result of an amendment request, Article 13 of the delegated act provides for mediation, whereby the parties involved can agree on a mutually acceptable outcome.  

Multiple speakers in the workshop singled out this provision as the most problematic. Specifically, under Article 13, only the data provider can initiate dispute settlement, and the data provider initiating the dispute also selects the mediator.  

On the first point, it is remarkable that, under the draft delegated act, researchers cannot initiate dispute settlement in the same way. Whereas platforms are empowered to request certain amendments to data access requests and seek to limit the scope of access, the delegated act doesn’t provide researchers with any clear remedy if they are unhappy with the data provided (e.g. because the data is not fit for purpose, incomplete, or otherwise totally inadequate).  

It’s unclear why the delegated act is designed this way. The drafters may have intended to limit the burden on researchers to do their own advocacy (i.e. shielding researchers by leaving disputes to platforms and DSCs); there is nevertheless a strong argument for amending this provision to ensure researchers have similar access to mediation should platforms fail to provide adequate data access. Workshop participants noted that such involvement should not be mandatory for researchers; rather, the framework should provide for more open communication and feedback channels between researchers and DSCs. 

It is also unclear why the data provider should be responsible for selecting the mediator when disputes arise. To ensure a fair process, it would make sense for the mediator to be chosen in collaboration with the affected researcher and/or the DSC, possibly with input from an independent advisory mechanism. 

Open questions demanding further guidance

Although the delegated act is a major step forward to implement the DSA’s data access provisions, workshop participants agreed there are a host of open questions which, if left unanswered by the Commission, may effectively be kicked down the road to DSCs tasked with overseeing the data access regime.  

These gaps include a lack of substantive requirements for DSCs in vetting research applications (e.g. how should DSCs verify researchers’ independence from commercial interests?), as well as ambiguity over how DSCs should evaluate VLOs’ requests to “amend” or deny access requests (e.g. when can platforms invoke trade secrecy or service security?).  

Workshop participants also highlighted potential bottlenecking issues given the extraordinary responsibility placed on the DSC of establishment (especially the Irish DSC). It could therefore be helpful to involve other actors—like local DSCs where the principal researcher is based or independent intermediary bodies—to help streamline vetting procedures.  

Broadly speaking, the delegated act creates many new tasks (e.g. to administer secure processing environments, mediate disputes between VLOs and researchers, or provide independent expertise through formal advisory mechanisms), yet doesn’t always specify who the appropriate actors should be to execute these tasks. It’s important to start resolving such open questions—if not in the delegated act itself, then through other fora—to ensure that public interest researchers can effectively support the DSA’s framework for risk management and accountability.  

Title Image: “Data People” by Jamillah Knowles
Jamillah Knowles / Better Images of AI / Data People / CC-BY 4.0
www.jemimahknightstudio.com/work/ai