Shortcomings of the first DSA Audits — and how to do better

By Daniel Holznagel

At the end of 2024, the first audit reports under the Digital Services Act were published. Most were produced by Big Four accounting firms, and were, in many ways, not very ambitious. This post collects impressions from digesting most (not all) of these reports — focusing on five structural shortcomings that severely limit their usefulness: from illegitimate audit gaps to auditors’ apparent reluctance to interpret the law or meaningfully assess systemic risks (especially around recommender systems). The post also highlights a few useful disclosures — including platform-defined compliance benchmarks — and outlines where auditors, regulators, and civil society should push for improvements in future rounds.


At the end of 2024, 19 providers of very large online platforms and search engines (VLOPs/VLOSEs) published the “first round” of audit reports under the Digital Services Act.

These reports are mandated by Article 37 of the Digital Services Act (DSA), under which providers of VLOPs/VLOSEs must, at least once a year, contract an independent auditor to assess their compliance with relevant DSA obligations.

Auditors must then establish an audit report, which includes the main findings drawn from the audit. Where an audit opinion is not positive, auditors shall include operational recommendations on specific measures to achieve compliance. The audit reports must then be published (in accordance with Art. 37(4), 42(4) DSA).

The framework is further specified through Delegated Regulation (EU) 2024/436 (hereinafter: DRA), laying down rules on the performance of audits (for further reading, see Anna Morandini, DSA Audits: Procedural Rules Leave Some Uncertainties).

Now, what are the takeaways from this first round of reports?

First of all: Most major platforms have chosen Big Four public accounting firms as auditors. Of the eight social media platforms covered in this first round, for example:

  • Ernst & Young (EY) audited Facebook, Instagram, YouTube, Pinterest, and Snapchat;
  • KPMG audited TikTok;
  • Deloitte audited LinkedIn;
  • X is the exception, being audited by FTI Consulting.

These auditors have produced hundreds upon hundreds of pages of reporting. However, it is not that hard to parse: most of all, it is repetitive. It is also, in many ways, not very ambitious.

In this blog post, I want to focus on a few aspects that I personally found most interesting when digesting most (not all) of the audit reports (any reference to audit reports refers to this “first round” of reporting). For consistency, I will use EY’s audits of Facebook and YouTube as standard examples, and comment on other reports only where necessary for explanation.

I’ve collected impressions under five main angles: 1) illegitimate audit gaps connected to active regulatory investigations; 2) failure to set up “audit criteria”; 3) limited testing; 4) shallow auditing of systemic risk management obligations under Article 34 f. DSA; and 5) a surprising reluctance of auditors to test and form opinions on recommender systems.

These are major shortcomings in the audits’ substance that severely limit their usefulness – and areas where regulators, as well as civil society, should push for improvements in future rounds.

Observation 1:
Illegitimate auditing gaps where regulators investigate

One eye-catching aspect of the major audits I found quite astonishing: gaps. As an example, for Facebook, EY did not conduct auditing on Articles 14(1), 16(1), 16(5), 16(6), 17(1), 20(1), 20(3), 24(5), 25(1), 28(1), 34(1), 34(2), 35(1), and 40(12) DSA. For TikTok, KPMG similarly skipped Articles 34(1), 34(2), 35(1), 28(1), 39(1), and 40(12) DSA.

The argument for these gaps goes as follows (citing EY for Facebook):

Ongoing European Commission investigations against the platform might “present potentially significant contrary evidence as to whether … [platform] is in compliance … and whether we have sufficient … information and understanding … Because of the significance of the matter described above, we have not been able to obtain sufficient appropriate evidence to form an opinion on the audited provider’s compliance … Accordingly, we do not express a conclusion …”

So the argument is: Our job is “to assess compliance” (Art. 37(1) DSA). But since someone else – a regulatory authority investigating in this area – might (!) one day (!) know better, we don’t know enough to continue our assessment.

Art. 37 DSA does not contain any provision that would explicitly allow skipping provisions under investigation, and I do not see any legitimate defense for such refusal to audit:

  • No sensitive-information-defense: Under Art. 42(5) DSA, auditor reporting might be redacted to prevent the publication of sensitive information. But this would only allow redacting the public reports, not skipping the audit in the first place. Furthermore: Auditors here are not raising the defense of sensitive information.
  • No platform-did-not-cooperate-defense: Auditors might have a point in skipping auditing if platforms – e.g. with a view to ongoing investigations – had not provided sufficient cooperation and assistance (see Art. 37(2) S. 1 DSA). But: The auditors mostly whitewash the platforms here (citing EY for Facebook: “The audited provider did not impose any limitations on our procedures and cooperated fully with our requests related to the Investigation”).
  • No nemo-tenetur-defense: Another theoretical defense might be the (principle of the) prohibition of self-incrimination. But: this should only protect against the use of self-incriminating information for repressive sanctions (fines), not for preventive measures. Moreover, rights of legal persons against self-incrimination are very limited in the first place (see, e.g., German Constitutional Court, Decision of 13 January 1981 – 1 BvR 116/77; for a comparable discussion under the NetzDG see BT-Drs. 19/18792 p. 53 f.). In any case, these issues were not even raised by the auditors.

We might expect auditing to be of particular relevance precisely for obligations where regulators are actively investigating platforms for potential infringements. Consequently, these gaps are truly substantial.

Interestingly, while it seems that Big Four auditors share a common, industry-friendly, understanding here, FTI Consulting, auditing X, has demonstrated how to do better: Though the Commission – during the auditing period – had opened a proceeding concerning Art. 34, 35 DSA, FTI Consulting did not skip auditing and instead actually documented shortcomings, e.g., that it “has not seen evidence that … [risk] assessments have been conducted in a robust and effective way” (audit report p. 232).

Observation 2:
Hollow “audit criteria” — auditors don’t want to interpret the DSA

One thing I was keen to find out when parsing through the audit reports: To what level of detail would auditors describe the “audit criteria”? (i.e. the criteria against which the auditor assesses compliance with each audited obligation, and which auditors must include in their audit reports, see Art. 2(5), 10(2)(a) DRA).

What I expected was that auditors would give a lawyerly interpretation of the DSA, and explain what certain elements of the DSA obligations – in the auditor’s opinion – actually mean or require as a minimum standard.

For example: If you want to audit compliance with Art. 23(1) DSA, the auditor would have to interpret which behavior amounts to “frequently” providing manifestly illegal content. And so on. Because: How can you aim to “assess compliance” with the DSA’s obligations (Art. 37(1) DSA), if you do not interpret the actual imperative of these obligations?

But if you look at the reports, auditors remain largely silent on how they understand the audited obligations.

For example: When discussing compliance with the DSA’s obligations for Facebook, EY simply copy-pastes the respective DSA provision as the “Audit criteria”. Where available, it supplements this by citing the “Management’s definition” of (aspects of) the audited obligation (e.g., how Facebook interprets “promptly” in Art. 18(1) DSA).

Auditor rationales

The audit reports are somewhat transparent about this hands-off approach: In their Service Agreement (Appendix 3 of the audit report), EY and Meta agreed that “EY will not provide … opinions of legal interpretation”, and in the “Assurance Report of Independent Accountants”, EY states that “making interpretations, defining ambiguous terms” would be Meta’s responsibility.

As a result, in EY’s auditing of Facebook we cannot find a single instance where EY substantively interprets the DSA, or explains that – and why – Meta’s interpretation/definition is insufficient. We must conclude that EY either indeed followed a “no legal interpretation” approach – or, at least, does not talk about it.

Is the legislative framework to blame?

One might wonder whether this problem (auditors’ failure to set up meaningful audit criteria) is rooted in the DRA, which leaves it to auditors to set up the audit criteria in the first place.

In my view, this architecture is not the problem. One must simply be clear about the consequences: If you want to audit compliance with the DSA, then the audit criteria are the DSA itself, and auditors cannot walk away from operationalizing it through a convincing interpretation.

Granted, the DRA is somewhat ambiguous here: According to Art. 10(2)(a) of the DRA, the audit criteria shall be “defined on the basis of information pursuant to Article 5(1), point (a)”, which refers to data that the provider transmits to the auditor; amongst others: “benchmarks used by the audited provider”.

However, in my view, this can only refer to using compatible metrics, not to aligning on the interpretation of the audited DSA obligations. Otherwise, auditors end up assessing compliance with the provider’s interpretation of the DSA rather than – as Art. 37(1) DSA, the ultimate imperative here, requires – assessing compliance with the DSA itself (!).

Sure, if auditors are more active in interpreting the DSA to establish audit criteria, audits might become more subjective (think of differing first instance court opinions). But in my view, we must see this as somewhat inherent to the audit regime – and as an asset, rather than a problem.

Think of the DSA as a regulatory “hot potato approach”: tossing the question of what constitutes due diligence at various stakeholders with various perspectives can help cool down the problem and reach a consensus; indeed, the DSA envisions such an approach.

Consequences

When auditors essentially fail to set up audit criteria, audits inevitably become weaker, because auditors are not zeroing in on the standards that platforms must actually fulfill.

This has real consequences. For example, I was keen to find out how EY would audit Art. 38 DSA for Facebook and Instagram, under which VLOPs are required to offer users the option of a recommender system not based on profiling. As NGOs have pointed out, the platforms are seemingly non-compliant on substance here – having implemented a “non-sticky” no-profiling option, i.e. one that does not persist, so that users are returned to the profiling-based default feed.

Interestingly, the auditor did not flag this as problematic. No wonder: the auditor had not elaborated that Art. 38 DSA requires a somewhat effective choice between algorithmic options.

And so on: You will find more examples where auditors are blind to non-compliance, presumably because they did not think about the imperative of a DSA obligation in the first place (and were seemingly reluctant to invite outside expertise or engage in stakeholder discussions).

X’s audit by FTI Consulting can – again – serve as a slightly better example. Though FTI also does not explicitly spell out its “audit criteria”, it does so implicitly by, in some instances, at least explaining against which benchmarks it reaches its conclusions.

For example, when auditing Art. 14(1) DSA and the norm’s imperative of “clear, plain, intelligible, user-friendly and unambiguous language” in the provider’s terms and conditions, FTI applies specific readability tests and is transparent about the benchmark. Or look at its audit of Art. 16(2) DSA, where it observes the “convenience that logged-in users have some fields auto-populated…”, which implicitly hints at at least one reasonable audit criterion for Art. 16(2) DSA (a.k.a. “not to require reporters to provide unnecessary input”).
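To illustrate what such a readability benchmark could look like in practice, here is a minimal Python sketch. FTI’s concrete tests and thresholds are not reproduced here; the choice of the textstat library and the 60-point Flesch Reading Ease minimum are assumptions made purely for illustration of how “clear, plain, intelligible” language can be turned into a checkable criterion.

    # Illustrative sketch only: one way an auditor could operationalize Art. 14(1) DSA's
    # "clear, plain, intelligible ... language" requirement as a checkable benchmark.
    # FTI's actual tests and thresholds are not reproduced here; the 60-point
    # Flesch Reading Ease minimum below is an assumed placeholder.
    import textstat

    def readability_check(terms_text: str, min_flesch: float = 60.0) -> dict:
        """Score a terms-and-conditions excerpt against an explicit readability benchmark."""
        ease = textstat.flesch_reading_ease(terms_text)    # higher = easier to read
        grade = textstat.flesch_kincaid_grade(terms_text)  # approximate school grade level
        return {
            "flesch_reading_ease": ease,
            "flesch_kincaid_grade": grade,
            "meets_benchmark": ease >= min_flesch,
        }

    if __name__ == "__main__":
        sample = ("You may not upload content that is illegal. "
                  "If you do, we may remove it and inform you of our decision.")
        print(readability_check(sample))

The point of such a sketch is not the particular metric, but that the auditor discloses an explicit, reproducible benchmark against which the legal standard is assessed.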

Observation 3: Limited testing

Another thing I was keen to find out: How much would auditors actually test platforms? That is to say: take a smartphone, log in, and try to find out how a provider is implementing the DSA?

What about using even more advanced methods, like attempting to circumvent platform measures, or conducting broad empirical testing (e.g., sending a meaningful number of notices and observing platform performance under Art. 16 DSA – something that took place under the NetzDG, which also contained a kind of audit provision in § 3(5); see the evaluation in BT-Drs. 19/22610 p. 79)?

To be fair, auditors do report on some kind of testing, as required by Art. 10(5)(c) DRA.

E.g., EY for Facebook reports on testing for Art. 16(4) DSA: “Inspected a submission through the notice mechanism for Facebook and ascertained a confirmation receipt was sent to the individual or entity automatically and without undue delay.”

However, my impression is that of a very cautious approach. Reporting on these “tests” (or “inspections”, as in EY’s example) remains abstract. We do not learn whether auditors tested various kinds of interfaces, tried varied inputs, dummy users, click paths, etc. (a good example of what you miss if you do not test with some ambition is Facebook’s implementation of Art. 38 DSA, see Observation 2 above).

In particular, it seems, auditors did not conduct broad empirical testing.
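As a rough illustration of what such broad empirical testing could involve – submitting a batch of test notices and measuring whether the Art. 16(4) DSA confirmation of receipt arrives without undue delay – here is a minimal Python sketch. The submit_notice and await_acknowledgement helpers are hypothetical placeholders standing in for whatever reporting interface and test accounts an auditor would actually be given; a real audit would have to agree on these with the provider.

    # Hypothetical sketch of broad empirical testing under Art. 16 DSA:
    # submit many test notices and measure how quickly confirmations of receipt arrive.
    # submit_notice() and await_acknowledgement() are placeholder stubs standing in
    # for the platform's actual notice mechanism and the reporter's inbox/API.
    import statistics
    import time
    from dataclasses import dataclass

    @dataclass
    class NoticeResult:
        notice_id: str
        ack_seconds: float | None  # None = no confirmation received within the timeout

    def submit_notice(content_url: str) -> str:
        # Placeholder: a real audit would file a notice via the platform's Art. 16 form or API.
        return f"notice-{abs(hash(content_url)) % 10_000}"

    def await_acknowledgement(notice_id: str, timeout_s: float) -> bool:
        # Placeholder: a real audit would poll the reporter's account for the Art. 16(4) receipt.
        return True  # simulated immediate confirmation

    def run_notice_test(test_urls: list[str], timeout_s: float = 3600.0) -> dict:
        results: list[NoticeResult] = []
        for url in test_urls:
            started = time.monotonic()
            notice_id = submit_notice(url)
            acked = await_acknowledgement(notice_id, timeout_s)
            elapsed = time.monotonic() - started if acked else None
            results.append(NoticeResult(notice_id, elapsed))
        ack_times = [r.ack_seconds for r in results if r.ack_seconds is not None]
        return {
            "notices_sent": len(results),
            "acknowledged": len(ack_times),
            "median_ack_seconds": statistics.median(ack_times) if ack_times else None,
        }

    if __name__ == "__main__":
        print(run_notice_test([f"https://example.org/post/{i}" for i in range(100)]))

Even such a simple harness would produce concrete, quantitative findings (acknowledgement rates, response times) that abstract “inspections” of a single submission cannot.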

Observation 4: Limited auditing of systemic risk assessments

Another interesting question for me was: How would auditors deal with auditing Art. 34 DSA? Art. 34 DSA, in itself, implies a kind of “self-audit,” as the norm’s imperative is to self-assess the risks stemming from the service’s design.

So the question is: How would auditors deal with this kind of “auditing an auditing-obligation”? Would they restrict themselves to essentially analyzing the providers’ processes of risk assessment (and mitigation), e.g. through reviewing assessment procedures, methodologies etc.? Or would they also review on substance – that is, the sufficiency of a provider’s conclusions?

Auditors seem to have mostly chosen the former, more procedural approach.

Take EY’s auditing of YouTube’s risk assessment under Art. 34 DSA: From EY’s audit report, one can conclude that EY relied on assessing the procedures of YouTube’s risk assessment. EY “Inspected the scoring criteria, thresholds, and rationale for the probability and severity associated with each risk” (audit report, p. 94). But it is unclear whether EY questioned these criteria in the first place, or whether EY made its own determinations about the systemic risks stemming from YouTube’s design and the appropriate mitigation measures. The wording of the report seems to imply that it did not.

True, auditing Art. 34 f. DSA seems challenging, given the many uncertainties of how to apply this very broad and abstract regime. But limiting auditing to procedural steps cannot be good enough. Because Art. 37(1) DSA is clear: The auditor is tasked with “assess(ing) compliance” with the DSA, not just assessing steps taken in an attempt to comply with the law.

Admittedly, Art. 13 f. DRA, which lays out specific methodologies for auditing compliance with Art. 34 f. DSA, creates some ambiguities: Somewhat hidden in the details, Art. 13(1)(a) DRA requires auditors to analyze “whether the audited provider has diligently identified, analyzed, and assessed the systemic risks … referred to in Article 34(1) [DSA]”, and Art. 14(1)(c) DRA requires them to analyze “whether the mitigation measures put in place by the audited provider are reasonable, proportionate and effective …”.

However, all the other details of the (lengthy) provisions of Art. 13 f. DRA lead auditors to focus on assessing how the provider assessed risks and measures, which could support the understanding that auditors need not assess substance here (has the provider found all the risks, and taken appropriate measures?).

Observation 5: Failing on algorithms

One of the DSA’s weightiest promises is to address risks stemming from recommender systems, especially through such systems’ engagement-rewarding designs and their vulnerability to disinformation campaigns.

If you think about how the DSA can deliver on that promise, Art. 34 f. DSA is the potential lever (Art. 27, 38 DSA are – in my view – “nice to have”, but cannot substantially limit the risks stemming from the platforms’ use of algorithms/recommender systems). It is therefore of high relevance how auditors audit Art. 34 f. DSA with a view to recommender systems.

Furthermore: recommender systems are complicated. Understanding them requires outstanding expertise and data access. Having auditors zoom in on recommender systems seems most promising, for a few reasons:

  • Auditors must “have proven expertise in the area of risk management, technical competence and capabilities” (Art. 37(3)(b) DSA), which also means “to audit algorithms” (Recital 92 DSA);
  • Providers must give auditors “access to all relevant data” (Art. 37(2) DSA), which includes data related to algorithmic systems (Recital 92 DSA);
  • and, according to Art. 10(5)(b)-(c) DRA, audit procedures shall include at least: the performance of substantive analytical procedures to assess compliance “including as regards algorithmic systems”; and, where the auditor is left with “reasonable doubts”, the performance of tests, including with respect to algorithmic systems.

So, do the audits deliver on scrutinizing recommender systems?

I think not. I will explain this with the example of EY’s auditing of YouTube. Here, EY, in its Assurance Report of Independent Accountants, in subsection “Inherent limitations” (p. 7 of the audit report), surprisingly explains:

Our examination was limited to certain aspects of the audited service’s algorithmic systems, to the extent needed to obtain evidence of the audited service’s compliance with the Specified Requirements as required by the Act. … algorithms may not consistently operate in accordance with their intended purpose or at an appropriate level of precision. Because of their nature and inherent limitations, algorithms may introduce biases of the human programmer resulting in repeated errors or a favoring of certain results or outputs by the model. Accordingly, we do not express an opinion, conclusion, nor any form of assurance on the design, operation, and monitoring of the algorithmic systems.

(similar statements appear in the other EY audits, as well as KPMG’s audit report for e.g. TikTok, p. 10; Deloitte for LinkedIn, p. 3).

In my view, this statement must be interpreted as a white flag: If auditors do not express opinions on the “design, operation, and monitoring” of algorithmic systems, then they basically withdraw from substantive auditing in this area to begin with. And the given explanation (“algorithms may not consistently operate in accordance with … intended purpose”) should not serve as a justification for not auditing. It should – to the contrary – invite auditors to follow up on such suspicions.

But indeed, in the audit report for YouTube, on Art. 34 f. DSA, we do not find much “opinion, conclusion … on the design … of the algorithmic systems” regarding recommender systems. Instead, EY ticks off more or less formalistic steps (e.g. p. 95: “Inspected the audited service’s recommender system model documentation and code, and determined that the main parameters used in recommender systems matched with the information included in the Transparency Center.”) or abstract concepts (e.g. p. 97: “per inspection of the rationale documented within the risk assessment, determined that … service appropriately identified and assessed how the risks identified for the audited service are influenced by intentional manipulation of their service …”).

One could argue that the DRA allows this “don’t look at algorithms” approach, since under Art. 10(5)(b)-(c) DRA, testing the algorithmic systems is mandatory only where the auditor has “reasonable doubts” about sufficient risk assessment and risk mitigation.

But when it comes to recommender systems, and all the available indicators (researchers’ conclusions, journalistic coverage, whistleblower information, etc.) of systemic risks stemming from them (disinformation vulnerability, confirmation biases, addictive behaviors, the provoking of challenges …), how could an auditor not have reasonable doubts as to whether systemic risks have been assessed diligently (given the moderate level of risk assessment we have seen so far)?

Furthermore, let us not forget that the (questionable) “reasonable doubt” threshold in Art. 10(5) DRA only triggers minimum auditor obligations (“include at least”). So no auditor is prevented from testing algorithms irrespective of doubts. Moreover, according to Art. 10(4) DRA, reasonable doubts shall arise in the presence of external indications (such as regulators’ reports and guidelines or, in my view, the opening of investigatory proceedings and substantive stakeholder feedback).

Revelations despite audit shortcomings

Most of the audits do not reveal many substantive instances of non-compliance. To a large extent, this may be caused by the audits’ shortcomings – especially in those conducted by the Big Four – as outlined in Observations 1-5 above. It is noteworthy that FTI Consulting, which took a more ambitious approach, revealed a substantial number of compliance issues for X (though, as a factor, X might simply be less compliant in the first place).

More horizontally, the audit reports bring another kind of revelation: disclosure of the providers’ internal benchmarks (see Art. 5 DRA).

For example, we have now learned that Facebook’s definition of “timely” and “without undue delay” in Art. 20(4) DSA is 72 hours, while, as the audit report tells us, no such definition has been set up for the corresponding term in Art. 22(1) DSA. We also learn that, under Facebook’s definition of “reasonable period of time” and “frequently” in Art. 23(1) DSA, the threshold starts at 7 infringements over the course of one year, which results in a geo-block for 24 hours (a surprisingly infringer-friendly benchmark).

Providers have communicated their internal benchmarks or “management’s definition” for some other provisions of the DSA as well, and, parsing through the audit reports, readers can now understand the companies’ decision-making in the respective areas a little better.

Conclusions

Overall, I find the shortcomings described above (illegitimate gaps, failure to set up “audit criteria”, limited testing, shallow auditing of Art. 34 f. DSA, and the reluctance to scrutinize recommender systems) highly relevant, as they heavily restrict the audits’ substance.

Despite these shortcomings, the audit reports do carry some fruitful information, e.g., the providers’ internal benchmarks. The (limited) findings of instances of non-compliance are noteworthy and deserve follow-up. All this will help regulators push for (a little more) compliance with the DSA.

But the overall impression is that audits must get much better. We should not accept the insufficient “first audits” as a practice that shapes the standards of future auditing rounds.

In my view, improving the audits requires, first and foremost, the following steps:

  • Gaps: No more skipping of auditing for obligations under investigation.
  • Audit criteria: Auditors must define meaningful audit criteria; this requires interpreting the DSA.
  • Testing: Auditors must get more serious about testing, which should be seen as the bread and butter of auditing.
  • Risk assessment and mitigation, especially for algorithmic systems: Auditing of Art. 34 f. DSA cannot restrict itself to reviewing procedural steps. Auditors must review on substance and, especially with a view to risks stemming from the use of algorithmic systems, must meet much higher expectations.
  • Invite external input: To widen their horizon, auditors should engage in stakeholder conversations and invite third-party input.

Regulators should push for such improvements, which might happen from various angles. For example, insufficient auditing might amount to a violation of Art. 37(1) DSA in itself, and the Commission could initiate corresponding investigations. Responsibility might also fall at the auditors’ feet, as regulators might question whether an auditor meets the qualifications required under Art. 37(3) DSA.

As a more horizontal approach, the Commission could support voluntary standards for auditing (Art. 44(1)(e) DSA), or even revise the DRA. However, all of this will require a certain degree of consensus on how auditors should do better – which means that academia and civil society should also invest in the topic.


The author is grateful for valuable feedback and discussion on the topic of this article with experts Anna Morandini, Alexander Hohlfeld, and Fernando Hortal Foronda.