Data about research and researchers has become an integral part of the scientific publication process: when searching for literature, researchers receive recommendations for other publications that might interest them; and institutions and scientists are ranked on the basis of analyses of publications and citations – these are just two examples of how data is currently used in the global academic ecosystem. The Academic Library and Information Systems Committee of the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) issued a discussion paper in October 2021 with a critical look at the dangers inherent when such data and analyses are in the hands of commercial operators, such as publishers and analytics companies. The scientific community needs new forms of intelligent technology to work with the ever-increasing world of data. How can scientists, academic institutions and publishers find the right balance between concerns for individual privacy, academic freedom and the use of innovative, digital technologies?
The ECDF and Elsevier have taken this question as the impetus to foster an open discussion series on how digitalization is affecting the scientific enterprise: ECDF and Elsevier Conversations on Science in the Digital Future. The first conversation, entitled Data Privacy in the Digital Era, took place on November 22, 2022, with Prof. Dr. Max von Grafenstein, Professor of Digital Self-Determination at ECDF and Berlin University of the Arts, Dr. IJsbrand Jan Aalbersberg, Senior VP of Research Integrity at Elsevier, and Prof. Dr. Wolfram Horstmann, Head of Lower Saxony State and University Library Göttingen and Chair of the DFG’s Academic Library and Information Systems Committee (AWBI). The event was moderated by journalist Katharina Heckendorf.
“Whatever we do with the data, we have to make sure that we use the data to promote research and help the researcher in promoting the publication itself and the researcher in his career. For me that means that whatever we do in relation to helping researching organizations has to be fair, and that we really take care of diversity and inclusion,” Aalbersberg says in his opening statement. To him, this also includes not collecting more data from the user than is actually needed to provide this support: “For the level of seniority of a researcher, it is not necessary for us to know his birthday because it is not related at all to the level of seniority,” he adds. In 2016, Elsevier started calling itself an information analytics business in addition to being a publisher. According to Aalbersberg, this is also due to a change in what academic institutions started to expect from publishers like Elsevier: “They want analytical information, they want to know how their academic institution compares to other academic institutions and in that case, you have to offer some analytical tools to make comparisons.”
For Wolfram Horstmann, one of the authors of the DFG paper, accumulating such data can be detrimental to scientists: collected usage data could be used to create a personal profile, or to nudge the researcher towards a journal by the same publisher or away from a certain topic. To Horstmann, the collection of behavioral data is especially problematic: “What is being read, what is being clicked, what is being enlarged, this is very specific data about behavior while using the platforms. The reason why researchers feel uncomfortable with specifically sharing this data is because it is not quality reviewed and is not relevant for the scientific information system in general.” Horstmann acknowledges that the DFG paper painted a dystopian future. However, he was also surprised by the strong response of publishers like Elsevier, which underlines the need for a discussion: “This discussion is relevant because this topic has strategic relevance and will influence how the whole sector develops.” One of his major concerns is the behavioral usage data of researchers: information that is not being published could then be used to influence behavior.
Max von Grafenstein is especially concerned that users could lose control over their usage data and the purposes for which it is used: usage data and personal profiles might be collected for one purpose but later used for a completely different one, often by a different actor as well. He draws a comparison to the case of Facebook’s sale of usage data to Cambridge Analytica, and that company’s involvement in political micro-targeting. What should be avoided, he added, is a power imbalance between publishing houses and individual scientists, which would force the latter to disclose their personal data in order to publish scientific papers: “Subsequently, even if I do disclose this data, I don’t want that the insights about how I generate knowledge are passed on to someone like Trump, who is not fond of academics,” he said, relating back to the Cambridge Analytica example. Grafenstein sees a risk that publishing houses become so strong that they shape how knowledge is produced, because they decide which recommendations scientists receive. This would then diminish the role of universities and libraries in the research ecosystem.
Aalbersberg agrees that literature recommendations have to strike a balance between supporting users in their research and not steering them in a specific direction. This is currently done by making it transparent that the recommendations are based on what other users have been interested in in the past. In addition, tracking could allow publishers to subsequently inform users about retracted papers, if permission has been given. Aalbersberg added: “It is our obligation as a publisher to use modern technologies to advance science and healthcare and we are expected to use modern technologies to do those things. However, we have to make sure how we do that, we have to make sure that it is clear to the user how we do it, and that they can switch it off.” The last point is especially important to Max von Grafenstein: it is essential to be able to turn any tracking off, especially for researchers in authoritarian regimes, where the collected data could easily fall into the wrong hands.
All three discussants agree that the GDPR rightly requires companies to ensure that the advantages of collecting data clearly outweigh the risks. To Aalbersberg, this needs to be re-evaluated at every step of the collection process. He emphasizes that Elsevier complies with the current legislation; however, for Horstmann this is not necessarily enough, since the interpretation is in a grey zone for many scenarios, not just in the case of researchers’ data: “Just because something is compliant with GDPR it doesn’t necessarily say that it fulfills all the ethical standards that science specifically would pose to a system.”
“How can publishers and scientific institutions continue working together, without scientific institutions feeling the strong imbalance of power as they do now?”, Katharina Heckendorf asked before wrapping up the discussion. “While the task of the industry is to make profit, the task of science is to serve a universal purpose and to provide knowledge. These two interests are not necessarily always without conflict and this conflict of interest needs to be discussed.” In terms of governance, Horstmann suggested a scientific certification authority governed by science or jointly governed by science and publishers. In addition, publishing algorithms that are used for rankings, for example, would lead to an immense gain in trust and transparency, he added. For Max von Grafenstein, the question is more fundamental: “We should talk about the functions that different actors actually should fulfill, in the best interest of society”, he says. For him, academic publishers are now fulfilling many of the functions of university libraries in terms of helping users navigate and find information, and this trend, left unchecked, presents a potential conflict of interest.
In 2023, more panel discussions will follow. You can find all information and recordings of past events here.
In the following, the panelists answer questions from the audience that could not be answered during the event:
Question: Do you see any competitors to the big traditional publishing houses? Are there any political or technological developments threatening publishers?
Aalbersberg (Elsevier): We operate in a competitive environment. There are new initiatives – from small tech start-ups to large organizations, from non-profit foundations to commercial companies – that develop new or improve existing tools for the academic community all the time. We welcome competition, it’s healthy and offers great benefits to individual researchers and to the academic community as a whole.
We are confident that our products will continue to provide the academic community with relevant, high-quality information and analytics efficiently and securely. This is a continuous process of development, our teams are constantly speaking to the academic community to get a better understanding of their needs and analysing how we can improve our solutions. We do not see technology as a threat to us, but as an opportunity to serve our users and their institutions with new and better solutions to solve (some of) their problems. Likewise, with political developments: they can also take place to serve the academic community.
Question: Does Elsevier include fingerprints in PDF files so that two PDF files of the same ScienceDirect article can be distinguished? If so - why?
Aalbersberg (Elsevier): Universities are systematically targeted by cyberattacks, so it is a top priority for Elsevier to play our part in detecting potential threats that can compromise the security of our and our customers’ systems, the security of personal student data, and the integrity of their research. Watermarking in PDFs allows us to identify potential sources of threats so we can inform our customers for them to act upon. This approach is commonly used across the academic publishing industry.
Question: Can you give a quick assessment, maybe as a percentage, of how GDPR-compliant your institution’s work is?