What is going on? The Federal Chancellor is talking about dataspaces. Instead of “cash for clunkers” as aid for the automotive industry, there will now be a dataspace (Delhaes 2020, Benrath & Löhr 2021). So, what exactly is a dataspace? We are all familiar with data storage. Old hands still remember punch cards and magnetic tapes, followed by hard disks and USB sticks. In companies, there are databases and data warehouses. And since the well-known Harvard Business Review article on the Internet of Things (IoT) by Porter & Heppelmann (2015), there has been a trend towards data lakes.
From data to information to insight
So today, it is easy for a data scientist to drown in such a data lake while searching for relevant data. Figure 1 highlights the importance of finding the right data: not just any data, but the data that contains the information (the ore, so to speak) required to generate insight for business impact. As a result, searching for data and preparing it to extract the information often account for more than 80% of the time budget of a data analytics project (read more in our empirical study “Data is broken: The data productivity crisis,” link). And the big avalanche of data is still to come (see Figure 2). So the time seems ripe for new approaches to data storage, such as dataspaces.
What is a dataspace?
The German federal government’s data strategy describes a dataspace as “a shared, trusted space for transactions with data. A dataspace is based on shared standards (or values, technologies, interfaces), for example, that permit or promote transactions with data.” (Federal government 2021; see also Figure 3 for further definitions). The OpenDEI project defines it as follows: “A dataspace is defined as a decentralised infrastructure for trustworthy data sharing and exchange in data ecosystems based on commonly agreed principles” (OpenDEI project: Design principles for Dataspaces, p. 23). Essentially, dataspaces reverse the traditional logic of data storage. It is no longer so important to store all data centrally. Instead, it is crucial to ensure that an application, such as a correlation analysis or a deep learning algorithm, receives the right data in the right quantity: just-in-time data sharing, so to speak, instead of central data storage. However, the problem to date has been that those involved in a data transaction often do not trust each other enough to actually share data. There are various reasons for this, including worries about competitive advantage and data protection. In short, data sovereignty, the right to retain control over your own data, is often lost (see Figure 4): as soon as a file is sent, anything can happen to it. New technology, such as the International Dataspaces (IDS) standard, can help here. Even if two parties do not trust each other, for example because they are competitors, they can still trust in a data transaction that benefits both an end customer and the two parties themselves.
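The idea of sharing “the right data in the right quantity” while retaining sovereignty can be illustrated with a small sketch. This is not the IDS standard or any connector API; all class, field and consumer names below are hypothetical and chosen only for illustration. The sketch assumes that a provider keeps its records locally, attaches a usage policy to them, and releases only the fields and the number of records that the policy permits.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UsagePolicy:
    """Hypothetical usage policy that a provider attaches to its data offer."""
    allowed_consumers: frozenset
    allowed_purposes: frozenset
    allowed_fields: frozenset
    max_records: int


@dataclass
class DataProvider:
    """Keeps records locally and releases only what the policy permits."""
    records: list
    policy: UsagePolicy
    audit_log: list = field(default_factory=list)

    def request(self, consumer, purpose, fields, limit):
        allowed = (
            consumer in self.policy.allowed_consumers
            and purpose in self.policy.allowed_purposes
            and set(fields) <= self.policy.allowed_fields
        )
        # Every request is logged, whether it is granted or refused.
        self.audit_log.append((consumer, purpose, tuple(fields), allowed))
        if not allowed:
            raise PermissionError(
                f"Request by {consumer!r} for {purpose!r} violates the usage policy"
            )
        # Share only the requested fields, never more than the agreed quantity.
        n = min(limit, self.policy.max_records)
        return [{f: rec[f] for f in fields} for rec in self.records[:n]]


# Illustrative data and policy (all values are made up).
provider = DataProvider(
    records=[
        {"vehicle_id": "A1", "speed_kmh": 52, "owner": "Alice"},
        {"vehicle_id": "B2", "speed_kmh": 47, "owner": "Bob"},
        {"vehicle_id": "C3", "speed_kmh": 61, "owner": "Carol"},
    ],
    policy=UsagePolicy(
        allowed_consumers=frozenset({"traffic-analytics"}),
        allowed_purposes=frozenset({"congestion-forecast"}),
        allowed_fields=frozenset({"vehicle_id", "speed_kmh"}),
        max_records=2,
    ),
)

shared = provider.request(
    "traffic-analytics", "congestion-forecast", ["speed_kmh"], limit=10
)
```

In this sketch the consumer never sees the `owner` field and never receives more than two records, and the provider keeps a log of who asked for what. A real dataspace would additionally need identity verification and technical enforcement on the consumer side, which this example leaves out.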
From data lakes to decentralized and federated data storage such as dataspaces
“Creating complete freedom to share data” was the headline of an article in the “IT Director” trade journal in August 2020 (link). But how can this succeed when uncertainties about data processing are still part of daily practice? In interviews with data science & analytics experts, Daniela Hoffmann from “IT Director” identified clear possible solutions to these challenges. “These days, our attention is clearly turned to the subject of data. In politics too, the right steps are being taken and a great deal of money is being spent on the issue of data, such as for GAIA-X,” said Christoph Schlueter Langdon, responsible for Mobility Dataspaces at the Telekom Data Intelligence Hub and Professor for Data Science & Analytics at the Peter Drucker School of Management of Claremont Graduate University. People have started to reflect on the quality and correctness of data because they are now affected by it themselves: think of the R number, doubling time and cases per 100,000 inhabitants. The coronavirus crisis has led not only to increased interest in valid data, but also to a rethink of central data storage and data lakes, with a new preference for decentralized structures. The best example is the coronavirus app, which, after some wrangling over the architecture, ultimately ended up with a fully decentralized design for data privacy reasons (more on that here: “Corona warning app: Answers to frequently asked questions,” link).
Advantage of dataspaces: Data products just like in a supermarket
In the past, decision-making in business intelligence (BI) required consolidating as much of the correct data as possible, said Schlueter Langdon: “But today, the following analogy increasingly applies to data storage and analysis: We all slaughter our own cattle and grow our own vegetables instead of shopping at the supermarket.” So there is demand for data supermarkets with data products on the shelves (see Figure 5 and “Data is a Product,” Crosby & Schlueter Langdon 2019, link), and for data factories that convert raw data into data products (more on this in our article “Data factories for data products,” link; Schlueter Langdon & Sikora 2020). Approaches like that of the Telekom Data Intelligence Hub provide corresponding functionality on a cloud-based platform with open source tools. Companies can also obtain additional context data there, such as weather or location data, to complement their own data.
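What a data factory does can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the method of the Telekom Data Intelligence Hub or of the cited articles: the trip records, the field names and the function `build_data_product` are all hypothetical. The sketch cleans raw records, aggregates them, and enriches them with context data (here: weather) to produce a described, ready-to-use data product.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DataProduct:
    """Hypothetical 'shelf-ready' data product: data plus a short description."""
    name: str
    description: str
    rows: list


def build_data_product(raw_trips, weather_by_date):
    # Step 1: clean. Drop raw rows with missing or implausible durations.
    clean = [
        t for t in raw_trips
        if t.get("duration_min") is not None and t["duration_min"] > 0
    ]
    # Step 2: aggregate. Collect the durations of all valid trips per day.
    by_date = {}
    for t in clean:
        by_date.setdefault(t["date"], []).append(t["duration_min"])
    # Step 3: enrich. Add context data (weather) to each daily row.
    rows = [
        {
            "date": day,
            "trips": len(durations),
            "avg_duration_min": mean(durations),
            "weather": weather_by_date.get(day, "unknown"),
        }
        for day, durations in sorted(by_date.items())
    ]
    return DataProduct(
        name="daily-trip-summary",
        description="Daily trip counts and average duration, with weather",
        rows=rows,
    )


# Illustrative raw data and context data (all values are made up).
raw_trips = [
    {"date": "2021-01-01", "duration_min": 10},
    {"date": "2021-01-01", "duration_min": 20},
    {"date": "2021-01-01", "duration_min": None},  # missing value
    {"date": "2021-01-02", "duration_min": 30},
    {"date": "2021-01-02", "duration_min": -5},    # implausible value
]
weather = {"2021-01-01": "rain"}

product = build_data_product(raw_trips, weather)
```

The consumer of such a product works with the cleaned, enriched rows and does not need to repeat the preparation steps for every analysis.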
Advantage of dataspaces: The right data in the right quantity
AI disciplines such as deep learning for text, image and voice recognition increasingly deliver important results, but they depend entirely on data quantity and quality, according to the experts (more on that in our article “Data: Quantity or quality?,” link). Yet precisely where better analysis results can only be achieved with large quantities of data (Big Data), companies have trouble accessing adequate data volumes. “There are two contributing factors to this: On the one hand, self-interest – we think that keeping the data for ourselves brings a competitive advantage. On the other hand, GDPR requirements must be met, which many companies still find difficult,” said Schlueter Langdon. “Concepts and standards such as Industrial Dataspaces (IDS) help to generate this data, simply because they reduce the barriers to sharing data that we previously did not want to pass on due to a lack of trust,” he added. And timing is everything: the emergence of this technology coincides with the Data Governance Act (DGA), a regulation on data sharing and governance proposed by the European Union (EU DGA 2020).
This article is based on a longer article “Creating complete freedom to share data” in IT Director, August 2020 (Link)
For additional insights, please check out:
- “Data factories for data products,” link
- New, connected mobility, such as intermodal travel, enabled by dataspaces, the example of RealLab Hamburg: link
- “Catena-X With GAIA-X: Will Dataspace Be the Word of 2021?” Link
References
Benrath, B., and J. Löhr. 2021. GAIA-X-Initiative: Die Staats-Cloud kommt. Frankfurter Allgemeine Zeitung (2021-02-13), p. 2
Federal Government of the Federal Republic of Germany. 2021. Data strategy of the federal government – An innovation strategy for social progress and sustainable growth. Cabinet version 2021-01-27, Federal Chancellery, Berlin, www.bundesregierung.de/publikationen
Crosby, L., and C. Schlueter Langdon. 2019. Data is a Product. American Marketing Association Marketing News (April), link
Delhaes, D. 2020. Merkel drängt Autokonzerne: BMW, Daimler und VW sollen Datenschatz teilen. Handelsblatt (2020-10-28), link
Drucker, P. 1992. Be Data – Know What to Know. The Wall Street Journal (December 3)
Drucker, P. 1967. The Manager and the Moron. McKinsey Quarterly (December), link
European Union Data Governance Act (DGA). 2020. Regulation on European data governance (Data Governance Act). Proposal (November 20), link
Fraunhofer. International Dataspaces. Retrieved from https://www.dataspaces.fraunhofer.de/de/InternationalDataSpaces.html (accessed 2021-01-26)
Handelsblatt. 2019. Grenzen des Speichers. Grafik des Tages (2019-05-14): 24-25
IDC report, Worldwide Global DataSphere Forecast, 2020–2024: The COVID-19 Data Bump and the Future of Data Growth (Doc #US44797920)
Otto, B., A. Rubina, A. Eitel et al. 2021. GAIA-X and IDS – Position Paper. International Dataspaces Association, Version 1.0 (January), Dortmund, Germany, link
Porter, M. E., and J. E. Heppelmann. 2015. How Smart, Connected Products Are Transforming Companies. Harvard Business Review (October), link
Schlueter Langdon, C., and R. Sikora. 2020. Creating a Data Factory for Data Products. In: Lang, K. R., J. J. Xu et al. (eds). Smart Business: Technology and Data Enabled Innovative Business Models and Practices. Springer Nature, Switzerland
International Dataspaces Association, OpenDEI project. 2021. Design principles for Dataspaces. Position paper, link