When it comes to business information, chief information officers (CIOs) and chief data officers (CDOs) are tasked with bringing order to chaos.
As firms gather ever more data, they face both commercial pressure to do more with the information they hold and increasing regulatory burdens for managing data, especially where it relates to customers.
The situation is made more complex still by the range of tools available for storing and manipulating data, from data lakes and data hubs to object storage, machine learning (ML) and artificial intelligence (AI).
According to a survey by storage manufacturer Seagate, as much as 68% of business data goes unused. As a result, firms are forgoing the advantages that data should offer. At the same time, organisations face regulatory and compliance risks if they are unclear about what data they hold, and where.
To address this complexity and make data “work” for the business, companies need to look at their data architecture. At the simplest level, a data architecture is about knowing where the organisation’s data is, and mapping how data flows through it. However, given the vast range of data sources and ways that data can be manipulated and used, there is no single blueprint for doing this. Each organisation will need to build a data architecture that works for its own needs.
“Data architecture is many things to many people and it is easy to drown in an ocean of ideas, processes and initiatives,” says Tim Garrood, a data architecture expert at PA Consulting. Firms need to ensure that data architecture projects deliver value to the business, he adds, and this needs knowledge and skills, as well as technology.
However, part of the challenge for CIOs and CDOs is that technology is driving complexity in both data management and how it is used. As management consultancy McKinsey put it in a 2020 paper: “Technical additions – from data lakes to customer analytics platforms to stream processing – have increased the complexity of data architectures enormously.” This is making it harder for firms to manage their existing data and to deliver new capabilities.
The move away from traditional relational database systems to much more flexible data structures – and the ability to capture and process unstructured data – gives organisations the potential to do far more with data than ever before.
The challenge for CIOs and CDOs is to tie that opportunity back to the needs of the business. Building a data architecture should be more than just a housekeeping or compliance exercise.
“I like to ask the question, ‘what are we able to do with better data, what is it that could be different?’” says PA Consulting’s Garrood. “If it doesn’t come with an articulated business problem, then that’s the next place to go.” Physical data architecture, data flows and integration of data sources and applications come after that.
What is a data architecture?
Data architecture is often described as a data management blueprint. Certainly, an effective data architecture needs to map the flow of information through the organisation.
This, in turn, relies on a good understanding of the data being collected and held, the systems it is held in, and the regulatory, compliance and security regimes that apply to the data.
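Kept deliberately small, that kind of map can even be sketched in code. The sketch below is a hypothetical illustration, not a prescribed schema: the dataset names, fields and systems are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Dataset:
    name: str
    system: str          # where the data is held
    classification: str  # regulatory/compliance regime that applies

@dataclass
class DataFlowMap:
    flows: list = field(default_factory=list)  # (source, target) pairs

    def add_flow(self, source: Dataset, target: Dataset):
        self.flows.append((source, target))

    def upstream_of(self, target_name: str):
        """Trace which datasets feed a given target -- the kind of
        question a data architecture should be able to answer."""
        return [s for s, t in self.flows if t.name == target_name]

crm = Dataset("customer_records", "CRM", "personal data (GDPR)")
orders = Dataset("orders", "ERP", "financial")
report = Dataset("sales_report", "BI warehouse", "internal")

flow_map = DataFlowMap()
flow_map.add_flow(crm, report)
flow_map.add_flow(orders, report)

print([d.name for d in flow_map.upstream_of("sales_report")])
# ['customer_records', 'orders']
```

Even a toy model like this makes the two questions explicit: which systems hold which data, and which flows feed which outputs.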
Firms also need to understand which data is critical to operations, and which delivers the most value. As organisations store and process more information, this becomes ever more important. Sometimes it is more art than science.
“It’s the art of understanding that there are few principles you really need to adhere to, and understanding which data is key to the organisation,” says Tim Bowes, associate director for data engineering at data consulting firm Dufrain. “Organisations have oodles of data floating around, but not all of it is absolutely key to operating successfully. Having an understanding of which data is key is fundamental.”
Data architecture has to link to the organisation’s data strategy, and its data lifecycle – but it also relies on sound data management.
Often, organisations split their data architecture into two parts: data supply, and data consumption or exploitation.
On the supply side, CIOs and CDOs need to look at data sources, including transactions, business applications, customer activity and even sensors. On the consumption side, firms are looking at their reporting, business intelligence, advanced analytics and even ML and AI capabilities. Some companies will also be looking to exploit data further by selling it on or using it to create new products.
The relative importance of these parts will shape the data architecture.
Consulting firm KPMG, for example, applies what it calls a “four Cs” framework to data architecture – create, curate, consume and commercialise.
According to Nick Whitfield, the firm’s UK head of data and analytics, create and curate fall into the supply side, separate from consumption and commercialisation. Each side might need its own data architecture.
“I don’t think any organisation can have a single, homogenous data architecture,” he says. “I think there are different types of data architecture for different types of purpose.
“It’s more than just a data model. It’s the collection of processes and the governance framework, the enabling technology and the data standards. Together, these ensure that data is well organised and well controlled, such that it flows through your business processes accurately.”
Why and how to implement a data architecture
The drive to create, or update, a data architecture can come either from changes in technology or changes in the business.
Changing a core component of an organisation’s IT or analytics systems provides an opportunity to look again at data flows. And the move to cloud technology offers a way to update data flows without the need for a “lift and shift” replacement of systems. Instead, changes can be made on an application-by-application, or project-by-project basis.
“Part of the role of the data architect is to paint that picture of what the benefits can look like,” says PA Consulting’s Garrood. “But it’s also to identify what needs to be changed, and what new flows need to be added to the pipeline.”
The switch from data warehouses to data lakes also supports this, as data should no longer be bound to specific applications.
“Firms have a lot of new sources and data,” says Roman Golod, CTO and co-founder of data ops firm Accelario. “They need to not only move to continuous integration between different sources, but to new technologies, including web services and the cloud.”
Golod notes that most of his customers, perhaps 80%, are still working with on-premises systems. But new capabilities increasingly come from the cloud, or hybrid technology.
This allows businesses to look again at that all-important blueprint or data flow, to identify new data sources and to carry out more advanced analytics, as well as ML and AI.
But before they do so, organisations need to put their data house in order.
Data quality and master data management are not, strictly, part of data architecture, but good-quality data remains vital to deliver the business results from an architectural project. Experts who have worked on large-scale data architecture projects say that often, connecting up disparate systems can reveal data quality issues that went unnoticed before. And a clear understanding of which records are the master, or “golden”, data is essential if the business is going to trust the decisions coming from advanced analytics or ML/AI tools.
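One common pattern for establishing a “golden” record is survivorship: where several systems hold duplicate copies of the same entity, pick the most complete and most recently updated copy as the master. The sketch below is a simplified, hypothetical illustration of that idea; the field names, sources and ranking rule are assumptions for the example, not a standard algorithm.

```python
from datetime import date

# Duplicate customer records from different (hypothetical) systems.
records = [
    {"id": 1, "source": "CRM",     "email": "a@example.com", "phone": None,       "updated": date(2022, 1, 10)},
    {"id": 1, "source": "billing", "email": "a@example.com", "phone": "555-0100", "updated": date(2022, 3, 2)},
    {"id": 1, "source": "legacy",  "email": None,            "phone": "555-0100", "updated": date(2020, 6, 1)},
]

def completeness(rec: dict) -> int:
    """Count populated fields, ignoring bookkeeping columns."""
    return sum(v is not None for k, v in rec.items()
               if k not in ("id", "source", "updated"))

def golden_record(candidates: list) -> dict:
    # Rank by completeness first, then recency; the winner becomes the
    # trusted master copy that downstream analytics rely on.
    return max(candidates, key=lambda r: (completeness(r), r["updated"]))

print(golden_record(records)["source"])
# billing
```

Real master data management tools apply far richer survivorship rules, but the principle is the same: the business must be able to say, explicitly, which copy of a record it trusts.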
This is even more so where organisations have large numbers of systems, including older architectures and systems that have built up technical debt. As KPMG’s Whitfield points out, one of his firm’s clients, in oil and gas, has had more than 1,500 data integrations. Integrating those data points into a data lake, for example, raises both practical and compliance questions, as well as those of data standards.
“That information spectrum has to be managed according to the information type, and therefore the underpinning datasets also have to be managed,” he says. “At one end, you have data that need to be highly controlled, highly governed, very, very consistent and, broadly, not touched. At the other end, you are giving data scientists access to large pools of data and letting them go and explore whatever they want. The fact is, the data architecture has to accommodate both ends of that spectrum, which is no easy task.”
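One way to make that spectrum concrete is to encode it as configuration, so the same architecture can apply different controls to different tiers of data. The tiers, roles and control names below are hypothetical illustrations, not a recognised standard.

```python
# Hypothetical governance tiers: tightly controlled data at one end,
# exploratory data-science sandboxes at the other.
DATA_TIERS = {
    "governed": {                       # e.g. regulatory or master data
        "write_access": ["data_stewards"],
        "schema_changes": "change-controlled",
        "quality_checks": "blocking",   # bad data is rejected
    },
    "exploratory": {                    # e.g. data-science sandboxes
        "write_access": ["data_scientists"],
        "schema_changes": "self-service",
        "quality_checks": "advisory",   # issues flagged, not blocked
    },
}

def controls_for(tier: str) -> dict:
    """Look up the controls that apply to a given tier of data."""
    return DATA_TIERS[tier]

print(controls_for("governed")["quality_checks"])
# blocking
```

The point is not the specific settings but that both ends of the spectrum are explicit and enforceable, rather than left to convention.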
Data experts recommend an iterative approach, or looking at data architecture on a project or business case basis. Otherwise, the work risks becoming unmanageable and failing to deliver business benefits. But this still needs to tie into the overall data model the business is working towards. This will always be challenging – too many small projects create their own risks, with varying data standards and isolated silos of information.
Data architecture and the business case
Nonetheless, investing in data architecture can bring a significant, and sometimes rapid, return on investment.
Firms stand to make more use of the data they have, and will be better placed to take advantage of new and emerging platforms and applications, including AI and the cloud. And, as Dufrain’s Bowes points out, an updated data architecture gives firms a better view of their customers, not least by allowing the connection of data from cloud and software-as-a-service (SaaS) systems to existing data stores.
Organisations can also use data architecture to tackle technical debt and ensure that data acquisition and retention policies comply with regulations. But, ultimately, it is about unlocking value in the data the business has already spent money to collect.
“Fundamentally, it is there to model the world we see, and to represent that world in some way,” says PA’s Garrood. “It’s still fundamentally about modelling entities and the relationships between them. It comes down to the same basics about being clear about what you’re trying to achieve.”
But this also needs business leadership, and ongoing management or even curation, and a willingness to put the new data, and insights, to use.
“There’s no point having a data architecture, a beautifully documented thing, without the right data leadership in place,” says KPMG’s Whitfield. “Clearly, there is value from better insights and there are any number of business opportunities there. But physically how we organise our data is a relatively small part of that. It is about the business case leadership, the right operating model, the right governance framework, the right tooling, and then the right culture.”