Data newsletter

Good Morning Data #5 ☕

JacobJustCoding
5 min read · Sep 30, 2022

Today’s Good Morning Data will focus on Data Mesh and the various aspects associated with it. This is a collection of articles that, when read, will give a good understanding of its concepts, applications and good practices.

Grab a cup of coffee or any other favourite beverage and start the day with some noteworthy material from the world of data. Enjoy reading 📚

Photo by GuerrillaBuzz Crypto PR on Unsplash

Data in MAANG

[Netflix] Data Mesh — A Data Movement and Processing Platform

As the business needs that Keystone had to support kept growing, Netflix wanted to expand its platform to cover new business cases.

To this end, Data Mesh was created and is now being used as a general-purpose data movement and processing platform for moving data between Netflix systems at scale.

It should be noted, however, that calling this solution Data Mesh is a bit confusing, because it is not entirely consistent with Zhamak Dehghani's definition. I would describe it as a small part of a Data Mesh which, combined with other Netflix solutions, could be seen as one.

However, as the article mentions, the platform is at a very early stage of development, and further steps will undoubtedly help shape it into a complete system.

[Jochen, Larysa, Simon] Data Mesh Architecture. Data Mesh From an Engineering Perspective

Data Mesh is a very new approach to how companies organise their data, and the idea is undoubtedly noteworthy.

Understanding the various aspects involved is difficult because there are not yet many production applications of this approach to analyse.

However, this website can provide a comprehensive understanding of Data Mesh. It opens with an introduction structured as WHY, WHAT and WHEN, which is extremely important at the beginning of the Data Mesh journey.

After that, you can begin to analyse the four main principles:

  • Domain Ownership,
  • Data as a Product,
  • Self-serve Data Platform,
  • Federated Governance.

As a result, you will be able to consider whether Data Mesh is an approach that is suitable for your organisation.

[Philipp Beyerlein] Data Mesh to Go: How to Get the Data Product

As one of the principles is Data as a Product, designing Data Products is clearly a crucial part of building a Data Mesh.

Recall how a Data Product is defined:

A data product is an autonomous, read-optimized, standardized data unit containing at least one dataset (Domain Dataset), created for satisfying user needs.
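To make the definition more tangible, here is a minimal Python sketch of how such a unit could be described. The class names, fields and the sample `orders_daily` product are my own assumptions for illustration; they do not come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainDataset:
    """One read-optimized dataset exposed by a data product (hypothetical)."""
    name: str
    schema: dict[str, str]  # column name -> type name


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor of a data product owned by a single domain."""
    name: str
    domain: str
    owner: str
    datasets: tuple[DomainDataset, ...]

    def __post_init__(self) -> None:
        # The definition asks for at least one Domain Dataset.
        if not self.datasets:
            raise ValueError("a data product needs at least one dataset")


# Assumed example: the sales domain publishes its daily orders.
orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data-team",
    datasets=(
        DomainDataset("orders", {"order_id": "string", "amount": "decimal"}),
    ),
)
```

The check in `__post_init__` only covers the "at least one dataset" part of the definition; properties such as "read-optimized" or "standardized" depend on the platform and are not modelled here.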

However, as is usually the case, everything is perfectly clear in theory, while the implementation is more challenging. The same holds here. With a large number of artefacts in the form of data sources, designing a Data Product from DDD-compliant artefacts is not so obvious.

Finding attributes relevant to a key business domain from a set of aggregates requires a deep understanding of the nature of the business process and its key metrics.

For this reason, it is worth looking at two case studies that show a simplified way of reasoning in Data Product identification.

[Zhamak Dehghani] How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh

Nowadays more than ever, with so much data being generated, proper organisation and management of the data is a key aspect of making the right business decisions.

The data warehouse (first-generation data platform) and data lake (second-generation data platform) architectures currently in use are based on a centralised source of truth, where the boundaries between domains are blurred and data ownership is domain-agnostic.
In organisations with multiple domains, this has led to problems related to:

  • increasing technical debt,
  • management of the entire data infrastructure by a dedicated (central) team,
  • proliferation of data,
  • relatively long implementation cycle of data transformations to meet customer expectations.

I encourage you to start looking at data-driven organisations in a way that is consistent with the nature of data, i.e. its distributed nature.
Data Mesh is one proposal for such an approach. Within the scope of the article, the most relevant points are:

  • the transition from a centralised, monolithic architecture to a distributed one as a paradigm shift,
  • changing the mindset from “push and ingest” to “serve and pull”,
  • changing the architectural quantum from a pipeline stage to a single domain.
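The “serve and pull” idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `OutputPort`, `read_sales_orders` and `total_revenue` are hypothetical names, and a real platform would serve data over a network interface rather than an in-process function.

```python
from typing import Any, Callable, Iterator

Record = dict[str, Any]


class OutputPort:
    """Hypothetical read interface through which a domain serves its own data."""

    def __init__(self, domain: str, reader: Callable[[], Iterator[Record]]) -> None:
        self.domain = domain
        self._reader = reader

    def pull(self) -> Iterator[Record]:
        # Consumers pull on demand; nothing is pushed into a central store.
        return self._reader()


def read_sales_orders() -> Iterator[Record]:
    # Stand-in for the sales domain's own storage (assumed sample data).
    yield {"order_id": "A-1", "amount": 120}
    yield {"order_id": "A-2", "amount": 80}


sales_orders = OutputPort("sales", read_sales_orders)


def total_revenue(port: OutputPort) -> int:
    """A consumer in another domain pulls the records it needs."""
    return sum(record["amount"] for record in port.pull())


print(total_revenue(sales_orders))  # 200
```

The point of the sketch is the direction of the dependency: the domain owns and serves the data, and the consumer decides when to read it.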

This article is a must-read. It is an insightful introduction that brilliantly covers the current limitations of these architectures, the challenges of maintaining them and the introduction of Data Mesh, and it gives a broad view of the shift to domain-oriented data ownership.

[Kineret Kimhi] Do’s and Don’ts of Data Mesh

As usual, it is also worth considering guidelines on good and bad practices when implementing and using a solution, in this case Data Mesh.

In a nutshell:

  • the shift to Data Mesh should follow a defined Data Governance Framework, so that teams can always cooperate in a similarly structured way; data owners should be widely known in the organization to avoid data orphans and the loss of data quality and security,
  • it should be clearly emphasized that this is a completely new concept. Not everyone in the organization implementing it will feel familiar with it yet, which is why it is worth holding regular updates, training sessions or Q&A sessions, so that everyone can share insights and conclusions while the approach is being implemented and used,
  • each organization is unique in terms of the solutions it uses, its technology stack, business needs, teams, time-to-market, etc. For this reason, you should apply common sense: follow the main principles of the concept in most cases, but depart from the rules when there is a strong justification and doing so lets the company function better and more efficiently,
  • do not flood the organization with a mass of Data Mesh deployments. This change takes time, especially in organizations that have been operationally stable for a long time. Since Data Mesh is tied to domain-driven design, it may be a good idea to start by implementing one domain while leaving the others unchanged.
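The first point, about data orphans, is easy to check automatically. Below is a minimal Python sketch, assuming the organization keeps some kind of catalog that maps each dataset to its owner; the `find_orphans` helper and the catalog entries are hypothetical and only illustrate the idea.

```python
from typing import Optional


def find_orphans(catalog: dict[str, Optional[str]]) -> list[str]:
    """Return the names of datasets with no registered owner, sorted."""
    return sorted(name for name, owner in catalog.items() if not owner)


# Assumed sample catalog: dataset name -> owning team (None or "" means unknown).
catalog = {
    "sales.orders": "sales-data-team",
    "marketing.leads": None,
    "legacy.export": "",
}

print(find_orphans(catalog))  # ['legacy.export', 'marketing.leads']
```

A check like this could run regularly as part of the governance process, so that missing owners are noticed before they turn into quality or security problems.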

JacobJustCoding

Data engineering and Blockchain Enthusiast. Love coffee and the world of technology. https://www.linkedin.com/in/jakub-dabkowski/