Data Mesh: Is it here to kill Data Engineering jobs???? 😦🤔
Hey There, 🖐
Welcome again to my blog!! 😁
Today I want to talk about something I came across recently. The term is known as Data Mesh!
Folks from the networking field would probably relate it to mesh topology, where each node in a network is directly connected to every other node. 🤓
"Is it the same concept applied to Data or more specifically Data Warehousing, Sanket?" you might ask.
Well, let's try to decode and learn about this new concept, which was coined by Zhamak Dehghani of Thoughtworks for better handling of data in our warehouses.
Mind you, this is still evolving, and we may need to catch up with it in order to stay updated (like any other field! 😅).
What is Data Mesh??
Data Mesh applies decentralization to our traditional warehousing methods so that each domain owns and serves its data as a product (DaaP).
Another way to put it would be:
"Much in the same way that software engineering teams transitioned from monolithic applications to microservice architectures, the data mesh is, in many ways, the data platform version of microservices."
So, a microservice way of a data platform, huh?? 🙄
Let's try to understand a bit more, shall we? 😁
There are certain principles behind this concept which are as follows:
1. Domain driven data ownership
"The domain teams are responsible for the data lifecycle."
2. Data as a Product (DaaP)
"Instead of seeing data as a service, you see the data as a packaged product."
3. Self-service infrastructure as a platform
"Give users more power to work with the data on their own."
4. Federated Computational Governance
"Since we are dealing with distributed data, the governance part should also be distributed."
According to Starburst, who's been in the data industry for 3 decades, Data Mesh can give you the benefits listed below.
(FYI: I saw the periodic table after years in one of Starburst's introductory videos!! Nostalgia!! 😄)
- Business Agility and Scalability
Data mesh powers decentralized data operations, independent team performance, and data infrastructure as a service provision, resulting in improved time-to-market, scalability, and business domain agility. It eliminates the process complexities and IT backlog to reduce operating and storage costs.
- Faster Access and Accurate Data Delivery
Data mesh offers easily governable and centralized infrastructure based on a self-service model without underlying complexity for faster data access and accurate delivery. Businesses can access data from anywhere with SQL queries with much lower latency. The distributed architecture reduces the processing and intervention layers that delay time to insight.
- Flexibility and Independence
Enterprises adopting data mesh architecture are becoming vendor-agnostic businesses that are not locked in with one data platform. The distributed infrastructure allows companies unparalleled flexibility and choices due to connectors to many systems.
- Platform Connectivity and Data Security
The decentralized framework allows cloud applications to be connected to on-site sensitive data, which can be live streaming or existing on devices in real-time. Data mesh queries/compiles data analytics where the data resides, instead of requiring users to make a copy and route it through a public network to a data warehouse.
It eliminates the risk of data breach or information loss to improve security and reduces data latency to improve overall performance in various use cases including, live streaming, online gaming, financial trading, etc., through platform connectivity in a distributed model.
- Robust Data Governance for End-to-End Compliance
Distributed architecture reconciles data ingestion with its sources, formats, and volumes to allow businesses to control their security at the source system. The decentralized data operations simplify compliance with global data governance guidelines for quality data delivery and ease of data access.
- Cross-Functional Teams for Improved Transparency
The centralized data ownership of traditional data platforms isolates expert teams, creates a lack of transparency, and fails to provide contingency against data control/ownership loss. Data mesh decentralizes data ownership by distributing it among cross-functional domain teams, including domain experts, business teams, IT, and agile virtual teams through its domain-oriented approach for improved transparency and data quality.
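The "query the data where it resides" point under Platform Connectivity can be sketched in a toy form with Python's built-in sqlite3 module. This is only an analogy under my own assumptions: two attached in-memory SQLite databases stand in for stores owned by two domains, and the names sales, inventory, orders and stock are made up. A real mesh would rely on a distributed query engine, not SQLite.

```python
import sqlite3

# One connection plays the role of the query engine.
engine = sqlite3.connect(":memory:")

# Each ATTACH creates a separate in-memory database,
# standing in for a store owned by one domain.
engine.execute("ATTACH DATABASE ':memory:' AS sales")
engine.execute("ATTACH DATABASE ':memory:' AS inventory")

engine.execute("CREATE TABLE sales.orders (sku TEXT, quantity INTEGER)")
engine.execute("CREATE TABLE inventory.stock (sku TEXT, on_hand INTEGER)")
engine.executemany(
    "INSERT INTO sales.orders VALUES (?, ?)",
    [("A1", 3), ("B2", 5), ("A1", 2)],
)
engine.executemany(
    "INSERT INTO inventory.stock VALUES (?, ?)",
    [("A1", 10), ("B2", 4)],
)

# One SQL statement joins both domains; nothing is copied
# into a central warehouse table first.
rows = engine.execute(
    """
    SELECT o.sku, SUM(o.quantity) AS ordered, s.on_hand
    FROM sales.orders AS o
    JOIN inventory.stock AS s ON s.sku = o.sku
    GROUP BY o.sku, s.on_hand
    ORDER BY o.sku
    """
).fetchall()

print(rows)  # → [('A1', 5, 10), ('B2', 5, 4)]
```

Each domain keeps its own tables, and the consumer only needs to know the product names and plain SQL.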
So, now that we know something about this, a natural question would be:
"Sanket, now that we are moving away from a centralized way of handling/managing data, will it make data engineering redundant?"
Trust me, this was the exact question I had for the last 2 days, ever since I read about this!
(Bro... looks like the job is in danger!! :P Time to learn new skills!! 😕)
But here's a broader viewpoint on this (you can find all the resources at the end):
"Data Mesh is not about removing data engineers but about better data engineering management and creating new career paths and opportunities."
Right now, data engineering skills are centralized, meaning data engineers are expected to be multi-domain experts for ETLs across all domains and data. They clean, aggregate, and transform the data, all of which requires deep technical expertise in the technology but comes without any real connection to the business. That creates friction with data analysts, as the dataset they receive from engineers might miss the mark on what the analysts really needed. All of this creates an environment for potential burnout for many data engineers, in a position that’s already operating at a reduced capacity.
Data engineers within a Data Mesh architecture remove themselves as a bottleneck and they support an ecosystem of data products within each business domain rather than for the entire organization. When data engineers work within the mesh, they understand the data itself within their business domain. They’ll also have all the necessary expertise around their data and its uses, and are able to react quickly to changing market conditions or internal requirements.
With a decentralized approach, data engineers in the domains can access and manipulate data easily. Data Mesh gives data engineers and consumers frictionless access to data across both cloud providers and on-premise systems, makes it easy to view data in different formats, eliminates copying data from one technology stack to another, and connects to data wherever it is.
Not only data engineers but also data consumers can be responsible for all aspects of the data product lifecycle, including correcting missing or incorrect data.
So in a nutshell, the job is safe... but we need to be very adaptive! 😅🤭
Please note that Data Mesh is not a technological change but a cultural shift, like DevOps, that an organization has to adapt to in order to be more responsive to the business needs of data!
Now, when can an organization/team conclude that they need not move to Data Mesh from a traditional Data Lake or Data Warehouse?
This blog tries to answer that exactly:
In conclusion, this paradigm shift is needed to look at the human side of technology and to close the gap in time and space between an event happening and its consumption/processing for analysis.
I am eagerly following this space 🤩 in order to be relevant! Are you or will you??
Let me know your thoughts!
As always thanks for the read!😃 Any feedback is most welcome!
Oh..yeah! If we still haven't met on LinkedIn, do say hola 🖐 to me: https://www.linkedin.com/in/sanketmehta7
For more information, please refer to the reference links below: