How are Data Engineering & Data Science related (if at all)? Which has good scope in the future?

If you landed here from LinkedIn, then you probably have the right context! Let's continue on the topic directly:

As per my understanding and experience in the #datafield so far, data, like software, has a lifecycle! 🚲

A few stages:

i) Inception (generation of data by disparate systems, e.g., transactional systems, sensors, etc.)

ii) Collection (the generated data is collected through different methods and stored at a place)

iii) Cleaning (generally the collected data isn’t processing-ready, hence some sort of preparation is required)

iv) Processing (the data is processed using business logic and/or logical transformations so that it yields some information)

v) Presenting (the well-processed data is then presented using various dashboarding tools/techniques, e.g., Tableau, Power BI)

vi) Intelligence (machine learning models are built on the data to identify patterns and predict future patterns using mathematical/statistical methods)

 

Normally, the earlier stages like iii) & iv) are owned by Data Engineers, the final one by Data Scientists, and the second-to-last by Data Analysts, BUT with time these responsibilities have blurred and there may not be a distinct separation between them.

As per numerous studies, “on average, a Data Scientist spends a good amount of her/his time just cleaning the data!” 😮

(https://www.reddit.com/r/datascience/comments/bupmyf/data_scientists_spend_up_to_80_of_time_on_data/)

This gives an idea of how important pre-processing the data is for deriving any value out of it!

Hence, it would be safe to say that data engineering is the backbone of data science: it provides a base that is ready for the fancy ‘AI/ML modelling and predicting’.

 

As a data engineer, I have to understand the business context of the data I interact with while doing cleaning/transformations.

A few of the things I deal with are:

  • removing duplicate values/NULL values,
  • preparing fields based on business logic & requirements,
  • validating whether the data is suitable for further stages,
  • understanding how data handling at each stage will affect my first-hand customers (i.e., downstream applications which consume data from my pipelines)
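To make these tasks a bit more concrete, here is a minimal sketch in plain Python. The order records, the field names, and the "high value" rule are all hypothetical and only meant for illustration; a real pipeline would typically use a dedicated tool or framework for this.

```python
# Hypothetical raw order records; the field names are assumptions for illustration.
raw_orders = [
    {"order_id": 1, "amount": 120.0, "country": "IN"},
    {"order_id": 1, "amount": 120.0, "country": "IN"},   # duplicate
    {"order_id": 2, "amount": None, "country": "US"},    # NULL amount
    {"order_id": 3, "amount": 75.5, "country": "US"},
    {"order_id": 4, "amount": -10.0, "country": "IN"},   # fails validation
]


def clean_orders(rows):
    """Drop rows with NULL values and duplicates, then derive a business field."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # remove rows with NULL values
        if row["order_id"] in seen:
            continue  # remove duplicates (keyed on order_id)
        seen.add(row["order_id"])
        # Hypothetical business rule: orders of 100 or more are "high value".
        cleaned.append({**row, "is_high_value": row["amount"] >= 100})
    return cleaned


def validate(rows):
    """Split rows into those fit for downstream stages and those rejected."""
    valid = [r for r in rows if r["amount"] > 0]
    rejected = [r for r in rows if r["amount"] <= 0]
    return valid, rejected


valid_orders, rejected_orders = validate(clean_orders(raw_orders))
print("valid:", valid_orders)
print("rejected:", rejected_orders)
```

Keeping the rejected rows separate, instead of silently dropping them, is one way to let downstream consumers see what was filtered out and why.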

So, in a nutshell, a data engineer must be aware of the business context of the data, the different mechanisms to handle/process data, and the pipelining of the end-to-end data flow (technical field knowledge).

 

A Data Scientist, on the other hand, must be well versed in Maths & Statistics, as they play a major role in building ML models on the data provided by the data engineer.

A few major activities of a Data Scientist, according to me, may include:

  • Exploratory Data Analysis,
  • Hypothesis testing,
  • Model building and running,
  • Getting feedback and working on it
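As a rough illustration of the first two activities, here is a small sketch using only Python's standard library. The two groups of numbers are made-up sample data, and the permutation test is just one simple way of testing a hypothesis, not the only one.

```python
import random
import statistics

# Hypothetical data: checkout times (seconds) for two versions of a page.
group_a = [12.1, 11.8, 13.0, 12.5, 12.9, 11.7, 12.3, 12.8]
group_b = [10.9, 11.2, 10.5, 11.0, 11.4, 10.8, 11.1, 10.7]

# Exploratory Data Analysis: basic summary statistics per group.
for name, values in (("A", group_a), ("B", group_b)):
    print(name,
          "mean:", round(statistics.mean(values), 2),
          "stdev:", round(statistics.stdev(values), 2))


def permutation_test(a, b, n_permutations=2000, seed=42):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = abs(statistics.mean(pooled[:len(a)])
                   - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Null hypothesis: both versions have the same mean checkout time.
p_value = permutation_test(group_a, group_b)
print("p-value:", p_value)
```

A small p-value here suggests the difference between the two groups is unlikely to be due to chance alone; whether that difference matters is where the business context comes in.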

Here, again, the business context is of utmost importance. Along with that, having in-depth knowledge of mathematical/statistical methods for model building and training is important.

So, if you’re still in love with those integration techniques from your school math days, you belong here, using them to derive value and make predictions out of the data.

 

As mentioned earlier, this distinction of responsibility isn’t always clear. You may find a data engineer working on a dashboard/report for a user using their SQL skills! 😉

OR you may find a data scientist doing data cleaning tasks! (This is quite common!) 😅

 

An important point to consider here is that cloud knowledge is now A MUST! (The same is the case in the software world as well, I guess!) Pick up any cloud tech and learn about it… this will benefit you a lot in the long run! 😃

 

Which one should you go for, and which has good scope?

As data volumes grow even more (we already have BIG DATA!), handling them will become more and more tricky, and data engineers skilled in this task WILL BE NEEDED.

Because, as they say: “Your machine learning model is only as good as your data!”

 

Having said that, a skilled data scientist will use her knowledge of math to churn finer details out of the data and deliver more powerful insights and predictions.

 

Ultimately, it all boils down to your area of interest & skill set.

If you’re someone who loves to play around with data (dirty data!), likes coming up with a plan to make it processing-ready, and has a programming background, you can go for the data engineer role.

OR

If you’re someone who loves stats/math and would like to use them to predict the future, go for the data scientist role.

 

I personally feel all the roles in any #datadriven industry are of equal importance as they all come together to get value out of data which otherwise wouldn’t have been visible directly!

Thanks for reading & do share your thoughts! :)

If we haven't met already, we can meet here: linkedin.com/in/sanketmehta7/ (:

