Getting Data Engineering practice right…
Post-2010 data professionals.
Today, hardly anyone aspires to be a DBA, for several reasons. First, the role is seen as archaic and has become less significant, as most DBA tasks are now automated through platform engineering and cloud DBaaS (database-as-a-service) offerings. In addition, database programming and scripting have become less prominent as software development moved to microservices architectures, in which the compute, data and application layers are separated for efficient, agile design.
Second, after 2010 we entered the progressive digital age of the Internet and the Internet of Things (IoT), in which more data was digitised and even more was generated. Cloud computing, the web, mobile applications, data-driven digital businesses and big data are all common themes of this data-obsessed period, which is still at its peak. This began to change the scope of work for data tooling professionals, because data-driven digital businesses thrive by extracting value from big data drawn from many sources: web logs; social media; OLTP systems such as ERP and CRM; documents; emails; machine-generated data (IoT); and cloud services such as SaaS.
Data Engineering (DE) practice in Internet Companies.
The pool of data users has also broadened, as every employee in the organisation is now an empowered data user, albeit with varying levels of technical skill. This means data tooling professionals are now expected to serve multiple groups, such as analytics, machine learning, data science, digital services development and novice business users, with ready-to-use data for downstream use and analysis.
Whereas the RDBMS used to be the central focus of any enterprise data project before 2010, data tooling professionals now master the combination of technologies and techniques involved in capturing, moving, organising, cleaning, formatting and serving data from its source to where it’s needed (business users, applications, analytics, data science, ML models). They integrate tools for ingesting, processing, transforming and storing data to build data pipelines that are repeatable, compliant with policies, consistent in performance, scalable, optimal and reliable in executing data jobs. Because of this shift in the definition of the role, the term ‘Data Engineering’ was popularised by internet companies such as Facebook and Airbnb, and adoption grew thereafter.
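To make the idea of a repeatable pipeline more concrete, here is a minimal Python sketch under stated assumptions; it is an illustration, not a production design or any particular company's method. The `Pipeline` class, the stage functions (`clean`, `deduplicate`, `aggregate_spend`) and the record fields (`event_id`, `customer_id`, `email`, `amount`) are all hypothetical. It shows raw events being cleaned, deduplicated on an event key so that duplicate deliveries are not double counted, and transformed into a table ready to serve downstream users.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical record shape: one event from a source system (e.g. a web log or CRM export).
Record = dict[str, object]
Stage = Callable[[list[Record]], list[Record]]


def clean(records: list[Record]) -> list[Record]:
    """Drop records with no customer_id and normalise email formatting."""
    out: list[Record] = []
    for r in records:
        if not r.get("customer_id"):
            continue
        r = dict(r)  # copy, so the raw batch is never mutated
        r["email"] = str(r.get("email", "")).strip().lower()
        out.append(r)
    return out


def deduplicate(records: list[Record]) -> list[Record]:
    """Keep the first record per event_id, so duplicate deliveries are not double counted."""
    seen: set[object] = set()
    out: list[Record] = []
    for r in records:
        if r["event_id"] in seen:
            continue
        seen.add(r["event_id"])
        out.append(r)
    return out


def aggregate_spend(records: list[Record]) -> list[Record]:
    """Transform cleaned events into a serving-ready table: total spend per customer."""
    totals: dict[str, float] = {}
    for r in records:
        key = str(r["customer_id"])
        totals[key] = totals.get(key, 0.0) + float(r["amount"])
    return [{"customer_id": c, "total_spend": t} for c, t in sorted(totals.items())]


@dataclass
class Pipeline:
    """Runs named stages in order and logs how many records each stage kept."""
    stages: list[tuple[str, Stage]]
    log: list[str] = field(default_factory=list)

    def run(self, records: list[Record]) -> list[Record]:
        data = list(records)
        for name, stage in self.stages:
            before = len(data)
            data = stage(data)
            self.log.append(f"{name}: {before} -> {len(data)} records")
        return data


# Hypothetical raw batch, including a duplicate delivery and a record with no customer_id.
raw = [
    {"event_id": 1, "customer_id": "c1", "email": " A@x.com ", "amount": "10.50"},
    {"event_id": 1, "customer_id": "c1", "email": "a@x.com", "amount": "10.50"},
    {"event_id": 2, "customer_id": "c2", "email": "b@x.com", "amount": "4"},
    {"event_id": 3, "customer_id": "", "email": "c@x.com", "amount": "7"},
    {"event_id": 4, "customer_id": "c1", "email": "a@x.com", "amount": "2"},
]

pipeline = Pipeline([("clean", clean), ("deduplicate", deduplicate), ("aggregate", aggregate_spend)])
result = pipeline.run(raw)
print(result)
print(pipeline.log)
```

Keeping each stage a small, pure function makes it easy to test in isolation and to rerun on the same batch with the same result. In practice, the same structure would typically be handed to an orchestrator or processing engine rather than run as a single script.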
The tumultuous transition of DE for large corporations.
Digital startups are nimble, and most have strong technology leadership, which makes the transition to a DE practice relatively seamless. By contrast, many technology organisations in large corporations, in sectors such as financial services, communications and healthcare, still struggle to adopt this new design, in which DE is separated from IT as an autonomous division to support a data- and ML-driven business approach. Miscommunication about goals, expectations and task ownership is common, causing data projects to fail frequently and eroding confidence in the value of the DE practice to the overall business. Weak executive sponsorship, a limited understanding of DE operational strategies, underdeveloped skills, poor job descriptions, inconsistent change management and misaligned goals are just some of the problems plaguing DE transformations in large corporations.
In this post, I discuss three critical pillars of a high-performing DE practice: i) the DE operational framework; ii) work culture; and iii) executive support and sponsorship. Together they form the foundation of a DE practice that meets organisational growth goals and effectively supports the success of other data projects led by Data Science, Machine Learning, Advanced Analytics, Digital Services and Lines of Business.
The Data Engineering Framework: a holistic view of your DE practice.
The Data Engineering Framework is composed of four critical tiers: i) the Digital Services tier, which describes the data services delivered to users; ii) the operational tier, which describes the stages and steps of the DE lifecycle (DELC); iii) the methodology tier, which describes how the DELC is delivered through the adoption of Agile software engineering, DevOps, Platform Engineering and Chaos Engineering; and lastly iv) the Center of Excellence tier, which maximises and accelerates the application of lessons and findings from day-to-day operations.
The service tier. The service tier describes the ready-to-consume data services delivered to users and subscribers. The aim of this tier is to increase access to, and the utility of, existing data sets across the business. Beyond access to fully formatted data, mature digital businesses can extend frontline services by empowering business units to build, curate and publish their own domain-specific data products, for instance via a central Data-as-a-Service platform and Center of Excellence (COE) resources facilitated by DE.
Subscribe to Mindful Geek to keep reading this post and get 7 days of free access to the full post archives.


