Data Warehouse & Data Lake
Enabling your intelligent automation journey.
Gauri has been at the forefront of architecting and delivering data warehouse and data lake solutions for our clients for more than 5 years. Powered by our robust digital platforms, business insight, and technology expertise, we have become our customers' preferred partner in this space.
Why Data Warehouse or Data Lake
Big data infrastructure such as a data lake provides the data pipelines through which relevant internal and external data (structured, unstructured, and streaming) can be stored, processed, and aligned to support data-driven insights and decisions.
There is a significant shift towards the cloud, and organisations are also extending the scope of analytics beyond maintaining data warehouses for OLAP workloads, which operate on a limited set of preconfigured data sets.
What is a Data Lake
A data lake allows the collection of raw structured, unstructured, streamed, and real-time data in huge quantities. This data can then be processed and used by machine learning and AI algorithms to uncover insights, build predictive models, and create data-driven solutions.
There are many options available for creating data lakes, including the open source Hadoop and Spark frameworks. Other major options come from Microsoft Azure, AWS, Google Cloud, and Cloudera.
What is the difference between a Data Lake and a Data Warehouse
In the past, data analysis (OLAP) involved building a data warehouse, which collected structured data that was cleansed and transformed into predefined reporting models, either aggregated or timestamped. Hence a data warehouse contains limited, regulated data. A data lake, by contrast, can hold any raw data generated by and relevant to an organisation, and transformation can be done at a later stage based on the analysis being undertaken.
Hence, unlike the ETL (Extract, Transform, Load) process required by a data warehouse, a data lake follows the ELT (Extract, Load, Transform) route.
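To make the ETL versus ELT distinction concrete, here is a minimal sketch in Python. It assumes an in-memory dictionary stands in for a warehouse reporting table and an in-memory list stands in for data lake storage; the event fields and function names (`etl_load_warehouse`, `elt_load_lake`, `spend_by_channel`, `reviews`) are hypothetical and for illustration only, not any particular product's API.

```python
import json
from collections import defaultdict

# Hypothetical raw events from different sources; fields vary per record.
RAW_EVENTS = [
    '{"customer": "A", "amount": 120.0, "channel": "web"}',
    '{"customer": "B", "amount": 80.5, "channel": "store", "note": "promo"}',
    '{"customer": "A", "amount": 30.0, "channel": "store"}',
    '{"customer": "C", "review": "Great service"}',
]


def etl_load_warehouse(raw_events):
    """ETL: transform into a predefined reporting model (total spend per
    customer) BEFORE loading. Records that do not fit the model are dropped."""
    warehouse = defaultdict(float)
    for line in raw_events:
        record = json.loads(line)
        if "amount" in record:  # only records matching the model survive
            warehouse[record["customer"]] += record["amount"]
    return dict(warehouse)


def elt_load_lake(raw_events):
    """ELT: load raw records as-is; no schema is enforced at write time."""
    return [json.loads(line) for line in raw_events]


def spend_by_channel(lake):
    """A transformation applied later, at analysis time, on the raw lake data."""
    totals = defaultdict(float)
    for record in lake:
        if "amount" in record:
            totals[record["channel"]] += record["amount"]
    return dict(totals)


def reviews(lake):
    """A second analysis that the warehouse's reporting model cannot answer."""
    return [record["review"] for record in lake if "review" in record]


warehouse = etl_load_warehouse(RAW_EVENTS)
lake = elt_load_lake(RAW_EVENTS)
print(warehouse)               # {'A': 150.0, 'B': 80.5}
print(spend_by_channel(lake))  # {'web': 120.0, 'store': 110.5}
print(reviews(lake))           # ['Great service']
```

The warehouse can only answer the question its model was designed for (spend per customer), and the channel, note, and review fields are lost at load time. The lake keeps every raw record, so new questions such as spend by channel or customer reviews can be answered later by writing a new transformation.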
Insights generated from a data warehouse are limited by the reporting models used to store the data, whereas data lakes have no such limitation. It is important to recognise that, by a commonly cited estimate, only about 20% of the data generated today is structured, while the remaining 80% is semi-structured or unstructured. Whereas data warehouses can guarantee governance, performance, and security, in the case of data lakes these have to be built into the data strategy and design.
Data lake implementation
The data challenge is unique in that there is a choice between a big-bang approach and delivering in smaller increments. The majority of data projects are deemed failures because of the hype surrounding the possibilities, the current lack of expertise in this area, and the resulting difficulty of choosing the right approach. Also, since most data lakes are implemented using open source technologies, having the right strategy and design in place is key to success.