Lecture 1 - Introduction to DW

Presentation transcript:

Lecture 1 - Introduction to DW
Reading Requirements:
- [EN] chapter 26
- [CB] chapter 25
- [AS] paper 1: "An Overview of Data Warehousing and OLAP Technology" by Chaudhuri & Dayal
Keywords: DW, DSS, OLTP, OLAP, MDM, ROLAP, MOLAP, Data Mart

The Data Warehouse - definition
B. Inmon: "A data warehouse is a subject oriented, integrated, non-volatile, and time-variant collection of data in support of management's decisions". That is, the data is intended to support decision making at the strategic level.
S. Chaudhuri & U. Dayal: "Data warehousing is a collection of decision support technologies, aimed at enabling the knowledge worker (executive, manager, analyst) to make better and faster decisions."
Subject-oriented, because a data warehouse is organised around the objects found in the business (such as customer, employee, supplier) rather than around the application areas (such as sales, payroll and purchasing) around which the operational systems are built. This follows from the purpose of a data warehouse, which is to support decision making, and decision making needs subject-oriented rather than application-oriented data.
Integrated, because it uses data from different sources (different application-oriented systems). These sources often contain inconsistent data, for example because they use different formats to represent one and the same type of data. Data from the different sources must therefore be integrated and made consistent before it can be worked with and presented to users.
Non-volatile, because the data warehouse is not updated online. Instead it is refreshed regularly by adding data from the operational systems. Existing data is not replaced by new data; new data is continually appended to the existing data, and the data warehouse integrates the new data with the existing data.
Time-variant, because the data in the warehouse is correct and valid only at a certain point in time or over a certain time interval. Data is also kept for a considerably longer time, and all data is associated with some kind of time stamp (directly or indirectly). In short, the data warehouse represents a number of snapshots of the business.

Subject-oriented
[Figure: operational systems (Sales, Payroll System, Purchasing System) contrasted with informational systems (Customer Data, Employee Data, Vendor Data)]

Integrated
[Figure: operational systems (Marketing System, Order System, Billing System) contrasted with a single integrated Customer Data store in the informational systems]

Time variant
[Figure: operational systems (Order System) hold 60-90 days of data; informational systems (Customer Data) hold 5-10 years]

Non-volatile
[Figure: operational systems (Order System) with the operations Create, Update, Delete and Insert; informational systems (Customer Data) with the operations Load and Access]

Decision Support and OLAP (by Navathe)
Information technology to help the knowledge worker (executive, manager, analyst) make faster and better decisions:
- Will a 10% discount increase sales volume sufficiently?
- Which of two new medications will result in the best outcome: higher recovery rate & shorter hospital stay?
- How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?
On-Line Analytical Processing (OLAP) is an element of a decision support system (DSS).

Data Warehouse (Navathe)
- A decision support database that is maintained separately from the organisation's operational databases.
- A data warehouse is a subject oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organisational decision making.

OLTP vs. OLAP
OLTP | OLAP
holds current data | holds historic data
stores detailed data | stores detailed and summarised data
data is dynamic | data is largely static
repetitive processing | ad-hoc, unstructured and heuristic processing
high level of transaction throughput | medium or low level of transaction throughput
predictable pattern of usage | unpredictable pattern of usage
transaction driven | analysis driven
application oriented | subject oriented
supports day-to-day decisions | supports strategic decisions
serves a large number of operational users | serves a relatively low number of managerial users
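The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `orders` table, its columns and its rows are made up, and a single in-memory database stands in for what would normally be separate operational and warehouse databases.

```python
import sqlite3

# Hypothetical table and data, used for both workloads for brevity.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "Ann", 2022, 100.0), (2, "Bob", 2022, 50.0), (3, "Ann", 2023, 70.0)],
)

# OLTP-style work: a short, predictable transaction touching one current row.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (120.0, 1))

# OLAP-style work: an ad-hoc query that summarises historic data over many rows.
yearly_totals = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
print(yearly_totals)
```

The update touches a single row by key, while the aggregate has to scan every row, which hints at why the two workloads are tuned differently.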

Why separate data warehouse?
Performance
- The operational DBs are tuned to support known OLTP workloads
- Supporting OLAP requires special data organisations, access methods and implementation methods
Function
- Decision support requires data that may be missing from the operational DBs
- Decision support usually requires consolidating data from many heterogeneous sources

Architecture
[Figure: data sources (operational DBs, external sources) are extracted, transformed, loaded and refreshed into the data warehouse and data marts; these serve the OLAP servers, which support analysis, query/reporting and data mining tools; monitoring & administration tools and a metadata repository accompany the warehouse]

OLAP for Decision Support (Navathe)
- Goal of OLAP is to support ad-hoc querying for the business analyst
- Business analysts are familiar with spreadsheets
- Extend the spreadsheet analysis model to work with warehouse data:
  - Large data set
  - Semantically enriched to understand business terms (e.g., time, geography)
  - Combined with reporting features
- Multidimensional view of data is the foundation for OLAP

Data Modelling for Data Warehouses See the examples in [EN] chapter 26

Data Modelling for Data Warehouses
A data cube:
[Figure: three-dimensional data cube with the axes product (p123, p124, p125), fiscal quarter (qtr1, qtr2, qtr3) and region (reg1, reg2, reg3)]

Data Modelling for Data Warehouses
Pivoted version of the data cube:
[Figure: two rotated views of the cube, each with the region, product and fiscal quarter axes in a different arrangement]
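As a rough illustration of a data cube and its pivoted view, the sketch below stores a few cells keyed by (product, fiscal quarter, region) and pivots by rearranging the key. The member names follow the slide; the sales figures and the helper names `pivot` and `total` are assumptions made for the example.

```python
# Hypothetical sales figures; only the member names come from the slide.
cube = {
    ("p123", "qtr1", "reg1"): 10,
    ("p123", "qtr2", "reg1"): 20,
    ("p124", "qtr1", "reg2"): 30,
    ("p125", "qtr3", "reg3"): 40,
}

def pivot(cells, order):
    """Return the same cells with the key axes rearranged according to `order`."""
    return {tuple(key[i] for i in order): value for key, value in cells.items()}

def total(cells, axis, member):
    """Sum every cell whose coordinate on `axis` equals `member`."""
    return sum(value for key, value in cells.items() if key[axis] == member)

# Rotate the cube so that the axes become (region, product, fiscal quarter).
pivoted = pivot(cube, (2, 0, 1))

print(pivoted[("reg1", "p123", "qtr2")])
print(total(cube, 0, "p123"))
```

Pivoting changes only how the cells are addressed, not the cells themselves, so every total is the same in both views.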

Star-Join Schema
- A single fact table and a single table for each dimension
- Every fact points to one tuple in each of the dimensions and has additional attributes
- Does not capture hierarchies directly
- Generated keys are used for performance and maintenance reasons
- Fact constellation: multiple fact tables that share many dimension tables
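A star-join schema can be sketched with Python's built-in sqlite3 module. The table and column names below (`fact_sales`, `dim_product` and so on) and the rows are hypothetical. The integer keys play the role of the generated keys mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_region  (region_key  INTEGER PRIMARY KEY, region_name  TEXT);
CREATE TABLE dim_quarter (quarter_key INTEGER PRIMARY KEY, quarter_name TEXT);
-- One fact table: every fact points to one tuple in each dimension
-- and carries an additional attribute (the measure).
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    region_key  INTEGER REFERENCES dim_region(region_key),
    quarter_key INTEGER REFERENCES dim_quarter(quarter_key),
    amount      REAL
);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "p123"), (2, "p124")])
conn.executemany("INSERT INTO dim_region VALUES (?, ?)", [(1, "reg1"), (2, "reg2")])
conn.executemany("INSERT INTO dim_quarter VALUES (?, ?)", [(1, "qtr1"), (2, "qtr2")])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 100.0), (1, 2, 1, 50.0), (2, 1, 2, 70.0)],
)

# A star join: the fact table joined to a dimension table, then aggregated.
sales_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(sales_by_product)
```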

Snowflake Schema
- Represents the dimensional hierarchy directly by normalising the dimension tables
- Saves storage
- Reduces the effectiveness of browsing
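The trade-off can be sketched in plain Python. The sketch assumes a made-up product dimension with a two-level hierarchy (product, category). The category names and the helper names `normalise` and `category_of` are illustrative only.

```python
# Star-style dimension: the hierarchy (product -> category) is flattened into each row.
star_product_dim = [
    {"product_key": 1, "product": "p123", "category": "Beverages"},
    {"product_key": 2, "product": "p124", "category": "Beverages"},
    {"product_key": 3, "product": "p125", "category": "Snacks"},
]

def normalise(rows):
    """Split the flat dimension into a product table and a category table."""
    category_keys = {}
    products = []
    for row in rows:
        # Reuse the key of a category seen before, otherwise generate a new one.
        key = category_keys.setdefault(row["category"], len(category_keys) + 1)
        products.append(
            {"product_key": row["product_key"], "product": row["product"], "category_key": key}
        )
    categories = {key: name for name, key in category_keys.items()}
    return products, categories

snow_products, snow_categories = normalise(star_product_dim)

def category_of(product_key):
    """Browsing now needs an extra lookup (a join) to reach the category name."""
    product = next(p for p in snow_products if p["product_key"] == product_key)
    return snow_categories[product["category_key"]]

print(snow_categories)
print(category_of(3))
```

Each category name is now stored once instead of once per product, which is the storage saving. The extra lookup in `category_of` is the browsing cost.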

Approaches to OLAP Servers
Relational OLAP (ROLAP)
- Relational and extended relational DBMS to store and manage warehouse data
- Schema design
- Extended SQL
Multidimensional OLAP (MOLAP)
- Array-based storage structure (n-dimensional array)
- Direct access to the array data structure
- Good indexing properties
- Poor storage utilisation when the data is sparse
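The MOLAP storage idea can be sketched with a nested Python list standing in for an n-dimensional array. This is a toy model, not how any particular MOLAP server is implemented. The member names are borrowed from the data cube slide and the stored values are made up.

```python
# Dimension members; a member's position in its list gives the array offset.
products = ["p123", "p124", "p125"]
quarters = ["qtr1", "qtr2", "qtr3"]
regions = ["reg1", "reg2", "reg3"]

# Dense 3-dimensional array with one slot per possible cell (None = no data).
dense = [[[None for _ in regions] for _ in quarters] for _ in products]

def store(product, quarter, region, value):
    """Write a value into the slot addressed by the three coordinates."""
    dense[products.index(product)][quarters.index(quarter)][regions.index(region)] = value

def lookup(product, quarter, region):
    """Direct access: the coordinates translate straight to array offsets."""
    return dense[products.index(product)][quarters.index(quarter)][regions.index(region)]

store("p123", "qtr1", "reg1", 10)
store("p125", "qtr3", "reg2", 40)

# Storage utilisation: the share of allocated slots that actually hold data.
slots = len(products) * len(quarters) * len(regions)
used = sum(cell is not None for plane in dense for row in plane for cell in row)
utilisation = used / slots
print(lookup("p125", "qtr3", "reg2"), f"{used} of {slots} slots used")
```

With only two of the 27 slots filled, most of the allocated space is empty, which is the sparse-data weakness noted above.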

Multidimensional OLAP (MOLAP)
[Figure: a relational DB server and/or legacy systems load data into the MOLAP server (database & application logic layer); end-user access tools (presentation layer) send data requests to the MOLAP server and receive result sets]

Relational OLAP (ROLAP)
[Figure: end-user access tools (presentation layer) send data requests to the ROLAP server (application logic layer) and receive result sets; the ROLAP server sends SQL to the db server (database layer) and receives result sets]

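Managed Query Environment (MQE)
[Figure: end-user access tools send SQL to a relational DB server and receive result sets; they also send data requests to a MOLAP server, which is loaded from the relational DB server, and receive result sets]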

DB2's Integration Server Architecture
[Figure: components shown are the Integration Server desktop (OLAP Model, OLAP Metaoutline), the Integration Server, the DB2 OLAP server, the OLAP Command Interface, the OLAP Metadata Catalog, the relational data source and the DB2 OLAP database, connected over TCP/IP and ODBC]

Back End Tools and Utilities
Extract & Transform
- data selection
- data cleaning
  - Data migration: "replace the string gender by sex"
  - Data scrubbing: based on domain specific knowledge
  - Data auditing: a variant of data mining
- data enrichment
- data aggregation
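A minimal sketch of these transform steps in Python follows. The source rows, the `SEX_CODES` mapping and the `transform` helper are assumptions for illustration. The migration rule is read here as renaming the field `gender` to `sex`, which is one possible interpretation of the slide's example.

```python
# Hypothetical rows from source systems with inconsistent conventions.
source_rows = [
    {"customer": "Ann", "gender": "F", "amount": "100"},
    {"customer": "Bob", "gender": "male", "amount": "50"},
    {"customer": "Ann", "gender": "female", "amount": "25"},
]

# Domain-specific knowledge used for scrubbing (an assumed code list).
SEX_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}

def transform(row):
    """Clean one source row without modifying the original."""
    cleaned = dict(row)
    # Migration: the field "gender" is replaced by "sex".
    # Scrubbing: the different spellings are mapped to one code.
    cleaned["sex"] = SEX_CODES[cleaned.pop("gender").strip().lower()]
    # Make the amount a number so that it can be aggregated.
    cleaned["amount"] = float(cleaned["amount"])
    return cleaned

clean_rows = [transform(row) for row in source_rows]

# Aggregation: total amount per customer.
totals = {}
for row in clean_rows:
    totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]

print(clean_rows)
print(totals)
```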

Back End Tools and Utilities
Load
- full loading: a long batch transaction, takes a long time
- incremental loading: during refresh
Refresh
- when: periodically, e.g., daily or weekly
- how:
  - extracting the entire source: sometimes the only way when dealing with legacy data sources
  - incremental refresh: supported by replication servers
    - data shipping
    - transaction shipping
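The difference between a full load and an incremental refresh can be sketched as follows. The sketch assumes that each operational row carries a last-modified date, which is only one of several ways to detect changes. The rows and the function names are hypothetical.

```python
from datetime import date

# Hypothetical operational rows, each with a last-modified date.
operational = [
    {"order_id": 1, "amount": 100, "modified": date(2024, 1, 1)},
    {"order_id": 2, "amount": 50, "modified": date(2024, 1, 8)},
    {"order_id": 3, "amount": 70, "modified": date(2024, 1, 9)},
]

def full_load(source):
    """Copy the entire source; sometimes the only option for legacy sources."""
    return [dict(row) for row in source]

def incremental_refresh(source, target, last_refresh):
    """Append only the rows changed after the previous refresh; return how many."""
    new_rows = [dict(row) for row in source if row["modified"] > last_refresh]
    target.extend(new_rows)
    return len(new_rows)

# Initial full load at a time when only the first order existed.
warehouse = full_load(operational[:1])

# A later refresh ships only what changed since 2024-01-01.
added = incremental_refresh(operational, warehouse, date(2024, 1, 1))
print(added, [row["order_id"] for row in warehouse])
```

Existing warehouse rows are never overwritten. New rows are only appended, in line with the non-volatile property.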

Front End Tools - Basic Functionality
- Pivoting
- Rollup (drill-up) and Drill-down
- Slice-and-dice
- Ranking (sorting)
- Selection
- Computed attributes
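Some of these operations can be sketched on a small list of fact rows. The rows and the helper names `rollup` and `slice_` are made up for the example. Real front-end tools run such operations against an OLAP server rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, quarter, region, sales).
facts = [
    ("p123", "qtr1", "reg1", 10),
    ("p123", "qtr2", "reg1", 20),
    ("p124", "qtr1", "reg2", 30),
    ("p124", "qtr1", "reg1", 5),
]
AXES = {"product": 0, "quarter": 1, "region": 2}

def rollup(rows, *dims):
    """Aggregate sales over every dimension not listed in `dims`."""
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[AXES[d]] for d in dims)] += row[3]
    return dict(out)

def slice_(rows, dim, member):
    """Slice: keep the rows that have one fixed member on one dimension."""
    return [row for row in rows if row[AXES[dim]] == member]

# Roll-up to product level, then drill down by adding the region dimension.
by_product = rollup(facts, "product")
by_product_region = rollup(facts, "product", "region")

# Slice on quarter, and rank the products by total sales.
qtr1_only = slice_(facts, "quarter", "qtr1")
ranking = sorted(by_product.items(), key=lambda item: item[1], reverse=True)

print(by_product)
print(by_product_region)
print(ranking)
```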

Metadata Repository
- warehouse schema, view & derived data definitions
- predefined queries and reports
- data mart locations and contents
- data partitions
- data extraction, cleaning and transformation rules, defaults
- data refresh and purging rules
- user profiles, user groups
- security: user authorisation, access control

Problems of Data Warehousing
- Underestimation of resources for data loading
- Hidden problems with source systems
- Required data not captured
- Increased end-user demands
- Data homogenisation
- High demand of resources
- Data ownership
- High maintenance
- Long duration projects
- Complexity of integration

Data Warehouse vs. Data Mart (Navathe)
Enterprise warehouse: collects all information about subjects (customers, products, sales, assets, personnel) that span the entire organisation
- Requires extensive business modelling
- May take years to design and build
Data mart: departmental subsets that focus on selected subjects, e.g. a marketing data mart: customer, product, sales
- Faster roll-out
- Complex integration in the long term