Databases
Chapter Six Overview
SECTION 6.1 – DATABASE FUNDAMENTALS: Understanding Information; Database Fundamentals; Database Advantages; Relational Database Fundamentals; Database Management Systems; Integrating Data Among Multiple Databases
SECTION 6.2 – DATA WAREHOUSE FUNDAMENTALS: Accessing Organizational Information; History of Data Warehousing; Data Warehouse Fundamentals; Business Intelligence; Data Mining
Chapter 6 introduces data, information quality, databases, data mining, and data warehouses in detail and highlights why and how information adds value to an organization
Understanding information
Information is everywhere in an organization
Employees must be able to obtain and analyze the many different levels, formats, and granularities of organizational information to make decisions
Successfully collecting, compiling, sorting, and analyzing information can provide tremendous insight into how an organization is performing
Knowledge and information have value. Cost of replacing a screw: 1,101 kr in total, of which the screw is 1 kr, labor is 100 kr, and knowledge and information about the job is 1,000 kr
Granularity refers to the extent of detail within the information (fine and detailed or coarse and abstract information)
Have you ever had to correlate two different formats, levels, or granularities of information? How did you correlate the information?
Taking a hard look at organizational information can yield exciting and unexpected results such as potential new markets, new ways of reaching customers, and even new ways of doing business
Understanding information Information granularity – refers to the extent of detail within the information (fine and detailed or coarse and abstract) Levels Formats Granularities This is a good place to discuss the Samsung Electronics and Staples examples from the text Students should understand that information varies and different levels, formats, and granularities of information can be found throughout an organization CLASSROOM EXERCISE Organizing Information Break your students into groups and assign each group a different information type from Figure 6.2 Ask the students to find examples of the different kinds of information they might encounter in an organization for their information type For example, information formats for a spreadsheet might include a profit and loss statement or a market analysis Ask your students to determine potential issues that might arise from having different types of information Ask your students what happens if the information does not correlate For example, the customer letters sent out do not match the customers and customer addresses in the database For example, the total on the customer’s bill does not add up to the individual line items
Information Quality
Business decisions are only as good as the quality of the information used to make the decisions
Characteristics of high quality information include:
Accuracy: Is it correct?
Completeness: Is anything missing?
Consistency: Does it hang together? Is it logical?
Uniqueness: Are we storing information twice? Can we access the information directly?
Timeliness: Is the information current?
In addition: Is the information traceable?
Do you have any examples of a time when you encountered a problem due to low quality information? For example, you did not receive a package because the address was incorrect or missing
List the business ramifications that can occur for an organization that maintains low quality information
Characteristics of High Quality Information
Accuracy: Are all the values correct? For example, is the name spelled correctly? Is the dollar amount recorded properly?
Completeness: Are any of the values missing? For example, is the address complete including street, city, state, and zip code?
Consistency: Is aggregate or summary information in agreement with detailed information? For example, do all total fields equal the true total of the individual fields?
Uniqueness: Is each transaction, entity, and event represented only once in the information? For example, are there any duplicate customers?
Timeliness: Is the information current with respect to the business requirements? For example, is information updated weekly, daily, or hourly?
CLASSROOM EXERCISE: Inquiring about Information
Break your students into groups and ask each group to provide an additional example of each of the five common characteristics of high quality information that is not provided in the above figure
For example, Accuracy: does a purchase price on a bill match the item description on the bill? Item 1: Kids juice cup, cost $10,000. Chances are a kids juice cup would not cost $10,000, so this is an inaccurate item
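Several of these characteristics can be checked automatically. The following is a minimal sketch in Python, assuming a hypothetical list of customer records; the field names (`name`, `zip`, `line_items`, `total`, `updated`), the sample values, and the 30-day freshness threshold are all made up for illustration. Accuracy is left out on purpose, since it usually requires comparing against an external source of truth.

```python
from datetime import date

# Hypothetical customer records; every field name and value is illustrative.
records = [
    {"id": 1, "name": "Anna Berg", "zip": "62156",
     "line_items": [10, 20], "total": 30, "updated": date(2024, 5, 1)},
    {"id": 2, "name": "", "zip": "62157",
     "line_items": [5, 5], "total": 10, "updated": date(2024, 5, 2)},
    {"id": 3, "name": "Anna Berg", "zip": "62156",
     "line_items": [7], "total": 9, "updated": date(2023, 1, 1)},
]

def quality_issues(rows, today, max_age_days=30):
    """Return (record id, violated characteristic) pairs."""
    issues = []
    seen = set()
    for r in rows:
        # Completeness: is a required value missing?
        if not r["name"]:
            issues.append((r["id"], "completeness"))
        # Consistency: does the total agree with the detailed line items?
        if sum(r["line_items"]) != r["total"]:
            issues.append((r["id"], "consistency"))
        # Uniqueness: has the same customer (name + zip) been seen before?
        key = (r["name"], r["zip"])
        if r["name"] and key in seen:
            issues.append((r["id"], "uniqueness"))
        seen.add(key)
        # Timeliness: is the record older than the assumed business requirement?
        if (today - r["updated"]).days > max_age_days:
            issues.append((r["id"], "timeliness"))
    return issues

issues = quality_issues(records, today=date(2024, 5, 10))
for record_id, problem in issues:
    print(record_id, problem)
```

With this sample data, record 2 is flagged as incomplete and record 3 as inconsistent, duplicated, and out of date.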
Information Quality
Low quality information example
Walk through each of the six issues and have your students extrapolate a potential business problem that might be associated with each issue. The example does not state what type of database or spreadsheet contains this information (sales, marketing, customer service, billing, etc.), so allow your students to use their imagination when they are extrapolating the potential business problems
Issue 1: Without a first name it would be impossible to correlate this customer with customers in other databases (Sales, Marketing, Billing, Customer Service) to gain a complete customer view (CRM)
Issue 2: Without a complete street address there is no possible way to communicate with this customer via mail or deliveries. An order might be sitting in a warehouse waiting for the complete address before shipping. The company has spent time and money processing an order that might never be completed
Issue 3: If this is the same customer, the company will waste money sending out two sets of promotions and advertisements to the same customer. It might also send two identical orders and have to incur the expense of one order being returned
Issue 4: This is a good example of where cleaning data is difficult because this may or may not be an error. There are many times when a phone and a fax have the same number. Since the phone number is also in the e-mail address field, chances are that the number is inaccurate
Issue 5: The business would have no way of communicating with this customer via e-mail
Issue 6: The company could determine the area code based on the customer’s address. This takes time, which costs the company money. This is a good reason to ensure that information is entered correctly the first time. All incorrect information needs to be fixed, which costs time and money
Understanding the Costs of Poor Information
The four primary sources of low quality information include:
Online customers intentionally enter inaccurate information to protect their privacy
Information from different systems has different entry standards and formats
Call center operators enter abbreviated or erroneous information by accident or to save time
Third party and external information contains inconsistencies, inaccuracies, and errors
Addressing the above sources of information inaccuracies will significantly improve the quality of organizational information
Determine a few additional sources of low quality information. For example, a customer service representative could accidentally transpose a number in an address or misspell a last name
Understanding the Costs of Poor Information
Potential business effects resulting from low quality information include:
Inability to accurately track customers
Difficulty identifying valuable customers
Inability to identify selling opportunities
Marketing to nonexistent customers
Difficulty tracking revenue due to inaccurate invoices
Inability to build strong customer relationships
Can you list any additional business effects resulting from poor information? (Focus on organizational strategies such as SCM, CRM, and ERP)
Poor information could cause the SCM system to order too much inventory from a supplier based on inaccurate orders
Poor information could cause a CRM system to send an expensive promotional item (such as a fruit basket) to the wrong address of one of its best customers
What occurs when you are unable to build strong customer relationships? Increased buyer power
Gartner podcasts are excellent course resources; there is currently a good podcast on the cost of poor data to an organization: http://www.gartner.com/it/products/podcasting/about_gartner_voice.jsp
Conceptual modeling
Objects, Relationships, Properties, Entities
So far we have discussed models in general. Now we will look in more detail at one particular kind of model, the so-called conceptual model (also called a concept model). Conceptual models try to describe a business using a small number of basic concepts: objects, relationships, attributes, and classes. An object is a thing or a phenomenon, e.g. a person, a car, or a building. Objects are related to one another; a person can, for example, own a car. Objects have various properties, which are here called attributes. A person can, for example, have a height or a hair colour. We also group similar objects into classes, e.g. all persons into a class that we can call Person. The following slides go into more detail on each of these basic concepts.
Objects
Phenomena of interest: concrete, abstract, non-existent
The most important basic concept in conceptual modeling is the object. An object is a thing, an occurrence, or a phenomenon that is of interest in a particular context. When modeling a business, concrete objects are often the ones identified first. Concrete objects are physical objects that can be seen and touched, for example a person such as Napoleon, a building such as the Eiffel Tower, or a particular car. But more abstract phenomena can also be objects: Beethoven's Fifth Symphony is an object, currencies are objects, numbers are objects. When modeling a business, it is usually obvious for certain phenomena that they should become objects in the model. If you model a library, for example, it is self-evident that books should be treated as objects. For many other phenomena, however, it is not at all obvious whether they should be treated as objects or not. You then simply have to decide whether to regard those phenomena as objects. What usually determines whether something becomes an object in the conceptual model is whether it has interesting properties or has many relationships to other objects.
Non-existent objects: a perpetual motion machine, the squaring of the circle
Relationships
Relationships describe connections between objects
Examples in the figure: has_father, has_mother, owns, married_to
This takes us to the next basic concept: the relationships between objects. Objects do not exist in isolation; they are always related to one another. A person can, for example, own a car. A person can also have several different types of relationships to other objects: a person can have a father, have a mother, and be married to another person, which are three different kinds of relationship. There can also be several relationships between the same two objects; for example, a person can both live in a house and own that house, so there are two different relationships between these two objects.
Sets/classes and attributes
Grouping of similar objects
Car attributes: reg. no, model year, colour
Person attributes: name, age, salary, hair colour
The last basic concept in conceptual modeling is the class. A class groups together a number of objects that resemble one another. We can, for example, gather cars and create a class for them that we call CAR. Or we can gather persons and form a class called PERSON. Classes help us structure the description of a business so that we get a better overview. The description cannot be made at such a low level that every individual object is visible in the model; we have to move up to a higher, more abstract level, and for that we need classes.
Example Conceptual Model
Classes: PERSON (name, age, salary, hair colour); CAR (reg.no., model year, colour)
Relationships: PERSON owns CAR; PERSON married to PERSON
We have now come far enough to draw a first simple conceptual model. Conceptual models are often expressed as graphs. You draw, for example, a rectangle for each class, write the attributes inside the rectangle, and draw arrows between the rectangles for the relationships. Our running example has been about persons and cars, so it seems natural to introduce two classes, two rectangles, one for PERSON and one for CAR. We then write the attributes of PERSON inside its rectangle, and do the same for the CAR class. Then it is time to draw the relationships as arrows. A person can own a car. A person can be married to another person.
It is worth reflecting a little on this model. Is it a good model? Is it useful? One strength of the model is that it quickly gives an overview of the business: you can see that what matters is persons and cars and nothing else. You can also see directly how persons and cars are related to one another and which properties they have. This is considerably easier to see than in a long natural-language text describing the same thing. Because the model is graphical and two-dimensional, it also works well as an aid to communication: it is easy for other people to understand the model and to criticize it, and it is easy to propose changes by drawing directly in the model.
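The PERSON/CAR model can also be written down in code. The following is a minimal sketch using Python dataclasses: each class becomes a dataclass, each attribute a field, and the two relationships ("owns" and "married to") become references between objects. The helper `marry` and all sample values are illustrative assumptions, not part of the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Car:
    # Attributes of the CAR class from the diagram
    reg_no: str
    model_year: int
    colour: str

@dataclass
class Person:
    # Attributes of the PERSON class from the diagram
    name: str
    age: int
    salary: int
    hair_colour: str
    # Relationship "owns": a person may own any number of cars
    owns: List[Car] = field(default_factory=list)
    # Relationship "married to": excluded from repr/compare to avoid cycles
    married_to: Optional["Person"] = field(default=None, repr=False, compare=False)

def marry(a: Person, b: Person) -> None:
    """Hypothetical helper that keeps the symmetric relationship in sync."""
    a.married_to = b
    b.married_to = a

# Made-up sample objects
car = Car(reg_no="ABC123", model_year=1995, colour="red")
kalle = Person("Kalle", 40, 30000, "brown", owns=[car])
ulla = Person("Ulla", 38, 32000, "blond")
marry(kalle, ulla)

print(kalle.name, "owns", [c.reg_no for c in kalle.owns])
print(kalle.name, "is married to", kalle.married_to.name)
```

The classes play the role of the rectangles, and the object references play the role of the arrows.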
Tables/files and databases
Table (register, file): Articles
Column (attribute, field); Row (record)
ArtNo  Name    Price  StockQty
1      Screw   1.50   1,000
2      Nut     1.20   300
3      Washer  0.50   850
DATABASE = all tables and relationships
Operations from programs:
New article (4, Nail, 0.70, 0)
New price (1, 1.60)
Quantity in stock (2)
Delete (3)
Tables in the diagram: ARTICLES, SUPPLIERS, ORDERS, CUSTOMERS, connected by the relationships CONSISTS-OF, SUPPLIES, INCLUDED-IN, and HAS
Example query: Which articles has customer no. 51 bought in the past year?
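The four operations listed on the slide map directly onto SQL statements. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`article`, `art_no`, `stock_qty`) are assumptions chosen for the example, and the article names are given in English (screw, nut, washer, nail) for the slide's Skruv, Mutter, Bricka, and Spik.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article ("
    "art_no INTEGER PRIMARY KEY, name TEXT, price REAL, stock_qty INTEGER)"
)
# The three rows shown on the slide
conn.executemany(
    "INSERT INTO article VALUES (?, ?, ?, ?)",
    [(1, "Screw", 1.50, 1000), (2, "Nut", 1.20, 300), (3, "Washer", 0.50, 850)],
)

# New article (4, Nail, 0.70, 0)
conn.execute("INSERT INTO article VALUES (?, ?, ?, ?)", (4, "Nail", 0.70, 0))

# New price (1, 1.60)
conn.execute("UPDATE article SET price = ? WHERE art_no = ?", (1.60, 1))

# Quantity in stock (2)
stock_of_2 = conn.execute(
    "SELECT stock_qty FROM article WHERE art_no = ?", (2,)
).fetchone()[0]

# Delete (3)
conn.execute("DELETE FROM article WHERE art_no = ?", (3,))

rows = conn.execute(
    "SELECT art_no, name, price, stock_qty FROM article ORDER BY art_no"
).fetchall()
print(stock_of_2)
print(rows)
```

Each operation addresses a row through the article number, which is why a unique key per row matters.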
Different types of tables
REGISTER        USE
PERSON          payroll calculation
CUSTOMER        invoicing
ARTICLE         inventory management
BOOKING         time and place booking
PHARMACEUTICAL  prescriptions
Key / Primary Key
A unique search term in a register: primary key, ID concept
Examples:
Personal identity number in a person register: PersonNo, Name, Address, Salary
Customer number in a customer register: CustomerNo, Name, Discount, Category
Avoid classifying (meaning-bearing) ID concepts; they often break down. Example: an ID of the form xxxxx xx xxx that encodes type of shoe, size, and colour code
What is my bonus? What is your insurance number?
An exercise
GUTE CYKELUTHYRNING (Gute Bicycle Rental) on sunny Gotland has 4,500 bicycles for rent. Administration has become so burdensome that the company has decided to acquire a PC. We will now design the system for them, starting with the database. The fundamental point is that each bicycle has an engraved frame number. When renting, the customer pays a deposit that varies between bicycles. On return, the system must print an itemized receipt. The cost is the bicycle's daily price times the number of days. Naturally, the bicycle register must be updated, for example when new bicycles are acquired. It must be possible to produce two lists:
1. Bicycles that are currently rented out, showing who is renting each one.
2. A complete list of all bicycles, showing the year of acquisition and the total number of days each bicycle has been rented since acquisition.
TASK: Write a record description for the bicycle register, which is to be the only register in the system.
Relationships in a database
Case 1 (one to one): A property can be owned by only one person, and a person can own only one property
person: PNR**, NAME, ADDRESS, FNR*
property: FNR**, MUNICIPALITY, PRICE, PNR*
Case 2 (one to many): A person can own several properties, but a property can have only one owner
person: PNR**, NAME, ADDRESS
property: FNR**, MUNICIPALITY, PRICE, PNR*
Case 3 (many to many): A person can own several properties, and a property can have several co-owners
person: PNR**, NAME, ADDRESS
property: FNR**, MUNICIPALITY, PRICE
** Primary key
* Secondary key, reference (foreign key)
Example of relationships: PERSON to PROPERTY, 1 - M
PERSON
PNR**  NAME   ADDRESS
1      KALLE  A-GATAN
2      PELLE  B-GATAN
3      NISSE  C-GATAN
4      ULLA   D-GATAN
PROPERTY
FNR**  MUNICIPALITY  PRICE  PNR*
A      STHLM         1,000  1
B      GBG           2,000  3
C      MALMÖ         1,500  1
D      LUND          5,000  4
M - M requires a relationship object
person: PNR**, NAME, ADDRESS
ownership: PNR*, FNR*, % SHARE
property: FNR**, MUNICIPALITY, PRICE
PERSON
PNR  NAME   ADDRESS
1    KALLE  A-Gatan
2    PELLE  B-Gatan
3    NISSE  C-Gatan
4    ULLA   D-Gatan
OWNERSHIP
PNR  FNR  % SHARE
1    A    100
1    D    50
4    D    50
PROPERTY
FNR  MUNICIPALITY  PRICE
A    STHLM         1,000
B    GBG           2,000
C    MALMÖ         1,000
D    LUND          2,000
Kalle owns all of A and half of D. Ulla owns half of D.
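The many-to-many case can be sketched as three tables, with the ownership table holding one row per person and property pair. The example below uses Python's built-in `sqlite3` module and the rows from the slide. The English table and column names (`person`, `property`, `ownership`, `share_pct`) are assumptions, and the prices are read as whole numbers (1.000 taken as one thousand).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the references
conn.executescript("""
CREATE TABLE person   (pnr INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE property (fnr TEXT PRIMARY KEY, municipality TEXT, price INTEGER);
-- Relationship object: one row per (person, property) pair
CREATE TABLE ownership (
    pnr       INTEGER REFERENCES person(pnr),
    fnr       TEXT    REFERENCES property(fnr),
    share_pct INTEGER,
    PRIMARY KEY (pnr, fnr)
);
INSERT INTO person VALUES
    (1, 'KALLE', 'A-Gatan'), (2, 'PELLE', 'B-Gatan'),
    (3, 'NISSE', 'C-Gatan'), (4, 'ULLA', 'D-Gatan');
INSERT INTO property VALUES
    ('A', 'STHLM', 1000), ('B', 'GBG', 2000),
    ('C', 'MALMÖ', 1000), ('D', 'LUND', 2000);
INSERT INTO ownership VALUES (1, 'A', 100), (1, 'D', 50), (4, 'D', 50);
""")

# Who owns what, and how large is each share?
owners = conn.execute("""
    SELECT p.name, o.fnr, o.share_pct
    FROM person AS p
    JOIN ownership AS o ON o.pnr = p.pnr
    ORDER BY p.name, o.fnr
""").fetchall()

# Which properties have more than one owner?
shared = conn.execute("""
    SELECT fnr, COUNT(*) FROM ownership GROUP BY fnr HAVING COUNT(*) > 1
""").fetchall()

print(owners)
print(shared)
```

The share percentage is stored in the ownership table because it describes the pair, not the person or the property alone.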
Junction table: one more example
Person (Pnr, Name); Company (Fnr, Name); Employment (Pnr, Fnr, Salary)
PERSON
Pnr  Name
1    Kalle
2    Pelle
3    Stina
4    Lisa
COMPANY
Fnr  Name
A    ABB
B    HM
C    Univ.
D    Ericsson
EMPLOYMENT
Pnr  Fnr  Salary
1    A    10,000
2    A    20,000
1    B    30,000
3    C    15,000
SALARY ????
Increased Flexibility A well-designed database should: Handle changes quickly and easily Provide users with different views Have only one physical view Physical view – deals with the physical storage of information on a storage device Have multiple logical views Logical view – focuses on how users logically access information The separation between logical and physical views is what allows each user to access database information differently What would happen if a new database called “RealData” hit the market and allowed only one logical view? The “RealData” database simply would never sell. With only one logical view every person in an entire organization would have the same view Define two database views for your school’s student database (one for students, and one for instructors) What does the student view display when a student accesses the school’s student database? Courses enrolled Grades Tuition Credits for graduation What does the instructor view display when an instructor accesses the school’s student database? Courses teaching Students in each course Payment information Vacation time
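Logical views can be illustrated with SQL views defined over a single stored table. The following is a minimal sketch using Python's built-in `sqlite3` module. The `enrollment` table, its columns, the view names, and the sample rows are all hypothetical and cover only part of what a real student database would hold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical table (hypothetical schema)
CREATE TABLE enrollment (student TEXT, course TEXT, instructor TEXT, grade TEXT);
INSERT INTO enrollment VALUES
    ('Ann', 'Databases', 'Dr. Lee', 'A'),
    ('Bo',  'Databases', 'Dr. Lee', 'B'),
    ('Ann', 'Networks',  'Dr. Roy', 'B');

-- Logical view for students: courses enrolled and grades
CREATE VIEW student_view AS
    SELECT student, course, grade FROM enrollment;

-- Logical view for instructors: courses taught and class size
CREATE VIEW instructor_view AS
    SELECT instructor, course, COUNT(*) AS students
    FROM enrollment
    GROUP BY instructor, course;
""")

ann = conn.execute(
    "SELECT course, grade FROM student_view WHERE student = 'Ann' ORDER BY course"
).fetchall()
lee = conn.execute(
    "SELECT course, students FROM instructor_view WHERE instructor = 'Dr. Lee'"
).fetchall()
print(ann)
print(lee)
```

Both views read the same stored rows; only the presentation differs, which is the separation between physical and logical views described above.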
Increased Scalability and Performance
A database must scale to meet increased demand, while maintaining acceptable performance levels
Scalability – refers to how well a system can adapt to increased demands
Performance – measures how quickly a system performs a certain process or transaction
What happens to a business if it suddenly experiences 60 percent growth in sales and its IT systems fail under the increased activity?
Remind your students that a big part of developing successful IT systems is being able to anticipate future growth
CLASSROOM EXERCISE: Building an ER Diagram
Break your students into groups and ask them to create an entity relationship diagram similar to the one in Figure 6.5 for a company or product of their choice. If the students are uncomfortable with databases, you should recommend that they stick to a company similar to the TCCBCE, perhaps a snack food producer, mountain bike equipment producer, or even a footwear producer. If your students are more comfortable with databases, ask them to choose a company that would challenge them such as a fast food restaurant, online book seller, or even a university’s course registration system. The important part of this exercise is for your students to begin to understand how the tables in a database relate. Be sure their ER diagrams include primary keys and foreign keys. Have your students present their ER diagrams to the class and ask the students to find any potential errors with the diagrams.
Reduced Redundancy Databases reduce information redundancy Redundancy – the duplication of information or storing the same information in multiple places Inconsistency is one of the primary problems with redundant information One of the primary goals of a database is to eliminate information redundancy by recording each piece of information in only one place This is a good time to tie the discussion back to the material in the previous chapter, low quality information Recall what happens when a single customer is stored twice with different phone numbers, addresses, or order information in a single database
Increased Security
Information is an organizational asset and must be protected
Databases offer several security features including:
Password – provides authentication of the user
Access level – determines who has access to the different types of information
Access control – determines types of user access, such as read-only access
Why would you want to define access level security?
Access levels will typically mimic the hierarchical structure of the organization and protect organizational information from being viewed and manipulated by individuals who should not have access to sensitive or confidential information
Low level employees typically have the lowest levels of access
High level employees typically have access to all types of database information
For example, you would not want analysts viewing all salary information for the entire company. In general:
Analysts can usually only view their own salary
Managers have higher access and can view the salaries of all their team members, but cannot view other managers’ salaries
Directors can view all of their managers’ and analysts’ salaries, but not other directors’ salaries
The CFO and CEO can view every employee’s salary
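The salary example can be expressed as a small access-level check. The following is a minimal sketch in plain Python, assuming a hypothetical organization chart (`ORG`) in which each employee has a role and a direct manager; the employee identifiers and the helper names `reports_to` and `can_view_salary` are made up for illustration. A real DBMS would enforce such rules with its own permission features rather than application code.

```python
# Hypothetical org chart: employee -> (role, direct manager)
ORG = {
    "ceo":  ("ceo", None),
    "dir1": ("director", "ceo"),
    "dir2": ("director", "ceo"),
    "mgr1": ("manager", "dir1"),
    "mgr2": ("manager", "dir1"),
    "ana1": ("analyst", "mgr1"),
    "ana2": ("analyst", "mgr2"),
}

def reports_to(employee, boss):
    """True if boss appears anywhere above employee in the reporting chain."""
    current = ORG[employee][1]
    while current is not None:
        if current == boss:
            return True
        current = ORG[current][1]
    return False

def can_view_salary(viewer, target):
    role = ORG[viewer][0]
    # Everyone sees their own salary; the CEO and CFO see every salary.
    if viewer == target or role in ("ceo", "cfo"):
        return True
    # Managers and directors see only people in their own reporting line.
    if role in ("manager", "director"):
        return reports_to(target, viewer)
    # Analysts see nothing beyond their own salary.
    return False

for viewer, target in [("ana1", "ana2"), ("mgr1", "ana1"),
                       ("mgr1", "mgr2"), ("dir1", "ana2"), ("dir1", "dir2")]:
    print(viewer, "->", target, can_view_salary(viewer, target))
```

Deriving access from the reporting chain mirrors the point that access levels typically follow the hierarchical structure of the organization.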
Potential relational database for Coca-Cola
Walk your students through the relational database model in Figure 6.5
To ensure your students are grasping the concepts, ask them to answer the following:
How many orders have been placed for T’s Fun Zone? Ans: 1 Order IT 34563
How many orders have been placed for Pizza Palace? Ans: None
How many items are included in Dave’s Sub Shop’s two orders? Ans: Order 34561 has 3 items and order 34562 has one item for a total of 4 items in both orders.
Who is responsible for distributing Dave’s Sub Shop’s orders? Ans: Hawkins Shipping
Which products are included in Order 34562? Ans: 300 Vanilla Coke
Figure labels: Primary Key, Foreign Key
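Operations on a register
Pull: ad hoc or predefined queries
READ ONE OR MORE RECORDS: from the insurance register, e.g. insurance no. 7 (FÖRSNR 7) or all insurance policies
SELECT: e.g. all articles that begin with S (1 SKRUV, 5 SKIVA)
UPDATE: new customer, delete customer, change of address
Push
MATCH: e.g. the person register (PERS.REG) against the social insurance office (F-KASSA) for housing allowance (BOSTADSBIDRAG), comparing income data
SORT: the same article list (NR, ART) ordered by number (1 SKRUV, 2 MUTTER, 3 VINKEL, 4 BRÄDA, 5 SKIVA) and by article name (4 BRÄDA, 2 MUTTER, 5 SKIVA, 1 SKRUV, 3 VINKEL)
Figure labels: engine (Motor), speedometer (Hastighetsmätare)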
Database management systems Database management systems (DBMS) – software through which users and application programs interact with a database Discuss the two primary forms of user interaction with a database Direct interaction – The user interacts directly with the DBMS The DBMS obtains the information from the database Indirect interaction User interacts with an application (i.e., payroll application, manufacturing application, sales application) The application interacts with the DBMS
Database management systems
Four components of a DBMS
The components of the DBMS are discussed in detail on the following slides
A DBMS contains:
Data definition component – helps create and maintain the data dictionary and the structure of the database
Data manipulation component – allows users to create, read, update, and delete information in a database
Application generation component – includes tools for creating visually appealing and easy-to-use applications
Data administration component – provides tools for managing the overall database environment by providing facilities for backup, recovery, security, and performance
Data Definition Component Data dictionary essentially defines the logical properties of the information that the database contains Logical properties displayed in the figure vary depending on the type of information A typical address field can accept numbers, letters, and special characters (Relational integrity constraint) The validation rule requiring that a discount cannot exceed 100 percent (Business-critical integrity constraint) CLASSROOM EXERCISE Properties of Logic Break your students into groups and assign a different logical property from the above figure to each group (Do not assign the Field Name property) Ask your students to define a few additional examples of the logical property they were assigned Have your students present their answers to the entire class
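The discount rule can be declared as part of the table definition so that the database itself rejects invalid values. The following is a minimal sketch using Python's built-in `sqlite3` module. The `customer` table, its columns, and the sample rows are hypothetical; only the rule that a discount cannot exceed 100 percent comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer (
    customer_id  INTEGER PRIMARY KEY,
    -- Address field: free text, so numbers, letters, and special characters fit
    address      TEXT NOT NULL,
    -- Validation rule: a discount must lie between 0 and 100 percent
    discount_pct REAL NOT NULL DEFAULT 0
        CHECK (discount_pct >= 0 AND discount_pct <= 100)
)
""")

# A valid row is accepted
conn.execute("INSERT INTO customer VALUES (1, '12 Main St. #4', 15)")

# A discount above 100 percent violates the CHECK constraint
try:
    conn.execute("INSERT INTO customer VALUES (2, '9 Elm Rd', 120)")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(count)
```

Putting the rule in the data definition means every application that writes to the table is held to the same constraint.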
Data Manipulation Component Data manipulation component – allows users to create, read, update, and delete information in a database A DBMS contains several data manipulation tools: View – allows users to see, change, sort, and query the database content Report generator – users can define report formats Query-by-example (QBE) – users can graphically design the answers to specific questions Structured query language (SQL) – query language Views and report generators are the most common data manipulation tools used by non-IT personnel A query is a simple question such as “How many orders were placed today?” QBE tools are popular because users manipulate a drag-and-drop GUI to graphically build a question There are many different types of QBE tools including BRIO What benefits might you receive from using a tool such as BRIO? Without knowing a QBE tool, a person will have to wait for someone else to gather the information to answer their questions This could take days in some organizations SQL is rarely used by non-IT personnel
Data Manipulation Component Sample report using Microsoft Access Report Generator The above figure displays a sample report created with Microsoft Access An IT specialist is typically required to build the report Once the report is built, any user can run the report as frequently as they wish by simply clicking on a button
Data Manipulation Component
Sample report using Access Query-By-Example (QBE) tool
The above figure displays a QBE graphical query, with call-outs that explain the fields in the query
The results of the query are displayed in the figure on the next slide
Data Manipulation Component Results from the query in Figure 6.10 Explain to your students that the results from this query have not been placed in a report, hence they are not formatted and do not look visually appealing If this query was built into a report, the user could simply run the report, which would run the query, and display the results in a nice formatted report
SQL: Structured Query Language
Commands in programs, used for data access and manipulation
Insert new data: INSERT (a person)
Change data: UPDATE (an address)
Delete data: DELETE (a person)
Retrieve data: SELECT (all persons who were born in Vingåker and who have been prime ministers, etc.)
SQL examples
(Medlem = member; fnamn, enamn = first name, last name; gadr = street address; lön = salary)
INSERT INTO Medlem (medlemsnr, fnamn, enamn, gadr, lön) VALUES (10004, 'Anna', 'Andersson', 'Kungsgatan 3', 40000);
DELETE FROM Medlem WHERE medlemsnr = 10005;
UPDATE Medlem SET gadr = 'Byggvägen 3' WHERE medlemsnr = 10004;
SELECT fnamn, enamn FROM Medlem WHERE lön > 1000000;
SQL: slightly more advanced
KUND_TABELL (customer table): KundNr, KundNamn, KundKategori
ORDER_TABELL (order table): OrderNr, KundNr, OrderBelopp
SELECT KundNamn, OrderNr
FROM KUND_TABELL, ORDER_TABELL
WHERE OrderBelopp > 10000 AND KundKategori IN (7, 11)   -- condition
AND KUND_TABELL.KundNr = ORDER_TABELL.KundNr   -- join
ORDER BY KundNamn
Results in:
KundNamn   OrderNr
Abelén     4711
Adolfsson  193
Bengtsson  10023
...
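The join query can be tried out with Python's built-in `sqlite3` module. The following is a minimal sketch: the column names and the three result rows follow the slide, while the table names `kund` and `kundorder`, the order amounts, the customer categories, and the extra customer Carlsson are made-up sample data chosen so that some rows are filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kund (
    KundNr INTEGER PRIMARY KEY, KundNamn TEXT, KundKategori INTEGER);
CREATE TABLE kundorder (
    OrderNr INTEGER PRIMARY KEY,
    KundNr INTEGER REFERENCES kund(KundNr),
    OrderBelopp INTEGER);
-- Sample data (amounts and categories are illustrative assumptions)
INSERT INTO kund VALUES
    (1, 'Abelén', 7), (2, 'Adolfsson', 11), (3, 'Bengtsson', 7), (4, 'Carlsson', 3);
INSERT INTO kundorder VALUES
    (4711, 1, 25000), (193, 2, 12000), (10023, 3, 50000),
    (500, 3, 900),    -- too small: filtered out by the amount condition
    (77, 4, 99000);   -- wrong category: filtered out by the category condition
""")

rows = conn.execute("""
    SELECT k.KundNamn, o.OrderNr
    FROM kund AS k
    JOIN kundorder AS o ON k.KundNr = o.KundNr   -- join
    WHERE o.OrderBelopp > 10000                  -- condition on the order
      AND k.KundKategori IN (7, 11)              -- condition on the customer
    ORDER BY k.KundNamn
""").fetchall()

for name, order_no in rows:
    print(name, order_no)
```

The explicit `JOIN ... ON` form is equivalent to listing both tables in `FROM` and putting the key comparison in `WHERE`; it only separates the join condition from the filter conditions.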
Application Generation and Data Administration Components
Application generation component – includes tools for creating visually appealing and easy-to-use applications
Data administration component – provides tools for managing the overall database environment by providing facilities for backup, recovery, security, and performance
IT specialists primarily use these components
IT specialists directly interact with the data administration component
Typically, higher level individuals oversee the use of the data administration component
For example, the CPO is responsible for ensuring the ethical and legal use of information; therefore, he or she would direct the use of the security features of the data administration component and implement policies and procedures concerning who has access to different types of information
Integrating data among multiple databases Integration – allows separate systems to communicate directly with each other Forward integration – takes information entered into a given system and sends it automatically to all downstream systems and processes Backward integration – takes information entered into a given system and sends it automatically to all upstream systems and processes One of the biggest benefits of integration is that organizations only have to enter information into the systems once and it is automatically sent to all of the other systems throughout the organization This feature alone creates huge advantages for organizations because it reduces information redundancy and ensures accuracy and completeness Without integrations an organization would have to enter information into every single system that requires the information from marketing and sales to billing and customer service For example, customer information would have to be manually entered into the marketing, sales, ordering, inventory, billing, and shipping databases. (Each of these systems are separate and would have their own database – if the company doesn’t have a complete ERP installed.) Entering the same customer information into multiple systems is redundant, and chances of making a mistake in one of the systems is high Integrations offer many advantages, but for the most part, the automated flow of information among separate systems is the biggest benefit
Integrating data among multiple databases Forward and backward integration Identify the arrows along the top of the figure when explaining forward integrations Basically, all information flows forward along the business process Sales enters the information when it is negotiating the sale (looking for opportunities) The information is then passed to the order entry system when the order is actually placed The order fulfillment system picks the products from the warehouse, packs the products, labels boxes, etc Once the order is filled and shipped, the customer is billed What would happen if users could enter order information directly into the billing system? The systems would quickly become out-of-sync. There might be bills for nonexistent orders, or orders that do not have any bills (if someone deleted a bill) For this reason organizations typically place a business-critical integrity constraint on integrated systems: With a forward integration the information must be entered in the sales system, you could not enter information directly into the billing system Integrations are expensive to build and maintain Integrations are difficult to implement For these reasons many organizations only build forward integrations and use business-critical integrity constraints to ensure all information is always entered only at the start of the integration (one source of record) Identify the arrows along the bottom of the figure when explaining backward integrations Basically, all information flows backward along the business process Billing enters information and this information is passed back to the order system The order fulfillment system passes the information back to the order entry system The order entry system passes the information back to the sales system Why would an organization want to build both forward and backward integrations? 
This allows users to enter information at any point in the business process and the information is automatically sent upstream and downstream to all other systems For example, if order fulfillment determined that they could not fulfill an order (the product had been discontinued), they could simply enter this information into the database and it would be sent automatically upstream to the sales representative who could contact the customer and downstream to billing to remove the item from the bill
Integrating data among multiple databases Building a central repository specifically for integrated information The above figure displays an example of customer information integrated using this method Users can create, read, update, and delete in the main customer repository, and it is automatically sent to all of the other databases This method does not follow the business process when building the integrations Business-critical integrity constraints still need to be built to ensure information is only ever entered into the customer repository, otherwise the information will become out-of-sync
Integrating systems. [Figure: Company A (systems A1, A2, A3), Company B (systems B1, B2, B3), and Company C (systems C1, C2) are connected through a system integrator that handles formats, timing, rules/controls, logging, history, backup, etc.] This is done instead of rebuilding the different systems.
Data modeling exercise

INVOICE, BILVERKSTADEN AB
Job no.: 1    Customer no.: 2    Name: Olle    Address: O-gatan
Invoice no.: 2453    Reg. no.: ABC123    Date: 95-01-10

Work (predefined)
No.  Name                                   Unit price  Qty  Amount
A1   Byta avgasrör (replace exhaust pipe)   1,000 kr    1    1,000 kr

Materials (predefined)
No.  Name                      Unit price  Qty  Amount
M1   Avgasrör (exhaust pipe)   500 kr      1    500 kr
M2   Fäste (bracket)           10 kr       2    20 kr
M3   Skruv (screw)             1 kr        4    4 kr
M4   Mutter (nut)              2 kr        4    8 kr

Sum 532 kr    VAT 133 kr    Amount due 1,665 kr

Create a data model for this business.
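Solution

KUND (customer): KNR**, NAMN, ADRESS
UPPDRAG (job): UNR**, KNR*, DATUM, REGNR, FAKTNR
ARBETEN (work): ANR**, NAMN, Á-PRIS
MATERIAL (materials): MNR**, NAMN, Á-PRIS
ARBETEN I UPPDRAG (work in job): UNR*, ANR*, ANTAL
MATERIAL I UPPDRAG (materials in job): UNR*, MNR*, ANTAL

Relationships (1 to M):
KUND (1) to UPPDRAG (M)
UPPDRAG (1) to ARBETEN I UPPDRAG (M)
ARBETEN (1) to ARBETEN I UPPDRAG (M)
UPPDRAG (1) to MATERIAL I UPPDRAG (M)
MATERIAL (1) to MATERIAL I UPPDRAG (M)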
Solution, continued

KUND
KNR  NAMN    ADRESS
1    KALLE   K-GATAN
2    OLLE    O-GATAN
3    ULLA    U-GATAN
4    SIV     S-GATAN
5    INGVAR  I-GATAN

UPPDRAG
UNR  KNR  DATUM   REGNR   FAKTNR
001  2    950110  ABC123  2453
002  4    950111  XYZ789  2454

ARBETEN
ANR  NAMN            Á-PRIS
A1   BYTA AVGASRÖR   1.000
A2   BYTA TÄNDSTIFT  200
A3   SERVICE         1.500
A4   MOTORTVÄTT      600

Solution, continued

MATERIAL
MNR  NAMN        Á-PRIS
M1   AVGASRÖR    500
M2   FÄSTE       10
M3   SKRUV       1
M4   MUTTER      2
M5   TÄNDSTIFT   10
M6   TVÄTTMEDEL  100

ARBETEN I UPPDRAG
UNR  ANR  ANTAL
001  A1   1
002  A4   1

MATERIAL I UPPDRAG
UNR  MNR  ANTAL
001  M1   1
001  M2   2
001  M3   4
001  M4   4
002  M6   1
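The solution can be tried out with Python's built-in `sqlite3` module. The sketch below is one possible rendering, not the official answer: the column types, the ASCII column name `A_PRIS` (standing in for Á-PRIS), and the reading of `**` as primary key and `*` as foreign key are assumptions. Only the rows needed for job 001 are loaded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE KUND     (KNR INTEGER PRIMARY KEY, NAMN TEXT, ADRESS TEXT);
CREATE TABLE UPPDRAG  (UNR TEXT PRIMARY KEY,
                       KNR INTEGER REFERENCES KUND(KNR),
                       DATUM TEXT, REGNR TEXT, FAKTNR INTEGER);
CREATE TABLE ARBETEN  (ANR TEXT PRIMARY KEY, NAMN TEXT, A_PRIS INTEGER);
CREATE TABLE MATERIAL (MNR TEXT PRIMARY KEY, NAMN TEXT, A_PRIS INTEGER);
CREATE TABLE ARBETEN_I_UPPDRAG  (UNR TEXT REFERENCES UPPDRAG(UNR),
                                 ANR TEXT REFERENCES ARBETEN(ANR),
                                 ANTAL INTEGER, PRIMARY KEY (UNR, ANR));
CREATE TABLE MATERIAL_I_UPPDRAG (UNR TEXT REFERENCES UPPDRAG(UNR),
                                 MNR TEXT REFERENCES MATERIAL(MNR),
                                 ANTAL INTEGER, PRIMARY KEY (UNR, MNR));
""")

# Rows for job 001, taken from the sample tables in the solution.
con.execute("INSERT INTO KUND VALUES (2, 'OLLE', 'O-GATAN')")
con.execute("INSERT INTO UPPDRAG VALUES ('001', 2, '950110', 'ABC123', 2453)")
con.execute("INSERT INTO ARBETEN VALUES ('A1', 'BYTA AVGASRÖR', 1000)")
con.executemany("INSERT INTO MATERIAL VALUES (?, ?, ?)",
                [("M1", "AVGASRÖR", 500), ("M2", "FÄSTE", 10),
                 ("M3", "SKRUV", 1), ("M4", "MUTTER", 2)])
con.execute("INSERT INTO ARBETEN_I_UPPDRAG VALUES ('001', 'A1', 1)")
con.executemany("INSERT INTO MATERIAL_I_UPPDRAG VALUES ('001', ?, ?)",
                [("M1", 1), ("M2", 2), ("M3", 4), ("M4", 4)])

# Invoice lines are derived by joining, not stored: unit price * quantity.
material_total = con.execute("""
    SELECT SUM(m.A_PRIS * mu.ANTAL)
    FROM MATERIAL_I_UPPDRAG mu JOIN MATERIAL m ON m.MNR = mu.MNR
    WHERE mu.UNR = '001'
""").fetchone()[0]

work_total = con.execute("""
    SELECT SUM(a.A_PRIS * au.ANTAL)
    FROM ARBETEN_I_UPPDRAG au JOIN ARBETEN a ON a.ANR = au.ANR
    WHERE au.UNR = '001'
""").fetchone()[0]
```

The materials query reproduces the 532 kr on the invoice, which is a quick check that the model holds everything the invoice needs without storing any amount twice.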
6.2. DATA WAREHOUSE FUNDAMENTALS Data warehouses extend the transformation of data into information. In the 1990s executives became less concerned with the day-to-day business operations and more concerned with overall business functions. The data warehouse provided the ability to support decision making without disrupting the day-to-day operations. CLASSROOM OPENER: GREAT BUSINESS DECISIONS – Bill Inmon – The Father of the Data Warehouse. Bill Inmon is recognized as the "father of the data warehouse" and co-creator of the "Corporate Information Factory." He has 35 years of experience in database technology management and data warehouse design. He is known globally for his seminars on developing data warehouses and has been a keynote speaker for every major computing association and many industry conferences, seminars, and tradeshows. As an author, Bill has written about a variety of topics on the building, usage, and maintenance of the data warehouse and the Corporate Information Factory. He has written more than 650 articles, many of which have been published in major computer journals such as Datamation, ComputerWorld, DM Review, and Byte Magazine. Bill currently publishes a free weekly newsletter for the Business Intelligence Network and has been a major contributor since its inception. http://www.b-eye-network.com/home/
Data warehouse fundamentals Data warehouse – a logical collection of information – gathered from many different operational databases – that supports business analysis activities and decision-making tasks The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes What is the primary difference between a database and data warehouse? The primary difference between a database and a data warehouse is that a database stores information for a single application, whereas a data warehouse stores information from multiple databases, or multiple applications, and external information such as industry information This enables cross-functional analysis, industry analysis, market analysis, etc., all from a single repository Data warehouses support only analytical processing (OLAP)
Data Warehouse
Data warehouse fundamentals Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse Data mart – contains a subset of data warehouse information The ETL process gathers data from the internal and external databases and passes it to the data warehouse The ETL process also gathers data from the data warehouse and passes it to the data marts
Data warehouse fundamentals The data warehouse modeled in the above figure compiles information from internal (transactional/operational) databases and external databases through ETL. It then sends subsets of information to the data marts through the ETL process. Ask your students to distinguish between a data warehouse and a data mart. Ans: A data warehouse has an enterprisewide organizational focus, while a data mart focuses on a subset of information for a given business unit such as finance.
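The two ETL passes (sources to warehouse, warehouse to data mart) can be illustrated with a small Python sketch. All data, field names, and the "common enterprise definition" used here are invented for the example; a real ETL tool would read from actual databases.

```python
# Hypothetical source extracts with different layouts and formats.
internal_rows = [
    {"cust": "olle", "amount": "1 665", "dept": "service"},
    {"cust": "siv", "amount": "600", "dept": "finance"},
]
external_rows = [
    {"Customer": "ACME", "Amount": 250.0, "Unit": "Finance"},
]


def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    return ([("internal", r) for r in internal_rows]
            + [("external", r) for r in external_rows])


def transform(source, row):
    """Transform: map each layout onto one common (assumed) definition."""
    if source == "internal":
        return {"customer": row["cust"].title(),
                "amount": int(row["amount"].replace(" ", "")),
                "unit": row["dept"]}
    return {"customer": row["Customer"].title(),
            "amount": int(row["Amount"]),
            "unit": row["Unit"].lower()}


def load(rows, target):
    """Load: append the cleaned rows to the target store."""
    target.extend(rows)


# First pass: internal and external databases -> data warehouse.
warehouse = []
load([transform(source, row) for source, row in extract()], warehouse)

# Second pass: data warehouse -> finance data mart (a subset).
finance_mart = []
load([row for row in warehouse if row["unit"] == "finance"], finance_mart)
```

The warehouse ends up holding every row in one format, while the finance mart holds only the rows relevant to that business unit.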
Multidimensional Analysis Cube – common term for the representation of multidimensional information. Users can slice and dice the cube to drill down into the information. Cube A represents store information (the layers), product information (the rows), and promotion information (the columns). Cube B represents a slice of information displaying promotion II for all products at all stores. Cube C represents a slice of information displaying promotion III for product B at store 2. CLASSROOM EXERCISE: Analyzing Multiple Dimensions of Information. Jump! is a company that specializes in making sports equipment, primarily basketballs, footballs, and soccer balls. The company currently sells to four primary distributors and buys all of its raw materials and manufacturing materials from a single vendor. Break your students into groups and ask them to develop a single cube of information that would give the company the greatest insight into its business (or business intelligence) given the following choices: Product (A, B, C, and D); Distributor (X, Y, and Z); Promotion (I, II, and III); Sales; Season; Date/Time; Salesperson (Karen and John); Vendor (Smithson). Remember that the groups can pick only 3 dimensions of information for the cube, so they need to pick the best 3. Product, Promotion. These give the three most business-critical pieces of information.
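Slicing a cube can be shown with a plain Python dictionary keyed by (store, product, promotion). The dimension values and the cell contents below are placeholders chosen for the example; only the slicing logic matters.

```python
import itertools

stores = ["store 1", "store 2", "store 3"]   # layers
products = ["A", "B", "C"]                   # rows
promotions = ["I", "II", "III"]              # columns

# Cube A: one cell per (store, product, promotion); values are placeholders.
cube = {key: number for number, key in enumerate(
    itertools.product(stores, products, promotions), start=1)}


def slice_cube(cube, store=None, product=None, promotion=None):
    """Keep only the cells matching every dimension value that was given."""
    wanted = (store, product, promotion)
    return {key: value for key, value in cube.items()
            if all(w is None or w == k for w, k in zip(wanted, key))}


# Cube B: promotion II for all products at all stores.
cube_b = slice_cube(cube, promotion="II")

# Cube C: promotion III for product B at store 2.
cube_c = slice_cube(cube, store="store 2", product="B", promotion="III")
```

Fixing one dimension leaves a 3 x 3 slice of nine cells; fixing all three leaves a single cell.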
Information Cleansing or Scrubbing Information cleansing allows an organization to fix these types of inconsistencies and cleans the data in the data warehouse
Multidimensional Analysis Data mining – the process of analyzing data to extract information not offered by the raw data alone. To perform data mining users need data-mining tools. Data-mining tool – uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making. Data mining can begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down), or the reverse (drilling up). Data-mining tools include query tools, reporting tools, multidimensional analysis tools, statistical tools, and intelligent agents. Ask your students to provide an example of what an accountant might discover through the use of data-mining tools. Ans: An accountant could drill down into the details of all expenses and revenues and find valuable business intelligence, from which employees are spending the most money on long-distance phone calls to which customers are returning the most products. Could the data warehousing team at Enron have discovered the accounting inaccuracies that caused the company to go bankrupt? If they did spot them, what should the team have done?
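Drilling down and drilling up amount to aggregating the same records at different granularities. The following sketch uses made-up expense records and a hypothetical helper `roll_up`; it is meant only to make the two directions concrete.

```python
from collections import defaultdict

# Made-up expense records: (department, employee, amount).
expenses = [
    ("sales", "Karen", 120),
    ("sales", "John", 80),
    ("sales", "Karen", 40),
    ("finance", "Ulla", 60),
]


def roll_up(rows, level):
    """Total the amounts per group; level 1 = department, 2 = employee."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[:level]] += row[-1]
    return dict(totals)


summary = roll_up(expenses, 1)   # coarse granularity
detail = roll_up(expenses, 2)    # drilling down to employee level

# Drilling up: the detailed totals add back to the department summary.
sales_from_detail = sum(amount for key, amount in detail.items()
                        if key[0] == "sales")
```

The summary shows that sales spends the most; the detail shows which employee accounts for most of it.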
Business intelligence Business intelligence – information that people use to support their decision-making efforts. Principal BI enablers include: Technology, People, Culture. Technology: Even the smallest company with BI software can do sophisticated analyses today that were unavailable to the largest organizations a generation ago. The largest companies today can create enterprisewide BI systems that compute and monitor metrics on virtually every variable important for managing the company. How is this possible? The answer is technology, the most significant enabler of business intelligence. People: Understanding the role of people in BI allows organizations to systematically create insight and turn these insights into actions. Organizations can improve their decision making by having the right people making the decisions. This usually means a manager who is in the field and close to the customer rather than an analyst rich in data but poor in experience. In recent years “business intelligence for the masses” has been an important trend, and many organizations have made great strides in providing sophisticated yet simple analytical tools and information to a much larger user population than previously possible. Culture: A key responsibility of executives is to shape and manage corporate culture. The extent to which the BI attitude flourishes in an organization depends in large part on the organization’s culture. Perhaps the most important step an organization can take to encourage BI is to measure the performance of the organization against a set of key indicators. The actions of publishing what the organization thinks are the most important indicators, measuring these indicators, and analyzing the results to guide improvement display a strong commitment to BI throughout the organization.
Data mining Common forms of data-mining analysis capabilities include: Cluster analysis Association detection Statistical analysis Can you explain the difference between cluster analysis, association detection, and statistical analysis? Cluster analysis - a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible Association detection – reveals the degree to which variables are related and the nature and frequency of these relationships in the information Statistical analysis – performs such functions as information correlations, distributions, calculations, and variance analysis Cluster analysis, association detection, and statistical analysis are covered in detail over the next few slides
Cluster Analysis Cluster analysis – a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible. CRM systems depend on cluster analysis to segment customer information and identify behavioral traits. Some examples of cluster analysis include: consumer goods by content, brand loyalty, or similarity; product market typology for tailoring sales strategies; retail store layouts and sales performances; corporate decision strategies using social preferences; control, communication, and distribution of organizations; industry processes, products, and materials; design of assembly line control functions; character recognition logic in OCR readers; database relationships in management information systems.
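One common way to perform cluster analysis is k-means, shown here on one-dimensional data. This is an illustrative choice, not a technique named in the slides, and the customer spending figures and starting centers are invented.

```python
def k_means_1d(values, centers, iterations=10):
    """Plain k-means on numbers.

    Each value joins the group of its nearest center, then each center
    moves to the mean of its group. Every value lands in exactly one
    group, so the groups are mutually exclusive.
    """
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # An empty group keeps its previous center.
        centers = [sum(group) / len(group) if group else center
                   for group, center in zip(groups, centers)]
    return groups, centers


# Made-up annual spend per customer, with two rough starting guesses.
annual_spend = [120, 150, 130, 900, 950, 1000]
groups, centers = k_means_1d(annual_spend, centers=[0, 500])
```

With this data the customers fall into a low-spending and a high-spending segment, the kind of grouping a CRM system could use for targeted offers.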
Association Detection Association detection – reveals the degree to which variables are related and the nature and frequency of these relationships in the information Market basket analysis – analyzes such items as Web sites and checkout scanner information to detect customers’ buying behavior and predict future behavior by identifying affinities among customers’ choices of products and services Maytag uses association detection to ensure that each generation of appliances is better than the previous generation Maytag’s warranty analysis tool automatically detects potential issues, provides quick and easy access to reports, and performs multidimensional analysis on all warranty information
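Market basket analysis is often summarized with two measures, support and confidence. The sketch below computes both for item pairs; the baskets are made up, and the function names are chosen for the example.

```python
from collections import Counter
from itertools import combinations

# Made-up checkout baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

# How often each item, and each pair of items, appears in a basket.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(pair for basket in baskets
                      for pair in combinations(sorted(basket), 2))


def support(a, b):
    """Share of all baskets that contain both a and b."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)


def confidence(a, b):
    """Share of the baskets containing a that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]


bread_butter_support = support("bread", "butter")
butter_then_bread = confidence("butter", "bread")
bread_then_butter = confidence("bread", "butter")
```

Here every basket with butter also contains bread, while only two of the three bread baskets contain butter, so the affinity is stronger in one direction than the other.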
Statistical Analysis Statistical analysis – performs such functions as information correlations, distributions, calculations, and variance analysis Forecast – predictions made on the basis of time-series information Time-series information – time-stamped information collected at a particular frequency Kraft uses statistical analysis to assure consistent flavor, color, aroma, texture, and appearance for all of its lines of foods Kraft evaluates every manufacturing procedure, from recipe instructions to cookie dough shapes and sizes to ensure that the billions of Kraft products that reach consumers each year taste great (and the same) with every bite Nestle Italiana uses data mining and statistical analysis to determine production forecasts for seasonal confectionery products The company’s data-mining solution gathers, organizes, and analyzes massive volumes of information to produce powerful models that identify trends and predict confectionery sales
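A forecast from time-series information can be as simple as a moving average. The sketch below is one elementary technique chosen for illustration, not the method used by the companies mentioned; the monthly figures are invented.

```python
import statistics


def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window


# Made-up time-series information: one sales figure per month.
monthly_sales = [100, 110, 120, 130, 140, 150]

average_sales = statistics.mean(monthly_sales)
next_month = moving_average_forecast(monthly_sales)
```

Because the forecast uses only the last three months, it follows the recent upward trend more closely than the overall average does.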