
1 Association Mining Data Mining Spring 2012

2 Transactional Database Transaction - a row in the database, e.g. {Eggs, Cheese, Milk}. Transactional dataset: {Eggs, Cheese, Milk}, {Jam}, {Cheese, Bacon, Eggs, Cat food}, {Butter, Bread}, {Butter, Eggs, Milk, Cheese}

3 Items and Itemsets Item - a single element, e.g. {Milk}, {Cheese}, {Bread}. Itemset - a set of items, e.g. {Milk}, {Milk, Cheese}, {Bacon, Bread, Milk}. An itemset doesn't have to appear in the dataset and can be of any size from 1 to n. Transactional dataset: {Eggs, Cheese, Milk}, {Jam}, {Cheese, Bacon, Eggs, Cat food}, {Butter, Bread}, {Butter, Eggs, Milk, Cheese}

4 The Support Measure Support(X) = (number of transactions containing X) / (total number of transactions in the database)

5 Support Examples Support({Eggs}) = 3/5 = 60%. Support({Eggs, Milk}) = 2/5 = 40%. Transactional dataset: {Eggs, Cheese, Milk}, {Jam}, {Cheese, Bacon, Eggs, Cat food}, {Butter, Bread}, {Butter, Eggs, Milk, Cheese}

6 Minimum Support Minsup - the user-defined minimum support threshold for an itemset to be considered frequent. Frequent itemset - an itemset whose support is greater than or equal to minsup. Support(X) >= minsup: frequent. Support(X) < minsup: infrequent.

7 Minimum Support Examples Minimum support = 50%. Support({Eggs}) = 3/5 = 60% -> Pass. Support({Eggs, Milk}) = 2/5 = 40% -> Fail. Transactional dataset: {Eggs, Cheese, Milk}, {Jam}, {Cheese, Bacon, Eggs, Cat food}, {Butter, Bread}, {Butter, Eggs, Milk, Cheese}
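The support computation and the minimum-support test above can be sketched in a few lines of Python (an illustrative sketch, not code from the deck; the transactions are the five rows of the slide-2 table):

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# The five transactions from slide 2.
transactions = [
    {"Eggs", "Cheese", "Milk"},
    {"Jam"},
    {"Cheese", "Bacon", "Eggs", "Cat food"},
    {"Butter", "Bread"},
    {"Butter", "Eggs", "Milk", "Cheese"},
]

minsup = 0.5  # minimum support threshold, as on slide 7
print(support({"Eggs"}, transactions))            # 0.6 -> frequent
print(support({"Eggs", "Milk"}, transactions))    # 0.4 -> infrequent
print(support({"Eggs"}, transactions) >= minsup)  # True
```

The `itemset <= t` test is Python's subset operator on sets, which matches the definition "a transaction contains X" directly.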

8 Association Rules An association rule X => Y states that transactions containing itemset X tend to also contain itemset Y. Confidence(X => Y) = Support(X U Y) / Support(X)

9 Confidence Example 1 {Eggs} => {Bread}. Confidence = sup({Eggs, Bread}) / sup({Eggs}) = (1/5)/(3/5) = 33%. Transactional dataset: {Eggs, Cheese, Milk}, {Jam}, {Cheese, Bacon, Eggs, Cat food}, {Butter, Bread}, {Butter, Eggs, Milk, Cheese}

10 Confidence Example 2 {Milk} => {Eggs, Cheese}. Confidence = sup({Milk, Eggs, Cheese}) / sup({Milk}) = (2/5)/(3/5) = 66%. Transactional dataset: {Eggs, Cheese, Milk}, {Jam}, {Cheese, Bacon, Eggs, Cat food}, {Butter, Bread}, {Butter, Eggs, Milk, Cheese}

11 Strong Association Rules Minimum confidence (minconf) - a user-defined lower bound on confidence. Strong association rule - a rule X => Y whose confidence is at least minconf; such a rule is potentially interesting to the user. Conf(X => Y) >= minconf: strong. Conf(X => Y) < minconf: uninteresting.

12 Minimum Confidence Example Minconf = 50%. {Eggs} => {Bread}: confidence = (1/5)/(3/5) = 33% -> Fail. {Milk} => {Eggs, Cheese}: confidence = (2/5)/(3/5) = 66% -> Pass.
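Confidence is just a ratio of support counts, so it drops out of the same counting helper. A minimal sketch, using a small made-up basket list (the items a, b, c and the four transactions are hypothetical, not the deck's table):

```python
def support_count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """conf(X => Y) = sup(X u Y) / sup(X)."""
    return (support_count(antecedent | consequent, transactions)
            / support_count(antecedent, transactions))

# Hypothetical four-transaction dataset, for illustration only.
baskets = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "b", "c"}]

minconf = 0.5
c = confidence({"a"}, {"b"}, baskets)  # sup{a,b}/sup{a} = 2/3
print(round(c, 2), c >= minconf)       # 0.67 True -> strong rule
```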

13 Association Mining Association mining finds the strong rules contained in a dataset, starting from its frequent itemsets. It can be divided into two major subtasks: 1. Finding frequent itemsets. 2. Rule generation.

14 Transactional Database Revisited Some algorithms map items to letters or numbers: numbers are more compact and easier to compare. (The slide shows the transactional dataset re-encoded as integers.)

15 Basic Set Logic Subset - an itemset X is contained in an itemset Y (X ⊆ Y). Superset - an itemset Y contains an itemset X (Y ⊇ X). Example: X = {1,2}, Y = {1,2,3,5}; here X is a subset of Y.

16 Apriori  Arranges candidate itemsets into a lattice structure to find associations.  Apriori principle - 1. any superset of an itemset with support < minsup also has support < minsup; 2. the subsets of a frequent itemset are always frequent.  Prunes infrequent itemsets from the lattice structure using minsup.  Reduces the number of comparisons  Reduces the number of candidate itemsets

17 Monotonicity Monotone (upward closed) - a measure f where X ⊆ Y implies f(X) ≤ f(Y). Anti-monotone (downward closed) - a measure f where X ⊆ Y implies f(Y) ≤ f(X). Support is anti-monotone: adding items to an itemset can only keep or lower its support. Apriori uses this property to prune the lattice structure.
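Anti-monotonicity of support can be checked empirically: for every pair X ⊆ Y, support(Y) never exceeds support(X). A brute-force check over a small illustrative dataset (the transactions below are made up for this check):

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Small illustrative dataset.
transactions = [{1, 2, 3}, {2, 3}, {1, 3}, {3, 5}]
items = sorted(set().union(*transactions))

# For every itemset Y and every proper subset X of Y,
# support(Y) must not exceed support(X).
for r in range(1, len(items) + 1):
    for y in combinations(items, r):
        for s in range(1, r):
            for x in combinations(y, s):
                assert support(set(y), transactions) <= support(set(x), transactions)
print("support is anti-monotone on this dataset")
```

The brute-force enumeration is exponential in the number of items, which is exactly why Apriori prunes rather than enumerating the full lattice.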

18 Itemset Lattice

19 Lattice Pruning

20 Lattice Example Count occurrences of each 1-itemset in the database and compute their support: Support = #occurrences / #rows in the database. Prune anything with support less than minsup = 30%.

21 Lattice Example Count occurrences of each 2-itemset in the database and compute their support. Prune anything with support less than minsup = 30%.

22 Lattice Example Count occurrences of the last 3-itemset in the database and compute its support. Prune anything with support less than minsup = 30%.

23 Example - Results Frequent itemsets: {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}

24 Apriori Algorithm

25 Frequent Itemset Generation Transactional Database: {1,2,3,4,5}, {2,3,5}, {1,3,5}, {1,5} 1. Minsup = 50% 2. Generate all 1-itemsets 3. Calculate the support for each itemset 4. Determine whether or not each itemset is frequent

Itemset | Support | Frequent
{1} | 75% | Yes
{2} | 50% | Yes
{3} | 75% | Yes
{4} | 25% | No
{5} | 100% | Yes

26 Frequent Itemset Generation Generate all 2-itemsets from the frequent 1-itemsets, minsup = 50%: {1} U {2}, {1} U {3}, {1} U {5}, {2} U {3}, {2} U {5}, {3} U {5}

Itemset | Support | Frequent
{1,2} | 25% | No
{1,3} | 50% | Yes
{1,5} | 75% | Yes
{2,3} | 50% | Yes
{2,5} | 50% | Yes
{3,5} | 75% | Yes

Transactional Database: {1,2,3,4,5}, {2,3,5}, {1,3,5}, {1,5}

27 Frequent Itemset Generation Generate all 3-itemsets, minsup = 50%: {1,3} U {1,5} = {1,3,5}; {2,3} U {2,5} = {2,3,5}. Candidates containing {1,2} are pruned without counting, because {1,2} is infrequent.

Itemset | Support | Frequent
{1,3,5} | 50% | Yes
{2,3,5} | 50% | Yes

Transactional Database: {1,2,3,4,5}, {2,3,5}, {1,3,5}, {1,5}

28 Frequent Itemset Results All frequent itemsets generated are output: {1}, {2}, {3}, {5} {1,3}, {1,5}, {2,3}, {2,5}, {3,5} {1,3,5}, {2,3,5}
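The level-wise generation of slides 25-27 can be sketched as a compact Apriori loop (an illustrative reimplementation, not the author's code; it runs on the slide-25 database with a 50% minimum support):

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {frozenset: support} for all frequent itemsets."""
    n = len(transactions)
    support = lambda s: sum(1 for t in transactions if s <= t) / n

    # Level 1: frequent single items.
    items = sorted(set().union(*transactions))
    frequent = {frozenset([i]): support(frozenset([i]))
                for i in items if support(frozenset([i])) >= minsup}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = {c: support(c) for c in candidates if support(c) >= minsup}
        result.update(frequent)
        k += 1
    return result

# The four-transaction database from slide 25, minsup = 50%.
db = [{1, 2, 3, 4, 5}, {2, 3, 5}, {1, 3, 5}, {1, 5}]
freq = apriori(db, 0.5)
for s in sorted(freq, key=lambda s: (len(s), sorted(s))):
    print(sorted(s), freq[s])
```

The join step here is deliberately naive (all pairwise unions); real implementations join only itemsets sharing a (k-2)-prefix to avoid duplicate candidates.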

29 Apriori Rule Mining

30 Rule Combinations: 1. From the 2-itemset {1,2}: {1}=>{2}, {2}=>{1}. 2. From the 3-itemset {1,2,3}: {1}=>{2,3}, {2,3}=>{1}, {1,2}=>{3}, {3}=>{1,2}, {1,3}=>{2}, {2}=>{1,3}.

31 Strong Rule Generation 1. Rules X => Y built from the frequent 2-itemsets {1,3}, {1,5}, {3,5} 2. Minconf = 80%

Rule | Confidence | Strong
{1}=>{3} | 67% | No
{3}=>{1} | 67% | No
{1}=>{5} | 100% | Yes
{5}=>{1} | 75% | No
{3}=>{5} | 100% | Yes
{5}=>{3} | 75% | No

Transactional Database: {1,2,3,4,5}, {2,3,5}, {1,3,5}, {1,5}

32 Strong Rule Generation 1. Rules X => Y built from the frequent 3-itemset {2,3,5} 2. Minconf = 80%

Rule | Confidence | Strong
{2}=>{3,5} | 100% | Yes
{3,5}=>{2} | 67% | No
{2,3}=>{5} | 100% | Yes
{5}=>{2,3} | 50% | No
{2,5}=>{3} | 100% | Yes
{3}=>{2,5} | 67% | No

Transactional Database: {1,2,3,4,5}, {2,3,5}, {1,3,5}, {1,5}

33 Strong Rules Results All strong rules generated are output: {1}=>{5} {3}=>{5} {2}=>{3,5} {2,3}=>{5} {2,5}=>{3}
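Rule generation can be sketched by splitting each frequent itemset into every antecedent/consequent pair and keeping the splits whose confidence reaches minconf. This exhaustive version (an illustrative sketch, not the deck's code) also surfaces a few strong rules beyond the ones worked on slides 31-32, since those slides only examined some of the frequent itemsets:

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def strong_rules(frequent_itemsets, transactions, minconf):
    """Return (antecedent, consequent, confidence) for every strong rule."""
    rules = []
    for itemset in frequent_itemsets:
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                x = frozenset(lhs)
                y = itemset - x
                conf = (support_count(itemset, transactions)
                        / support_count(x, transactions))
                if conf >= minconf:
                    rules.append((x, y, conf))
    return rules

db = [{1, 2, 3, 4, 5}, {2, 3, 5}, {1, 3, 5}, {1, 5}]
# Frequent itemsets for minsup = 50% (slides 25-28).
freq = [frozenset(s) for s in
        [{1}, {2}, {3}, {5}, {1, 3}, {1, 5}, {2, 3}, {2, 5}, {3, 5},
         {1, 3, 5}, {2, 3, 5}]]
for x, y, conf in strong_rules(freq, db, 0.8):
    print(sorted(x), "=>", sorted(y), round(conf, 2))
```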

34 Other Frequent Itemsets Closed Frequent Itemset - a frequent itemset X that has no immediate superset with the same support count as X. Maximal Frequent Itemset - a frequent itemset none of whose immediate supersets are frequent.
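Given the support of every frequent itemset, both definitions reduce to a check against immediate supersets. A sketch using the supports from the slide-25 database at 50% minimum support (illustrative code, not from the deck):

```python
def closed_and_maximal(freq):
    """freq: {frozenset: support}. Return (closed, maximal) frequent itemsets."""
    closed, maximal = set(), set()
    for s, sup in freq.items():
        # Immediate supersets of s that are themselves frequent.
        supersets = [t for t in freq if s < t and len(t) == len(s) + 1]
        if all(freq[t] != sup for t in supersets):
            closed.add(s)    # no immediate superset with equal support
        if not supersets:
            maximal.add(s)   # no immediate superset is frequent at all
    return closed, maximal

# Frequent itemsets and supports for the slide-25 database, minsup = 50%.
freq = {frozenset(k): v for k, v in [
    ({1}, 0.75), ({2}, 0.5), ({3}, 0.75), ({5}, 1.0),
    ({1, 3}, 0.5), ({1, 5}, 0.75), ({2, 3}, 0.5), ({2, 5}, 0.5), ({3, 5}, 0.75),
    ({1, 3, 5}, 0.5), ({2, 3, 5}, 0.5),
]}
closed, maximal = closed_and_maximal(freq)
print(sorted(map(sorted, maximal)))  # [[1, 3, 5], [2, 3, 5]]
```

Note that every maximal itemset found is also closed, matching the containment picture on the next slide.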

35 Itemset Relationships Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets

36 Targeted Association Mining

37 * Users may only be interested in specific results * Potential for smaller, faster, and more focused results * Examples: 1. A user wants to know how often bread and garlic cloves occur together. 2. A user wants to know which items occur with toilet paper.

38 Itemset Trees * Itemset Tree: - A data structure that supports querying for a specific itemset and its support. * Items within a transaction are mapped to integer values and ordered so that each transaction is in lexical order. {Bread, Onion, Garlic} = {1, 2, 3} * Why use numbers? - they make the tree more compact - ordering follows easily from the numbers

39 Itemset Trees An itemset tree T contains: * A root pair (I, f(I)), where I is an itemset and f(I) is its count. * A (possibly empty) set {T1, T2, ..., Tk}, each element of which is an itemset tree. * If Ij is in the root, then it will also be in the root's children. * If Ij is not in the root, it may still be in the root's children if: first_item(I) ≤ first_item(Ij) and last_item(I) < last_item(Ij)

40 Building an Itemset Tree Let ci be a node in the itemset tree and let I be a transaction from the dataset. Loop over the cases: Case 1: ci = I - increment ci's count. Case 2: ci is a child of I - make I the parent node of ci. Case 3: ci and I share a lexical overlap (a common prefix), e.g. {1,2,4} vs. {1,2,6} - make a node for the overlap and make ci and I its children. Case 4: ci is a parent of I - loop to check ci's children; otherwise make I a child of ci. Note: {2,6} and {1,2,6} do not have a lexical overlap.
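The four cases can be sketched as a small recursive builder (an illustrative reimplementation, not the original; nodes store their full itemset as a sorted tuple, and a node's count covers every transaction routed through its subtree):

```python
class Node:
    def __init__(self, items, count=0):
        self.items = items      # sorted tuple; lexical prefix of all descendants
        self.count = count
        self.children = []

def insert(node, t):
    """Insert transaction t (sorted tuple); node.items must be a prefix of t."""
    node.count += 1
    if node.items == t:                      # Case 1: exact match
        return
    depth = len(node.items)
    for i, c in enumerate(node.children):
        if c.items[depth] != t[depth]:       # no lexical overlap past the prefix
            continue
        # Longest common prefix of c.items and t.
        p = 0
        while p < min(len(c.items), len(t)) and c.items[p] == t[p]:
            p += 1
        prefix = t[:p]
        if prefix == c.items:                # Case 4: c is a parent (prefix) of t
            insert(c, t)
        elif prefix == t:                    # Case 2: t is a prefix of c
            parent = Node(t, c.count + 1)
            parent.children.append(c)
            node.children[i] = parent
        else:                                # Case 3: proper lexical overlap
            split = Node(prefix, c.count + 1)
            split.children = [c, Node(t, 1)]
            node.children[i] = split
        return
    node.children.append(Node(t, 1))         # no overlapping child: new leaf

def build(transactions):
    root = Node(())
    for t in transactions:
        insert(root, tuple(sorted(t)))
    return root

def count_nodes(n):
    return 1 + sum(count_nodes(c) for c in n.children)

# The dataset from the creation slides (41-47).
data = [(2, 4), (1, 2, 3, 5), (3, 9), (1, 2, 6), (2,), (2, 9)]
root = build(data)
print(count_nodes(root))  # 8 nodes, counting the empty root
```

Inserting {1,2,6} triggers Case 3 (overlap node {1,2}), and inserting {2} triggers Case 2 ({2} becomes the parent of {2,4}), reproducing the walkthrough on the following slides.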

41 Itemset Trees - Creation Dataset: {2,4}, {1,2,3,5}, {3,9}, {1,2,6}, {2}, {2,9}

42 Itemset Trees - Creation Insert {2,4}: child node.

43 Itemset Trees - Creation Insert {1,2,3,5}: child node.

44 Itemset Trees - Creation Insert {3,9}: child node.

45 Itemset Trees - Creation Insert {1,2,6}: lexical overlap with {1,2,3,5}; create the overlap node {1,2}.

46 Itemset Trees - Creation Insert {2}: parent node of {2,4}.

47 Itemset Trees - Creation Insert {2,9}: child node of {2}.

48 Itemset Trees – Querying Let I be an itemset, let ci be a node in the tree, and let totalSup be the total count for I in the tree. For all ci such that first_item(ci) ≤ first_item(I): Case 1: If I is contained in ci - add ci's count to totalSup (and skip its subtree). Case 2: If I is not contained and last_item(ci) < last_item(I) - proceed down the tree.
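The query procedure can be sketched recursively. This illustrative code hand-builds the small tree from the creation slides (node layout and counts inferred from the walkthrough) and then applies the two cases above:

```python
class Node:
    def __init__(self, items, count, children=()):
        self.items = items            # sorted tuple of items
        self.count = count            # transactions routed through this node
        self.children = list(children)

def query(node, q):
    """Total count of transactions containing itemset q (sorted tuple)."""
    total = 0
    for c in node.children:           # children kept in lexical order
        if c.items[0] > q[0]:         # no later child can start early enough
            break
        if set(q) <= set(c.items):    # Case 1: q contained here -> add count
            total += c.count
        elif c.items[-1] < q[-1]:     # Case 2: q might complete deeper down
            total += query(c, q)
    return total

# Hand-built itemset tree for the transactions
# {2,4}, {1,2,3,5}, {3,9}, {1,2,6}, {2}, {2,9} from slides 41-47.
root = Node((), 6, [
    Node((1, 2), 2, [Node((1, 2, 3, 5), 1), Node((1, 2, 6), 1)]),
    Node((2,), 3, [Node((2, 4), 1), Node((2, 9), 1)]),
    Node((3, 9), 1),
])

print(query(root, (2,)))    # 3 (node {2}) + 2 (node {1,2}) = 5
print(query(root, (2, 9)))  # only the {2,9} node matches -> 1
```

Because a node's count already covers its whole subtree, Case 1 adds the count and does not descend, which is what keeps the query cheap.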

49 Example 1

50 Itemset Trees - Querying Querying Example 1: Query: {2} totalSup = 0

51 Itemset Trees - Querying Querying Example 1: Query: {2} Node {2} matches the query; add its count: totalSup = 3

52 Itemset Trees - Querying Querying Example 1: Query: {2} Node {1,2} contains 2; add its count: totalSup = 3 + 2 = 5

53 Itemset Trees - Querying Querying Example 1: Query: {2} Node {3,9}: first item 3 > 2, and end of subtree. Return support: totalSup = 5

54 Example 2

55 Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 0

56 Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 0 Node {2}: 2 ≤ 2 and 2 < 9, continue down

57 Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 0 Node {2,4}: 2 ≤ 2 and 4 < 9, but {2,4} doesn't contain {2,9}; go to next sibling

58 Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 1 {2,9} = {2,9}: add to support!

59 Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 1 Node {1,2}: 1 ≤ 2 and 2 < 9, continue down

60 Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 1 Node {1,2,3,5}: 1 ≤ 2 and 5 < 9, but it doesn't contain {2,9}; go to next sibling

61 Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 1 Node {1,2,6}: doesn't contain {2,9}; go to next node

62 Itemset Trees - Querying Querying Example 2: Query: {2,9} Node {3,9}: first item 3 > 2, so the scan stops. End of tree: totalSup = 1. Nodes = 8

