Mobile QR Code QR CODE : The Transactions of the Korean Institute of Electrical Engineers

The Transactions of the Korean Institute of Electrical Engineers

ISO Journal TitleTrans. Korean. Inst. Elect. Eng.

Main Menu

Journal Search

[

Research article

]

The Transactions of the Korean Institute of Electrical Engineers

KIEE Vol. 68, No. 11, p.1417-1424

ISSN (print) :

1975-8359

ISSN (online) :

2287-4364

Received : 24 September 2019Accepted : 14 October 2019

DOI :

http://doi.org/10.5370/KIEE.2019.68.10.1169

BHUI: A New Bitwise Operation-based High Utility Itemset Mining

BHUI: 이진 논리 연산을 활용한 하이 유틸리티 마이닝 기법에 대한 연구

황정희 (Jeong-Hee Hwang) ¹iD 지정희 (Jeonghee Chi) ²iD 박소영 (Soyoung Park) ^†iD

(Dept. of Computer Software, Namseoul University, Korea)
(Dept. of Software, Konkuk University, Korea)

^†Corresponding Author: Department of Software, Konkuk University, Korea E-mail : soyoungpark@konkuk.ac.kr

License :

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.(www.kiee.or.kr).

Abstract

Pattern mining finding frequent patterns are widely used to search useful information from large scale data. General pattern mining, however, have the problem of not applying the characteristics of the individual items because all items have the same importance. High utility itemset mining that solves this problem retrieves meaningful information by applying a relative weight to each item. In this paper, we propose a new high utility itemset mining algorithm based on bitwise operations. The bitwise operation predetermines the likelihood of joining two itemsets to reduce the join operation. Since the join operation is the most time consuming operation in the high utility itemset mining, the proposed algorithm significantly reduces the mining time. We analyzed the performance of the proposed algorithm with execution time and the number of generated utility itemsets and compared the performance with HUI-Miner. Our experimental results show that the proposed algorithm outperforms HUI-Miner.

Key words

High utility mining, Utility mining, Bitwise operation, Utility list structure

1. 서 론

Frequent pattern mining to search useful and important information from large scale data can be applied to a variety of applications including genome analysis, condition monitoring, cross marketing, inventory prediction, e-commerce personalization, and so on ^(1-³⁾. Frequent pattern means useful data of interest in an organization, the frequency of which satisfies a given frequency threshold. For an example, it can be used to find a bundle of items purchased together more frequently in the transaction database of shopping cart. However, from a business perspective, a combination of products leading higher profits or revenues is more interesting. That is, there may be a combination of more profitable purchase items even if the frequency of the combination is relatively low. In this case, not only the frequency, but also the quantity or profit of item should be taken into account in the purchase pattern mining ^(3,⁴⁾.

In order to address the limitations of the frequent itemset mining, high utility itemset (HUI) mining ^(4-¹⁴⁾ has been proposed. In the HUI mining, the purchase quantities are taken into account as well as the unit profit of each item. Hence, the HUI mining finds itemsets (group of items) that generate a high profit in a database, when they are sold together. When a minimum utility threshold is giv`en, the HUI mining outputs all the high utility itemsets that generate at least the given minimum profit. The generalized notion of utility can be applied to diverse areas including network traffic, web-server log, sensor network data, records of communication and calls, and so on. In particular, in the emerging local energy market system ⁽¹⁵⁾ that enables energy trading between users at self-assessed prices, users can receive energy from a variety of energy sources including local renewable energy prosumers, local energy producers, central energy supply, and so on. Consumers can find a combination of energy sources to minimize energy use costs using the HUI mining. The central energy supply can determine a set of distributed energy providers to supply balanced energy to a local community using the HUI mining.

In the HUI mining, super-patterns of ineffective patterns can be effective patterns or high utility itemsets. Therefore, it is hard to apply anti-monotone property into the HUI mining. Existing HUI mining algorithms ^(8,⁹⁾ find candidate itemsets that have potential to become high utility itemsets in the first step, and then, high utility itemsets are identified from the candidates by scanning raw data more than once in the second step. In order to determine candidate itemsets, utility calculation for all super-patterns (or joined itemsets) of joinable itemsets is performed. And then, high utility itemsets are identified by real utility values on the candidates in the second step. However, it degrades mining performance since a large number of candidate itemsets are extracted.

In this paper, we propose an efficient bitwise operation-based high utility itemset mining algorithm using the utility list structure (ULS). In the proposed scheme, item information contained in every transaction is represented in bits by one-time database scanning. Then, a bitwise operation on the bit representation determines the join likelihood of two itemsets immediately. Thus, the cost for unnecessary joins is significantly reduced. In addition, the utility calculation on a joined itemset is performed only when two joinable itemsets are actually joined. At this time, since the predetermined utility values are stored in the utility list structure, the utility calculation can be performed very quickly.

The rest of this paper is organized as follows. In Section 2, related studies are described. The proposed Bitwise High Utility Itemset (BHUI) mining algorithm is given in Section 3. The performance analysis and experimental results are described in Section 4. Finally, we conclude our paper in Section 5.

2. Related Works

In the previous studies on high utility mining, UMining that used branching-out strategy based on the maximum values of utility and UMining_H that used branching-out based on heuristic approach were suggested ⁽¹²⁾. However, they had the problems to generate lots of unnecessary patterns during the process of candidate generations. Two-phase algorithm that has been widely applied in utility mining is the method to seek the item groups with high utility efficiently based on Apriori. In UP-Growth+ algorithm by ⁽¹¹⁾ based on utility models of ^(4,⁵⁾, Apriori attributes were applied using the concept of Transaction Weighted Utility (TWU). It is processed by the ways that TWU values of all the item groups with 1-item are calculated upon scanning database and TWU item groups with 2-item (itemset by 1-items join) are generated. In the last scanning, the item groups with the highest utility are found from high TWU item groups. However, it has the problem to generate lots of database, scanning frequency, and unnecessary candidate item groups. ⁽¹³⁾ and ⁽¹⁴⁾ improved the performance of UP-Growth which was an existing algorithm using pruning method. Yet, they kept the basis of two-phase. ⁽¹⁶⁾ proposed the mining method of high utility pattern without generation of candidate items. The performance was improved by pruning from utility upper bounding and searching reverse set enumeration tree with depth-first way. In utility itemset mining, utility values can be increased or decreased when the length of itemset is expanded. Therefore, it is hard to use the pruning strategy applying the characteristics of downward closure. In Two-Phase algorithm ⁽⁸⁾, overestimation concept was applied to meet the anti-monotone property with the improved algorithm on the limitation of high utility mining using Transaction-Weighted Downward Closure (TWDC) which was a monotone decreasing characteristic that the number of transactions including the relevant itemset was smaller or the same as before adding items as the itemset were expanded on TWU values. That means the patterns to meet the threshold are found by calculating actual utility upon scanning database after finding candidate patterns by Apriori, using the monotone decreasing characteristic of TWU. Also, ⁽⁹⁾ proposed IHUP algorithm that searched the patterns rapidly by application of Two-Phase into FP-Growth algorithm. To solve the cost issue according to the database scanning, HUI-Miner ⁽¹⁷⁾ used the list structure to be organized with transaction-id, item-utility, and remaining utility on each item. Since list information on each item and actual utility values were used in a way of pattern expansion by item joins, candidate items could be lessened compared to the method to use estimated utility resulting in the lowering database scanning frequency. For efficient pruning, an algorithm ⁽¹⁸⁾ was proposed using triangle matrix structure. It improved the performance by lowering the join operation frequency because it determined the possibility of join upon organizing it with estimated utility co-occurrence structure matrix on 2-item that met the threshold to prevent from generating unpromising candidate items. BAHUI⁽¹⁹⁾ used a bitmap representation to retrieve and count the itemsets. The algorithm generated high transaction-weighted utilization itemset(HTWUI)s with maximal length from the bitmap and searched the subsets of them. However, the BAHUI still needs to scan the database repeatedly to calculate the actual utility of itemsets. In contrast, the proposed scheme, which combines both advantages of the bitwise operation and utility list structure, does not need to scan the database whenever calculates the actual utility values of the joinable itemsets because the utility list structure keeps previously calculated actual utility values.

3. A Bitwise Operation-based High Utility Itemset Mining (BHUI) Algorithm

We provide definitions to describe the basic concept of high utility itemset mining, and then, describe the proposed utility list structure and bitwise operation-based BHUI algorithm in detail.

3.1 Problem Definition

Database D = {T1,T2,..,Tn} is a set of transactions, and I = {i1,i2,..,il} is a set of all items used in D. Each transaction consists of a set of items, Ti={i1,i2,..,im}. Each transaction is identified with an individual identifier, and each item ip included in the transaction Tj is characterized with the purchase quantity and profit value. The purchase quantity, which is denoted as q(ip,Tj), represents the total quantity of ip in Tj. The profit value, which is described as pr(ip), means the profit of ip. An itemset X={i1,i2,..,ik} is a set of joined items. If the length of X is k, then it is called k-item.

Table 1. A transaction database

TID	a	b	c	d	e	f	g	tu
T1	1			2			1	9
T2	2	2	1	2				22
T3		3	1	2	2			28
T4	2			1		1		10

Table 2. Profit value

Item	a	b	c	d	e	f	g
Price	3	6	2	1	3	3	4

Table 1 shows an example of database consisting of 4 transactions and 7 items. Table 2 shows the profit value of each item. We use this example to explain our proposed high utility itemset (HUI) mining algorithm.

Definition 1 (Item utility): Utility of an item $i_{p}$ in a transaction $T_{j}$ is denoted by $iu(i_{p},\:T_{j})$ and defined as the following equation (1):

(1)

$iu(i_{p},\:T_{j})= q(i_{p},\:T_{j})\times pr(i_{p})$

Definition 2 (Itemset utility): Utility of an itemset $X$ in a transaction $T_{j}$ is denoted by $iu(X,\: T_{j})$ and defined as the following equation (2):

(2)

$$ i u\left(X, T_{j}\right)=\sum_{i_{p} \in X \wedge X \subseteq T_{j}} i u\left(i_{p}, T_{j}\right) $$

For instance, if itemset X={a,b}, iu(ab, T2) = iu(a,T2) + iu(b,T2) = 6 + 12 = 18.

Definition 3 (Sum of item utility): Utility of an itemset X in the database D is denoted by iu(X) and defined as the following equation (3):

(3)

$$ i u(X)=\sum_{X \subseteq T_{j} \wedge T_{j} \in D} i u\left(X, T_{j}\right) $$

Definition 4 (Remaining utility): Items following the itemset that belongs to $X$ in a transaction $T_{j}$ are the remaining items. The utility of the remaining items is denoted by $ru(X,\: T_{j})$ and defined as the following equation (4):

(4)

$$ r u\left(X, T_{j}\right)=\sum_{X \subseteq T_{j} \wedge i_{p} \in R} i u\left(i_{p}, T_{j}\right), $$

where $R$ is a set of items following $X$. For instance, $ru(bcd,\: T_{3})$$= iu(e,\: T_{3})= 6$.

Definition 5 (Transaction utility): Transaction utility of $T_{j}$ is denoted by $tu(T_{j})$ and defined as the following equation (5):

(5)

$$tu(T_{j})=\sum_{i_{p} \in R} i u\left(i_{p}, T_{j}\right)$$

For instance, T4 = {a,d,f}, tu(T4) = iu(a,T4) + iu(d,T4) + iu(f,T4) = 6 + 1 + 3 = 10.

Definition 6 (Transaction weighted utility): Transaction weighted utility of $X$ in $D$ is the total transaction utility of all transactions containing $X$. It is denoted by $TWU(X)$ and defined as the following equation (6):

(6)

$$ T W U(X)=\sum_{X \subseteq T_{j} \wedge T_{j} \in D} t u\left(T_{j}\right) $$

3.2 BHUI Algorithm

Now, we describe our BHUI algorithm. The core function of BHUI is to find all set of high utility itemsets, which is denoted as HUI. HUI is initialized as ∅. Without loss of generality, the sum of all transaction utilities in $D$, which is $TU(D)=\sum tu(T_{j})$ for all $T_{j}\in D$, is the utility of $D$. Min_util is a minimum threshold to become a high utility itemset and it is determined as min_util = $TU(D)\times\delta$ for a predefined threshold value $\delta$. $\delta$ can be either systemically predefined or given from an expert user. Suppose that $\delta$ is 0.3. Thus, min_util = $69 * 0.3 = 20.7$ in our example.

First of all, the proposed algorithm initially calculates the $TWU(i_{p})$ for each $i_{p}∈ I$ in order to discard items that do not satisfy min_util. This is to find high utility itemsets with a length of 1, called 1-item. In our example, $TWU(a)= tu(T_{1})+$ $tu(T_{2})+ tu(T_{4})= 41$, $TWU(b)= 50$, $TWU(c)= 50$, $TWU(d)= 69$, $TWU(e)= 28$, $TWU(f)= 10$, and $TWU(g)= 9$. As a result, {e}, {a}, {c}, {b} and {d} are determined as high utility itemsets of length 1, and added to HUI. For the chosen itemsets, $tu(T_{i})$ of each transaction $T_{i}$ is reconstructed as shown in Table 3. $tu(T_{1})$ are $tu(T_{4})$ are updated because $f$ and $g$ are discarded from items.

Table 3. Reconstructed database

TID	e	a	c	b	d	tu
T1		3			2	5
T2		6	2	12	2	22
T3	6		2	18	2	28
T4		6			1	7

Fig. 1. ULS of 1-item

Once high utility itemsets are determined, the utility list structure (ULS) of each itemset is generated. As shown in Fig. 1, the ULS of each itemset consists of four attributes: tid, iu, ru, and cu. For an itemset X, tid indicates a transaction identifier containing X, iu is determined by iu(X, Ttid), ru is determined by ru(X, Ttid), and cu denotes a common utility of X that is defined as follows:

Definition 7 (Common utility): Common utility of an itemset $X^{k}$ of length $k$ is denoted by $cu(X^{k})$ and defined as $iu(X^{k-1})$, where $X^{k-1}$ consists of the first $k-1$ items of $X^{k}$. $cu(X)$ for $X$ with a length of 1 is initialized as 0.

For instance, $cu(ab)$ of $X=\{a,\:b\}$ is $iu(a)$.

The ULS is created in an ascending order of $TWU(X)$, such as $e < a < c < b < d$. Once ULS constructing for initial high utility itemset is completed, the proposed high utility itemset mining (HUI_Mining) algorithm is performed repeatedly. The mining algorithm finds the next level of high utility itemsets. That is, 2-item sets that are itemsets of length 2 are generated by joining (or combining) two 1-item itemsets. A joined itemsest $X$ of $X_{i}$ and $X_{j}$ is defined as $X = X_{i}∪ X_{j}$. The join process is also peformed in an ascending order of TWU. Since $TWU(e)< TWU(a)< TWU(c)< TWU(b)< TWU(d)$, all possible combinations for 2-item are $\{e,\:a\}$, $\{e,\:c\}$, $\{e,\:b\}$, $\{e,\:d\}$, $\{a,\:c\}$, $\{a,\:b\}$, $\{a,\:d\}$, $\{c,\:b\}$, $\{c,\:d\}$, and $\{b,\:d\}$.

At this point, constructing ULS of all possible itemsets of 2-item is very time consuming. If we can extract itemsets that have high join possibility, which are called joinable itemsets, the HUI mining time will be reduced. The join possibility indicates the likelihood that the joined itemset will be a high utility itemset. In order for a joined itemset to become a high utility itemset, a common transaction, which contains all of the joined items, must exist. Thus, the suggested strategy is to use bitwise AND operation (∧) to check if there is a common transaction involving the joined itemset. To do this, for an itemset $X$, a bit representation for all transactions is required. Let $b(X)$ be the bit representation of $X$. As shown in Fig. 2, if a transaction containing $X$ is set to 1, otherwise, it is set to 0. Consequently, a joined itemset $X$ of $X_{i}$ and $X_{j}$ is a joinable itemset if $b(X_{i})∧ b(X_{j})≥ 1$. In addition, if $TWU(X)≥$min_util, then the ULS of $X$ is finally constructed.

Fig. 2. Example of Bitwise AND operation and n-item utility list structure

We explain the proposed HUI mining process with our examples. Fig. 2(a) shows bit representations of $\{e\}$ and $\{c\}$, and $b(e)∧ b(c)= 0010 ≥ 1$. Hence, a joined itemset $\{e,\: c\}$ is a joinable itemset. After the bit operation, common transactions that are corresponding to 1 in the bitwise AND result are determined. Since only $T_{3}$ is 1 in the bitwise AND result, $TWU(ec)$ is calculated only for $T_{3}$. Thus, $TWU(ec)= tu(T_{3})=$ $28 ≥$min_util. So, ULS of $\{e,\: c\}$ is constructed. $iu(ec,\: T_{3})= 8$, $ru(ec,\: T_{3})= 20$, and $cu(ec)= iu(e)= 6$. $iu(e)$ can be obtained from the ULS of $\{e\}$. Here, since $iu(ec)= 8 ≤$min_util, $\{e,\: c\}$ is not a high utility itemset. However, since $iu(ec,\: T_{3})+ ru(ec,\:$ $T_{3})= 28 ≥$min_util, it is still a joinable itemset. In Fig. 2(b), since $b(c)∧ b(b)= 0110 ≥ 1$, $\{c,\: b\}$ is a joinable itemset. Since common transactions are $T_{2}$ and $T_{3}$, $TWU(cb)= tu(T_{2})+$ $tu(T_{3})= 50 ≥$min_util. Therefore, ULS of $\{c,\: b\}$ is created. Since $iu(cb)= iu(cb,\: T_{2})+ iu(cb,\: T_{3})= 14 + 20 = 34 ≥$min_util, it is added to SHU and joinable itemsets. In Fig. 2(c), $b(b)∧$ $b(d)$$= 0110 ≥ 1$ and $TWU(bd)= tu(T_{2})+ tu(T_{3})= 50 ≥$min_util. ULS of $\{b,\: d\}$ is constructed. Since $iu(bd)= 34 ≥$min_util4, $\{b,\: d\}$ is added to SHU and joinable itemsets. Fig. 2(d) shows the generation of ULS of 3-item $\{e,\:c,\:b\}$ from $\{e,\: c\}$ and $\{c,\: b\}$. $b(ec)$$∧ b(cb)= 0010 ≥ 1$ and $TWU(ecb)= tu(T_{3})= 28 ≥$min_util. Since $iu(ecb)= iu(ec)+ iu(cb)- cu(cb)= 8 + 20 - 2 = 26 ≥$min_util, it is added to SHU and joinable itemsets. $cu(ecb)$ is determined as $cu(ecb)= iu(ec)=8$. In 3-item $\{c,\:b,\:d\}$ of Fig. 2(e), $b(cb)∧$ $b(bd)= 0110$, common transactions $T_{2}$ and $T_{3}$ exist. Since $iu(cbd)= iu(cb)+ iu(bd)- cu(bd)= 34 + 34 - 30 = 38 ≥$min_util$, it is added to HUI and joinable itemsets. Fig. 2(f) shows ULS of 4-item $\{e,\:c,\:b,\:d\}$. Since $iu(ecbd)= iu(ecb)+ iu(cbd)- cu(cbd)=$$26$ $+ 22 - 20 = 28 ≥$min_util, it is a high utility itemset and a joinable itemset, as well. $cu(ecbd)= iu(ecb)= 26$. The Mining process stops since no common transactions exist by bitwise AND operation with 4-item $\{e,\:c,\:b,\:d\}$ and $\{a,\:c,\:b,\:d\}$, which are high utility itemsets.

The detailed descriptions of the BHUI and HUI_mining algorithms are given below. BHUI gets two parameters D and min_util as inputs, and initiates our mining process. BHUI initially scans D to create a bitmap to indicate whether each item exists in the transaction, and then, computes the TWU and item utility of each item in I. Then, the items are sorted by TWU in an ascending order. The TWU of each item is used to discard unpromising items from further processing. If TWU of each sorted itemset is greater than min_util, then ULS of the itemset is constructed. The ULS of 1-item are used to explore the search space and to mine the next level of high utility itemsets. After ULS of 1-item is constructed, the HUI_mining is performed repeatedly so as to mine a set of high utility itemsets (HUI). It performs bitwise AND operation for two itemsets in ULS to create a joined itemset and checks if the joined itemset satisfies min_util. It creates ULS of meaningful joined itemsets and outputs new high utility itemsets. The HUI_mining is repeated until no more joined itemset is created. Finally, HUI includes all high utility itemsets.

4. Performance evaluation

In this section, we analyze the performance of the proposed BHUI algorithm by comparing with HUI-Miner algorithm ⁽¹⁷⁾. The experiments were conducted on window 10, 16GB RAM, 3.40GHz CPU system.

4.1 Experimental Environment

We used four types of synthetic dataset, such as T10I1000D100K, T10I2000D100K, T10I3000D100K and T10I 4000D100K, to evaluate the performance of the proposed algorithm. The datasets were generated by the data generator of IBM Almaden Quest Research Group, which are widely employed in the pattern mining area. The datasets do not have their own external and internal utility information, so we have set the utility values randomly between 1 and 100, as in the previous studies ^(3,^5,^6,⁷⁾.

The dataset used in the experiments were T10I1000D100K, T10I2000D100K, and T10I3000D100K generated by data generator of IBM Almaden Quest Research Group and those generated by adding 1 to 100 randomly to the utility values by items to T10I4000D100K data. Parameters for the experiments are described in Table 4. Number of transactions by each dataset is 100K and the number of distinct items varies from 1K to 4K. Average length of transactions is 10. We experimented it upon changing minimum utility threshold s from 0.01% to 0.1%.

Table 4. Simulation parameters

Parameter	Description	Value
D	Number of transactions	100K
T	Average length of transactions	10
I	Number of distinct items	1K ~ 4K
s	Minimum utility thresholds	0.01%~0.1%

4.2 Experimental Results

A ratio of 1-item with high utility in the whole dataset was reviewed first. The ratio of 1-item with high utility affects join count to influence the number of high utilities and performance time accordingly. Hence, the ratio of 1-item with high utility was investigated among the whole dataset.

Fig. 3. Ratio of 1-item with high utility in the all dataset

Fig. 3 shows the experimental results. In the case of 1K, a relatively small number of items, 75.8% of total items were shown as high utility items in low minimum utility s=0.01%, and 38.1% of them in s=0.1%. In addition, 43.1% of them were shown as high utility items in s=0.01%, and 6.3% of them in s=0.1% in the case of the 4K dataset, a large number of items. This experimental result, those on data in case of 100K of total transactions, demonstrates a rapid lowering number of high utility items as the number of items is increasing.

The number of join counts by a number of items was compared as shown in Figure 4. This showed the results in case of minimum utility, s=0.01%. HUI and BHUI showed the difference in 44*103 in case of 1K that the number of items is small and 1.2*106 in case of 4K that the number of items is large. As the number of items is increased, the share of total 1-item high utility items is lowered, however, the total number itself is increased, hence, the join counts among high utility items are increased. Yet, since the BHUI technique proposed in this paper saves the transaction information of each item by bit, the comparison not on the total transactions but on the items with common transactions only is executed, which saves join counts, significantly. Figure 5 shows the number of high utility patterns from each dataset in case of minimum utility, s=0.01%. The results show relatively large number of high utility patterns in the data of item number 2K and 3K, and the lowest number of high utility patterns in the data of item number 4K.

Fig. 4. Number of join count comparison when s=0.01%

Fig. 5. High utility pattern according to number of items

Fig. 6. Comparison of execution time when s=0.01%

Fig. 7. Comparison of join counts of each dataset according to minimum utility

Fig. 8. Comparison of execution time of each dataset according to minimum support rate

Also, execution time by data items was compared as Figure 6. It shows the results of minimum utility s=0.01% demonstrating the execution time on the data from 1K to 4K of the number of items. These results reflected the proposed BHUI algorithm provided the execution from 1.4 to 2.4 times as fast as the HUI technique. This is considered caused by dramatic lowered unnecessary comparison operations that BHUI lessened the number of join counts significantly.

Fig. 7 shows join counts of each dataset by the minimum utility. In all datasets, it shows the trend to be a similar number of join counts between HUI and BHUI as the minimum utility is higher, which may be from the fact that a number of join counts are rapidly lowered as the minimum utility is higher. As the number of 1-item high utility items is greater in BHUI, it can lower the number of join counts remarkably, demonstrating the good performance of BHUI.

Fig. 8 shows the execution time according to the minimum utility. This shows the execution speed of BHUI from 1.4 to 2.4 times as fast as that of HUI as the minimum utility is lowered in all datasets. Also, it shows the tendency to become similar performance between BHUI and HUI since a number of high utility items is lowered as the minimum utility is greater which lowers the number of join counts rapidly.

In this experiment, maximum 2.4 times faster execution speed with BHUI was shown than that with HUI in case of high numbers of the items and high utility items although the memory consumed more than HUI because it should sustain additional bits by the number of items in transactions.

5. Conclusions

Compared to frequent pattern mining considering the frequency of item with minimum threshold the user designates, utility pattern mining which considers weight and quantity of each item simultaneously has the practicality that can be applied in the business directly. In this paper, the algorithm using bitwise AND operation was proposed as a method of checking join possibility in advance to lower join operation cost which was the heaviest in utility mining. Join possibility of an item is checked by saving item information of transaction with bit and using bitwise operation. Also, the utility list structure is generated by items that can join practically and high utility pattern is extracted. From several experimental results, the proposed algorithm showed a greater effect to lower join operation cost as the number of 1-items is bigger, which affects the improvement of overall execution time speed.

High utility itemset mining requires a user specified minimum utility threshold. This is quite challenging problem because the threshold selection depends on the underlying dataset characteristics. We will continue to develop an algorithm efficiently mining top-k high utility itemsets. In addition, we will apply the proposed algorithm to the problem of finding high utility energy providers that can provide a balanced energy supply with a minimum cost in the smart grid field.

Acknowledgements

Funding for this paper was provided by Namseoul University.

References

R. Agrawal, R. Srikant, 1994, Fast algorithms for mining association rules in large databases, in Proc. of the 20th International Conference on Very Large Databases, pp. 487-499

J. Su, W. Chang, V. Tseng, 2017, Integrated mining of social and collaborative information for music recommend- ation, Data Science Patterns Recognition, Vol. 1, No. 1, pp. 13-30

U. Yun, D. Kim, 2017, Mining of High Average-utility Itemsets using novel list Structure and pruning Strategy, Future Generation Computer System, Vol. 68, pp. 346-360

H. Yao, H. J. Hamilton, L. Geng, 2006, A unified framework for utility-based measures for mining itemsets., in Proc. of ACM SIGKDD 2nd Workshop Utility-Based Data Mining, pp. 28-37

S. Krishnamoorthy, 2015, Pruning strategies for mining high utility itemsets, Expert Systems with Applications, Vol. 42, No. 5, pp. 2371-2381

S. Krishnamoorthy, 2017, Hminer: Efficiently mining high utility itemsets, Expert Systems with Applications, Vol. 90, No. c, pp. 168-183

J. C. W. Lin, T. Li, P. Fournier-Viger, T. P. Hong, J. Zhan, M. Voznak, 2016, An efficient algorithm to mine high average-utility itemsets, Adv. Eng. Inform., Vol. 30, No. 2, pp. 233-243

Y. Liu, W. Liao, A. Choudhary, 2005, A two-phase algorithm for fast discovery of high utility itemsets, in Proc. of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Vol. 3518, pp. 689-695

C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, Y. K. Lee, 2009, Efficient tree structures for high utility pattern mining in incremental databases, IEEE Trans. on Knowledge and Data Engineering, Vol. 21, No. 12, pp. 1708-1721

Y. C. Li, J. S. Yeh, C. C. Chang, 2008, Isolated items discarding strategy for discovering high utility itemsets, Data Knowl. Eng., Vol. 64, No. 1, pp. 198-217

V. S. Tseng, B. E. Shie, C. W. Wu, P. S. Yu, 2013, Efficient algorithms for mining high utility itemsets from transactional databases, IEEE Trans. on Knowledge and Data Engineering, Vol. 25, No. 8, pp. 1772-1786

H. Yao, H. J. Hamilton, C. J. Butz, 2004, A foundational approach to mining itemset utilities from databases, in Proc. of the Fourth SIAM International Conference on Data Mining, pp. 482-486

U. Yun, H. Ryang, K. H. Ryu, 2014, High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates, Expert System with Applications, Vol. 41, No. 8, pp. 386-3878

V. Goyal, S. Dawar, 2015, UP-Hist tree: An efficient data structure for mining high utility patterns from transaction databases, in Proc. of the 19th International Database Engineering and Applications Symposium, pp. 56-61

E. Mengelkamp, J. Garttner, C. Weinhardt, 2018, Decent- ralizing energy systems through local energy markets: the LAMP-project, Proc. of Multikonferenz Wirtschaftsin- formatik, pp. 924-930

J. Liu, K. Wang, B. C. M. Fung, 2015, Mining High Utility Patterns in One Phase without Generating Candidates, IEEE Trans. on Knowledge and Data Engineering, Vol. 28, No. 5, pp. 1245-1257

M. Liu, J. Qu, 2012, Mining high utility itemsets without candidate generation, in Proc. of the 21st ACM International Conference on Information and Knowledge Management, pp. 55-64

P. Fournier-Viger, C. W. Wu, S. Zida, V. S. Tseng, 2014, FHM: Faster high-utility itemset mining using estimated utility co-occurrence pruning, Foundations of Intelligent Systems, pp. 83-92

W. Song, Y. Liu, J. Li, 2014, BAHUI: Fast and Memory Efficient Mining of High Utility Itemsets Based on Bitmap, International Journal of Data Warehousing and Mining, Vol. 10, No. 1, pp. 1-15

저자소개

황정희 (Jeong-Hee Hwang)

She received her Ph.D. in computer science from Chungbuk National University in 2005.

In 2006 joined the faculty of Namseoul university, where she is now an associate professor.

Her research interests include data mining, deep learning, stream data processing and big data analysis.

지정희 (Jeonghee Chi)

She received her Ph.D. in computer science from Chungbuk National University in 2006.

She is now an assistant professor at Konkuk university.

Her research interests include data mining, machine learning, and big data analysis.

박소영 (Soyoung Park)

She received her Ph.D. in computer science from Ewha Womans University in 2006.

She is now an assistant professor at Konkuk university.

Her research interests include computer security, cryptography, and big data analysis.

KIEEThe Transactions of
the Korean Institute of Electrical Engineers

The Transactions of the Korean Institute of Electrical Engineers

ISO Journal TitleTrans. Korean. Inst. Elect. Eng.

Journal Search

Journal XML

Journal Information

BHUI: 이진 논리 연산을 활용한 하이 유틸리티 마이닝 기법에 대한 연구

Abstract

Key words

1. 서 론

2. Related Works

3. A Bitwise Operation-based High Utility Itemset Mining (BHUI) Algorithm

3.1 Problem Definition

(1)

(2)

(3)

(4)

(5)

(6)

3.2 BHUI Algorithm

4. Performance evaluation

4.1 Experimental Environment

4.2 Experimental Results

5. Conclusions

Acknowledgements

References

저자소개

황정희 (Jeong-Hee Hwang)

지정희 (Jeonghee Chi)

박소영 (Soyoung Park)

Article Information (continued)

Key words

KIEEThe Transactions ofthe Korean Institute of Electrical Engineers

The Transactions of the Korean Institute of Electrical Engineers

ISO Journal TitleTrans. Korean. Inst. Elect. Eng.

Journal Search

Journal XML

Journal Information

BHUI: 이진 논리 연산을 활용한 하이 유틸리티 마이닝 기법에 대한 연구

Abstract

Key words

1. 서 론

2. Related Works

3. A Bitwise Operation-based High Utility Itemset Mining (BHUI) Algorithm

3.1 Problem Definition

(1)

(2)

(3)

(4)

(5)

(6)

3.2 BHUI Algorithm

4. Performance evaluation

4.1 Experimental Environment

4.2 Experimental Results

5. Conclusions

Acknowledgements

References

저자소개

황정희 (Jeong-Hee Hwang)

지정희 (Jeonghee Chi)

박소영 (Soyoung Park)

Article Information (continued)

Key words

KIEEThe Transactions of
the Korean Institute of Electrical Engineers