98 (1) 2010

  1. Special Issue on Intelligent Data Analysis in Granular Computing. Preface i-ii
  2. Analysis of the Effectiveness of the Genetic Algorithms based on Extraction of Association Rules
    Jesus Alcala-Fdez, Nicolo Flugy-Pape, Andrea Bonarini,   Francisco Herrera 1-14
    Data Mining is most commonly used in attempts to induce association rules from transaction data which can help decision-makers easily analyze the data and make good decisions regarding the domains concerned. Most conventional studies are focused on binary or discrete-valued transaction data, however the data in real-world applications usually consists of quantitative values. In the last years, many researches have proposed Genetic Algorithms for mining interesting association rules from quantitative data. In this paper, we present a study of three genetic association rules extraction methods to show their effectiveness for mining quantitative association rules. Experimental results over two real-world databases are showed.
  3. Scalable Clustering for Mining Local-Correlated Clusters in High Dimensions and Large Datasets
    Kun-Che Lu,  Don-Lin Yang 15-32
    Clustering is useful for mining the underlying structure of a dataset in order to support decision making since target or high-risk groups can be identified. However, for high dimensional datasets, the result of traditional clustering methods can be meaningless as clusters may only be depicted with respect to a small part of features. Taking customer datasets as an example, certain customers may correlate with their salary and education, and the others may correlate with their job and house location. If one uses all the features of a customer for clustering, these local-correlated clusters may not be revealed. In addition, processing high dimensions and large datasets is a challenging problem in decision making. Searching all the combinations of every feature with every record to extract local-correlated clusters is infeasible, which is in exponential scale in terms of data dimensionality and cardinality. In this paper, we propose a scalable 2-Leveled Approximated Hyper-Image-based Clustering framework, referred as 2L-HIC-A, for mining local-correlated clusters, where each level clustering process requires only one scan of the original dataset. Moreover, the data-processing time of 2L-HIC-A can be independent of the input data size. In 2L-HIC-A, various well-developed image processing techniques can be exploited for mining clusters. In stead of proposing a new clustering algorithm, our framework can accommodate other clustering methods for mining local-corrected clusters, and to shed new light on the existing clustering techniques.
  4. Discovering the Radio Signal Coverage Hole and Weak Coverage Area in Mobile Network by Spatiotemporal Data Mining on Location-Based Services
    Wei-Hsun Lee, Shian-Shyong Tseng,  Ching-Hung Wang,   Shen-Lung Tung 33-47
    Locating the radio signal coverage hole (SCH) and signal weak coverage area (SWCA) in the mobile network and taking appropriate actions to improve network quality are the major tasks for mobile network operators (MNO). A novel approach, Signal Coverage Hole and weak coverage Area DIscovering Model (SCHADIM), is proposed based on spatiotemporal data mining on the raw data collecting from location based service (LBS) to achieve this goal. It reuses the communication raw data in LBS-based applications and transforms it into mobile network communication quality monitoring information by integrating the GIS (geographical information system), road network database and mobile network base station database. By this way, the vehicles in the LBS-based applications can be regarded as the mobile network signal probing vehicles, which provides plentiful information for discovering the SCH/SWCA. Comparing to the traditional mobile network signal probing vehicle method, which is known as radio site verification (RSV) method, the proposed SCHADIM has the spatiotemporal coverage as well as cost advantages. A mobile network monitoring system, cellular network quality safeguard (CQS), has been implemented based on the proposed model which combines with the domain expert heuristics to provide decision support information for optimizing the mobile network.
  5. Predicting Website Audience Demographics for Web Advertising Targeting Using Multi-Website Clickstream Data
    Koen W. De Bock,  Dirk Van den Poel 49-67
    Several recent studies have explored the virtues of behavioral targeting and personalization for online advertising. In this paper, we add to this literature by proposing a cost-effective methodology for the prediction of demographic website visitor profiles that can be used for web advertising targeting purposes. The methodology involves the transformation of website visitors' clickstream patterns to a set of features and the training of Random Forest classifiers that generate predictions for gender, age, level of education and occupation category. These demographic predictions can support online advertisement targeting (i) as an additional input in personalized advertising or behavioral targeting, or (ii) as an input for aggregated demographic website visitor profiles that support marketing managers in selecting websites and achieving an optimal correspondence between target groups and website audience composition. The proposed methodology is validated using data from a Belgian web metrics company. The results reveal that Random Forests demonstrate superior classification performance over a set of benchmark algorithms. Further, the ability of the model set to generate representative demographic website audience profiles is assessed. The stability of the models over time is demonstrated using out-of-period data.
  6. Mining Outliers in Correlated Subspaces for High Dimensional Data Sets
    Jinsong Leng, Tzung-Pei Hong 71-86
    Outlier detection in high dimensional data sets is a challenging data mining task. Mining outliers in subspaces seems to be a promising solution, because outliers may be embedded in some interesting subspaces. Searching for all possible subspaces can lead to the problem called "the curse of dimensionality". Due to the existence of many irrelevant dimensions in high dimensional data sets, it is of paramount importance to eliminate the irrelevant or unimportant dimensions and identify interesting subspaces with strong correlation. Normally, the correlation among dimensions can be determined by traditional feature selection techniques or subspace-based clustering methods. The dimension-growth subspace clustering techniques can find interesting subspaces in relatively lower dimension spaces, while dimension-reduction approaches try to group interesting subspaces with larger dimensions. This paper aims to investigate the possibility of detecting outliers in correlated subspaces. We present a novel approach by identifying outliers in the correlated subspaces. The degree of correlation among dimensions is measured in terms of the mean squared residue. In doing so, we employ a dimension-reduction method to find the correlated subspaces. Based on the correlated subspaces obtained, we introduce another criterion called "shape factor" to rank most important subspaces in the projected subspaces. Finally, outliers are distinguished from most important subspaces by using classical outlier detection techniques. Empirical studies show that the proposed approach can identify outliers effectively in high dimensional data sets.
  7. Covariance Matrix Self-Adaptation and Kernel Regression - Perspectives of Evolutionary Optimization in Kernel Machines
    Oliver Kramer 87-106
    Kernel based techniques have shown outstanding success in data mining and machine learning in the recent past. Many optimization problems of kernel based methods suffer from multiple local optima. Evolution strategies have grown to successful methods in non-convex optimization. This work shows how both areas can profit from each other. We investigate the application of evolution strategies to Nadaraya-Watson based kernel regression and vice versa. The Nadaraya-Watson estimator is used as meta-model during optimization with the covariance matrix self-adaptation evolution strategy. An experimental analysis evaluates the meta-model assisted optimization process on a set of test functions and investigates model sizes and the balance between objective function evaluations on the real function and on the surrogate. In turn, evolution strategies can be used to optimize the embedded optimization problem of unsupervised kernel regression. The latter is fairly parameter dependent, and minimization of the data space reconstruction error is an optimization problem with numerous local optima. We propose an evolution strategy based unsupervised kernel regression method to solve the embedded learning problem. Furthermore, we tune the novel method by means of the parameter tuning technique sequential parameter optimization.
  8. Risk Mining in Medicine: Application of Data Mining to Medical Risk Management
    Shusaku Tsumoto,  Shoji Hirano 107-121
    This paper proposes an application of data mining to medical risk management, where data mining techniques are applied to detection, analysis and evaluation of risks potentially existing in clinical environments. We applied this technique to the following two medical domains: risk aversion of nurse incidents and infection control. The results show that data mining methods were effective to detection and aversion of risk factors.
  9. Communication Error Determination System for Multi-layered or Chained Situations
    Akinori Abe, Yukio Ohsawa,  Hiromi Itoh Ozaku,   Kaoru Sagara,  Noriaki Kuwahara,  Kiyoshi Kogure 123-142
    Many medical accidents and incidents occurred due to communication errors. To avoid such incidents, in this paper, we propose a system for determining communication errors. Especially, we propose a model that can be applied to multi-layered or chained situations. First, we provide an overview of communication errors in nursing activities. Then we describe the warp and woof model for nursing task that was proposed by Harada and considers multi-layered or chained situations. Next we describe a system for determining communication errors based on the warp and woof model for nursing task. The system is capable of generating nursing activity diagrams semi-automatically and compiles necessary nursing activities. We also propose a prototype tagging of the nursing corpus for an effective generation of the diagrams. Then we combine the diagram generation with the Kamishibai KeyGraph to determine possible points of the hidden or potential factors of communication errors.

98 (2-3) 2010

  1. Deterministic and Unambiguous Families within Recognizable Two-dimensional Languages
    Marcella Anselmo,  Dora Giammarresi,  Maria Madonia 143-166
    Recognizable two-dimensional languages (REC) are defined by tiling systems that generalize to two dimensions non-deterministic finite automata for strings. We introduce the notion of deterministic tiling system and the corresponding family of languages (DREC) and study its structural and closure properties. Furthermore we show that, in contrast with the one-dimensional case, there exist other classes between deterministic and non-deterministic families that we separate by means of examples and decidability properties.
  2. Feature Selection via Maximizing Fuzzy Dependency
    Qinghua Hu,  Pengfei Zhu,  Jinfu Liu,  Yongbin Yang  and  Daren Yu 167-181
    Feature selection is an important preprocessing step in pattern analysis and machine learning. The key issue in feature selection is to evaluate quality of candidate features. In this work, we introduce a weighted distance learning algorithm for feature selection via maximizing fuzzy dependency. We maximize fuzzy dependency between features and decision by distance learning and then evaluate the quality of features with the learned weight vector. The features deriving great weights are considered to be useful for classification learning. We test the proposed technique with some classical methods and the experimental results show the proposed algorithm is effective.
  3. Learning Behaviors of Functions
    Bala Kalyanasundaram,  Mahe Velauthapillai 183-198
    We consider the inductive inference model of Gold [15]. Suppose we are given a set of functions that are learnable with certain number of mind changes and errors. What properties of these functions are learnable if we allow fewer number of mind changes or errors? In order to answer this question this paper extends the Inductive Inference model introduced by Gold [15]. Another motivation for this extension is to understand and characterize properties that are learnable for a given set of functions. Our extension considers a wide range of properties of function based on their input-output relationship. Two specific properties of functions are studied in this paper. The first property, which we call modality, explores how the output of a function fluctuates. For example, consider a function that predicts the price of a stock. A brokerage company buys and sells stocks very often in a day for its clients with the intent of maximizing their profit. If the company is able predict the trend of the stock market "reasonably" accurately then it is bound to be very successful. Identification criterion for this property of a function f is called PREX which predicts if f(x) is equal to, less than or greater than f(x+1) for each x.
    Next, as opposed to a constant tracking by a brokerage company, an individual investor does not often track dynamic changes in stock values. Instead, the investor would like to move the investment to a less risky option when the investment exceeds or falls below certain threshold. We capture this notion using an identification criterion called TREX that essentially predicts if a function value is at, above, or below a threshold value. Conceptually, modality prediction (i.e., PREX) and threshold prediction (i.e., TREX) are ëasier" than EX learning. We show that neither the number of errors nor the number of mind-changes can be reduced when we ease the learning criterion from exact learning to learning modality or threshold. We also prove that PREX and TREX are totally different properties to predict. That is, the strategy for a brokerage company may not be a good strategy for individual investor and vice versa.
  4. Adaptive Rough Entropy Clustering Algorithms in Image Segmentation
    Dariusz Małyszko,  Jarosław Stepaniuk 199-231
    High quality performance of image segmentation methods presents one leading priority in design and implementation of image analysis systems. Incorporating the most important image data information into segmentation process has resulted in development of innovative frameworks such as fuzzy systems, rough systems and recently rough - fuzzy systems. Data analysis based on rough and fuzzy systems is designed to apprehend internal data structure in case of incomplete or uncertain information. Rough entropy framework proposed in [12,13] has been dedicated for application in clustering systems, especially for image segmentation systems. We extend that framework into eight distinct rough entropy measures and related clustering algorithms. The introduced solutions are capable of adaptive incorporation of the most important factors that contribute to the relation between data objects and makes possible better understanding of the image structure. In order to prove the relevance of the proposed rough entropy measures, the evaluation of rough entropy segmentations based on the comparison with human segmentations from Berkeley and Weizmann image databases has been presented. At the same time, rough entropy based measures applied in the domain of image segmentation quality evaluation have been compared with standard image segmentation indices. Additionally, rough entropy measures seem to comprehend properly properties validated by different image segmentation quality indices.
  5. Design of a Hybrid Quantizer with Variable Length Code
    Zoran H. Perić,  Milan R. Dinčić,  Marko D. Petković 233-256
    In this paper a new model for compression of Laplacian source is given. This model consists of hybrid quantizer whose output levels are coded with Golomb-Rice code. Hybrid quantizer is combination of uniform and nonuniform quantizer, and it can be considered as generalized quantizer, whose special cases are uniform and nonuniform quantizers. We propose new generalized optimal compression function for companding quantizers. Hybrid quantizer has better performances (smaller bit-rate and complexity for the same quality) than both uniform and nonuniform quantizers, because it joins their good characteristics. Also, hybrid quantizer allows great flexibility, because there are many combinations of number of levels in uniform part and in nonuniform part, which give similar quality. Each of these combinations has different bit-rate and complexity, so we have freedom to choose combination which is the most appropriate for our application, in regard to quality, bit-rate and complexity. We do not have such freedom of choice when we use uniform or nonuniform quantizers. Until now, it has been thought that uniform quantizer is the most appropriate to use with lossless code, but in this paper we show that combination of hybrid quantizer and lossless code gives better performances. As lossless code we use Golomb-Rice code because it is especially suitable for Laplacian source since it gives average bit-rate very close to the entropy and it is easier for implementation than Huffman code. Golomb-Rice code is used in many modern compression standards. Our model can be used for compression of all signals with Laplacian distribution.
  6. Tree Structure Based Data Hiding for Progressive Transmission Images
    Piyu Tsai 257-275
    Progressive image transmission (PIT) is supported by several encoders such as SPIHT, JPEG2000 and so on. However, few of data hiding scheme for progressive transmission images is designed. In this paper, tree-structure-based data hiding for set partitioning in hierarchical trees (SPIHT) images is proposed. The bit stream of SPIHT multi-stage encoded was structured of tree. The secret image can be progressively embedded into SPIHT bit trees.
    Experimental results showed the progressive image data hiding is achieved. The secret image is progressively embedded/extracted in SPIHT encoded images. Furthermore, a higher hiding capacity was provided in an earlier encoding which helped the secret image to be identified earlier. Also, an adaptive hiding capacity could be developed by using different tree structures. In comparison with Tsai et al.'s scheme, the proposed scheme had a higher hiding capacity and a better secret image quality.
  7. On the Insecurity of an Identity Based Proxy Re-encryption Scheme
    Xu An Wang,  Xiaoyuan Yang 277-281
    At Pairing'07, Matsuo proposed two proxy re-encryption schemes: proxy re-encryption from CBE to IBE and IBE to IBE. Now both schemes have been standardized by P1363.3 workgroup. In this paper, we show that their identity based proxy re-encryption scheme is insecure. We give two attacks to this scheme. The first attack shows that the proxy can re-encrypt any IBE user's ciphertext to be the delegatee's ciphertext. The second attack implies that, if the proxy colludes with any delegatee, the proxy and this delegatee can derive any other IBE user's secret key.
  8. Free-Choice Petri Nets without Frozen Tokens, and Bipolar Synchronization Systems
    Joachim Wehler 283-320
    Bipolar synchronization systems (BP-systems) constitute a class of coloured Petri nets, well suited for modelling the control flow of discrete dynamical systems. Every BP-system has an underlying ordinary Petri net, a T-system. It further has a second ordinary net attached, a free-choice system. We prove that a BP-system is safe and live if the T-system and the free-choice system are safe and live and the free-choice system in addition has no frozen tokens. This result is the converse of a theorem of Genrich and Thiagarajan and proves an old conjecture. As a consequence we obtain two results about the existence of safe and live BP-systems with prescribed ordinary Petri nets. For the proof of these theorems we introduce the concept of a morphism between Petri nets as a means of comparing different Petri nets. We then apply the classical theory of free-choice systems.
  9. Capacity-Raising Steganography Using Multi-Pixel Differencing and Pixel-Value Shifting Operations
    Cheng-Hsing Yang, Shiuh-Jeng Wang, Chi-Yao Weng 321-336
    In order to provide large embedding capacity and to minimize distortion for the stego-image, a steganographic method using multi-pixel differencing is presented in this paper. It takes into consideration four pixels of a block, and the differences between the lowest gray-value pixel and its surrounding pixels are used to determine the degree of smoothness and sharpness for embedding the secret data. If the difference values are large in a block, and the block is located in the sharp area then more data can be embedded. On the other hand, if the block is located in the smooth area less data can be embedded. The multi-pixel differencing method is employed in our scheme. We also propose the pixel-value shifting method to increase the image quality. The experimental results show that our scheme has a large embedding capacity without creating a noticeable distortion.

98 (4) 2010

  1. Bridging Logic and Computer Science: to Johann A. Makowsky for his 60th birthday. Preface i-i
  2. On the Borel Complexity of MSO Definable Sets of Branches
    Mikołaj Bojańczyk,  Damian Niwiński,   Alexander Rabinovich,  Adam Radziwończyk-Syta,  Michał Skrzypczak 337-349
    An infinite binary word can be identified with a branch in the full binary tree. We consider sets of branches definable in monadic second-order logic over the tree, where we allow some extra monadic predicates on the nodes. We show that this class equals to the Boolean combinations of sets in the Borel class S02 over the Cantor discontinuum. Note that the last coincides with the Borel complexity of w-regular languages.
  3. Properties of Almost All Graphs and Generalized Quantifiers
    Anuj Dawar, Erich Grädel 351-372
    We study 0-1 laws for extensions of first-order logic by Lindström quantifiers. We state sufficient conditions on a quantifier Q expressing a graph property, for the logic FO[Q] - the extension of first-order logic by means of the quantifier Q - to have a 0-1 law. We use these conditions to show, in particular, that FO[Rig], where Rig is the quantifier expressing rigidity, has a 0-1 law. We also show that extensions of first-order logic with quantifiers for Hamiltonicity, regularity and self-complementarity of graphs do not have a 0-1 law. Blass and Harary pose the question whether there is a logic which is powerful enough to express Hamiltonicity or rigidity and which has a 0-1 law. It is a consequence of our results that there is no such regular logic (in the sense of abstract model theory) in the case of Hamiltonicity, but there is one in the case of rigidity. We also consider sequences of vectorized quantifiers, and show that the extensions of first-order logic obtained by adding such sequences generated by quantifiers that are closed under substructures have 0-1 laws. The positive results also extend to the infinitary logic with finitely many variables.
  4. A Most General Edge Elimination Polynomial - Thickening of Edges
    Christian Hoffmann 373-378
    We consider a graph polynomial x(G;x,y,z) introduced by Ilia Averbouch, Benny Godlin, and Johann A. Makowsky (2008). This graph polynomial simultaneously generalizes the Tutte polynomial as well as a bivariate chromatic polynomial defined by Klaus Dohmen, André Pönitz, and Peter Tittmann (2003). We derive an identity which relates the graph polynomial x of a thickened graph (i.e. a graph with each edge replaced by k copies of it) to x of the original graph. As a consequence, we observe that at every point (x,y,z), except for points lying within some set of dimension 2, evaluating x is #P-hard. Thus, x supports Johann A. Makowsky's difficult point conjecture for graph polynomials (2008).
  5. A Note on Two-pebble Automata Over Infinite Alphabets
    Michael Kaminski,  Tony Tan 379-390
    It is shown that the emptiness problem for two-pebble automata languages is undecidable and that two-pebble automata are weaker than three-pebble automata.
  6. Tree-width in Algebraic Complexity
    Klaus Meer 391-409
    The paper surveys some of the author's work studying the algorithmic importance of the tree-width notion in algebraic frameworks. Two approaches are described. The first gives an algorithmic meta-theorem for certain logically characterized properties within the Blum-Shub-Smale BSS model of computation over the reals. The second reports on recent joint work with P. Koiran relating Boolean complexity and Valiant's approach to study families of polynomial systems over infinite fields and their complexity. We define particular families of polynomials via bounding the tree-width of suitably attached graphs and study the expressive power of the resulting families.
    The work described here is partially co-authored with and partially very much influenced by previous work of Janos A. Makowsky.