Advances in Knowledge Discovery and Data Mining (eBook)

11th Pacific-Asia Conference, PAKDD 2007, Nanjing, China, May 22-25, 2007

Zhi-Hua Zhou, Hang Li, Qiang Yang (Eds.)

eBook Download: PDF

2007 | 1st edition
1186 pages
Springer-Verlag
978-3-540-71701-0 (ISBN)

This book constitutes the refereed proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2007, held in Nanjing, China in May 2007.

The 34 revised full papers and 92 revised short papers presented together with four keynote talks or extended abstracts thereof were carefully reviewed and selected from 730 submissions. The papers are devoted to new ideas, original research results and practical development experiences from all KDD-related areas including data mining, machine learning, databases, statistics, data warehousing, data visualization, automatic scientific discovery, knowledge acquisition and knowledge-based systems.

Written for: Researchers and professionals

Keywords: Web data mining, algorithmic learning, ant colony optimization, association rule mining, biomedical data analysis, classification, clustering, computer security, data analysis, data mining, feature selection, image segmentation, information extraction, knowledge discovery, learning classifier systems, machine learning, privacy, qualitative reasoning, random forest, rough sets, statistical learning, support vector machines, text mining, text summarization, workflow mining

Preface 6
Organization 8
Table of Contents 16
Research Frontiers in Advanced Data Mining Technologies and Applications 27
Finding the Real Patterns 32
Class Noise vs Attribute Noise: Their Impacts, Detection and Cleansing 33
Multi-modal and Multi-granular Learning 35
Hierarchical Density-Based Clustering of Categorical Data and a Simplification 37
Multi-represented Classification Based on Confidence Estimation 49
Selecting a Reduced Set for Building Sparse Support Vector Regression in the Primal 61
Mining Frequent Itemsets from Uncertain Data 73
QC4 - A Clustering Evaluation Method 85
Semantic Feature Selection for Object Discovery in High-Resolution Remote Sensing Imagery 97
Deriving Private Information from Arbitrarily Projected Data 110
Consistency Based Attribute Reduction 122
A Hybrid Command Sequence Model for Anomaly Detection 134
s-Algorithm: Structured Workflow Process Mining Through Amalgamating Temporal Workcases 145
Multiscale BiLinear Recurrent Neural Network for Prediction of MPEG Video Traffic 157
An Effective Multi-level Algorithm Based on Ant Colony Optimization for Bisecting Graph 164
A Unifying Method for Outlier and Change Detection from Data Streams Based on Local Polynomial Fitting 176
Simultaneous Tuning of Hyperparameter and Parameter for Support Vector Machines 188
Entropy Regularization, Automatic Model Selection, and Unsupervised Image Segmentation 199
A Timing Analysis Model for Ontology Evolutions Based on Distributed Environments 209
An Optimum Random Forest Model for Prediction of Genetic Susceptibility to Complex Diseases 219
Feature Based Techniques for Auto-Detection of Novel Email Worms 231
Multiresolution-Based BiLinear Recurrent Neural Network 243
Query Expansion Using a Collection Dependent Probabilistic Latent Semantic Thesaurus 250
Scaling Up Semi-supervised Learning: An Efficient and Effective LLGC Variant 262
A Machine Learning Approach to Detecting Instantaneous Cognitive States from fMRI Data 274
Discovering Correlated Items in Data Streams 286
Incremental Clustering in Geography and Optimization Spaces 298
Estimation of Class Membership Probabilities in the Document Classification 310
A Hybrid Multi-group Privacy-Preserving Approach for Building Decision Trees 322
A Constrained Clustering Approach to Duplicate Detection Among Relational Data 334
Understanding Research Field Evolving and Trend with Dynamic Bayesian Networks 346
Embedding New Data Points for Manifold Learning Via Coordinate Propagation 358
Spectral Clustering Based Null Space Linear Discriminant Analysis (SNLDA) 370
On a New Class of Framelet Kernels for Support Vector Regression and Regularization Networks 381
A Clustering Algorithm Based on Mechanics 393
DLDA/QR: A Robust Direct LDA Algorithm for Face Recognition and Its Theoretical Foundation 405
gPrune: A Constraint Pushing Framework for Graph Pattern Mining 414
Modeling Anticipatory Event Transitions 427
A Modified Relationship Based Clustering Framework for Density Based Clustering and Outlier Filtering on High Dimensional Datasets 435
A Region-Based Skin Color Detection Algorithm 443
Supportive Utility of Irrelevant Features in Data Preprocessing 451
Incremental Mining of Sequential Patterns Using Prefix Tree 459
A Multiple Kernel Support Vector Machine Scheme for Simultaneous Feature Selection and Rule-Based Classification 467
Combining Supervised and Semi-supervised Classifier for Personalized Spam Filtering 475
Qualitative Simulation and Reasoning with Feature Reduction Based on Boundary Conditional Entropy of Knowledge 483
A Hybrid Incremental Clustering Method-Combining Support Vector Machine and Enhanced Clustering by Committee Clustering Algorithm 491
CCRM: An Effective Algorithm for Mining Commodity Information from Threaded Chinese Customer Reviews 499
A Rough Set Approach to Classifying Web Page Without Negative Examples 507
Evolution and Maintenance of Frequent Pattern Space When Transactions Are Removed 515
Establishing Semantic Relationship in Inter-query Learning for Content-Based Image Retrieval Systems 524
Density-Sensitive Evolutionary Clustering 533
Reducing Overfitting in Predicting Intrinsically Unstructured Proteins 541
Temporal Relations Extraction in Mining Hepatitis Data 549
Supervised Learning Approach to Optimize Ranking Function for Chinese FAQ-Finder 557
Combining Convolution Kernels Defined on Heterogeneous Sub-structures 565
Privacy-Preserving Sequential Pattern Release 573
Mining Concept Associations for Knowledge Discovery Through Concept Chain Queries 581
Capability Enhancement of Probabilistic Neural Network for the Design of Breakwater Armor Blocks 589
Named Entity Recognition Using Acyclic Weighted Digraphs: A Semi-supervised Statistical Method 597
Contrast Set Mining Through Subgroup Discovery Applied to Brain Ischaemina Data 605
Intelligent Sequential Mining Via Alignment: Optimization Techniques for Very Large DB 613
A Hybrid Prediction Method Combining RBF Neural Network and FAR Model 624
An Advanced Fuzzy C-Mean Algorithm for Regional Clustering of Interconnected Systems 632
Centroid Neural Network with Bhattacharyya Kernel for GPDF Data Clustering 642
Concept Interconnection Based on Many-Valued Context Analysis 649
Text Classification for Thai Medicinal Web Pages 657
A Fast Algorithm for Finding Correlation Clusters in Noise Data 665
Application of Discrimination Degree for Attributes Reduction in Concept Lattice 674
A Language and a Visual Interface to Specify Complex Spatial Patterns 682
Clustering Ensembles Based on Normalized Edges 690
Quantum-Inspired Immune Clonal Multiobjective Optimization Algorithm 698
Phase Space Reconstruction Based Classification of Power Disturbances Using Support Vector Machines 706
Mining the Impact Factors of Threads and Participators on Usenet Using Link Analysis 714
Weighted Rough Set Learning: Towards a Subjective Approach 722
Multiple Self-Splitting and Merging Competitive Learning Algorithm 730
A Novel Relative Space Based Gene Feature Extraction and Cancer Recognition 738
Experiments on Kernel Tree Support Vector Machines for Text Categorization 746
A New Approach for Similarity Queries of Biological Sequences in Databases 754
Anomaly Intrusion Detection Based on Dynamic Cluster Updating 763
Efficiently Mining Closed Constrained Frequent Ordered Subtrees by Using Border Information 771
Approximate Trace of Grid-Based Clusters over High Dimensional Data Streams 779
BRIM: An Efficient Boundary Points Detecting Algorithm 787
Syntactic Impact on Sentence Similarity Measure in Archive-Based QA System 795
Semi-structure Mining Method for Text Mining with a Chunk-Based Dependency Structure 803
Principal Curves with Feature Continuity 811
Kernel-Based Linear Neighborhood Propagation for Semantic Video Annotation 819
Learning Bayesian Networks with Combination of MRMR Criterion and EMI Method 827
A Cooperative Coevolution Algorithm of RBFNN for Classification 835
ANGEL: A New Effective and Efficient Hybrid Clustering Technique for Large Databases 843
Exploring Group Moving Pattern for an Energy-Constrained Object Tracking Sensor Network 851
ProMail: Using Progressive Email Social Network for Spam Detection 859
Multidimensional Decision Support Indicator (mDSI) for Time Series Stock Trend Prediction 867
A Novel Support Vector Machine Ensemble Based on Subtractive Clustering Analysis 875
Keyword Extraction Based on PageRank 883
Finding the Optimal Feature Representations for Bayesian Network Learning 891
Feature Extraction and Classification of Tumor Based on Wavelet Package and Support Vector Machines 897
Resource Allocation and Scheduling Problem Based on Genetic Algorithm and Ant Colony Optimization 905
Image Classification and Segmentation for Densely Packed Aggregates 913
Mining Temporal Co-orientation Pattern from Spatio-temporal Databases 921
Incremental Learning of Support Vector Machines by Classifier Combining 930
Clustering Zebrafish Genes Based on Frequent-Itemsets and Frequency Levels 938
A Practical Method for Approximate Subsequence Search in DNA Databases 947
An Information Retrieval Model Based on Semantics 958
AttributeNets: An Incremental Learning Method for Interpretable Classification 966
Mining Personalization Interest and Navigation Patterns on Portal 974
Cross-Lingual Document Clustering 982
Grammar Guided Genetic Programming for Flexible Neural Trees Optimization 990
A New Initialization Method for Clustering Categorical Data 998
L0-Constrained Regression for Data Mining 1007
Application of Hybrid Pattern Recognition for Discriminating Paddy Seeds of Different Storage Periods Based on Vis/NIRS 1015
Density-Based Data Clustering Algorithms for Lower Dimensions Using Space-Filling Curves 1023
Transformation-Based GMM with Improved Cluster Algorithm for Speaker Identification 1032
Using Social Annotations to Smooth the Language Model for IR 1041
Affection Factor Optimization in Data Field Clustering 1048
A New Algorithm for Minimum Attribute Reduction Based on Binary Particle Swarm Optimization with Vaccination 1055
Graph Nodes Clustering Based on the Commute-Time Kernel 1063
Identifying Synchronous and Asynchronous Co-regulations from Time Series Gene Expression Data 1072
A Parallel Algorithm for Learning Bayesian Networks 1081
Incorporating Prior Domain Knowledge into a Kernel Based Feature Selection Algorithm 1090
Geo-spatial Clustering with Non-spatial Attributes and Geographic Non-overlapping Constraint: A Penalized Spatial Distance Measure 1098
GBKII: An Imputation Method for Missing Values 1106
An Effective Gene Selection Method Based on Relevance Analysis and Discernibility Matrix 1114
Towards Comprehensive Privacy Protection in Data Clustering 1122
A Novel Spatial Clustering with Obstacles Constraints Based on Particle Swarm Optimization and K-Medoids 1131
Online Rare Events Detection 1140
Structural Learning About Independence Graphs from Multiple Databases 1148
An Effective Method For Calculating Natural Adjacency Relation in Spatial Database 1157
K-Centers Algorithm for Clustering Mixed Type Data 1166
Proposion and Analysis of a TCP Feature of P2P Traffic 1174
Author Index 1182

Research Frontiers in Advanced Data Mining Technologies and Applications (p. 25)
Data mining, as the confluence of multiple intertwined disciplines, including statistics, machine learning, pattern recognition, database systems, information retrieval, World-Wide Web, and many application domains, has achieved great progress in the past decade [1]. Similar to many research fields, data mining has two general directions: theoretical foundations and advanced technologies and applications.

Here we focus on advanced technologies and applications in data mining and discuss some recent progress in this direction. Note that some popular research topics, such as privacy-preserving data mining, are not covered in the discussion for lack of space/time. Our discussion is organized into nine themes, and we briefly outline the current status and research problems in each theme.

1 Pattern Mining, Pattern Usage, and Pattern Understanding
Frequent pattern mining has been a focused theme in data mining research for over a decade. Abundant literature has been dedicated to this research and tremendous progress has been made, ranging from efficient and scalable algorithms for frequent itemset mining in transaction databases to numerous research frontiers, such as sequential pattern mining, structural pattern mining, correlation mining, associative classification, and frequent-pattern-based clustering, as well as their broad applications.

Recently, studies have proceeded to scalable methods for mining colossal patterns where the size of the patterns could be rather large so that the step-by-step growth using an Apriori-like approach does not work, methods for pattern compression, extraction of high-quality top-k patterns, and understanding patterns by context analysis and generation of semantic annotations.
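To illustrate why step-by-step growth breaks down for very large patterns, here is a minimal sketch of a textbook Apriori-style level-wise miner. It is not taken from the proceedings; the function name `apriori` and the parameter `min_support` (an absolute transaction count) are assumptions made for this example. Each pass extends frequent itemsets by one item, so a single frequent pattern of size n forces the enumeration of all of its 2^n - 1 non-empty subsets along the way.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    Illustrative sketch only: min_support is an absolute count, and
    itemsets are represented as frozensets.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one scan over the data per level.
        current = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_support:
                current[cand] = count
        frequent.update(current)
        k += 1

    return frequent


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(apriori(data, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

In the small example, the three single items and the three pairs are frequent, while the full triple falls below the threshold. For a pattern with, say, 100 items, the same procedure would have to pass through roughly 2^100 intermediate itemsets, which is the scaling problem the paragraph above refers to.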

Moreover, frequent patterns have been used for effective classification by top-k rule generation for long patterns and discriminative frequent pattern analysis. Frequent patterns have also been used for clustering of high-dimensional biological data. Scalable methods for mining long, approximate, compressed, and sophisticated patterns for advanced applications, such as biological sequences and networks, and the exploration of mined patterns for classification, clustering, correlation analysis, and pattern understanding will still be interesting topics in research.

2 Information Network Analysis

Google’s PageRank algorithm has started a revolution in Internet search. However, since information network analysis covers many additional aspects and needs scalable and effective methods, the systematic study of this domain has just started, with many interesting issues to be explored. Information network analysis has broad applications, covering social and biological network analysis, computer network intrusion detection, software program analysis, terrorist network discovery, and Web analysis.
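As a rough illustration of the kind of link-based ranking mentioned above, the following sketch implements the commonly taught random-surfer formulation of PageRank by power iteration. It is a simplified teaching version, not Google's production algorithm and not code from the proceedings; the function name `pagerank`, the damping value 0.85, and the input format (a dict from node to list of out-links) are assumptions made for this example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Approximate PageRank scores by power iteration.

    links: dict mapping each node to a list of nodes it links to.
    Illustrative sketch only; parameter names and defaults are assumptions.
    """
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}

        # Each node passes its damped rank evenly along its out-links.
        for u in nodes:
            targets = links.get(u, [])
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)

        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break

    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for node, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

In the toy graph, node "c" collects links from three other nodes and ends up with the highest score, while "d", which nothing links to, keeps only the baseline share. Each iteration touches every edge once, which hints at why scalability is a recurring concern for the larger, heterogeneous networks the text goes on to discuss.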

One interesting direction is to treat information networks as graphs and further develop graph mining methods. Recent progress on graph mining and its associated structural pattern-based classification and clustering, graph indexing, and similarity search will play an important role in information network analysis.

Moreover, since information networks often form huge, multidimensional heterogeneous graphs, mining noisy, approximate, and heterogeneous subgraphs based on different applications for the construction of application-specific networks with sophisticated structures will help information network analysis substantially. The discovery of the power law distribution of information networks and the rules on density evolution of information networks will help develop effective algorithms for network analysis.

Publication date (per publisher)	1.1.2007
Language	English
Subject area	Mathematics / Computer Science ► Computer Science ► Databases
Subject area	Computer Science ► Theory / Studies ► Algorithms
ISBN-10	3-540-71701-3 / 3540717013
ISBN-13	978-3-540-71701-0 / 9783540717010

PDF (watermark)
Size: 38.7 MB

DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.
