ThinkMind // International Journal On Advances in Software, volume 9, numbers 3 and 4, 2016 // View article soft_v9_n34_2016_2


Automatic KDD Data Preparation Using Parallelism

Authors:
Youssef Hmamouche
Christian Ernst
Alain Casali

Keywords: Data Mining; Data Preparation; Outliers detection and cleaning; Discretization Methods; Task parallelization

Abstract:
We present an original framework for automatic data preparation, applicable in most Knowledge Discovery and Data Mining systems. It is based on the study of some statistical features of the target database samples. For each attribute of the database used, we automatically propose an optimized approach that makes it possible (i) to detect and eliminate outliers, and (ii) to identify the most appropriate discretization method. Concerning the former, we show that the detection of an outlier depends on whether the data distribution is normal or not. When attempting to discern the appropriate discretization method, what matters is the shape of the density function of the attribute's distribution law. For this reason, we propose an automatic choice of the optimized discretization method, based on a multi-criteria (Entropy, Variance, Stability) evaluation. Most of the associated processing is performed in parallel, using the capabilities of multicore computers. The experiments conducted validate our approach, both on rule detection and on time series prediction. In particular, we show that the same discretization method is not the best when applied to all the attributes of a specific database.
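
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`looks_normal`, `remove_outliers`, `score`, `prepare_attribute`, `prepare_table`) is hypothetical. It assumes a crude skewness/kurtosis heuristic as the normality check, a 3-sigma rule for near-normal attributes and Tukey's IQR fences otherwise, and only two candidate discretization methods (equal width and equal frequency) scored by entropy and within-bin variance. The Stability criterion mentioned in the abstract is omitted here.

```python
import math
import statistics
from concurrent.futures import ThreadPoolExecutor


def looks_normal(values, tol=1.0):
    """Crude heuristic (an assumption, not the paper's test):
    skewness and excess kurtosis are both close to zero."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return False
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n - 3.0
    return abs(skew) < tol and abs(kurt) < tol


def remove_outliers(values):
    """3-sigma rule for near-normal data, Tukey's IQR fences otherwise."""
    if looks_normal(values):
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        low, high = mean - 3 * std, mean + 3 * std
    else:
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low <= v <= high]


def equal_width(values, k):
    """Assign each value to one of k bins of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # avoid division by zero on constant data
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency(values, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // len(values)
    return bins


def score(values, bins, k):
    """Higher is better: normalized bin entropy minus the share of
    variance left inside the bins (k must be at least 2)."""
    groups = {}
    for v, b in zip(values, bins):
        groups.setdefault(b, []).append(v)
    n = len(values)
    entropy = -sum(len(g) / n * math.log(len(g) / n) for g in groups.values())
    within = sum(len(g) * statistics.pvariance(g) for g in groups.values()) / n
    total = statistics.pvariance(values) or 1.0
    return entropy / math.log(k) - within / total


METHODS = {"equal_width": equal_width, "equal_frequency": equal_frequency}


def prepare_attribute(values, k=4):
    """Clean one attribute, then keep the best-scoring discretization."""
    cleaned = remove_outliers(values)
    best = max(METHODS, key=lambda name: score(cleaned, METHODS[name](cleaned, k), k))
    return best, METHODS[best](cleaned, k)


def prepare_table(columns, k=4):
    """Process attributes concurrently. Threads keep the sketch simple;
    CPU-bound work in CPython would need processes to use several cores."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda col: prepare_attribute(col, k), columns.values())
        return dict(zip(columns, results))
```

Calling `prepare_table({"a": [...], "b": [...]})` returns, for each attribute name, the selected method name and the bin index of every retained value. Because the choice is made per attribute, different attributes of the same table may end up with different discretization methods.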

Pages: 167 to 178

Copyright: Copyright (c) to authors, 2016. Used with permission.

Publication date: December 31, 2016

Published in: journal

ISSN: 1942-2628
