ThinkMind: SMART 2017, The Sixth International Conference on Smart Cities, Systems, Devices and Technologies (article smart_2017_5_10_44004)


Restructuring Unstructured Documents

Authors:
Jacques Péré-Laperne
Nadine Couture

Keywords: Data Mining; Computer Human Interface (CHI); Portable Document Format (PDF); Knowledge Discovery in Databases (KDD); Graphic reconstruction; Pattern recognition

Abstract:
Every day, the volume of the world's digital data increases considerably. Over 75% of these data are unstructured. This paper is about restructuring the graphic information contained in Portable Document Format (PDF) files and/or vector files. These documents are generally held by "Smart Factory" services: design offices, methods departments, new work departments and company maintenance services. To restructure these data, we propose using Knowledge Discovery in Databases (KDD) methods. Although the user is theoretically present throughout the KDD process, in practice this is not the case. This was observed by Fayard in 2003 at the KDD conference. Generally, the user is only present during the validation phase. We show why, in data restructuring, the user must be at the heart of the process and present at all stages. We refer to this as (A)KDD, for Anthropocentric Knowledge Discovery in Databases. The first stage of this restructuring consists of extracting the graphic and text objects contained in PDF files and putting them into a pivot data format. The second stage consists of coding this information in the form of an alphabet. The third stage consists of recreating the graphic and text components that are repeated in these files, which we refer to as graphemes. The fourth stage consists of either (1) automatically identifying these graphemes based on existing knowledge or (2) presenting them to the user, who identifies them and introduces them into the knowledge base. It is this entire restructuring process that we describe in this paper. As we highlight, in this incremental process it is people who play the main role, assisted by computers, and not the opposite.
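
As a rough illustration of stages two to four of the abstract, the toy sketch below codes a list of vector primitives as letters, looks for letter sequences that repeat (candidate graphemes), and then either identifies them from a knowledge base or sets them aside for the user. It is not the authors' implementation: the `Primitive` record, the four-letter alphabet, the fixed sequence length and all function names are assumptions made for the example, and stage one (extraction from PDF into a pivot format) is assumed to have already produced the primitives.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """Hypothetical pivot-format record for one extracted object."""
    kind: str   # "line", "arc" or "text"
    dx: float   # horizontal extent
    dy: float   # vertical extent


def encode(p: Primitive) -> str:
    """Stage 2 (assumed coding): map one primitive to one letter."""
    if p.kind == "text":
        return "T"
    if p.kind == "arc":
        return "A"
    # Lines are coded by their dominant direction.
    return "H" if abs(p.dx) >= abs(p.dy) else "V"


def repeated_sequences(word: str, length: int, min_count: int = 2) -> dict:
    """Stage 3: letter sequences of the given length seen at least min_count times."""
    counts = Counter(word[i:i + length] for i in range(len(word) - length + 1))
    return {seq: n for seq, n in counts.items() if n >= min_count}


def identify(candidates, knowledge_base: dict):
    """Stage 4: split candidates into known graphemes and ones left for the user."""
    known = {seq: knowledge_base[seq] for seq in candidates if seq in knowledge_base}
    unknown = sorted(seq for seq in candidates if seq not in knowledge_base)
    return known, unknown


# Two rectangles separated by a text label, followed by an arc.
rectangle = [
    Primitive("line", 10, 0), Primitive("line", 0, 5),
    Primitive("line", -10, 0), Primitive("line", 0, -5),
]
primitives = rectangle + [Primitive("text", 3, 1)] + rectangle + [Primitive("arc", 2, 2)]

word = "".join(encode(p) for p in primitives)
candidates = repeated_sequences(word, length=4)

# With an empty knowledge base, every candidate goes to the user.
known, unknown = identify(candidates, knowledge_base={})

# Once the user has named the grapheme, it is identified automatically.
known_after, unknown_after = identify(candidates, {"HVHV": "rectangle"})
```

In this sketch the user's answer is simply a new dictionary entry, which is one way to picture the incremental, user-centred loop the abstract describes: each identification made by a person reduces what has to be asked the next time.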

Pages: 60 to 65

Copyright: Copyright (c) IARIA, 2017

Publication date: June 25, 2017

Published in: conference

ISSN: 2308-3727

ISBN: 978-1-61208-565-4

Location: Venice, Italy

Dates: from June 25, 2017 to June 29, 2017
