Concept Classification and Search on Internet Using Machine Learning and Parallel Computing Techniques

  • The problems of information overload and vocabulary differences have become more pressing with the emergence of increasingly popular Internet services. The main information retrieval mechanisms provided by the prevailing Internet WWW software are based on either keyword search or hypertext browsing. Keyword search often results in low precision, poor recall, and slow response time due to the limitations of indexing and communication methods, controlled-language-based interfaces, and the inability of searchers themselves to articulate their needs fully. Hypertext browsing, on the other hand, allows users to explore only a very small portion of a large Internet information space. A large information space can also confuse and disorient its users, who may spend a great deal of time while learning nothing specific. This research aims to provide concept-based categorization and search capabilities for Internet WWW servers based on selected machine learning and parallel computing techniques. Our proposed approach, which is grounded in automatic textual analysis of Internet documents, attempts to address the Internet search problem by first categorizing the content of Internet documents and subsequently providing semantic search capabilities based on a concept space approach. As a first step, we propose a multi-layered neural network clustering algorithm employing the Kohonen self-organizing feature map to categorize Internet homepages according to their content. The category hierarchies created could serve to partition the vast Internet services into subject-specific categories and databases. After individual subject categories have been created, we propose to generate domain-specific concept spaces for each subject category. The concept spaces can then be used to support concept-based information retrieval, a significant improvement over the existing keyword searching and hypertext browsing options for Internet resource discovery. As the Internet information space continues to grow at the present pace, we believe this research will shed light on potentially robust and scalable solutions to the increasingly complex and urgent information access and sharing problems that are certain to emerge in the future Internet society.
  • Chen, Hsinchun
  • Schatz, Bruce R.
  • Lin, Chienting
  • 1995-01-01
  • Conference Poster
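
The abstract names the Kohonen self-organizing feature map as the first-step categorizer but gives no implementation detail. The following is a minimal single-layer sketch in Python of how such a map can assign documents to grid cells; it is not the authors' multi-layered algorithm. The toy vocabulary, the page vectors, the 2 x 2 grid, and the linear decay schedules are all illustrative assumptions, and the names `train_som` and `best_matching_unit` are hypothetical.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid coordinate whose weight vector is closest to `vector`."""
    return min(
        weights,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], vector)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map on equal-length vectors."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    dim = len(vectors[0])
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighbourhood radius
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)
        for vector in order:
            progress = step / total_steps
            lr = lr0 * (1.0 - progress)  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - progress), 0.5)  # shrinking radius
            bmu = best_matching_unit(weights, vector)
            # Pull the winning unit and its grid neighbours toward the input.
            for node, w in weights.items():
                grid_dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * influence * (vector[i] - w[i])
            step += 1
    return weights


# Hypothetical term-weight vectors for four homepages (assumed data).
VOCABULARY = ["neural", "network", "protein", "cell"]
PAGES = {
    "page_a": [1.0, 0.9, 0.0, 0.0],
    "page_b": [0.9, 1.0, 0.1, 0.0],
    "page_c": [0.0, 0.0, 1.0, 0.9],
    "page_d": [0.0, 0.1, 0.9, 1.0],
}

som = train_som(list(PAGES.values()), rows=2, cols=2)
# Each page's category is the grid cell of its best-matching unit.
categories = {name: best_matching_unit(som, vec) for name, vec in PAGES.items()}
print(categories)
```

In this sketch a page's category is simply the grid cell of its closest unit. A hierarchy like the one the abstract describes would presumably require training further maps on the pages that fall into each cell, which is left out here.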