Abstract
In recent years, government agencies and industrial
enterprises have been using the web as a medium of publication.
As a result, a large collection of documents, images, text files and
other forms of data in structured, semi-structured and
unstructured forms is available on the web. It has become
increasingly difficult to identify relevant pieces of information
since the pages are often cluttered with irrelevant content, such as
advertisements and copyright notices, surrounding the main
content. Thus, we propose a technique, the E-Mine algorithm,
that mines the relevant data regions from a web page. This
technique is based on three important observations about
data regions on the web.
Introduction
Extracting regularly structured data records from
web pages is an important problem. So far, several attempts
have been made to solve it. The main
disadvantage of the existing automatic approaches is their
assumption that the relevant information of a data record is
contained in a contiguous segment of HTML code, which is
not always true. Thus, we propose a more effective method to
mine the data region in a web page. The algorithm, eMine,
…