Gaining Insight into User and Search Engine Behaviour by Analyzing Web Logs
(Language: English)
Book (Paperback)
51.40 €
Product details
Product information for "Gaining Insight into User and Search Engine Behaviour by Analyzing Web Logs"
Blurb for "Gaining Insight into User and Search Engine Behaviour by Analyzing Web Logs"
Web Usage Mining, also known as Web Log Mining, analyzes the data that results from user interaction with a Web server, including Web logs, click streams and database transactions, as well as the visits of search engine crawlers to a Website. Log files provide an immense source of information about the behavior of users as well as search engine crawlers. Web Usage Mining concerns the discovery of common browsing patterns, i.e. pages requested in sequence, from Web logs. These patterns can be utilized to improve the design of a Website and to guide its modification. Analyzing and discovering user behavior helps in understanding what online information users look for and how they behave. The results can be used in intelligent online applications, in refining Websites, in improving search accuracy and in leading decision makers towards better decisions in changing markets, for instance by placing advertisements in ideal places. Similarly, crawlers or spiders access Websites to index new and updated pages, and the traces they leave help to analyze the behavior of search engine crawlers.
The log files are unstructured and very large. They need to be extracted and pre-processed before any data mining can follow. Pre-processing is done differently for each application. Two pre-processing algorithms are proposed, based on indiscernibility relations in rough set theory, which generate Equivalence Classes. The first algorithm generates a pre-processed file with successful user requests, while the second generates a pre-processed file for pre-fetching and caching purposes. Two further algorithms are proposed to extract usage analytics. The first identifies the origin of visits, the top referring sites and the most popular keywords used by visitors to arrive at a Website. The second extracts user agents, i.e. the browsers and operating systems used by visitors to access a Website.
In this study, users are clustered by their Entry Pages to a Website in order to analyze the deep-linked traffic at the Website. The Top Ten Entry Pages, their traffic and their temporal information are also studied.
Excerpt from "Gaining Insight into User and Search Engine Behaviour by Analyzing Web Logs"
Text sample:
Chapter 2: Pre-processing of Web Logs and Web Usage Analytics:
Web Usage Mining needs a tremendous amount of pre-processing before any data mining can follow. Pre-processing removes irrelevant records which might otherwise affect the mining results. This chapter is divided into two sections, namely pre-processing of Web logs and Web usage analytics. Two pre-processing algorithms are proposed, based on indiscernibility relations in rough set theory, which generate Equivalence Classes. The first algorithm pre-processes the raw file for further identification of users and user sessions. The second algorithm pre-processes the log file and gives the pages accessed, their frequency and the total bytes transferred. Two further algorithms are proposed to extract usage analytics. The first identifies the origin of user visits, the top referring sites and the most popular keywords used by visitors to arrive at a Website. The second extracts the browsers and operating systems, each with its version, used by various visitors to access a Website. The browser and operating system are together known as user agents. All algorithms are tested on two different data sets and the results are displayed.
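As a rough illustration of the first usage-analytics task (referring sites and search keywords), the following Python sketch counts referrer hosts and query terms. It is not the algorithm proposed in the book. The sample referrer URLs, the function name `referrer_stats` and the list of query parameters in `QUERY_PARAMS` are assumptions made for this example.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical referrer URLs, standing in for the referrer field of a log file.
REFERRERS = [
    "https://www.google.com/search?q=web+usage+mining",
    "https://www.bing.com/search?q=rough+set+theory",
    "https://www.google.com/search?q=web+log+mining",
    "https://example.org/links.html",
    "-",  # no referrer recorded (for example a direct visit)
]

# Assumption: the search query is carried in one of these URL parameters.
QUERY_PARAMS = ("q", "query", "p")


def referrer_stats(referrers):
    """Count referring hosts and the search keywords found in referrer URLs."""
    hosts = Counter()
    keywords = Counter()
    for ref in referrers:
        if not ref or ref == "-":
            hosts["(direct)"] += 1
            continue
        parsed = urlparse(ref)
        hosts[parsed.netloc.lower()] += 1
        params = parse_qs(parsed.query)  # decodes "+" and percent-escapes
        for name in QUERY_PARAMS:
            for value in params.get(name, []):
                keywords.update(value.lower().split())
    return hosts, keywords


hosts, keywords = referrer_stats(REFERRERS)
print(hosts.most_common(3))     # top referring hosts
print(keywords.most_common(3))  # most frequent search keywords
```

Counting with `Counter` keeps the sketch short; a real log would first need the referrer field extracted from each record.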
2.1: Pre-processing of Web Logs:
The need for pre-processing is explained in section 1.3. Pre-processing saves the considerable space otherwise needed to store irrelevant records and improves the precision of the mining results. This chapter deals with pre-processing Web log files in order to mine user behavior; hence all search engine crawler requests, unsuccessful requests and other irrelevant requests for files such as .jpg, .mpg, .gif, .png, .txt and .wav are removed. The indiscernibility relation in rough set theory is used for pre-processing [234Jose12] [240Jose12]. Table 2.1 shows various status codes of the Hyper Text Transfer Protocol [27...] indicating response status.
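The cleaning rules described above can be sketched as a simple filter. This is a plain rule-based illustration, not the rough-set algorithm of the book. It assumes log lines in the Apache Combined Log Format, treats only 2xx status codes as successful, and detects crawlers by the substrings in `CRAWLER_HINTS`. All three are assumptions, as are the sample lines and the name `is_relevant`.

```python
import re

# Assumption: lines follow the Common/Combined Log Format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# File types named in the text as irrelevant for mining user behavior.
IRRELEVANT_EXTENSIONS = (".jpg", ".mpg", ".gif", ".png", ".txt", ".wav")

# Assumption: crawlers are recognized by these user-agent substrings.
CRAWLER_HINTS = ("bot", "crawler", "spider", "slurp")


def is_relevant(line):
    """Return True for a successful, non-crawler request for a page."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return False  # malformed record
    if not match.group("status").startswith("2"):
        return False  # unsuccessful request (assumption: only 2xx counts)
    path = match.group("url").split("?", 1)[0].lower()
    if path.endswith(IRRELEVANT_EXTENSIONS):
        return False  # image, media or text file
    agent = (match.group("agent") or "").lower()
    if any(hint in agent for hint in CRAWLER_HINTS):
        return False  # search engine crawler
    return True


# Hypothetical sample records.
SAMPLE = [
    '192.0.2.1 - - [10/Oct/2016:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [10/Oct/2016:13:55:37 +0000] "GET /logo.png HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.2 - - [10/Oct/2016:13:56:01 +0000] "GET /missing.html HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
    '192.0.2.3 - - [10/Oct/2016:13:57:12 +0000] "GET /about.html HTTP/1.1" 200 1024 "-" "Googlebot/2.1"',
]

cleaned = [line for line in SAMPLE if is_relevant(line)]
print(len(cleaned), "of", len(SAMPLE), "records kept")
```

Of the four sample records, only the request for /index.html survives: the others are dropped as an image, an unsuccessful request and a crawler visit respectively.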
2.1.1: Indiscernibility Relations in Rough Set Theory:
Rough-set-based feature selection for Web Usage Mining is used in [94Inbarani07]. The experimental results show the importance of Web data pre-processing, which reduces the size of the log file. Feature selection is a pre-processing step in data mining and is very effective in reducing dimensions. It refers to choosing a subset of attributes from the set of original attributes. The purpose of feature selection is to identify the significant features, eliminate features that are irrelevant or dispensable to the learning task, and build a good learning model. The indiscernibility relation in rough set theory is used for clustering in [95Hirano05]. The main advantage of this method is that it can be applied to proximity measures that do not satisfy the triangle inequality and that it handles relative proximity well. Relative proximity is a class of proximity measures suitable for representing subjective similarity or dissimilarity, such as the degree of likeness between people.
Indiscernibility relations in rough set theory [96Pawalak02] can be used for the data cleaning of Web log files. Rough set theory is based on the assumption that some information is associated with every object of the universe of discourse. Objects characterized by the same information are indiscernible (similar) in view of the available information about them. Any set of all indiscernible (similar) objects is called an elementary set and forms a basic granule of knowledge about the universe. Any union of elementary sets is referred to as a crisp (precise) set; otherwise the set is rough (imprecise, vague).
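The idea that objects characterized by the same information fall into one elementary set can be sketched in a few lines of Python. The records, the attribute names and the function `equivalence_classes` below are hypothetical and only illustrate the general notion; they are not the pre-processing algorithms proposed in the book.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Group objects that share the same values on the chosen attributes.

    Each group is one elementary set of the indiscernibility relation
    induced by `attributes`.
    """
    classes = defaultdict(list)
    for name, values in objects.items():
        signature = tuple(values[a] for a in attributes)
        classes[signature].append(name)
    return dict(classes)


# Hypothetical log records described by two attributes.
RECORDS = {
    "r1": {"status": 200, "type": "html"},
    "r2": {"status": 200, "type": "png"},
    "r3": {"status": 404, "type": "html"},
    "r4": {"status": 200, "type": "html"},
}

classes = equivalence_classes(RECORDS, ["status", "type"])
for signature, members in classes.items():
    print(signature, members)
```

Here r1 and r4 are indiscernible with respect to status and type, so they form one elementary set. Grouping on status alone would additionally merge r2 into that set.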
Let S = (U, A) be a pair of non-empty finite sets U and A, where U is the universe of objects and A is the set of attributes. Each attribute a in A is a function a: U → Va, where Va is the set of values of attribute a, called the domain of a. The pair S = (U, A) is called an information system. Any information system can be represented by a data t...
Bibliographic details
- Authors: Jeeva Jose, P. Sojan Lal
- 2016, 212 pages, 124 illustrations, dimensions: 15.5 x 22 cm, paperback, English
- Publisher: Anchor Academic Publishing
- ISBN-10: 3960670877
- ISBN-13: 9783960670872
- Publication date: 31 October 2016
Language: English