Analysing Web Traffic: A Case Study on Artificial and Genuine Advertisement-Related Behaviour

This book presents an ample, richly illustrated account of the results and experience from a project dealing with the analysis of data on behavior patterns on the Web. It deals with advertising on the Web, and the ultimate issue is to assess the share of artificial, automated activity (ad fraud) as opposed to genuine human activity. After a comprehensive introductory part, the book gives a full-fledged report on a wide range of analytic and design efforts oriented at: the representation of Web behavior patterns; the formation and selection of telling variables; the structuring of the populations of behavior patterns, including the use of clustering; the classification of these patterns; and devising the most effective and efficient techniques to separate artificial from genuine traffic. A series of important and useful conclusions is drawn concerning both the nature of the observed phenomenon, and hence the characteristics of the respective datasets, and the appropriateness of the methodological approaches tried out and devised. Some of these observations and conclusions, related both to the data and to the methods employed, provide new insight and are sometimes surprising. The book also provides a rich bibliography on the main problem approached and on the various methodologies tried out.

Author(s): Agnieszka Jastrzębska; Jan W. Owsiński; Karol Opara; Marek Gajewski; Olgierd Hryniewicz; Mariusz Kozakiewicz; Sławomir Zadrożny; Tomasz Zwierzchowski
Series: Studies in Big Data, 127
Publisher: Springer
Year: 2023

Language: English
Pages: 173
City: Cham

1 The Problem and Its Key Characteristics
1.1 The Problem Considered: A General Perspective
1.2 The Overall Structure of the Advertising Market on the Web
1.3 Some Important Aspects of the Ad Market on the Web
1.4 State-of-the-Art Methods for Fraudulent Click Identification
References
2 The Pragmatics of the Data Acquisition and Assessment
2.1 The Nature of the Data Acquired
2.2 The Construction of Variables for Analysis
2.3 The Working of the Ad-Hoc Behavioral Tool
2.4 Remarks Concerning MLOps and ModelOps
Reference
3 The Proper Representation:​ Patterns, Variables and Their Analysis
3.1 Some Temporal and Spatial Characteristics
3.2 Study of Sample-Wise Stability of Variables
3.3 Correlational Analysis
3.4 Principal Component Analysis
3.5 Variable Importance According to h2o Package
3.6 Comparison with the Existing Blacklists
3.7 Essential Conclusions from the Study of Variables
3.8 Some Remarks Concerning the Methodologies to Be Followed
4 Clustering Analysis
4.1 Introduction: The Issues Related to the Clustering Approach
4.2 An Exploratory Study
4.3 The Temporal Stability of Clusters
4.4 Application of Reverse Clustering
4.5 An Extended Analysis with the k-Medoids Algorithm
4.6 Conclusions and Recommendations
References
5 Building the Classifiers
5.1 Introductory Remarks
5.2 Establishing Experimental Methodology
5.3 Detailed Case Study: Classification of Bot/Human Traffic
5.4 The Choice of the Best Classification Pipeline Using a Multi-Criteria Decision-Making Approach
5.5 Conclusions and Recommendations
References
6 The Hybrid Cluster-And-Classify Approach
6.1 The Principles
6.2 The Hybrid Cluster-And-Classify Approach
6.3 The Results and Their Interpretation
6.3.1 Data Staging
6.3.2 Reference Recognition Quality in the Best-Case Scenario
6.3.3 Results: Analysis and Interpretation
6.4 A Detailed Case Study of Stability of the Hybrid Model: Case B
6.5 A Detailed Case Study of Drift in the Data: Case B
6.6 Conclusions
References
7 A Summary View of the Problem and Its Solution