Authors: V.C. Chavez-Demoulin, S.A. Jarvis, R. Perera, A.S.A. Roehrl, S.W. Schmiedl, M.P. Sondergaard.
Abstract: In recent years there have been a number of developments in the datamining techniques used in the analysis of terabyte-sized logfiles resulting from Internet-based applications. The information which these datamining techniques provide allows knowledge engineers to rapidly direct business decisions. Current datamining methods, however, are generally efficient only when the information contained in the logfiles is close to the average. This means that where non-standard logfiles (extreme data) are being studied, these methods provide unrealistic and erroneous results. Non-standard logfiles often have a large bearing on the analysis of web applications; the information which they provide can affect new or even well-established services.
In this paper, aspects of the recent Extreme Value Theory methodology are discussed. Particular emphasis is placed on its application; a unique toolkit is provided with which to describe, understand and predict the non-standard fluctuations discovered in real-life Internet-sourced log data.