Data mining combines advanced statistical and computational approaches to explore massive amounts of data. The ultimate aim of data mining is to discover patterns and relationships within a given pool of information.

Members of the Approximity team contributed to the following publications:

  • The WEB archives: A time-machine in your pocket!
    Chavez-Demoulin V.C., A.S.A. Roehrl, R.A. Roehrl and A. Weinberg
    Internet Archive Colloquium 2000, editor K. Bollacker, www.archive.org, 2000
    Taking an interdisciplinary approach, the authors discuss both the technical issues of creating archives of the World Wide Web (as suggested at www.archive.org) and the possible socio-political relevance of such archives in the future. As the Internet becomes the Ever- and Everywherenet, the Web archives may become a memory of mankind, a sort of time machine for going back into the past. The authors present the hardware and software concepts, and an initial analysis, of a highly scalable and extendable approach to archiving a fully queryable copy of the ever-changing Web.
  • World Wide Web Robot for Extreme Datamining with Swiss-Tx Supercomputers
    Armin Roehrl, Martin Frey, Alexander Roehrl
    Interim Report IR-99-20, International Institute for Applied Systems Analysis
    This paper discusses the software and hardware issues of designing a highly parallel robot for extreme datamining on the Internet. As a sample application, a World Wide Web server count experiment for Switzerland and Thailand is presented. Our platform of choice is the SwissTx, a supercomputer built from commodity components that runs NT and COMPAQ Tru64 Unix. The hardware and software of this machine are discussed and benchmark results are presented. The results show that NT is a feasible choice even under the given extreme conditions. By using statistical modelling to optimize the search process, the inevitable bandwidth problem is partly reduced to a computation problem. We suggest that our approach to Web robots is a robust bet for a multitude of future Internet applications, which might lead to large-scale and cost-efficient usage of Web robots.
  • Extreme Datamining
    Chavez-Demoulin V., Jarvis S.A., Perera R., Roehrl A.S.A., Schmiedl S.W., Sondergaard M.P.
    Between Data Science and Applied Data Analysis; Proceedings of the 26th Annual Conference of the Gesellschaft für Klassifikation e.V.; Springer-Verlag, 2003, pp. 387-394.
    In recent years there have been a number of developments in the datamining techniques used to analyse terabyte-sized logfiles resulting from Internet-based applications. The information that these datamining techniques provide allows knowledge engineers to rapidly direct business decisions. Current datamining methods, however, are generally efficient only when the information contained in the logfiles is close to the average. This means that when non-standard logfiles (extreme data) are being studied, these methods provide unrealistic and erroneous results. Non-standard logfiles often have a large bearing on the analysis of web applications; the information that they provide can have an impact on new or even well-established services. In this paper, aspects of the recent Extreme Value Theory methodology are discussed. Particular emphasis is placed on its application; a unique toolkit is provided with which to describe, understand and predict the non-standard fluctuations discovered in real-life Internet-sourced log data.
  • Unter Verdacht: Datamining mit R
    Armin Roehrl, Stefan Schmiedl
    Linux-Enterprise 3/2002 (in German)
    In the first part we describe R, a (statistical) programming language based on S. We then give a brief introduction to datamining and use practical examples to show why R can save a great deal of time in the process. Since we cannot explain the mathematical and statistical background in depth here, we refer the reader to the relevant technical literature for the fundamentals. Our aim is to give the reader an idea of what datamining is.

Datamining specialists
