5,540 results for Conference item

  • Additive Regression Applied to a Large-Scale Collaborative Filtering Problem

    Frank, Eibe; Hall, Mark A. (2008)

    Conference item
    University of Waikato

    The much-publicized Netflix competition has put the spotlight on the application domain of collaborative filtering and has sparked interest in machine learning algorithms that can be applied to this sort of problem. The demanding nature of the Netflix data has led to some interesting and ingenious modifications to standard learning methods in the name of efficiency and speed. There are three basic methods that have been applied in most approaches to the Netflix problem so far: stand-alone neighborhood-based methods, latent factor models based on singular-value decomposition, and ensembles consisting of variations of these techniques. In this paper we investigate the application of forward stage-wise additive modeling to the Netflix problem, using two regression schemes as base learners: ensembles of weighted simple linear regressors and k-means clustering, the latter being interpreted as a tool for multi-variate regression in this context. Experimental results show that our methods produce competitive results.

  • Combining Naive Bayes and Decision Tables

    Hall, Mark A.; Frank, Eibe (2008)

    Conference item
    University of Waikato

    We investigate a simple semi-naive Bayesian ranking method that combines naive Bayes with induction of decision tables. Naive Bayes and decision tables can both be trained efficiently, and the same holds true for the combined semi-naive model. We show that the resulting ranker, compared to either component technique, frequently significantly increases AUC. For some datasets it significantly improves on both techniques. This is also the case when attribute selection is performed in naive Bayes and its semi-naive variant.

  • The use of auditory feedback in call centre CHHI

    Steel, Anette; Jones, Matt; Apperley, Mark (2002)

    Conference item
    University of Waikato

    The investigations carried out to evaluate issues of the computer-human-human interaction (CHHI) found in call centre scenarios were presented. These investigations suggested some benefits in the use of auditory icons and earcons. The use of non-verbal auditory feedback to improve CHHI was discussed.

  • Including design guidelines in the formal specification of interfaces in Z

    Bowen, Judy; Reeves, Steve (2005)

    Conference item
    University of Waikato

    For any sort of computer system, the problems of being sure you have asked for the right thing and then being sure you are implementing the right thing are important and hard problems. For systems with a graphical user interface there are the analogous additional problems of making sure that the interface allows any interaction that is required, and works in a usable way. Design guidelines are used in both the design and evaluation of user interfaces to try and ensure that the systems we build are both usable and conform to specific requirements. This paper discusses practical ways in which we can use formal methods to model guidelines for interface design and then use these as a basis for the formal proof that a specified system has the desired properties described in the guidelines.

  • Sentiment knowledge discovery in Twitter streaming data

    Bifet, Albert; Frank, Eibe (2010)

    Conference item
    University of Waikato

    Micro-blogs are a challenging new source of information for data mining techniques. Twitter is a micro-blogging service built to discover what is happening at any moment in time, anywhere in the world. Twitter messages are short, generated constantly, and well suited for knowledge discovery using data stream mining. We briefly discuss the challenges that Twitter data streams pose, focusing on classification problems, and then consider these streams for opinion mining and sentiment analysis. To deal with streaming unbalanced classes, we propose a sliding window Kappa statistic for evaluation in time-changing data streams. Using this statistic we perform a study on Twitter data using learning algorithms for data streams.

  • One-Class Classification by Combining Density and Class Probability Estimation

    Hempstalk, Kathryn; Frank, Eibe; Witten, Ian H. (2008)

    Conference item
    University of Waikato

    One-class classification has important applications such as outlier and novelty detection. It is commonly tackled using density estimation techniques or by adapting a standard classification algorithm to the problem of carving out a decision boundary that describes the location of the target data. In this paper we investigate a simple method for one-class classification that combines the application of a density estimator, used to form a reference distribution, with the induction of a standard model for class probability estimation. In this method, the reference distribution is used to generate artificial data that is employed to form a second, artificial class. In conjunction with the target class, this artificial class is the basis for a standard two-class learning problem. We explain how the density function of the reference distribution can be combined with the class probability estimates obtained in this way to form an adjusted estimate of the density function of the target class. Using UCI datasets, and data from a typist recognition problem, we show that the combined model, consisting of both a density estimator and a class probability estimator, can improve on using either component technique alone when used for one-class classification. We also compare the method to one-class classification using support vector machines.

  • Revisiting multiple-instance learning via embedded instance selection

    Foulds, James Richard; Frank, Eibe (2008)

    Conference item
    University of Waikato

    Multiple-Instance Learning via Embedded Instance Selection (MILES) is a recently proposed multiple-instance (MI) classification algorithm that applies a single-instance base learner to a propositionalized version of MI data. However, the original authors consider only one single-instance base learner for the algorithm — the 1-norm SVM. We present an empirical study investigating the efficacy of alternative base learners for MILES, and compare MILES to other MI algorithms. Our results show that boosted decision stumps can in some cases provide better classification accuracy than the 1-norm SVM as a base learner for MILES. Although MILES provides competitive performance when compared to other MI learners, we identify simpler propositionalization methods that require shorter training times while retaining MILES’ strong classification performance on the datasets we tested.

  • Optimizing the induction of alternating decision trees

    Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon (2001)

    Conference item
    University of Waikato

    The alternating decision tree brings comprehensibility to the performance enhancing capabilities of boosting. A single interpretable tree is induced wherein knowledge is distributed across the nodes and multiple paths are traversed to form predictions. The complexity of the algorithm is quadratic in the number of boosting iterations and this makes it unsuitable for larger knowledge discovery in database tasks. In this paper we explore various heuristic methods for reducing this complexity while maintaining the performance characteristics of the original algorithm. In experiments using standard, artificial and knowledge discovery datasets we show that a range of heuristic methods with log linear complexity are capable of achieving similar performance to the original method. Of these methods, the random walk heuristic is seen to out-perform all others as the number of boosting iterations increases. The average case complexity of this method is linear.

  • Large-scale attribute selection using wrappers

    Gutlein, Martin; Frank, Eibe; Hall, Mark A.; Karwath, Andreas (2009)

    Conference item
    University of Waikato

    Scheme-specific attribute selection with the wrapper and variants of forward selection is a popular attribute selection technique for classification that yields good results. However, it can run the risk of overfitting because of the extent of the search and the extensive use of internal cross-validation. Moreover, although wrapper evaluators tend to achieve superior accuracy compared to filters, they face a high computational cost. The problems of overfitting and high runtime occur in particular on high-dimensional datasets, like microarray data. We investigate Linear Forward Selection, a technique to reduce the number of attribute expansions in each forward selection step. Our experiments demonstrate that this approach is faster, finds smaller subsets and can even increase the accuracy compared to standard forward selection. We also investigate a variant that applies explicit subset size determination in forward selection to combat overfitting, where the search is forced to stop at a precomputed “optimal” subset size. We show that this technique reduces subset size while maintaining comparable accuracy.

  • Evolving triggers for dynamic environments

    Trajcevski, Goce; Scheuermann, Peter; Ghica, Oliviu; Hinze, Annika; Voisard, Agnes (2006)

    Conference item
    University of Waikato

    In this work we address the problem of managing the reactive behavior in distributed environments in which data continuously changes over time, where the users may need to explicitly express how the triggers should be (self) modified. To enable this we propose the (ECA)² – Evolving and Context-Aware Event-Condition-Action paradigm for specifying triggers that capture the desired reactive behavior in databases which manage distributed and continuously changing data. Since both the monitored event and the condition part of the trigger may be continuous in nature, we introduce the concept of metatriggers to coordinate the detection of events and the evaluation of conditions.

  • Using formal models to design user interfaces: a case study

    Bowen, Judy; Reeves, Steve (2007)

    Conference item
    University of Waikato

    The use of formal models for user interface design can provide a number of benefits. It can help to ensure consistency across designs for multiple platforms, prove properties such as reachability and completeness and, perhaps most importantly, can help incorporate the user interface design process into a larger, formally-based, software development process. Often, descriptions of such models and examples are presented in isolation from real-world practice in order to focus on particular benefits, small focused examples or the general methodology. This paper presents a case study of developing the user interface to a new software application using a particular pair of formal models, presentation models and presentation interaction models. The aim of this study was to practically apply the use of formal models to the design process of a UI for a new software application. We wanted to determine how easy it would be to integrate such models into our usual development process and to find out what the benefits, and difficulties, of using such models were. We will show how we used the formal models within a user-centred design process, discuss what effect they had on this process and explain what benefits we perceived from their use.

  • A Generic Alerting Service for Digital Libraries

    Buchanan, George; Hinze, Annika (2005-06-07)

    Conference item
    University of Waikato

    Users of modern digital libraries (DLs) can keep themselves up-to-date by searching and browsing their favorite collections, or more conveniently by resorting to an alerting service. The alerting service notifies its clients about new or changed documents. Proprietary and mediating alerting services fail to fluidly integrate information from differing collections. This paper analyses the conceptual requirements of this much sought-after service for digital libraries. We demonstrate that the differing concepts of digital libraries and their underlying technical design have extensive influence on (a) the expectations, needs and interests of users regarding an alerting service, and (b) the technical possibilities of the implementation of the service. Our findings will show that the range of issues surrounding alerting services for digital libraries, their design and use is greater than one may anticipate. We also show that, conversely, the requirements for an alerting service have considerable impact on the concepts of DL design. Our findings should be of interest for librarians as well as system designers. We highlight and discuss the far-reaching implications for the design of, and interaction with, libraries. This paper discusses the lessons learned from building such a distributed alerting service. We present our prototype implementation as a proof-of-concept for an alerting service for open DL software.

  • Unsupervised discretization using tree-based density estimation

    Schmidberger, Gabi; Frank, Eibe (2005)

    Conference item
    University of Waikato

    This paper presents an unsupervised discretization method that performs density estimation for univariate data. The subintervals that the discretization produces can be used as the bins of a histogram. Histograms are a very simple and broadly understood means for displaying data, and our method automatically adapts bin widths to the data. It uses the log-likelihood as the scoring function to select cut points and the cross-validated log-likelihood to select the number of intervals. We compare this method with equal-width discretization where we also select the number of bins using the cross-validated log-likelihood and with equal-frequency discretization.

  • Logistic model trees

    Landwehr, Niels; Hall, Mark A.; Frank, Eibe (2005)

    Conference item
    University of Waikato

    Tree induction methods and linear models are popular techniques for supervised learning tasks, both for the prediction of nominal classes and numeric values. For predicting numeric quantities, there has been work on combining these two schemes into 'model trees', i.e. trees that contain linear regression functions at the leaves. In this paper, we present an algorithm that adapts this idea for classification problems, using logistic regression instead of linear regression. We use a stagewise fitting process to construct the logistic regression models that can select relevant attributes in the data in a natural way, and show how this approach can be used to build the logistic regression models at the leaves by incrementally refining those constructed at higher levels in the tree. We compare the performance of our algorithm to several other state-of-the-art learning schemes on 36 benchmark UCI datasets, and show that it produces accurate and compact classifiers.

  • Speeding up logistic model tree induction

    Sumner, Marc; Frank, Eibe; Hall, Mark A. (2005)

    Conference item
    University of Waikato

    Logistic Model Trees have been shown to be very accurate and compact classifiers [8]. Their greatest disadvantage is the computational complexity of inducing the logistic regression models in the tree. We address this issue by using the AIC criterion [1] instead of cross-validation to prevent overfitting these models. In addition, a weight trimming heuristic is used which produces a significant speedup. We compare the training time and accuracy of the new induction process with the original one on various datasets and show that the training time often decreases while the classification accuracy diminishes only slightly.

  • Evaluating the replicability of significance tests for comparing learning algorithms

    Bouckaert, Remco R.; Frank, Eibe (2004)

    Conference item
    University of Waikato

    Empirical research in learning algorithms for classification tasks generally requires the use of significance tests. The quality of a test is typically judged on Type I error (how often the test indicates a difference when it should not) and Type II error (how often it indicates no difference when it should). In this paper we argue that the replicability of a test is also of importance. We say that a test has low replicability if its outcome strongly depends on the particular random partitioning of the data that is used to perform it. We present empirical measures of replicability and use them to compare the performance of several popular tests in a realistic setting involving standard learning algorithms and benchmark datasets. Based on our results we give recommendations on which test to use.

  • Domain-specific keyphrase extraction

    Frank, Eibe; Paynter, Gordon W.; Witten, Ian H.; Gutwin, Carl; Nevill-Manning, Craig G. (1999)

    Conference item
    University of Waikato

    Keyphrases are an important means of document summarization, clustering, and topic search. Only a small minority of documents have author-assigned keyphrases, and manually assigning keyphrases to existing documents is very laborious. Therefore it is highly desirable to automate the keyphrase extraction process. This paper shows that a simple procedure for keyphrase extraction based on the naive Bayes learning scheme performs comparably to the state of the art. It goes on to explain how this procedure's performance can be boosted by automatically tailoring the extraction process to the particular document collection at hand. Results on a large collection of technical reports in computer science show that the quality of the extracted keyphrases improves significantly when domain-specific information is exploited.

  • Cooperating Services in a Mobile Tourist Information System

    Hinze, Annika; Buchanan, George (2005)

    Conference item
    University of Waikato

    Complex information systems are increasingly required to support the flexible delivery of information to mobile devices. Studies of these devices in use have demonstrated that the information displayed to the user must be limited in size, focussed in content [1] and adaptable to the user’s needs [2]. Furthermore, the presented information is often dynamic, even changing continuously. Event-based communication provides strong support for selecting relevant information for dynamic information delivery.

  • Improving photo searching interfaces for small-screen mobile computers

    Patel, Dynal; Marsden, Gary; Jones, Matt; Jones, Steve (2006)

    Conference item
    University of Waikato

    In this paper, we conduct a thorough investigation of how people search their photo collections for events (a set of photographs relating to a particular well defined event), singles (individual photographs) and properties (a set of photographs with a common theme) on PDAs. We describe a prototype system that allows us to expose many issues that must be considered when designing photo searching interfaces. We discuss each of these issues and make recommendations where applicable. Our major observation is that several different methods are used to locate photographs. In light of this, we conclude by discussing how photo searching interfaces might embody or support such an approach.

  • Whose Diwali is it? The case of the Indian Community and Auckland City Council

    Booth, A (2013-12-09)

    Conference item
    Auckland University of Technology

    This paper interrogates the ways that governmental agendas may affect the representation and expression of cultural identity. I trace factors that have transformed the production of Diwali, in Auckland, New Zealand. In 1998, Auckland Indian Association (AIA) started a public Diwali celebration responding to the rapidly growing Indian community population and needs for collective expression and enjoyment of one of India’s most important cultural celebrations. Government support, beginning in 2002, recognized the potential political and economic benefits of cultural celebrations by launching Diwali: Festival of Lights with AIA. By 2004, Auckland Council had gained increasing control over all aspects of event production practices. By 2013, the local Indian press was reporting voices of dissent concerning Diwali’s Bollywood/Panjabi content, noting that the representation of Indian performance culture is now determined by management decisions made by the Council and their selected sponsors. Government support has become government control, transforming a community celebration into a “Major Civic Event” that executive decisions seek to align with larger tourism and economic development strategies. The altered Diwali festival management structure has disenfranchised the local community and the power of community representation. This study demonstrates how power enables as well as constrains musical performance and cultural identity.
