31 results for Holmes, Geoffrey, Working or discussion paper

  • Predicting apple bruising relationships using machine learning

    Holmes, Geoffrey; Cunningham, Sally Jo; Dela Rue, B. T.; Bollen, A. F. (1998-04)

    Working or discussion paper
    University of Waikato

    Many models have been used to describe the influence of internal or external factors on apple bruising. Few of these have addressed the application of derived relationships to the evaluation of commercial operations. From an industry perspective, a model must enable fruit to be rejected on the basis of a commercially significant bruise and must also accurately quantify the effects of various combinations of input features (such as cultivar, maturity, size, and so on) on bruise prediction. Input features must in turn have characteristics which are measurable commercially; for example, the measure of force should be impact energy rather than energy absorbed. Further, as the commercial criteria for acceptable damage levels change, the model should be versatile enough to regenerate new bruise thresholds from existing data. Machine learning is a burgeoning technology with a vast range of potential applications particularly in agriculture where large amounts of data can be readily collected [1]. The main advantage of using a machine learning method in an application is that the models built for prediction can be viewed and understood by the owner of the data who is in a position to determine the usefulness of the model, an essential component in a commercial environment.

    View record details
  • Racing committees for large datasets.

    Frank, Eibe; Holmes, Geoffrey; Kirkby, Richard Brendon; Hall, Mark A. (2002-06-01)

    Working or discussion paper
    University of Waikato

    This paper proposes a method for generating classifiers from large datasets by building a committee of simple base classifiers using a standard boosting algorithm. It allows the processing of large datasets even if the underlying base learning algorithm cannot efficiently do so. The basic idea is to split incoming data into chunks and build a committee based on classifiers built from these individual chunks [3]. Our method extends earlier work in two ways: (a) the best chunk size is chosen automatically by racing committees corresponding to different chunk sizes, and (b) the committees are pruned adaptively to keep the size of each individual committee as small as possible without negatively affecting accuracy. This paper shows that choosing an appropriate chunk size automatically is important because the accuracy of the resulting committee can vary significantly with the chunk size. It also shows that pruning is crucial to make the method practical for large datasets in terms of running time and memory requirements. Surprisingly, the results demonstrate that pruning can also improve accuracy.

    View record details
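The chunk-based committee idea described in the abstract above can be sketched as follows. This is a minimal illustration only: the data is split into fixed-size chunks, one simple base classifier is trained per chunk, and the members are combined by majority vote. The paper's boosting, racing over chunk sizes, and adaptive pruning are omitted, and the trivial majority-class base learner is an assumption for illustration, not the paper's method.

```python
# Sketch: split data into chunks, train one simple base classifier per
# chunk, combine the members by unweighted majority vote.
from collections import Counter

def majority_class_learner(chunk):
    """Trivial base learner: always predict the chunk's majority class."""
    labels = [label for _, label in chunk]
    return Counter(labels).most_common(1)[0][0]

def build_committee(data, chunk_size):
    """Train one base classifier per chunk of the incoming data."""
    committee = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        committee.append(majority_class_learner(chunk))
    return committee

def committee_predict(committee):
    """Combine the members' predictions by majority vote."""
    return Counter(committee).most_common(1)[0][0]

data = [(x, 'a') for x in range(8)] + [(x, 'b') for x in range(3)]
committee = build_committee(data, chunk_size=4)
print(committee_predict(committee))  # the two 'a' chunks outvote the 'b' chunk
```

In the real method the base learner would be a proper classifier and the vote would be weighted by boosting; the chunk size itself is what the paper's racing procedure selects automatically.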
  • Navigating the virtual library: a 3D browsing interface for information retrieval

    Rogers, Bill; Cunningham, Sally Jo; Holmes, Geoffrey (1994-07)

    Working or discussion paper
    University of Waikato

    An interface is described for graphically navigating a large collection of documents, as in a library. Its design is based on the metaphor of traversing a landscape. Documents are depicted as buildings, clustered to form 'towns'. A network of 'roads' connects these towns according to the classification hierarchy of the document set. A three-dimensional scene rendering technique allows the user to view this landscape from different perspectives, and at different levels of detail. At one level, the appearance of the buildings provides information like document size and age, at a glance. At higher levels, we provide the user with a visualisation of the structure and extent of the document set that is impossible with a traditional 'shelf' presentation. At all levels, a sense of physical context is maintained, encouraging and supporting browsing.

    View record details
  • Mining data streams using option trees

    Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon (2003-09)

    Working or discussion paper
    University of Waikato

    The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over time within these constraints. Additionally, the model must be able to be used for data mining at any point in time. This paper describes a data stream classification algorithm using an ensemble of option trees. The ensemble of trees is induced by boosting and iteratively combined into a single interpretable model. The algorithm is evaluated using benchmark datasets for accuracy against state-of-the-art algorithms that make use of the entire dataset.

    View record details
  • The LRU*WWW proxy cache document replacement algorithm

    Chang, Chung-yi; McGregor, Anthony James; Holmes, Geoffrey (1999-06)

    Working or discussion paper
    University of Waikato

    Obtaining good performance from WWW proxy caches is critically dependent on the document replacement policy used by the proxy. This paper validates the work of other authors by reproducing their studies of proxy cache document replacement algorithms. From this basis a cross-trace study is mounted. This demonstrates that the performance of most document replacement algorithms is dependent on the type of workload that they are presented with. Finally we propose a new algorithm, LRU*, that consistently performs well across all our traces.

    View record details
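The abstract above does not spell out the LRU* algorithm itself, so the sketch below shows only the baseline LRU policy it builds on, as commonly implemented: a referenced document moves to the most-recently-used position, and when the cache is full the least-recently-used document is evicted. The class and method names are illustrative.

```python
# Sketch of plain LRU document replacement (not LRU* itself).
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # keys ordered least- to most-recently-used

    def get(self, url):
        if url not in self.store:
            return None          # cache miss
        self.store.move_to_end(url)  # record the reference
        return self.store[url]

    def put(self, url, doc):
        if url in self.store:
            self.store.move_to_end(url)
        elif len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        self.store[url] = doc

cache = LRUCache(2)
cache.put('/a', 'A')
cache.put('/b', 'B')
cache.get('/a')        # '/a' becomes most recently used
cache.put('/c', 'C')   # cache full: evicts '/b', the least recently used
print(list(cache.store))
```

A proxy cache replacement study like the one described would evaluate such a policy against request traces, measuring hit rate and byte hit rate across workloads.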
  • Weka: Practical machine learning tools and techniques with Java implementations

    Witten, Ian H.; Frank, Eibe; Trigg, Leonard E.; Hall, Mark A.; Holmes, Geoffrey; Cunningham, Sally Jo (1999-08)

    Working or discussion paper
    University of Waikato

    The Waikato Environment for Knowledge Analysis (Weka) is a comprehensive suite of Java class libraries that implement many state-of-the-art machine learning and data mining algorithms. Weka is freely available on the World-Wide Web and accompanies a new text on data mining [1] which documents and fully explains all the algorithms it contains. Applications written using the Weka class libraries can be run on any computer with a Web browsing capability; this allows users to apply machine learning techniques to their own data regardless of computer platform.

    View record details
  • Discovering inter-attribute relationships

    Holmes, Geoffrey (1997-04)

    Working or discussion paper
    University of Waikato

    It is important to discover relationships between attributes being used to predict a class attribute in supervised learning situations for two reasons. First, any such relationship will be potentially interesting to the provider of a dataset in its own right. Second, it would simplify a learning algorithm’s search space, and the related irrelevant feature and subset selection problem, if the relationships were removed from datasets ahead of learning. An algorithm to discover such relationships is presented in this paper. The algorithm is described and a surprising number of inter-attribute relationships are discovered in datasets from the University of California at Irvine (UCI) repository.

    View record details
  • Using model trees for classification

    Frank, Eibe; Wang, Yong; Inglis, Stuart J.; Holmes, Geoffrey; Witten, Ian H. (1997-04)

    Working or discussion paper
    University of Waikato

    Model trees, which are a type of decision tree with linear regression functions at the leaves, form the basis of a recent successful technique for predicting continuous numeric values. They can be applied to classification problems by employing a standard method of transforming a classification problem into a problem of function approximation. Surprisingly, using this simple transformation the model tree inducer M5’, based on Quinlan’s M5, generates more accurate classifiers than the state-of-the-art decision tree learner C5.0, particularly when most of the attributes are numeric.

    View record details
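The transformation mentioned in the abstract above is the standard one: a k-class problem becomes k function-approximation problems, one per class, each fitting that class's 0/1 membership indicator, and an instance is assigned the class whose approximator outputs the largest value. In this sketch a one-dimensional least-squares line stands in for a model tree; the function names are illustrative.

```python
# Sketch: classification via one-vs-all regression on class indicators.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b in one dimension."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def fit_one_vs_all(xs, labels, classes):
    """One regressor per class, trained on that class's 0/1 indicator."""
    return {c: fit_line(xs, [1.0 if l == c else 0.0 for l in labels])
            for c in classes}

def predict(models, x):
    """Assign the class whose regressor gives the largest output."""
    return max(models, key=lambda c: models[c][0] * x + models[c][1])

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
labels = ['low', 'low', 'low', 'high', 'high', 'high']
models = fit_one_vs_all(xs, labels, ['low', 'high'])
print(predict(models, 0.5), predict(models, 4.5))  # → low high
```

Replacing the straight line with a model tree inducer such as M5' gives the scheme the paper evaluates against decision tree learners.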
  • Feature selection via the discovery of simple classification rules

    Holmes, Geoffrey; Nevill-Manning, Craig G. (1995-04)

    Working or discussion paper
    University of Waikato

    It has been our experience that in order to obtain useful results using supervised learning of real-world datasets it is necessary to perform feature subset selection and to perform many experiments using computed aggregates from the most relevant features. It is, therefore, important to look for selection algorithms that work quickly and accurately so that these experiments can be performed in a reasonable length of time, preferably interactively. This paper suggests a method to achieve this using a very simple algorithm that gives good performance across different supervised learning schemes and when compared to one of the most common methods for feature subset selection.

    View record details
  • Generating rule sets from model trees

    Holmes, Geoffrey; Hall, Mark A.; Frank, Eibe (1999-03)

    Working or discussion paper
    University of Waikato

    Knowledge discovered in a database must be represented in a form that is easy to understand. Small, easy-to-interpret nuggets of knowledge from data are one requirement; the ability to induce them from a variety of data sources is a second. The literature abounds with classification algorithms, and in recent years with algorithms for time sequence analysis, but relatively little has been published on extracting meaningful information from problems involving continuous classes (regression). Model trees (decision trees with linear models at the leaf nodes) have recently emerged as an accurate method for numeric prediction that produces understandable models. However, it is well known that decision lists (ordered sets of If-Then rules) have the potential to be more compact and therefore more understandable than their tree counterparts.

    View record details
  • A diagnostic tool for tree based supervised classification learning algorithms

    Holmes, Geoffrey; Trigg, Leonard E. (1999-03)

    Working or discussion paper
    University of Waikato

    The process of developing applications of machine learning and data mining that employ supervised classification algorithms includes the important step of knowledge verification. Interpretable output is presented to a user so that they can verify that the knowledge contained in the output makes sense for the given application. As the development of an application is an iterative process it is quite likely that a user would wish to compare models constructed at various times or stages. One crucial stage where comparison of models is important is when the accuracy of a model is being estimated, typically using some form of cross-validation. This stage is used to establish an estimate of how well a model will perform on unseen data. This is vital information to present to a user, but it is also important to show the degree of variation between models obtained from the entire dataset and models obtained during cross-validation. In this way it can be verified that the cross-validation models are at least structurally aligned with the model garnered from the entire dataset. This paper presents a diagnostic tool for the comparison of tree-based supervised classification models. The method is adapted from work on approximate tree matching and applied to decision trees. The tool is described together with experimental results on standard datasets.

    View record details
  • Subset selection using rough numeric dependency

    Smith, Tony C.; Holmes, Geoffrey (1995-04)

    Working or discussion paper
    University of Waikato

    In this paper we describe a novel method for performing feature subset selection for supervised learning tasks based on a refined notion of feature relevance. We define relevance as others see it and outline our refinement of this concept. We then describe how we use this new definition in an algorithm to perform subset selection, and finally, we show some preliminary results of using this approach with two quite different supervised learning schemes.

    View record details
  • Machine learning in practice: experience with agricultural databases

    Garner, Stephen R.; Cunningham, Sally Jo; Holmes, Geoffrey; Nevill-Manning, Craig G.; Witten, Ian H. (1995-05)

    Working or discussion paper
    University of Waikato

    The Waikato Environment for Knowledge Analysis (weka) is a New Zealand government-sponsored initiative to investigate the application of machine learning to economically important problems in the agricultural industries. The overall goals are to create a workbench for machine learning, determine the factors that contribute towards its successful application in the agricultural industries, and develop new methods of machine learning and ways of assessing their effectiveness. The project began in 1993 and is currently working towards the fulfilment of three objectives: to design and implement the workbench, to provide case studies of applications of machine learning techniques to problems in agriculture, and to develop a methodology for evaluating generalisations in terms of their entropy. These three objectives are by no means independent. For example, the design of the weka workbench has been inspired by the demands placed on it by the case studies, and has also benefited from our work on evaluating the outcomes of applying a technique to data. Our experience throughout the development of the project is that the successful application of machine learning involves much more than merely executing a learning algorithm on some data. In this paper we present the process model that underpins our work over the past two years for the development of applications in agriculture; the software we have developed around our workbench of machine learning schemes to support this model; and the outcomes and problems we have encountered in developing applications.

    View record details
  • The development of Holte's 1R Classifier

    Nevill-Manning, Craig G.; Holmes, Geoffrey; Witten, Ian H. (1995-06)

    Working or discussion paper
    University of Waikato

    The 1R procedure for machine learning is a very simple one that proves surprisingly effective on the standard datasets commonly used for evaluation. This paper describes the method and discusses two areas that can be improved: the way that intervals are formed when discretizing continuously-valued attributes, and the way that missing values are treated. Then we show how the algorithm can be extended to avoid a problem endemic to most practical machine learning algorithms—their frequent dismissal of an attribute as irrelevant when in fact it is highly relevant when combined with other attributes.

    View record details
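The core 1R procedure discussed in the abstract above can be sketched on nominal attributes: for each attribute, map every attribute value to its most frequent class, then keep the single attribute whose one-level rule makes the fewest errors on the training data. The paper's refinements for discretising continuous attributes and handling missing values are omitted, and the variable names are illustrative.

```python
# Sketch of Holte's 1R on nominal attributes.
from collections import Counter, defaultdict

def one_r(instances, labels, n_attrs):
    best = None
    for a in range(n_attrs):
        # Count class frequencies for each value of attribute a.
        by_value = defaultdict(Counter)
        for inst, label in zip(instances, labels):
            by_value[inst[a]][label] += 1
        # Rule: each attribute value predicts its most frequent class.
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(label != rule[inst[a]]
                     for inst, label in zip(instances, labels))
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best  # (attribute index, value -> class rule, training errors)

instances = [('sunny', 'hot'), ('sunny', 'mild'),
             ('rainy', 'mild'), ('rainy', 'hot')]
labels = ['no', 'no', 'yes', 'yes']
attr, rule, errors = one_r(instances, labels, n_attrs=2)
print(attr, rule, errors)  # the outlook attribute predicts perfectly here
```

The extension the paper describes addresses exactly the weakness visible in this single-attribute search: an attribute that looks irrelevant alone may be highly relevant in combination with others.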
  • Writing anxiety in computer science students

    Cunningham, Sally Jo; Holmes, Geoffrey (1995-08)

    Working or discussion paper
    University of Waikato

    Effective written communication skills are recognized as essential for computing professions, but are notoriously difficult to impart to our students. One problem in teaching computing students to write may be their attitudes toward writing; anecdotally, computing students are (often justifiably) lacking in confidence about their writing skills, and avoid writing when possible. This paper explores the degree of writing anxiety/apprehension in computing majors through the administration of a standard survey instrument, the Daly and Miller Writing Apprehension Test.

    View record details
  • WEKA: a machine learning workbench

    Holmes, Geoffrey; Donkin, Andrew; Witten, Ian H. (1994-07)

    Working or discussion paper
    University of Waikato

    Weka is a workbench for machine learning that is intended to aid in the application of machine learning techniques to a variety of real-world problems, in particular, those arising from agricultural and horticultural domains. Unlike other machine learning projects, the emphasis is on providing a working environment for the domain specialist rather than the machine learning expert. Lessons learned include the necessity of providing a wealth of interactive tools for data manipulation, result visualization, database linkage, and cross-validation and comparison of rule sets, to complement the basic machine learning tools.

    View record details
  • Practical machine learning and its application to problems in agriculture

    Witten, Ian H.; Holmes, Geoffrey; McQueen, Robert J.; Smith, Lloyd A.; Cunningham, Sally Jo (1993)

    Working or discussion paper
    University of Waikato

    One of the most exciting and potentially far-reaching developments in contemporary computer science is the invention and application of methods of machine learning. These have evolved from simple adaptive parameter-estimation techniques to ways of (a) inducing classification rules from examples, (b) using prior knowledge to guide the interpretation of new examples, (c) using this interpretation to sharpen and refine the domain knowledge, and (d) storing and indexing example cases in ways that highlight their similarities and differences. Such techniques have been applied in domains ranging from the diagnosis of plant disease to the interpretation of medical test data. This paper reviews selected methods of machine learning with an emphasis on practical applications, and suggests how they might be used to address some important problems in the agriculture industries.

    View record details
  • Drawing diagrams in TeX documents

    Rogers, Bill; Holmes, Geoffrey (1992)

    Working or discussion paper
    University of Waikato

    The typesetting language TeX [Knuth (1984)] is now available on a range of computers from mainframes to micros. It has an unequalled ability to typeset mathematical text, with its many formatting features and fonts. TeX has facilities for drawing diagrams using the packages tpic and PiCTeX. This paper describes a TeX preprocessor written in Pascal which allows a programmer to embed diagrams in TeX documents. These diagrams may involve straight or curved lines and labelling text. The package is provided for people who either do not have access to tpic or PiCTeX or who prefer to program in Pascal.

    View record details
  • Natural language processing in speech understanding systems

    Holmes, Geoffrey (1992)

    Working or discussion paper
    University of Waikato

    Speech understanding systems (SUSs) came of age in late 1971 as a result of a five-year development programme instigated by the Information Processing Technology Office of the Advanced Research Projects Agency (ARPA) of the Department of Defense in the United States. The aim of the programme was to research and develop practical man-machine communication systems. It has been argued since that the main contribution of this project was not in the development of speech science, but in the development of artificial intelligence. That debate is beyond the scope of this paper, though no one would question the fact that the field to benefit most within artificial intelligence as a result of this programme is natural language understanding. More recent projects of a similar nature, such as projects in the United Kingdom's ALVEY programme and Europe's ESPRIT programme, have added further developments to this important field. This paper presents a review of some of the natural language processing techniques used within speech understanding systems. In particular, techniques for handling syntactic, semantic and pragmatic information are discussed. They are integrated into SUSs as knowledge sources. The most common application of these systems is to provide an interface to a database. The system has to perform a dialogue with a user who is generally unknown to the system. Typical examples are train and aeroplane timetable enquiry systems, travel management systems and document retrieval systems.

    View record details
  • Correcting English text using PPM models

    Teahan, W.J.; Inglis, Stuart J.; Cleary, John G.; Holmes, Geoffrey (1997-11)

    Working or discussion paper
    University of Waikato

    An essential component of many applications in natural language processing is a language modeler able to correct errors in the text being processed. For optical character recognition (OCR), poor scanning quality or extraneous pixels in the image may cause one or more characters to be mis-recognized; while for spelling correction, two characters may be transposed, or a character may be inadvertently inserted or missed out. This paper describes a method for correcting English text using a PPM model. A method that segments words in English text is introduced and is shown to be a significant improvement over previously used methods. A similar technique is also applied as a post-processing stage after pages have been recognized by a state-of-the-art commercial OCR system. We show that the accuracy of the OCR system can be increased from 95.9% to 96.6%, a decrease of about 10 errors per page.

    View record details