182 results for Witten, Ian H.

  • Bi-level document image compression using layout information

    Inglis, Stuart J.; Witten, Ian H. (1996)

    Conference item
    University of Waikato

    Most bi-level images stored on computers today comprise scanned text, and are stored using generic bi-level image technology based either on classical run-length coding, such as the CCITT Group 4 method, or on modern schemes such as JBIG that predict pixels from their local image context. However, image compression methods that are tailored specifically for images known to contain printed text can provide noticeably superior performance because they effectively enlarge the context to the character level, at least for those predictions for which such a context is relevant. To deal effectively with general documents that contain text and pictures, it is necessary to detect layout and structural information from the image, and employ different compression techniques for different parts of the image. We extend previous work in document image compression in two ways. First, we include automatic discrimination between text and non-text zones in an image. Second, we test the system on a large real-world image corpus.
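For concreteness, the run-length baseline mentioned above can be sketched in a few lines. This is an illustrative encoder/decoder pair only, not the CCITT Group 4 coder itself, which adds Huffman-style codes and two-dimensional coding modes:

```python
def rle_encode(row):
    """Run-length encode one bi-level scan line (a list of 0/1 pixels).

    Returns alternating run lengths, starting with the initial white (0)
    run, following the CCITT convention that a line begins with white.
    """
    runs, current, length = [], 0, 0
    for pixel in row:
        if pixel == current:
            length += 1
        else:
            runs.append(length)
            current, length = pixel, 1
    runs.append(length)
    return runs


def rle_decode(runs):
    """Invert rle_encode: expand alternating run lengths back into pixels."""
    row, colour = [], 0
    for length in runs:
        row.extend([colour] * length)
        colour ^= 1
    return row
```

On scanned text, long white runs make such codes far shorter than the raw pixels; the character-level methods the paper describes go further by predicting whole symbols.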

    View record details
  • Privacy preserving computation by fragmenting individual bits and distributing gates

    Will, Mark A.; Ko, Ryan K.L.; Witten, Ian H. (2016)

    Conference item
    University of Waikato

    Solutions that allow the computation of arbitrary operations over data securely in the cloud are currently impractical. The holy grail of cryptography, fully homomorphic encryption, still requires minutes to compute a single operation. In order to provide a practical solution, this paper proposes taking a different approach to the problem of securely processing data. FRagmenting Individual Bits (FRIBs), a scheme which preserves user privacy by distributing bit fragments across many locations, is presented. Privacy is maintained because each server receives only a small portion of the actual data, and solving for the rest entails searching a vast number of possibilities. Functions are defined with NAND logic gates, and are computed quickly because the performance overhead is shifted from computation to network latency. This paper details our proof-of-concept addition algorithm, which took 346 ms to add two 32-bit values, paving the way towards further improvements that bring computation times under 100 ms.
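The abstract does not give the gate layouts, but the idea of defining functions purely with NAND gates can be illustrated with a standard nine-gate full adder chained into a 32-bit ripple-carry adder. All names and structure here are ours, not the paper's; in FRIBs the gates would be evaluated over distributed bit fragments rather than locally:

```python
def nand(a, b):
    """The single primitive: NAND of two bits."""
    return 1 - (a & b)

def full_adder(a, b, carry_in):
    """One-bit full adder built from nine NAND gates."""
    t1 = nand(a, b)
    s1 = nand(nand(a, t1), nand(b, t1))             # s1 = a XOR b
    t4 = nand(s1, carry_in)
    total = nand(nand(s1, t4), nand(carry_in, t4))  # s1 XOR carry_in
    carry_out = nand(t1, t4)                        # (a AND b) OR (s1 AND carry_in)
    return total, carry_out

def add_32bit(x, y):
    """Ripple-carry addition of two 32-bit values using only NAND gates."""
    carry, result = 0, 0
    for i in range(32):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result  # modulo 2**32: the final carry out is discarded
```

The ripple-carry chain also illustrates where the overhead goes: each bit's carry depends on the previous bit, so in a distributed evaluation the cost is dominated by network latency rather than CPU time, consistent with the abstract's observation.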

    View record details
  • A mobile reader for language learners

    König, Jemma; Witten, Ian H.; Wu, Shaoqun (2016)

    Conference item
    University of Waikato

    This paper describes a new approach to mobile language learning: a mobile reader that aids learners in extending the breadth of their existing vocabulary knowledge. The FLAX Reader supports L2 (second language) learners of English by building a personalized learner model of receptive vocabulary acquisition. It provides dictionary lookup for words that they struggle with, tracks a learner's reading speed, and models their vocabulary acquisition, generating appropriate exercises to aid in a learner's personal language growth.

    View record details
  • Learning collocations with FLAX apps

    Yu, Alex; Wu, Shaoqun; Witten, Ian H.; König, Jemma (2016)

    Conference item
    University of Waikato

    The rise of Mobile Assisted Language Learning has brought a new dimension and dynamic into language classes. Game-like apps have become a particularly effective way to promote self-learning among young learners outside the classroom. This paper describes a system called FLAX that allows teachers to automatically generate a variety of collocation games from a contemporary collocation database built from Wikipedia text. These games are fun to play and mimic traditional classroom activities such as Collocation Matching, Collocation Guessing, Collocation Dominoes, and Related Words. The apps can be downloaded onto Android devices from the Google Play store, and exercises are automatically updated whenever new materials are added by teachers through a web-based interface on the FLAX server. Teachers have used these games to provide supplementary material for several Massive Open Online Courses (MOOCs) in the Law discipline.

    View record details
  • Comparing human and computational models of music prediction

    Witten, Ian H.; Manzara, Leonard C.; Conklin, Derrell (1992)

    Working or discussion paper
    University of Waikato

    The information content of each successive note in a piece of music is not an intrinsic musical property but depends on the listener's own model of a genre of music. Human listeners' models can be elicited by having them guess successive notes and assign probabilities to their guesses by gambling. Computational models can be constructed by developing a structural framework for prediction, and "training" the system by having it assimilate a corpus of sample compositions and adjust its internal probability estimates accordingly. These two modeling techniques turn out to yield remarkably similar values for the information content, or "entropy," of the Bach chorale melodies. While previous research has concentrated on the overall information content of whole pieces of music, the present study evaluates and compares the two kinds of model in fine detail. Their predictions for two particular chorale melodies are analyzed on a note-by-note basis, and the smoothed information profiles of the chorales are examined and compared. Apart from the intrinsic interest of comparing human with computational models of music, several conclusions are drawn for the improvement of computational models.
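The entropy figure on which the two kinds of model are compared can be computed directly from the probabilities a model, human or computational, assigned to each note that actually occurred. A minimal sketch of the calculation:

```python
import math

def information_content(probabilities):
    """Per-note information, in bits: -log2 of the probability the model
    assigned to each note that actually occurred."""
    return [-math.log2(p) for p in probabilities]

def entropy_estimate(probabilities):
    """Average per-note information: the model's entropy estimate for the
    piece, in bits per note. Lower means more predictive."""
    bits = information_content(probabilities)
    return sum(bits) / len(bits)
```

Smoothing the per-note sequence, for example with a moving average, yields the kind of information profile the paper examines on a note-by-note basis.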

    View record details
  • Models for computer generated parody

    Smith, Tony C.; Witten, Ian H. (1993)

    Working or discussion paper
    University of Waikato

    This paper outlines two approaches to the construction of computer systems that generate prose in the style of a given author. The first involves using intuitive notions of stylistic trademarks to construct a grammar that characterizes a particular author, in this case Ernest Hemingway. The second uses statistical methods for inferring a grammar from samples of an author's work, in this instance Thomas Hardy. A brief outline of grammar induction principles is included as background material for the latter system. The relative merits of each approach are discussed, and text generated from the resulting grammars is assessed in terms of its parodic quality. Beyond its esoteric interest, a discussion of parody generation as a useful technique for measuring the success of grammatical inference systems is included, along with suggestions for its practical application in areas of language modeling and text compression.

    View record details
  • Semantic and generative models for lossy text compression

    Witten, Ian H.; Bell, Timothy C.; Moffat, Alistair; Smith, Tony C.; Nevill-Manning, Craig G. (1992)

    Working or discussion paper
    University of Waikato

    The apparent divergence between the research paradigms of text and image compression has led us to consider the potential for applying methods developed for one domain to the other. This paper examines the idea of "lossy" text compression, which transmits an approximation to the input text rather than the text itself. In image coding, lossy techniques have proven to yield compression factors that are vastly superior to those of the best lossless schemes, and we show that this is also the case for text. Two different methods are described here, one inspired by the use of fractals in image compression. They can be combined into an extremely effective technique that provides much better compression than the present state of the art and yet preserves a reasonable degree of match between the original and received text. The major challenge for lossy text compression is identified as the reliable evaluation of the quality of this match.

    View record details
  • Language inference from function words

    Smith, Tony C.; Witten, Ian H. (1993)

    Working or discussion paper
    University of Waikato

    Language surface structures demonstrate regularities that make it possible to learn a capacity for producing an infinite number of well-formed expressions. This paper outlines a system that uncovers and characterizes regularities through principled wholesale pattern analysis of copious amounts of machine-readable text. The system uses the notion of closed-class lexemes to divide the input into phrases, and from these phrases infers lexical and syntactic information. The set of closed-class lexemes is derived from the text, and then these lexemes are clustered into functional types. Next the open-class words are categorized according to how they tend to appear in phrases and then clustered into a smaller number of open-class types. Finally these types are used to infer, and generalize, grammar rules. Statistical criteria are employed for each of these inference operations. The result is a relatively compact grammar that is guaranteed to cover every sentence in the source text that was used to form it. Closed-class inferencing compares well with current linguistic theories of syntax and offers a wide range of potential applications.
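The first stage of the pipeline, dividing the input into phrases at closed-class lexemes, can be sketched as follows. The function-word list here is a tiny illustrative subset supplied by hand, whereas the system described derives the closed-class set from the text itself:

```python
# Illustrative subset only; the described system infers this set from the corpus.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "that"}

def split_into_phrases(tokens):
    """Open a new phrase at each closed-class (function) word, so each
    phrase begins with a function word where possible."""
    phrases, current = [], []
    for tok in tokens:
        if tok.lower() in FUNCTION_WORDS and current:
            phrases.append(current)
            current = []
        current.append(tok)
    if current:
        phrases.append(current)
    return phrases
```

Subsequent stages would then cluster the function words into types and categorize open-class words by the phrase positions they occupy.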

    View record details
  • Compression by induction of hierarchical grammars

    Nevill-Manning, Craig G.; Witten, Ian H.; Maulsby, David (1993)

    Working or discussion paper
    University of Waikato

    This paper describes a technique that develops models of symbol sequences in the form of small, human-readable, hierarchical grammars. The grammars are both semantically plausible and compact. The technique can induce structure from a variety of different kinds of sequence, and examples are given of models derived from English text, C source code and a file of numeric data. This paper explains the grammatical induction technique, demonstrates its application to three very different sequences, evaluates its compression performance, and concludes by briefly discussing its use as a method of knowledge acquisition.
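The abstract does not spell out the induction algorithm, but its core idea, replacing repeated pairs of adjacent symbols with grammar rules, can be sketched offline. This greedy variant is ours for illustration; the described technique builds the hierarchy incrementally as symbols arrive:

```python
from collections import Counter

def induce_grammar(sequence):
    """Repeatedly replace the most frequent repeated digram with a fresh
    rule until no digram occurs twice, yielding a start sequence plus a
    small hierarchical grammar (rule name -> pair of symbols)."""
    seq, rules = list(sequence), {}
    while True:
        digrams = Counter(zip(seq, seq[1:]))
        if not digrams:
            break
        pair, count = digrams.most_common(1)[0]
        if count < 2:
            break
        name = f"R{len(rules)}"
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):  # greedy, non-overlapping replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules
```

Expanding the rules in the start sequence reconstructs the original string exactly, which is the sense in which the grammar is a lossless model of the sequence.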

    View record details
  • Multiple viewpoint systems for music prediction

    Witten, Ian H.; Conklin, Darrell (1993)

    Working or discussion paper
    University of Waikato

    This paper examines the prediction and generation of music using a multiple viewpoint system, a collection of independent views of the musical surface each of which models a specific type of musical phenomenon. Both the general style and a particular piece are modeled using dual short-term and long-term theories, and the model is created using machine learning techniques on a corpus of musical examples. The models are used for analysis and prediction, and we conjecture that highly predictive theories will also generate original, acceptable works. Although the quality of the works generated is hard to quantify objectively, the predictive power of models can be measured by the notion of entropy, or unpredictability. Highly predictive theories will produce low-entropy estimates of a musical language. The methods developed are applied to the Bach chorale melodies. Multiple-viewpoint systems are learned from a sample of 95 chorales, estimates of entropy are produced, and a predictive theory is used to generate new, unseen pieces.

    View record details
  • Compressing computer programs

    Davies, Rod M.; Witten, Ian H. (1993)

    Working or discussion paper
    University of Waikato

    This paper describes a scheme for compressing programs written in a particular programming language—which can be any language that has a formal lexical and syntactic description—in such a way that they can be reproduced exactly. Only syntactically correct programs can be compressed. The scheme is illustrated on the Pascal language, and compression results are given for a corpus of Pascal programs; but it is by no means restricted to Pascal. In fact, we discuss how a "compressor-generator" program can be constructed that creates a compressor automatically from a formal specification of a programming language, in much the same way as a parser generator creates a syntactic parser from a formal language description.

    View record details
  • Compression-based template matching

    Inglis, Stuart J.; Witten, Ian H. (1993)

    Working or discussion paper
    University of Waikato

    Textual image compression is a method of both lossy and lossless image compression that is particularly effective for images containing repeated sub-images, notably pages of text (Mohiuddin et al., 1984; Witten et al., 1992). The process comprises three main steps:

      • Extracting all the characters from an image;
      • Building a library that contains one representative for each character class;
      • Compressing the image with respect to the library.
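The second step, building the library, can be sketched with a greedy matcher: each extracted mark either joins the class of the first template it matches closely enough or founds a new class. The pixel-mismatch criterion here is illustrative; the cited papers use more robust template-matching measures:

```python
def mismatch(a, b):
    """Count differing pixels between two same-sized bitmaps (lists of rows)."""
    return sum(p != q for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def build_library(marks, threshold):
    """Greedy library construction: returns the class templates and, for
    each mark, the index of the template it was matched to."""
    library, codes = [], []
    for mark in marks:
        for idx, tmpl in enumerate(library):
            same_size = len(tmpl) == len(mark) and len(tmpl[0]) == len(mark[0])
            if same_size and mismatch(mark, tmpl) <= threshold:
                codes.append(idx)
                break
        else:
            codes.append(len(library))
            library.append(mark)
    return library, codes
```

In outline, the image is then coded as the library plus each mark's class index and position, with any residual pixel differences handled separately when lossless reconstruction is required.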

    View record details
  • Getting research students started: a tale of two courses

    Witten, Ian H.; Bell, Timothy C. (1992)

    Working or discussion paper
    University of Waikato

    As graduate programs in Computer Science grow and mature and undergraduate populations stabilize, an increasing proportion of our resources is being devoted to the training of researchers in the field. Many inefficiencies are evident in our graduate programs. These include undesirably long average times to thesis completion, students' poor work habits and general lack of professionalism, and the unnecessary duplication of having supervisors introduce their students individually to the basics of research. Solving these problems requires specifically targeted education to get students started in their graduate research and introduce them to the skills and tools needed to complete it efficiently and effectively. We have used two different approaches in our respective departments. One is a (half-) credit course on research skills; the other a one-week intensive non-credit "survival course" at the beginning of the year. The advantage of the former is the opportunity to cover material in depth and for students to practice their skills; the latter is much less demanding on students and is easier to fit into an existing graduate program.

    View record details
  • Displaying 3D images: algorithms for single image random dot stereograms

    Witten, Ian H.; Inglis, Stuart J.; Thimbleby, Harold W. (1993)

    Working or discussion paper
    University of Waikato

    This paper describes how to generate a single image which, when viewed in the appropriate way, appears to the brain as a 3D scene. The image is a stereogram composed of seemingly random dots. A new, simple and symmetric algorithm for generating such images from a solid model is given, along with the design parameters and their influence on the display. The algorithm improves on previously-described ones in several ways: it is symmetric and hence free from directional (right-to-left or left-to-right) bias, it corrects a slight distortion in the rendering of depth, it removes hidden parts of surfaces, and it also eliminates a type of artifact that we call an "echo". Random dot stereograms have one remaining problem: difficulty of initial viewing. If a computer screen rather than paper is used for output, the problem can be ameliorated by shimmering, or time-multiplexing of pixel values. We also describe a simple computational technique for determining what is present in a stereogram so that, if viewing is difficult, one can ascertain what to look for.
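A simplified sketch of the constraint-based approach: each depth value induces a stereo separation, the two pixels half that separation either side of the current point are constrained to be equal, and unconstrained pixels are coloured randomly. The separation formula follows standard single-image stereogram geometry with hypothetical parameter values; the hidden-surface removal and echo elimination that the paper adds are omitted:

```python
import random

def stereogram_row(depths, mu=1/3, eye_sep_pixels=90):
    """Generate one scan line of a random dot stereogram from a row of
    depth values z in [0, 1] (0 = far plane, 1 = near plane)."""
    n = len(depths)
    same = list(range(n))  # same[x]: a pixel to the right constrained equal to x

    def root(x):
        while same[x] != x:
            x = same[x]
        return x

    for x, z in enumerate(depths):
        # stereo separation, in pixels, for a point at depth z
        s = round(eye_sep_pixels * (1 - mu * z) / (2 - mu * z))
        left, right = x - s // 2, x + (s + 1) // 2
        if 0 <= left and right < n:
            a, b = root(left), root(right)  # merge the two equality classes
            if a != b:
                same[min(a, b)] = max(a, b)

    pix = [0] * n
    for x in range(n - 1, -1, -1):  # resolve constraints right-to-left
        pix[x] = pix[same[x]] if same[x] != x else random.randint(0, 1)
    return pix
```

A flat far plane (all zeros) yields a line that repeats with a fixed separation (45 pixels under these parameters); depth variations perturb that separation, which is what the visual system decodes as 3D.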

    View record details
  • Practical machine learning and its application to problems in agriculture

    Witten, Ian H.; Holmes, Geoffrey; McQueen, Robert J.; Smith, Lloyd A.; Cunningham, Sally Jo (1993)

    Working or discussion paper
    University of Waikato

    One of the most exciting and potentially far-reaching developments in contemporary computer science is the invention and application of methods of machine learning. These have evolved from simple adaptive parameter-estimation techniques to ways of (a) inducing classification rules from examples, (b) using prior knowledge to guide the interpretation of new examples, (c) using this interpretation to sharpen and refine the domain knowledge, and (d) storing and indexing example cases in ways that highlight their similarities and differences. Such techniques have been applied in domains ranging from the diagnosis of plant disease to the interpretation of medical test data. This paper reviews selected methods of machine learning with an emphasis on practical applications, and suggests how they might be used to address some important problems in the agriculture industries.

    View record details
  • Learning English with FLAX apps

    Yu, Alex; Witten, Ian H. (2015)

    Conference item
    University of Waikato

    The rise of Mobile Assisted Language Learning has brought a new dimension and dynamic into language classes. Game-like language learning apps have become a particularly effective way to promote self-learning among young learners outside the classroom. This paper describes a system called FLAX that allows teachers to use their own material to build digital library collections that can then be used to create a variety of web- and mobile-based language games such as Hangman, Scrambled Sentences, Split Sentences, Word Guessing, and Punctuation and Capitalization. These games can be easily downloaded onto Android handheld devices such as phones and tablets, and are automatically updated whenever new materials are added by teachers through a web-based interface on the FLAX server.

    View record details
  • Using Wikipedia for language learning

    Wu, Shaoqun; Witten, Ian H. (2015)

    Conference item
    University of Waikato

    Differentiating between words like look, see and watch, injury and wound, or broad and wide presents great challenges to language learners because it is the collocates of these words that reveal their different shades of meaning, rather than their dictionary definitions. This paper describes a system called FlaxCLS that overcomes the restrictions and limitations of existing tools for collocation learning. FlaxCLS automatically extracts useful syntax-based word collocations from three million Wikipedia articles and provides a simple interface through which learners seek collocations of any word, or search for combinations of multiple words. The system also retrieves semantically related words and collocations of the query term by consulting Wikipedia. FlaxCLS has been used as language support for many Masters and PhD students in a New Zealand university. Anecdotal evidence suggests that the interface it provides is easy to use and students have found it helpful in improving their written English.

    View record details
  • Second language learning in the context of MOOCs

    Wu, Shaoqun; Fitzgerald, Alannah; Witten, Ian H. (2014)

    Conference item
    University of Waikato

    Massive Open Online Courses are becoming popular educational vehicles through which universities reach out to non-traditional audiences. Many enrolees hail from other countries and cultures, and struggle to cope with the English language in which these courses are invariably offered. Moreover, most such learners have a strong desire and motivation to extend their knowledge of academic English, particularly in the specific area addressed by the course. Online courses provide a compelling opportunity for domain-specific language learning. They supply a large corpus of interesting linguistic material relevant to a particular area, including supplementary images (slides), audio and video. We contend that this corpus can be automatically analysed, enriched, and transformed into a resource that learners can browse and query in order to extend their ability to understand the language used, and help them express themselves more fluently and eloquently in that domain. To illustrate this idea, an existing online corpus-based language learning tool (FLAX) is applied to a Coursera MOOC entitled Virology 1: How Viruses Work, offered by Columbia University.

    View record details
  • Building a public digital library based on full-text retrieval

    Witten, Ian H.; Nevill-Manning, Craig G.; Cunningham, Sally Jo (1995-08)

    Working or discussion paper
    University of Waikato

    Digital libraries are expensive to create and maintain, and generally restricted to a particular corporation or group of paying subscribers. While many indexes to the World Wide Web are freely available, the quality of what is indexed is extremely uneven. The digital analog of a public library, a reliable, high-quality community service, has yet to appear. This paper demonstrates the feasibility of a cost-effective collection of high-quality public-domain information, available free over the Internet. One obstacle to the creation of a digital library is the difficulty of providing formal cataloguing information. Without a title, author and subject database it seems hard to offer the searching facilities normally available in physical libraries. Full-text retrieval provides a way of approximating these services without a concomitant investment of resources. A second obstacle is the problem of finding a suitable corpus of material. Computer science research reports form the focus of our prototype implementation. These constitute a large body of high-quality public-domain documents. Given such a corpus, a third issue is the question of obtaining both plain text for indexing, and page images for readability. Typesetting formats such as PostScript provide some of the benefits of libraries scanned from paper documents, such as page-based indexing and viewing, without the physical demands and error-prone nature of scanning and optical character recognition. However, until recently the difficulty of extracting text from PostScript seems to have encouraged indexing on plain-text abstracts or bibliographic information provided by authors. We have developed a new technique that overcomes the problem. This paper describes the architecture, the indexing, collection and maintenance processes, and the retrieval interface of a prototype public digital library.
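The way full-text retrieval can stand in for formal cataloguing is easy to illustrate with a toy inverted index supporting conjunctive queries. This is a sketch only; a production system additionally needs compressed indexes, stemming, and ranking:

```python
from collections import defaultdict

def build_index(documents):
    """Map each term to the set of document ids containing it, so the
    collection can be searched without any manually supplied metadata."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

Because the index is built from the document text itself, title, author, and subject searches all reduce to term lookups, which is how the approach approximates a catalogue without one.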

    View record details
  • Browsing in digital libraries: a phrase-based approach

    Nevill-Manning, Craig G.; Witten, Ian H.; Paynter, Gordon W. (1997-01)

    Working or discussion paper
    University of Waikato

    A key question for digital libraries is this: how should one go about becoming familiar with a digital collection, as opposed to a physical one? Digital collections generally present an appearance which is extremely opaque: a screen, typically a Web page, with no indication of what, or how much, lies beyond: whether a carefully-selected collection or a morass of worthless ephemera; whether half a dozen documents or many millions. At least physical collections occupy physical space, present a physical appearance, and exhibit tangible physical organization. When standing on the threshold of a large library one gains a sense of presence and permanence that reflects the care taken in building and maintaining the collection inside. No-one could confuse it with a dung-heap! Yet in the digital world the difference is not so palpable.

    View record details