Reading Resources


NLP - A Top Pick:
  1. Technologies to Watch, (Sept. 4, 2001) PC Magazine.

One of you asked me what the prospects of NLP are. In [1], PC Magazine calls NLP one of the top ten technologies of tomorrow.


NLP in Industry:
  1. IBM Research.
  2. Microsoft Research.
  3. Xerox PARC.


Introductory Papers on Statistical Methods in NLP:

  1. Krenn, B. and Samuelsson, C. (1997) The Linguist's Guide to Statistics.
  2. Abney, S. (2000) Statistical Methods.
  3. Abney, S. (1996) Statistical Methods and Linguistics.
  4. Nivre, J. (2001) On Statistical Methods in Natural Language Processing.

[1] takes a detailed look at the current statistical techniques in NLP. In [2] and [3] stochastic grammars are the main focus. [4] is a summary of current research in statistical methods in NLP. It does a good job of placing most current work in this field in perspective.


n-gram Language Models:

  1. Chen, S. F. and Goodman, J. (1998) An Empirical Study of Smoothing Techniques for Language Modeling.
  2. Goodman, J. (2000) The State of the Art in Language Modeling.

[1] presents a tutorial introduction to n-gram language modeling and surveys the most widely used smoothing algorithms for such models. The presentation slides in [2] cover the state of the art in language modeling.
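
As a warm-up before reading [1], here is a minimal sketch of a bigram model with add-one (Laplace) smoothing, the simplest additive baseline and not one of the stronger methods the paper studies. The toy corpus, the function names, and the `<s>`/`</s>` padding symbols are assumptions made for this illustration only.

```python
from collections import Counter


def train_bigram(sentences):
    """Collect context and bigram counts from tokenized sentences."""
    context_counts = Counter()  # how often each word appears as a left context
    bigram_counts = Counter()   # how often each (previous, word) pair appears
    vocab = set()               # every symbol the model may predict
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded[1:])  # <s> is never predicted, so it is left out
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def add_one_prob(prev, word, context_counts, bigram_counts, vocab):
    """P(word | prev) with one pseudo-count added to every possible bigram."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


# Hypothetical two-sentence corpus, for illustration only.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
context_counts, bigram_counts, vocab = train_bigram(corpus)

p_seen = add_one_prob("the", "cat", context_counts, bigram_counts, vocab)
p_unseen = add_one_prob("the", "sat", context_counts, bigram_counts, vocab)
total = sum(add_one_prob("the", w, context_counts, bigram_counts, vocab) for w in vocab)
print(p_seen, p_unseen, total)
```

The unseen bigram "the sat" receives a small non-zero probability, and the probabilities of all words following "the" still sum to one. That redistribution of probability mass is the problem every smoothing method in [1] addresses, each in a more careful way.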


Search Engines:

  1. Brin, S. and Page, L. (1998) The Anatomy of a Large-Scale Hypertextual Web Search Engine.
  2. Amitay, E. (2001) What Lays in the Layout: Using anchor-paragraph arrangements to extract descriptions of Web documents.

[1] presents the technical details of Google. [2] describes a new technique for summarising the information found in a Web page as a coherent snippet. Currently, in directories such as Yahoo! and the Open Directory Project, these descriptions are written by thousands of human editors.


Latent Semantic Analysis:

  1. Bellegarda, J. R. (2000) Exploiting Latent Semantic Information in Statistical Language Modeling, Proceedings of the IEEE, Vol. 88, No. 8, August 2000.
  2. Landauer, T. K., Foltz, P. W., and Laham, D. (1998) Introduction to LSA.
  3. Graesser, A. C., Wiemer-Hastings, K., Wiemer-Hastings, P. and Kreuz, R. (1999) AutoTutor: A simulation of a human tutor.

[1] gives good insight into singular value decomposition (SVD) and the latent semantic representation of documents, with applications in speech recognition and understanding. [2] presents a cognitive science perspective on LSA and its applications. [3] covers an application of LSA to comprehending students' answers in an automatic tutoring system.


Hidden Markov Models:

  1. Rabiner, L. R. (1989) A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77, No. 2, February 1989.

[1] gives comprehensive coverage of HMMs, along with extensive references.
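
To give a feel for the kind of computation the tutorial covers, here is a minimal sketch of the forward algorithm, which computes the likelihood of an observation sequence under an HMM. The two-state model, its numbers, and the names `pi`, `A`, and `B` are assumptions chosen for this illustration; they are not taken from the paper.

```python
from itertools import product


def forward(obs, pi, A, B):
    """Return P(obs | model) for a discrete HMM.

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k while in state i
    """
    n = len(pi)
    # Initialisation: start in each state and emit the first symbol.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: extend every partial path by one transition and one emission.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
            for j in range(n)
        ]
    # Termination: sum over the state occupied at the final time step.
    return sum(alpha)


# Hypothetical two-state, two-symbol model, for illustration only.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]

likelihood = forward([0, 1], pi, A, B)
# Sanity check: the likelihoods of all length-2 sequences should sum to one.
total = sum(forward(list(seq), pi, A, B) for seq in product([0, 1], repeat=2))
print(likelihood, total)
```

The work grows linearly with the length of the observation sequence, whereas enumerating every possible state path grows exponentially.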


Topics in Information Retrieval:

  1. Manning, C. and Schütze, H. (1999) Topics in IR.
  2. Porter, M. F. (1980) An algorithm for suffix stripping.

The presentation slides in [1] give an introduction to IR. [2] gives the technical details of the Porter Stemmer.


Statistical Machine Translation:

  1. Knight, K. (1999) A Statistical MT Tutorial Workbook.
  2. Brown, P. F., Della Pietra, S. A., Della Pietra, V. J., and Mercer, R. L. (1993) The Mathematics of Statistical Machine Translation.

[2] gives the mathematical details of statistical machine translation (SMT) and explains the IBM Models in depth. [1] is a tutorial based on that paper.


Last Updated: April 9, 2002.