by S. M. Weiss and C. V. Apte
Cited references and notes
-
D. Radev, H. Jing, and M. Budzikowska, "Summarization of Multiple Documents: Clustering, Sentence Extraction, and Evaluation," Proceedings, ANLP/NAACL Workshop on Automatic Summarization, Seattle, WA (April 30, 2000).
-
J. Hartigan and M. Wong, "A K-Means Clustering Algorithm," Applied Statistics 28, 100–108 (1979).
-
A. McCallum, K. Nigam, and L. Ungar, "Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching," Proceedings, 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA (August 20–23, 2000).
-
A. Griffiths, H. Luckhurst, and P. Willett, "Using Interdocument Similarity Information in Document Retrieval Systems," K. Sparck Jones and P. Willett, Editors, Readings in Information Retrieval, Morgan Kaufmann Publishers, San Francisco, CA (1997), pp. 365–373.
-
B. Larsen and C. Aone, "Fast and Effective Text Mining Using Linear-Time Document Clustering," Proceedings, 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA (August 15–18, 1999), pp. 16–22.
-
G. Salton and C. Buckley, "Term-Weighting Approaches in Automatic Text Retrieval," K. Sparck Jones and P. Willett, Editors, Readings in Information Retrieval, Morgan Kaufmann Publishers, San Francisco, CA (1997), pp. 323–328.
-
D. Cutting, D. Karger, J. Pedersen, and J. Tukey, "Scatter/Gather: A Cluster-Based Approach to Browsing Large Document Collections," Proceedings, 15th ACM SIGIR International Conference on Research and Development in Information Retrieval, Copenhagen, Denmark (June 21–24, 1992).
-
E. Voorhees, "Implementing Agglomerative Hierarchical Clustering Algorithms for Use in Document Retrieval," Information Processing and Management 22, 465–476 (1986).
-
A stopword is a word that is ignored because it has little statistical predictive value; for example, "it" and "we" are common stopwords.
-
S. Weiss, B. White, C. Apte, and F. Damerau, "Lightweight Document Matching for Help-Desk Applications," IEEE Intelligent Systems 15, No. 2, 57–61 (2000).
-
Nearest neighbor refers to a standard method that compares a new vector with a stored collection of vectors and finds the one most similar to it.
-
O. Zamir, O. Etzioni, O. Madani, and R. Karp, "Fast and Intuitive Clustering of Web Documents," Proceedings, 3rd International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA (August 14–17, 1997).
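The note defining a stopword can be illustrated with a minimal Python sketch of stopword removal. The stopword list below is a small hypothetical one chosen only for illustration, and simple lowercase whitespace tokenization is an assumption. The article does not specify either.

```python
# Hypothetical stopword list for illustration only; practical systems
# use much larger, domain-tuned lists.
STOPWORDS = {"it", "we", "the", "a", "an", "and", "of", "to", "is"}


def remove_stopwords(text: str) -> list[str]:
    """Lowercase the text, split it on whitespace, and drop any token in STOPWORDS."""
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOPWORDS]


print(remove_stopwords("We think it is a printer problem"))
# prints ['think', 'printer', 'problem']
```

Because the stopwords carry little predictive value, dropping them shrinks the vocabulary without discarding the words that distinguish one document from another.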
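The note on nearest neighbor can be made concrete with a brute-force search in Python. The note does not say how similarity is measured; cosine similarity, a common choice for document vectors, is assumed here. The function names `cosine` and `nearest_neighbor` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbor(query, stored):
    """Compare query against every stored vector and return (index, similarity)
    of the most similar one. Returns (-1, -inf) if stored is empty."""
    best_index, best_sim = -1, -math.inf
    for i, vec in enumerate(stored):
        sim = cosine(query, vec)
        if sim > best_sim:
            best_index, best_sim = i, sim
    return best_index, best_sim


stored = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
index, sim = nearest_neighbor([0, 2, 2], stored)
print(index)  # prints 1
```

The linear scan compares the new vector with every stored vector, which matches the definition in the note; large collections typically rely on indexing to avoid examining every vector.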