


Term-weighting approaches in automatic text retrieval. BibTeX: @INPROCEEDINGS{Salton88term-weightingapproaches, author = {Gerard Salton and Christopher Buckley}, title = {Term-weighting approaches in automatic text retrieval}, booktitle = {INFORMATION PROCESSING AND MANAGEMENT}, year = {1988}, pages = {513--523}, publisher = {}}

Term-weighting approaches in automatic text retrieval

Gradient Boosting Machine

Content Analysis Web Service. Michael/papers/596.pdf. TF-IDF. Dlibrary/JIPS_v05_no3_paper6.pdf.

Latent semantic analysis. Latent semantic analysis (LSA) is a natural language processing technique, in particular in vectorial semantics, that analyzes the relationships between a set of documents and the terms they contain by producing a set of concepts related to both the documents and the terms.

Latent semantic analysis

LSA assumes that words that are close in meaning will occur in similar pieces of text. A matrix of word counts per paragraph (rows represent unique words, columns represent paragraphs) is constructed from a large piece of text. A mathematical technique called singular value decomposition (SVD) is then used to reduce the number of columns while preserving the similarity structure among the rows.
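The construction above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the four-paragraph toy corpus, the choice of `k = 2` retained dimensions, and the helper names `cosine` and `word_similarity` are all hypothetical and do not come from the source. The sketch builds the word-by-paragraph count matrix, truncates its SVD, and compares words by the cosine between their reduced row vectors.

```python
import numpy as np

# Hypothetical toy corpus: each string stands in for one paragraph.
paragraphs = [
    "cat dog pet",
    "cat dog animal",
    "stock market trade",
    "stock market price",
]

# Vocabulary of unique words, each mapped to a row index.
vocab = sorted({word for p in paragraphs for word in p.split()})
index = {word: i for i, word in enumerate(vocab)}

# Count matrix: rows are unique words, columns are paragraphs.
counts = np.zeros((len(vocab), len(paragraphs)))
for j, paragraph in enumerate(paragraphs):
    for word in paragraph.split():
        counts[index[word], j] += 1

# SVD of the count matrix; keeping only the k largest singular values
# replaces the paragraph columns with k "concept" columns.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2  # assumed number of retained dimensions
word_vectors = U[:, :k] * s[:k]  # one k-dimensional row per word


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def word_similarity(w1, w2):
    """Cosine similarity of two words in the reduced space."""
    return cosine(word_vectors[index[w1]], word_vectors[index[w2]])


for pair in [("cat", "dog"), ("animal", "pet"), ("cat", "stock")]:
    print(pair, round(word_similarity(*pair), 3))
```

In this toy corpus "animal" and "pet" never share a paragraph, so their raw count rows are orthogonal, yet their reduced vectors come out nearly parallel because both co-occur with "cat" and "dog". Words from the two unrelated topics, such as "cat" and "stock", stay near zero similarity.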

Words are then compared by taking the cosine of the angle between the two vectors formed by any two rows. Values close to 1 represent very similar words, while values close to 0 represent very dissimilar words.[1]

Document Clustering in Objective-C.