Date
|
Topics
|
Notes
|
Who
|
Readings
|
23 Aug
|
Basic inverted indexes:
Boolean query processing
|
[powerpoint]
[pdf]
|
PR
|
MG Ch. 3.2; MIR Ch. 8.2
Shakespeare plays
WestLaw
|
23 Aug
|
Finish basic indexing
Query processing – more tricks
Proximity/phrase queries
|
[powerpoint]
[pdf]
|
PR
|
MG Ch. 4.0-4.3, 4.5; MIR Ch. 3
Porter's stemmer
More Porter from the author
Lovins stemmer
“Fast Phrase Querying with Combined Indexes”, from http://www.seg.rmit.edu.au/research/research.php?author=4
|
24 Aug
|
Postings pointer storage
Dictionary storage
Compression
Wild-card queries
|
[powerpoint]
[pdf]
|
PR
|
MG Ch. 3.3, 3.4, 4.2
|
24 Aug
|
Query expansion
Index construction
|
[powerpoint]
[pdf]
|
PR
|
MG Ch. 5
|
25 Aug
|
Parametric and field searches
Scoring documents: zone weighting
tf-idf and vector spaces
|
[powerpoint]
[pdf]
|
PR
|
MG Ch. 4.4
New Retrieval Approaches Using SMART: TREC 4
Gerard Salton and Chris Buckley. Improving Retrieval Performance by Relevance
Feedback. Journal of the American Society for Information Science,
41(4):288-297, 1990.
|
25 Aug
|
Vector space scoring
Nearest neighbors and approximations
|
[powerpoint]
[pdf]
|
PR
|
MG Ch. 4.4-4.6; MIR 2.5, 2.7.2; FSNLP 15.4
Random projection theorem
Faster random projection
http://lsi.argreenhouse.com/lsi/LSIpapers.html
http://lsa.colorado.edu/
http://www.cs.utk.edu/~lsi/
|
26 Aug
|
Evaluating a search engine
Precision and recall
|
[powerpoint]
[pdf]
|
PR
|
MG Ch. 4.5
|
26 Aug
|
Web search
Link-based ranking in web search engines I
|
[powerpoint]
[pdf]
|
PR
|
MIR Ch. 13
Bibliography from Bharat/Broder/Hawking/Raghavan Tutorial at ACM SIGIR 2002 [pdf | html]
Anatomy of a large-scale hypertextual web search engine
|
26 Aug
|
Afternoon session: crawling and course project introduction
|
|
MS+PR
|
Tools
|
27 Aug
|
Link-based ranking in web search engines II
|
[powerpoint]
[pdf]
|
PR
|
FOA Ch. 6.1
Authoritative sources in a hyperlinked environment
Hypersearching the Web
Dubhashi resource collection covering recent topics
The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank
Topic-Sensitive PageRank
The Structure of Information Networks (CS 685 at Cornell)
Stable algorithms for link analysis
|
27 Aug
|
Behavior-based ranking; crawling; duplicate detection; search engine infrastructure.
|
[powerpoint]
[pdf]
|
PR
|
FOA Ch. 6.2
Supplemental notes on min-wise hashing [ppt | pdf].
|
30 Aug
|
XML RETRIEVAL
|
[powerpoint]
|
HS
|
XML tutorial
XML full text requirements
Other approaches: XRank, Result Ranking for Structured Queries against XML Documents
XML classification
|
30 Aug
|
CLUSTERING 1.
Introduction to the problem. Agglomerative and k-means clustering.
Clustering versus classification.
|
[powerpoint]
|
HS
|
Scatter/Gather
Data Clustering Review
Single-Link and Complete-Link Clustering
|
31 Aug
|
CLUSTERING 2.
Clustering terms using documents, labelling clusters,
evaluating clustering, link-based clustering, trawling
|
[powerpoint]
|
HS
|
FSNLP Ch. 14
Mining Association Rules Between Sets of Items in Large Databases
Clustering Hypertext with Applications to Web Searching
Trawling Emerging Cyber-communities Automatically
Projections for Efficient Document Clustering
|
31 Aug
|
LATENT SEMANTIC INDEXING, USER INTERFACES.
Browsing and Visualization models.
Evaluation of IR interfaces.
|
[powerpoint]
|
HS
|
MIR Ch. 10.0-10.7,
FOA Ch. 4.3, MIR Ch. 10.8-10.10
Probabilistic Latent Semantic Analysis
Latent semantic indexing
Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness
Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results
Cat-a-Cone: An Interactive Interface for Specifying Searches and Viewing Retrieval Results using a Large Category Hierarchy
A Case for Interaction: A Study of Interactive Information Retrieval Behavior and Effectiveness
OLIVE: On-line Library of Information Visualization Environments
Overview of the Third Text REtrieval Conference (TREC-3)
Overview of the Fourth Text REtrieval Conference (TREC-4)
TREC-6 Interactive Track Report
|
1 Sep
|
CLASSIFICATION 1.
Naive Bayes methods
|
[powerpoint]
|
HS
|
Machine Learning in Automated Text Categorization
A Comparison of Event Models for Naive Bayes Text Classification
Tom Mitchell. Machine Learning. McGraw-Hill, 1997.
A Re-examination of Text Categorization Methods
|
1 Sep
|
CLASSIFICATION 2.
Evaluation, vector space classification, k-nearest neighbors, decision trees
|
[powerpoint]
|
HS
|
FSNLP Ch. 16
Evaluating and Optimizing Autonomous Text Classification Systems
Dumais, Platt, Heckerman, and Sahami. Inductive learning algorithms and representations for text categorization. CIKM 1998.
Trevor Hastie, Robert Tibshirani, Jerome Friedman.
Elements of Statistical Learning: Data Mining, Inference,
and Prediction. Springer-Verlag, New York, 2001.
Reuters dataset
A Comparative Study on Feature Selection in Text Categorization
|
2 Sep
|
CLASSIFICATION 3.
Logistic regression, support vector machines
|
[powerpoint]
|
HS
|
Support Vector Machine Tutorial
Dumais. Using SVMs for Text Categorization. IEEE Intelligent Systems 13(4) Jul-Aug 1998.
Text Categorization Based on Regularized Linear Classification Methods
Why the logistic often is a good estimator of class probability.
Tutorial.
|
2 Sep
|
INFORMATION EXTRACTION AND MINING.
Rapier, hidden Markov models
|
[powerpoint]
|
HS
|
Fast Effective Rule Induction (Cohen 1995)
Berkeley HMM Tutorial
Information Extraction Using Hidden Markov Models
Learning Hidden Markov Model Structure for Information Extraction
Information Extraction with HMM Structures Learned by Stochastic Optimization
HMM parameter estimation
Introduction to Information Extraction Technology, IJCAI 1999
Learning Information Extraction Rules for Semi-Structured and Free Text
|
3 Sep (1st slot)
|
BIOINFORMATICS. Special constraints in
bioinformatics, combining textual and non-textual data
|
[pdf]
|
HS
|
Gene Ontology
Jeff Chang's BioNLP server
Biological literature improves homology search
|
|
3 Sep (2nd slot)
|
Compression techniques for the Web Graph
|
[pdf]
|
PB
|
|
3 Sep
|
Presentation of Projects
|
|
HS+PB+MS
|
|