The main entry point is `index.py`. Helper files are `InputOutput.py` and `Tokenizer.py`.
Indexing approach TBC; we still need to decide how to handle positional indexing.
We created a thesaurus specific to the legal context by scraping dictionary.law.com, forming a mapping from each legal term to a set of related terms. We then stemmed both the keys and the values, since the query is also stemmed (otherwise query terms would never match thesaurus entries).
Source code for scraping can be found here.
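For reference, a minimal sketch of the stemming step, assuming the scraped thesaurus is a plain dict from term to related terms (the function name and the dict shape are illustrative, not taken from the actual scraper):

```python
# A minimal sketch of stemming both keys and values of the thesaurus,
# assuming it is a Dict[str, Set[str]] mapping a legal term to related terms.
from typing import Dict, Set

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_thesaurus(raw: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Stem keys and values so they line up with stemmed query terms."""
    stemmed: Dict[str, Set[str]] = {}
    for term, related in raw.items():
        key = stemmer.stem(term)
        # Different keys may collapse to the same stem, so merge, don't overwrite.
        stemmed.setdefault(key, set()).update(stemmer.stem(r) for r in related)
    return stemmed
```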
Use any CPython interpreter, as long as it is version 3.8.10 and has NLTK installed (with the punkt tokenizer data downloaded).
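If the punkt data is missing, a one-time download covers it (a sketch, assuming no other NLTK data is needed):

```python
# One-time setup: fetch the punkt tokenizer data used by NLTK's tokenizers.
import nltk

nltk.download("punkt")
```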
We can use external libraries if we package them with our code, but let's try not to.
Let's keep things consistent!
- `snake_case` for variables and functions
- `PascalCase` for classes/objects and custom types
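For example (these names are hypothetical, just to show the two conventions side by side):

```python
class PostingsList:                      # PascalCase: class / custom type
    ...

def merge_postings(left: PostingsList,
                   right: PostingsList) -> PostingsList:
    merged_result = ...                  # snake_case: functions and variables
    return merged_result
```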
This project will probably get big and complicated with multiple people working on it, so type hints [1] [2] are encouraged. Add custom types to `Types.py` and import them into whichever files need them.
Dictionaries and nested dictionaries are common, so common atomic values like `DocId` and `Term` are abstracted as custom types in `Types.py` to make dictionary types clearer, e.g. `Dict[DocId, DocFreq]` rather than `Dict[int, int]`.
Since we are on Python 3.8.10, we will need `from __future__ import annotations` to get proper type hinting.
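A minimal sketch of what `Types.py` might look like; only `DocId`, `Term`, and `DocFreq` are named above, the remaining aliases are illustrative assumptions:

```python
# Types.py: a minimal sketch. Only DocId, Term, and DocFreq appear in this
# README; the remaining aliases are illustrative assumptions.
from __future__ import annotations  # postponed annotation evaluation on 3.8.10

from typing import Dict, List

DocId = int
Term = str
DocFreq = int
Position = int  # illustrative: a token offset for positional indexing

# Aliases keep nested dictionary types readable:
PostingsList = Dict[DocId, List[Position]]
TermDocFreqs = Dict[Term, Dict[DocId, DocFreq]]
```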