-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathproject-8.qmd
222 lines (149 loc) · 8.99 KB
/
project-8.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
---
title: "Programming Project 8"
---
```{r}
#| echo: false
#| message: false
#| warning: false
library(tidyverse)
library(readxl)
assignments <- read_excel("assessment_schedule.xlsx") %>%
mutate(formatted_date = format(due_date, "%A, %B %d, %Y"))
```
Programming Projects are to be submitted to [gradescope](https://www.gradescope.com/courses/934148).
**Due date: `r assignments %>% filter(assessment == "Programming Project 8") %>% pull(formatted_date)` at 9pm**
This programming project is an adaptation of Ben Dicken's Spell Check Programming Assignment.
You should name the file `spellcheck.py` and have a function named `correct_spelling` to write out the ouput file.
# Spell Check
Most in this class have probably used a spellchecker at least once in your life (more likely, many times). Spell checkers are built into a number of programs, including Microsoft Word, Apple Pages, gmail, and others. In this assignment, you'll be implementing a program that can suggest and fix spelling issues in text.
# Program behavior
The spell checking program should be named `spellcheck.py`. The spell check program takes as argument the name of a text file, which will contain the text that the user wants to be spell-checked. The program then writes out an output file with swapping the contents of the input file with the spelling already fixed.
Let's walk through a really simple example to demonstrate the difference in the two modes. Let's say that the file `words.txt` has the below content:
```{html}
one hudnred dollars
and fiveteen cents
```
The words hundred (hudnred) and the word fifteen (fiveteen) are spelled wrong. The contents of the file should be written out to an output file with the misspelled words replaced with correctly-spelled ones.
```{html}
one hundred dollars
and fifteen cents
```
Notice that the program goes ahead and replaces the misspelled words with correctly-spelled ones.
How was the program able to identify the words that were not spelled correctly? Read the next section to find out.
# misspellings.txt
Your program should open up and read a file names `misspellings.txt`. You can assume that this file already exists in the same directory as `spellcheck.py`. When testing on your computer, you should create your own `misspellings.txt`. This file will include information about how words are misspelled. Using this information, you can determine which words are spelled wrong in the input file. You can assume that each line of `misspellings.txt` is formatted as follows:
```{html}
incorrect_a->correctly_spelled_word_a
incorrect_b->correctly_spelled_word_b
```
In other words, the first word on a line of the file is a correctly spelled word, then an arrow, then the correctly-spelled version of the word. For instance:
```{html}
hudnred->hundred
hundrid->hundred
fifeteen->fifteen
fiveteen->fifteen
fiften->fifteen
```
This file contains two possible misspellings of the word `"hundred"` and three possible misspellings of the word `"fifteen"`. Your program should handle a misspellings file of any size.
Your program should open and read the `misspellings.txt` file into the program and store the mapping between misspelled and correctly-spelled words in a dictionary. The keys of the dictionary should be the misspelled version of the word, and the value the keys map to should be the correctly-spelled version of that word. Given the example content of `misspelling.txt` shown above, the misspellings dictionary should have the keys and values shown below:
```{python}
#| eval: false
#| echo: true
misspellings = {
'hudnred' : 'hundred',
'hundrid' : 'hundred',
'fifeteen' : 'fifteen',
'fiveteen' : 'fifteen',
'fiften' : 'fifteen' }
```
After you have read in the file and constructed this dictionary, you can use it to determine the correct spelling of an incorrectly-spelled word. How would you do this?
After you've opened up the input file, loop through each line of it. You can split each line into individual words. For each word on the line, you can check if that word is one of the keys in the misspellings dictionary. If it is, you can assume it is incorrectly spelled. The correct spelling of that word is the value associated with the key, so you can grab it from the dictionary and either replace the original word (if in replace mode) or use it as a suggestion (if in suggest mode).
# Preserving Capitalization
The words in the `misspellings.txt` file will be all in lower case. However, your program should also be able to replace or suggest fixes for misspelled capitalized words. For instance, it should be able to change `Hudnred` to `Hundred` in replace mode.
Before you try to look up a word in the misspellings dictionary, determine if it is or is not capitalized (meaning that the first letter is upper-case). Then, you can convert it to lower case. If the word does need replacing, then you should make sure to capitalize the correctly spelled word if the original word was also capitalized.
You might find the string methods `.lower()`,`.isupper()`, and `.capitalize()` useful.
# Ignoring Punctuation
You also should make the spell checker ignore punctuation. In particular, `spellcheck.py` should be able to handle a trailing `,` `.` `;` `?`, or `!`. For instance, it should be able to change `hudnred?` to `hundred?` in replace mode. How can this be accomplished?
Before looking up a word in the misspellings dictionary, check if the last character is one of the five types of punctuation that you should look for (`,` or `.` or `?` or `!` or `;`). If it is, create a variable to store that punctuation, and then trim it off of the end of the word. Then, proceed with checking the spelling. After you have determined if it is spelled correctly or not, you can put the punctuation at the end of the word.
# Development Strategy
## (1) Reading in misspellings.txt
Create a function that will be responsible for opening up the misspellings file and reading in the information. The function can assume that the misspellings.txt file exists. The function should open the file, read the lines, and populate the misspellings dictionary, as described previously in this specification.
## (2) Replacing words
Next, work reading the input file and replacing misspelled words. At this point, don't yet worry about capitalized words, or punctuation. Iterate through the contents of the input file, line-by-line and word-by-word. When a correctly-spelled word is encountered, print it out with a trailing space. When an incorrectly-spelled word is encountered, print out the correctly-spelled version instead.
## (3) Handle Punctuation / Handle Capitalization
You might find one or the other of these easier or harder, so choose whichever you think you will be able to accomplish more quickly. Once you have one working, move to the other! Handling punctuation is extra credit.
## (4) Write out output file
Your function that writes out the output file should be named `correct_spelling` and the output file should have the `output_` prefix. That means that if the original file was names `words_1.txt`, the output file should be named `output_words_1.txt`
# Test Cases
There are going to be a number of test cases for this assignment. You should test out your program in chunks. Don't try to tackle everything at once. If you follow the development strategy, you should be able to progressively pass more and more test cases, as you add more and more functionality.
Below are several examples. Each example includes the contents of `misspellings.txt`, the contents of the input file, and the results for the output file.
## Example 1 (no punctuation or capitalized words)
`misspellings.txt `
```{html}
datamic->dramatic
dramati->dramatic
elaphant->elephant
elofent->elephant
elaphent->elephant
zooo->zoo
zo->zoo
```
`words_1.txt`
```{html}
joe and his family went to the zoo the other day
the zooo had many animals including an elofent
the elaphant was being too dramati though
after they walked around joe left the zo
```
```{python}
#| eval: false
#| echo: true
if __name__ == '__main__':
spell_dict = read_spellings()
correct_spelling("words_1.txt", spell_dict)
```
`output_words_1.txt`
```{html}
joe and his family went to the zoo the other day
the zoo had many animals including an elephant
the elephant was being too dramatic though
after they walked around joe left the zoo
```
## Example 2 (with punctuation and capitalized words)
`misspellings.txt `
```{html}
hereo->hero
heroc->hero
fluwe->flew
jumpedd->jumped
jimped->jumped
jumpped->jumped
savved->saved
sived->saved
sivved->saved
sumprean->superman
sumperan->superman
dayy->day
ayy->day
```
`words_2.txt`
```{html}
There once was a hero, named superman.
Sumperan, being the hero he is, jumped.
After he jumped, he fluwe!
Then, Sumprean savved the day.
```
```{python}
#| eval: false
#| echo: true
if __name__ == '__main__':
spell_dict = read_spellings()
correct_spelling("words_2.txt", spell_dict)
```
`output_words_2.txt`
```{html}
There once was a hero, named superman.
Superman, being the hero he is, jumped.
After he jumped, he flew!
Then, Superman saved the day.
```