43 changes: 43 additions & 0 deletions parser.rb
@@ -0,0 +1,43 @@
class TextParser
  attr_reader :text, :word_counts
  def initialize(file)
    @text = IO.read(file)
    @word_counts = {}
  end

  def remove_newlines
    text.gsub(/\s+/, ' ').strip
  end

  def parsed_text
    remove_newlines.split(" ").map do |string|
      string.gsub(/[^a-zA-Z0-9'-]/i, '').downcase
    end
  end

  def count_words
    parsed_text.each do |word|
      if word_counts.key?(word)
Member

If you instantiate the word_counts hash with a default value of 0 (Hash.new(0)), you won't need to check whether the word has already been seen.

http://ruby-doc.org/core-2.2.2/Hash.html#method-c-new
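
A minimal sketch of that suggestion, written as a standalone helper so it runs on its own. The `count_words(words)` signature and the sample input are assumptions for illustration, not the PR's code; in the PR the equivalent change would presumably be `@word_counts = Hash.new(0)` in `initialize`.

```ruby
# Illustrative only: a hypothetical standalone helper, not the method from the PR.
def count_words(words)
  counts = Hash.new(0) # a missing key reads as 0 instead of nil
  words.each { |word| counts[word] += 1 } # no key? check needed
  counts
end

counts = count_words(%w[the cat the hat])
puts counts.inspect
```

Reading a missing key through the default does not store it, so `counts.key?("dog")` stays false until something is assigned. On Ruby 2.7 or later, `Enumerable#tally` covers the same counting step, though it is not available in the 2.2 series linked above.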

        word_counts[word] += 1
      else
        word_counts[word] = 1
      end
    end
  end

  def sorted_counts
    word_counts.sort_by {|_key, value| -value}
Member

I bet there are people who wouldn't find this readable, but I love the simplicity of it.

It would be a slight improvement to give _key and value more intention-revealing names: _word and count, I suppose.

+1. Variables should have meaningful names.
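
For what it's worth, the renaming could look something like the sketch below. It is a hypothetical standalone version that takes the hash as an argument so it can run by itself, and the sample counts are made up. The same naming idea would also let the printing loop destructure each pair instead of indexing with `pair[0]` and `pair[1]`.

```ruby
# Illustrative only: standalone variant of sorted_counts with a hash argument.
def sorted_counts(word_counts)
  # The leading underscore marks the word as intentionally unused in the block.
  word_counts.sort_by { |_word, count| -count }
end

pairs = sorted_counts({ "the" => 3, "cat" => 1, "hat" => 2 })

# Destructuring the [word, count] pairs replaces pair[0] / pair[1].
pairs.each do |word, count|
  puts "#{count} - #{word}"
end
```

Words with equal counts may come out in any relative order, since `sort_by` is not guaranteed to be stable.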

  end

  def parse
    count_words
    sorted_counts.each do |pair|
      puts "#{pair[1]} - #{pair[0]}"
    end
  end
end



p = TextParser.new("speech.txt")
p.parse