Project Gutenberg EBook: Python Programming
Go to https://www.gutenberg.org/. Select a free ebook and download the plain text file (UTF-8).
Write a program that does the following:
Read in your ebook (any book, any language)
Break the book into words (look at .split() for strings)
Make a dictionary of all words and how often they occur. For instance, if the word 'the' occurred 2000 times and the word 'a' occurred 1940 times: word_dict = {'the': 2000, 'a': 1940}
Print the dictionary of word frequencies to a file (freqs.txt) with the most frequent at the top and least frequent at the bottom
No need to submit the book, I will run it on my own files.
Turn in a .py file.
Note: this assignment is worth double since it covers both file I/O and dictionaries (notes attached).
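The steps listed above can be sketched roughly as follows. This is one possible outline, not the required solution: the function names are made up for illustration, lowercasing the words is an assumption the assignment does not state, and punctuation is left attached to words because only .split() is used.

```python
import operator

def count_words(text):
    """Return a dictionary mapping each word to how often it occurs."""
    word_dict = {}
    # Assumption: 'The' and 'the' count as the same word
    for word in text.lower().split():
        # .get() returns 0 the first time a word is seen
        word_dict[word] = word_dict.get(word, 0) + 1
    return word_dict

def sort_by_frequency(word_dict):
    """Return a list of (word, count) tuples, most frequent first."""
    return sorted(word_dict.items(), key=operator.itemgetter(1), reverse=True)

def write_frequencies(book_path, out_path="freqs.txt"):
    """Read a UTF-8 text file and write its word frequencies to out_path."""
    with open(book_path, encoding="utf-8") as book:
        text = book.read()
    with open(out_path, "w", encoding="utf-8") as out:
        for word, count in sort_by_frequency(count_words(text)):
            out.write(f"{word}: {count}\n")

# Small in-memory check, no file needed
sample = "the cat and the dog and the bird"
print(sort_by_frequency(count_words(sample)))
```

With a downloaded book saved under a hypothetical name such as book.txt, calling write_frequencies("book.txt") would produce freqs.txt with the most frequent word on the first line.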
………….
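Dictionaries are not easily sortable because they are looked up by key rather than by position like a list. To sort for this assignment, use the following template. Note that sorted() returns a list of (key, value) tuples in ascending order of value; pass reverse=True to put the most frequent words first.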
import operator
sorted_dict = sorted(my_dict.items(), key=operator.itemgetter(1))
7/16/2020
Dictionaries.ipynb – Colaboratory
Dictionaries
Currently we have looked at three ways of handling data:
1. Basic Variables, good for a singular piece of data
2. Lists, can contain a dynamic amount of data
3. Files, can write the outputs of a program to save them for later
Today we will look at an alternative to #2. Lists are great for many things. They are dynamic in size,
they can take any type of data, and they are fast for random access when you know the index value.
The third advantage is the one we will examine with dictionaries. Lists are used primarily for random
access, meaning we can jump to any piece of data in the list if we know its index, such as:
print(list_name[random_index_value])
What if we don’t know the index of what we are looking for? Well, we have to loop through and
search for what we want. If a list has N items in it, this may take N operations. With a large amount
of data this may become very slow.
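To make the cost of searching concrete, here is a small sketch that loops through a list and counts how many items it had to check. The helper name find_index and the sample list are made up for illustration.

```python
def find_index(items, target):
    """Return (index, steps) for target in items, or (-1, steps) if it is absent."""
    steps = 0
    for index, item in enumerate(items):
        steps += 1
        if item == target:
            return index, steps
    return -1, steps

fruits = ['apple', 'banana', 'carrot', 'durian', 'eggplant']
print(find_index(fruits, 'durian'))   # found at index 3 after 4 comparisons
print(find_index(fruits, 'fig'))      # not found, all 5 items were checked
```

In the worst case (the item is missing or sits at the end) every one of the N items is checked.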
This is where dictionaries come in. A dictionary stores key/value pairs. This means you can look up
a piece of data (a value) by providing a key. This is similar to how a physical dictionary works. If you
want a definition (the value), you look up the word (the key) in the dictionary.
Let’s take a look at a simple dictionary where we define a word given a letter.
The basic format for a dictionary is as follows:
dictionary_name = {key1:value1,key2:value2}
my_dictionary = {'a': 'apple', 'b': 'banana', 'c': 'carrot', 'd': 'durian', 'e': 'eggplant'}
print(my_dictionary)
{'a': 'apple', 'b': 'banana', 'c': 'carrot', 'd': 'durian', 'e': 'eggplant'}
Here we have defined a simple dictionary representing words for a, b, c, d, e. While a list would look up
apple by using index 0, a dictionary is not indexed by position, so apple is not actually the 0th item.
Since Python 3.7 a dictionary does remember the order in which keys were inserted, but you still cannot
look a value up by its position. Apple being 0 also has no logical connection to what an apple is.
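A short side-by-side sketch of the difference, using a hypothetical list and dictionary with the same three words:

```python
fruit_list = ['apple', 'banana', 'carrot']
fruit_dict = {'a': 'apple', 'b': 'banana', 'c': 'carrot'}

print(fruit_list[0])      # lists are indexed by position
print(fruit_dict['a'])    # dictionaries are indexed by key

# 0 is not a key in fruit_dict, so a positional lookup fails
try:
    print(fruit_dict[0])
except KeyError as error:
    print('KeyError:', error)
```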
To look up apple, we have to use the key ('a').
While a dictionary is defined using curly brackets {}, we look up a value using square brackets,
following this format:
value = my_dictionary[key]
Let’s print out a few of the letters from above
print(my_dictionary['a'])
print(my_dictionary['b'])
print(my_dictionary['c'])
print(my_dictionary['d'])
print(my_dictionary['e'])
apple
banana
carrot
durian
eggplant
This is a much more logical format than using [0] and [1] when the data has no logical order.
Let’s look at a few possible errors you might encounter.
The first one shows that keys are case sensitive, so ‘a’ is different from ‘A’.
The second one shows what happens when you try to access something that is not in the
dictionary. In both cases you will get a KeyError, meaning nothing has been stored under that key.
print(my_dictionary['A'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
in ()
----> 1 print(my_dictionary['A'])
KeyError: 'A'
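A KeyError like this can be avoided by checking for the key first, or by asking for a default. The sketch below is one way to do it; the helper name lookup and the default strings are made up for illustration, while the in operator and the .get() method are standard parts of Python dictionaries.

```python
my_dictionary = {'a': 'apple', 'b': 'banana', 'c': 'carrot', 'd': 'durian', 'e': 'eggplant'}

def lookup(letter):
    """Return the word stored under letter, or a message if the key is missing."""
    if letter in my_dictionary:
        return my_dictionary[letter]
    return f'nothing stored under {letter!r}'

print(lookup('a'))
print(lookup('A'))

# .get() returns a default instead of raising KeyError
print(my_dictionary.get('f', 'unknown'))
# Lowercasing the key first is one way to make lookups case insensitive
print(my_dictionary.get('A'.lower()))
```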
print(my_dictionary['f'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
in ()
----> 1 print(my_dictionary['f'])
KeyError: 'f'
Adding to a dictionary
Above we put all our values into the dictionary at the start. There is no need to do this. We can add
them dynamically as well. We can start out with a blank dictionary like the one below:
user_dict={}
Here we define a blank user dictionary. We might gather information about the user as they type it
into their profile. At that point we might add it to the dictionary.
To add a new key/value pair to the dictionary, we use the following format:
dictionary_name[new_key]=new_value
So if we wanted to define the above dictionary and add a user name and age, we can do the
following:
#blank dictionary
user_dict={}
#add name and age to dictionary
user_dict['first_name'] = 'Nathan'
user_dict['age']=39
print(user_dict)
#print in a formatted String
print(f"Our user is {user_dict['first_name']} and he is {user_dict['age']} years old")
{'first_name': 'Nathan', 'age': 39}
Our user is Nathan and he is 39 years old
Hopefully you can see above that when data has no real order, like a name and an age, it is easier to
look it up by key instead of by [0] and [1].
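The same pattern works inside a loop, which is how a dictionary usually gets filled in practice. In this sketch the answers list is hypothetical data standing in for values a user might type into a profile; the 'city' entry is invented for the example.

```python
# Hypothetical answers collected one at a time, e.g. from a profile form
answers = [('first_name', 'Nathan'), ('age', 39), ('city', 'Boston')]

user_dict = {}
for field, answer in answers:
    # Each assignment adds one new key/value pair
    user_dict[field] = answer
    print(f'added {field}, dictionary now has {len(user_dict)} entries')

print(user_dict)
```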
Another important characteristic of a Dictionary is that every key is unique. This means you cannot
have a key twice.
So if later on in the code we had
user_dict['first_name'] = 'Tim'
this would not add a second name; it would overwrite our current name. This means we can
update a value the same way we add one:
dictionary_name[existing_key]=updated_value
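A brief sketch of adding versus updating with the same syntax. The 'last_name' entry is invented for the example.

```python
user_dict = {'first_name': 'Nathan', 'age': 39}

user_dict['first_name'] = 'Tim'            # existing key: the value is overwritten
user_dict['age'] = user_dict['age'] + 1    # read the old value, store the new one
user_dict['last_name'] = 'Smith'           # new key: a pair is added

print(user_dict)
print(len(user_dict))   # 3 entries: two were updated, only one was added
```

Reading a value and storing it back, as in the age line, is the same move a word counter makes each time it sees a word again.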
Deleting from a dictionary
Occasionally data will no longer be relevant and you will want to remove it to save memory.
This is done using the key you want to delete.
The format is
del dictionary_name[key]
So above, if we want to delete age, we would do the following:
#print what is currently there
print(user_dict)
#remove age
del user_dict['age']
#print the result (age should be gone)
print(user_dict)
{'first_name': 'Nathan', 'age': 39}
{'first_name': 'Nathan'}
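One thing to watch for: deleting a key that is not there raises the same KeyError as looking it up. The sketch below shows that case, along with the standard .pop() method, which removes a key and can take a default so a missing key is not an error.

```python
user_dict = {'first_name': 'Nathan', 'age': 39}

del user_dict['age']
try:
    del user_dict['age']          # the key is already gone
except KeyError:
    print('age was already removed')

removed = user_dict.pop('first_name', None)   # removes the key and returns its value
missing = user_dict.pop('first_name', None)   # key is gone, so the default is returned
print(removed, missing, user_dict)
```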
Looping through a Dictionary
Looping through a list was straightforward given that we knew we started at 0 and went to the end
of the list. Since a dictionary is not indexed by position, we need to use some built-in functionality.
We’ll look at two main ways of looping. The first one uses the .keys() method, which gives us a list-like
view of all the keys in the dictionary.
We’ll define a new dictionary of cars and prices.
cars={'Honda':25000,'Tesla':40000,'Kia':16000,'BMW':50000,'Hyundai':20000}
print(cars.keys())
#let's loop through the keys to access the values
for key in cars.keys():
    value=cars[key]
    print(f'{key}:{value}')
#we can use format to make it sound human
for key in cars.keys():
    value=cars[key]
    print(f'{key} costs about ${value} dollars')
dict_keys(['Honda', 'Tesla', 'Kia', 'BMW', 'Hyundai'])
Honda:25000
Tesla:40000
Kia:16000
BMW:50000
Hyundai:20000
Honda costs about $25000 dollars
Tesla costs about $40000 dollars
Kia costs about $16000 dollars
BMW costs about $50000 dollars
Hyundai costs about $20000 dollars
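Two related details, sketched with the same cars dictionary: looping over the dictionary itself yields its keys, and the result of .keys() is a view rather than a real list, so it has to be converted before it can be indexed.

```python
cars = {'Honda': 25000, 'Tesla': 40000, 'Kia': 16000, 'BMW': 50000, 'Hyundai': 20000}

# Iterating over the dictionary itself yields its keys
for key in cars:
    print(f'{key}:{cars[key]}')

# cars.keys() is a view, not a list; convert it if you need indexing
key_list = list(cars.keys())
print(key_list[0])

# sorted() accepts the dictionary directly and returns its keys in alphabetical order
print(sorted(cars))
```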
With these looping mechanisms you do not actually need to know any of the keys in advance. The previous
loop did have an extra step of grabbing the value. We have another loop that will return both the key
and the value so we can skip that step. This uses the .items() method, which gives you both the key and the value.
Here is the same loop as above using items():
for key,value in cars.items():
    print(f'{key} costs about ${value} dollars')
Honda costs about $25000 dollars
Tesla costs about $40000 dollars
Kia costs about $16000 dollars
BMW costs about $50000 dollars
Hyundai costs about $20000 dollars
This saves us one line of code and is easier to read. If you don’t want to print everything but only
items that meet a certain criterion, you need to combine this code with a conditional.
Let’s say we can only afford a car under 30,000. We can use a conditional to eliminate anything
higher than 30,000:
for key,value in cars.items():
if value
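The loop above is cut off in this copy. One plausible way to write such a filter, assuming the 30,000 limit from the text and the cars dictionary defined earlier, is sketched below. The names budget and affordable are made up for illustration.

```python
cars = {'Honda': 25000, 'Tesla': 40000, 'Kia': 16000, 'BMW': 50000, 'Hyundai': 20000}
budget = 30000   # limit taken from the text above

affordable = {}
for key, value in cars.items():
    # Only keep cars that cost less than the budget
    if value < budget:
        affordable[key] = value
        print(f'{key} costs about ${value} dollars')

print(affordable)
```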