Today we’re going to write a program that conducts a word count on the lyrics of a song (stored in a text file), creating a dictionary that stores the frequency of each word and then creating a file that lists some statistics.

Before we start, a note on program structure:

You should always put the filename and/or a descriptive title at the very top of your program, followed by a brief description of what the program does. This is usually followed by importing any libraries/modules you plan to use, and then defining certain constants, global variables, and/or initial conditions. Then you define your functions, including your main() function (which calls the rest), and last, you will always call main() at the very bottom.
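
As a rough illustration, a program laid out this way might look like the sketch below. The filename, the constants, and the describe() function are made up for this example and are not part of today's lesson code.

```python
# song_analysis.py (hypothetical filename)
# Counts how often each word appears in the lyrics of a song.

# 1. Import any libraries/modules
import string

# 2. Constants and global variables (hypothetical names)
LYRICS_FILE = "lyrics.txt"
PUNCTUATION = string.punctuation  # characters we might strip out later


# 3. Function definitions
def describe():
    """Return a one-line description of what the program will analyze."""
    return "Analyzing " + LYRICS_FILE


def main():
    """Run the program by calling the other functions."""
    print(describe())


# 4. Call main() at the very bottom
main()
```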

Different programmers will have slight variations on this overall structure, but as a beginner this is a good outline to follow until you develop your own style. The main tab for today’s lesson is outlined using comments; please follow this outline. (We’ll go into more depth about program structure and planning over the next couple of days.)

Big Picture

Here’s an outline of what your song analysis program is going to do.

  • Read the contents of the text file containing your song lyrics.
  • Format the text: get rid of punctuation, make everything lowercase, and split the string into a list of words.
  • Conduct the word count:
    • Create an empty dictionary.
    • Go through the list of words one by one. If the word isn’t in the dictionary, add it with an initial count of 1. If it is in the dictionary, increase its count by 1.
  • Process the results, extracting some interesting facts from your dictionary and saving them to a new text file.
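
The first three steps of this outline could be sketched roughly as follows. This is only one possible approach: readLyrics() and countWords() are hypothetical names chosen for this example, and the sketch is tried out on a short string rather than on a real lyrics file.

```python
import string


def readLyrics(filename):
    """Return the whole contents of a text file as one string."""
    with open(filename) as lyricsFile:
        return lyricsFile.read()


def formatText(text):
    """Remove punctuation, make everything lowercase, split into words."""
    # Note: this also removes apostrophes, so "don't" becomes "dont".
    for mark in string.punctuation:
        text = text.replace(mark, "")
    return text.lower().split()


def countWords(words):
    """Build a dictionary mapping each word to how many times it appears."""
    counts = {}
    for word in words:
        if word not in counts:
            counts[word] = 1      # first time we see this word
        else:
            counts[word] += 1     # seen before, so add one
    return counts


# Try it on a short string instead of a file
sample = "Row, row, row your boat, gently down the stream."
print(countWords(formatText(sample)))
```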

The last step, processing the results, will involve some additional algorithms that we haven’t discussed yet. An algorithm is a specific procedure for solving a certain kind of problem. For example, the method for conducting the word count is an algorithm. Algorithms that work on lists or dictionaries typically involve for loops. Let’s look at one example, a min/max algorithm.

 

Determining Minimum or Maximum

Say you have a list of numbers and you want to know the largest or smallest number in the list. For a short list, it’s pretty easy to do this by looking. But for a really long list, it would be a lot easier to have a method to make sure you find the correct value. Let’s use this list, and try to find the maximum:

values = [47, 56, 32, 18, 85, 14, 66, 87, 71, 23]

It’s hard to compare a bunch of numbers at once, so maybe we should try just comparing two values at a time. Let’s start by looking at the first two values, 47 and 56. Which one is bigger?

Now let’s look at the next value, 32. Which is bigger, 32 or 56?

We’ll keep going through the list this way, picking the bigger number each time and then comparing it to the next number. When we get to the end of the list, we’ll have the maximum. What is it?

We can apply this method in Python using a for loop.
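
Here is a minimal sketch of that idea. The function name findMax() and the variable biggest are just names chosen for this example; the loop keeps track of the biggest number seen so far and compares it to each value in turn.

```python
def findMax(numbers):
    """Return the largest number in a non-empty list."""
    biggest = numbers[0]          # start with the first value
    for number in numbers:
        if number > biggest:      # found something bigger?
            biggest = number      # then remember it instead
    return biggest


values = [47, 56, 32, 18, 85, 14, 66, 87, 71, 23]
print(findMax(values))
```

To find the minimum instead, you would only need to flip the comparison from > to <.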

How would you have to change this algorithm to make it work on a dictionary like our word count dictionary?

 

Let’s Go!

When you’re done, please hit the Submit button. Once we’ve okayed your submission, please check out the bonus exercises if there are more than 15 minutes of lab time left.

Did you notice that the .lower() method removed some capital letters that were supposed to be there, like the word "I" or perhaps some proper nouns? Add some code to the formatText() function to fix these.

Sometimes, song lyrics are written weirdly. Are "ooh" and "oooooh" the same word, or not? Are they even words? Add some features to your formatText() function to take care of these issues.

Do some additional analyses of your song lyrics! Or, try running your program on another song's lyrics. Do you have to change anything to make it work properly?

End of Python I!