
python workout: exercise 18

problem

Write a function (get_final_line) that takes a filename as an argument. The function should return that file’s final line.

attempts

first

My first thought is to just call readlines() and index into the last element with -1. But that seems to defeat the implied purpose of the exercise: exploring the possibility that “I’m only interested in a single line”.

Maybe it’s worth looking at the docs.

Having done so, the seek() method seems to show promise. In theory, we could call f.seek(0, 2) to land at the end of a file. And maybe we can call readline() from there and get the whole line without seeking to its start.

logs_path = "files/apache_logs.log"

with open(logs_path, 'r') as f:
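    f.seek(0, 2)  # offset 0 from whence=2 (os.SEEK_END): jump to the end of the file
    print(f.readline())  # readline() at EOF returns '', so this prints a blank line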

second

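Unfortunately, seeking like that doesn’t work: at the end of the file there’s nothing left to read, so readline() just returns an empty string. We’d have to seek to the start of the last line for readline() (and presumably read()) to work as expected. Makes sense.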

Our readlines() approach should still work though:

def get_final_line(path: str) -> str:
    with open(path, 'r') as f:
        lines = f.readlines()
    if not lines:
        return ''  # empty file: there is no final line to return
    return lines[-1].rstrip('\n')  # drop only the trailing newline char (\n)

logs_path = "files/apache_logs.log"
print(get_final_line(logs_path))
46.105.14.53 - - [20/May/2015:21:05:15 +0000] "GET /blog/tags/puppet?flav=rss20 HTTP/1.1" 200 14872 "-" "UniversalFeedParser/4.2-pre-314-svn +http://feedparser.org/"
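To actually get the “only interested in a single line” behaviour from the first attempt, we have to find the start of the last line by stepping backwards from the end. Text-mode files only allow seek(0, 2) relative to the end (nonzero end-relative seeks raise io.UnsupportedOperation), so this minimal sketch opens the file in binary mode and reads fixed-size chunks backwards until it hits a newline. It assumes UTF-8 or ASCII text with plain \n line endings; get_final_line_seek and chunk_size are hypothetical names, not from the book.

```python
import os


def get_final_line_seek(path: str, chunk_size: int = 1024) -> str:
    """Return the last line of `path` without its newline, reading backwards.

    Assumes UTF-8 (or ASCII) text with LF line endings.
    """
    with open(path, 'rb') as f:
        # In binary mode, seek() accepts end-relative positions and returns the
        # new absolute position, which here is the file size in bytes.
        end = f.seek(0, os.SEEK_END)
        # Ignore one trailing newline so the "last line" isn't an empty string.
        if end > 0:
            f.seek(end - 1)
            if f.read(1) == b'\n':
                end -= 1
        pos = end
        buffer = b''  # always holds the bytes in [pos, end)
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buffer = f.read(step) + buffer
            newline = buffer.rfind(b'\n')
            if newline != -1:
                # Everything after the last newline is the final line.
                return buffer[newline + 1:].decode('utf-8')
        # Reached the start without finding a newline: the file is one line.
        return buffer.decode('utf-8')
```

Usage would be print(get_final_line_seek(logs_path)). The amount read depends on the length of the last line rather than the size of the file. Decoding only happens once the complete line is in hand, so a chunk boundary that splits a multi-byte character is harmless; a file with \r\n endings would come back with a trailing \r.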

solution

The book’s implementation:

def get_final_line(filename):
    final_line = ''
    for current_line in open(filename):
        final_line = current_line
    return final_line

It’s more straightforward, though it still iterates through every line (it just doesn’t hold them all in memory the way readlines() does). The nice takeaway is that a file object can be iterated over directly in a for loop, one line at a time.
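The book’s loop keeps only the most recent line as it goes. collections.deque with maxlen=1 does the same bookkeeping in a single call, and a with block closes the file afterwards. It still reads every line, it just doesn’t keep them. A sketch (get_final_line_deque is a hypothetical name) that, like the book’s version, keeps the trailing newline:

```python
from collections import deque


def get_final_line_deque(filename: str) -> str:
    with open(filename) as f:
        # A deque with maxlen=1 discards the older item whenever a new one
        # arrives, so after consuming the file it holds only the final line.
        last = deque(f, maxlen=1)
    return last[0] if last else ''  # empty file -> ''
```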

beyond the exercise

summing integers in text files

  • problem

    Iterate over the lines of a text file. Find all of the words (i.e., non-whitespace surrounded by whitespace) that contain only integers, and sum them.

  • attempts

    This is a bit tricky. Splitting on whitespace would be complicated, given the varieties of whitespace (e.g. tabs, spaces, double spaces, etc.). Luckily, calling .split() with no argument splits on any run of whitespace and drops the empty strings in between, which covers all of those cases.

    That’s that.

    The second part, parsing integers, should be doable with a try block that catches the ValueError that int() raises for words that aren’t integers.

    We could also take the opportunity to read the whole file at once with read(), instead of iterating through it line by line.

    Together, we should get:

    logs = 'files/apache_logs.log'
    
    total = 0
    with open(logs) as f:
        contents = f.read()
    for word in contents.split():  # no argument: split on any run of whitespace
        try:
            value = int(word)
        except ValueError:  # not an integer, e.g. "GET" or "1.5"
            continue
        total += value
    
    print(total)
    
    252305962
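The problem asks to iterate over the lines, so here is a minimal line-by-line variant wrapped in a function (sum_integer_words is a hypothetical name). It only keeps one line in memory at a time. Note that int() is more permissive than “digits only”: it also accepts signed words like -7 or +3 and underscore-grouped ones like 1_000, so those count too.

```python
def sum_integer_words(path: str) -> int:
    """Sum every whitespace-separated word in the file that int() accepts."""
    total = 0
    with open(path) as f:
        for line in f:                 # one line in memory at a time
            for word in line.split():  # no argument: split on any whitespace run
                try:
                    total += int(word)
                except ValueError:     # e.g. "GET" or "1.5": not an integer
                    continue
    return total
```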
    

parsing .tsvs

  • problem

    Create a text file (using an editor, not necessarily Python) containing two tab-separated columns, with each column containing a number. Then use Python to read through the file you’ve created. For each line, multiply the first number by the second, and then sum the results from all the lines. Ignore any line that doesn’t contain two numeric columns.

  • attempts

    This should be straightforward. We can reuse the fact that split() defaults to splitting on whitespace to read each line, split the elements, multiply them, and add them to a running total:

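    numbers_file = "files/numbers.txt"
    
    total = 0
    for line in open(numbers_file):
        try:
            # split() with no argument splits on any whitespace, tabs included
            first, second = [int(x) for x in line.split()]
        except ValueError:
            # wrong column count or a non-integer column: ignore the line
            continue
        total += first * second
    
    print(total)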
    
    550
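Because the file is tab-separated specifically, the standard library’s csv module can split on tabs only, which makes rows with one or three columns easy to reject explicitly. A minimal sketch, assuming integer columns (sum_of_products is a hypothetical name):

```python
import csv


def sum_of_products(path: str) -> int:
    """Sum first * second over the rows of a two-column, tab-separated file.

    Rows without exactly two integer columns are skipped.
    """
    total = 0
    with open(path, newline='') as f:  # the csv docs recommend newline=''
        for row in csv.reader(f, delimiter='\t'):
            if len(row) != 2:
                continue  # blank line (empty row) or wrong number of columns
            try:
                first, second = int(row[0]), int(row[1])
            except ValueError:
                continue  # at least one column isn't an integer
            total += first * second
    return total
```

Swapping int for float would accept decimal values as well.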
    

counting vowels

  • problem

    Read through a text file, line by line. Use a dict to keep track of how many times each vowel (a, e, i, o, and u) appears in the file. Print the resulting tabulation.

  • attempts

    This should be straightforward. We can iterate through each character on each line and increment its counter in a collections.defaultdict only if it’s a vowel:

    from collections import defaultdict
    
    logs = 'files/apache_logs.log'
    
    vowel_freqs = defaultdict(int)  # missing keys start at 0
    vowels = {'a', 'e', 'i', 'o', 'u'}
    with open(logs) as f:
        for line in f:
            for char in line:
                if char in vowels:  # lowercase only: 'A', 'E', ... aren't counted
                    vowel_freqs[char] += 1
    
    print(vowel_freqs)
    
    defaultdict(<class 'int'>, {'a': 6207, 'o': 8950, 'e': 9435, 'i': 6535, 'u': 1463})
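The loop above only counts lowercase vowels, so an “A” at the start of a sentence is missed. A hedged alternative lowercases each line first and uses collections.Counter, seeded with zeros so every vowel appears in the tabulation even if it never occurs (count_vowels and VOWELS are hypothetical names):

```python
from collections import Counter

VOWELS = 'aeiou'


def count_vowels(path: str) -> Counter:
    # Seed every vowel with 0 so absent vowels still show up in the result.
    counts = Counter({vowel: 0 for vowel in VOWELS})
    with open(path) as f:
        for line in f:
            # lower() makes the count case-insensitive; update() adds 1 per item
            counts.update(ch for ch in line.lower() if ch in VOWELS)
    return counts
```

To print the tabulation one vowel per line, loop over VOWELS and print(f"{vowel}: {counts[vowel]}"). Note that the totals will differ from the lowercase-only numbers above wherever the file contains capital vowels.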
    
mail@jonahv.com