python workout: exercise 18
problem
Write a function (get_final_line) that takes a filename as an argument. The function should return that file’s final line on the screen.
attempts
first
My first thought is to just readlines() and index into the last element with -1.
But that seems to defeat the implied purpose of the exercise – to explore the
possibility that “I’m only interested in a single line”.
Maybe it’s worth looking at the docs.
Having done so, the seek() method seems to show promise. In theory, we could
f.seek(0,2) to land at the end of a file. And maybe we can call readline() from
there and get the whole line without seeking to its start.
logs_path = "files/apache_logs.log"
with open(logs_path, 'r') as f:
    f.seek(0,2)
    print(f.readline())
second
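Unfortunately, seeking like that doesn’t work: with the position at the end of
the file there is nothing left to read, so readline() returns an empty string.
We would have to seek to the start of the last line for readline() (and
presumably read()) to work as expected. Makes sense.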
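To see the behaviour in isolation, here is a minimal sketch using a throwaway temporary file (the file and its three lines are made up for illustration, they are not the Apache log):

```python
import os
import tempfile

# Hypothetical three-line file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    demo_path = tmp.name

with open(demo_path, "r") as f:
    f.seek(0, 2)               # whence=2 means "relative to the end of the file"
    after_seek = f.readline()  # the position is at the end, so nothing is left

os.remove(demo_path)
print(repr(after_seek))
```

The value printed is an empty string, which is why the first attempt showed a blank line.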
Our readlines() approach should still work though:
def get_final_line(path: str) -> str:
    with open(path, 'r') as f:
        lines = f.readlines()
    return lines[-1].strip()  # to get rid of the trailing new line char (i.e. \n)
logs_path = "files/apache_logs.log"
print(get_final_line(logs_path))
46.105.14.53 - - [20/May/2015:21:05:15 +0000] "GET /blog/tags/puppet?flav=rss20 HTTP/1.1" 200 14872 "-" "UniversalFeedParser/4.2-pre-314-svn +http://feedparser.org/"
solution
The book’s implementation:
def get_final_line(filename):
    final_line = ''
    for current_line in open(filename):
        final_line = current_line
    return final_line
It’s more straightforward. But it does iterate through all the lines. The nice
takeaway is the ability to implicitly iterate through file objects directly in a
for loop.
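For what it’s worth, the seek() idea from the first attempt can probably be made to work without touching every line. The following is only a sketch, not the book’s approach: get_final_line_seek is a hypothetical name, and it assumes a UTF-8 file with "\n" line endings. It opens the file in binary mode (where arbitrary seeks are allowed) and walks backwards from the end one byte at a time.

```python
import os


def get_final_line_seek(path: str) -> str:
    """Return the last non-empty line of a file by scanning back from its end.

    Hypothetical alternative; assumes UTF-8 text and "\n" line endings.
    """
    with open(path, "rb") as f:  # binary mode allows arbitrary seeks
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        # Skip trailing newline characters so we don't return an empty line.
        while pos > 0:
            f.seek(pos - 1)
            if f.read(1) not in (b"\n", b"\r"):
                break
            pos -= 1
        end = pos
        # Walk back until the previous newline or the start of the file.
        while pos > 0:
            f.seek(pos - 1)
            if f.read(1) == b"\n":
                break
            pos -= 1
        f.seek(pos)
        return f.read(end - pos).decode("utf-8")
```

Unlike the book’s version, this returns the line without its trailing newline, and it skips any blank lines at the very end of the file. Reading one byte at a time is slow per call, but the work depends on the length of the last line rather than the size of the file.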
beyond the exercise
summing integers in text files
problem
Iterate over the lines of a text file. Find all of the words (i.e., non-whitespace surrounded by whitespace) that contain only integers, and sum them.
attempts
This is a bit tricky. Splitting on whitespace would be complicated, given the varieties of whitespace (e.g. tabs, spaces, double spaces, etc). But I believe calling the .split method with no argument defaults to splitting on whitespace. That’s that.
The second part of parsing integers should be doable with a try block that catches the exceptions of casting words to ints.
We could also take the opportunity to simply read the file outright with read, instead of iterating through it line by line.
Together, we should get:
logs = 'files/apache_logs.log'
total = 0
contents = open(logs).read()
for word in contents.split():
    try:
        value = int(word)
    except ValueError:
        continue
    total += value
print(total)
252305962
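If the file were too large to read in one go, the same idea should work line by line. A minimal sketch, wrapped in a function with the hypothetical name sum_integers so it can be reused and tested:

```python
def sum_integers(path: str) -> int:
    """Sum every whitespace-separated word that int() can parse."""
    total = 0
    with open(path) as f:  # the with block closes the file for us
        for line in f:  # only one line is held in memory at a time
            for word in line.split():
                try:
                    total += int(word)
                except ValueError:  # not an integer, e.g. "GET" or "1.1"
                    continue
    return total
```

One caveat with relying on int(): it also accepts signed values and underscore separators such as "-5" or "1_000", so it is slightly more permissive than "contains only digits".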
parsing .tsvs
problem
Create a text file (using an editor, not necessarily Python) containing two tab separated columns, with each column containing a number. Then use Python to read through the file you’ve created. For each line, multiply the first number by the second, and then sum the results from all the lines. Ignore any line that doesn’t contain two numeric columns.
attempts
This should be straightforward. We can reuse the fact that split() defaults to splitting on whitespace to read each line, split the elements, multiply them, and add them to a running total:
numbers_file = "files/numbers.txt"
total = 0
for line in open(numbers_file):
    first, second = [int(x) for x in line.split()]
    total += first * second
print(total)
550
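The attempt above works on a clean file, but it would raise a ValueError on a line that doesn’t hold exactly two integers, whereas the problem asks for such lines to be ignored. A minimal sketch of one way to skip them, assuming "numeric" means integers as in the attempt (sum_products is a hypothetical name):

```python
def sum_products(path: str) -> int:
    """Sum first * second over lines that hold exactly two integers."""
    total = 0
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 2:  # wrong number of columns: skip the line
                continue
            try:
                first, second = (int(x) for x in fields)
            except ValueError:  # a column is not an integer: skip the line
                continue
            total += first * second
    return total
```

On a file containing only well-formed lines this should give the same total as the attempt above.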
counting vowels
problem
Read through a text file, line by line. Use a dict to keep track of how many times each vowel (a,e,i,o, and u) appears in the file. Print the resulting tabulation.
attempts
This should be straightforward. We can iterate through each character on each line and only increment its counter in a collections.defaultdict if it’s a vowel:
from collections import defaultdict

logs = 'files/apache_logs.log'
vowel_freqs = defaultdict(int)
vowels = {'a', 'e', 'i', 'o', 'u'}
for line in open(logs):
    for char in line:
        if char in vowels:
            vowel_freqs[char] += 1
print(vowel_freqs)
defaultdict(<class 'int'>, {'a': 6207, 'o': 8950, 'e': 9435, 'i': 6535, 'u': 1463})
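Since the problem mentions a plain dict, here is a sketch of a variant that starts every vowel at zero, so vowels that never appear still show up in the result. The name count_vowels is hypothetical, and lowercasing each line is my own assumption (it treats 'A' as 'a'), so its counts can differ from the output above.

```python
def count_vowels(path: str) -> dict:
    """Count how often each vowel appears in a file, ignoring case."""
    counts = dict.fromkeys("aeiou", 0)  # every vowel starts at zero
    with open(path) as f:
        for line in f:
            for char in line.lower():  # assumption: 'A' counts as 'a'
                if char in counts:
                    counts[char] += 1
    return counts
```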