Python Utilities - Educational Materials

In this section, we look at a few of Python's many standard utility modules to solve common problems.

File System -- os, os.path, shutil

The *os* and *os.path* modules include many functions to interact with the file system. The *shutil* module can copy files.

os module docs

filenames = os.listdir(dir) -- list of filenames in that directory path (not including . and ..). The filenames are just the names in the directory, not their absolute paths.

    ## Example pulls filenames from a dir, prints their relative and absolute paths
    import os

    def printdir(dir):
      filenames = os.listdir(dir)
      for filename in filenames:
        print filename  ## foo.txt
        print os.path.join(dir, filename)  ## dir/foo.txt (relative to current dir)
        print os.path.abspath(os.path.join(dir, filename))  ## /home/nick/dir/foo.txt

Exploring a module works well with the built-in python help() and dir() functions.

Running External Processes -- commands

The *commands* module is a simple way to run an external command and capture its output.
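The directory listing described in the file-system section can also be written as a function that returns its results instead of printing them. This is a minimal sketch, written for Python 3; the name list_paths and the tuple layout are hypothetical choices for illustration, not part of the class materials.

```python
import os

def list_paths(dir_path):
    """Return (name, joined path, absolute path) for each entry in dir_path."""
    results = []
    # os.listdir() gives bare names only, so each one is joined back onto dir_path.
    for filename in sorted(os.listdir(dir_path)):
        joined = os.path.join(dir_path, filename)
        results.append((filename, joined, os.path.abspath(joined)))
    return results
```

Calling list_paths('.') gives one tuple per entry in the current directory. Returning a list rather than printing makes the result easy to reuse, for example as input to a copy step.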
Python Regular Expressions - Educational Materials

Regular expressions are a powerful language for matching text patterns. This page gives a basic introduction to regular expressions themselves, sufficient for our Python exercises, and shows how regular expressions work in Python. The Python "re" module provides regular expression support.

In Python a regular expression search is typically written as:

    match = re.search(pat, str)

The re.search() method takes a regular expression pattern and a string and searches for that pattern within the string.

    str = 'an example word:cat!!'

The code match = re.search(pat, str) stores the search result in a variable named "match". The 'r' at the start of the pattern string designates a python "raw" string, which passes through backslashes without change; this is very handy for regular expressions (Java needs this feature badly!).

Basic Patterns

The power of regular expressions is that they can specify patterns, not just fixed characters.

a, X, 9, < -- ordinary characters just match themselves exactly.
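To make the re.search() call concrete, here is a small sketch written for Python 3. The pattern r'word:\w\w\w' and the helper name find_word are assumptions chosen for illustration; re.search() returns a match object on success and None when nothing matches, so the result is tested before use.

```python
import re

def find_word(text):
    """Return the first 'word:' followed by three word characters, or None."""
    # The raw string keeps the backslashes in \w intact for the re module.
    match = re.search(r'word:\w\w\w', text)
    if match:
        return match.group()  # the text that matched the pattern
    return None

print(find_word('an example word:cat!!'))  # word:cat
print(find_word('no pattern here'))        # None
```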
Python Dict and File - Educational Materials

Dict Hash Table

Looking up or setting a value in a dict uses square brackets, e.g. dict['foo'] looks up the value under the key 'foo'. Strings, numbers, and tuples work as keys, and any type can be a value. Other types may or may not work correctly as keys (strings and tuples work cleanly since they are immutable). Looking up a value which is not in the dict raises a KeyError -- use "in" to check if the key is in the dict, or use dict.get(key), which returns the value or None if the key is not present (or get(key, not-found) allows you to specify what value to return in the not-found case).

    ## Assumes dict already holds the keys 'a', 'o', 'g', with dict['a'] == 'alpha'
    print dict['a']      ## Simple lookup, returns 'alpha'
    dict['a'] = 6        ## Put new key/value into dict
    'a' in dict          ## True
    ## print dict['z']   ## Throws KeyError
    if 'z' in dict: print dict['z']  ## Avoid KeyError
    print dict.get('z')  ## None (instead of KeyError)

A for loop on a dictionary iterates over its keys by default.

    ## Get the .keys() list:
    print dict.keys()  ## ['a', 'o', 'g']
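The lookup options described above can be compared side by side. This is a minimal sketch written for Python 3; the variable name letters and the values stored under 'g' and 'o' are made up for illustration.

```python
# Hypothetical sample data; only the 'a' -> 'alpha' pair comes from the text.
letters = {'a': 'alpha', 'g': 'gamma', 'o': 'omega'}

print(letters['a'])           # alpha
print('z' in letters)         # False
print(letters.get('z'))       # None
print(letters.get('z', '?'))  # ?

# A plain square-bracket lookup of a missing key raises KeyError.
try:
    letters['z']
except KeyError as err:
    print('missing key:', err)  # missing key: 'z'

# Iterating over a dict visits its keys.
for key in letters:
    print(key, letters[key])
```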
Python Sorting - Educational Materials

The easiest way to sort is with the sorted(list) function, which takes a list and returns a new list with those elements in sorted order. The original list is not changed.

    a = [5, 1, 4, 3]
    print sorted(a)  ## [1, 3, 4, 5]
    print a  ## [5, 1, 4, 3]

It's most common to pass a list into the sorted() function, but in fact it can take as input any sort of iterable collection. The sorted() function can be customized through optional arguments.

    strs = ['aa', 'BB', 'zz', 'CC']
    print sorted(strs)  ## ['BB', 'CC', 'aa', 'zz'] (case sensitive)
    print sorted(strs, reverse=True)  ## ['zz', 'aa', 'CC', 'BB']

Custom Sorting With key=

For more complex custom sorting, sorted() takes an optional "key=" specifying a "key" function that transforms each element before comparison. For example, with a list of strings, specifying key=len (the built-in len() function) sorts the strings by length, from shortest to longest.

    strs = ['ccc', 'aaaa', 'd', 'bb']
    print sorted(strs, key=len)  ## ['d', 'bb', 'ccc', 'aaaa']
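The key= mechanism accepts any one-argument function, not only len. The sketch below, written for Python 3, shows a case-insensitive sort and a user-defined key; the helper last_char and the sample lists are illustrative assumptions.

```python
def last_char(s):
    """Key function: compare strings by their final character."""
    return s[-1]

strs = ['aa', 'BB', 'zz', 'CC']
# str.lower is applied to each element only for comparison;
# the original strings appear in the result.
print(sorted(strs, key=str.lower))  # ['aa', 'BB', 'CC', 'zz']

words = ['xc', 'zb', 'yd', 'wa']
print(sorted(words, key=last_char))  # ['wa', 'zb', 'xc', 'yd']

# key= and reverse= can be combined: longest strings first.
print(sorted(['ccc', 'aaaa', 'd', 'bb'], key=len, reverse=True))  # ['aaaa', 'ccc', 'bb', 'd']
```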
Python Lists - Educational Materials

Python has a great built-in list type named "list". List literals are written within square brackets [ ]. Lists work similarly to strings -- use the len() function and square brackets [ ] to access data, with the first element at index 0. (See the official python.org list docs.)

    colors = ['red', 'blue', 'green']
    print colors[0]    ## red
    print colors[2]    ## green
    print len(colors)  ## 3

Assignment with an = on lists does not make a copy.

    b = colors   ## Does not copy the list

The "empty list" is just an empty pair of brackets [ ].

FOR and IN

Python's *for* and *in* constructs are extremely useful, and the first use of them we'll see is with lists.

    squares = [1, 4, 9, 16]
    sum = 0
    for num in squares:
      sum += num
    print sum  ## 30

If you know what sort of thing is in the list, use a variable name in the loop that captures that information such as "num", or "name", or "url". The *in* construct on its own tests whether an element appears in a list.

    list = ['larry', 'curly', 'moe']
    if 'curly' in list:
      print 'yay'

You can also use for/in to work on a string.
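The point that assignment does not copy a list is easiest to see by changing the list through the second name. This is a minimal sketch written for Python 3; the names alias and clone are illustrative, and the slice copy colors[:] is a standard idiom that the text above does not itself mention.

```python
colors = ['red', 'blue', 'green']
alias = colors     # both names refer to the same list object
clone = colors[:]  # a full slice builds a new, independent list

alias.append('yellow')
print(colors)  # ['red', 'blue', 'green', 'yellow']
print(clone)   # ['red', 'blue', 'green']

# for/in over a list, using a name other than the built-in sum
total = 0
for num in [1, 4, 9, 16]:
    total += num
print(total)  # 30

# for/in also works on a string, one character at a time
letters = []
for ch in 'abc':
    letters.append(ch)
print(letters)  # ['a', 'b', 'c']
```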
Python Strings - Educational Materials

Python has a built-in string class named "str" with many handy features (there is an older module named "string" which you should not use). String literals can be enclosed by either double or single quotes, although single quotes are more commonly used. Backslash escapes work the usual way within both single and double quoted literals -- e.g. \n \' \". Python strings are "immutable", which means they cannot be changed after they are created (Java strings also use this immutable style). Characters in a string can be accessed using the standard [ ] syntax, and like Java and C++, Python uses zero-based indexing, so if s is 'hello', s[1] is 'e'.

    s = 'hi'
    print s[1]          ## i
    print len(s)        ## 2
    print s + ' there'  ## hi there

Unlike Java, the '+' does not automatically convert numbers or other types to string form.

    pi = 3.14
    ##text = 'The value of pi is ' + pi      ## NO, does not work
    text = 'The value of pi is ' + str(pi)  ## yes

For numbers, the standard operators, +, /, * work in the usual way.
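Both the immutability rule and the need for an explicit str() conversion show up as TypeError exceptions when violated. The following sketch, written for Python 3, demonstrates each case; the variable names are illustrative.

```python
s = 'hello'
print(s[1])    # e
print(len(s))  # 5

# Strings are immutable: item assignment is rejected.
try:
    s[0] = 'j'
except TypeError:
    print('cannot modify a string in place')

# Build a new string instead of changing the old one.
t = 'j' + s[1:]
print(t)  # jello

pi = 3.14
# '+' does not convert numbers to strings automatically.
try:
    text = 'The value of pi is ' + pi
except TypeError:
    text = 'The value of pi is ' + str(pi)
print(text)  # The value of pi is 3.14
```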
Basic Python Exercises - Educational Materials

There are 3 exercises that go with the first sections of Google's Python class. They are located in the "basic" directory within the google-python-exercises directory. Download the google-python-exercises.zip if you have not already (see the Set-Up page for details).

string1.py -- complete the string functions in string1.py, based on the material in the Python Strings section (additional exercises available in string2.py)
list1.py -- complete the list functions in list1.py, based on the material in the Python Lists and Python Sorting sections (additional exercises available in list2.py)
wordcount.py -- this larger, summary exercise in wordcount.py combines all the basic Python material in the above sections plus Python Dicts and Files (a second exercise is available in mimic.py)

With all the exercises, you can take a look at our solution code inside the solution subdirectory.
Baby Names Python Exercise - Educational Materials

The Social Security Administration has this neat data by year of what names are most popular for babies born that year in the USA (see social security baby names). The files for this exercise are in the "babynames" directory inside google-python-exercises (download the google-python-exercises.zip if you have not already, see Set Up for details). Add your code in babynames.py. The files baby1990.html baby1992.html ... contain raw html, similar to what you get visiting the above social security site. Take a look at the html and think about how you might scrape the data out of it.

Part A

In the babynames.py file, implement the extract_names(filename) function, which takes the filename of a baby1990.html file and returns the data from the file as a single list -- the year string at the start of the list followed by the name-rank strings in alphabetical order.

    ['2006', 'Aaliyah 91', 'Aaron 57', 'Abagail 895', ...]

Earlier we have had functions just print to standard out.
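The shape of the list that Part A asks for can be sketched separately from the html scraping. The example below, written for Python 3, assumes the year and a name-to-rank dict have already been extracted; the helper name build_names_list and its dict input are hypothetical, and the scraping step itself is left out.

```python
def build_names_list(year, ranks):
    """Return [year, 'Name rank', ...] with the name-rank strings in alphabetical order.

    ranks is assumed to map each name to its rank as a string.
    """
    names = [year]
    for name in sorted(ranks):  # iterating a sorted dict yields sorted keys
        names.append(name + ' ' + ranks[name])
    return names

sample = {'Aaron': '57', 'Abagail': '895', 'Aaliyah': '91'}
print(build_names_list('2006', sample))
# ['2006', 'Aaliyah 91', 'Aaron 57', 'Abagail 895']
```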
Copy Special Python Exercise - Educational Materials

The Copy Special exercise goes with the file-system and external commands material in the Python Utilities section. This exercise is in the "copyspecial" directory within google-python-exercises (download google-python-exercises.zip if you have not already, see Set Up for details). Add your code in copyspecial.py.

The copyspecial.py program takes one or more directories as its arguments. We'll say that a "special" file is one where the name contains the pattern __w__ somewhere, where the w is one or more word chars.

Suggested functions for your solution (details below):

get_special_paths(dir) -- returns a list of the absolute paths of the special files in the given directory
copy_to(paths, dir) -- given a list of paths, copies those files into the given directory
zip_to(paths, zippath) -- given a list of paths, zips those files up into the given zipfile

Part A (manipulating file paths)

Gather a list of the absolute paths of the special files in all the directories.
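One way to express the "special" file test is a regular expression. The sketch below, written for Python 3, reads "__w__" as two underscores, one or more word characters, then two underscores, i.e. r'__\w+__'; that reading, the helper name is_special, and the sorted return order are assumptions rather than the exercise's reference solution.

```python
import os
import re

def is_special(filename):
    """True if the name contains __w__ where w is one or more word chars."""
    return re.search(r'__\w+__', filename) is not None

def get_special_paths(dir_path):
    """Return the absolute paths of the special files in dir_path."""
    paths = []
    for filename in os.listdir(dir_path):
        if is_special(filename):
            paths.append(os.path.abspath(os.path.join(dir_path, filename)))
    return sorted(paths)
```

For example, is_special('xyz__hello__.txt') is True and is_special('plain.txt') is False.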
Log Puzzle Python Exercise - Educational Materials

For the Log Puzzle exercise, you'll use Python code to solve two puzzles. This exercise uses the urllib module, as shown in the Python Utilities section. The files for this exercise are in the "logpuzzle" directory inside google-python-exercises (download the google-python-exercises.zip if you have not already, see Set Up for details). Add your code to the "logpuzzle.py" file.

An image of an animal has been broken into many narrow vertical stripe images. The slice urls are hidden inside apache log files (the open source apache web server is the most widely used server on the internet). Here is what a single line from the log file looks like (this really is what apache log files look like):

    10.254.254.28 - - [06/Aug/2007:00:14:08 -0700] "GET /foo/talks/ HTTP/1.1" 200 5910 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:22.214.171.124) Gecko/20070515 Firefox/126.96.36.199"

The first few numbers are the address of the requesting browser.
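Pulling the requesting address and the requested path out of a log line like the one above is a natural fit for a regular expression. This is a minimal sketch written for Python 3; the helper name parse_request and the exact pattern are assumptions, and the sample line is shortened by trimming the user-agent string.

```python
import re

def parse_request(line):
    """Return (address, path) for a GET request line, or None if it does not match."""
    # First field is the address; the path sits between "GET and HTTP.
    match = re.match(r'(\S+) .*?"GET (\S+) HTTP', line)
    if match:
        return match.group(1), match.group(2)
    return None

# Shortened version of the sample line (user-agent trimmed).
line = ('10.254.254.28 - - [06/Aug/2007:00:14:08 -0700] '
        '"GET /foo/talks/ HTTP/1.1" 200 5910 "-" "Mozilla/5.0"')
print(parse_request(line))  # ('10.254.254.28', '/foo/talks/')
```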