Transforming Code into Beautiful, Idiomatic Python (Raymond Hettinger, PyCon 2013)


“If there’s only two things to take away from this, they should be:”

  • [2:33] Replace traditional index manipulation with Python’s core looping idioms.
  • [2:33] Learn advanced techniques with for-else clauses and the two-argument form of iter().

[03:04] Looping over a range

[3:30] We probably should’ve named the for construct foreach.

[04:47] Looping over a collection

[05:28] Looping backwards

[06:51] Looping over a collection of indices

[7:28] Whenever you’re manipulating indices directly, you’re probably doing it wrong.
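The looping idioms named in the headings above might look like the following minimal sketch. The list name colors is illustrative and not taken from the notes.

```python
colors = ['red', 'green', 'blue', 'yellow']

# Loop over the collection directly instead of indexing with range(len(colors)).
for color in colors:
    print(color)

# Loop backwards with reversed() instead of counting indices down.
backwards = list(reversed(colors))

# When the index is genuinely needed, let enumerate() supply it.
indexed = [(i, color) for i, color in enumerate(colors)]
```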

[07:36] Looping over two collections

[8:25] zip doesn’t scale. It takes two lists and manifests a third list with pointers back to the originals.

[8:49] In modern processors, only one thing matters: Is the code running in L1 cache? If the cache misses, then a simple move becomes as expensive as a floating-point divide. It can go from a half-clock cycle to 400-600 clock cycles. You can lose 2.5 orders of magnitude by not being in cache.

[9:14] If these lists are really big, do you think zip is going to fit into cache?
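A sketch of looping over two collections, assuming Python 3, where zip() is lazy and yields one pair at a time. The scaling concern above applies to Python 2's list-building zip, for which itertools.izip was the lazy alternative. The names and colors lists are illustrative.

```python
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']

pairs = []
# zip() stops at the end of the shorter input, so 'yellow' is never paired.
for name, color in zip(names, colors):
    pairs.append((name, color))
```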

[09:42] Looping in sorted order

[10:04] Custom sort order

[10:59] How often would a custom comparator be called? If you have a list of a million items, then since sorting is O(n log n), that is roughly 1,000,000 * log2(1,000,000) ≈ 1,000,000 * 20 = 20,000,000 calls. A key function is called only once per item.

[11:52] How do we know that key functions are sufficient? Look at SQL people. They compare all the time. Do they use custom comparators? No, they have key functions.

[12:13] Abandon your comparator functions.
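A minimal sketch of sorting with a key function rather than a comparator. The colors list is illustrative.

```python
colors = ['red', 'green', 'blue', 'yellow']

# Plain sorted order, and the same in reverse.
by_name = sorted(colors)
by_name_desc = sorted(colors, reverse=True)

# Custom sort order: the key function is called once per item.
# The sort is stable, so equal-length items keep their original order.
by_length = sorted(colors, key=len)
```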

[12:27] Call a function until a sentinel value

[13:04] How should you join your strings together? join! Not +!

[13:48] The moment you make something iterable, you’ve done something magical with your code. As soon as something is iterable, you can feed it to set, sort, min, max, heap, queue, sum. A lot of Python works with iterables.

[14:10] The part to focus on is not the for loop, but the two-argument form of iter. In order to make it work, the first function has to be a function with no arguments.

[14:22] How many arguments does the function take? One. How do you go from 1 to 0? partial. partial takes a function with many arguments to a function with fewer arguments.

[14:44] The magic of this is that there are many functions (especially in older APIs) that are intended to be called over and over again until they give you a sentinel value. It’s called a control break style of programming.

[15:36] There’s a reason why we don’t do this anymore. It’s the same reason why we don’t terminate our strings with nulls anymore.

[15:43] The two-argument form of iter takes old-world sentinel-based APIs into the new world of iterators.
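One way to sketch the two-argument form of iter, assuming the sentinel-based call is a file-like read of 32 characters at a time. The io.StringIO object stands in for an open file and is an assumption made for illustration.

```python
import io
from functools import partial

f = io.StringIO('x' * 70)  # stands in for an open text file

blocks = []
# partial(f.read, 32) turns the one-argument read into a zero-argument callable.
# iter() keeps calling it until it returns the sentinel, the empty string.
for block in iter(partial(f.read, 32), ''):
    blocks.append(block)
```

Because the result is an ordinary iterator, it can also be passed straight to join, list, or any other consumer of iterables.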

[15:52] Distinguishing multiple exit points in loops

[16:16] Structured equivalent to GOTOs.

[16:24] One problem with for loops is the need for a flag variable, to say when something’s been found or not found… This code is typically intermeshed with more complex code.

[16:52] Flag variables slow down your code and make it less readable.

[17:15] We have else clauses on for loops. When we have an if in the for loop, the if keeps telling the for loop to keep doing the body. What construct is typically associated with if? else. So what the else means is “I finished the body.”

[18:12] If we called this nobreak, then everyone would know what it does.

[18:51] Just like if we called lambda “makefunction,” no one would say “what does lambda do?”
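A small sketch of the for-else idea, replacing a flag variable. The function name find and its return convention of -1 are illustrative assumptions.

```python
def find(seq, target):
    """Return the index of target in seq, or -1 if it is absent."""
    for i, value in enumerate(seq):
        if value == target:
            break
    else:
        # Runs only when the loop finished without hitting break ("nobreak").
        return -1
    return i
```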

[19:14] Dictionaries are fundamental for expressing relationships, linking, counting and grouping.

[19:18] Looping over dictionary keys

[20:16] When should you use d.keys()? When you’re mutating the dictionary. In the first way, you can’t mutate the dictionary while iterating over it.

[20:36] d.keys() makes a copy of the keys. At which point, you’re free to mutate the dictionary.
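A sketch of mutating a dictionary while looping over its keys, assuming Python 3. There d.keys() returns a live view rather than a copy, so list(d) takes the snapshot that d.keys() provided in Python 2. The dictionary contents are illustrative.

```python
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

# list(d) copies the keys first, so deleting inside the loop is safe.
for k in list(d):
    if k.startswith('r'):
        del d[k]
```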

[21:10] Looping over dictionary keys and values

[21:14] One way is to loop over the keys, then look up the value. But this is slow, because each key has to be rehashed and then looked up again.
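Looping over keys and values together avoids the second lookup. A minimal sketch with an illustrative dictionary:

```python
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

lines = []
# items() hands back each key with its value, so no d[k] lookup is needed.
for k, v in d.items():
    lines.append(f'{k} --> {v}')
```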

[21:52] Construct a dictionary from pairs
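One common way to build a dictionary from pairs is to feed zip() to dict(). The two lists here are illustrative.

```python
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue']

# Each (name, color) pair becomes one key/value entry.
d = dict(zip(names, colors))
```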

[23:15] Counting with dictionaries

[23:22] Show beginners “get()” first.

[24:04] Should you start beginners with defaultdicts and stuff like that? No.

[24:30] What I use is defaultdict, or I use collections.Counter.
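The three counting styles mentioned above can be sketched side by side. The colors list and the variable names are illustrative.

```python
from collections import Counter, defaultdict

colors = ['red', 'green', 'red', 'blue', 'green', 'red']

# Beginner-friendly: get() supplies 0 for a missing key.
counts_get = {}
for color in colors:
    counts_get[color] = counts_get.get(color, 0) + 1

# defaultdict(int) creates the missing 0 automatically.
counts_dd = defaultdict(int)
for color in colors:
    counts_dd[color] += 1

# Counter does the whole loop in one call.
counts_counter = Counter(colors)
```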

[25:30] Grouping with dictionaries

[26:28] If you need to group by anything else, just change the key line.
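A sketch of grouping with a dictionary, here by name length. The names list is illustrative, and the "key line" is the single assignment to key.

```python
from collections import defaultdict

names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']

groups = defaultdict(list)
for name in names:
    key = len(name)  # change this line to group by something else
    groups[key].append(name)
```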

[27:57] Is a dictionary pop() atomic?

[28:27] Is adding documentation a good way to join an open-source project? Yes! People will love you for it, you make the code more usable, you learn what every module does.

[29:07] popitem() removes an arbitrary item. It’s atomic! So you don’t have to put locks on it to allow different threads to atomically pull out items.
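A single-threaded sketch of draining a dictionary with popitem(). It only shows the pattern. The atomicity claim is the talk's and is not demonstrated here. The dictionary contents are illustrative.

```python
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

drained = []
# Keep pulling items until the dictionary is empty.
while d:
    key, value = d.popitem()
    drained.append((key, value))
```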

[29:12] Linking dictionaries

[30:18] If you want your code to be fast, don’t copy like crazy. So ChainMap’s been introduced to Python 3. It looks up keys in the first dict, if it doesn’t find it there, it falls back to the second dict. If it doesn’t find it there, it falls back to the third dict.
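A minimal sketch of the fallback lookup described above. The three dictionaries are hypothetical stand-ins for command-line options, environment settings, and defaults.

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
environment = {'user': 'alice'}
command_line = {'color': 'blue'}

# Lookups try command_line first, then environment, then defaults.
# No dictionary is copied; ChainMap only holds references to them.
settings = ChainMap(command_line, environment, defaults)
```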

[31:10] Clarify function calls with keyword arguments

[31:00] Keywords and names are better than positional arguments and indices.

[31:35] A simple way to improve the readability of your code is by using keyword arguments. It slows the program by milliseconds, but improves programmer time by hours.
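To illustrate, here is a sketch using a hypothetical search function. Its name and parameters are invented for this example and do not come from a real library.

```python
def search(query, retweets=True, numtweets=10, popular=False):
    """Hypothetical search function; it only echoes its settings."""
    return {'query': query, 'retweets': retweets,
            'numtweets': numtweets, 'popular': popular}

# Positional call: the reader has to look up what False, 20, True mean.
unclear = search('@python', False, 20, True)

# Keyword call: same result, but the call site documents itself.
clear = search('@python', retweets=False, numtweets=20, popular=True)
```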

[32:17] Clarify multiple return values with named tuples

[32:45] namedtuple is a subclass of tuple. Use namedtuple.
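A short sketch of a named tuple as a return value. The type name TestResults and its fields are illustrative assumptions.

```python
from collections import namedtuple

TestResults = namedtuple('TestResults', ['failed', 'attempted'])

result = TestResults(failed=0, attempted=4)

# Still an ordinary tuple, so existing unpacking code keeps working.
failed, attempted = result
```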

[33:13] Unpacking sequences

[34:01] Updating multiple state variables

[34:55] If you don’t update state all at once, and update them on multiple lines, the state is mismatched in-between those lines! Here, at one point, y is the new y, and x is the old x. This is a very common source of problems, e.g. order of the lines.

[35:43] The first way is low-level. The second way is a higher-level way of thinking. It doesn’t get the order wrong, and it’s faster in Python.

[36:02] Don’t underestimate the advantages of updating state variables at the same time. It eliminates an entire class of errors due to out-of-order updates. It allows high-level thinking: “chunking.”
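Updating two state variables at once can be sketched with a Fibonacci generator. The function name and the choice to return a list are illustrative.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers."""
    x, y = 0, 1
    values = []
    for _ in range(n):
        values.append(x)
        # Both right-hand sides are evaluated before either name is rebound,
        # so there is no moment where x is new and y is stale (or vice versa).
        x, y = y, x + y
    return values
```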

[36:15] Simultaneous state updates

[38:15] Excel users will naturally refer to the old row when updating to a new row.

[38:20] Efficiency: Don’t move data around unnecessarily. It only takes a little care to avoid O(N**2) behavior instead of linear behavior.

[38:24] Concatenating strings

[38:41] Updating sequences
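A sketch of both efficiency points: join for building strings, and collections.deque for sequences that change at the front. The names list is illustrative.

```python
from collections import deque

names = ['raymond', 'rachel', 'matthew', 'roger']

# join builds the result in one pass instead of repeated + concatenation.
joined = ', '.join(names)

# Removing or inserting at the front of a list shifts every element;
# a deque does both in constant time.
queue = deque(names)
queue.popleft()
queue.appendleft('mark')
```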

[39:27] Decorators and context managers help separate business logic from administrative logic. But good naming is essential. They provide macro-like capability, which means you can hide all kinds of awful actions or you can be very clear.

[39:57] Using decorators to factor-out administrative logic

[40:23] Functions shouldn’t contain both administrative logic and business logic. Plus, they’re not reusable.

[40:50] @cache can be used in front of any pure function (a function that returns the same value every time you call it with the same arguments, e.g. is random.random pure? No.).

[40:24] Caching decorator

[41:20] I would like utility directories to be full of decorators.
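A minimal hand-written caching decorator might look like the sketch below. It assumes positional, hashable arguments only. The square function and the calls list are illustrative. The standard library's functools.lru_cache offers a more complete version.

```python
from functools import wraps

def cache(func):
    saved = {}

    @wraps(func)
    def wrapper(*args):
        # Administrative logic lives here, outside the business function.
        if args not in saved:
            saved[args] = func(*args)
        return saved[args]

    return wrapper

calls = []  # records how often the real function runs

@cache
def square(x):
    calls.append(x)
    return x * x

first = square(4)
second = square(4)  # served from the cache; square's body does not run again
```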

[41:19] Factor-out temporary contexts

[41:49] Pretty much any time you have setup logic and teardown logic that gets repeated in your code, you want to use a context manager.

[42:01] How to open and close files

[42:25] How to use locks

[42:35] Do you have to use a try-finally? Absolutely. If you don’t, you don’t release the lock under certain situations, e.g. where an error happens.
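The two ways of using a lock can be sketched as follows. The counter variable is illustrative.

```python
import threading

lock = threading.Lock()
counter = 0

# Explicit version: try/finally guarantees the release even on error.
lock.acquire()
try:
    counter += 1
finally:
    lock.release()

# Context-manager version: the with statement does the same acquire
# and release, so the setup and teardown cannot be forgotten.
with lock:
    counter += 1
```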

[43:10] Factor-out temporary contexts

[43:28] You can check if a file exists before you remove it. Is this the right way? No, because a race condition can occur. (For example, another process can create, delete, or lock the file in the time between checking the file’s existence and removing or opening it.)
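One way to avoid the check-then-remove race is to attempt the removal and ignore the one expected error. This sketch assumes Python 3.4 or later, where contextlib.suppress is available. The temporary directory and file name are illustrative.

```python
import os
import tempfile
from contextlib import suppress

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'somefile.tmp')  # this file does not exist

    # No existence check, so there is no gap for another process to
    # change things: just try, and ignore "file not found".
    with suppress(FileNotFoundError):
        os.remove(path)

    still_missing = not os.path.exists(path)
```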

[44:56] Context manager: redirect_stdout()
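A minimal sketch of redirect_stdout, assuming Python 3.4 or later, where it is part of contextlib. Here output is captured in an in-memory buffer rather than a file.

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()

# Everything printed inside the block goes to buffer instead of the terminal;
# sys.stdout is restored automatically on exit.
with redirect_stdout(buffer):
    print('hello')

captured = buffer.getvalue()
```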

[46:04] Concise expressive one-liners

[46:04] Two conflicting rules: (1) don’t put too much on one line, (2) don’t break atoms of thought into subatomic particles. Raymond’s Rule: One logical line of code = one sentence in English.

[47:31] Why is a list comprehension better? It’s more declarative. It says what you want. It’s a single unit of thought. The for-loop way is too busy telling you how to do it, and not what it’s doing.

[48:02] Take out the square brackets. That makes a generator version of this, which makes it go fast.
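The progression from loop to list comprehension to generator expression can be sketched like this. The sum-of-squares task is an illustrative choice.

```python
# Loop version: spells out how to accumulate.
squares = []
for i in range(10):
    squares.append(i ** 2)
total_loop = sum(squares)

# List comprehension: says what is wanted, but still builds a list.
total_list = sum([i ** 2 for i in range(10)])

# Generator expression: same thing without the square brackets,
# so no intermediate list is created.
total_gen = sum(i ** 2 for i in range(10))
```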
