Recap: So you want to be a Python expert?

by James Powell, PyData Seattle 2017
python
pydata
Author
Published

July 10, 2024

Modified

July 16, 2024

🔥 source: James Powell: So you want to be a Python expert? | PyData Seattle 2017

what is he gonna present?

  • something deeply behind the Zen of Python;
  • metaphors and programming in Python;
  • python is not only scripting language, he will be introducing 4 features of Python and the way experts would think about these features;
  • a lot of fundamental details of how these feature work but very little talk about we can conceptualize these features in the broader sense of what they mean for modeling core problem. so 4 features you might have heared before and couple of the core mental models how you can think about, an how you can think about python as a whole, wrapping everything together;
  • you will see me stumble to understand that in case you dont utd or dont have docs memorized, it’s alright, the target is to know what they are and what they mean;
  • the talk presumes you to have a baseline of Python, but in case you a new to Python he believes there are cores you can take away.

James Powell on PyData Seattle 2017

data model protocol (or dunder method)

assume we have a data model with Polynomial object intitially do nothing, why I need 4 lines (instead of 2 lines) to create 2 Polynominals. How can I create and assign coefficients together.

Show the code
class Polynomial:
    pass
p1= Polynomial()
p2= Polynomial()
p1.coeffs = 1, 2, 3 #  x^2 + 2x + 3
p2.coeffs = 3, 4, 3 # 3x^2 + 4x + 3

the answer is using a constructor __init__, then we can compact our code like this:

Show the code
class Polynomial:
    def __init__(self, *coeffs):
        self.coeffs = coeffs

p1 = Polynomial(1, 2, 3 ) #  x^2 + 2x + 3
p2 = Polynomial(3, 4, 3 ) # 3x^2 + 4x + 3

but if we print our objects in the terminal, they look so ugly. We miss the method that actually call repr(obj) when we run obj:

The need of __repr__

that’s what __repr__ for, a printable representation of the class:

Show the code
class Polynomial:
    def __init__(self, *coeffs):
        self.coeffs = coeffs

    def __repr__(self):
        return 'Polynomial(*{!r})'.format(self.coeffs)


p1 = Polynomial(1, 2, 3)  #  x^2 + 2x + 3
p2 = Polynomial(3, 4, 3)  # 3x^2 + 4x + 3

the next thing we wanna do is to add Polinimials together p1 + p2.

Show the code
    def __add__(self, other):
        return Polynomial(*(x + y for x, y in zip(self.coeffs, other.coeffs)))
Note

What is pattern here? I have some behaviours that I want to implement, and I write some underscore (__x__) funtions. We call them dunder or underscore methods.

Try google “data model” and we’ll a bunch of documents that list all method to implement the principle behaviours of our objects. There are top-level functions of top-level syntaxes that corresponding to __.

But there are smthg more fundamental here, when I want to:

  • x + y -> __add__;
  • initialize x -> __init__;
  • repr(x) -> __repr__.

Then what if we want to implement the len() (return the length, which is popular know Python function) -> naturally we’ll be thinking of __len__.

Show the code
    def __len__(self):
        return len(self.coeffs)

The Python data model is a means by which you can protocols. Thos protocols have abstract meaning depending on the object itself. We tight the behaviours to top-level syntaxes and functions.

Similarly, if we do see the object Polynomial a executable, callable thing, we can implement a __call__ which will turn our class to function, in this case we cant imagine such thing, so pass.

✍ There are 3 core patterns we want to understand and remember in Python to really understand object orientation in Python:

  1. The protocol view of Python;
  2. The buil-in inheritance protocol;
  3. Some caveats around how OOP in Python works.

Which will be continue to present in this video is jumping into a very tricky metaphor a feature we may have heard of.

meta class

imagine there are 2 groups working on some piece of software. one is core infrastructure of the group and they write library code, the other group is devloper group, they write user code. the developer use library code to accomplish actual business objectives. the core team less cares about business problem, they focus on technical stuffs.

you are in the dev team and there is no way to change to code of library.py. in what circumstance the code of user can break -> there is no foo method! To avoid that, we could simply write a test that call bar(), then we could know if it fails before production environment’s runtime.

is there anything simpler to know the ability to fail before hit run time in production env? -> use assert to check existence of the attribute. we will have an early warning right before the class was initiated.

we can know if the core team change the function name to “food”

by this way we are enforcing constraints on the base class from the derived class.

now we move to a reverse situation, you are the core structure writer and have to deal with the meathead in the business unit that actually using/abusing/missusing your code ~ you have no idea what are they doing. you write the Base class with the assumption and some responsible developer in the BU will go and implement the bar method.

what if you stand on the left side pane

you have no ability to change and even have no idea where the code on the right pane sit. we can not use hasattr() for this particular situation. the first method is to use try ... catch ..., but it only catches in runtime and we will miss catching it before it goes to production env.

The reason that we could call Python a protocol orientated language is not just because the Python data model the object model is protocol orientated but that the entire Python language itself has a notion of hooks and protocols and safety valves within it.

Python is much much simpler language, the code run linearly from top to bottom. And the class statement in Python is actually an executable code. We can write this:

Show the code
for _ in range(10):
  class Base: pass

# or the same thing in other way

class Base:
  for _ in range(10):
    def bar(self):
      return 'bar'

Only the last one will survive. Python accepts this syntax because it’s class is fundamentally executable.

Let’s get into something more interesting! create a class inside a function and use a function in a module in standard libbrary in Python called dis.

Show the code
# create a class inside a function
def _():
  class Base:
    pass

# dis stands for disassemble
from dis import dis

let’s ‘dis’ - stands for disassemble - to see what happen in the bytecode. there are actually things in Python bytecode call LOAD_BUILD_CLASS, it’s actual executable runtime instruction in the Python interpreter for create a class.

disassemble

at the first section of this lecture, we saw some correspondence between top-level syntac or function AND some underscore method that implements that syntax or function. there should be also top-level mechanism, not explicitly syntax or function, with the process of building a class. there is a build-ins function called __build_class__.

Show the code
old_bc = __build_class__

def my_bc(func, name, base=None, **kw):
  if base is Base:
    print('check if the bar method is defined')
  if base is not None:
    return old_bc(func, name, base, **kw)
  return old_bc(func, name, **kw)

import builtins
builtins.__build_class__ = my_bc

__build_class__, we can check if the bar method is implemented, Python is protocal oriented language!

It’s actually quite a common pattern and quite a fundamental piece of Python almost everything that a Python language does in an execution context like building classes, creating functions, importing modules you can find a way to hook into that. once you can find a way to hook into that you can start doing things that you want to do like check is my user code going to break from the perspective the library author.

The most important thing here is understanding existence of this pattern, knowing that there are options solving such problem, and there are even better approachs.

There are 2 fundamental ways that people ussually use to solve this

1. meta class

Show the code
# meta classes are merely derived from `type`
class BaseMeta(type):
    def __new__(cls, name, bases, namespace):
        print('BaseMeta.__new__', cls, name, bases, namespace)
        if name != 'Base' and 'bar' not in namespace:
            raise TypeError("Bad User class, 'bar' is not defined")
        return super().__new__(cls, name, bases, namespace)


# then our Base should be derived from BaseMeta
class Base(metaclass=BaseMeta):
    def foo(self):
        return self.bar()

metaclass

That’s it! you can control, constraint the derived class from the base class in class hierachy.

2. __init_subclass__

We can implement Base like this:

Show the code
class Base():
    def foo(self):
        return self.bar()

    def __init_subclass__(cls, **kwargs):
        print("init subclass", cls.__dict__, kwargs)
        if 'bar' not in cls.__dict__:
            raise TypeError("Bad sub class")
        super().__init_subclass__(**kwargs)

decorator

We would have met pattern like this, that’s is decorator:

Show the code
@dec 
def f():
  pass

Python is a live language, there is no separate step that turns function definitions into bags of a set bits tagged and some elf binary or some PE binary somewhere. That a function definition is actually a live thing, it actually runs at runtime, there’s actually executable code associated with this def f().

we can interact with our object as Python is a live language, the function itself is a runtime object

That every Python structure that you interact with whether it’s an object or a function or a generator has some runtime life, has some runtime existence you can see it in memory, you can ask it questions like what module UI were you defined in. you can even ask it very useful questions like if you use the inspect module you can say what’s your source code.

 decorators  python -i .\deco.py
>>> from inspect import getsource
>>> getsource(add)
'def add(a, b=10):\n    return a + b\n'
>>> print(getsource(add))
def add(a, b=10):
    return a + b 

>>> 

now let’s say I want to calculate how much time does it take to perform the add. simply we would thinking about time, like this:

Show the code
from time import time

def add(a, b=10):
    return a + b
 
before = time()
print('add(10)', add(10))
after = time()
print('time_taken', after - before)
print('add(20, 30)', add(20, 30))
after = time()
print('time_taken', after - before)
print('add("a", "b")', add("a", "b"))
after = time()
print('time_taken', after - before)
add(10) 20
time_taken 0.0
add(20, 30) 50
time_taken 0.0
add("a", "b") ab
time_taken 0.0

there is something wrong here because we need to add code to everywhere we want to calculate. we see another important pattern in Python, decorator, the Python developers want you to write the simplest, stupidest, quickest thing to get the job done and get the rest of your day with your family.

we will see alot of scenarios where of cases where you can write the simple and stupidest thing to get the job done and when the task at hand becomes harder or more complex or the requirements change you can change your code in a very simple fashion to linearly extend its functionality without have to re-write it from scratch.

in Python we’ll see a number of different features are orientated around how do we write the simplest thing today and then when the problem gets a little bit harder we have an avenue for making our code a little bit more complex.

we can modified a add alittle bit

Show the code
from time import time

def add(a, b=10):
    before = time()
    rv = a + b
    after = time()
    print('elapsed:', after - before)
    return rv

But if we have more functions, let say sub? We need to modify and make our code complicated. Python is live language that everything has some runtime representaion. We can create timer() which take func, x, and y as arguments (x and y will be forwarded to calculation function)

Show the code
from time import time

def timer(func, x, y):
    before = time()
    rv = func(x,y)
    after = time()
    print('elapsed:', after - before)
    return rv

# and 
print('add(20, 30)', timer(add, 20, 30))

we can also wrapping like this, timer() will return a function f:

Show the code
from time import time

def timer(func):
    def f(x, y=10):
      before = time()
      rv = func(x,y)
      after = time()
      print('elapsed:', after - before)
      return rv
    return f

# and 
add = timer(add)
print('add(20, 30)', add(20, 30))

Python provides a syntax for easily implement every behaviours of timer() to every function:

Show the code
@timer
def add(a, b=10):
    return a + b

print('add(20, 30)', add(20, 30))
Note

Fundamentally it’s about allowing you to take this wrapping behavior for functions and to wrap wide swathes of functions in one fashion without having to rewrite a lot of user code or having to even perform a lot of turn on your library code.

another example when we want a function to run n-times:

Show the code
n = 2 # let's say 2 times

def ntimes(f):
  def wrapper(*args, **kwargs):
    for _ in range(n):
      print('running {.__name__}'.format(f))
      rv = f(*args, **kwargs)
    return rv
  return wrapper
    
@ntimes
def add(a, b=10):
    return a + b

add(10, 20)
running add
running add
30

wrapp n in to the function so ntimes() takes n instead of f as arg:

Show the code
# high order decorators
def ntimes(n):
  def inner(f):
    def wrapper(*args, **kwargs):
      for _ in range(n):
        print('running {.__name__}'.format(f))
        rv = f(*args, **kwargs)
      return rv
    return wrapper
  return inner
    
@ntimes(3)
def add(a, b=10):
    return a + b

add(10, 20)
running add
running add
running add
30

There is a very important core concept that is hidden in here which is what you might call the closure object duality. it’s not something that we have time to look at in this session.

generator

recall what we’ve learned in previous section:

  • there are top-level syntax or function <–> and some underscore methods that implemented it.
  • if you have parentheses after something x() <–> implies __call__ protocol implemented.

what is different between these 2:

Show the code
def add1(x, y):
  return x + y


class Adder:
  def __call__(self, x, y):
    return x + y 
add2 = Adder()

print(add1(10, 20)) # 30
print(add2(10, 20)) # 20
type(add1) # <class 'function'>
type(add2) # <class '__main__.Adder'>

functionally there is not distinguish between add1 and add2. add1 is syntaxtically whole hell more easy to write. another diffence is if you want to add some statefull behaviors, we can easily modified the class:

Show the code
class Adder:
  def __init__(self):
    self.z = 0

  def __call__(self, x, y):
    self.z += 1
    return x + y + self.z

Đọc thêm về chuỗi bài này nếu thấy confuse

there is one what to do with the Adder, another way to do with the function, what he hinted at this object closure duality.

let’s think about a function that take a lot of time to do something, like loading the data from the database. we demo by a simple function compute(), which sleep 0.5 sec for each loop. the function only gives us the result once it completelt complete the loop, it will give us the entire result all at once. what if we care about the first? the first 3 values?

this is undesirable. this is wasteful both from the perspective of time & memory. let’s think about it with the object model and rewrite under Compute class:

Show the code
from time import sleep

def compute()
  rv = []
  for i in range(100):
    sleep(.5)
    rv.append(i)
  return rv

class Compute:
  def __init__(self):
    pass

  def __call__(self):
    rv = []
    for i in range(100):
      sleep(.5)
      rv.append(i)
    return rv
compute2 = Compute()

the 2nd method:

  • Retain state between calls (stateful callables)
  • Cache values that result from previous computations
  • Implement straightforward and convenient APIs

if you want to access data during computaion, think of:

for x in xs:
  pass

x1 = iter(xs)   ---> __iter__
while True:
  x = next(x1)  ---> __next__

and re-write the Compute like this:

Show the code
class Compute:
  def __iter__(self):
    self.last = 0
    return self 
  
  def __next__(self):
    rv = self.last
    self.last += 1
    if self.last > 10:
      raise StopIteration()
    sleep(.5)
    return rv

for val in Compute():
  print(val)

now, it:

  • takes 1 iteration to let you start using it;
  • takes no storage;
  • but looks ugly.

and there is much simpler way to write a function that operate in such fashion, the generator syntax, it merely:

Show the code
def compute()
  for i in range(10):
    sleep(.5)
    yield i

its state is maintained internally. instead of eagerly computing the values, you give them to the user as the ask. let’s look in the last example:

Show the code
class Api:
  def run_this_first():
    first()
  def run_this_second():
    second()
  def run_this_last():
    lass()
# in documentation you ask user to use those functions in order but no physically constraint you to not run like this:
Api.run_this_last()
Api.run_this_second()
Api.run_this_first()

the generator not only yield the result back, but also the control back to the caller. we can interleave our code with library code, controling them to run in order, that is conceptualization of how generators are built upon - co-routines.

we can either define execution function inside the class or use generator to control the sequence:

Show the code
...
  def doit():
    first()
    second()
    last()
... 

# or

def api()
  first()
  yield
  second()
  yield
  last()

context manager

setup & teardown ~ initial action and final action. we would often see pattern like:

Show the code
with open('ctx.py') as f:
  pass

beyond the file, it can be also sqllite

Show the code
from sqlite3 import connect

# control the execution is executed when connection is open
with connect('test.db') as conn:
  cur = conn.cursor()
  cur.execute('create table points(x int, y int)')
  cur.execute('insert into points(x, y) values (1, 1)')
  cur.execute('insert into points(x, y) values (2, 1)')
  cur.execute('insert into points(x, y) values (1, 2)')
  for row in cur.execute('select sum(x * y) from points'):
    print(row)
  for row in cur.execute('select x, y from points'):
    print(row)
  cur.execute('drop table points')
(5,)
(1, 1)
(2, 1)
(1, 2)

now we already control if connection is live to do something, we’ve not yet controlled inside the connection, the entry point and the exit point. now behind the scence

with ctx() as x:
  pass

will look like the following:

x = ctx().__enter__
try
  pass
finally:
  x.__exit__

we’ll try to emplement this to our connection, we’ll wrap it to a temporary table behaviour:

Show the code
from sqlite3 import connect

class Temporarytable:
  def __init__(self, cur):
    self.cur = cur
  def __enter__(self):
    print('__enter__')
    self.cur.execute('create table points(x int, y int)')
  def __exit__(self, *arg): # why need *arg
    print('__exit__')
    self.cur.execute('drop table points')

with connect('test.db') as conn:
  cur = conn.cursor()
  with Temporarytable(cur=cur):
    cur.execute('insert into points(x, y) values (1, 1)')
    cur.execute('insert into points(x, y) values (2, 1)')
    cur.execute('insert into points(x, y) values (1, 2)')
    for row in cur.execute('select sum(x * y) from points'):
      print(row)
    for row in cur.execute('select x, y from points'):
      print(row)
__enter__
(5,)
(1, 1)
(2, 1)
(1, 2)
__exit__

now can the exit run before the enter, no! they need to be run sequencely. it remind us about the generator to improve this:

Show the code
from sqlite3 import connect
def temporarytable(cur):
  print('created table')
  cur.execute('create table points(x int, y int)')
  yield
  cur.execute('drop table points')
  print('dropped table')

class Temporarytable:
  def __init__(self, cur):
    self.cur = cur
  def __enter__(self):
    self.gen = temporarytable(self.cur)
    next(self.gen)
  def __exit__(self, *arg):
    next(self.gen, None)

with connect('test.db') as conn:
  cur = conn.cursor()
  with Temporarytable(cur=cur):
    cur.execute('insert into points(x, y) values (1, 1)')
    cur.execute('insert into points(x, y) values (2, 1)')
    cur.execute('insert into points(x, y) values (1, 2)')
    for row in cur.execute('select sum(x * y) from points'):
      print(row)
    for row in cur.execute('select x, y from points'):
      print(row)
created table
(5,)
(1, 1)
(2, 1)
(1, 2)
dropped table

enter and exit are already implemented in function temporarytable and cursor is implemented in the connect() call. now i would rewrite like this

Show the code
from sqlite3 import connect

class Temporarytable:
  def __init__(self, gen):
    self.gen = gen
  def __call__(self, *args, **kwargs):
    self.args, self.kwargs = args, kwargs
    return self
  def __enter__(self):
    self.gen_instance = self.gen(*self.args, **self.kwargs)
    next(self.gen_instance)
  def __exit__(self, *arg):
    next(self.gen_instance, None)

@Temporarytable
def temporarytable(cur):
  cur.execute('create table points(x int, y int)')
  yield
  cur.execute('drop table points')

# temporarytable = Temporarytable(temporarytable)

with connect('test.db') as conn:
  cur = conn.cursor()
  with temporarytable(cur):
    cur.execute('insert into points(x, y) values (1, 1)')
    cur.execute('insert into points(x, y) values (2, 1)')
    cur.execute('insert into points(x, y) values (1, 2)')
    for row in cur.execute('select sum(x * y) from points'):
      print(row)
    for row in cur.execute('select x, y from points'):
      print(row)
(5,)
(1, 1)
(2, 1)
(1, 2)

all the stuff context manager above we do not need to write:

Show the code
from sqlite3 import connect
from contextlib import contextmanager

@contextmanager
def temporarytable(cur):
  cur.execute('create table points(x int, y int)')
  try:
    yield
  finally:
    cur.execute('drop table points')

# temporarytable = Temporarytable(temporarytable)

with connect('test.db') as conn:
  cur = conn.cursor()
  with temporarytable(cur):
    cur.execute('insert into points(x, y) values (1, 1)')
    cur.execute('insert into points(x, y) values (2, 1)')
    cur.execute('insert into points(x, y) values (1, 2)')
    for row in cur.execute('select sum(x * y) from points'):
      print(row)
    for row in cur.execute('select x, y from points'):
      print(row)
(5,)
(1, 1)
(2, 1)
(1, 2)

summary

all combination of what we’ve learned in the last example:

  • a context manager is merely some piece of code that pairs set up actions and teardown actions. teardown occurs only if set up occurs
  • a generator is merely some form of syntax that allows us to do things like enforce sequencing and interleaving notice the

finally we need something to adapt the generator to this data model that we looked at at the very beginning. we have these underscore methods and we have to find some way to take how the generator works and fit it into those underscore methods. one of the things we need to do in order to do that is we need to take this generator object to wrap it in some fashion that wrapping is part of the core of how Python works it’s easy to dynamics and construct functions

  • there does happen to be a feature called decorators that allows us a nice convenient syntax for doing that exactly.
Tip

criteria of Python expert code:

  • expert level code is not code that uses every single feature;
  • it’s in fact not code that even uses that many features of Python;
  • it’s code that has a certain clarity to where and when a feature should be used;
  • it’s code that doesn’t waste the time of the person who’s writing it because they say to themselves I have this pattern Python has this mechanism I fit them together and everything just seamlessly and and very smoothly works;
  • it’s code that doesn’t have a lot of additional mechanisms associated with it it doesn’t have people creating their own protocols it doesn’t have people creating their own frameworks where the language itself provides the core pieces that you need and you merely have to understand what those core pieces are what they need and how to assemble them.

futher reading

  1. Python data model
  2. Python metaclass __new__