Python design choices explained! (well, at least some of them)

I have started to learn Python after developing in C# for over 6 years.

After writing code in one language for such a long time, the structure of the language and the design choices seem natural.

So when I started learning Python, I was keen to understand the reasoning behind some of its interesting design choices, which differed from what I already knew.

In this post, I hope to give some insight into why some things are the way they are in Python. I hope that with this understanding some things will become clearer, and that you will be inspired to do some research of your own on other things you may wonder about!

It starts with the Zen of Python

The Zen of Python is a set of 20 general principles:

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one– and preferably only one –obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it’s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea — let’s do more of those!

The oddities start here, as only 19 of the principles were written down.

As we will see, we can explain some of the design choices with these general principles.

Tip: If you ever want to read these principles again, simply type “import this” into your favorite Python shell 🙂
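To check the count yourself, here is a small sketch. It relies on a CPython implementation detail that is an assumption here rather than a documented API: the this module keeps the text ROT13-encoded in an attribute named s.

```python
import codecs
import this  # importing the module prints the Zen of Python once

# Assumption: CPython's `this` module stores the text ROT13-encoded in `s`.
zen_text = codecs.decode(this.s, "rot13")

# Skip the title line, then keep only the non-empty lines (the principles).
principles = [line for line in zen_text.splitlines()[1:] if line.strip()]

print(len(principles))
print(principles[0])
```

Running it should report the number of principles that were actually written down, followed by the first one.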

Why is len not a method?

Within minutes of starting to learn Python I noticed that, unlike in many languages such as Java or C#, len in Python is not a method. E.g., if arr is an array we call len(arr) rather than arr.length or something similar.

But why? Isn’t this strange?

What len(arr) does “by the book” is call the special method arr.__len__, so it does call a method of the object. So why not just call __len__ directly, or give it a more usual name like “length” or “count”, and be done with it?
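The “by the book” behavior is easy to see with a class of our own. This is a minimal sketch; the Playlist class and its contents are made up for the illustration.

```python
class Playlist:
    """A hypothetical container used only for illustration."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        # len(playlist) ends up here for user-defined classes.
        return len(self._songs)


playlist = Playlist(["Intro", "Verse", "Outro"])

print(len(playlist))       # 3
print(playlist.__len__())  # 3, same result but not the idiomatic spelling
```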

The answer is somewhat surprising – the reason is that in some cases this is not what’s going to happen!

For built-in types like list and str (string type) there is a special optimization for getting their length.

These cases are incredibly frequent and appear many times within virtually any Python program. Getting the length of a string or a list has to be super-fast so that it fits all the use cases of Python as a general-purpose programming language.

So how does making len not a method help with that?

While we don’t often think about it, and normally shouldn’t, calling a method is (relatively) expensive. Calling a method means pushing parameters onto the stack, jumping to a different memory location, calculating the return value, popping the parameters back off the stack… and these are just some of the operations done under the hood! Modern languages make this so seamless that we don’t think about it, but all of it takes time.

For those built-in types, the Python interpreter (specifically CPython, the C implementation of Python and the de facto standard) takes a different, shorter route. The C implementation defines PyObject, which is explained as “All object types are extensions of this type”, and PyVarObject, which represents objects that have a length.

So the CPython implementation can do something quite simple: it returns the value of the ob_size field which internally stores the length – and most of the overhead is eliminated. Cool! 🙂
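If you want to measure the difference yourself, here is a rough sketch using the standard timeit module. The list contents and the number of repetitions are arbitrary choices, and the timings depend on the machine and the Python version, so no particular result is promised.

```python
import timeit

numbers = list(range(1000))
namespace = {"numbers": numbers}

# Time the built-in function against the explicit special-method call.
builtin_time = timeit.timeit("len(numbers)", globals=namespace, number=200_000)
method_time = timeit.timeit("numbers.__len__()", globals=namespace, number=200_000)

print(f"len(numbers):      {builtin_time:.4f} s")
print(f"numbers.__len__(): {method_time:.4f} s")
```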

In this sense – len gets special treatment, and the consistency of the language suffers a bit in order to make the language faster for this extremely frequent case.

Now that we understand why len is not a method, one question remains: what does the Zen of Python have to do with this?

The Zen of Python states:

Special cases aren’t special enough to break the rules.

But it also says straight afterward:

Although practicality beats purity.

So I guess some cases are special enough after all!

The lesson here, the way I see it, is that there are always design tradeoffs and there is no universal “correct” answer. When we need to make a design choice, even one that might seem like a “no-brainer” at first, we should consider our options and pick whichever solution best fits our needs.

Why does range(x,y) include x but not y?

Another question that interested me is the behavior of the built-in range function. For example, range(2,5) will include 2, 3 and 4, but not 5. Why? What interests me is that the two arguments are not treated symmetrically, as only the lower end is included. At first this seems arbitrary, but since it is not what I would have expected, I thought there might be something behind this choice.

Let’s explore the different possibilities together: we can either include or exclude the lower end, and the same goes for the upper end. Continuing our example, we can denote the numbers 2, 3, 4 in four different ways:

  • 2 ≤ i < 5
  • 1 < i ≤ 4
  • 2 ≤ i ≤ 4
  • 1 < i < 5

Python chose the first option, and it is not arbitrary at all!
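A quick sketch confirms that the four notations describe the same numbers and that range follows the first one. The pool of candidate integers is an arbitrary choice for the illustration.

```python
candidates = range(-10, 11)  # a small pool of integers to filter

option_1 = [i for i in candidates if 2 <= i < 5]
option_2 = [i for i in candidates if 1 < i <= 4]
option_3 = [i for i in candidates if 2 <= i <= 4]
option_4 = [i for i in candidates if 1 < i < 5]

# All four notations describe the same numbers...
print(option_1 == option_2 == option_3 == option_4)  # True
# ...and range uses the bounds of the first one.
print(list(range(2, 5)))  # [2, 3, 4]
```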

Let’s recall a few of the Zen of Python statements:

Beautiful is better than ugly.
Simple is better than complex.
Readability counts.

And now for a trick question: how many iterations does the following (non-Pythonic) loop perform?

Did you say 100? Apparently, that’s such a common error that it has its very own Wikipedia page (the off-by-one error)!
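As a stand-in for the loop in question, here is a sketch that emulates a C-style loop with a while statement. The bounds 0 and 100, and the fact that both ends are included, are assumptions made for this illustration.

```python
# Stand-in for a C-style loop such as: for (i = 0; i <= 100; i++) { ... }
# The bounds 0 and 100 are an assumption made for this illustration.
iterations = 0
i = 0
while i <= 100:  # both ends are included
    iterations += 1
    i += 1

print(iterations)
```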

Including both ends means that the difference between the bounds is not equal to the length of the sequence we are iterating over. This is both error-prone and complex.

Two of the options suffer from this problem: the last two. This is why we should prefer one of the first two options, both of which treat their bounds asymmetrically.

Not convinced yet? Wait, there’s more!

Let’s say we pick the third option and have both ends included for the sake of symmetry. Now consider a case where we have 20 tasks that need to be handled by 2 workers. What are the indices of the tasks each worker is working on?

The first worker is assigned the tasks with 0 ≤ i ≤ 9. The second one is assigned the tasks with 10 ≤ i ≤ 19.

Isn’t it far more natural, and less error-prone, to describe those indices as 0 ≤ i < 10 and 10 ≤ i < 20? Note that the end point of the first interval is the same as the start point of the second interval, so we don’t need to adjust the limits when we reach a stop point and move on to the next interval.
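The worker example can be sketched with range directly. The split_tasks helper below is hypothetical and assumes the tasks divide evenly among the workers.

```python
def split_tasks(task_count, worker_count):
    """Return one half-open range of task indices per worker.

    Hypothetical helper for illustration; assumes task_count divides evenly.
    """
    chunk = task_count // worker_count
    return [range(w * chunk, (w + 1) * chunk) for w in range(worker_count)]


first, second = split_tasks(20, 2)

print(first, second)               # range(0, 10) range(10, 20)
print(first.stop == second.start)  # True: the ranges meet without overlap
print(len(first), len(second))     # 10 10
```

Because the stop of one range is the start of the next, no +1 or -1 adjustment appears anywhere in the helper.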

So both of the first two options achieve these nice properties.

Is the choice between the first two options arbitrary? Nope!

Before I reveal the last argument, think about it for a second: what asymmetry is there in the numbers we iterate over?

The answer is this – there is a smallest natural number, and it is 0.

If we choose the second option, i.e. describing 2, 3, 4 as 1 < i ≤ 4, then how will we include 0? We will need to write something like -1 < i ≤ x. This looks strange at the very least, and confusing in my opinion. It’s also not clean from a mathematical point of view: to describe sequences of natural numbers (non-negative integers) we have to resort to a non-natural number, minus 1. This violates a closure property that mathematicians are fond of.

To conclude, by the method of elimination, we should include the lower limit but not the upper limit, as Python chose to!

I also want to emphasize that Python deliberately doesn’t support for loops like the one in the trick question above, and tries hard to make sure we won’t need to manipulate indices. Having range include only the lower end is part of that design.

I think that it takes someone special to notice that there is a good choice among four seemingly equivalent options. It takes Edsger Dijkstra!
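To illustrate what avoiding index manipulation looks like in practice, here is a short sketch. The task names are made up for the example.

```python
tasks = ["parse", "validate", "save"]

# Index-based style, the closest Python gets to a C-style loop.
for i in range(len(tasks)):
    print(i, tasks[i])

# Idiomatic style: iterate over the items directly...
for task in tasks:
    print(task)

# ...and ask for the index only when it is really needed.
for i, task in enumerate(tasks):
    print(i, task)
```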

Why do we have to explicitly use self?

I hope that by this point in the post you suspect that there is some Zen of Python principle behind this decision. Look no further:

Explicit is better than implicit.

But is there more to it? Yes, there is!

Unlike the two previous cases, this is not a ‘local’ choice. That is, the thinking process is not “should we explicitly use self or not” and then go over the meaning of each choice.

This time, this is a consequence of another, bigger, design choice.

Unlike in languages such as C++, Java or C#, in Python we don’t declare variables. That’s a big difference, and it drove the decision to make self explicit. Let’s say we didn’t need to explicitly use self, and consider the following statement within a method: x=1. Now for a trick question: did we just assign to a local variable or to an instance variable?

Without explicitly using self there is no good way of knowing, at least not without making yourself familiar with the entire class and thinking about aliasing. It is far easier when x=1 means we assigned to a local variable and self.x=1 means we assigned to an instance variable, with no other context required.

And is there more to think about here? Yes, there is, as always with interesting design choices.
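Here is a minimal sketch of that distinction. The Counter class is hypothetical and exists only to show the two kinds of assignment side by side.

```python
class Counter:
    """A hypothetical class used only for illustration."""

    def __init__(self):
        self.x = 0  # instance variable, stored on the object

    def update(self):
        x = 1       # local variable, gone when the method returns
        self.x = 2  # instance variable, visible after the call
        return x


counter = Counter()
local_value = counter.update()

print(local_value)  # 1
print(counter.x)    # 2
```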

Summary

In this post, we examined three of the design choices made by Python.

I am amazed that while some of the design choices look random at first glance, they are in fact well-thought-out choices.

And there are lessons to be learned too:

First – know what tradeoffs you are making and choose wisely.

Second – explore different options – some might have benefits that will surprise you.

Third – be aware that some design choices will have further design consequences.

And lastly, and most importantly – always ask why!

Until next time,

Belgi

 

 

 
