Wednesday, March 30, 2011

Meet the Team: Benjamin Peterson

This post is part of the "Meet the Team" series of posts, which is meant to give a brief introduction to the Python core development team.
Name: Benjamin Peterson
Location: Minnesota, USA
Home Page: https://benjamin.pe

How long have you been using Python?

3.5 years.

How long have you been a core committer?

Exactly 3 years this March 25th.

How did you get started as a core developer? Do you remember your first commit?

My first proposal was personally rejected by Guido himself. Luckily, I persisted and got some patches accepted. I believe my first commit was reordering the Misc/ACKS file.

Which parts of Python are you working on now?

I like the parser, compiler, and interpreter core, but I've been known to dabble in just about every part of core Python development... except Windows!

What do you do with Python when you aren't doing core development work?

I use it to implement a Python interpreter (http://pypy.org)! Truly, I'm a Python implementor at heart. :) I am the creator of six (http://pypi.python.org/pypi/six), a Python 2 and 3 compatibility library.

What do you do when you aren't programming?

Compose music, play clarinet, and read math books. I do a little hiking now and then, too.

Monday, March 28, 2011

Deprecations between Python 2.7 and 3.x

Recent discussion on python-dev highlighted an issue with Python's current deprecation policy facing developers moving from Python 2.7 to current versions of Python 3.x. As a result of this issue, the development team has modified the current deprecation policy to take into account the fact that Python users will normally migrate directly from Python 2.7 to the latest version of 3.x without ever seeing older versions.

Background

Python has a strong commitment to backward compatibility. No change is allowed unless it conforms to compatibility guidelines, which in essence say that correct programs should not be broken by new versions of Python. However, this is not always possible: sometimes an API is clearly broken and needs to be replaced by something else. In such cases, Python follows a deprecation policy based on a one-year transition period during which features to be removed are formally deprecated. Throughout that period, a deprecation warning must be issued to give developers time to update their code. Full details of Python's deprecation policy are documented in PEP 5. As changes are only made in new Python releases, and there is normally an 18-month gap between releases, a one-release deprecation period is the norm.
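
In practice, a deprecated API keeps working for the transition period but warns its callers, along the lines of this sketch (old_api and new_api are purely illustrative names):

    import warnings

    def new_api(data):
        """The replacement API."""
        return sorted(data)

    def old_api(data):
        """Deprecated: keeps working during the transition, but warns."""
        warnings.warn("old_api() is deprecated; use new_api() instead",
                      DeprecationWarning, stacklevel=2)
        return new_api(data)

    # DeprecationWarning is silenced by default; run with `python -Wd`
    # (or adjust the warnings filter) to actually see the message.
    old_api([3, 1, 2])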

The one exception to this policy was Python 3. The major version change from Python 2 to Python 3 was specifically intended to allow backward-incompatible changes, giving the Python developers the chance to correct issues which simply couldn't be fixed within the existing policy, for example making strings Unicode by default and having built-ins return iterators instead of lists.

Parallel Lines of Development

Since the transition to Python 3 was expected to take time, five years by many estimates, some amount of parallel development on Python 2 and Python 3 was inevitable.

With Python 2.7 being the final release of the Python 2 series, it was agreed that its maintenance period would be extended substantially. Ultimately, though, developers who want to move to a newer version of Python will need to make the jump to Python 3.

Herein lies one of the problems...

Surprise deprecations

In a thread on python-dev, a poster pointed out that one specific function in the C API, PyCObject_AsVoidPtr, was removed with what appeared to be insufficient warning. And yet, this is what the deprecation policy was supposed to protect against! What happened?

The change was part of a larger migration from an older API (PyCObject) to a newer, improved one (PyCapsule). The problem is that PyCObject is the default, and indeed the only, API available in Python 2.6. It went on to be deprecated in Python 2.7. In Python 3.2, that API no longer exists and the new PyCapsule API should be used instead. That gives a deprecation period running from the release of Python 2.7 (July 2010) to the release of Python 3.2 (February 2011), about 7 months. That is a lot less than the minimum 12-month period, and it makes it difficult for developers to support a reasonable range of Python releases.

For someone moving from 3.0 to 3.1 and then 3.2, the deprecation path is fine. Python 3.1, released in June 2009, already carried the deprecation, so within the 3.x release series a deprecation period of well over a year was available. However, that's not what people really do: they go from 2.7 straight to the latest 3.x version, in this case 3.2, which is what produced the problem. This was never the intention of python-dev, but PEP 5 had not been written with two parallel, actively developed lines of Python in mind.

So what do we do?

While the PyCObject/PyCapsule API break is a definite problem, it is not impossible to work around, although at least one poster on python-dev had real difficulties because of it. Either way, this shouldn't have happened.

For the specific case of PyCObject/PyCapsule, the problem already exists and there is not much that can be done. Reinstating PyCObject was not really an option, as that would only add further incompatibilities. However, the general view was that it is possible, albeit tedious, to write code to adapt to whichever API is available. In fact, in Python 3.1, the PyCObject API was written as a wrapper over the PyCapsule API. There was a suggestion that should anyone need it, the Python 3.1 implementation could be extracted for use in 3rd party code. Additionally, it was agreed that a "retroactive" PEP covering the change would be written, to describe the reasons behind the change and document resources which can help developers migrate.

On a more general note, the Python development team is now aware of the problem and will work to avoid a recurrence. Guido posted a review of the situation and suggested that Python 3 should be conservative in its use of deprecations for the moment. At a minimum, deprecated APIs will be retained substantially longer before being removed, to give developers moving from 2.7 a migration path.

More indirectly, the thread raised the issue of how to communicate changes in Python to a wider audience more effectively and in a more timely manner, an issue that this blog was created precisely to address.

What does all this mean?

First and foremost, it means that the Python developers don't always get everything right. Nobody meant to make life harder for developers; the problem simply wasn't spotted in time.

Secondly, fixing the problem could do more harm than good, so the PyCObject API is not being reinstated. While reinstatement might help developers who were bitten by the change, overall it would make compatibility issues more complex. For now, we have to put up with the issue and move on. Lessons were learned, and we won't make the same mistake next time.

One thing this shows is that the Python development team wants to hear from its users. Compatibility is very important, and every effort is made to make the transition to new versions as painless as possible. In particular, library developers should be able to support multiple Python versions with a reasonable level of effort.

Finally, the developers haven't abandoned 2.7. While it won't be getting new features and there will be no 2.8, the views of people using 2.7 are still important. Making sure users can move to 3.x when they are ready is vital for the whole Python community.

Thursday, March 24, 2011

Of polling, futures and parallel execution

One of the big concerns in modern computing is saving power. It matters a lot in portable devices (laptops, tablets, handhelds). A modern CPU is able to enter a variety of low-power states when it is idle. The longer it stays idle, the deeper the low-power state it can reach, the less energy it consumes and, therefore, the longer the battery life of your device on a single charge.

Low-power states have an enemy: polling. When a task periodically wakes up the CPU, even for something as trivial as reading a memory location to check for potential changes, the CPU leaves the low-power state, wakes up all its internal structures, and will only re-enter a low-power state long after your menial periodic task has finished its intended work. This kills battery life. Even Intel is concerned about it.

Python 3.2 comes with a new standard module to launch concurrent tasks and wait for them to end: the concurrent.futures module. While perusing its code, I noticed that it used polling in some of its worker threads and processes. I say "some of" because the implementation differs between the ThreadPoolExecutor and the ProcessPoolExecutor: the former polled in each of its worker threads, while the latter only did so in a single thread, the queue management thread, which is used to communicate with the worker processes.
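
For readers who haven't used the module yet, here is a minimal example of launching tasks and collecting their results (unrelated to the polling issue itself):

    from concurrent.futures import ThreadPoolExecutor

    def square(n):
        return n * n

    # Launch a batch of tasks on a thread pool and wait for their results.
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(square, n) for n in range(8)]
        results = [f.result() for f in futures]

    print(results)   # [0, 1, 4, 9, 16, 25, 36, 49]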

Polling in those worker threads and processes was only used for one thing: detecting when the shutdown procedure should be started. Other tasks, such as queueing callables or fetching results from previously queued callables, use synchronized queue objects. These queue objects come from either the queue module or the multiprocessing module, depending on which executor implementation you are using.

So I came up with a simple solution: I replaced the polling with a sentinel, for which the built-in constant None serves nicely. When a queue receives None, one waiting worker naturally wakes up and checks whether it should shut down. In the ProcessPoolExecutor there is a small complication, as we need to wake up N worker processes in addition to the single queue management thread.
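
Here is a simplified sketch of the idea, not the actual executor code: workers block on a queue without any timeout, and a None item tells each of them to exit.

    import queue
    import threading

    work_queue = queue.Queue()

    def worker():
        while True:
            item = work_queue.get()   # blocks, with no timeout at all
            if item is None:          # the sentinel: time to shut down
                return
            func, args = item
            func(*args)               # run the submitted callable

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()

    work_queue.put((print, ("hello from a worker",)))

    # Shutdown: wake each blocked worker exactly once with the sentinel.
    for _ in threads:
        work_queue.put(None)
    for t in threads:
        t.join()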

In my initial patch, I still had a polling timeout, albeit a very large one (10 minutes), so that the workers would wake up at some point. The large timeout existed in case the code was buggy and a worker never received the shutdown notification through the aforementioned sentinel. Out of curiosity, I dove into the multiprocessing source code and made another interesting observation: under Windows, multiprocessing.Queue.get() with a non-zero, non-infinite timeout uses... polling (for which I opened issue 11668). It is an interesting, high-frequency kind of polling, since it starts with a one-millisecond timeout which is incremented at every loop iteration.
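
Roughly, that kind of loop has the following shape (a sketch of the pattern, not the actual multiprocessing source): the delay grows, but the CPU is still woken up at every iteration.

    import time

    def get_with_polling(check_ready, timeout):
        """Wait up to roughly `timeout` seconds, polling with a growing delay."""
        deadline = time.time() + timeout
        delay = 0.001                    # start at one millisecond
        while time.time() < deadline:
            if check_ready():            # e.g. "is an item available yet?"
                return True
            time.sleep(delay)            # every iteration is a CPU wakeup
            delay += 0.001               # the delay grows; the wakeups remain
        return False

    # Example: poll a condition that never becomes true for half a second.
    print(get_with_polling(lambda: False, 0.5))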

Needless to say, keeping a timeout, however huge, would have rendered my patch useless under Windows, since the way that timeout is implemented would involve wakeups every millisecond. So I bit the bullet and removed the huge polling timeout. My latest patch doesn't use a timeout at all, and therefore should cause no periodic wakeups, regardless of the platform.

Historically speaking, before Python 3.2 every timeout facility in the threading module used polling, and therefore so did much of multiprocessing, since multiprocessing itself uses worker threads for various tasks. This was fixed in issue 7316.

2011 Language Summit Report

This year's Language Summit took place on Thursday, March 10 in Atlanta, the day before the conference portion of PyCon began. In attendance were developers of the CPython, PyPy, Jython, IronPython, and Parrot VMs; packaging developers from Fedora, Ubuntu, and Debian; developers of the Twisted project; and several others.

Development Blog

One of the first orders of business was discussion of this very blog, initiated by PSF Communications Officer Doug Hellmann. Due to the high-traffic and often intense nature of the python-dev mailing list, the blog hopes to be an easier way for users to get development news. We plan to cover PEPs, major decisions, new features, and critical bug fixes, and will include informal coverage of what's going on in the development process.

Posting to the blog is open to all implementations of Python. For example, while PyPy already has its own active blog, its news is welcome here as well. A related side discussion led to the alternative implementations also being mentioned on the python.org download page. Their releases will also be listed as news items on the python.org front page.

Compatibility Warnings

With 3.2, we introduced ResourceWarning to allow users to find areas of code that depend on CPython's reference counting. The warning not only helps users write better code, but also lets them write safer cross-VM code. To further cross-VM compatibility, a new warning type was suggested: CompatibilityWarning.
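
As a reminder of what ResourceWarning catches, here is a small sketch (run with python -W always::ResourceWarning to see the warning, as it is silenced by default):

    # Run with:  python -W always::ResourceWarning example.py
    def read_all(path):
        f = open(path)
        return f.read()       # the file object is never explicitly closed

    with open("example.txt", "w") as out:   # create a file to read
        out.write("some data\n")

    data = read_all("example.txt")
    # On CPython, reference counting deallocates (and closes) the leaked
    # file object almost immediately, so this "works" silently. Python 3.2
    # can emit a ResourceWarning here, flagging the reliance on refcounting.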

The idea came up due to a recently filed CPython bug found by the PyPy developers. Issue #11455 explains a problem where CPython allows a user to create a type with non-string keys in its __dict__, which at least PyPy and Jython do not support. Ideally, users could enable a warning to detect such cases, just as they do with ResourceWarning.
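
The kind of code in question looks roughly like this sketch:

    # CPython accepts a non-string key in the namespace used to build a type...
    Weird = type("Weird", (), {42: "the answer"})
    print(42 in Weird.__dict__)   # True on CPython

    # ...but, per issue #11455, PyPy and Jython do not support this; a warning
    # here would let users detect that they are relying on a CPython-specific
    # detail, much as ResourceWarning does for finalization behaviour.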

Standalone Standard Library

Now that the transition of CPython's source from Subversion to Mercurial has been completed, the idea of breaking out the standard library into its own repository was resurrected. The developers of alternative implementations are very interested in such a split, as it would greatly simplify their development processes. They currently take a snapshot from CPython, apply any implementation-specific patches, replace some C extensions with pure Python versions, and so on.

The split will need to be laid out in an upcoming PEP, and one of the discussion points will be how versioning is handled. Since the library would live outside of any one implementation, it would likely be versioned on its own, and the tests will need version considerations as well.

Another topic for the standard library breakout was pure Python implementations and their C extension counterparts. Maciej Fijalkowski of the PyPy project mentioned that, over time, some modules have developed minor feature differences between their C and Python versions. As discussion of the breakout continues, the group suggested a stricter approach to changing such modules, so as not to penalize the use of one or the other. Additionally, a preference for pure Python implementations was agreed upon, with C implementations being created only when a real performance gain is achieved.
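
That preference matches a convention already visible in the standard library, where a module such as heapq is written in pure Python and ends with roughly this (the pure Python definitions are the canonical ones; the C accelerator is optional):

    # At the end of a pure Python module such as heapq:
    try:
        from _heapq import *     # optional C accelerator
    except ImportError:
        pass                     # keep the pure Python definitions above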

Performance Benchmark Site

The PyPy Speed Center has done a great job of showing PyPy's performance results, and some discussion was had about hosting a similar site on python.org, possibly as performance.python.org, for all VMs to take part in. In addition to performance benchmarks, other metrics such as memory usage, test success, and language compatibility should be considered. Some effort will be needed to adapt the infrastructure to work with multiple Python implementations, as it currently compares PyPy against CPython.

The Open Source Lab at Oregon State University, where Allison Randal is on the board, came up as a possible home for the new Speed Center, with talk of putting some high-performance machines there. Jesse Noller mentioned efforts to obtain hardware to put in the lab -- donations welcome!

If you or your organization are interested in donating for this cause or others, please contact the Python Software Foundation and check out our donations page.

Moratorium Lifted

With the start of development on CPython 3.3, the moratorium on language changes has been lifted. Although the flood gates are now open, language changes are expected to remain conservative, keeping the rate of change slow so that alternative implementations can continue to catch up. No alternative implementation has caught up to the 3.x line yet, but thanks in part to the moratorium, PyPy and IronPython recently reached 2.7 compatibility, and IronPython is beginning down the road to 3.x.

As for what language changes are expected in 3.3, look forward to seeing PEP 380 accepted. The PEP introduces a new yield from <expr> syntax, allowing a generator to delegate part of its operation to another generator. Beyond that, no other language changes are expected in the near future.
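
A small sketch of the proposed syntax, with the semantics described in PEP 380, where the sub-generator's return value becomes the value of the yield from expression:

    def inner():
        yield 1
        yield 2
        return "inner done"           # becomes the value of `yield from`

    def outer():
        result = yield from inner()   # delegate to the sub-generator
        yield result

    print(list(outer()))              # [1, 2, 'inner done'] once available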

Exception Attributes

The next topic was a quick discussion of exceptions providing better attributes, rather than forcing users to rely on string messages. For example, on an ImportError it would be useful to have easy access to the name of the import that failed, rather than parsing the message to find it.

The implementation will likely rely on a keyword-only argument when initializing an exception object, and a patch currently exists for the ImportError case.
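
A sketch of what that might look like (the keyword and attribute names here are illustrative of the proposal, not a final API; at the time of writing this exists only as a patch):

    # Raising: attach the failing module name via a keyword-only argument.
    err = ImportError("No module named spam", name="spam")

    # Handling: read the attribute instead of parsing the message.
    try:
        raise err
    except ImportError as exc:
        print("failed to import:", getattr(exc, "name", None))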

Contributor Agreements

Contributor agreements were also mentioned, and some form of electronic agreement is underway. Google's individual contributor agreement was one of several inspirations for how the new system should work. The topic has been long discussed, and many people are looking forward to a resolution in this area. Additionally, research is being done to ensure that any move to an electronic agreement remains valid in non-US jurisdictions.

Google Summer of Code

Martin von Löwis took a minute to introduce another year of Google Summer of Code under the PSF umbrella. Developers are encouraged not only to act as mentors, but also to propose projects for students to work on -- and remember that suggesting a project does not imply that you will mentor it. If you are interested in helping in any way, see the PSF's Call for Projects and Mentors.

Distutils

Distutils2 came up and Tarek Ziadé mentioned that their sprint goal was to finish the port to Python 3 and prepare for the eventual merger back into the Python standard library. Additionally, with the merge comes a new name: packaging. The packaging team also plans to provide a standalone package, still called Distutils2, supporting Python 2.4 through 3.2.

The packaging sprint, one of the larger groups at the PyCon sprints, was very successful. Its current results are on Bitbucket, awaiting the standard library merge.

The Future of Alternative VMs

The IronPython developers mentioned their future plans, with a 3.x release next on their plate. They announced their 2.7.0 release at PyCon, the first community-based release since the project was handed off from Microsoft, and will be starting towards 3.x over the next few months.

Jython recently came out with a 2.5.2 release and has begun planning a 2.6 release. Some suggested jumping straight to 2.7, as the differences between 2.6 and 2.7 aren't all that great, but it might take longer to get a first release out if they did. "Release early, release often" was one of the quotes coming out of the talk, and they might be able to get away with going from 2.6 to 3.x and considering any 2.6 to 2.7 differences after the fact.

Development Funding

Coming out of the 3.x planning talks was the topic of funding for development work and how it might help some of the alternative implementations get to 3.x sooner. While funds are available, a proposal to the PSF has to be made before anything can be discussed. Those interested in receiving grants for these efforts should contact the PSF board.

Baseline Python

Jim Fulton began a discussion on what he called "baseline" Python. In his experience deploying Python applications, he has found the system Python to be unpredictable and difficult to target. With Fedora and Ubuntu/Debian packaging experts on hand, we were able to get a look into why things are the way they are.

For Fedora, the base Python install has the Live CD in mind, so it's a very minimal installation with few dependencies, basically the bare minimum needed to allow the system to run. Additional differences show up in directory layouts, the removal of standard library modules like distutils, and distributions shipping out-of-date libraries.

There didn't appear to be a clear-cut solution right away, but the relevant parties will continue to work on the problem.

3.3 Features

Some thoughts for 3.3 features came up, including two PEPs. PEP 382, covering Namespace Packages, should appear at some point in the cycle. It was also mentioned during the distutils merger topic.

PEP 393, defining a flexible string representation, was also up for discussion and has attracted some interested students as a GSoC project. Along with the implementation, some effort will need to be spent on the performance and memory characteristics of the new internals in order to see whether they can be accepted.

Unladen Swallow

Unladen Swallow is currently in a "resting" state and will not be included in CPython 3.3 as-is. To make further progress, we would need to identify several champions, as the domain experts are unavailable to do the work. During the discussion, it was again mentioned that if funding is what it would take to push Unladen Swallow to the next level, interested parties should apply to the PSF.

While Unladen Swallow is in its resting state and has an uncertain future, the project provided a large benefit to the Python and general open source community. The benchmark suite used by Unladen Swallow is very useful for testing alternative implementations, for example. Additionally, contributions to LLVM and Clang from the Unladen Swallow developers helped out those projects as well.

Two other performance ideas were also briefly discussed, including Dave Malcolm's function inlining proposal. Martin von Löwis mentioned a JIT extension module he has in the works, although the PyPy developers expressed skepticism about the effectiveness of a JIT of this kind.

Paving a Path to Asynchronous Frameworks

Ending the day was a discussion of some level of integration of Twisted into the standard library. The main idea is to provide an alternative to asyncore that allows for an easier transition to Twisted or other asynchronous programming frameworks.

The process will be laid out in an upcoming PEP, which some suggested would serve a purpose similar to the WSGI reference but for asynchronous event loops. Along with the PEP author(s), the Twisted project and others will need to put in effort to ensure everyone is on the same page.

More Information

For more information, see CPython developer Nick Coghlan's rough notes and highlights.

Wednesday, March 23, 2011

Welcome to Python Insider!

Python Insider is the official blog of the Python core development team. It will provide a way for people who don't follow the mailing list to get an overview of topics discussed there, and especially to learn about changes in store for Python. We will be writing about Python-Dev activities such as the recently completed migration to Mercurial hosting, newly approved Python Enhancement Proposals (PEPs), API changes, and other major efforts going on in Python core development.

The blog will be an addition to, rather than a replacement for, the python-dev mailing list and individual developer blogs (see the links in the sidebar). It will provide a channel for talking about projects publicly after they are complete, or when they reach a stage where more volunteers are needed. While discussion on the blog is welcomed, we hope that people interested in the topics raised will join the python-dev mailing list and follow the discussion and development directly.

Think of this blog as your window into the evolution of Python.

Subscribing

There are several ways to follow and read Python Insider.

Help Wanted

Although we do have a team of dedicated writers working on posts for the blog, we are looking for someone with web design skills to work on the Blogger template. If you can help us give the blog a face-lift, contact Doug Hellmann (doug dot hellmann at gmail).