Wednesday, March 14, 2012

2012 Language Summit Report

This year's Language Summit took place on Wednesday March 7 in Santa Clara, CA before the start of PyCon 2012. As with previous years, in attendance were members of the various Python VMs, packagers from various Linux distributions, and members of several community projects.

The Namespace PEPs

The summit began with a discussion on PEPs 382 and 402, with Barry Warsaw leading much of the discussion. After some discussion, the decision was ultimately deferred with what appeared to be a want for parts of both PEPs.

As of Monday at the PyCon sprints, both PEPs have been rejected (see the Rejection Notice at the top of each PEP). Martin von Loewis posted to the import-sig list that a resolution has been found and Eric Smith will draft a new PEP on the ideas agreed upon there. Effectively, PEP 382 has been outright rejected, while portions of PEP 402 will be accepted.

importlib Status

Brett Cannon announced that there is a completed and available branch of CPython using importlib at http://hg.python.org/sandbox/bcannon/. See the bootstrap_importlib named branch.

Discussion began by outlining the only real existing issue, which lies in stat'ing of directories. There's a minor backwards incompatibility issue with time granularity. However, everyone agreed that it's so unlikely to be of issue that it's not a showstopper and the work can move forward. Additionally, there was an optimization made around the stat calls, which was arrived at independently by each of Brett, Antoine Pitrou, and P.J. Eby.

The topic of performance came up and Brett explained that the current pure-Python implementation is around 5% slower. Thomas Wouters exclaimed that 5% slower is actually really good, especially given some recent benchmark work he was doing showing that changing compilers sometimes shows a 5% difference in startup time. There was a shared feeling that 5% slower was not something to hold up integration of the code, which pushed discussion happily along.

Brett went on to explain what the bootstrapping actually looks like, even asserting that the implementation finds what could be the first real use of frozen modules! Guido's first response was, "you mean to tell me that after 20 years we finally found a use for freezing code?"

importlib._bootstrap is a frozen module containing the necessary builtins to operate, along with some re-implementations of a small number of functions. Some of the libraries included in the frozen module are warnings, _os (select code from posix), and marshal.

Another compatibility issue was brought up, but again, was decided to be an issue unworthy of halting the progress on this issue. There's a negative level count which is not supported in importlib, used in implicit relative imports, and it was agreed that it's acceptable to continue not supporting it.

The future will likely result in a strip down of import.c, as well as the exposure of numerous hooks as well as exposure of much of the importlib API.

As for merging with the default branch, it was pretty universally agreed upon that this should happen for 3.3 and it should happen soon in order to get mileage on the implementation throughout the alpha and beta cycles. Since this will be happening shortly, Brett is going to follow-up to python-dev with some cleanup details and look for reviews.

Release Schedule PEPs

Discussion on PEPs 407 and 413 followed the importlib talk. Like the namespace PEP discussion, several ideas were tossed around but the group didn't arrive at any conclusion on acceptability of the PEPs.

Immediately, the idea of splitting out the standard library to be on its own was resurrected, which could lend itself to both PEPs. Some questions remain, namely in where would the test suite live. Additionally, there may need to be some distinction between the tests which cover standard libraries versus the tests which cover language features.

The topic of versioning came up, with three distinctions needing to be made. We would seem to need a version of the language spec, a version of the implementation, and a version of the standard library.

Many commenters mentioned that these PEPs make things too complicated. Additionally, there was a question about whether there are enough users who care about either of these changes being made. Several of us stated that we could use the quicker releases, but with so many users being stuck on old versions for one reason or another, there was a wonder of who would take the releases.

Thomas Wouters mentioned a good point about the difficulty in lining up the so-called Python "LTS" releases with other Python consumers who do similar LTS-style releases. Ubuntu and their LTS schedule was a prime example, as well as the organizations who plan releases atop something like Ubuntu. Many of the Linux distribution packagers in attendance seemed to agree.

One thing that seemed to have broad agreement was that shortening the standard library turnaround time would be a good thing in terms of new contributors. Few people are interested in writing new features that might not be released for over a year -- it's just not fun. Even with bug fixes, sometimes the duration can be seen as too long, to the point where users may end up just fixing our problems from within their own code if possible.

Guido went on to make a comment about how we hope to avoid the mindset some have of "my package isn't accepted until it's in the standard library". The focus continues to be on projects being hosted on PyPI, being successful out in the wild, then vetted for acceptance in the standard library after maturity of the project and its APIs.

It was suggested that perhaps speeding up bug fix releases could be a good move, but we would need to check with release managers to ensure they're on board and willing to expend the effort to produce more frequent releases. As with the new feature releases, we need to be sure there's an audience to take the new bug fixes.

There was also some discussion about what have previously been called "sumo" releases. Given that some similar releases are already made by third-party vendors, the idea didn't seem to gain much traction.

Funding from the Python Software Foundation

PSF Chairman Steve Holden joined the group after lunch to mention that the foundation has resources available to assist development efforts, especially given the sponsorship success of this year's conference. While the foundation can't and won't dictate what should be coded up, they're open to proposals about the types of work to be funded.

Steve and Jesse Noller were adamant about the support not only being for all Python implementations, but also for third-party projects. What's needed to begin funding for a project is a concrete proposal on what will be accomplished. They stressed that the money is ready and waiting -- proposals are the way to unlock it.

Some ideas for how to use the funding came from Steve but also from around the room. One idea which started off the discussion was the idea of funding one-month sabbaticals. Then comes the issue of who might be available. Some suggested that freelance consultants in the development community might be the ones we should try to engage. Those with full-time employment may find it harder to acquire such a sabbatical, but the possibility is open to anyone.
Another thought was potential funding of someone to do spurts of full-time effort on the bug tracker, ideally someone already involved in the triage effort. This type of funding would hope to put an end to the times when it takes three days to fix a bug and three years for the patch to be accepted. Some thought this might be a nice idea in the short term, but it could be tough work and burn out the individual(s) involved. If anyone is up for it, they're encouraged to propose the idea to the foundation.

Along similar lines of tracker maintenance, Glyph Lefkowitz of the Twisted project had an idea to fund code reviews over code-writing efforts. Some thought this might be a good way to push forward the regex/re situation, given that the regex is very large and most felt that the only thing holding it back from some form of inclusion is an in-depth review. The cdecimal module was mentioned as another project that could use some review assistance.

The code review funding is also an idea to push forward some third-party project's ports to Python 3, specifically including Twisted, which the group felt was an effort which should receive some of this funding.

Along the way it was remarked that the core-mentors group has been a success in involving new contributors. Kudos to those involved with that list.

virtualenv Inclusion

In about two minutes, discussion on PEP 405 came and went. Carl Meyer mentioned that a reference implementation is available and is working pretty well. A look from the OSX maintainers would be beneficial, and both Ned Deily and Ronald Oussoren were in attendance. It seemed like one of the only things left in terms of the PEP was to find someone to make a declaration on it, and Thomas Wouters put his name out there if Nick Coghlan wasn't going to do t (update: Nick will be the PEP czar).

PEP 397 Inclusion

Without much of a Windows representation at the summit, discussion was fairlyquick, but it was pretty much agreed that PEP 397 was something we should accept. Brian Curtin spoke in favor of the PEP, as well as mentioning ongoing work on the Windows installer to optionally add the executable's directory to the Path.

After discussion outside of the summit, it was additionally agreed upon that the launcher should be installed via the 3.3 Windows installer, while it can also live as a standalone installer for those not taking 3.3. Additionally, there needs to be some work done on the PEP to remove much of the low-level detail that is coupled too tightly with the implementation, e.g., explaining of the location of the py.ini file.

speed.python.org

After generous hardware donations, the http://speed.python.org site has gone live and is currently running PyPy benchmarks. We need to make a decision on what benchmarks can be used as well as what benchmarks should be used when it comes to creating a Python 3 suite. As we get implementations on Python 3 we'll want to scale back 2.7 testing and push forward with 3.x.

The project suffers not from a technological problem but from a personnel problem, which was thought to be another area that funding could be used for. However, even if money is on the table, we still need to find someone with the time, the know-how, and the drive to complete the task. Ideally the starting task would be to get PyPy and CPython implementations running and comparing. After that, there are a number of infrastructure tasks in line.

PEP 411 Inclusion

PEP 411 proposes the inclusion of provisional packages into the standard library. The recently discussed regex and ipaddr modules were used as examples of libraries to include under this PEP. As for how this inclusion should be implemented and denoted to users was the major discussion point.

It was first suggested that documentation notes don't work -- we can't rely only on documentation to be the single notification point, especially for this type of code inclusion. Other thoughts were some type of flag on the library to specify its experimental status. Another thought was to emit a warning on import of a provisional library, but it's another thing that we'd likely want to silence by default in order to not affect user code in the hopes that developers are running their test suite with warnings enabled. However, as with other times we've gone down this path, we run the risk of developers just disabling warnings all together if they become annoying.

As has been suggested on python-dev, importing a provisional library from a special package, e.g., from __experimental__ import foo, was pretty strongly discouraged. If the library gains a consistent API, it penalizes users once it moves from provisional status to being officially accepted. Aliasing just exacerbates the problem.

The PEP boils down to being about process, and we need to be sure that libraries being included use the ability to change APIs very carefully. We also need to make people, especially the library author, aware of the need to be responsive to feedback and open to change as the code reaches a wider audience.

Looking back, Jesse Noller suggested multiprocessing would have been a good candidate for something like this PEP is suggesting. Around this time, it was suggested that Michael Foord's mock could gain some provisional inclusion within unittest, perhaps as unittest.mock. Instead, given mock's stable API and wide use among us, along with the need for a mocking library within our own test suite, it was agreed to just accept it directly into the standard library without any provisional status.

While on the topic of ``regex``'s role within the PEP came an idea from Thomas Wouters that ``regex`` be introduced into the standard library, bypassing any provisional status. From there, the previously known ``re`` module could be moved to the ``sre`` name, and there didn't appear to be any dissenting opinion there.

It should also be noted to users of provisional libraries that the library maintainers would need to exercise extreme care and be very conservative in changing of the APIs. The last thing we want to do is introduce a good library but as a moving target to its users.

Keyword Arguments on all builtin functions

As recently came up on the tracker, it was suggested that wider use of keyword arguments in our APIs would likely be a good thing. Gregory P. Smith suggested that we leave single-argument APIs alone, which was agreed upon. However, the overall change got some push back as "change for change's sake".

In order to support this, the PyArg_ParseTuple function would need to do more work, and it's already known to be somewhat slow. Alternatively, PyArg_Parse is much faster, and the tuple version could take a thing or two from it regardless of any wide scale change to builtins.

There does exist some potential break in compatibility when replacing a builtin function with a Python one, where positional-only arguments suddenly get a potentially conflicting name.

It was widely agreed upon that we should avoid any blanket rules and keep changes to places where it makes sense rather than make wholesale changes. We also need to be mindful of documentation and doc strings being kept to match the actual keyword argument names as well as keep them in sync.

OrderedDict was suggested as the container for keyword arguments, but Guido and Gregory were unsure of use-cases for that. Whether or not we use a traditional or ordered dictionary, it was suggested that we could possibly use a decorator to handle some of this. We could even go as far as exposing something like PyArg_ParseTuple as a Python-level function.

PEP 362, a proposal for a function signature object, would help here and with decorators in general. It seems that all that's left with that PEP is another look and someone to declare on it.

Porting to Python 3

We moved on to talk about Python 3 porting, starting with the current strategies and how they're working out. Single-codebase porting is working better than expected for most of us, although except handling is a bit messy when supporting versions like 2.4. Having a lot of options, from 3to2 to 2to3, then the single codebase through parallel trees, is a really good thing. However, it's hard for us to choose a strategy for projects, so we don't, which is why most documentation tries to lay numerous strategies out there.

It was suggested that documentation could stand to gain more examples of real-world porting examples, ideally pointing to changesets of these projects. The thought of our porting documentation gaining a cookbook-style approach seemed to get some agreement as a good idea.

Hash Randomization

Release candidates are available to all branches receiving security fixes, and in the meantime, David Malcolm found and reported a security issue in the upstream expat project. However, since the upstream fix includes many other fixes at the same time, we should pick up only the security fix at this time and leave the bug fixes for the next bug fix release of the relevant branches.

New dict Implementation

Since the implementation makes sense and the tests pass, it was quickly agreed upon that Mark Shannon's PEP 412 should be accepted. As with other changes agreed upon in this summit, we'd like for the change to be pushed soon in order to get mileage on it throughout the alpha and beta cycles. With this acceptance comes commit access for Mark so that he can maintain the code.

It was also remarked that the only user-visible difference that this implementation brings is a difference in sort ordering, but the recent hash randomization work makes this a moot point.

New pickle Protocol

PEP 3154, mentioned by Lukasz Langa, specifies a new pickle protocol -- version 4. Lukasz mentioned exception pickling in multiprocessing as being an issue, and Antoine solved it with this PEP. While qualified names provide some help, it was agreed upon that this PEP needs more attention.


If you have any questions or comments, please post to python-dev.

Thanks to Eric Snow and Senthil Kumaran for contributing to this post.