
Python 3: a dearth of modules, a need for CPAN

News, Technical · Thursday December 24, 2009 @ 06:17 EST

It was not without reason that the Freenode #Python channel tried to dissuade me from using Python 3: they're correct that it has very few modules. Fortunately, the Python core is quite strong, and many modules can easily be converted using the 2to3 utility. One such module was pyinotify, which I converted for my monitor utility; it required only a few non-automated tweaks, generally in the bytes vs. str area.
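For illustration only (this isn't pyinotify's actual code), the typical manual fix after 2to3 is that data read from the OS is now bytes, so str literals and comparisons need adjusting:

    # Sketch of a typical post-2to3 tweak: data from the kernel arrives as
    # bytes in Python 3, and bytes never compare equal to str, so literals
    # need a b prefix or an explicit decode.
    raw_name = b"new_file.txt\x00\x00"   # e.g., a NUL-padded name from an event buffer

    # Under Python 2, rstrip("\x00") and endswith(".txt") worked on a str;
    # in Python 3 the same operations need bytes literals:
    name = raw_name.rstrip(b"\x00")
    if name.endswith(b".txt"):
        print(name.decode("utf-8"))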

I'm currently looking at extending XBMC in a few ways. XBMC has an embedded Python interpreter, mainly because the Xbox could only run one process, at least back then; I'd like to generalize that to external scripts, using D-Bus for IPC, and also to allow scrapers to be arbitrary scripts instead of the rather baroque custom XML regular-expression parsers currently required.
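As a rough idea of the call pattern I have in mind, here's a sketch using the dbus-python bindings against the standard session bus; whether those bindings are usable from Python 3, and what names an XBMC helper service would actually expose, are open questions:

    # Minimal D-Bus IPC sketch using dbus-python. The org.freedesktop.DBus
    # service is used only to show the proxy/interface/call pattern; a real
    # XBMC-side service would expose its own (hypothetical) bus name and
    # object path.
    import dbus

    bus = dbus.SessionBus()
    proxy = bus.get_object('org.freedesktop.DBus', '/org/freedesktop/DBus')
    iface = dbus.Interface(proxy, dbus_interface='org.freedesktop.DBus')

    for name in iface.ListNames():   # list everything currently on the bus
        print(name)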

Another project is MP3 renaming; stagger is a good library (there's one other ID3 library that works in Python 3, but it isn't as easy to use). Stagger does have a lousy web site (I have no idea whether it's still being worked on) and required reading the code to use. The detect_tag function doesn't detect ID3v1 (yes, I have some really old MP3 files, mostly old Christian ska bands, from Ben Sloetjes…). I ended up checking for ID3v2, catching NoTagError, then checking for ID3v1 using stagger.Tag1.read, catching NoTagError again, and only then creating a new tag. Although stagger.default_tag is a Tag24 (ID3v2.4) object, it seems most applications can't deal with ID3v2.4 (I found this out when the id3 and id3v2 binaries couldn't see a tag after my script had tagged a file), so I used ID3v2.3 (Tag23) instead. I used eyeD3's --to-v2.3 switch on the top-level music directory to recursively convert everything. (The eyeD3 utility is written in Python; I'd use the eyeD3 library, except it doesn't work in Python 3 and it wasn't worth the effort to convert it when stagger already exists.)
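In code, the fallback chain looks roughly like this; it's a sketch based on my reading of stagger, so take the exact names and signatures (read_tag, Tag1.read, NoTagError) as assumptions:

    # Sketch of the ID3v2 -> ID3v1 -> new-tag fallback described above;
    # the stagger calls reflect my understanding of its API.
    import stagger

    def load_or_create_tag(path):
        try:
            return stagger.read_tag(path)      # any ID3v2 tag
        except stagger.NoTagError:
            pass
        try:
            return stagger.Tag1.read(path)     # really old files: ID3v1
        except stagger.NoTagError:
            return stagger.Tag23()             # no tag at all; start an ID3v2.3 tag

    # Usage sketch (attribute and method names assumed):
    tag = load_or_create_tag("song.mp3")
    tag.title = "Renamed Title"
    tag.write("song.mp3")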

In general, the Python Package Index (PyPI) lacks a lot compared to CPAN, and apparently I'm not the first to think so (a misnamed page with some arguments; Guido's comment that people want CPAN). Presumably the CPAN people could help: anyone who would mirror Perl would probably mirror Python for the same reasons (advancing open source, contributing to a good cause, rational self-interest, whatever).

There is a little more talk (and a lot less action) at the Python catalog and distutils SIGs. The distutils SIG has a wiki that appears to be not dead (it has a roadmap for current and future Python versions, at the time of writing 2.7 and 3.2). I joined the mailing list and a current discussion about what CPAN has that PyPI and Python's distutils lack. One thing I really miss is online documentation for modules; some modules aren't even stored on PyPI (making mirroring hard), and I don't think there's even a consistent indexing system, although some packages have XML manifest files of some sort linked. I'm not sure how to resolve this: CPAN is massive, with mirrors everywhere, with PAUSE as its access/control system, and it keeps multiple versions, etc. Steffen Mueller started a thread, Python people want CPAN and how the latter came about, which has been active although not necessarily productive (there's a lot of denial).

Here's my mail, responding to a reply to SM's message, with some concrete things that CPAN has that Python's distribution network needs:

On Mon, Dec 21, 2009 at 11:13:31AM +0100, Lennart Regebro wrote:
What nobody still fails to explain in this discussion is what CPAN "is" and Why Python doesn't already have it. There is just a lot of "CPAN is great!" And "Python needs CPAN" but noone can come up with one single thing that CPAN does that Python doens't have, or explain why CPAN is so great, where PyPI isn't. And unless somebody can do that, this discussion ain't going nowhere. :)

Here are a few things I really like about CPAN:

1. Module documentation - the perldoc is extracted, formatted as HTML, and available for browsing (e.g., search.cpan.org; perhaps this is part of the "sugar" described by Steffen, but it tastes delicious). The same could presumably be done with pydoc; a minimal sketch follows after this list. (Some modules have some documentation on PyPI, but it's not the pydoc, just a summary.) (The local pydoc server also doesn't help me with modules I don't have installed yet, and installing every module matching, say, "ID3", and then reading the pydoc is a significant hurdle.)

Slightly tangentially, the Python community doesn't seem to have instilled the same documentation culture as the Perl folk. The standard perldoc sections (DESCRIPTION, SYNOPSIS, etc.) are enormously helpful, whereas pydoc seems limited to very dry docstrings, and tends to include unneeded extras (e.g., when 'pydoc dbus' shows the dbus.Array class, it also feels a need to list all the methods of the __builtin__.list class from which it inherits). It has become more the rule than the exception to have to examine the module source to figure out how to use a Python module.

2. A conceptual link between different versions of the same module. On CPAN (search.cpan.org), there's a page for module X with a dropdown of the known versions and their release dates, and those versions can also be downloaded. PyPI appears to treat multiple versions of the same package as completely different entries. A link to an extracted changelog is also convenient.

3. Index by module name (as well as package name). Further, it would help predictability to make the two match when possible (as Perl module X::Y version V will usually be X-Y-V.tar.gz), or at least obviate the need to display package names. Frequently I don't care about the cutesy package name, just what it implements.

4. Namespaces and some way of reserving them. There are likely many modules named postgresql on PyPI, but there's only one DBD::Pg (although there are other PostgreSQL modules that implement the Perl DBI driver interface). This also helps with specifying dependencies.
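To make point 1 a little more concrete, here's roughly the extraction step I imagine an index doing with the standard library's pydoc module; it's a sketch, and 'stagger' just stands in for whatever installed package is being indexed:

    # Render a module's pydoc as browsable HTML, the way an index could
    # publish documentation without requiring the reader to install the
    # package first.
    import pydoc

    pydoc.writedoc("stagger")   # writes stagger.html to the current directory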