Converting Python2 to Python3

(Last updated 2/15/24)

tl;dr

I believe the most useful part of this article is the section on “Incompatibilities”, below. That section lists problems not handled by the “future” module, and for which I was unable to find documentation on the differences. You might want to skip directly to that section.

tl;dr 2 (2/25/24)

I have come to realize, through the experience of gradually migrating my collection of Python programs over several years, that almost all of my programs can be converted by simply handling the print conundrum as follows:

Add the following line immediately after the docstring at the beginning of the program:
from __future__ import print_function
Replace all print statements with the corresponding print() function.
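
A minimal sketch of what those two steps look like in practice (the report function and its messages are made up for illustration). With the import in place, the same file should run unchanged under python2.6 or later and under python3:

```python
"""Example module docstring."""
from __future__ import print_function  # must come right after the docstring

import sys


def report(name, count, stream=sys.stdout):
    """Print a status line.

    The python2 statement form was: print >>stream, "%s: %d" % (name, count)
    """
    print("%s: %d" % (name, count), file=stream)


if __name__ == "__main__":
    report("converted", 3)
    # python2's trailing-comma form (print "x",) becomes end=""
    print("no newline here", end="")
    print()
```

The file= and end= keywords cover the two python2 print idioms that have no direct equivalent once print is a function.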

Introduction

This is a technical article on my experience converting my personal code from version 2 to version 3 of the python programming language. In particular I want to document several differences I haven’t read about elsewhere. See the section on “Incompatibilities” for details.

I have put off doing this conversion for well over a year. I envisioned several difficulties:

I have multiple virtual machines, almost all of which have python2 as the primary python version. I want my code to be able to work on either a python2 or python3 system.
I have several python modules written in C.
I have two Macintosh systems that run on python2.

See the section “Handling the Expected Difficulties” for details of how I addressed these.

Methodology

Environment

Since I frequently use virtual machines, I decided to use a virtual machine as the environment in which to work on this conversion. I chose Ubuntu 21.04 as the guest operating system. I chose Ubuntu because it is my preferred distro for most Linux work and current versions seem to be somewhat “agnostic” as to python version. I used 21.04 simply because it is the current version. Yes, I know it is not an LTS (Long Term Support) version, but then this is not a long-term project (I hope).

To synchronize both python code and C source code, I use a git repository. For this project I have created a side branch in git in an attempt to reduce accidental premature introduction of python3 code into my general environment.

Most of the modules I have written contain unit tests. This has proven extremely useful to me in this project because I can use those tests to easily do a reasonably thorough job of testing the python3 version of those modules. (Most of the modules I wrote in C are “wrapped” by python modules and those latter contain unit tests also.)

References

I used the following websites as a guide:

https://docs.python.org/3/howto/pyporting.html

http://python-future.org/compatible_idioms.html

http://python-future.org/automatic_conversion.html

https://docs.python.org/3.0/whatsnew/3.0.html

This last reference is the (close to) complete list of changes, written by Guido van Rossum, the creator of Python, and, at the time the referenced article was written, the so-called “benevolent dictator for life” for Python.

Handling the Expected Difficulties

More Than One Environment

I looked at several options of how to cope with the multiplicity of real and virtual machines I use.

My original idea was to support both python2 and python3 on every machine. The difficulty here is primarily with python modules written in C. I will discuss that in the following section, but for now, suffice it to say it would make a more complicated operating environment.

The opposite approach would be to do a “big bang” where I would convert my code to only support python3: initially in the one virtual machine where I’m working on the conversion but then roll out those changes everywhere. This would mean that once I started rolling out the change, I would break every real or virtual environment until I converted that environment to python3.

I decided to take a middle ground. My current plan is to make the python code itself compatible with both python2 and python3. But, because of the modules written in C, I would only have one version of python on any one machine. This means that I would keep most existing environments as python2. For specific environments I believe it would not be that difficult to switch from python2 to python3 once the basic conversion of code is done.

Python Modules Written in C

I have a number of python modules written in C. This is largely because I already had the C code and didn’t want to re-code it in Python. (I believe I could do such a re-coding but it would be a lot of work, and I would end up supporting two implementations of the same functionality since I also have applications written in C that use the code in question.)

I knew, at a minimum, I would have to compile the code with python3 header files. I was worried that that wouldn’t be enough: that there would be other incompatibilities. So far my worries have not materialized. By simply recompiling my code with python3 headers it has all worked so far.

I could have separate makefiles for the two python versions, but for now I’m using a simpler approach: I put the python3 headers ahead of the python2 headers in the search list for include and library files. Thus, if I have the python3 headers installed on a given system, they will be preferred.

Macintosh Systems

Originally, MacOS systems shipped with python2. More recently, to deal with the issues of which python version to support, Apple has decided to punt and not install any version of python. This leaves the user to install whatever version they like.

I recently updated a Mac Mini to a newer version (not the M1 version; mine still uses an Intel CPU). I used a time-machine backup to copy my environment from the old Mac Mini to the new one. This, I’m happy to report, copied my python2 system to the new machine.

At various times I had thought about also installing a python3 system, but was afraid it would corrupt the python2 system. Then I discovered that python3 was already installed by not one but two apps. I found that by simply putting a python3 executable in my search path I could run python3 (but not my own python programs).

I have yet to decide whether and when to upgrade my macs to python3. This is one reason I want to keep my python code (and C source code) compatible with both versions.

My General Process

I started by downloading the python3 header files and compiling my modules that were written in C. To my surprise they all compiled the first time, and minimal testing confirmed they work as before. (Almost all of these modules have “wrapper” modules that I wrote in python to provide a more pythonic interface. I deferred detailed testing to the point where I converted the corresponding wrapper modules.)

I have installed the “future” package in the virtual machine where I am doing the conversion.

I wrote a shell script to step through the process of converting a given program or module, but after using that script for a while I decided it was too complicated and abandoned it. I have since rewritten it to execute the sequence of commands I normally use:

futurize
fix_convert.py (my program to undo or alter some of the things done by futurize)
pylint
python3 (to run the newly-converted program)

I knew I wouldn’t get any single python program to work until I had converted all of the modules that it used. I have lots of modules.

In fact I have a program called “modules” that lets me examine other modules (either those I wrote or those that came from other sources). I decided my “modules” program would be a good first candidate to convert.

The steps I am using for any given module or program are:

1. Run the code through the “futurize” program that is part of the “future” package (Reference 3 above):

futurize -w whatever.py

2. Run pylint on the “futurized” program.

3. Run the program, or if a module, run the unit test for that module.

This will produce one of three results:

I will get syntax errors from some unconverted module included by the program or module under test. In that case recurse to step 1 for the module that’s failing.
Some other error will occur. These will typically be either problems described on the referenced websites or new problems I had to address (and described in the section, “Incompatibilities” below). Debug and repeat step 2.
The program (or module’s unit test) works. If I had recursed this procedure then go back to the previous level of recursion, i.e. back to the program or module I was testing when I discovered the current module to be failing. If I’m not recursing then pick another program to work on.
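
The recursion described above amounts to a depth-first walk of the import graph: a module is converted only after everything it imports has been converted. A rough sketch, assuming a hand-built dictionary of dependencies (the module names and the conversion_order helper are hypothetical, purely for illustration):

```python
def conversion_order(dependencies, start):
    """Return modules in the order they should be converted.

    dependencies maps a module name to the list of local modules it
    imports.  Modules with no entry are treated as having no dependencies.
    """
    order = []
    seen = set()

    def visit(name):
        if name in seen:
            return          # already scheduled (also guards against cycles)
        seen.add(name)
        for dep in dependencies.get(name, []):
            visit(dep)      # "recurse to step 1" for each imported module
        order.append(name)  # all of its imports are done; convert this one

    visit(start)
    return order


# Hypothetical example: "modules" imports "cli" and "paths"; "cli" imports "paths"
deps = {"modules": ["cli", "paths"], "cli": ["paths"]}
print(conversion_order(deps, "modules"))  # ['paths', 'cli', 'modules']
```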

Incompatibilities

This section describes specific problems I found that I didn’t see documented anywhere else.

Buffer Flushing

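In python2, the sys.stdout and sys.stderr buffers follow the same rules used by the C run-time library; in particular, stderr is unbuffered. That is not the case in python3, where stderr is buffered (line-buffered in current versions), so a prompt that doesn’t end in a newline may not appear right away. I fixed this by adding an explicit flush where necessary: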

        if message is None:
            message = "Press any key to continue..."

        sys.stderr.write(message)
        sys.stderr.flush()

Signals Aborting Calls to sleep()

In python2, the receipt of a signal cuts short a call to time.sleep(). In python3 (version 3.5 and later, see PEP 475), the sleep resumes after the signal handler returns, unless the handler raises an exception. I was using a long time.sleep() call to really mean “sleep until a signal is received”.

Luckily, in my situation, I could just add a call to sys.exit() in the signal handler. That worked for me, but it is not a general solution.

def handler(signal, stack):
    """Signal handler"""
    info("Signal %s (%s) received" % (signal,
        process.Signals.numberToSymbol(signal)))

    if not loop:
        sys.exit(0)

Handling Comparison Operators in Class Instances

In python2, the presence of a __cmp__() method in a class will handle operations such as == != > >= etc. on instances of that class. That is not the case in python3. I added __eq__() and __lt__() methods to my classes. (I expected the interpreter to synthesize the other comparisons from these two.*) In my case, since I already had a __cmp__() method, I just called that from my new methods. So far as I can tell, __cmp__() is never called by the interpreter in python3. (The built-in cmp() function is gone in python3 as well, so code that calls it, as my __cmp__() does, needs a replacement such as (a > b) - (a < b).)

*Update: I ran across a case where “a <= b” produced an exception where a and b were instances of a class that defined __eq__() and __lt__(). This could have been synthesized as “a < b or a == b” but it wasn’t: python3 derives != from __eq__() and falls back on the other operand’s reflected method (so a > b can use b < a), but it does not build <= or >= out of < and ==. When I added a definition for __le__() the code worked.

    def __lt__(self, time2):
        return self.__cmp__(time2) < 0

    def __eq__(self, time2):
        return self.__cmp__(time2) == 0
    
    def __cmp__(self, time2):
        """
        Operator overload for < > <= >= == !=
        Compare two times.  First must be a 
        Time object;  second can be any of 
        the forms allowed by constructor 
        except hours, minutes
        """
        time2 = Time(time2)
        result = cmp(self.hh, time2.hh)
        if result == 0:
            result = cmp(self.mm, time2.mm)
            if result == 0:
                return cmp(self.ss, time2.ss)
        return result
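
One way to avoid writing each comparison method by hand is functools.total_ordering (available in python2.7 and python3), which fills in the missing rich comparisons from __eq__() and one ordering method. A minimal sketch, using a cut-down stand-in for the Time class above (it only accepts hours, minutes and seconds) and a local replacement for cmp():

```python
from functools import total_ordering


def cmp(a, b):
    """Replacement for python2's built-in cmp(): returns -1, 0 or 1."""
    return (a > b) - (a < b)


@total_ordering
class Time(object):
    """Simplified stand-in: hours, minutes, seconds only."""

    def __init__(self, hh, mm=0, ss=0):
        self.hh, self.mm, self.ss = hh, mm, ss

    def __cmp__(self, other):
        # Tuples compare element by element, like the nested ifs above
        return cmp((self.hh, self.mm, self.ss),
                   (other.hh, other.mm, other.ss))

    def __eq__(self, other):
        return self.__cmp__(other) == 0

    def __lt__(self, other):
        return self.__cmp__(other) < 0


print(Time(9, 30) <= Time(9, 30))   # True, via the synthesized __le__
print(Time(10) >= Time(9, 59, 59))  # True, via the synthesized __ge__
```

Note that a class which defines __eq__() without __hash__() is unhashable in python3, which matters if instances are used as dictionary keys or set members.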

exec Doesn’t Set Local Variables

exec("import %s as namespace" % module_name)

In python2 the variable, namespace, will be set after the above statement has been run, but not in python3. This was driving me crazy because I was getting an error saying “namespace” was not defined, yet when I displayed it in the debugger it was defined.

It turns out exec (like print) is a statement in python2 but a function in python3. Obviously a function cannot affect the local variables of the calling function.

The solution is described here: https://stackoverflow.com/questions/15086040/behavior-of-exec-function-in-python-2-and-python-3

        exec_locals = {}
        exec("import %s as namespace" % \
            module_name, exec_locals)
        namespace = exec_locals["namespace"]

Fortunately, this code works in python2 as well.
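
An alternative that avoids exec altogether, at least for the case of importing a module whose name is only known at run time, is importlib.import_module(), which exists in python2.7 as well as python3. A minimal sketch (load_namespace is a made-up helper name, and “json” is just a convenient standard module to demonstrate with):

```python
import importlib


def load_namespace(module_name):
    """Import a module by name and return the module object."""
    return importlib.import_module(module_name)


namespace = load_namespace("json")
print(namespace.dumps([1, 2]))  # [1, 2]
```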

signals are enums not ints

In python2, the function signal.signal() takes an int to represent the signal number. In python3 (3.5 and later), signal numbers are members of an enum, signal.Signals. This is fine if you are passing a specific signal, but what about code like this:

for i in range(1, 32):
    try:
        signal.signal(i, handler)

    except RuntimeError as error:
        warn(
            "Setting signal %d failed -- skipping (%s) " % (i, type(error))
        )

This signal.signal() call will fail in python3. The correction is to add:

for i in range(1, 32):
    try:
        #  In python3 the signal number is an enum
        if hasattr(signal, "Signals"):
            i = signal.Signals(i)
    
        signal.signal(i, handler)

    except (RuntimeError, OSError, ValueError) as error:
        warn(
            "Setting signal %d failed -- skipping (%s) " % (i, type(error))
        )

(Note that signal.signal() raises more types of exceptions in python3 than in python2, hence the longer except clause.)

list.sort no longer takes a cmp parameter

This no longer works:

        self.row_numbers.sort(cmp=self._sort)
...
    def _sort(self, x, y):
        """Sort the randomized list to least-used first"""
        row_x = self.db.getrow(x)
        row_y = self.db.getrow(y)

        return cmp(row_x.rand, row_y.rand)

Now you must use:

        self.row_numbers.sort(key=self._key)
...
    def _key(self, x):
        """used by sort of the randomized list to least-used first"""
        row = self.db.getrow(x)
        return row.rand
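
When a comparison function is too involved to rewrite as a key function, functools.cmp_to_key() (python2.7 and python3.2 or later) wraps the old function so that sort can still use it. A small sketch, with a made-up dictionary standing in for the database rows and a local replacement for cmp():

```python
from functools import cmp_to_key


def cmp(a, b):
    """Replacement for python2's built-in cmp(), which python3 removed."""
    return (a > b) - (a < b)


# Hypothetical stand-in for the database: row number -> "rand" value
rand_by_row = {1: 0.7, 2: 0.1, 3: 0.4}


def compare_rows(x, y):
    """Old-style comparison function: lowest rand value first."""
    return cmp(rand_by_row[x], rand_by_row[y])


row_numbers = [1, 2, 3]
row_numbers.sort(key=cmp_to_key(compare_rows))
print(row_numbers)  # [2, 3, 1]
```

A plain key function, as shown above, is usually simpler and faster; cmp_to_key() is mainly useful as a stopgap.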

Retesting

After I changed the main body of python code that I use, I ran my “daily processing” for a week in the python3 environment where I found some of the problems described above. After that I ran for another week in a python2 environment where I found the following additional problems. Finally, I ran yet another week on python3 to make sure my python2 changes didn’t break execution on python3.

next vs. __next__

An iterator for an object is created by calling the __iter__ method, which should return a reference to an object (possibly self) which will handle the iteration. In this object, there must be a method called next() in python2 and __next__() in python3. The futurize command will convert all appropriate definitions of next to become __next__. However, that doesn’t work in python2. So I simply went back and added the following to each such class, after the definition of __next__:

next = __next__

Perhaps a better solution would be to use the alternate definition of __iter__ which uses a generator and the “yield” statement. However, I haven’t tried that.
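
Both variants can be sketched as follows. The two Countdown classes are made-up examples: the first keeps the explicit __next__ method with its python2 alias, the second lets a generator supply the iterator so that neither method has to be written:

```python
class Countdown(object):
    """Iterator class that works under python2 and python3."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):          # python3 name
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    next = __next__              # python2 name


class GeneratorCountdown(object):
    """Same iteration, written with a generator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        for value in range(self.start, 0, -1):
            yield value


print(list(Countdown(3)))           # [3, 2, 1]
print(list(GeneratorCountdown(3)))  # [3, 2, 1]
```

One difference worth noting: the generator version starts a fresh iteration every time __iter__ is called, whereas the first class is used up after one pass.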

Which str class?

The futurize command provides a new definition of str which in python2 equates to a unicode string. That works in some cases, but not in others. This is done with the following line of code inserted into the code by futurize:

from builtins import str

This is specifically problematic when passing a str to C code which expects a C (i.e. ASCII) string, and the above import (in python2) converts strings to unicode.

My solution is to simply comment out the above import.

Perhaps it is also possible to save the old definition of str in code something like this:

old_str = str
from builtins import str

I haven’t tried that.

unicode is not defined in python3

In python2, str’s are byte strings (effectively ASCII) and there is a separate built-in class, unicode, for Unicode strings. That’s not necessary in python3 since all str’s are Unicode; they are normally encoded as UTF-8, which is identical to ASCII for plain ASCII characters. However, I have several places where I have:

if isinstance(x, unicode):

This doesn’t work in python3 so futurize changes these to:

if isinstance(x, str):

However, that doesn’t always work for me. I have solved this in one of two ways:

        try:
            if isinstance(arg1, unicode):
                arg1 = str(arg1)

        except NameError:
            pass

or:

try:
    unicode

except NameError:
    unicode = str

I thought pylint would complain about this, but it didn’t.

ttk is now part of tkinter

In python2 ttk is a separate module from Tkinter. In python3 it is a module within tkinter. This causes futurize to translate all references, including the import, from “ttk” to “tkinter.ttk”. That code doesn’t work on python2. So I did the following:

try:
    import ttk

except ImportError:
    import tkinter.ttk as ttk

I then manually converted all references to “tkinter.ttk” back to “ttk”.

Tools

In the process of doing this conversion, I wrote a couple of tools:

fix_futurize: a python program that corrects some of the changes from futurize that don’t really work in python2.
py_convert: a shell script that runs futurize, fix_futurize, pylint, and then runs the converted program.

I would be happy to upload these to github, but haven’t done so yet because they would need some clean-up to remove dependencies on other tools. Let me know in the comments if you’re interested.

Status

June 12, 2021: As of this writing, I am still in the process of cycling through various modules and programs, making them compatible with python3. After that I have to retest the code in python2. I will update this post as I gain additional experience.

June 27: I have finished converting “all” of my python code to python3. (I put “all” in quotes, because I’m sure I missed something.) I now intend to run my day-to-day processing for a week on my python3 virtual machine to further shake down the conversion.

July 17: I have finished testing my code on both python3 and python2 as described above. Starting tomorrow, I plan to go back to running my normal Macintosh environment, which for the time being is still on python2. I didn’t convert or test every single python program I have written, so I expect to continue to convert and/or debug programs with (hopefully) decreasing frequency for some time to come.

1 comment

glenn says:

February 4, 2024 at 9:56 am

I have been “commenting out” these three statements inserted by futurize:

from future import standard_library
standard_library.install_aliases()
from builtins import str

The last one causes python3 to complain about sending Unicode data to ASCII char* interfaces in modules written in C. My solution of reverting to the native interface seems to work even for non-ASCII UTF-8 characters.

The first two don’t work on systems where I haven’t installed the “future” module.

I have recently converted my Mac Mini to Python 3 which mostly works. (Some google interfaces don’t like that Python 3 is in a non-standard location. I have coded a workaround for this problem that doesn’t depend on Python 3.)

I am now down to one device still using Python 2: My Raspberry Pi 3. Converting that is low priority until something breaks.
