Converting Python2 to Python3

(Last updated 2/15/24)

tl;dr

I believe the most useful part of this article is the section on “Incompatibilities”, below. That section lists problems not handled by the “future” module, and for which I was unable to find documentation on the differences. You might want to skip directly to that section.

tl;dr 2 (2/25/24)

I have come to realize, through the experience of gradually migrating my collection of Python programs over several years, that almost all of my programs can be converted by simply handling the print conundrum as follows:

  1. Add the following line immediately after the docstring at the beginning of the program:
    from __future__ import print_function
  2. Replace all print statements with the corresponding print() function.
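
A minimal sketch of what the two steps look like in practice. The function name greet and its arguments are hypothetical; the comments show the python2 statement each call replaces.

```python
"""Demo of print() calls that run unchanged on python2 and python3."""
from __future__ import print_function

import sys


def greet(name, stream=sys.stdout):
    """Write a short greeting to the given stream."""
    # python2 statement: print >>stream, "Hello,", name
    print("Hello,", name, file=stream)

    # python2 statement with a trailing comma: print >>stream, "no newline",
    print("no newline", end="", file=stream)

    # python2 statement: print >>stream
    print(file=stream)


greet("world")
```

The __future__ import has no effect on python3, so the same file runs on either version.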

Introduction

This is a technical article on my experience converting my personal code from version 2 to version 3 of the python programming language. In particular I want to document several differences I haven’t read about elsewhere. See the section on “Incompatibilities” for details.

I have put off doing this conversion for well over a year. I envisioned several difficulties:

  • I have multiple virtual machines, almost all of which have python2 as the primary python version. I want my code to be able to work on either a python2 or python3 system.
  • I have several python modules written in C.
  • I have two Macintosh systems that run on python2.

See the section on “Handling the Expected Difficulties” for details of how I addressed these.

Methodology

Environment

Since I frequently use virtual machines, I decided to use a virtual machine as the environment in which to work on this conversion. I chose Ubuntu 21.04 as the guest operating system. I chose Ubuntu because it is my preferred distro for most Linux work and current versions seem to be somewhat “agnostic” as to python version. I used 21.04 simply because it is the current version. Yes, I know it is not an LTS (Long Term Support) version, but then this is not a long-term project (I hope).

To synchronize both python code and C source code, I use a git repository. For this project I have created a side branch in git in an attempt to reduce accidental premature introduction of python3 code into my general environment.

Most of the modules I have written contain unit tests. This has proven extremely useful to me in this project because I can use those tests to easily do a reasonably thorough job of testing the python3 version of those modules. (Most of the modules I wrote in C are “wrapped” by python modules and those latter contain unit tests also.)

References

I used the following websites as a guide:

https://docs.python.org/3/howto/pyporting.html

http://python-future.org/compatible_idioms.html

http://python-future.org/automatic_conversion.html

https://docs.python.org/3.0/whatsnew/3.0.html

This last reference is the (close to) complete list of changes, written by Guido van Rossum, the creator of Python, and, at the time the referenced article was written, the so-called “benevolent dictator for life” for Python.

Handling the Expected Difficulties

More Than One Environment

I looked at several options of how to cope with the multiplicity of real and virtual machines I use.

My original idea was to support both python2 and python3 on every machine. The difficulty here is primarily with python modules written in C. I will discuss that in the following section, but for now, suffice it to say it would make a more complicated operating environment.

The opposite approach would be to do a “big bang” where I would convert my code to only support python3: initially in the one virtual machine where I’m working on the conversion but then roll out those changes everywhere. This would mean that once I started rolling out the change, I would break every real or virtual environment until I converted that environment to python3.

I decided to take a middle ground. My current plan is to make the python code itself compatible with both python2 and python3. But, because of the modules written in C, I would only have one version of python on any one machine. This means that I would keep most existing environments as python2. For specific environments I believe it would not be that difficult to switch from python2 to python3 once the basic conversion of code is done.

Python Modules Written in C

I have a number of python modules written in C. This is largely because I already had the C code and didn’t want to re-code it in Python. (I believe I could do such a re-coding but it would be a lot of work, and I would end up supporting two implementations of the same functionality since I also have applications written in C that use the code in question.)

I knew, as a minimum, I would have to compile the code with python3 header files. I was worried that that wouldn’t be enough: that there would be other incompatibilities. So far my worries have not materialized. By simply recompiling my code with python3 headers it has all worked so far.

I could have separate makefiles for the two python versions, but for now I’m using a simpler approach: I put the python3 headers ahead of the python2 headers in the search list for include and library files. Thus, if I have the python3 headers installed on a given system, they will be preferred.

Macintosh Systems

Originally, MacOS systems shipped with python2. More recently, to deal with the issues of which python version to support, Apple has decided to punt and not install any version of python. This leaves the user to install whatever version they like.

I recently updated a Mac Mini to a newer version (not the M1 version–mine still uses an Intel CPU). I used a time-machine backup to copy my environment from the old Mac Mini to the new one. This, I’m happy to report, copied my python2 system to the new machine.

At various times I had thought about also installing a python3 system, but was afraid it would corrupt the python2 system. Then I discovered that python3 was already installed by not one but two apps. I found that by simply putting a python3 executable in my search path I could run python3 (but not my own python programs).

I have yet to decide whether and when to upgrade my macs to python3. This is one reason I want to keep my python code (and C source code) compatible with both versions.

My General Process

I started by downloading the python3 header files and compiling my modules that were written in C. To my surprise they all compiled the first time, and minimal testing confirmed they work as before. (Almost all of these modules have “wrapper” modules that I wrote in python to provide a more pythonic interface. I deferred detailed testing to the point where I converted the corresponding wrapper modules.)

I have installed the “future” support in my virtual machine wherein I am doing the conversion.

I wrote a shell script to step through the process of converting a given program or module, but after using that script for a while I decided it was too complicated and abandoned it. I have since rewritten it to execute the sequence of commands I normally use:

  • futurize
  • fix_convert.py (my program to undo or alter some of the things done by futurize)
  • pylint
  • python3 (to run the newly-converted program)

I knew I wouldn’t get any single python program to work until I had converted all of the modules that it used. I have lots of modules.

In fact I have a program called “modules” that lets me examine other modules (either those I wrote or those that came from other sources). I decided my “modules” program would be a good first candidate to convert.

The steps I am using for any given module or program are:

1. Run the code through the “futurize” program that is part of the “future” package (Reference 3 above):

futurize -w whatever.py

2. Run pylint on the “futurized” program.

3. Run the program, or if a module, run the unit test for that module.

This will produce one of three results:

  • I will get syntax errors from some unconverted module included by the program or module under test. In that case recurse to step 1 for the module that’s failing.
  • Some other error will occur. These will typically be either problems described on the referenced websites or new problems I had to address (and described in the section, “Incompatibilities” below). Debug and repeat step 2.
  • The program (or module’s unit test) works. If I had recursed this procedure then go back to the previous level of recursion, i.e. back to the program or module I was testing when I discovered the current module to be failing. If I’m not recursing then pick another program to work on.
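
The three steps above can be sketched as a small driver. This is only an illustration: the function names and the injectable run parameter are hypothetical, and the sketch stops at the first failing step rather than recursing into the failing module. Note that pylint exits with a non-zero status whenever it reports any message, so a stop at that step means "go and look", not necessarily "broken".

```python
import subprocess


def conversion_steps(path):
    """Return the commands to run, in order, for one program or module."""
    return [
        ["futurize", "-w", path],  # step 1: rewrite the file in place
        ["pylint", path],          # step 2: check the futurized code
        ["python3", path],         # step 3: run the program or its unit test
    ]


def convert(path, run=subprocess.call):
    """Run each step in turn.

    Returns the first command that exits with a non-zero status,
    or None if every step succeeds.
    """
    for command in conversion_steps(path):
        if run(command) != 0:
            return command
    return None
```

Calling convert("whatever.py") returns None on success, or the command that needs attention.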

Incompatibilities

This section describes specific problems I found that I didn’t see documented anywhere else.

Buffer Flushing

In python2 sys.stdout and sys.stderr buffers follow the same rules used by the C run-time library. That is not the case in python3. I fixed this by adding an explicit flush where necessary:

        if message is None:
            message = "Press any key to continue..."

        sys.stderr.write(message)
        sys.stderr.flush()
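
One way to avoid forgetting the flush is to keep the write and the flush together in a small helper. This is a sketch; the name prompt and the stream parameter are illustrative.

```python
import sys


def prompt(message=None, stream=None):
    """Write a prompt and flush it so it appears immediately."""
    if message is None:
        message = "Press any key to continue..."
    if stream is None:
        stream = sys.stderr

    stream.write(message)
    # Without the explicit flush the text may stay in the buffer on python3.
    stream.flush()
```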

Signals Aborting Calls to sleep()

In python2, the receipt of a signal aborts a call to time.sleep(). In python3 the sleep is restarted after the call to the signal handler. I was using a long time.sleep() call to really mean “sleep until a signal is received”.

Luckily, in my situation, I just added a call to sys.exit() in the signal handler, which worked for me, but is not a general solution.

def handler(signal, stack):
    """Signal handler"""
    info("Signal %s (%s) received" % (signal,
        process.Signals.numberToSymbol(signal)))

    if not loop:
        sys.exit(0)
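
A more general alternative, sketched here with hypothetical names, is to have the handler record the signal and to sleep in short slices, checking between slices. This behaves the same on python2 and python3 because it does not depend on the sleep being interrupted.

```python
import signal
import time

# Signal numbers recorded by the handler, oldest first.
_received = []


def handler(signum, frame):
    """Signal handler: just remember which signal arrived."""
    _received.append(signum)


def install(signums):
    """Register handler() for each of the given signals (main thread only)."""
    for signum in signums:
        signal.signal(signum, handler)


def sleep_until_signal(poll_interval=0.2, max_wait=None):
    """Sleep in short slices until a signal has been recorded.

    Returns the signal number, or None if max_wait seconds pass first.
    """
    waited = 0.0
    while not _received:
        if max_wait is not None and waited >= max_wait:
            return None
        time.sleep(poll_interval)
        waited += poll_interval
    return _received.pop(0)
```

A typical use would be install([signal.SIGTERM, signal.SIGINT]) followed by sleep_until_signal(). On Unix, signal.pause() is another option when no timeout is needed.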
        

Handling Comparison Operators in Class Instances

In python2, the presence of a __cmp__() method in a class will handle operations such as == != > >= etc. on instances of that class. That is not the case in python3. I added definitions of __eq__() and __lt__() to my classes. (Python 3 derives != from __eq__() and can satisfy “a > b” by trying the reflected “b < a”, but it does not synthesize <= or >= from these two.*) In my case, since I already had a __cmp__() method, I just called that from my new methods. So far as I can tell, __cmp__() is never called by the interpreter in python3. (The cmp() built-in was also removed in python3, so the calls to cmp() in the code below need a substitute there.)

*Update: I ran across a case where “a <= b” produced an exception where a and b were members of a class that defined __eq__() and __lt__(). This could have been synthesized as “a < b or a == b” but it wasn’t. When I added a definition for __le__() the code worked.

    def __lt__(self, time2):
        return self.__cmp__(time2) < 0

    def __eq__(self, time2):
        return self.__cmp__(time2) == 0
    
    def __cmp__(self, time2):
        """
        Operator overload for < > <= >= == !=
        Compare two times.  First must be a 
        Time object;  second can be any of 
        the forms allowed by constructor 
        except hours, minutes
        """
        time2 = Time(time2)
        result = cmp(self.hh, time2.hh)
        if result == 0:
            result = cmp(self.mm, time2.mm)
            if result == 0:
                return cmp(self.ss, time2.ss)
        return result
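
A sketch of one way to get all six comparisons without writing each by hand: functools.total_ordering (available in python 2.7 and 3.2+) fills in the missing methods from __eq__() and __lt__(). The Time class below is a simplified stand-in, not the real one, and compare() is a hypothetical replacement for the removed cmp() built-in.

```python
import functools


def compare(a, b):
    """Stand-in for python2's cmp(): returns -1, 0 or 1."""
    return (a > b) - (a < b)


@functools.total_ordering
class Time(object):
    """Simplified time of day, used only to illustrate the comparisons."""

    def __init__(self, hh, mm=0, ss=0):
        self.hh, self.mm, self.ss = hh, mm, ss

    def _key(self):
        return (self.hh, self.mm, self.ss)

    def __cmp__(self, other):
        # Still used directly by python2; called explicitly below for python3.
        return compare(self._key(), other._key())

    def __eq__(self, other):
        return self.__cmp__(other) == 0

    def __lt__(self, other):
        return self.__cmp__(other) < 0

    def __hash__(self):
        # python3 drops the default __hash__ when __eq__ is defined.
        return hash(self._key())


print(Time(1, 2, 3) <= Time(1, 2, 3))  # → True
```

With the decorator in place, <=, > and >= work without separate __le__(), __gt__() and __ge__() definitions.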

exec Doesn’t Set Local Variables

exec("import %s as namespace" % module_name)

In python2 the variable, namespace, will be set after the above statement has been run; not on python3. This was driving me crazy because I was getting an error saying “namespace” was not defined, yet when I displayed it in the debugger it was defined.

It turns out exec (like print) is a statement in python2 but a function in python3. Obviously a function cannot affect the local variables of the calling function.

The solution is described here: https://stackoverflow.com/questions/15086040/behavior-of-exec-function-in-python-2-and-python-3

        exec_locals = {}
        exec("import %s as namespace" % \
            module_name, exec_locals)
        namespace = exec_locals["namespace"]

Fortunately, this code works in python2 as well.
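
If the only goal is to import a module whose name is held in a string, a sketch of an alternative that avoids exec altogether is importlib.import_module(), which exists in python 2.7 and python3. The helper name load_module is hypothetical.

```python
import importlib


def load_module(module_name):
    """Import a module by name and return it, on python2 or python3."""
    return importlib.import_module(module_name)


namespace = load_module("json")
print(namespace.dumps([1, 2]))  # → [1, 2]
```

Because the module is returned as an ordinary value, there is no question of which local variables get set.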

signals are enums not ints

In python2, the function, signal.signal(), takes an int to represent the signal number. In python3, it takes an enum. This is fine if you are passing a specific signal, but what about code like this:

for i in range(1, 32):
    try:
        signal.signal(i, handler)

    except RuntimeError as error:
            warn(
                "Setting signal %d failed -- skipping (%s) " % (i, type(error))
            )

This signal.signal() call will fail in python3. The correction is to add:

for i in range(1, 32):
    try:
        #  In python3 the signal number is an enum
        if hasattr(signal, "Signals"):
            i = signal.Signals(i)
    
        signal.signal(i, handler)

    except (RuntimeError, OSError, ValueError) as error:
            warn(
                "Setting signal %d failed -- skipping (%s) " % (i, type(error))
            )

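(Note that signal.signal() throws more types of exceptions in python3 than in python2. Python 3 still accepts a plain int for the signal number, so the wider except clause may be the part of this change that matters: signals that cannot be caught, such as SIGKILL, raise OSError in python3.)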

list.sort no longer takes a cmp parameter

This no longer works:

        self.row_numbers.sort(cmp=self._sort)
...
    def _sort(self, x, y):
        """Sort the randomized list to least-used first"""
        row_x = self.db.getrow(x)
        row_y = self.db.getrow(y)

        return cmp(row_x.rand, row_y.rand)

Now you must use:

        self.row_numbers.sort(key=self._key)
...
    def _key(self, x):
        """used by sort of the randomized list to least-used first"""
        row = self.db.getrow(x)
        return row.rand
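
When a comparison function is hard to rewrite as a key function, functools.cmp_to_key() (python 2.7 and 3.2+) can wrap it instead. The following is a sketch with made-up data; compare_rows and the row dictionaries are hypothetical.

```python
import functools


def compare_rows(x, y):
    """Old-style comparison: negative, zero or positive."""
    return (x["rand"] > y["rand"]) - (x["rand"] < y["rand"])


rows = [
    {"id": 1, "rand": 0.7},
    {"id": 2, "rand": 0.1},
    {"id": 3, "rand": 0.4},
]

# cmp_to_key() turns the two-argument comparison into a key function.
rows.sort(key=functools.cmp_to_key(compare_rows))
print([row["id"] for row in rows])  # → [2, 3, 1]
```

A plain key function, as above, is usually faster, so this is best kept for comparisons that really need both items.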

Other Problems

This section lists a few other problems that are documented to some extent and which I was already aware of, but were still sources of frequent problems.

file vs. open

Most courses, web pages, etc. that discuss file I/O refer to the open() built-in function as the way to open files. In python2 that function will return a file object. So I decided that using file() instead of open() was more object oriented. Maybe, but in python3 there is no file() built-in and open() creates some kind of io object. I wish futurize would catch this, but I guess that would erroneously convert calls to some user-specified file() callable.
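
A minimal sketch of a portable replacement for file(): io.open() exists in both python2 (2.6 and later) and python3, where it is the same function as the built-in open(). The helper names are hypothetical, and the UTF-8 default is an assumption.

```python
import io


def write_text(path, text, encoding="utf-8"):
    """Write a text (unicode) string to a file."""
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)


def read_text(path, encoding="utf-8"):
    """Read a whole file and return it as a text (unicode) string."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()
```

On python2, io.open() in text mode expects unicode strings, so literals passed to write_text() need the u"" prefix there.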

Types of strings

In python2 there are two types of common strings:

  • str – containing ASCII-encoded characters, or alternately any other type of bytes of data. Each individual element in the sequence is one byte of memory, which is also of type str.
  • unicode – containing Unicode characters, usually decoded from UTF-8 or another encoding. Each individual element in the sequence is one character, which is also of type unicode and may occupy one or more bytes when encoded.

In python3 there are also two types of common strings, but they do not map well to those of python2:

  • str – containing Unicode characters, which become UTF-8 (or another encoding) only when converted to bytes. (ASCII can be considered a subset of UTF-8, but it might be dangerous to make that assumption.) Each individual element in the sequence is one character, which is also of type str and may occupy one or more bytes when encoded.
  • bytes – containing any type of bytes of data. Each individual element in the sequence is an int.

These distinctions between strings in python2 and python3 are well documented, but they have still caused me a great deal of grief as I tried to fix my logic so that it works in both python2 and python3.
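
One common way to keep such logic manageable is to convert at the edges of the program with a pair of small helpers. This is a sketch with hypothetical names (text_type, to_text, to_bytes), assuming UTF-8 as the encoding.

```python
import sys

if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # only evaluated on python2


def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding it if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it if it is a text string."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


word = to_text(b"caf\xc3\xa9")
print(len(word), len(to_bytes(word)))  # → 4 5
```

The two lengths differ because the accented character is one character of text but two bytes in UTF-8.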

pylint

I have always considered pylint my friend. But it has become a pain for several reasons:

  • It sometimes flags constructs created by futurize. Some of these are not correctable, such as complaining about the order of imports when the first import defines the latter.
  • It gives spurious messages about undefined function arguments. I have yet to find the cause or cure for this problem.

For example, the following line of code:

import pyscrn
...
    pyscrn.set_color(pyscrn.LSCRN_YELLOW)

generates this error message from pylint:

Netconfig.py:164:4: E1120: No value for argument ‘background’ in function call (no-value-for-parameter)

Here is the relevant part of pyscrn.py, my wrapper module:

import _pyscrn

#  functions and other variables in _pyscrn are all exported by this
#  module as well.
# pylint: disable=wildcard-import
# pylint: disable=unused-wildcard-import
from _pyscrn import *
...
def set_color(fg=LSCRN_NOCHANGE, bg=LSCRN_NOCHANGE):
    """
    This function sets screen colors.

    The argument, fg, sets the text or foreground color
    The argument, bg, sets the background color.

    The values for both fg and bg should be one of the data values LSCRN_*.

    If either value is omitted, then LSCRN_NOCHANGE is assumed, and that color
    is not altered from its previous value.
    """
    _pyscrn.set_color(fg, bg)

Here is the definition of set_color in _pyscrn.py (code generated by swig):

def set_color(foreground, background):
    return __pyscrn.set_color(foreground, background)

Obviously pylint is picking up the definition from _pyscrn.py instead of pyscrn.py.

(I’ll give a clarifying example for each of these the next time I run across one.)

Yes, I know, I should probably file a bug. But my experience with filing open-source bugs is one can spend a lot of time creating a test case to reproduce the problem, and then no one ever bothers to look at the bug. That is obviously true with pylint as well: I’ve looked in their bug tracking system.

  • pylint complains of “useless inheritance from object”. This may be useless in python3 but it has a specific meaning in python2.
  • Things that are flagged in python3’s version of pylint are not defined in the python2 version; the end result is one way or the other you end up with problems you can’t tell pylint to ignore. I hate ignoring errors/warnings etc.

Retesting

After I changed the main body of python code that I use, I ran my “daily processing” for a week in the python3 environment where I found some of the problems described above. After that I ran for another week in a python2 environment where I found the following additional problems. Finally, I ran yet another week on python3 to make sure my python2 changes didn’t break execution on python3.

next vs. __next__

An iterator for an object is created by calling the __iter__ method, which should return a reference to an object (possibly self) which will handle the iteration. In this object, there must be a method called next() in python2 and __next__() in python3. The futurize command will convert all appropriate references for next to become __next__. However that doesn’t work in python2. So I simply went back and added the following:

next = __next__

Perhaps a better solution would be to use the alternate definition of __iter__ which uses a generator and the “yield” statement. However, I haven’t tried that.
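
Both approaches can be sketched side by side. The class names are hypothetical; the first uses the alias described above, and the second writes __iter__ as a generator, so neither next() nor __next__() has to be defined at all.

```python
class Countdown(object):
    """Iterator with an explicit __next__ plus the python2 alias."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    next = __next__  # python2 looks for next(), python3 for __next__()


class CountdownGen(object):
    """Same iteration written as a generator-based __iter__."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        for value in range(self.start, 0, -1):
            yield value


print(list(Countdown(3)), list(CountdownGen(3)))  # → [3, 2, 1] [3, 2, 1]
```

The generator version also allows the same object to be iterated more than once, since each call to __iter__ starts a fresh generator.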

Which str class?

The futurize command provides a new definition of str which in python2 equates to a unicode string. That works in some cases, but not in others. This is done with the following line of code inserted into the code by futurize:

from builtins import str

This is specifically problematic when passing an str to C code which expects a C (i.e. ASCII) string, and the above import (in python2) converts strings to unicode.

My solution is to simply comment out the above import.

Perhaps it is also possible to save the old definition of str in code something like this:

old_str = str
from builtins import str

I haven’t tried that.

unicode is not defined in python3

In python2, str’s are ASCII and there is a separate built-in class, unicode, for Unicode strings. That’s not necessary in python3 since all str’s are unicode, specifically UTF-8, which is more-or-less equivalent to ASCII for many purposes. However, I have several places where I have:

if isinstance(x, unicode):

This doesn’t work in python3 so futurize changes these to:

if isinstance(x, str):

However, that doesn’t always work for me. I have solved this in one of two ways:

        try:
            if isinstance(arg1, unicode):
                arg1 = str(arg1)

        except NameError:
            pass

or:

try:
    unicode

except NameError:
    unicode = str

I thought pylint would complain about this, but it didn’t.

ttk is now part of tkinter

In python2 ttk is a separate module from Tkinter. In python3 it is a module within tkinter. This causes futurize to translate all references, including the import, from “ttk” to “tkinter.ttk”. That code doesn’t work on python2. So I did the following:

try:
    import ttk

except ImportError:
    import tkinter.ttk as ttk

I then manually converted all references to “tkinter.ttk” back to “ttk”.

Tools

In the process of doing this conversion, I wrote a couple of tools:

  • fix_futurize: a python program that corrects some of the changes from futurize that don’t really work in python2.
  • py_convert: a shell script that runs futurize, fix_futurize, pylint, and then runs the converted program.

I would be happy to upload these to github, but haven’t done so yet because they would need some clean-up to remove dependencies on other tools. Let me know in the comments if you’re interested.

Status

June 12, 2021: As of this writing, I am still in the process of cycling through various modules and programs, making them compatible with python3. After that I have to retest the code in python2. I will update this post as I gain additional experience.

June 27: I have finished converting “all” of my python code to python3. (I put “all” in quotes, because I’m sure I missed something.) I now intend to run my day-to-day processing for a week on my python3 virtual machine to further shake down the conversion.

July 17: I have finished testing my code on both python3 and python2 as described above. Starting tomorrow, I plan to go back to running my normal Macintosh environment, which for the time being is still on python2. I didn’t convert or test every single python program I have written, so I expect to continue to convert and/or debug programs with (hopefully) decreasing frequency for some time to come.

1 comment

  1. I have been “commenting out” these three statements inserted by futurize:

    from future import standard_library
    standard_library.install_aliases()
    from builtins import str

    The last one causes python3 to complain about sending Unicode data to ASCII char* interfaces in modules written in C. My solution of reverting to the native interface seems to work even for non-ASCII UTF-8 characters.

    The first two don’t work on systems where I haven’t installed the “future” module.

    I have recently converted my Mac Mini to Python 3 which mostly works. (Some google interfaces don’t like that Python 3 is in a non-standard location. I have coded a workaround for this problem that doesn’t depend on Python 3.)

    I am now down to one device still using Python 2: my Raspberry Pi 3. Converting that is low priority until something breaks.
