Migrate a ZODB from Python 2.7 to Python 3#

Note

The process described here was successfully applied to several projects with its databases. Anyway, this is still work in progress. Any help to make this document better is appreciated. To continue with documenting the process, or help improve the involved scripts and tools. please have a look at the following resources:

Plone 5.2 can be run on Python 2 and Python 3. For new projects, you can start with Python 3 with a fresh database. To use an existing project in Python 3 though, you need to migrate your existing database first. This section explains how to do that.

ZODB itself is compatible with Python 3, but a database created in Python 2.7 cannot be used in Python 3 without modifying it. See Why do I have to migrate my database? for technical background.

Database upgrade procedure#

Follow these steps to migrate your database.

  1. Upgrade your site to Plone 5.2 running on Python 2 first. See Upgrading Plone 5.1 to 5.2.

  2. Make sure your code and all add-ons that you use work in Python 3. See Migrating Plone 5.2 to Python 3.

  3. Backup your database!

  4. Pack your database to 0 days. zodbupdate will not update your database history, will leave old objects in place, and you will not be able to pack your database in the future.

  5. In your old buildout under Python 2, verify your database integrity using zodbverify. Solve integrity problems, if there are any.

  6. Prepare your buildout for migrating the database to Python 3.

  7. Do not start the instance.

  8. Migrate your database using zodbupdate.

  9. Verify your database integrity using zodbverify. If there are any problems, solve them and redo the migration.

  10. Start the instance.

  11. Manually check if all works as expected.

Why do I have to migrate my database?#

To understand the problem that arises when migrating a ZODB from Python 2 to Python 3, this introduction and the following example will help.

When pickling an object, the data types and values are stored.

In Python 2, strings get STRING, and Unicode gets UNICODE.

$ python2
Python 2.7.14 (default, Sep 23 2017, 22:06:14)
>>> di=dict(int=23,str='Ümläut',unicode=u'Ümläut')
>>> di
{'int': 23, 'unicode': u'\xdcml\xe4ut', 'str': '\xc3\x9cml\xc3\xa4ut'}
>>> import pickle
>>> import pickletools
>>> pickletools.dis(pickle.dumps(di))
    0: (    MARK
    1: d        DICT       (MARK at 0)
    2: p    PUT        0
    5: S    STRING     'int'
   12: p    PUT        1
   15: I    INT        23
   19: s    SETITEM
   20: S    STRING     'unicode'
   31: p    PUT        2
   34: V    UNICODE    u'\xdcml\xe4ut'
   42: p    PUT        3
   45: s    SETITEM
   46: S    STRING     'str'
   53: p    PUT        4
   56: S    STRING     '\xc3\x9cml\xc3\xa4ut'
   80: p    PUT        5
   83: s    SETITEM
   84: .    STOP
highest protocol among opcodes = 0

Python 3 does not allow non-ASCII characters in bytes, the pickle declares the byte string as SHORT_BINBYTES and the string (Python 2 Unicode) as BINUNICODE.

$ python3
Python 3.6.3 (default, Oct  3 2017, 21:45:48)
>>> di=dict(int=23,str=b'Ümläut',unicode='Ümläut')
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
>>> di=dict(int=23,str=b'Umlaut',unicode='Ümläut')
>>> di
{'int': 23, 'str': b'Umlaut', 'unicode': 'Ümläut'}
>>> import pickle
>>> import pickletools
>>> pickletools.dis(pickle.dumps(di))
    0: \x80 PROTO      3
    2: }    EMPTY_DICT
    3: q    BINPUT     0
    5: (    MARK
    6: X        BINUNICODE 'int'
   14: q        BINPUT     1
   16: K        BININT1    23
   18: X        BINUNICODE 'str'
   26: q        BINPUT     2
   28: C        SHORT_BINBYTES b'Umlaut'
   36: q        BINPUT     3
   38: X        BINUNICODE 'unicode'
   50: q        BINPUT     4
   52: X        BINUNICODE 'Ümläut'
   65: q        BINPUT     5
   67: u        SETITEMS   (MARK at 5)
   68: .    STOP
highest protocol among opcodes = 3

Python 3 will wrongly interpret a pickle created with Python 2 that contains non-ASCII characters in a field declared with OPTCODE STRING. In that case, we may end up with a UnicodeDecodeError for this pickle in ZODB.serialize.

$ python3
>>> b'\xc3\x9cml\xc3\xa4ut'.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Or when UTF-8 encoded byte strings are interpreted as Unicode, we do not get an error, but mangled non-ASCII characters:

$ python3
>>> print('\xdcml\xe4ut')
Ümläut
>>> print('\xc3\x9cml\xc3\xa4ut')
Ãmläut

How zodbupdate solves the problem#

zodbupdate loads the ZODB and iterates on a low level without actually loading the pickle over all pickle data.

It does:

  • Update magic marker to indicate the new format of the database.

  • Rename classes and modules with different locations.

  • Convert bytes to string or keep binary (according to rule mappings).

  • Handle encoding problems while doing bytes-to-string conversion, and provide fallbacks for mixed encodings in the database.

Prepare your buildout for migrating the database to Python 3#

You need to add zodbverify to your Python 2 buildout's eggs = variable in the [instance] section.

You need to add the package zodbupdate and zodbverify to your Python 3 buildout.

Depending on your buildout, this could look like the following.

[buildout]

parts +=
    zodbupdate

auto-checkout +=
    zodbupdate

[instance]
eggs +=
    zodbverify

[zodbupdate]
recipe = zc.recipe.egg
eggs =
    zodbupdate
    ${buildout:eggs}

[sources]
zodbupdate = git https://github.com/zopefoundation/zodbupdate.git pushurl=git@github.com:zopefoundation/zodbupdate.git branch=master

This adds a new buildout part zodbupdate. The coredev buildout already has this part.

After re-running buildout, you will now have a new executable ./bin/zodbupdate.

Warning

Do not try to start Plone in Python 3 with the old database before migrating it! Trying to do that will destroy the database and result in a traceback, such as the following.

Traceback (most recent call last):
  File "/Users/pbauer/workspace/projectx/parts/instance/bin/interpreter", line 279, in <module>
    exec(compile(__file__f.read(), __file__, "exec"))
  File "/Users/pbauer/.cache/buildout/eggs/Zope-4.0b8-py3.7.egg/Zope2/Startup/serve.py", line 219, in <module>
    sys.exit(main() or 0)

  [...]

  File "/Users/pbauer/.cache/buildout/eggs/ZODB-5.5.1-py3.7.egg/ZODB/FileStorage/FileStorage.py", line 1619, in read_index
    raise FileStorageFormatError(name)
ZODB.FileStorage.FileStorage.FileStorageFormatError: /Users/pbauer/workspace/projectx/var/filestorage/Data.fs

Verify the integrity of the database in Python 2#

The preflight verification of the database is run on Plone 5.2 in Python 2. First check if all Python pickles in the database can be loaded. In older and mature projects, it is possible to have pickles in there pointing to classes long gone from the code. Those may cause problems later.

Call ./bin/instance zodbverify in your Python 2.7 setup. If a problem pops up, there is a debug mode with additional parameter -D, resulting in PDB with the pickle-data and a decompiled pickle in place to gather information about the source of the problem. This enables solving the problem by either adding a stub class in the code or by deleting the object in the ZODB.

Migrate database using zodbupdate#

The migration of the database is run on Plone 5.2 in Python 3. It is expected to work the same in Python 3.6, 3.7, or 3.8.

Run the migration with appropriate arguments.

  • The operation to perform, convert-py3.

  • The location of the database.

  • The encoding expected.

  • Optionally, encoding fallbacks, if the database contains mixed encodings.

./bin/zodbupdate --convert-py3 --file=var/filestorage/Data.fs --encoding utf8 --encoding-fallback latin1

Depending on the size of your database, this can take a while.

Ideally the output is similar to the following.

Updating magic marker for var/filestorage/Data.fs
Ignoring index for /Users/pbauer/workspace/projectx/var/filestorage/Data.fs
Loaded 2 decode rules from AccessControl:decodes
Loaded 12 decode rules from OFS:decodes
Loaded 2 decode rules from Products.PythonScripts:decodes
Loaded 1 decode rules from Products.ZopeVersionControl:decodes
Committing changes (#1).

Note

The blob storage (holding binary data of files and images) will not be changed or even be read during the migration, because the blobs only contain the raw binary data of the file or image.

Note

The encoding should always be utf8, and will be used when porting database entries of classes where no encoding is specified in a [zodbupdate.decode] mapping in the package that holds the base class.

Note

The encoding fallback is optional and should not be provided by default. If a UnicodeDecodeError occurs, try to find out if the instance was configured with encodings different from utf8. Provide those encodings as the --encoding-fallback argument. If in doubt, try latin1, since this was in former times of Zope the default encoding.

Test migration#

You can use the following command to check if all records in the database can be successfully loaded.

bin/instance zodbverify

The output should look like the following.

INFO:Zope:Ready to handle requests
INFO:zodbverify:Scanning ZODB...
INFO:zodbverify:Done! Scanned 7781 records. Found 0 records that could not be loaded.

Most likely you will have additional log messages, warnings, and even errors.

Note

You can use the debug mode with ./bin/instance zodbverify -D. This will drop you into a pdb session each time a database entry cannnot be unpickled You can inspect it, and figure out if that is a real issue or not.

Before you start debugging you should read the following section on Troubleshooting, because in many cases you can ignore the warnings.

Troubleshooting#

This section covers common issues when running zodbverify.

Data.fs.index broken#

Delete Data.fs.index before migrating, or you will get the following error during migrating.

Updating magic marker for var/filestorage/Data.fs
loading index
Traceback (most recent call last):
  File "/home/erral/downloads/eggs/ZODB-5.5.1-py3.6.egg/ZODB/FileStorage/FileStorage.py", line 465, in _restore_index
    info = fsIndex.load(index_name)
  File "/home/erral/downloads/eggs/ZODB-5.5.1-py3.6.egg/ZODB/fsIndex.py", line 134, in load
    v = unpickler.load()
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 249: ordinal not in range(128)

This error can be safely ignored.

Search/Catalog raises errors#

If searches are failing and are raising errors, go to the ZMI of your Plone site root. Select the portal_catalog, and click on the Advanced tab. Select Clear and Rebuild. This may take a while.

ModuleNotFoundError: No module named PloneLanguageTool#

There were cases when the migration aborted with an import error such as the following.

An error occured
Traceback (most recent call last):
  File "/Users/pbauer/.cache/buildout/eggs/plone.app.upgrade-2.0.22-py3.7.egg/plone/app/upgrade/__init__.py", line 120, in <module>
    from Products.PloneLanguageTool import interfaces  # noqa F811
ModuleNotFoundError: No module named 'PloneLanguageTool'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pbauer/workspace/stiftung_py3/src-mrd/zodbupdate/src/zodbupdate/main.py", line 201, in main
    updater()
  File "/Users/pbauer/workspace/stiftung_py3/src-mrd/zodbupdate/src/zodbupdate/update.py", line 82, in __call__
    new = self.processor.rename(current)
  File "/Users/pbauer/workspace/stiftung_py3/src-mrd/zodbupdate/src/zodbupdate/serialize.py", line 333, in rename
    data = unpickler.load()
  File "/Users/pbauer/workspace/stiftung_py3/src-mrd/zodbupdate/src/zodbupdate/serialize.py", line 199, in __find_global
    return find_global(*self.__update_symb(klass_info), Broken=ZODBBroken)
  File "/Users/pbauer/workspace/stiftung_py3/src-mrd/zodbupdate/src/zodbupdate/serialize.py", line 177, in __update_symb
    symb = find_global(*symb_info, Broken=ZODBBroken)
  File "/Users/pbauer/.cache/buildout/eggs/ZODB-5.5.1-py3.7.egg/ZODB/broken.py", line 204, in find_global
    __import__(modulename)
  File "/Users/pbauer/.cache/buildout/eggs/plone.app.upgrade-2.0.22-py3.7.egg/plone/app/upgrade/__init__.py", line 127, in <module>
    'Products.PloneLanguageTool.LanguageTool',
AttributeError: type object 'LanguageTool' has no attribute 'LanguageTool'
Stopped processing, due to: type object 'LanguageTool' has no attribute 'LanguageTool'
Traceback (most recent call last):
  File "/Users/pbauer/.cache/buildout/eggs/plone.app.upgrade-2.0.22-py3.7.egg/plone/app/upgrade/__init__.py", line 120, in <module>
    from Products.PloneLanguageTool import interfaces  # noqa F811
ModuleNotFoundError: No module named 'PloneLanguageTool'

To work around this, comment out the lines offending lines in plone/app/upgrade/__init__.py. Remember to uncomment them after the migration!

# try:
#     from Products.PloneLanguageTool import interfaces  # noqa F811
# except ImportError:
#     alias_module('Products.PloneLanguageTool.interfaces', bbb)
#     alias_module('Products.PloneLanguageTool', bbbd)
#     __import__(
#         'Products.PloneLanguageTool.LanguageTool',
#     ).PloneLanguageTool.LanguageTool = __import__(
#         'Products.PloneLanguageTool.LanguageTool',
#     ).PloneLanguageTool.LanguageTool.LanguageTool

Migration logs errors and warnings#

If there are log messages during the migration or during zodbverify, that does not necessarily mean the migration did not work, or that your database is broken. For example, if you migrated from Plone 4 to Plone 5, and then from Archetypes to Dexterity, it is very likely that items in the database cannot be loaded, because packages such as Products.Archetypes, plone.app.blob, or plone.app.imaging are not available. These items most likely remain, and were not removed properly and are not used. If your site otherwise works fine, you can choose to ignore these issues.

Here is the output of a migration started in Plone 4 with Archetypes. The site still works nicely in Plone 5.2 on Python 3.7 despite the warnings and errors:

Updating magic marker for var/filestorage/Data.fs
Loaded 2 decode rules from AccessControl:decodes
Loaded 12 decode rules from OFS:decodes
Loaded 2 decode rules from Products.PythonScripts:decodes
Loaded 1 decode rules from Products.ZopeVersionControl:decodes
Warning: Missing factory for App.Product ProductFolder
Warning: Missing factory for Products.Archetypes.ReferenceEngine ReferenceCatalog
Warning: Missing factory for Products.Archetypes.ArchetypeTool ArchetypeTool
Warning: Missing factory for Products.PloneLanguageTool.LanguageTool LanguageTool
Warning: Missing factory for Products.Archetypes.UIDCatalog UIDCatalog
Warning: Missing factory for Products.CMFPlone.MetadataTool MetadataTool
Warning: Missing factory for Products.CMFDefault.MetadataTool MetadataSchema
Warning: Missing factory for Products.Archetypes.ReferenceEngine ReferenceBaseCatalog
Warning: Missing factory for Products.ATContentTypes.tool.atct ATCTTool
Warning: Missing factory for Products.ATContentTypes.tool.topic TopicIndex
Warning: Missing factory for Products.ResourceRegistries.tools.CSSRegistry CSSRegistryTool
Warning: Missing factory for Products.ResourceRegistries.tools.CSSRegistry Stylesheet
Warning: Missing factory for Products.PasswordResetTool.PasswordResetTool PasswordResetTool
New implicit rule detected copy_reg _reconstructor to copyreg _reconstructor
New implicit rule detected __builtin__ object to builtins object
Warning: Missing factory for Products.CMFPlone.CalendarTool CalendarTool
Warning: Missing factory for Products.CMFPlone.InterfaceTool InterfaceTool
Warning: Missing factory for Products.CMFPlone.ActionIconsTool ActionIconsTool
Warning: Missing factory for Products.CMFActionIcons.ActionIconsTool ActionIcon
Warning: Missing factory for Products.Archetypes.UIDCatalog UIDBaseCatalog
Warning: Missing factory for Products.CMFPlone.UndoTool UndoTool
Warning: Missing factory for Products.TinyMCE.utility TinyMCE
Warning: Missing factory for Products.ResourceRegistries.tools.JSRegistry JSRegistryTool
Warning: Missing factory for Products.ResourceRegistries.tools.JSRegistry JavaScript
Warning: Missing factory for Products.CMFPlone.FactoryTool FactoryTool
New implicit rule detected copy_reg __newobj__ to copyreg __newobj__
Warning: Missing factory for Products.ATContentTypes.tool.metadata MetadataTool
Warning: Missing factory for Products.ATContentTypes.interfaces.interfaces IATCTTool
New implicit rule detected Products.CMFPlone.DiscussionTool DiscussionTool to OFS.SimpleItem SimpleItem
Warning: Missing factory for Products.CMFDefault.MetadataTool ElementSpec
Warning: Missing factory for Products.CMFDefault.MetadataTool MetadataElementPolicy
New implicit rule detected plone.app.folder.nogopip GopipIndex to plone.folder.nogopip GopipIndex
Warning: Missing factory for Products.ATContentTypes.content.folder ATFolder
Warning: Missing factory for Products.Archetypes.BaseUnit BaseUnit
Warning: Missing factory for Products.ATContentTypes.content.document ATDocument
Warning: Missing factory for plone.app.blob.content ATBlob
Warning: Missing factory for plone.app.blob.interfaces IATBlobImage
Warning: Missing factory for Products.ATContentTypes.interfaces.image IATImage
Warning: Missing factory for Products.ATContentTypes.interfaces.image IImageContent
Warning: Missing factory for plone.app.blob.field BlobWrapper
Warning: Missing factory for plonetheme.stiftung.portlets.news Assignment
Warning: Missing factory for plonetheme.stiftung.portlets.linkportlet Assignment
New implicit rule detected plone.app.portlets.portlets.events Assignment to plone.app.event.portlets.portlet_events Assignment
Warning: Missing factory for Products.Archetypes.ReferenceEngine Reference
Warning: Missing factory for Products.ATContentTypes.content.link ATLink
Warning: Missing factory for Products.ATContentTypes.content.newsitem ATNewsItem
Warning: Missing factory for Products.Archetypes.Field Image
Warning: Missing factory for plone.app.imaging.scale ImageScale
Warning: Missing factory for webdav.LockItem LockItem
Warning: Missing factory for plone.app.blob.interfaces IATBlobFile
Warning: Missing factory for Products.ATContentTypes.interfaces.file IATFile
Warning: Missing factory for Products.ATContentTypes.interfaces.file IFileContent
Error: cannot pickle modified record: Can't pickle <class 'Products.ResourceRegistries.tools.JSRegistry.JavaScript'>: attribute lookup Products.ResourceRegistries.tools.JSRegistry.JavaScript failed
Warning: Missing factory for plone.app.collection.collection Collection
Warning: Missing factory for collective.flowplayer.media VideoInfo
Error: cannot pickle modified record: Can't pickle <class 'Products.ResourceRegistries.tools.CSSRegistry.Stylesheet'>: attribute lookup Products.ResourceRegistries.tools.CSSRegistry.Stylesheet failed
Warning: Missing factory for Products.ResourceRegistries.interfaces.settings IResourceRegistriesSettings
Warning: Missing factory for collective.js.jqueryui.controlpanel IJQueryUICSS
Warning: Missing factory for collective.js.jqueryui.controlpanel IJQueryUIPlugins
Warning: Missing factory for wildcard.media.content Video
Committing changes (#1).

Found new rules: {
 'Products.CMFPlone.DiscussionTool DiscussionTool': 'OFS.SimpleItem SimpleItem',
 '__builtin__ object': 'builtins object',
 'copy_reg __newobj__': 'copyreg __newobj__',
 'copy_reg _reconstructor': 'copyreg _reconstructor',
 'plone.app.folder.nogopip GopipIndex': 'plone.folder.nogopip GopipIndex',
 'plone.app.portlets.portlets.events Assignment': 'plone.app.event.portlets.portlet_events Assignment',
}

Eliminating downtime#

You can try the following to have an upgrade without downtime.

  • You can try to leverage the ZRS replication protocol, where the secondary server has the converted data. It would probably be a trivial change to ZRS to get this to work.

  • There is a ZRS equivalent for Relstorage. Its repository is at newtdb/db.

Further reading#