Migration to Python 3 only

This is a personal experience of having migrated IPython from being single source Py2-Py3 to Python 3 only.

The migration plan

The migration of IPython to be Python 3 only started about a year ago. For the last couple of years, the IPython code base had been "single source", meaning that you could run it on Python 2 and Python 3 without a single change to the source code.

We could have made the transition to a Python 3 only code base with the use of a transpiler (like 2to3, but in reverse: 3to2), though there does not seem to be any commonly used tool for that. This would also have required taking care of backporting functionality, which can be a pain, and things like asyncio are quasi impossible to backport cleanly to Python 2.

So we just dropped Python 2 support.

The levels of non-support

While it is easy to use the term "non-supported", there are different levels of non-support.

  • Do not release for Python 2, but you can "compile" or clone/install it yourself.
  • Officially saying "this software is not meant to run on Python 2", but it still does and is released.
  • CI tests are run on Python 2 but "allow failure"
    • likely to break, but you accept PRs to fix things
  • CI tests are not run on Python 2, but PRs fixing things are accepted
  • PRs to fix things on Python 2 are not accepted
  • You are actively adding Python 3 only code
  • You are actively removing Python 2 code
  • You are actively keeping Python 2 compatibility, but make the software delete the user's home directory.

We settled somewhere in between adding Python 3 only features and removing Python 2 code.

Making a codebase Python 3 only is "easy" in the sense that adding a single yield from is enough to make your code invalid Python 2, and no __future__ statement can fix that.
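As a minimal, hypothetical illustration: this file is a SyntaxError on any Python 2 interpreter solely because of the yield from line.

```python
def inner():
    yield 1
    yield 2

def outer():
    # `yield from` appeared in Python 3.3; merely containing this line
    # makes the whole file fail to parse on Python 2, and no
    # `from __future__ import ...` statement can change that.
    yield 0
    yield from inner()

print(list(outer()))
```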

Removing code

One of the things you will probably see in the background of this section is that statically typed languages would be of great help for this task. I would tend to say "thank you, Captain Obvious", but there is some truth to it. Python is not a statically typed language, though, so we are trying to see how we can write Python in a better way to ease the transition.

the obvious

There are obvious functions that are present only for Python 2, in general inside if py2: blocks. These can simply be deleted, and hopefully your linter will now complain about a ton of unused variables and imports you can remove.

This is not always the case with function definitions, as most linters assume functions are exported. Coverage can help, but then you have to make sure the function is not still being exercised by tests on Python 3.

One of the indirect effects in many places was reduced indentation. Especially at module level this led to much greater readability, as module-level functions are easily confused for object methods when indented inside an if py2: block.
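A hypothetical sketch of what that cleanup looks like: before, the module-level function is indented like a method; after, it sits flat at the top level.

```python
import sys

# Before (sketch): definitions buried in a version check read like methods.
if sys.version_info < (3,):
    def greet(name):              # indented: easy to mistake for a method
        return b'hello ' + name   # Python 2 branch, bytes everywhere
else:
    def greet(name):
        return 'hello ' + name

# After dropping Python 2, only the flat, top-level definition remains:
def greet(name):
    return 'hello ' + name

print(greet('world'))
```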


try/except

It is common in Python to use try/except in place of an if/else condition. The well-known hasattr works by catching an exception, and if/else is subject to race conditions. So it's not uncommon to hear that "Easier to Ask Forgiveness than Permission" is preferred to "Look Before You Leap". That might be a good move in a codebase with requirements that will never change, but in the context of code removal it is a hassle. When encountering a try/except that is likely meant to handle a change of behavior between versions of Python, it is hard to know for which version(s) of Python it was written – some changes are between minor versions – and in which order the try/except is written (Python 2 in the try clause, or in the except clause). Above all, it is quasi impossible to find these locations in the first place.

On the other hand, explicit if statements (if sys.version_info < (3,)) are easy to find – remember you only need to compare the first item of the tuple – and easy to reduce to the only needed branch. It's also way easier to apply (and find) these for minor versions.
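For example (a sketch), the explicit form is trivially greppable, while the try/except equivalent hides the version dependency entirely:

```python
import sys

# Explicit: one grep for "version_info" finds every compatibility branch.
if sys.version_info < (3,):
    text_type = unicode  # noqa: F821 -- only exists on Python 2
else:
    text_type = str

# Implicit: the same intent written EAFP-style; nothing here says
# "Python version", so it is much harder to locate and remove later.
try:
    text_type = unicode  # noqa: F821
except NameError:
    text_type = str

print(text_type)
```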

The zen of Python had it right: Explicit is better than implicit.

For me at least, try/except ImportError, AttributeError is a pattern I'll avoid in favor of explicit if/else.


There are a couple of locations where you might have to deal with bytes/unicode/str/string – oh boy, these names are not well chosen – in particular in areas where you are casting things that are bytes to unicode and vice versa. I can never remember, when I read cast_bytes_py2, whether it does nothing on Python 2 or nothing on Python 3. Once you remove it, though, the code is so much shorter, simpler, and clearer in your head.

Remember: convert bytes<->unicode at the boundaries and keep things unicode everywhere else in your program if you want to avoid headaches. Good Python code is boring Python code.
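A minimal sketch of that discipline: decode once on the way in, encode once on the way out, and everything in between is unicode.

```python
# bytes arrive from the outside world (socket, file, subprocess...)
raw = b'caf\xc3\xa9'

text = raw.decode('utf-8')    # decode once, at the boundary
assert isinstance(text, str)  # inside the program, everything is str

out = text.encode('utf-8')    # encode once, at the boundary on the way out
assert out == raw
print(text)
```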

Python 2-ism

Dealing with removing Python 2 code made me realise that there is still a lot of Python-2-ism in most of the Python 3 code I write.

inheriting classes

Writing classes that do not need to inherit from object feels weird, and I definitely don't have the habit (yet) of leaving object out. Having the ability to use a bare super() is great, as I never remembered the order of the parameters.
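A small sketch of both points: no explicit (object) base, and a bare super() call that needs neither the class nor self as arguments.

```python
class Base:                      # no (object) needed in Python 3
    def __init__(self):
        self.log = ['Base']

class Child(Base):
    def __init__(self):
        super().__init__()       # bare super(): no arguments to misremember
        self.log.append('Child')

print(Child().log)
```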


pathlib

IPython uses a lot of path manipulation, so we keep using os.path.join in many places, or even just the with open(...) context manager. If you can afford it and target only recent Python versions, pathlib and its Path objects are great alternatives that we tend to forget exist.
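A small sketch (the file names below are hypothetical) of what pathlib gives over os.path.join:

```python
import os.path
from pathlib import Path

# os.path style:
legacy = os.path.join('profile_default', 'ipython_config.py')

# pathlib style: the / operator joins, and the object knows about itself
cfg = Path('profile_default') / 'ipython_config.py'
print(cfg.name)     # 'ipython_config.py'
print(cfg.suffix)   # '.py'
# Path objects can even open themselves:
# text = cfg.read_text(encoding='utf-8')
assert str(cfg) == legacy
```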


encoding

Most decode/encode operations do the right thing; there is almost no need to specify the encoding anywhere. This makes handling bytes -> str conversions even easier.
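For instance, .encode() and .decode() default to UTF-8 in Python 3, so the round trip needs no encoding argument:

```python
s = 'héllo wörld'
b = s.encode()           # defaults to UTF-8, no argument needed
assert isinstance(b, bytes)
assert b.decode() == s   # same default on the way back
print(b)
```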

Python 3-ism

These are the features of Python 3 which have no equivalent in Python 2 and would make great additions to many codebases. I tend to forget they exist and do not design code around them enough.


async/await

I'm just scratching the surface of async/await, and I definitely see great opportunities here. You need to design code to work in an async fashion, but it should be relatively straightforward to use async code from synchronous code. I should learn more about sans-io (google is your friend) to make code reusable.

type annotations

Type annotations are an incredible feature that, even just as visual annotations, replaces numpydoc. I have a small grudge against the part of PEP 8 that describes the position of the spaces, but even without mypy the ability to annotate types is a huge boon for documentation. Now docstrings can focus on the why/how of functions.
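A sketch of the idea (the function and names are hypothetical): the signature carries the types, and the docstring is freed to explain intent.

```python
def moving_average(data: list, window: int = 3) -> list:
    """Smooth `data` with a sliding mean.

    The types live in the signature, so the docstring can focus on
    the why/how instead of restating them numpydoc-style.
    """
    return [sum(data[i:i + window]) / window
            for i in range(len(data) - window + 1)]

print(moving_average([1, 2, 3, 4, 5]))
```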

kwarg only

Keyword-only arguments are a great, often under-appreciated feature of Python 3. The *-syntax is IMHO a bit clunky – but I don't have a better option. It gives you great flexibility in APIs without sacrificing backward compatibility. I wish I had positional-only arguments as well.
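A sketch of the *-syntax: everything after the bare * must be passed by keyword, so you can later reorder or insert parameters without breaking callers. The function below is hypothetical.

```python
def resize(image, *, width=800, height=600):
    # width/height can only be given by keyword:
    # resize(img, 1024) raises TypeError instead of silently
    # binding 1024 to whichever parameter happens to come first.
    return (image, width, height)

print(resize('img.png', height=300))
```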

Writing an async REPL - Part 1

This is the first part in a series of blog posts explaining how I implemented the ability to await code at the top-level scope in the IPython REPL. Don't expect the second part soon, or bother me for it. I know I should write it, but time is a rare luxury.

It is an interesting adventure into how Python code gets executed. I must admit it changed quite a bit how I understand Python code nowadays, and made me even more excited about async/await in Python.

It should also dive quite a bit into the internals of Python/CPython, if you are ever interested in what some of these things are.

In [1]:
# we cheat and deactivate the new IPython feature to match Python repl behavior
%autoawait False

Async or not async, that is the question

You might not have noticed it, but since Python 3.5 the following is valid Python syntax:

In [2]:
async def a_function():
    async with contextmanager() as f:
        result = await f.get('stuff')
        return result

So you've been curious and read a lot about asyncio; you may have come across a few new libraries like aiohttp and all the aio-libs, heard about sans-io, read complaints about the different approaches one can take, and how we could maybe do better. You vaguely understand the concept of loops and futures, but the term coroutine is still unclear. So you decide to poke around yourself in the REPL.

In [3]:
import aiohttp
In [4]:
coro_req = aiohttp.get('https://api.github.com')
<aiohttp.client._DetachedRequestContextManager at 0x1045289d8>
In [5]:
import asyncio
res = asyncio.get_event_loop().run_until_complete(coro_req)
In [6]:
<ClientResponse(https://api.github.com) [200 OK]>
<CIMultiDictProxy('Server': 'GitHub.com', 'Date': 'Thu, 06 Apr 2017 19:49:20 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Status': '200 OK', 'X-Ratelimit-Limit': '60', 'X-Ratelimit-Remaining': '50', 'X-Ratelimit-Reset': '1491508909', 'Cache-Control': 'public, max-age=60, s-maxage=60', 'Vary': 'Accept', 'Etag': 'W/"7dc470913f1fe9bb6c7355b50a0737bc"', 'X-Github-Media-Type': 'github.v3; format=json', 'Access-Control-Expose-Headers': 'ETag, Link, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval', 'Access-Control-Allow-Origin': '*', 'Content-Security-Policy': "default-src 'none'", 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'deny', 'X-Xss-Protection': '1; mode=block', 'Vary': 'Accept-Encoding', 'X-Served-By': 'a51acaae89a7607fd7ee967627be18e4', 'Content-Encoding': 'gzip', 'X-Github-Request-Id': '8182:3911:C50FFE:EF0636:58E69BC0')>
In [7]:
<generator object ClientResponse.json at 0x1052cd9e8>
In [8]:
json = asyncio.get_event_loop().run_until_complete(res.json())
{'authorizations_url': 'https://api.github.com/authorizations',
 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}',
 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}',
 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}',
 'current_user_repositories_url': 'https://api.github.com/user/repos{?type,page,per_page,sort}',
 'current_user_url': 'https://api.github.com/user',
 'emails_url': 'https://api.github.com/user/emails',
 'emojis_url': 'https://api.github.com/emojis',
 'events_url': 'https://api.github.com/events',
 'feeds_url': 'https://api.github.com/feeds',
 'followers_url': 'https://api.github.com/user/followers',
 'following_url': 'https://api.github.com/user/following{/target}',
 'gists_url': 'https://api.github.com/gists{/gist_id}',
 'hub_url': 'https://api.github.com/hub',
 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}',
 'issues_url': 'https://api.github.com/issues',
 'keys_url': 'https://api.github.com/user/keys',
 'notifications_url': 'https://api.github.com/notifications',
 'organization_repositories_url': 'https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}',
 'organization_url': 'https://api.github.com/orgs/{org}',
 'public_gists_url': 'https://api.github.com/gists/public',
 'rate_limit_url': 'https://api.github.com/rate_limit',
 'repository_search_url': 'https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}',
 'repository_url': 'https://api.github.com/repos/{owner}/{repo}',
 'starred_gists_url': 'https://api.github.com/gists/starred',
 'starred_url': 'https://api.github.com/user/starred{/owner}{/repo}',
 'team_url': 'https://api.github.com/teams',
 'user_organizations_url': 'https://api.github.com/user/orgs',
 'user_repositories_url': 'https://api.github.com/users/{user}/repos{?type,page,per_page,sort}',
 'user_search_url': 'https://api.github.com/search/users?q={query}{&page,per_page,sort,order}',
 'user_url': 'https://api.github.com/users/{user}'}

It's a bit painful to pass everything to run_until_complete, but you know how to write an async def function and pass it to an event loop:

In [9]:
loop = asyncio.get_event_loop()
run = loop.run_until_complete
url = 'https://api.github.com/rate_limit'

async def get_json(url):
    res = await aiohttp.get(url)
    return await res.json()

run(get_json(url))

{'rate': {'limit': 60, 'remaining': 50, 'reset': 1491508909},
 'resources': {'core': {'limit': 60, 'remaining': 50, 'reset': 1491508909},
  'graphql': {'limit': 0, 'remaining': 0, 'reset': 1491511760},
  'search': {'limit': 10, 'remaining': 10, 'reset': 1491508220}}}

Good ! And then you wonder: why do I have to wrap things in a function ? If I have a default loop, isn't it obvious where I want to run my code ? Can't I await things directly ? So you try:

In [10]:
await aiohttp.get(url)
  File "<ipython-input-10-055eb13ed07d>", line 1
    await aiohttp.get(url)
SyntaxError: invalid syntax

What ? Oh, that's right, there is no way in Python to set a default loop... but a SyntaxError ? Well, that's annoying.

Outsmart Python

Luckily you (in this case me) are in control of the REPL. You can bend it to your will. Surely you can do something. First you try to remember how a REPL works:

In [11]:
mycode = """
a = 1
"""

def fake_repl(code):
    import ast
    module_ast = ast.parse(code)
    bytecode = compile(module_ast, '<fakefilename>', 'exec')
    global_ns = {}
    local_ns = {}
    exec(bytecode, global_ns, local_ns)
    return local_ns

fake_repl(mycode)

{'a': 1}

We don't show global_ns as it is huge; it will contain everything that's available by default in Python. Let's see where it fails if you try a top-level async statement:

In [12]:
import ast
mycode = """
import aiohttp
await aiohttp.get('https://aip.github.com/')
"""

module_ast = ast.parse(mycode)
  File "<unknown>", line 3
    await aiohttp.get('https://aip.github.com/')
SyntaxError: invalid syntax

Ouch, so we can't even compile it. Let's be smart: can we get at the inner code if we wrap it in an async def?

In [13]:
mycode = """
async def fake():
    import aiohttp
    await aiohttp.get('https://aip.github.com/')
"""

module_ast = ast.parse(mycode)
ast.dump(module_ast)
"Module(body=[AsyncFunctionDef(name='fake', args=arguments(args=[], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=[Import(names=[alias(name='aiohttp', asname=None)]), Expr(value=Await(value=Call(func=Attribute(value=Name(id='aiohttp', ctx=Load()), attr='get', ctx=Load()), args=[Str(s='https://aip.github.com/')], keywords=[])))], decorator_list=[], returns=None)])"
In [14]:
"AsyncFunctionDef(name='fake', args=arguments(args=[], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=[Import(names=[alias(name='aiohttp', asname=None)]), Expr(value=Await(value=Call(func=Attribute(value=Name(id='aiohttp', ctx=Load()), attr='get', ctx=Load()), args=[Str(s='https://aip.github.com/')], keywords=[])))], decorator_list=[], returns=None)"

As a reminder, AST stands for Abstract Syntax Tree; you may construct an AST which is not a valid Python program, like an if-else-else. AST trees can be modified. What we are interested in is the body of the function, which itself is the first element of a dummy module:
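To make the "ASTs can be modified" point concrete, here is a tiny standalone sketch (unrelated to the async code): parse two snippets, graft a statement from one module into the other, and compile and run the result.

```python
import ast

mod = ast.parse("x = 1")                  # a one-statement module
extra = ast.parse("y = x + 1").body[0]    # a statement taken from elsewhere
mod.body.append(extra)                    # graft it into the first module
ast.fix_missing_locations(mod)            # repair line/col info after surgery

ns = {}
exec(compile(mod, '<ast>', 'exec'), ns)
print(ns['y'])
```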

In [15]:
body = module_ast.body[0].body
[<_ast.Import at 0x105d503c8>, <_ast.Expr at 0x105d50438>]

Let's pull out the body of the function and put it at the top level of a newly created module:

In [16]:
async_mod = ast.Module(body)
ast.dump(async_mod)
"Module(body=[Import(names=[alias(name='aiohttp', asname=None)]), Expr(value=Await(value=Call(func=Attribute(value=Name(id='aiohttp', ctx=Load()), attr='get', ctx=Load()), args=[Str(s='https://aip.github.com/')], keywords=[])))])"

Mouahahahahahahahahah, you managed to get a valid top-level async AST ! Victory is yours !

In [17]:
bytecode = compile(async_mod, '<fakefile>', 'exec')
  File "<fakefile>", line 4
SyntaxError: 'await' outside function

Grumlgrumlgruml. You haven't said your last word, though; you're going to take your revenge later. Let's see what we can do in Part II, not written yet.

Changing ByteStr REPR

A rebuttal against Python 3 was recently written by the (in)famous Zed Shaw, with many responses to various arguments and counter-arguments.

One particular topic which caught my eye was the bytearray vs unicodearray debate. I'll explicitly try to avoid the terms str/string/bytes/unicode, as the naming is (IMHO) confusing, but that's a debate for another time. If one pays attention to the above debates, one might see that there are roughly two camps:

  • bytearray and unicodearray are two different things, and we should never convert from one to the other (that's roughly the pro-Python-3 camp).
  • bytearray and unicodearray are similar enough in most cases that we should do the magic for users.

I'm greatly exaggerating here, and the following is for neither one side nor the other. I have my personal preference of what I think is good, but that's irrelevant for now. Note that both sides argue that their preference is better for beginners.

You can often find posts trying to explain the str/string/bytes misconception, like this one, which keeps insisting on the fact that str in Python 3 is far different from bytes.

The mistake in the REPR

I have a theory that the bytes/str issue lies not in their behavior, but in their repr. The repr is, in the end, the main information channel between the object and the brain of the programmer or user. Also, Python is "duck typed", and you have to admit that bytes and str kinda look similar when printed, so assuming they should behave in similar ways is not far-fetched. I'm not saying that users will consciously assume bytes/str are the same; I'm saying that the human brain may inherently make such an association.

Off the top of your head, what does requests.get(url).content return ?

In [1]:
import requests_cache
import requests
In [2]:
b'{"name":"Luke Skywalker","height":"172","mass":"77","hair_color":"blond","skin_color":"fair","eye_color":"blue","birth_year":"19BBY","gender":"male","homeworld":"http://swapi.co/api/planets/1/","films":["http://swapi.co/api/films/6/","http://swapi.co/api/films/3/","http://swapi.co/api/films/2/","http://swapi.co/api/films/1/","http://swapi.co/api/films/7/"],"species":["http://swapi.co/api/species/1/"],"vehicles":["http://swapi.co/api/vehicles/14/","http://swapi.co/api/vehicles/30/"],"starships":["http://swapi.co/api/starships/12/","http://swapi.co/api/starships/22/"],"created":"2014-12-09T13:50:51.644000Z","edited":"2014-12-20T21:17:56.891000Z","url":"http://swapi.co/api/people/1/"}'

... bytes...

I'm pretty sure you glanced ahead in this post and probably thought it was "text", probably even, in this case, JSON. It might be invalid JSON; I'm pretty sure you cannot tell.

Why does it return bytes ? Because it could fetch an image:

In [3]:
b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01\xcc\x00\x00\x01\xcc\x08\x06\x00\x00\x00X\xdb\x98\x86\x00\x00 \x00IDATx\xda\xac\xbdy\x93\x1b\xb9\xb2\xf6\xf7K\x00\xb5\x90\xbdH\xa3\x99\xb9s7\xbf\xf1:\x1c\x0e/\xdf\xff\xdb8\xec\xb0}\xd79g4Rw\xb3IV\x15\x80\xf4\x1f@\xedUl\xea\\w\x84\xa65-6Y\x85\x02ry\xf2\xc9'\xa5\xfe\x9f\xfeGE\x04#\x821\x061\x16c\x0c\xc6XD\x0c\x02\xa0\x8a\x8a\x801\xa4\x1f\x08\x880\xfdRUD\x04\xd5\xfe\xff#6z\x8c*\xaa\x82\x88\xe0C \x84@\xf7~\xa6yy\xc5=>Q>~\xe6\xe1\xf3g~\xfd\xa7\x7f\xc28\x07\xb6\x00\x84h-\x88A1(\xe0U\xd2\xfb\xb8t\r1("

And if you decode the first request ?

In [4]:

Well, that looks the same (except the leading b...). Go explain to a beginner that the two above are totally different things, while they already struggle with 0-based indexing, iterators, and the syntax of the language.

Changing the repr

Let's change the repr of bytes to better represent what they are. IPython allows changing an object's repr easily:

In [5]:
text_formatter = get_ipython().display_formatter.formatters['text/plain']
In [6]:
def _print_bytestr(arg, p, cycle):
    p.text('<bytesbytesbytes>')
text_formatter.for_type(bytes, _print_bytestr)
<function IPython.lib.pretty._repr_pprint>
In [7]:

Making a useful repr

<bytesbytesbytes> may not be a useful repr, so let's try to make one that:

  • Conveys that bytes are, in general, not text.
  • Lets us peek into the content to guess what it is.
  • Pushes the user to .decode() if necessary.

Generally in Python, objects have a repr which starts with <, then has the class name, a quoted representation of the object, the memory location of the object, and a closing >.

As the quoted representation of the object may be really long, we can elide it.

A common representation of bytes could be binary, but it's not really compact. Hex is compact, but more difficult to read, and makes peeking at the content hard when it could be ASCII. So let's go with an ASCII representation where we escape non-ASCII characters.

In [8]:
ellide = lambda s: s if (len(s) < 75) else  s[0:50]+'...'+s[-16:]
In [9]:
def _print_bytestr(arg, p, cycle):
    p.text('<bytes '+ellide(repr(arg))+' at {}>'.format(hex(id(arg))))       
text_formatter.for_type(bytes, _print_bytestr)
<function __main__._print_bytestr>
In [10]:
<bytes b'{"name":"Wilhuff Tarkin","height":"180","mass":"...pi/people/12/"}' at 0x107299228>
In [11]:
'{"name":"Wilhuff Tarkin","height":"180","mass":"unknown","hair_color":"auburn, grey","skin_color":"fair","eye_color":"blue","birth_year":"64BBY","gender":"male","homeworld":"http://swapi.co/api/planets/21/","films":["http://swapi.co/api/films/1/","http://swapi.co/api/films/6/"],"species":["http://swapi.co/api/species/1/"],"vehicles":[],"starships":[],"created":"2014-12-10T16:26:56.138000Z","edited":"2014-12-20T21:17:50.330000Z","url":"http://swapi.co/api/people/12/"}'

Advantage: it is not gobbledygook anymore when getting binary resources !

In [12]:
<bytes b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01\...0IEND\xaeB`\x82' at 0x107e0c000>

Remapping notebook shortcuts

As the Jupyter notebook runs in a browser, for technical and practical reasons we only have a limited number of shortcuts available, and choices need to be made. Often these choices conflict with browser shortcuts, and you might need to remap them.

Today I was informed by Stéfan van der Walt that Cmd-Shift-P conflicts in Firefox. It is mapped both to opening the command palette in the notebook and to opening a new Private Browsing window.

Using Private Browsing windows is extremely useful: when developing a website you might want to look at it without being logged in, and with an empty cache. So let's see how we can remap the Jupyter notebook shortcut.


Use the following in your ~/.jupyter/custom/custom.js :

require(['base/js/namespace'], function(Jupyter){
  // we might want to put that in a callback or wait for
  // an event telling us the notebook is ready.
  console.log('== remapping command palette shortcut ==')
  // note that meta is the command key on mac.
  var source_sht = 'meta-shift-p'
  var target_sht = 'meta-/'
  var cmd_shortcuts = Jupyter.keyboard_manager.command_shortcuts;
  var action_name = cmd_shortcuts.get_shortcut(source_sht)
  cmd_shortcuts.add_shortcut(target_sht, action_name)
  cmd_shortcuts.remove_shortcut(source_sht)
  console.log('== ', action_name, 'remapped from', source_sht, 'to', target_sht)
})


We need to use require and register a callback once the notebook is loaded:

require(['base/js/namespace'], function(Jupyter){

Here we grab the main namespace and name it Jupyter.

Then get the object that hold the various shortcuts: var cmd_shortcuts = Jupyter.keyboard_manager.command_shortcuts.

Shortcuts are defined by sequences of keys with modifiers. Modifiers are dash-separated (and need to be pressed at the same time); sequences are comma-separated. For example, quitting would be esc,;,w,q in vim and ctrl-x,ctrl-c in emacs.

Here we want to unbind meta-shift-p (p is lowercase despite shift being pressed) and bind meta-/ (the shortcut Stéfan wants). Note that meta is the command key on mac.

We need to get the name of the command currently bound to this shortcut (cmd_shortcuts.get_shortcut(source_sht)). You could hardcode the name of the command, but it may change a bit depending on the notebook version (this is not yet public API). Here it is jupyter-notebook:show-command-palette.

You now bind it to your new shortcut:

cmd_shortcuts.add_shortcut('meta-/', action_name)

And finally unbind the original one:

cmd_shortcuts.remove_shortcut('meta-shift-p')

The UI reflects your changes !

If you open the command palette, you should see that the Show command palette command now displays Command-/ as its shortcut !


We are working on an interface to edit shortcuts directly from within the UI, so you won't have to write a single line of code !

Questions, feedback and fixes welcomed



As usual, this is available as (and has been written as) a Jupyter notebook; if you'd like to play with the code, feel free to fork it.

The jet colormap (AKA "rainbow") is ubiquitous, but there is a lot of controversy as to whether it is a good one (it is far from the best), and better options have been designed.

The question is: if you have a graph that uses a specific colormap, and you would prefer it to use another one, what do you do ?

Well, if you have the underlying data that's easy, but it's not always the case.

So how do you remap a plot which uses a non perceptually uniform colormap onto another one ? And what happens if there are encoding artifacts and the pixel colors are slightly off ?

I came up with a prototype a few months ago, and was recently asked by @stefanv to "correct" an animated plot of hurricane Matthew, where the "jet" colormap seemed to provide an illusion of growth:


Let's see how we can convert a "Jet" image to a viridis-based one. We'll first need some assumptions:

  • We assume that you "know" the initial colormap of the plot, and that the encoding/compression process will not change the colors "too much".
  • There are pixels in the image which are not part of the colormap (typically text, axes, cat pictures...).

We will try to remap all the pixels that do not fall "too far" from the initial colormap onto the new colormap.

In [1]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
In [2]:
import matplotlib.colors as colors
In [3]:
!rm *.png *.gif out*
rm: output.gif: No such file or directory

I used the following to convert from mp4 to an image sequence (8 fps, determined manually), then the sequence of images back to video, and the video to gif (the quality is better than converting to gif directly):

$ ffmpeg -i INPUT.mp4 -r 8 -f image2 img%02d.png
$ ffmpeg -framerate 8 -i vir-img%02d.png -c:v libx264 -r 8 -pix_fmt yuv420p out.mp4
$ ffmpeg -i out.mp4  output.gif
In [4]:
!ffmpeg -i input.mp4 -r 8 -f image2 img%02d.png -loglevel panic

Let's take our image without the alpha channel, so only the first 3 components:

In [5]:
import matplotlib.image as mpimg
img = mpimg.imread('img01.png')[:,:,:3]
In [6]:
fig, ax = plt.subplots()
ax.imshow(img)

As you can see, it does use "Jet" (most likely).

Let's look at the distribution of the pixels in RGB space...

In [7]:
import numpy as np
from mpl_toolkits.mplot3d import Axes3D

import matplotlib.pyplot as plt
In [8]:
def rep(im, cin=None, sub=128):
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    # subsample the pixels (one every 300) to keep the scatter plot light
    pp = im.reshape((-1,3)).T[:,::300]
    if cin:
        cmapin = plt.get_cmap(cin)
        cmap256 = colors.makeMappingArray(sub, cmapin)[:, :3].T
        ax.scatter(cmap256[0], cmap256[1], cmap256[2], marker='.', label='colormap', c=range(sub), cmap=cin, edgecolor=None)
    ax.scatter(pp[0], pp[1], pp[2], c=pp.T, marker='+')
    ax.set_title('Color of pixels')
    if cin:
        ax.legend()
    return ax
ax = rep(img)

We can see specific clusters of pixels. Let's plot the location of our "Jet" colormap and a diagonal of gray. We can guess that various compression artifacts have jittered the pixels slightly away from their original locations.

Let's look at where the jet colormap is supposed to fall:

In [9]:
rep(img, 'jet')
<matplotlib.axes._subplots.Axes3DSubplot at 0x111c9cc88>

OK, that's pretty accurate. We also see that our selected graph does not use the full extent of Jet.

In order to efficiently find all the pixels that use "Jet", we will use scipy.spatial.cKDTree in the colorspace. In particular we will subsample the initial colormap into sub=256 samples, collect only the pixels that are within d=0.2 of one of these samples, and map each of these pixels to its closest sample.

As we know the subsampling of the initial colormap, we can also determine the output colors.

The pixels that are "too far" from the colormap are kept unchanged.

Increasing sub=256 to a higher value will give a smoother final colormap.
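The core trick can be sketched in isolation with a tiny hypothetical "colormap" of four reference colors: query returns the index of the nearest reference color, and an index equal to len(palette) flags pixels beyond the distance cutoff.

```python
import numpy as np
from scipy.spatial import cKDTree

palette = np.array([[0., 0., 0.],     # a toy 4-color "colormap"
                    [1., 0., 0.],
                    [0., 1., 0.],
                    [0., 0., 1.]])
pixels = np.array([[0.05, 0.00, 0.0],   # jittered black
                   [0.90, 0.10, 0.0],   # jittered red
                   [0.50, 0.50, 0.5]])  # gray: not on the colormap

dist, idx = cKDTree(palette).query(pixels, distance_upper_bound=0.3)
# pixels farther than 0.3 get index == len(palette) (and dist == inf),
# which is exactly how the "leave unchanged" mask is built later.
print(idx)
```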

In [10]:
from scipy.spatial import cKDTree
In [11]:
def convert(sub=256, d=0.2, cin='jet', cout='viridis', img=img, show=True):
    viridis = plt.get_cmap(cout)
    cmapin = plt.get_cmap(cin)
    cmap256 = colors.makeMappingArray(sub, cmapin)[:, :3]
    original_shape = img.shape
    img_data = img.reshape((-1,3))
    # this will efficiently find the pixels "close" to jet
    # and assign them to which point (from 1 to 256) they are on the colormap.
    K = cKDTree(cmap256)
    res = K.query(img_data, distance_upper_bound=d)
    indices = res[1]
    l = len(cmap256)
    indices = indices.reshape(original_shape[:2])
    remapped = indices


    mask = (indices == l)

    remapped = remapped / (l-1)
    mask = np.stack( [mask]*3, axis=-1)

    # here we add only these pixels and plot them again with viridis.
    blend = np.where(mask, img, viridis(remapped)[:,:,:3])
    if show:
        fig, ax = plt.subplots()
        ax.imshow(blend)
    return blend
In [12]:
res = convert(img=img)
<matplotlib.axes._subplots.Axes3DSubplot at 0x113791278>