Scientific Python on M1 Macbook pro
For the past five years I've been working on a 2015 Intel MacBook Pro which is starting to show its age.
I've been pondering getting a new machine, as it was starting to get difficult to be on a video call and do anything else at the same time. I tried Macs with Touch Bars, but the lack of function keys was a deal-breaker for me. I was considering the Framework laptop, but ended up getting a new 2021 MacBook Pro (base model).
Though it is Apple Silicon, and I knew it was likely going to be problematic, here is my experience getting most of my Python stack working on it.
Joining QuanSight.
April 30th 2020 will be my last day at the University of California Merced. I
will be joining QuanSight, and more particularly QuanSight Labs, starting May
1st, and will hopefully get to do more Python and community work again.
An atypical background
While I am mostly known for writing Python software, my background is
actually as a (bio)physicist. I am (mostly) self-taught in everything
related to programming and Python, which I learned during my PhD
under the guidance of open-source mentors from the other end of the world, when
I first started to contribute to IPython in late 2011.
Directly after my PhD I joined UC Berkeley as a postdoc working full time on
Jupyter and IPython as part of the Berkeley Institute for Data Science. My
experience as an academic, programmer, open-source contributor and member of
the scientific (Python) community gave me critically needed knowledge about
which tools were needed to push science forward.
After two years I had the opportunity to join the University of California Merced as
a Research Facilitator; as I was already spending a large
amount of my time helping users of Python tools online and improving features,
it made sense to make this role official and engage in this new adventure.
Moreover, it helped with the famous two-body problem.
UC Merced
The University of California Merced is the newest of the University of
California campuses and is situated in the middle of the California Central
Valley. It is currently just shy of 10,000 students and is a quickly growing
campus which carries the mission of the University of California with a
particular focus on promoting diversity.
As both a new and growing University, UC Merced comes with a number of challenges
and opportunities.
The size of the campus (which close to doubled during my time there) means that
person-to-person interactions are way easier and more frequent than on larger
campuses. The Research IT team is also embedded in the research buildings (I was
next door to the Math, Physics and Chemistry departments), making it easy to get
to know faculty, staff and students alike.
Many of the procedures and processes are still in motion at UC Merced, which
usually means way less overhead to getting things done, and also leaves the
opportunity to do things the right way and still shape a lot of things. The
challenging counterpart is that, with the growth, what is set up one day
likely needs revision every six months.
With a brand new campus also come state-of-the-art installations. I had the
chance to teach Software Carpentry in a brand new media room which provided at
least one presenter screen for every five attendees, allowing way more screen real
estate and normal-size fonts.
Speaking of real estate, I also had the chance to help plan our 2000+
core cluster's move to a brand new data center room, with about 20 racks reserved
for current and future research usage. This room will also allow the
storage available for research to increase dramatically. One storage node on
its way to the new research facility (which we nicknamed the Borg Cube)
currently holds more storage capacity than the whole cluster had when I joined
UC Merced. We are on our way to having more than 1PB of effective storage on
site.
On top of what we had, we now have a brand new OS on those storage nodes (CentOS 8),
with ZFS, snapshots, deduplication, RDMA, etc., and we're thinking about growing
to a distributed filesystem (BeeGFS?). Researchers have been quite
supportive of us pushing the cluster forward and understanding when things might
fail. We of course have our HPC system running JupyterHub (with Dask), which
could use better Slurm integration and JupyterLab plugins :-). There are still
many things to be done (unified user ids on compute resources, central auth,
better monitoring, automation... etc.), and in the current context, researchers
and students are looking for even more powerful infrastructure to run code, or
teach. I'm thus looking forward to seeing the Research IT team keep growing.
The layers below
Even more so nowadays, with most researchers working from home on their own computers
and using cloud or on-premise compute, one must not underestimate all the
work that goes into infrastructure.
During the last 18 months at UC Merced I went, in practice, way further down the
stack than I had before. I learned a lot about how to properly manage a system, the
trade-offs between file systems, how to configure them, what impact this
can have on overall performance, and how users can inadvertently create issues.
But at some point you hit the hardware limits: you don't want to reboot
hundreds of machines by hand, so you need proper out-of-band control, and HPC systems tend
to consume a lot of power, so you need proper redundant power distribution
and power load balancing. You may not think about it with your classic home
power outlet, but when you start needing to order devices that use NEMA L5-30
plugs and have to worry about balancing power across all the phases of your data
center, there is no answer you can copy-paste from Stack Overflow.
I learnt about many of those aspects during my time at UC Merced and still have
much more to learn. The team managing all of this is doing a fantastic job and
is critical to every piece of software running on top. I'm looking forward to staying
involved, but feel my skills are more on the development side and the higher-level view of
things. I also miss a lot of the broader Scientific Python ecosystem;
despite trying my best to keep up and maintain IPython, it is a
tough task when using those tools less on a day-to-day basis.
Joining QuanSight (Time to unwind the stack)
Starting May 1st (Friday) I'll be joining the fantastic team at QuanSight Labs,
to add my expertise to the growing team that works – among many other things –
on sustainability in open-source. QuanSight employs a number of open source
maintainers and experts, and if you need this expertise or guarantees about the
open-source projects you use, come talk to us,
and have a look at QuanSight Training and
Residency programs.
I have a much better understanding of how HPC works now, and I'll be unwinding
the stack relatively fast, back to the application layer. Up until now I've been
keeping myself up to date with the regular Open Source Directions podcast and
webinars, and following the latest
projects on the QuanSight Labs blog.
I'm quite excited to join all the fantastic people there (Ralf Gommers, Carol
Willing, Anthony Scopatz, Melissa Mendonça, Aaron Meurer... and many others) and
spend more time interacting with the Python community again. Sustainability in
open source, mentoring and taking proper care of the community are things that
I deeply care about, and QuanSight values all of these as well.
I'm guessing you will also see me more around GitHub and on various mailing
lists; I'm thus looking forward to your pull requests and issues.
Array, slices and indexing
Array, Slices and Fancy indexing
In this small post we'll investigate (quickly) the difference between slices (which are part of the Python standard library) and NumPy arrays, and how these can be used for indexing. First let's create a matrix containing integers, so that the element at index i,j has value 10i+j for convenience.
In [1]:
import numpy as np
from copy import copy
Let's create a single row, that is to say a matrix of height 1 whose width is the number of elements.
We'll use -1 in reshape to mean "whatever is necessary". For 2D matrices and tensors it's not super useful, but for higher-dimensional objects, it can be quite convenient.
In [2]:
X = np.arange(0, 10).reshape(1,-1)
X
Out[2]:
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
Now a column, same trick.
In [3]:
Y = (10*np.arange(0, 8).reshape(-1, 1))
Y
Out[3]:
array([[ 0],
[10],
[20],
[30],
[40],
[50],
[60],
[70]])
By summing, and the rules of "broadcasting", we get a nice rectangular matrix.
In [4]:
R = np.arange(5*5*5*5*5).reshape(5,5,5,5,5)
In [5]:
M = X+Y
M
Out[5]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]])
Slicing
Quick intro about slicing. You have likely used it before if you've encountered the object[12:34] or object[42:96:3] notation. The X:Y:Z part is a slice. This way of writing a slice is allowed only between square brackets for indexing.
X, Y and Z are optional and default to whatever is convenient, so ::3 (every third), :7 and :7: (until 7), : and :: (everything) are valid slices.
A slice is an efficient object that (usually) represents "from X to Y by every Z"; it is not limited to numbers.
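For example, on a plain Python list (a quick sketch, unrelated to the matrix above) the defaults behave like this:
data = list(range(10))
data[::3]   # [0, 3, 6, 9] – every third element
data[:7]    # [0, 1, 2, 3, 4, 5, 6] – until index 7
data[:]     # the whole list (as a shallow copy)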
In [6]:
class PhylosophicalArray:

    def __getitem__(self, sl):
        print(f"From `{sl.start}` to `{sl.stop}` every `{sl.step}`.")

arr = PhylosophicalArray()
arr['cow':'phone':'traffic jam']
From `cow` to `phone` every `traffic jam`.
You can construct a slice using the slice builtin and use it in place of x:y:z; this is (sometimes) convenient.
In [7]:
sl = slice('cow', 'phone', 'traffic jam')
In [8]:
arr[sl]
From `cow` to `phone` every `traffic jam`.
In multidimensional arrays, slices of width 0 or 1 can be used to avoid dropping dimensions, unlike indexing with scalars.
In [9]:
M[:, 3] # third column, now a vector.
Out[9]:
array([ 3, 13, 23, 33, 43, 53, 63, 73])
In [10]:
M[:, 3:4] # now a N,1 matrix.
Out[10]:
array([[ 3],
[13],
[23],
[33],
[43],
[53],
[63],
[73]])
This is convenient when indices represent various quantities, for example an atmospheric ensemble where dimension 1 is latitude, 2 longitude, 3 height, 4 temperature, 5 pressure, and you want to focus on height==0 without having to shift the temperature index from 4 to 3, pressure from 5 to 4...
Zero-width slices are mostly used to simplify algorithms, to avoid having to check for edge cases.
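As a tiny illustration of that (a sketch on a plain Python list), zero-width slice assignment inserts elements without any special-casing of the empty selection:
data = [1, 2, 5, 6]
data[2:2] = [3, 4]   # assigning to the empty slice inserts before index 2
data                 # [1, 2, 3, 4, 5, 6]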
In [11]:
a = 3
b = 3
M[a:b]
Out[11]:
array([], shape=(0, 10), dtype=int64)
In [12]:
M[a:b] = a-b
In [13]:
M # M is not modified !
Out[13]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]])
When indexing an array, you slice each dimension individually.
Here we extract the center block of the matrix, not the 3 diagonal elements.
In [14]:
M[4:7, 4:7]
Out[14]:
array([[44, 45, 46],
[54, 55, 56],
[64, 65, 66]])
In [15]:
sl = slice(4,7)
sl
Out[15]:
slice(4, 7, None)
In [16]:
M[sl, sl]
Out[16]:
array([[44, 45, 46],
[54, 55, 56],
[64, 65, 66]])
Let's change the sign of the biggest square block in the upper left of this matrix.
In [17]:
K = copy(M)
el = slice(0, min(K.shape))
el
Out[17]:
slice(0, 8, None)
In [18]:
K[el, el] = -K[el, el]
K
Out[18]:
array([[ 0, -1, -2, -3, -4, -5, -6, -7, 8, 9],
[-10, -11, -12, -13, -14, -15, -16, -17, 18, 19],
[-20, -21, -22, -23, -24, -25, -26, -27, 28, 29],
[-30, -31, -32, -33, -34, -35, -36, -37, 38, 39],
[-40, -41, -42, -43, -44, -45, -46, -47, 48, 49],
[-50, -51, -52, -53, -54, -55, -56, -57, 58, 59],
[-60, -61, -62, -63, -64, -65, -66, -67, 68, 69],
[-70, -71, -72, -73, -74, -75, -76, -77, 78, 79]])
That's about it for slices, and it was already a lot.
In the next section we'll talk about arrays.
Fancy indexing
Arrays are more or less what you've seen in other languages: finite sequences of discrete values.
In [19]:
ar = np.arange(4,7)
ar
Out[19]:
array([4, 5, 6])
When you index with arrays, the elements of each array are taken together.
In [20]:
M[ar,ar]
Out[20]:
array([44, 55, 66])
We now get a partial diagonal in our matrix. It does not have to be a diagonal:
In [21]:
M[ar, ar+1]
Out[21]:
array([45, 56, 67])
The result of this operation is a one-dimensional array (note that, unlike slicing, fancy indexing returns a copy of the data rather than a view on the initial matrix memory).
In the same way as we flipped the sign of the largest block in the previous section, we'll try indexing with the same values:
In [22]:
S = copy(M)
In [23]:
el = np.arange(min(S.shape))
el
Out[23]:
array([0, 1, 2, 3, 4, 5, 6, 7])
In [24]:
S[el, el] = -S[el,el]
S
Out[24]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 10, -11, 12, 13, 14, 15, 16, 17, 18, 19],
[ 20, 21, -22, 23, 24, 25, 26, 27, 28, 29],
[ 30, 31, 32, -33, 34, 35, 36, 37, 38, 39],
[ 40, 41, 42, 43, -44, 45, 46, 47, 48, 49],
[ 50, 51, 52, 53, 54, -55, 56, 57, 58, 59],
[ 60, 61, 62, 63, 64, 65, -66, 67, 68, 69],
[ 70, 71, 72, 73, 74, 75, 76, -77, 78, 79]])
Here we flipped the sign of only the diagonal elements. It of course did not have to be the diagonal elements:
In [25]:
S[el, el+1]
Out[25]:
array([ 1, 12, 23, 34, 45, 56, 67, 78])
In [26]:
S[el, el+1] = 0
S
Out[26]:
array([[ 0, 0, 2, 3, 4, 5, 6, 7, 8, 9],
[ 10, -11, 0, 13, 14, 15, 16, 17, 18, 19],
[ 20, 21, -22, 0, 24, 25, 26, 27, 28, 29],
[ 30, 31, 32, -33, 0, 35, 36, 37, 38, 39],
[ 40, 41, 42, 43, -44, 0, 46, 47, 48, 49],
[ 50, 51, 52, 53, 54, -55, 0, 57, 58, 59],
[ 60, 61, 62, 63, 64, 65, -66, 0, 68, 69],
[ 70, 71, 72, 73, 74, 75, 76, -77, 0, 79]])
Nor are we required to have the same elements only once:
In [27]:
el-1
Out[27]:
array([-1, 0, 1, 2, 3, 4, 5, 6])
In [28]:
sy = np.array([0, 1, 2, 0, 1, 2])
sx = np.array([1, 2, 3, 1, 2, 3])
ld = S[sx, sy] # select 3 elements of lower diagonal twice
ld
Out[28]:
array([10, 21, 32, 10, 21, 32])
More in the SciPy Lecture Notes, the NumPy quickstart, and the Python Data Science Handbook.
Some experiments
In [29]:
S = copy(M)
S[0:10, 0:10] = 0
S
Out[29]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
In [30]:
S = copy(M)
S[0:10:2, 0:10] = 0
S
Out[30]:
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]])
In [31]:
S = copy(M)
S[0:10, 0:10:2] = 0
S
Out[31]:
array([[ 0, 1, 0, 3, 0, 5, 0, 7, 0, 9],
[ 0, 11, 0, 13, 0, 15, 0, 17, 0, 19],
[ 0, 21, 0, 23, 0, 25, 0, 27, 0, 29],
[ 0, 31, 0, 33, 0, 35, 0, 37, 0, 39],
[ 0, 41, 0, 43, 0, 45, 0, 47, 0, 49],
[ 0, 51, 0, 53, 0, 55, 0, 57, 0, 59],
[ 0, 61, 0, 63, 0, 65, 0, 67, 0, 69],
[ 0, 71, 0, 73, 0, 75, 0, 77, 0, 79]])
In [32]:
S = copy(M)
S[0:10:2, 0:10:2] = 0
S
Out[32]:
array([[ 0, 1, 0, 3, 0, 5, 0, 7, 0, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[ 0, 21, 0, 23, 0, 25, 0, 27, 0, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[ 0, 41, 0, 43, 0, 45, 0, 47, 0, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[ 0, 61, 0, 63, 0, 65, 0, 67, 0, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]])
In [33]:
S = copy(M)
S[0:10:2, 0:10] = 0
S[0:10, 0:10:2] = 0
S
Out[33]:
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 11, 0, 13, 0, 15, 0, 17, 0, 19],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 31, 0, 33, 0, 35, 0, 37, 0, 39],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 51, 0, 53, 0, 55, 0, 57, 0, 59],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 71, 0, 73, 0, 75, 0, 77, 0, 79]])
In [34]:
S = copy(M)
S[0:8, 0:8] = 0
S
Out[34]:
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 8, 9],
[ 0, 0, 0, 0, 0, 0, 0, 0, 18, 19],
[ 0, 0, 0, 0, 0, 0, 0, 0, 28, 29],
[ 0, 0, 0, 0, 0, 0, 0, 0, 38, 39],
[ 0, 0, 0, 0, 0, 0, 0, 0, 48, 49],
[ 0, 0, 0, 0, 0, 0, 0, 0, 58, 59],
[ 0, 0, 0, 0, 0, 0, 0, 0, 68, 69],
[ 0, 0, 0, 0, 0, 0, 0, 0, 78, 79]])
In [35]:
S = copy(M)
S[np.arange(0,8), np.arange(0,8)] = 0
S
Out[35]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 0, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 0, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 0, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 0, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 0, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 0, 67, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 0, 78, 79]])
In [36]:
S = copy(M)
S[range(0,8), range(0,8)] = 0
S
Out[36]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 0, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 0, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 0, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 0, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 0, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 0, 67, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 0, 78, 79]])
In [37]:
S = copy(M)
S[np.arange(0, 10), np.arange(0, 10)] = 0 ## will fail
S
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
in
1 S = copy(M)
----> 2 S[np.arange(0, 10), np.arange(0, 10)] = 0 ## will fail
3 S
IndexError: index 8 is out of bounds for axis 0 with size 8
The Pleasure of deleting code
Good Code is Deleted Code
The only code without bugs is no
code. And the less code you have,
the less mental load as well. This is why it is often a pleasure to delete a lot
of code.
In IPython we recently bumped the version number to 7.0 and dropped support for
Python 3.3. This was the
occasion to clean up and remove a lot of code that ensured compatibility with
multiple minor Python versions, and while it may seem easy, it required a lot of
thinking ahead of time to make the process simple.
Finding what can (and should) be deleted
The hardest part is not deleting the code itself, but finding what can be
deleted. In many compiled languages the compiler may help you, but with Python
it can be quite a bit tougher, and some of Python's usual practices make it harder.
Here are a few tips on how to prepare your code (when you write it) for
deletion.
EAFP vs LBYL
Python tends to be more on the "Easier to Ask Forgiveness than Permission" side than
"Look Before You Leap". It is thus common to see code like:
try:
    from importlib import reload
except ImportError:
    from imp import reload
In this particular case though, why do we use the try/except? Unless there is
a comment attached, it is hard to guess that from imp import reload has been
deprecated since Python 3.4, and a comment can easily get out of sync with the
actual code.
A better way would be to explicitly check sys.version_info
if sys.version_info < (3, 4):
    from imp import reload
else:
    from importlib import reload
(Note: tuples of unequal length can be compared in Python.)
It is now obvious which code should be removed and when. You can see that as
the "explicit is better than implicit" rule.
Deprecated code
Removing legacy deprecated code is also always a challenge, as you may be
worried that other libraries might still be relying on the deprecated functionality. To help with that,
let's see how we can improve a typical deprecation. Here is a typical deprecated
method from IPython:
def unicode_std_stream(stream='stdout'):
    """DEPRECATED"""
    warn("IPython.utils.io.unicode_std_stream is deprecated", DeprecationWarning)
    ...
How confident are you that you can remove this? A few questions should pop into
your head:
- Since when has this function been deprecated?
def unicode_std_stream(stream='stdout'):
    """DEPRECATED"""
    warn("IPython.utils.io.unicode_std_stream is deprecated since IPython 4.0", DeprecationWarning)
    ...
With this new snippet I'm confident it's been 3 versions, and I am more willing
to delete. This also helps downstream libraries to know whether they need
conditional code or not. I'm still unsure downstream maintainers have updated
their code, though. Let's add a stacklevel (to help them find where the deprecated
function is used), and add more information about how they can replace code that uses
this function:
def unicode_std_stream(stream='stdout'):
    """DEPRECATED, moved to nbconvert.utils.io"""
    warn("IPython.utils.io.unicode_std_stream has moved to nbconvert.utils.io since IPython 4.0", DeprecationWarning, stacklevel=2)
    ...
With this information I'm even more confident downstream maintainers have
updated their code. They have an actionable item – replace one import with
another – and are more likely to do that than to dig for an hour in the history to figure
out what to do.
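For illustration, here is roughly what that actionable item looks like on the downstream side – a hedged sketch, gating the import on IPython's version tuple, with the old and new locations taken from the deprecation message above:
import IPython

if IPython.version_info < (4,):
    from IPython.utils.io import unicode_std_stream   # old location, pre-4.0
else:
    from nbconvert.utils.io import unicode_std_stream  # new location announced in the warning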
TLDR
Be explicit in your conditional imports that depend on the version of the underlying
Python or library.
Take time to write good deprecation warnings, with:
a stacklevel (=2 most of the time),
since when it was deprecated,
what should replace the deprecated call for consumers.
The time you put into these will greatly help your downstream consumers, and
will benefit you later by making it easier to get rid of lots of code.
Sign commits on GitHub
Signing Commits and Tags on GitHub
I've recently set up Keybase and integrated my public key
with git to be able to sign commits.
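For reference, the git side of that setup is only a couple of commands (a sketch; the key ID below is the fingerprint of my key shown later in this post, substitute your own):
$ git config --global user.signingkey 99B17F64FD5C94692E9EF8064968B2CC0208DCC8
$ git commit -S -m "some signed commit"
$ git tag -s 5.2.1 -m "release version 5.2.1"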
I decided not to sign automatically, as auto-signing would allow any attacker
that takes control of my machine to create signed commits. The Merkle
tree of git still ensures repos are
not tampered with, as long as you issue $ git fsck --full on a repo, or set $ git
config --global transfer.fsckobjects true once and forget it.
Using $ git log --show-signature you can now check that commits (and tags) are
correctly signed. Be careful though: a correct signature does not mean trusted,
and even if you have a PGP key set, GitHub will helpfully sign the commits you make
on their platform with their own key.
* commit 5ced6c6936563fea7ba7efccecbc4248d84cfabb (tag: 5.2.1, origin/5.2.x, 5.2.x)
| gpg: Signature made Tue Jan 2 19:51:17 2018 CET
| gpg: using RSA key 99B17F64FD5C94692E9EF8064968B2CC0208DCC8
| gpg: Good signature from "Matthias Bussonnier " [ultimate]
| Author: Matthias Bussonnier
| Date: Tue Jan 2 19:49:34 2018 +0100
|
| Bump version number to 5.2.1 for release
|
* commit 5a28fb0a121c286e35db309fe11b53693969b2d6
|\ gpg: Signature made Tue Jan 2 13:58:08 2018 CET
| | gpg: using RSA key 4AEE18F83AFDEB23
| | gpg: Good signature from "GitHub (web-flow commit signing) " [unknown]
| | gpg: WARNING: This key is not certified with a trusted signature!
| | gpg: There is no indication that the signature belongs to the owner.
| | Primary key fingerprint: 5DE3 E050 9C47 EA3C F04A 42D3 4AEE 18F8 3AFD EB23
| | Merge: 3fd21bc 065a16a
| | Author: Min RK
| | Date: Tue Jan 2 13:58:08 2018 +0100
| |
| | Merge pull request #326 from jupyter/auto-backport-of-pr-325
| |
| | Backport PR #325 on branch 5.2.x
| |
| * commit 065a16aad2e84d506b36bb2c874a7c287c53c61f (origin/pr/326)
|/ Author: Min RK
| Date: Tue Jan 2 10:57:13 2018 +0100
|
| Backport PR #325: Parenthesize conditional requirement in setup.py
So in the previous block, you can see that 5ced6c6... has been made and
signed by me, while 5a28fb0... has allegedly been made by Min, but signed by
GitHub.
By default you do not have GitHub's signature locally, so the GitHub-signed
commits can appear as unverified.
To verify them, fetch the GitHub key:
$ gpg --keyserver hkp://keys.gnupg.net --recv-keys 4AEE18F83AFDEB23
Where 4AEE18F83AFDEB23 is the key you do not have locally.
And remember: a valid signature does not mean trusted.
Verifying tags
Tags can be signed, and need to be checked independently of commits:
$ git tag --verify 5.2.1
object 5ced6c6936563fea7ba7efccecbc4248d84cfabb
type commit
tag 5.2.1
tagger Matthias Bussonnier 1514919438 +0100
release version 5.2.1
gpg: Signature made Tue Jan 2 19:57:18 2018 CET
gpg: using RSA key 99B17F64FD5C94692E9EF8064968B2CC0208DCC8
gpg: Good signature from "Matthias Bussonnier " [ultimate]
So you can check that I tagged this commit.
Learn more
As usual the git documentation
has more to say about this. And signing is not really useful without checking
the integrity of the Git history, so please set $ git config --global transfer.fsckobjects true as well!
Open in Binder Chrome Extension
Two weeks ago I was pleased to announce the release of the Open-with-Binder for
Firefox extension.
After asking on Twitter whether people were interested in the same for
Chrome (29 Yes, 67 No,
3 Other) and pondering whether or not to pay the Chrome developer fee for the
Chrome Web Store, I decided to take my chances and try to publish it last week.
I almost just had to use Mozilla's WebExt shim for Chrome, downgrade a few pieces of artwork
from SVG to PNG (like, really??) and upload everything by hand (like, really, again?).
The Chrome store has way more fields and is quite complicated –
compared to the Mozilla add-ons website at least. It is sometimes confusing
whether fields are optional or not, or whether they are per add-on or per
developer.
It does though allow you to upload more art that will be shown in a store
page which looks nicer.
Still, I had to go through a really ugly, crappy website and had to pay
to publish a free extension. So Mozilla, you win this one.
Please rate the extension, or it may not appear in search results for others, AFAICT:
Install Open with Binder for Chrome
It works identically to the Firefox one: you get a button in the toolbar and
click on it when visiting GitHub.
Enjoy.
Open in Binder Browser Extension
Today I am pleased to announce the release of a first project I've been working
on for about a week: a Firefox extension to open the GitHub repository you are
visiting using MyBinder.org.
If you are in a hurry, just head there to install version 0.1.0 for
Firefox.
If you'd like to know more, read on.
Back to Firefox.
I've been using Chrome for a couple of years now, but heard a lot of good stuff
about Rust and all the good things it has
done for Firefox.
OK, that's a bit of marketing, but it got me to retry Firefox (Nightly, please),
and except for my password manager, which took some weeks to update to the new
Firefox API, I soon barely used Chrome.
MyBinder.org
I'm also spending more and more time working with the JupyterHub team on
Binder, and seeing more and more developers adding Binder
badges to their repositories. In the middle of last week I thought:
You know what's not optimal? It's painful to browse repositories that don't
have the Binder badge on MyBinder.org; also, sometimes you have to find the
badge, which is at the bottom of the readme.
You know what would be great to fix that? A button in the toolbar doing the
work for me.
Writing the extension
As I know Mozilla (which has a not-so-great new
design BTW, but that's a
personal opinion) cares about making standards and things simple for their users,
I thought I would have a look at the new
WebExtensions API.
And 7 days later, after a couple of 30-minute breaks, I present to you a
staggering 27-line (including 7 lines of business logic) extension that does just that:
(function() {
    function handleClick(){
        browser.tabs.query({active: true, currentWindow: true})
            .then((tabs) => {return tabs[0]})
            .then((tab) => {
                let url = new URL(tab.url);
                if (url.hostname != 'github.com'){
                    console.warn('Open in binder only works on GitHub repositories for now.');
                    return;
                };
                let parts = url.pathname.split('/');
                if (parts.length < 3){
                    console.warn('While you are on GitHub, You do not appear to be in a github repository. Aborting.');
                    return;
                }
                let my_binder_url = 'https://mybinder.org/v2/gh/'+parts[1] +'/'+parts[2] +'/master';
                console.info('Opening ' + url + 'using mybinder.org... enjoy !')
                browser.tabs.create({'url':my_binder_url});
            })
    }

    console.info('(Re) loading open-in-binder extension.');
    browser.browserAction.onClicked.addListener(handleClick);
    console.info('❤️ If you are reading this then you know about binder and javascript. ❤️');
    console.info('❤️ So you\'re skilled enough to contribute ! We\'re waiting for you on https://github.com/jupyterhub/ ❤️');
})()
You can find the original source here
The hardest part was finding the API and learning how to package and set the
icons correctly. There are still plenty of missing
features
and really low-hanging
fruit,
even if you have never written an extension before (hey, it's my first, and I
averaged 1 useful line/day writing it...).
General Feeling
Remember that I'm new to this and started a week ago.
The Mozilla docs are good but highly variable in quality; it feels like (and is) a
wiki. More opinionated tutorials might have been less confusing. A lot of
statements are correct but not quite, and leaving the choice to users is just
confusing. For example: you can use SVG or PNG icons, which I did, but then
some areas don't like SVG (addons.mozilla.org), and WebExtensions should work
on Chrome, but Chrome requires PNG. Telling me that I could use SVG was not
useful.
The review of add-ons is blazingly fast (7 minutes from first submission to human
approval). Apple could learn from that, if what I've heard here and there is
correct.
The submission process has way too many manual steps. I'm OK with that for a first
submission, but for updates, really? I want to be able to fill in all the
information ahead of time (or generate it) and then have a CLI to submit
things. I hate filling in forms online.
The first submission, even if marked beta, will not be considered beta. So
basically I published a 0.1.0beta1, then a 0.1.0beta2 which did not trigger an
automatic update because beta1 was not considered beta. Super confusing. I
could "force" seeing the beta3 page, but with a warning that beta3 was an older
version than beta1? What?
There is still this feeling that the last 1% of polishing of the process has not
been done (that's usually where Apple is known to shine). For example, your store
icon will be resized to 64x64 (px) and displayed in a 64x64 (px) square, but I have
a retina screen! So even though I submitted a 128x128 icon, it now looks blurry!
WTF!
You can contribute
As I said earlier, there is a lot of low-hanging fruit! I went through the
process of figuring things out, so that you can contribute easily:
Detect if not on /master/ and craft the corresponding Binder URL
Switch icons to PNGs
Test/package for Chrome
Add options for binders other than MyBinder.org
Add screenshots and descriptions to the add-on store.
So see you there!
JupyterCon - Display Protocol
This is an early preview of what I am going to talk about at JupyterCon.
Leveraging the Jupyter and IPython display protocol
This is a small essay to show how one can make better use of the display protocol. Everything you will see in this blog post has been available for a couple of years, but no one has really built on top of it.
It is usually known that the IPython rich display mechanism allows library authors to define rich representations for
their objects. You may have seen it in SymPy, which makes extensive use of the LaTeX representation, and Pandas, whose dataframes have a nice HTML view.
What I'm going to show below is that one is not limited to these – you can alter the representation of any existing object without modifying its source – and that this can be used to alter the view of containers, with the example of lists, to make things easier to read.
Modifying objects' reprs
This section is just a reminder of how one can define representations for objects whose source code is under your
control. When defining a class, the code author needs to define a number of methods which should return the (data, metadata) pair for a given object mimetype. If no metadata is necessary, it can be omitted. For some common representations, short method names are available. These methods can be recognized as they all follow the pattern _repr_*_(self). That is to say, an underscore, followed by repr, followed by an underscore. The star * needs to be replaced by a lowercase identifier, often referring to a short human-readable description of the format (e.g. png, html, pretty, ...), and finishes with a single underscore. We note that unlike Python's __repr__ (pronounced "dunder rep-er"), which starts and ends with two underscores, the "rich reprs" or "reprs-stars" start and end with a single underscore.
Here is the class definition of a simple object that implements three of the rich representation methods:
"text/html" via the _repr_html_ method
"text/latex" via the _repr_latex_ method
"text/markdown" via the _repr_markdown method
None of these methods returns a tuple, thus IPython will infer that there is no associated metadata.
The "text/plain" mimetype representation is provided by the classical Python's __repr__(self).
In [1]:
class MultiMime:

    def __repr__(self):
        return "this is the repr"

    def _repr_html_(self):
        return "This is html"

    def _repr_markdown_(self):
        return "This **is** mardown"

    def _repr_latex_(self):
        return "$ Latex \otimes mimetype $"
In [2]:
MultiMime()
Out[2]:
This is html
All the mimetype representations will be sent to the frontend (in many cases the notebook web interface), and the richest one will be picked and displayed to the user. All representations are stored in the notebook document (on disk), and the one to display can be chosen from when the document is later reopened – even with no kernel attached – or converted to another format.
External formatters and containers
As stated in the introduction, you do not need to have control over an object's source code to change its representation, though it is often a more convenient process when you do. As an example we will build a container for image thumbnails, and see how we can use the code written for this custom container to apply it to generic Python containers like lists.
As a visual example we'll use O RLY parody book covers, in particular a small-resolution version of some of them, to limit the amount of data we'll be working with.
In [3]:
cd thumb
/Users/bussonniermatthias/dev/posts/thumb
Let's see some of the images present in this folder:
In [4]:
names = !ls *.png
names[:20], f"{len(names) - 10} more"
Out[4]:
(['10x-big.png',
'adulting-big.png',
'arbitraryforecasts-big.png',
'avoiddarkpatterns-big.png',
'blamingthearchitecture-big.png',
'blamingtheuser-big.png',
'breakingthebackbutton-big.png',
'buzzwordfirst-big.png',
'buzzwordfirstdesign-big.png',
'casualsexism-big.png',
'catchingemall-big.png',
'changinstuff-big.png',
'chasingdesignfads-big.png',
'choosingbasedongithubstars-big.png',
'codingontheweekend-big.png',
'coffeeintocode-big.png',
'copyingandpasting-big.png',
'crushingit-big.png',
'deletingcode-big.png',
'doingwhateverdanabramovsays-big.png'],
'63 more')
In the above I've used an IPython-specific syntax (!ls) to conveniently extract all the files with a png extension (*.png) in the current working directory, and assign this to the names variable.
That's cute but, for images, not really useful. We know we can display images in the Jupyter notebook when using the IPython kernel; for that we can use the Image class located in the IPython.display submodule. We can construct such an object simply by passing the filename. Image already provides a rich representation:
In [5]:
from IPython.display import Image
In [6]:
im = Image(names[0])
im
Out[6]:
The raw data from the image file is available via the .data attribute:
In [7]:
im.data[:20]
Out[7]:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01\x90'
What if we map Images to each element of a list ?
In [8]:
from random import choices
mylist = list(map(Image, set(choices(names, k=10))))
mylist
Out[8]:
[<IPython.core.display.Image object>,
 <IPython.core.display.Image object>,
 <IPython.core.display.Image object>,
 <IPython.core.display.Image object>,
 <IPython.core.display.Image object>,
 <IPython.core.display.Image object>,
 <IPython.core.display.Image object>,
 <IPython.core.display.Image object>,
 <IPython.core.display.Image object>]
Well, unfortunately a list object only knows how to represent itself using text and the text representation of its elements. We'll have to build a thumbnail gallery ourselves.
First let's (re)build an HTML representation for displaying a single image:
In [9]:
import base64
from IPython.display import HTML
def tag_from_data(data, size='100%'):
    return (
        '''<img
              style="display:inline;width:{1}"
              src="data:image/png;base64,{0}"
        />''').format(''.join(base64.encodebytes(data).decode().split('\n')), size)
We encode the data from bytes to base64 (newline separated), and strip the newlines. We format that into an HTML template – with some inline style – and set the source (src) to be this base64-encoded string. We can check that this displays correctly by wrapping the whole thing in an HTML object that provides a convenient _repr_html_.
In [10]:
HTML(tag_from_data(im.data))
Out[10]:
Now we can create our own class, which takes a list of images and constructs an HTML representation for each of them, then joins them together. We define a _repr_html_ that wraps the whole thing in a paragraph tag, and adds a comma between each image:
In [11]:
class VignetteList:

    def __init__(self, *images, size=None):
        self.images = images
        self.size = size

    def _repr_html_(self):
        return '<p>' + ','.join(tag_from_data(im.data, self.size) for im in self.images) + '</p>'

    def _repr_latex_(self):
        return '$ O^{rly}_{books} (%s\ images)$ ' % (len(self.images))
We also define a LaTeX representation – which we will not use here – and look at our newly created object using the previously defined list:
In [12]:
VignetteList(*mylist, size='200px')
Out[12]:
(the selected covers, rendered side by side as thumbnails)
That is nice, though it forces us to explicitly unpack all the lists we have into a VignetteList – which may be annoying. Let's clean up the above a bit, and register an external formatter for the "text/html" mimetype that should be used for any object which is a list. We'll also improve the formatter to recurse into objects. That is to say:
If it's an image, return the PNG data in an <img> tag,
If it's an object that has a text/html representation, use that,
Otherwise, use the repr.
With this we lose some of the nice formatting of text lists from the pretty module; we could easily fix that, but we leave it as an exercise for the reader. We're also going to recurse into objects that have an HTML representation, that is to say, make it work with lists of lists.
In [13]:
def tag_from_data_II(data, size='100%'):
    return '''<img
                  style="display:inline;width:{1}"
                  src="data:image/png;base64,{0}"
              />'''.format(''.join(base64.encodebytes(data).decode().split('\n')), size)

def html_list_formatter(ll):
    html = get_ipython().display_formatter.formatters['text/html']
    reps = []
    for o in ll:
        if isinstance(o, Image):
            reps.append(tag_from_data_II(o.data, '200px'))
        else:
            h = html(o)
            if h:
                reps.append(h)
            else:
                reps.append(repr(o) + '')
    return '[' + ','.join(reps) + ']'
Same as before, with a square bracket before and after, and a bit of styling that changes the drop shadow on hover. Now we register the above with IPython:
In [14]:
ipython = get_ipython()
html = ipython.display_formatter.formatters['text/html']
html.for_type(list, html_list_formatter)
In [15]:
mylist
Out[15]:
(the same list, now rendered as a gallery of thumbnails)
Disp
External integration for some already existing objects is available in disp; in particular you will find representations for SparkContext and requests' Response objects (collapsible JSON content and headers), as well as a couple of others.
Magic integration
The above demonstration shows that a kernel is more than a language: it is a controlling process that manages user requests (in our case code execution) and how the results are returned to the user. There is often the assumption that a kernel is a single language; this is incorrect, as a kernel process may manage several languages and can orchestrate data movement from one language to another. In the following we can see how a Python process makes use of what we have defined above to make SQL queries that return rich results. We also see that the execution of SQL queries has side effects in the Python namespace, showing how the kernel can orchestrate things.
In [16]:
load_ext fakesql
In [17]:
try:
    rly
except NameError:
    print('`rly` not defined')
`rly` not defined
In [18]:
%%sql
SELECT name,cover from orly WHERE color='red' LIMIT 10
Out[18]:
[['buzzwordfirst-big.png', ...], ['buzzwordfirstdesign-big.png', ...], ['goodenoughtoship-big.png', ...], ['noddingalong-big.png', ...], ['resumedrivendevelopment-big.png', ...], ['takingonneedlessdependencies-big.png', ...]] (each row rendered with its cover thumbnail)
In [19]:
rly[2]
Out[19]:
['goodenoughtoship-big.png', ...] (with the cover thumbnail)
It would not be hard to make modifications of the Python namespace affect the SQL database – this is left as an exercise to the reader as well (hint: use properties) – or to have integration with other languages like R, Julia, ...
Note: This notebook was initially written to display prototype features of IPython and the Jupyter notebook, in particular completion of cell magics (for the SQL cell) and a UI element allowing switching between the shown mimetypes. These will not be reflected in a static rendering and are not mentioned in the text, which may lead to a confusing read.
Migration to Python 3 only
This is a personal account of migrating IPython from a single-source
Py2-Py3 code base to Python 3 only.
The migration plan
The migration of IPython to be Python 3 only started about a year ago. For the
last couple of years, the IPython code base was "single source", meaning that you
could run it on Python 2 and Python 3 without a single change to the source
code.
We could have made the transition to a Python 3 only code base with the use of a
transpiler (like 2to3, but 3to2), though there does not seem to be any commonly
used tool for that. This would also have required taking care of backporting functionality,
which can be a pain, and things like asyncio are quasi-impossible to backport
cleanly to Python 2.
So we just dropped Python 2 support.
The levels of non-support
While it is easy to use the term "non-supported", there are different levels of
non-support:
Do not release for Python 2, but you can "compile" or clone/install it yourself.
Officially saying "this software is not meant to run on Python 2", but it
still does and is released.
CI tests are run on Python 2 but "allow failure".
Likely to break, but you accept PRs to fix things.
CI tests are not run on Python 2, but PRs fixing things are accepted.
PRs to fix things on Python 2 are not accepted.
You are actively adding Python 3 only code.
You are actively removing Python 2 code.
You are actively keeping Python 2 compatibility, but make the software delete
the user's home directory.
We settled somewhere between actively adding Python 3 only features and removing Python
2 code.
Making a codebase Python 3 only is "easy" in the sense that adding a single yield
from is enough to make your code invalid Python 2, and no __future__
statement can fix that.
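For instance (a made-up minimal example), this helper is perfectly fine Python 3 but a SyntaxError on Python 2:
def chain(*iterables):
    for iterable in iterables:
        yield from iterable   # `yield from` does not exist in Python 2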
Removing code
One of the things you will probably see in the background of this section is
that statically typed languages would be of great help for this task. I would tend to say
"thank you, Captain Obvious", but there is some truth to it. Python, though, is not a
static language, so we are trying to see how we can write Python in a better way
to ease the transition.
The obvious
There are obvious functions that are present only for Python 2, in general
inside if PY2 blocks. These can simply be deleted, and hopefully your
linter will now complain about a ton of unused variables and imports you can remove.
This is not always the case with function definitions, as most linters assume
functions are exported. Coverage can help, but then you have to make
sure your function is not tested separately on Python 3.
One of the indirect effects in many places was the reduced indentation.
Especially at module level, this led to much greater readability, as module-level
functions are easily confused for object methods when indented inside an if PY2: block.
EAFP vs LBYL
It is common in Python to use try/except in place of an if/else condition.
The well-known hasattr works by catching an exception, and if/else is subject
to race conditions. So it's not uncommon to hear that "Easier to Ask Forgiveness
than Permission" is preferred to "Look Before You Leap". That might be a good
move in a codebase with requirements that will never change, though in the
context of code removal it is a hassle. Indeed, when encountering a try/except
which is likely meant to handle a change of behavior between versions of
Python, it is hard to know for which version(s) of Python it was written – some
changes are between minor versions – and in which order the try/except is written
(Python 2 in the try, or in the except clause); and above all, it is quasi-impossible
to find these locations.
On the other hand, explicit if statements (if sys.version_info < (3,)) are easy
to find – remember you only need to compare the first item of the tuple – and
easy to reduce to the only needed branch. It's also way easier to apply (and
find) these for minor versions.
The Zen of Python had it right: explicit is better than implicit.
For me at least, try/except ImportError, AttributeError is a pattern I'll
avoid in favor of an explicit if/else.
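As a small sketch of the explicit style (StringIO is just one example of a module that moved between Python 2 and 3):
import sys

if sys.version_info < (3,):
    from StringIO import StringIO   # Python 2 branch: easy to find, easy to delete later
else:
    from io import StringIO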
byte/str/string/unicode
There are a couple of locations where you might have to deal with
bytes/unicode/str/string – oh boy, these names are not well chosen – in
particular in areas where you are casting things that are bytes to unicode and
vice versa. And I can never remember, when I read cast_bytes_py2, whether it does
nothing on Python 2 or nothing on Python 3. Though once you get the hang of it,
the code is soooo much shorter and simpler and clearer in your head.
Remember: decode bytes to unicode at the boundary and keep things unicode everywhere in
your program if you want to avoid headaches. Good Python code is boring Python
code.
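A minimal sketch of that "decode at the boundary" idea (the socket here is just a stand-in for any I/O boundary):
def read_message(sock):
    raw = sock.recv(1024)          # bytes come in at the I/O boundary
    text = raw.decode('utf-8')     # decode once, here...
    return text.strip()            # ...and work with str everywhere else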
Python 2-isms
Dealing with removing Python 2 code made me realise that there are still a lot of
Python-2-isms in most of the Python 3 code I write.
Inheriting classes
Writing classes that do not need to inherit from object feels weird, and I
definitely don't have the habit (yet) of not doing it. Having the ability to
use a bare super() is great, as I never remembered the order of the parameters.
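A quick sketch of both habits in Python 3:
class Base:                     # no need to inherit from object anymore
    def greet(self):
        return "hello"

class Child(Base):
    def greet(self):
        return super().greet() + " world"   # bare super(), no arguments to remember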
Pathlib
IPython does a lot of path manipulation, so we keep using os.path.join in many
places, or even just the with open(...) context manager. If you can afford
it and target only recent Python versions, pathlib and the Path object are a great
alternative that we tend to forget exists.
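A small sketch of what that can look like (the paths here are made up):
from pathlib import Path

profile_dir = Path.home() / '.ipython' / 'profile_default'   # instead of os.path.join
config = profile_dir / 'ipython_config.py'
if config.exists():
    text = config.read_text()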
decode
Most decode/encode operations do the right thing; there is almost no need to
specify the encoding anywhere. This makes handling bytes -> str conversions even
easier.
Python 3-isms
These are the features of Python 3 which have no equivalent in Python 2 and
would make great additions to many code bases. I tend to forget they exist and do
not design code around them enough.
async/await
I'm just scratching the surface of async/await, and I definitely see great
opportunities here. You need to design code to work in an async fashion, but it
should be relatively straightforward to use async code from synchronous code. I
should learn more about sans-io (Google is your friend) to make code reusable.
Type annotations
Type annotations are an incredible feature that, even just as visual annotations,
replace numpydoc. I have a small grudge against the PEP 8 rules that describe the
position of spaces, but even without mypy the ability to annotate types is a huge
boon for documentation. Now docstrings can focus on the why/how of functions.
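A small, made-up example of what this buys you:
def compress(data: bytes, level: int = 6) -> bytes:
    """Compress data; the docstring can focus on the why, not on repeating the types."""
    ...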
Kwarg only
Keyword-only arguments are a great, often under-appreciated feature of Python 3.
The *-syntax is IMHO a bit clunky – but I don't have a better option. It gives
you great flexibility in an API without sacrificing backward compatibility.
I wish I had positional-only as well.
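A sketch of the *-syntax with a hypothetical function:
def open_connection(host, port, *, timeout=10.0, retries=3):
    ...

open_connection("example.com", 80, timeout=5.0)   # fine
# open_connection("example.com", 80, 5.0)         # would raise TypeError: too many positional arguments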
Writing an async REPL - Part 1
This is the first part in a series of blog posts explaining how I implemented the ability to await code at the top-level scope in the IPython REPL. Don't expect the second part soon, or bother me for it. I know I should write it, but time is a rare luxury.
It is an interesting adventure into how Python code gets executed, and I must admit it changed quite a bit how I understand Python code nowadays and made me even more excited about async/await in Python.
It should also dive quite a bit into the internals of Python/CPython, if you are ever interested in what some of these things are.
In [1]:
# we cheat and deactivate the new IPython feature to match Python repl behavior
%autoawait False
Async or not async, that is the question
You might not have noticed it, but since Python 3.5 the following is valid Python syntax:
In [2]:
async def a_function():
    async with contextmanager() as f:
        result = await f.get('stuff')
    return result
So you've been curious and read a lot about asyncio, and may have come across a few new libraries like aiohttp and all the aio-libs, heard about sans-io, read complaints and discussions about how we could take different approaches and maybe even do better. You vaguely understand the concept of loops and futures, but the term coroutine is still unclear. So you decide to poke around yourself in the REPL.
In [3]:
import aiohttp
In [4]:
print(aiohttp.__version__)
coro_req = aiohttp.get('https://api.github.com')
coro_req
1.3.5
Out[4]:
In [5]:
import asyncio
res = asyncio.get_event_loop().run_until_complete(coro_req)
In [6]:
res
Out[6]:
In [7]:
res.json()
Out[7]:
In [8]:
json = asyncio.get_event_loop().run_until_complete(res.json())
json
Out[8]:
{'authorizations_url': 'https://api.github.com/authorizations',
'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}',
'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}',
'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}',
'current_user_repositories_url': 'https://api.github.com/user/repos{?type,page,per_page,sort}',
'current_user_url': 'https://api.github.com/user',
'emails_url': 'https://api.github.com/user/emails',
'emojis_url': 'https://api.github.com/emojis',
'events_url': 'https://api.github.com/events',
'feeds_url': 'https://api.github.com/feeds',
'followers_url': 'https://api.github.com/user/followers',
'following_url': 'https://api.github.com/user/following{/target}',
'gists_url': 'https://api.github.com/gists{/gist_id}',
'hub_url': 'https://api.github.com/hub',
'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}',
'issues_url': 'https://api.github.com/issues',
'keys_url': 'https://api.github.com/user/keys',
'notifications_url': 'https://api.github.com/notifications',
'organization_repositories_url': 'https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}',
'organization_url': 'https://api.github.com/orgs/{org}',
'public_gists_url': 'https://api.github.com/gists/public',
'rate_limit_url': 'https://api.github.com/rate_limit',
'repository_search_url': 'https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}',
'repository_url': 'https://api.github.com/repos/{owner}/{repo}',
'starred_gists_url': 'https://api.github.com/gists/starred',
'starred_url': 'https://api.github.com/user/starred{/owner}{/repo}',
'team_url': 'https://api.github.com/teams',
'user_organizations_url': 'https://api.github.com/user/orgs',
'user_repositories_url': 'https://api.github.com/users/{user}/repos{?type,page,per_page,sort}',
'user_search_url': 'https://api.github.com/search/users?q={query}{&page,per_page,sort,order}',
'user_url': 'https://api.github.com/users/{user}'}
It's a bit painful to pass everything to run_until_complete; you know how to write an async-def function and pass it to an event loop:
In [9]:
loop = asyncio.get_event_loop()
run = loop.run_until_complete
url = 'https://api.github.com/rate_limit'
async def get_json(url):
    res = await aiohttp.get(url)
    return await res.json()

run(get_json(url))
Out[9]:
{'rate': {'limit': 60, 'remaining': 50, 'reset': 1491508909},
'resources': {'core': {'limit': 60, 'remaining': 50, 'reset': 1491508909},
'graphql': {'limit': 0, 'remaining': 0, 'reset': 1491511760},
'search': {'limit': 10, 'remaining': 10, 'reset': 1491508220}}}
Good! And then you wonder: why do I have to wrap things in a function? If I have a default loop, isn't it obvious where I want to run my code? Can't I await things directly? So you try:
In [10]:
await aiohttp.get(url)
File "", line 1
await aiohttp.get(url)
^
SyntaxError: invalid syntax
What? Oh, that's right, there is no way in Python to set a default loop... but a SyntaxError? Well, that's annoying.
Outsmart Python
Luckily you (in this case me) are in control of the REPL. You can bend it to your will. Surely you can do something. First you try to remember how a REPL works:
In [11]:
mycode = """
a = 1
print('hey')
"""
def fake_repl(code):
    import ast
    module_ast = ast.parse(mycode)
    bytecode = compile(module_ast, '', 'exec')
    global_ns = {}
    local_ns = {}
    exec(bytecode, global_ns, local_ns)
    return local_ns

fake_repl(mycode)
hey
Out[11]:
{'a': 1}
We don't show global_ns as it is huge; it will contain all that's available by default in Python. Let's see where it fails if you try a top-level async statement:
In [12]:
import ast
mycode = """
import aiohttp
await aiohttp.get('https://aip.github.com/')
"""
module_ast = ast.parse(mycode)
File "", line 3
await aiohttp.get('https://aip.github.com/')
^
SyntaxError: invalid syntax
Ouch, so we can't even compile it. Let's be smart: can we get the inner code if we wrap it in an async-def?
In [13]:
mycode = """
async def fake():
import aiohttp
await aiohttp.get('https://aip.github.com/')
"""
module_ast = ast.parse(mycode)
ast.dump(module_ast)
Out[13]:
"Module(body=[AsyncFunctionDef(name='fake', args=arguments(args=[], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=[Import(names=[alias(name='aiohttp', asname=None)]), Expr(value=Await(value=Call(func=Attribute(value=Name(id='aiohttp', ctx=Load()), attr='get', ctx=Load()), args=[Str(s='https://aip.github.com/')], keywords=[])))], decorator_list=[], returns=None)])"
In [14]:
ast.dump(module_ast.body[0])
Out[14]:
"AsyncFunctionDef(name='fake', args=arguments(args=[], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=[Import(names=[alias(name='aiohttp', asname=None)]), Expr(value=Await(value=Call(func=Attribute(value=Name(id='aiohttp', ctx=Load()), attr='get', ctx=Load()), args=[Str(s='https://aip.github.com/')], keywords=[])))], decorator_list=[], returns=None)"
As a reminder, since AST stands for Abstract Syntax Tree, you may construct an AST which is not a valid Python program, like an if-else-else. An AST can be modified. What we are interested in is the body of the function, which itself is the first object of a dummy module:
In [15]:
body = module_ast.body[0].body
body
Out[15]:
[<_ast.Import at ...>, <_ast.Expr at ...>]
Let's pull out the body of the function and put it at the top level of a newly created module:
In [16]:
async_mod = ast.Module(body)
ast.dump(async_mod)
Out[16]:
"Module(body=[Import(names=[alias(name='aiohttp', asname=None)]), Expr(value=Await(value=Call(func=Attribute(value=Name(id='aiohttp', ctx=Load()), attr='get', ctx=Load()), args=[Str(s='https://aip.github.com/')], keywords=[])))])"
Mouahahahahahahahahah, you managed to get a valid top-level async ast ! Victory is yours !
In [17]:
bytecode = compile(async_mod, '', 'exec')
File "", line 4
SyntaxError: 'await' outside function
Grumlgrumlgruml. You haven't said your last word. You're going to take your revenge later. Let's see what we can do in Part II, not written yet.
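While waiting for Part II, here is a rough sketch of one possible workaround (not necessarily what IPython ended up doing): instead of hoisting the body out of the async def, keep the wrapper, compile it, call it to get a coroutine, and hand that coroutine to the event loop. The names below (__wrapper, user_code) are made up for the example.
import ast
import asyncio

user_code = "import asyncio\nawait asyncio.sleep(0)\nprint('ran at (almost) top level')"

# Wrap the user's code inside an async function so that `await` becomes legal,
# then compile and execute the wrapper definition.
wrapped = "async def __wrapper():\n" + "\n".join(
    "    " + line for line in user_code.splitlines())
ns = {}
exec(compile(ast.parse(wrapped), "<repl>", "exec"), ns)

# Calling the wrapper gives us a coroutine that the loop can run.
coro = ns["__wrapper"]()
asyncio.get_event_loop().run_until_complete(coro)
This runs the awaited code, but any variables the user defines stay trapped in the wrapper's local scope instead of landing in the REPL namespace; dealing with that is part of what makes the real implementation interesting.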