Changing ByteStr REPR
A rebuttal against Python 3 was recently written by the (in)famous Zed Shaw, and it drew many responses, with arguments and counter-arguments on various points.
One particular topic which caught my eye was the bytearray vs unicodearray debate. I'll try to explicitly avoid the str/string/bytes/unicode naming as it is (IMHO) confusing, but that's a debate for another time. If you pay attention to the debates above, you might see that there are roughly two camps:
- bytearray and unicodearray are two different things, and we should never convert from one to the other (that's roughly the pro-Python-3 camp).
- bytearray and unicodearray are similar enough in most cases that we should do the magic for users.
I'm greatly exaggerating here, and the following does not argue for one side or the other. I have my personal preference about what I think is good, but that's irrelevant for now. Note that both sides argue that their preference is better for beginners.
You can often find posts trying to clear up the string/str/bytes misconception, like this one which keeps insisting on the fact that str in Python 3 is far different from bytes.
The mistake in the REPR
I have one theory: the bytes/str issue is not in their behavior, but in their REPR. The REPR is in the end the main channel of communication between the object and the brain of the programmer or user. Also, Python is "duck typed", and you have to admit that bytes and str kinda look similar when printed, so assuming they should behave in a similar way is not far-fetched. I'm not saying that users will consciously assume bytes and str are the same. I'm saying that the human brain may inherently make such an association.
Off the top of your head, what does requests.get(url).content return?
import requests_cache
import requests
requests_cache.install_cache('cachedb.tmp')
requests.get('http://swapi.co/api/people/1').content
... bytes...
I'm pretty sure you glanced ahead in this post and probably thought it was "text", in this case probably even JSON. It might be invalid JSON; I'm pretty sure you cannot tell.
Why does it return bytes? Because it could fetch an image:
requests.get('https://avatars0.githubusercontent.com/u/335567').content[:200]
And if you decode the first request?
requests.get('http://swapi.co/api/people/2').content.decode()
Well, that looks the same (except for the leading b...). Go explain to a beginner that the two above are totally different things, while they already struggle with 0-based indexing, iterators, and the syntax of the language.
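To make the point concrete without a network call, here is a minimal sketch. The payload is made up for illustration and is not actual output from the API above. It shows that the two reprs differ only by the leading b, while the objects themselves behave quite differently.

```python
# Made-up payload standing in for a response body (an assumption, not real API output).
raw = b'{"name": "example"}'
text = raw.decode("utf-8")

# The two reprs differ only by the leading "b".
print(repr(raw))
print(repr(text))

# Yet the objects never compare equal ...
print(raw == text)

# ... and indexing yields an integer for bytes but a one-character string for str.
print(raw[0], text[0])
```

So the visual similarity is almost total, while equality and indexing tell a different story.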
Changing the repr
Let's change the repr of bytearray to better represent what it is. IPython makes it easy to change an object's repr:
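# get_ipython() is only defined inside a running IPython session.
text_formatter = get_ipython().display_formatter.formatters['text/plain']

def _print_bytestr(arg, p, cycle):
    # p is IPython's pretty printer; emit a placeholder instead of the raw bytes.
    p.text('<BytesBytesBytesBytesBytes>')

text_formatter.for_type(bytes, _print_bytestr)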
requests.get('http://swapi.co/api/people/4').content
Make a useful repr
<bytesbytesbytes> may not be a useful repr, so let's try to make a repr that:
- Conveys that bytes are, in general, not text.
- Lets us peek into the content to guess what it is.
- Pushes the user to .decode() if necessary.
Generally in Python, objects have a repr which starts with <, followed by the class name, a quoted representation of the object, the memory location of the object, and a closing >.
As the quoted representation of the object may be really long, we can elide it.
A common representation of bytes could be binary, but it's not really compact. Hex is compact but more difficult to read, and it makes peeking at the content hard when it could be ASCII. So let's go with an ASCII representation where we escape non-ASCII characters.
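As a rough illustration of that trade-off, the sketch below renders the same made-up bytes (an assumption chosen for the example, not data from the requests above) in the three forms and compares their lengths.

```python
# Made-up sample: mostly ASCII, plus two bytes that are not printable ASCII.
data = b'{"name": "R2-D2"}\xff\x00'

# Binary: eight digits per byte, separated by spaces.
as_binary = " ".join(format(byte, "08b") for byte in data)

# Hex: two digits per byte.
as_hex = data.hex()

# Escaped ASCII: what repr() already produces for bytes.
as_escaped = repr(data)

for label, rendered in [("binary", as_binary), ("hex", as_hex), ("escaped", as_escaped)]:
    print(label, len(rendered), rendered)
```

For mostly-ASCII content like this, the escaped form is both the shortest and the only one you can read at a glance, which is why the formatter relies on repr().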
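# Keep short strings whole; otherwise keep the first 50 and last 16 characters.
ellide = lambda s: s if (len(s) < 75) else s[0:50]+'...'+s[-16:]

def _print_bytestr(arg, p, cycle):
    # Usual <ClassName ... at 0x...> layout, with the escaped bytes inside.
    p.text('<bytes '+ellide(repr(arg))+' at {}>'.format(hex(id(arg))))

text_formatter.for_type(bytes, _print_bytestr)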
requests.get('http://swapi.co/api/people/12').content
requests.get('http://swapi.co/api/people/12').content.decode()
Advantage: it is not gobbledygook anymore when getting binary resources!
requests.get('https://avatars0.githubusercontent.com/u/335567').content
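If you want to try the idea outside IPython, the same formatting logic can be sketched as a plain function. The names elide and bytes_repr are hypothetical helpers introduced for this sketch, and both inputs are made up: the second is only a stand-in for an image download, built from the PNG signature followed by filler bytes.

```python
def elide(s, limit=75, head=50, tail=16):
    """Return s unchanged when short, else its first `head` and last `tail` characters."""
    return s if len(s) < limit else s[:head] + "..." + s[-tail:]


def bytes_repr(data):
    """Build the '<bytes ... at 0x...>' string that the IPython formatter prints."""
    return "<bytes {} at {}>".format(elide(repr(data)), hex(id(data)))


# Made-up inputs: a short JSON-looking payload and a long binary blob.
short = b'{"name": "example"}'
blob = b"\x89PNG\r\n\x1a\n" + bytes(range(256)) * 4

print(bytes_repr(short))  # shown in full
print(bytes_repr(blob))   # middle replaced by "..."
```

The short payload stays fully readable, while the binary blob is cut down to a fixed-size preview instead of flooding the screen.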