Nov 12, 2020
PHP Internals News: Episode 70: Explicit Octal Literal
PHP Internals News: Episode 70: Explicit Octal Literal
London, UK
Thursday, November 12th 2020, 09:33 GMT
In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about an RFC that he has proposed to add an Explicit Octal Literal to PHP.
The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript
Derick Rethans 0:15
Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Derick Rethans 0:24
This is Episode 70. Today I'm talking with George Peter Banyard, about a new RFC that he's just proposed for PHP 8.1, which is titled explicit octal literal. Hello George, would you please introduce yourself? George Peter Banyard 0:38
Hello Derick, I'm George Peter Banyard, I'm a student at Imperial College London, and I contribute to PHP in my free time. Derick Rethans 0:46
Excellent, and the contribution that you're currently have up is titled: explicit octal literal. What is the problem that this is trying to solve? George Peter Banyard 0:56
Currently in PHP, we have four types of integer literals. So decimal numbers, hexadecimal, binary, and octal. Decimal is just your normal decimal numbers; hexadecimal starts with 0x, and then hexadecimal characters so, null to nine and A to F, and then binary starts with 0b, and then it's only zeros and ones. However, octal notation is just a decimal, something which looks like a decimal number, which was a leading zero, which doesn't really look that much different than a decimal number, but it comes from the days from C and everything which just uses like a zero as a prefix. Derick Rethans 1:48
But I have seen is people using like array keys for the, for the month names right and they use 01, 02, 03, you get 07, and 08 and 09, and then they look at the arrays. They notice that they actually had the zeroth element in there but no, but no eight or nine. That's something that is that PHP no longer does I believe. No, it's mostly that the parser doesn't pick it up anymore. Instead of silently ignoring the eight, it'll just give you an error. You've mentioned that there's these four types of numbers with octal being the one started with zero. But what's the problem with is that a moment? George Peter Banyard 2:31
Sometimes when you want to use, which looks like decimal number. So, for example, you're trying to order months, and use like the full two digits for the month number, instead of just one, you use 01, as an array key. When you get to array, it will parse error because it can't pass 08 as an octal number, which is very confusing, because it. Most people don't deal with octal numbers that often, and you would expect everything to be decimal. Because numeric strings are always decimal, but not integers literals. So, the proposal is to add an explicit octal notation, which would be 0o. So python does that, JavaScript has it, Rust also has it, to allow like a by more explicit to say oh I'm dealing with an octal number here. This is intended. Derick Rethans 3:33
Beyond having the 0b for binary, and the 0x for hexadecimal, the addition of 0o for octal is the plan to add. And is that it? George Peter Banyard 3:45
That's more or less the proposal. It's non-BC, because the parser before would just parse or if you had 0o, so there's no PC very possible numeric strings are not affected because since PHP 7.0 hexadecimal strings are not handled anymore as numeric strings. Numeric strings will always be decimal integers, literals will have your four different variants, and maybe a future proposal is to deprecate the implicit octal notation to always make a decimal, even if you have leading zeros. Derick Rethans 4:21
At the moment, if I do as a string literal 014, and do an echo that I get 12. George Peter Banyard 4:27
Because then it's interpreted as an octal. The most bizarre example is if you do var_dump string of 014 double equal to 014, you will get false, because one is interpreted as well 14, like the numeric string is interpreted as 14, whereas the octal number, which says 014 as an integer literal is interpreted as an octal number, which is 12, which is slightly confusing for most people, because that also if you because PHP, most, we all deal with like HTTP requests, and I GET and POST a data, which everything is in strings because it's a text protocol. And if you get user output, which is like I don't know, naught 14 and you're, are you intending to compare munz numbers which are or. 01201, and then you get to array, well then you just fail. Derick Rethans 5:22
Of course, removing that support means a BC breaking change, which phones happen until PHP nine, of course, which might be a while away from now let's say that. George Peter Banyard 5:31
Probably five years, if we're going through the timelines from PHP seven to PHP 8, but to be able to deprecated and remove it. Well, you need to add support for something else. So that's more the long term plan. Derick Rethans 5:46
And your proposal is basically to make it equivalent to binary and hexadecimal numbers, so that it is less confusing in general. George Peter Banyard 5:55
Yes, that's why the RFC is very short. Derick Rethans 5:58
What are octal numbers actually used for? George Peter Banyard 6:02
The only practical use case that I've seen is for Linux permissions, so chmod. Execute read and write, are those who permissions which chmod will use an octal number. Derick Rethans 6:15
In a different order though but George Peter Banyard 6:17
Yes, I don't know chmod though on top of my heart. Derick Rethans 6:20
Is it only Linux permissions that you can think of? Is there anything else? I can't either so I'm asking you. George Peter Banyard 6:25
No, I can't. That's why I find it very odd that like the leading zero just makes it octal instead of anything else. I mean it has precedence because many other languages do that like C, Java, I don't know, many any language I suppose was just picked it up from. I think C. But when I looked into the history, weirdly enough before C. They had a prefix for like binary, octal, and dec, and hexadecimal. But then the one for octal just got dropped, for some reason. Derick Rethans 6:57
Maybe because the zero and the "o" next each other look very the same. We've already touched on whether there are BC breaks or not, BC standing for backwards compatibility. And, there shouldn't be any because it's something that a parser currently doesn't understand. But do other build-in extensions need to be modified for example? George Peter Banyard 7:18
We have two extensions, which one which deals with numbers, so which is GMP, which is arbitrary precision arithmetic. And then there's the filter extension to filter octal, which filters data and tells you if it's valid or not and it gives you back a, like a correct integer or something like that, which is the filter extension, which has an octal filter. Both of these extensions have been modified to support like the prefix notation, and interpreted as a valid octal number. And then we have like the function which is oct2dec, which is basically octal to decimal, which which weirdly enough already supported like the octal prefix. Derick Rethans 7:59
But that accepts strings, I suppose? George Peter Banyard 8:01
Yes that that accepts strings. Derick Rethans 8:04
And it already supported the 0o prefix? George Peter Banyard 8:07
Yes, which is very on point for PHP I feel. Some things are just supported randomly in one side but not everywhere else. Derick Rethans 8:15
It's a surprise for me that is what I can say. So, yeah, you mentioned as a short RFC, you think there will be any extensions to this in the future? You already mentioned having it maybe de…