
2014-10-01

What's wrong with JavaScript in the template?

Those of you keeping score will know that I recently started a new job. This one is Perl, not PHP, and so a certain level of standards is expected from the code. What with Perl having all these neato features and excellent web frameworks, I at least consider it on a par with Python and Ruby in its utility.

Perusing the new-to-me codebase I of course discover some of the hysterical raisins that live there, much of which is easily forgiven because the original coder had the foresight to apologise in a comment for doing it in the first place. But one thing stood out to me as a prime candidate for refactoring: JavaScript in the templates.

I said as much and was surprised to be posed the question, "What's wrong with JavaScript in the templates?"

Surprised not at being asked the question, but because I didn't know what the answer was. I've worked enough on the front end in previous jobs that seeing JS in template code makes me flinch, but never have I been asked to actually introspect this reaction and explain it.

Questions like that are primo blog post material, and it's been a while since I properly got my teeth into one, so on my journey home I put my mind to formalising quite what it was about it that made me want to rip it out and refactor the life out of it.

What it's not

Some obvious answers come to mind, with varying validity.

  • Is it because it's hard to find? No. Everything's hard to find. ack for it - you'll find it soon enough.
  • Is it because it violates separation of concerns? No. In fact, you could argue that it improves it, by encapsulating JavaScript only useful to a template inside that very template.
  • Is it because the only reason most people put JS in a template is so they can use the templating language to build JS? Well yes, but that's just the same question. What's wrong with it?
  • Is it because it's not reusable? Well, yes and no. Most template JS is not intended to be reusable; it's quite specific to that particular template, and there's little use for it elsewhere. More on this point later.
  • Is it the same reason we don't put CSS in the template either? Or inline in the HTML? Yes! By Jupiter, yes! We find the answer in the template itself. It's the other, main part of the template that we've not mentioned yet - the HTML.

What lies beneath

To answer the question, we must deconstruct the web page itself and look at the parts. What are we really looking at when we look at a web page? What are we really providing when we build a template? What is the purpose of the HTML, the TT2 or Jade or Mustache code that wraps or creates it?

Most web pages follow a similar structure: There's the <html> with its <head> and <body>; the body has a <div class="header"> or, better yet, a <header>, and some sort of <div id="content">. Then at last there's a bunch of stuff that finally gets to the point, i.e. displays whatever it is the page is displaying.

Most template structures separate all the pre/postamble from the content itself. Even in the CGI days we, naively but with good intent, would have a header.html and a footer.html and we would render the header, then the body, then the footer, to STDOUT. More recently, we have a single file with the pre- and postamble in it, and we import the rendered content into that. We tend to also have a considerable number of satellite template files representing handy widgets and reusable code and all the other things that I've already said aren't really the reason why we don't do the title of this article.

We knew then, as we know now, something we always forget to talk about; something implicit in everything we do here. While we make all these templates rendering data in consistent ways we somehow lose sight of the simplest of notions: we are representing resources.

Resource and Framing

"Resource" is a fully-functional word, writ deep into the very clay with which we make our internets; vis-a-vis HTTP. HTTP works with a verb and a noun, i.e. it says "Do this to this". "Framing" is a word I've picked to describe what it is we website-makers do to resources to make them look nice for people using browsers that conform to the standards set out to allow us to do so.

HTTP's nouns are URIs. URI means Uniform Resource Identifier. The R in URI (or URL or IRI) means resource. It means thing; it's identifying the nouns of the internet. We respond to a (request to a) URI with a resource, represented in HTML format for the purposes of this discussion. We know this, but we never say this - and so whenever we get discussions, no one ever uses it as a basis for finding answers. But the concept of resource contains the answer to our question.

When we divide our templates up into separate files there is the tacit goal that the template we use to represent the actual, specific resource contain as little HTML as possible. Why? Well, mostly for consistency. We want to frame all our resources - at least those related to each other - in the same way. That means that if we put as little HTML as we can get away with into our resource templates, we can put as much as we can get away with into our framing templates, and thus have as little variation between the rendered resources as we can. A side effect, and therefore a second benefit, is that if we want to reuse or amend our framing, we can do this in one place - it's DRY.

We already recognise the difference between frame and resource: it's encoded right there in <div id="content">. How many of your templates resemble this structure?

<body>
  <stuff></stuff>
  <div id="content">
    <% content %>
  </div>
  <more stuff></more stuff>
</body>

That right there is the boundary between Alliance and Reaver space. Uh, I mean, the place where the framing goes away and the resource begins. The resource is all the data that change when you ask for a different ID, or a different resource type. The resource is that which, if you took all the HTML away, would still be what you asked for.

I've nearly made my point

Not all resources are data. Some resources are forms. I'm choosing forms as an example for another resource type because we're all familiar with them doing stuff.

Forms contain no data, but instead prompt you for data, and allow you to create more resources. Nominally, they represent the structure of the resource type, but don't represent any particular record of that type. The form holds the key to the answer: behaviour.

Consider:

<form action="/upload_image" method="post" enctype="multipart/form-data">
  <label for="image">Upload image:
    <input id="image" name="image" type="file">
  </label>

  <input type="submit">
</form>

This is a form with a file control, as you well know. It renders as a box with a "Browse" button. This one renders with a label, "Upload image:".

If you click on the label, the text of the input, or the browse button, you get the same behaviour: a file browser pops up. When you select a file and confirm it, the name of the file appears in the text part of the input, unless some jackass has installed Uploadify or similar, and broken it.

It also renders a single submit button. The button looks like all the other buttons on your website because you don't put CSS in your templates. The reason for that is being explained as we speak. I mean, as you read. I mean now.

When you click the submit button, the browser composes an HTTP POST request to the URL /upload_image on the host that served this resource. This request contains the entirety of the selected file, encoded in such a way that the receiving server can understand it. Presumably, the resource at that URL knows what to do with it.

Now, kindly point out to me the part of the HTML snippet above that implements any of that behaviour.

It's not there.

Nouns and adjectives - that's what the HTML is made of. There is not a single verb in the entirety of that form, and yet those few lines perform, implicitly, functionality that you would probably have to look up on Wikipedia to implement yourself.

Not all resources are forms, either. Here's a video resource, shamelessly stolen from Wikipedia, and represented in HTML format:

<video src="/movie.webm" poster="/movie.jpg" controls> </video>

Here's a more familiar one:

<img src="/images/avatar.png" alt="avatar" title="Get your pointer off my face">

Noun-adjective-adjective-adjective. Noun adjective-adjective-adjective. The <video> noun:

  • Fetches the resource at '/movie.jpg' of the host that served this HTML resource, and renders it at the place in the page concordant with the styling associated with it and the rest of the HTML.
  • Puts some sort of controls on this image, probably a play button, which, when clicked, causes the resource at '/movie.webm' to be fetched.
  • Renders the fetched video file in situ, replacing the still image, and plays any sound that comes with it.
  • Renders further controls, such as a scrubber, pause, volume slider.
  • Affects the right-click menu of the browser to provide appropriate options to a video: save video, get URL, get URL at this time, etc.

Plus anything else I've forgotten. The <img> noun has similar, albeit many fewer, effects: the image is fetched and rendered without user interaction. Indeed, if the image is an animated gif, it will animate! On its own!

This borderline-facetious set of examples serves to point out that the browser has already got verbs. The nouns (HTML elements) say which verbs you want to use (and where to put the visuals for the user's interaction), and the adjectives (the attributes of the elements) control the parameters that the verbs need. (Fetch which video? Play automatically?)

This is called semantics.

Semantics!

I'm going to define semantics as the use of nouns to imply verbs[1]. Form fields come with behaviour, and you say which behaviour you want through nouns, i.e. the choice of which input you use. Semantics also covers those adjectives that fine-tune the noun's behaviour by describing it further.

Semantics tell things how to behave based on what the resource contains. An HTML resource often contains framing. Semantics go into the HTML to tell anyone who cares which bit they can ignore. Semantics is the way you phrase things; it's how you describe the resource.

Consider:

<div id="content">

A web scraper can use this sort of thing to know what to ignore. Ignore is a verb. The HTML doesn't say "ignore this"; that's for the client to decide.

The browser isn't going to ignore it - but the browser doesn't care about this particular piece of semantics[2]. If the CSS says to do something to it then the browser will do that to it, but the browser doesn't do that by default.

The web scraper will skip anything outside this div - provided it knows what the 'content' ID means - and the browser will do nothing based on this ID because it hasn't been told to.

That right there is the answer. There is a difference between all the things it is possible for a browser to do and all the things the browser can already do. You can stick together awesome websites entirely using HTML5 and CSS3, but often you want behaviour that is not already built in to the browser. Maybe you want div#content to have special styling or behaviour, but browsers don't come with that built in.

And indeed, styling is just a form of behaviour - CSS tells the browser how to behave when it renders certain elements in certain configurations. JavaScript tells the browser how to behave when the user does things.

This is the point where people start putting JavaScript into templates. A specific form needs special behaviour, so you add a <script> tag and then output the form.

Smash! go the semantics. Fie! cry the tortured frontenders.

None of the behaviour you ever write is useful only once. I told you I'd get back to the reusability point. The JavaScript doesn't go in the template because it's not reusable, sure, but why is that a problem?

The problem is the JavaScript defines verbs. Semantic HTML is that HTML which uses only nouns, and lets the browser select the correct verbs.

JavaScript, therefore, is correctly a separate resource that adds verbs to the browser, and defines the nouns to which they apply. That's why everything eventually ends up as a JavaScript plugin; and sometimes as core browser behaviour.

Essentially, we're saying that JavaScript is a CSS file that defines behaviour, not styling. Where CSS tells the browser how to interpret the semantics of your HTML in terms of colouring, positioning and so on, JavaScript tells the browser how to interpret the semantics in terms of direct functionality - behaviour.

Indeed, not only should JavaScript never go into the template, it should never go into <script> tags either. Just like CSS should never go into <style> tags.
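Here is a minimal sketch of what that separation might look like. Everything named in it is an assumption made up for illustration: the file name behaviour.js, the helper addVerb, and the class confirm-before-submit, which the HTML would use as an adjective. None of it comes from a real library.

```javascript
// behaviour.js (hypothetical file name): adds one verb to the browser and
// names the nouns it applies to. Nothing here is generated by a template.

// Attach `handler` for `eventName` to every element under `root` that
// matches `selector`. Returns how many elements were given the behaviour.
function addVerb(root, selector, eventName, handler) {
  const nouns = Array.from(root.querySelectorAll(selector));
  nouns.forEach(function (noun) {
    noun.addEventListener(eventName, handler);
  });
  return nouns.length;
}

// Hypothetical verb: ask before submitting any form that the HTML
// describes with class="confirm-before-submit".
function confirmSubmit(event) {
  if (!window.confirm('Really submit?')) {
    event.preventDefault();
  }
}

// Only wire things up when a real DOM is present (i.e. in a browser).
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    addVerb(document, 'form.confirm-before-submit', 'submit', confirmSubmit);
  });
}
```

The template then only has to say what the form is, by giving it that class, and the page links to the file by URL like any other related resource. The template language never touches the JavaScript.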

The Related Resource

Resources have related resources. If you strip out all the framing of your HTML resource (e.g. you render it as JSON instead) you are still going to keep many of the hyperlinks - the contents of any <a> tag inside the content div, perhaps some of the image sources. That's because the HTML framing is just rendering the content in a human-readable way[3]. The relations between resources are actually part of the resource itself, or at least metadata to it.

This is important because it addresses one of the main reasons people put JavaScript in templates: so that they can use the template language on the JavaScript, and thus build resource-specific JS that renders, e.g., a list of related resources when you click some "See related" button.

If the resources are related they should already be in the page. I seriously cannot stress that enough. Either the related resources are, or are not, relevant to this representation of the resource.

If the HTML went away and you were returning JSON, would you, or would you not, list those related resources as metadata, one way or another?

They cannot be part of the framing: the framing is consistent across the whole site! They are unique to this resource; and the style of list that is invisible until a button is pressed is unique to this type of resource.

But is "style of list" not an adjective about this list? Is list not a noun? Cannot you use the noun-adjective semantics to say, "This is a list of related resources, and it is of type pop-up-on-button"? HTML is amply equipped to represent this semantically: we even have the rel attribute to let you specify which button should activate the list.

Related resources belong in the page. Either as a hyperlink, or directly in the HTML. If you want to save bandwidth, you don't put the whole list in, but you put in a hyperlink placeholder instead. The important thing is that the HTML is accurately representing the resource. Just like the JSON would. Don't force non-browser consumers of your HTML resource to figure out how to run the JavaScript just to get related data.
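To make that concrete, here is a rough sketch of one resource represented both ways. The article object, its URLs, and the toJson and toHtml helpers are all invented for this example; the only point is that the related resources appear in both representations, with no script needed to discover them.

```javascript
// A hypothetical resource: the data that would remain if the HTML went away.
const article = {
  title: 'Model student',
  related: [
    { href: '/articles/12', title: 'Code review time!' },
    { href: '/articles/13', title: 'Changing /tmp size' }
  ]
};

// Escape the characters that are special in HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// JSON representation: the related resources are plain metadata.
function toJson(resource) {
  return JSON.stringify(resource);
}

// HTML representation: the same related resources, as hyperlinks in a list.
// The "related" class is the adjective that a separate script or stylesheet
// can use to decide how the list behaves or looks.
function toHtml(resource) {
  const items = resource.related.map(function (link) {
    return '  <li><a href="' + escapeHtml(link.href) + '">' +
      escapeHtml(link.title) + '</a></li>';
  });
  return '<h1>' + escapeHtml(resource.title) + '</h1>\n' +
    '<ul class="related">\n' + items.join('\n') + '\n</ul>';
}
```

A non-browser consumer can pull the links out of either representation without running anything.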

e.g.

This (http://harvesthq.github.io/chosen/) is Chosen. You've probably seen it before. You start typing in a form field, and it lists all matching options, filtering as you type.

Chosen can either use an existing set of options, such as from a select box, or a URL from which to fetch options that match the string.

Both of these can be in the HTML before the JS even runs. The list of options is a related resource; it is simply represented in different ways. The first way puts all of the related resources in with the main resource; the second way puts a hyperlink to a single other related resource, from which they can be fetched when it's appropriate to do so.

At no time is it necessary to put this data into the JavaScript. JavaScript can read. Hell, the JavaScript should work on the JSON representation and all you'd have to change would be how it finds the data.
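A small sketch of that last claim, under assumptions: this is not Chosen's actual API, and the function names and the { options: [...] } JSON shape are made up here. The filtering behaviour is written once; only the "finder" changes depending on where the data lives.

```javascript
// Pure behaviour: keep the options whose label contains what was typed.
function matchingOptions(options, typed) {
  const needle = typed.toLowerCase();
  return options.filter(function (option) {
    return option.label.toLowerCase().indexOf(needle) !== -1;
  });
}

// Finder 1: read options that are already in the HTML, given a <select>
// element (anything with an `options` list of { value, text } will do).
function optionsFromSelect(select) {
  return Array.from(select.options).map(function (option) {
    return { value: option.value, label: option.text };
  });
}

// Finder 2: read options from a JSON representation of the same list.
// The { options: [...] } shape is an assumption for this sketch.
function optionsFromJson(json) {
  return JSON.parse(json).options;
}
```

In a browser the first finder would be handed something like document.querySelector('select'), and the second would be handed the body of whatever the placeholder hyperlink points at. Either way matchingOptions does not change.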

The Answer

The answer, then, is semantics. Of course it is. But it's what semantics means that turned out to be the difficult thing to define here.

Semantics is about saying what this resource is; it's metadata about the resource itself. Semantics allows the client to make the decisions about what parts of the resource are relevant and what parts are not.

It's exactly the same principle by which responsive web design works.

It's exactly the same reason you don't put inline CSS into your HTML.

It's exactly the same reason you've never written a video player, or had to decode the JPEG file format manually in JavaScript and blit the resulting bitstring onto a canvas element.

It's exactly the same reason you don't know how to launch a file browser dialogue box.[4]

It's exactly the same reason web components exist.

It's exactly the same reason JSON resources don't come with a stylesheet or JavaScript.

It's exactly the same reason we now have <nav> and <section> elements.

It's exactly the same reason we can produce screen-reader-friendly representations of HTML pages when the HTML page is correctly structured.

It's because you are describing what the resource is, and letting the client decide what it does.

*drops mic*



[1] A separate discussion

[2] Not all HTML is for the browser. HTML is a perfectly sensible representation format for machine use as well.

[3] Perhaps better: the HTML framing is a machine-readable way of getting the browser to render the content in a human-readable way.

[4] In principle. HTML5 advances in file handling mean it is more common for the file dialogue to be called directly from JS.

2014-04-27

Changing OpenElec's /tmp size


OpenElec has a limited /tmp partition. Very limited, i.e. 10MiB. Many things fall over because they occasionally need more than this - especially if they're not the only thing using the tmpfs.

In order to change this you either have to hack around with automatically-created symlinks in startup scripts, or change it yourself.

The size of the /tmp partition is stored in /etc/init.d/01_mount-filesystem

mount -n -t tmpfs -o size=10m tmpfs /var

The problem is, that file is readonly. The reason it's readonly is that the entire root filesystem is stored in a squashfs partition.

To amend it, it is simply a case of unsquashing it, fixing it, and resquashing it.

Fix it

Pull the SD card out of your RPi (I'm assuming that's where you have it) and put it into your card reader. Let your system mount it.

You should have a SYSTEM drive somewhere on your computer. Lubuntu mounts it at /media/altreus/SYSTEM, so let's go with that.

$ mkdir squash
$ cd squash
$ cp /media/altreus/SYSTEM/SYSTEM SYSTEM.bak
$ unsquashfs SYSTEM.bak

Now we have a copy of the OpenElec root filesystem in a .bak file so we can undo it when we screw it up later. We also have the files themselves unpacked into squashfs-root. This is the default place unsquashfs puts them.

$ vi squashfs-root/etc/init.d/01_mount-filesystem

Change the file to have a better size /tmp. I used 500mb because my SD card is 8GB, although a tmpfs lives in memory rather than on the card, so the Pi's RAM is the real limit. Ignore the first instance of tmpfs in the file; we want to change the size=10m one.

$ sudo mksquashfs ./squashfs-root SYSTEM

It's important that you do this with sudo. The file /etc/shadow has permissions 000, making it only accessible by root. This is how we got it when we unsquashed it, so this is how we want to keep it. My /etc/shadow is 600, but they presumably wanted theirs to be 000. If we want to do the above step without root, we'd have to change the permissions so our user can see it - we can't change the permissions after it's squashed, so the only way to get a 000 file into the filesystem is to squash it with root.

Anyway, done.

$ cp SYSTEM /media/altreus/SYSTEM

Your new squashfs file will be mounted by OpenElec and your tmpfs will now be mounted with the size you gave it.

I'm not 100% certain this is stable. My Pi has started rebooting occasionally; but I might be giving it more than it can handle. It is an old model, but if I've introduced a bug because 500mb is too much, or something, I'm sure I'll get to the bottom of it and update the post.

2014-02-27

Code review time!

Look! A horrible piece of code in a horrible language in a horrible frame for a sickeningly twee ceremony that should have been made obsolete along with the Inquisition!

Let's review it.

Here's the code, with line numbers.

01    <?
02      function do_wed() {
03        if ($objections != true) {
04          function do_vow() {
05            $vow = 1;
06            do {
07              if ($richer === 1
08                  && $poorer === 1
09                  && $sickness === 1
10                  && $health === 1) {
11                function have_hold($a,$b) {
12                  ini_set('session.gc_maxlifetime','forever');
13              }
14              have_hold('husband','wife');
15              define('friend', true);
16              define('partner', true);
17              define('faithful', true);
18              if ($i = 'do') {
19                   $f = 'finger';
20                   $r = 'ring;
21                   $f = $f + $r;
22                   }
23               }
24               $vow = $vow + 1;
25              } while ($vow != 2);
26            }
27            do_vow();
28            $register = array_fill($details);
29            print_r($register)
30            return $kiss;
31            }
32          }
33        do_wed();
34    ?>

Let's go!

line 1

We use long tags here. <?php

line 3

Undefined variable $objections.

$objections != true better written !$objections. But this is not what you meant; you meant count($objections) == 0, since it will be an array of them.

line 4

Don't define functions inside other functions.

lines 6, 25

You know how many vows you want. Use a for loop. Better, use an array of vows and populate it with two Vow objects, which represent the conditions each person agrees to. This means you can marry more than 2 people. The do_wed() function should take the people to wed as arguments. Use func_get_args() to loop over all of them, or (...$parties) in the next version of PHP.

Useless loop anyway. do_vow() should be called twice with the person currently vowing.

"Twice" is a western concept. This code is not internationalised.

lines 7-10

Undefined variables. None of these equals 1. It is unlikely that all four of these things would equal 1 at the same time. You want to test the party's agreement to these concepts, not the value of these variables. You need Person objects.

line 11

A function in a function in a function? This function takes two parameters and uses neither. Get rid of them.

line 12

This ini parameter takes an integer. 'forever' is not an integer.

line 13

This closing brace does not line up with the function definition on line 11. It does line up with the if on line 7, which implies you've forgotten to close the function, but scrutiny shows that you've misaligned the brace.

line 14

have_hold does not take any parameters any more.

This is exclusivist. Not all marriages are between a husband and a wife. These should be parameters to do_wed().

This function is run twice, both times with the same parameters. It should swap over for the second iteration.

line 16

'partner' is presumably the person we are not currently dealing with.

line 17

'faithful' is not a boolean value and should be configured per app. It needs to be a data structure containing parameters of faithfulness, i.e. boundaries.

line 18

This is always true. Remove this condition. $i is never used, so remove the assignment too.

lines 19, 20

Useless variables. Either accept them as parameters or use the literal strings directly.

line 21

If you'd not used these useless variables you'd realise you're trying to numerically add strings. . is the concatenation operator. What is a 'fingerring'?

$f is discarded. Just omit this entire block.

line 22

What is this supposed to line up with?

line 23

This closes the if that looks like it is closed on line 13. But it does not line up with it.

line 24

Better written $vow++, but we've replaced this with an array of Vow objects containing agreement parameters, so don't do this any more.

line 25

The only reason this would be a while loop is if you're just going to keep asking until both (all) parties agree. This is not how one should enter into a marriage.

line 26

This closes do_vow() but does not line up with it.

line 27

This is what should be run n times, once per party in the agreement.

line 28

array_fill takes three parameters. Register should be an object.

line 29

Syntax error - missing semicolon.

print_r is not the best thing to use here. Serialise this properly, perhaps with JSON so it can be consumed by an API or HTML so it can be styled and displayed properly.

line 30

Undefined variable $kiss. Kiss is a verb and should be a function.

lines 31, 32

These braces should line up with what they close.

line 33

Don't run a function when it is defined - that's not how you create a library.

This function could at least be parameterised with the names of the people being married. Isn't Etsy about crafts and hence personalisation?

2014-02-06

Model student

Models! Model trains, model students, model aeroplanes, model citizens. Fashion model, data model, business model. Ford Model T. Model number.

All these different uses of the word model have a commonality, the understanding of which is important to the understanding of what it is we mean when we talk about models in computing. This commonality may be considered the abstract meaning of "model": the meaning that exists behind all the real-world uses of it.

This concept is that of representation. Physical models are scaled-down representations of the things they model. A fashion model is really the representation of real people who would wear clothes (showing quite how divorced from reality fashion really is). A business model is a wordy representation of how the business will operate. Even the term "Ford Model T" is actually referring to the blueprint of all cars of that type: "Model" is referring to the type, not the car itself.

In computing, then, a model is a representation, a blueprint, a prototype that encapsulates the important details about the thing it is modelling. A good model will be a minimal but sufficient representation of the system it is modelling.

An easy example is the rolling of dice.

1d6

Dice are a familiar system to everyone, I hope. They neatly encapsulate our idea of randomness, at least that one we're taught in primary school, whereby the outcome of the system is not predictable from the input.

When we roll a d6 we expect to see one of its six faces pointing upwards but we don't know which one until it does so. Indeed on most dice we see the number represented as a pattern of dots; the number of dots being the number it shows.

This, if you're not used to thinking in these terms, is very specific. There are many extra features of a d6 that have nothing to do with the randomness of the d6. Every feature of the die except its shape (and mass distribution) can be altered and it would still exhibit the same properties of randomness.

Modelling systems, therefore, requires a keen eye about what are the underlying mechanics that allow the system to work, and what are the superficial parts of it that happen to be the case in this particular instance.

At its barest, a d6 is a system that, when run, produces a random integer from 1 to 6. The random distribution is even across all numbers: which is to say, the more times it is rolled, the closer we expect each result's share of the rolls to get to one in six.

To model a d6, therefore, we simply need a system that can produce the same result.

Math.floor(Math.random() * 6) + 1

This piece of JavaScript models a 6-sided die. Run it in your browser's console if you don't believe me. Run it lots. Here's what happened when I ran it 50 times[1]:

[2, 2, 6, 3, 5, 4, 3, 3, 2, 4, 
 1, 5, 3, 4, 6, 1, 6, 6, 4, 5,
 3, 1, 6, 5, 2, 4, 6, 6, 6, 5,
 3, 6, 1, 2, 3, 2, 3, 3, 1, 5,
 2, 5, 3, 2, 4, 3, 5, 6, 6, 5]

And sorted:

[1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
 2, 2, 2, 3, 3, 3, 3, 3, 3, 3,
 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
 5, 5, 5, 5, 5, 5, 5, 5, 5, 6,
 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]

At this level, JavaScript's RNG[2] should be roughly uniform in distribution, and with true randomness we should not expect perfectly even results from so few rolls. This distribution certainly seems random and within parameters for a uniform distribution, so we've simplified the concept of a d6 into a minimal and sufficient algorithm.
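If you'd rather count than eyeball, here is a rough sketch that tallies the rolls for you. The names rollD6 and tallyRolls and the optional random argument are inventions for this sketch. It uses Math.floor(...) + 1 so that a raw value of exactly 0 from Math.random() cannot come out as a roll of 0.

```javascript
// Roll a d6. `random` is injectable so the sketch can be checked with a
// predictable source; it defaults to the built-in generator.
function rollD6(random) {
  const source = random || Math.random;
  // floor + 1, so a raw value of exactly 0 still gives a roll of 1.
  return Math.floor(source() * 6) + 1;
}

// Roll `times` times and count how often each face came up.
function tallyRolls(times, random) {
  const counts = { 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0 };
  for (let i = 0; i < times; i++) {
    counts[rollD6(random)] += 1;
  }
  return counts;
}

console.log(tallyRolls(50));    // lumpy, like the run above
console.log(tallyRolls(60000)); // each face's share should sit near one in six
```

Run it with a small count and then a large one: the lumps should shrink relative to the total as the number of rolls goes up.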

dn

Not all modelling is about functionality. Much of data modelling is about just that: data!

A model like a d6 is fundamentally fairly useless. Indeed the idea of a d6 is just a very tight constraint on a very useful concept - randomness. It serves little purpose to model a d6 specifically, because the number of uses for a d6 is, in the grand scheme of things, small.

In the real world, we use models in computing for two basic purposes: retrieval and prediction. The first one is used to store representations of things that exist, such as people or products. Those are data models. We store these data models to let people log into a system, or to display a list of the products to customers. The second is used to try to work out what would happen in certain situations, based on the understanding that we have about the system in the first place - such as weather. These are functional models, of which the d6 above is one example.

In both situations the model is useless without the things being modelled having data. Properties of the objects store information about the objects and supply parameters to the algorithms we've devised.

We have hit upon the idea of parameterising algorithms. As noted, the d6 algorithm is somewhat useless because all it does is model a d6, which is of limited utility.

We can increase the utility by modelling the algorithm of any die. This is the second thing to be aware of when learning to abstract away the fundamentals from the real-world example. Earlier, we learned that we can turn a gazillion atoms' worth of die into a few electrons' worth of RNG by simply taking a number between 1 and 6 - this is the fundamental behaviour of a d6.

Now, we can look at other real-world dice and see how their behaviour relates to the d6:

  • A d4 picks a number between 1 and 4
  • A d6 picks a number between 1 and 6
  • A d12 picks a number between 1 and 12
  • A d20 picks a number between 1 and 20
  • A d100 picks a number between 1 and 100

It doesn't take a complex neural network to see the pattern here. A dn picks a random number between 1 and n.

If we wanted to model a d4 we could amend our d6 model:

Math.floor(Math.random() * 4) + 1

And we're done. Well done! You've invented job security. Now we've got two models for two different scenarios, and we know how to repeat the process for any die we like.

You should at least by now have the feeling I'm leading you to a point; and if you haven't guessed it yet I'll make the point.

We haven't modelled the pattern.

You can model dice until you're blue in the face but a good model captures the fundamental principles. The d6 model captured the fundamental principles of a d6, but we want a model that captures the fundamental principles of all dice. We need to model the abstract; the pattern that we spotted when we listed our dice.

Abstraction

"Abstract" is another one of those words that no one understands until they're faced with it, and then it confuses them until they understand it, and then they realise why it's been used all along. Most people know abstract as a form of art, and therefore associate it with meaningless shapes and random colours or something.

The abstract of something is those features of the thing that remain behind when you take the actual thing away. The abstracts are those conceptual things that let you describe it without actually having one; but from which someone who had never seen one might well recreate a different thing.

This is what we did with the d6. We took the abstract concept of a d6, which is to randomly generate a number between 1 and 6, and then we recreated it in an algorithm that looks nothing like a die. It's a string of characters on a screen, now. It doesn't even roll. Or bounce.

Abstracting across many things is an art form in itself. For a start, the things have to be related, or else there's no real abstraction to make. Secondly, the degree to which things are actually related to one another can vary wildly, so knowing what level of abstraction to make is also a challenge. Thirdly, abstractions themselves may be similar; in which case you can start relating things that look the same in the abstract but are entirely unrelated in real life.

Now that I've thoroughly lost you, let me bring you back to earth. When we laid out all the dice we know and examined how they work we saw a pattern, which is that a die with n sides is an RNG between 1 and n. A pattern is something we can model; we model it with parameterisation.

Parameterisation is when you take a series of concrete examples and you remove one of the things from it and replace it with a variable; in this case, we replaced all the numbers with n (see footnote 3). The multiple types of die have been reduced to a single type, whose number of faces is now variable.

The number of faces the die has is now a property of the die. We have a model with data!

How do we represent it? Well in JavaScript terms, parameters are given to functions, and objects have properties. We can divide the model into the two parts, functionality and data, by using a function to represent rolling a die and an object to represent an actual die.

function rollDie(die) {
    return Math.ceil(Math.random() * die.sides);
}

var d6 = { sides: 6 };
var d12 = { sides: 12 };

Here we have one function that will roll a die and return the result. Then we have two dice, each of which is a simple object with the property sides. Inside the rollDie function we use the sides property of something called die, which we can see is mentioned in the parentheses in the function definition. This together means that whatever is given to rollDie is assumed to be a model of a die, and to have a property sides that represents the number of sides it has.

rollDie(d6);
rollDie(d12);

If we provide a die model as a parameter to the rolling function, the rolling function can inspect the property of the model, extract the data, and use the data in the original algorithm. The algorithm has not, fundamentally, changed. It is simply the case that now it is parameterised; which is to say that instead of duplicating the function for every possible invocation, we can create data models that represent the thing we are dealing with, and provide the data to the function. We have abstracted the pattern (1dn returns a number between 1 and n) by making the variable, n, well—variable!

Verbs and nouns

The world is made of verbs and nouns. Systems verb nouns. People roll dice. People buy products. Computers authenticate passwords. Ecommerce systems suggest related products. Search engines search documents. URLs refer to resources.

Our data models therefore comprise verbs and nouns. Our d6 model was a verb (see footnote 4), but the noun was hard-coded. Hard-coding is the failure to parameterise. Instead of accepting a parameter, the noun - d6 - was assumed by the verb, because the verb was the whole of "roll a d6".

Our later model had a verb, rollDie, which could roll any noun that looked like a die. It had two dice, d6 and d12, which represented 6- and 12-sided dice, respectively. But the rollDie verb did not rely on those dice. The verb was abstracted from the nouns because with the new verb, anyone can create a die of any size and roll it:

var d27 = { sides: 27 };
rollDie(d27);

... so long as they have access to the verb part - the functionality - of our model.

By parameterisation we can turn a verb into a verb and a noun - "roll a d6" turns into "roll" and "a d6". By doing the opposite, we can turn a separate verb and noun into a single verb. Good modelling comes from learning when it is right to include the noun in the verb, and when the noun is a parameter. In some cases, the noun is fetched from somewhere else - a different verb (to fetch) and a different part of the model, with its own nouns.

In the real world, computer modelling is much more involved than this. Data are often linked to other data, such that if one changes another must reflect it. A shopping basket, for example: if you add an item to the basket, the total must increase. If you change the quantity of an item, the subtotal for that item must change, and so must the basket total.

In that example, we already introduced nouns and verbs that we can model. Basket; item; total; subtotal; quantity. Some of these are things, and some of them are properties. Some are both! Items are real things, but the list of items is a property of the basket. The total is a property of the basket, and the subtotal is a property of the item when in context of a basket and having a quantity!

Sometimes we replace nouns with verbs: instead of storing the total, we may choose to calculate the total on demand based on the items.
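
As a minimal sketch of that idea, assume a basket is an object holding a list of items, and each item carries a price and a quantity. The names basket, price, quantity, subtotal and basketTotal are invented here for illustration; they are not part of any model above.

```javascript
// Hypothetical basket model: the property names are illustrative assumptions.
var basket = {
    items: [
        { name: 'cat food', price: 250, quantity: 2 },  // prices in pence
        { name: 'cat toy',  price: 399, quantity: 1 }
    ]
};

// The subtotal belongs to an item in the context of its quantity.
function subtotal(item) {
    return item.price * item.quantity;
}

// The total is not a stored noun; it is computed whenever it is asked for.
function basketTotal(basket) {
    return basket.items.reduce(function (sum, item) {
        return sum + subtotal(item);
    }, 0);
}

basketTotal(basket);           // 899
basket.items[0].quantity = 3;  // change a quantity...
basketTotal(basket);           // 1149, with nothing to keep in sync
```

The trade-off is that nothing can go stale, at the cost of doing the sum every time someone asks.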

Sometimes we replace verbs with nouns: when you roll a die, its value remains the same until you roll it again, but you should be able to ask it what value it shows. Our model could not do this. Alas! Our simple and sufficient model is no longer sufficient.
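
One way to patch the model, sketched here under the assumption that the die object itself is allowed to carry the result. The value property is my own addition for this example, not part of the model above.

```javascript
// Same algorithm as before, but the result is also stored on the die.
function rollDie(die) {
    die.value = Math.ceil(Math.random() * die.sides);
    return die.value;
}

var d6 = { sides: 6 };

rollDie(d6);
d6.value;   // whatever was rolled, and it stays that way until the next rollDie(d6)
```

The verb "roll" now leaves a noun behind: the face the die is showing.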

Sometimes we separate a verb into a verb and a noun: we turn rolling a d6 into rolling, and create a d6 to roll. This allows us to either roll a different die, or do something different to the die.

Sometimes we combine a verb and noun into a single verb: when we get the total of a basket, we don't separate it into "get" and "total"; if you change the noun here, the verb makes no sense!

Even a simple example like a die can escalate, and it is easy to get overwhelmed by the interactions—imagine the complexity of a "simple but sufficient" model of an entire shop!—but ultimately we are modelling nouns and verbs; all we have to do is parameterise correctly and find the correct abstractions.

Modelling systems

Hopefully you will have, by means of a concrete example and a lot of nebulous ideas, some concept of what it is to model things in computer systems. Ultimately, you will need some way of defining functions - a programming language - and some way of storing data - maybe a database.

Modelling a system therefore involves a good eye for what is a verb and what is a noun. That is to say, if you want to "roll a d6", does this suffice as a verb? Or is "d6" a noun? What if you want to "calculate the total"?

There is no cheat sheet here. Experience is your best recourse. But perhaps we can jot down some things to consider when modelling a system.

  • How big is the system? The d6 system was small, but the shop system was large. Can it be broken down into smaller systems?
  • How big are the nouns? A d6 has 6 faces, but the number 6 is enough to model that. Meanwhile, a basket has many items, but more information is needed; items are separate things, but faces are not.
  • Can you de-noun your verb? Does the verb make sense on other things? Does it actually? You can roll anything with sides; but can you get something other than a total from a basket? Can you get a total from something other than a basket?
  • Can you combine a verb and noun? Have you gone too far parameterising? If your shop has only one basket, the basket is not a parameter: the verbs can assume it.
  • Can your verb fetch a parameter, instead of accepting or assuming it? When you roll a die, perhaps you can establish elsewhere which die you are rolling. Perhaps the items on a basket know they are items; and there is only one basket, so you can get the items when you need them.
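
The last two questions can be sketched in code. This assumes a shop with exactly one basket, and a notion of a "current" die that is established somewhere else; theBasket, addToBasket, game, currentDie and rollCurrentDie are all hypothetical names made up for the example.

```javascript
// Combining verb and noun: with only one basket, the verb can assume it.
var theBasket = { items: [] };

function addToBasket(item) {
    theBasket.items.push(item);     // no basket parameter needed
    return theBasket.items.length;
}

// Fetching the noun: which die we roll is established elsewhere.
var game = { currentDie: { sides: 20 } };

function currentDie() {
    return game.currentDie;         // a separate verb: "to fetch"
}

function rollCurrentDie() {
    return Math.ceil(Math.random() * currentDie().sides);
}

addToBasket({ name: 'cat food' });  // 1
rollCurrentDie();                   // a number between 1 and 20
```

Neither function takes the noun as a parameter any more. Whether that is a simplification or a trap depends on whether a second basket, or a second game, ever turns up.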

That's all for now on models. In future posts we will take a look at how data get around inside these systems, how we store them, and the transient nature of data while the system is actually running.

1 var a = [], i = 0; for (i = 0; i < 50; i++) { a.push(Math.ceil(Math.random() * 6)); } a;

2 Random number generator

3 Replacing all the ds with m may be a tempting thing to do here, but we shouldn't. That's because d has been constant across all of our examples; it simply serves to refer to the thing we are modelling in the first place. n is the new variable, because the thing it has replaced varies. d, being constant, is the thing our model is taking away entirely! It serves no purpose to know that we are rolling dice, any more; the d is therefore simply our reminder about what we are aiming for.

4 Commonly one would not copy-paste an algorithm into a console and run it. Instead, the algorithm would be packaged in a function and the user would be told to run the function. We did this later, when we parameterised, but to simplify and save on explanations, we avoided using a function in the first examples.

2014-01-23

Declaring your intent

In Perl it is necessary to declare a variable with my (or our) before using it. This behaviour is enabled with the strict pragma, which is switched on implicitly when you ask for a recent enough version of Perl (use v5.12 or later).

Why?

Today's theme explores the idea that, when writing code, there is meaning in every statement. A good portion of code will comprise statements that actually implement the logic that causes the program to do what it does; but often overlooked are the statements such as these my and our declarations, which explain your intention for the variable before it's ever even used.

We'll look at some of the simpler reasons behind it, and later on we shall look at the less apparent ones.

Requesting

In these cases the intention you are declaring is simple: "I want to use this symbol."

The humble typo is the most obvious reason espoused for requesting new variables: it stops you using something else. But in Perl this actually covers at least three separate types of typo, all of which are solved by declaring things before you use them.

Misspelling it later

Misspelling the variable later on is the most common failure.

my $hard_to_spell_name;
$hard_tp_spell_name = 'cats';
Global symbol "$hard_tp_spell_name" requires explicit package name at script.pl line 3.
Execution of script.pl aborted due to compilation errors.

Saying you want to use symbol A and then using symbol B is an error that is trivial to pick up on.

Misspelling it now

This is less common because you usually spell the variable name right when you create it because you've just spent ages trying to come up with the name in the first place. It's the same declaration, except you meant B and B, rather than A and A.

my $hard_tp_spell_name;
$hard_to_spell_name = 'cats';
Global symbol "$hard_to_spell_name" requires explicit package name at script.pl line 3.
Execution of script.pl aborted due to compilation errors.

Forgetting

This requires a module, but declaring your intent allows the warnings pragma to tell you when you didn't use a variable you asked for.

Install warnings::unused from CPAN in the usual way.

use warnings::unused;
use strict;
use warnings;

my $foo;
my $bar = 'cats';

print "$bar\n";
Unused variable my $foo at script.pl line 5.

Typing

By this I mean the type of the variable, not the typing you're doing when you make a typo.

In this case, you've declared an array and then accidentally used a scalar, or forgotten it's not an arrayref, or something along those lines. This is also the sort of protection you get from languages with a more C-style typing system, where you have to declare a variable by defining its symbol name and its type (int i;). Basically even though you spelled the symbol name right, you're using it wrongly.

my @array_of_cats;
push @$array_of_cats, 'cat';
Global symbol "$array_of_cats" requires explicit package name at script.pl line 3.
Execution of script.pl aborted due to compilation errors.

"You're using it wrongly" is a perfectly reasonable statement here. That's because you declared what "right" is: "wrongly" is directly determined by your own my statement.

Overwriting

Reuse

If you are required to declare your variables the first time you use them then you will always do so. This means that the keyword my is not only used to declare that a variable is supposed to be available, but also to declare that the variable is supposed to be new.

Hence, if you try to introduce a variable that already exists, it tells you off, and thus you avoid clobbering an existing variable.

This behaviour is actually only a warning, so comes from use warnings; rather than use strict;. However, it is still a result of declaring your intent.

use strict;
use warnings;
my $cats = 'cat';
my $cats = 'horse';
"my" variable $cats masks earlier declaration in same scope at script.pl line 4.

Clobbering

It is easy to forget that my and our both produce lexically scoped names. These are names that are only visible within the block in which they are declared (treating a file as a block for this definition).

With my you simply cannot clobber this variable from anywhere else. It is either a compiler error, or a different variable.

# This sub is useless and does nothing
sub one {
  my @cats;
  push @cats, @_;
  return @cats;
}

# This sub can't see @cats from the other sub!
sub two {
  push @cats, @_; # line 10
  return @cats;
}
Global symbol "@cats" requires explicit package name at script.pl line 10.
Execution of script.pl aborted due to compilation errors.

Or:

# This compiles, but is a new, separate array of cats.
# It is fractionally more useful than sub one.
sub two {
  my @cats = ('default_cat');
  push @cats, @_; # line 11
  return @cats;
}

A bonus of my is that when the block has executed, the variable is tidied up. That is, it falls out of scope. This also works in loop bodies, allowing you to trash and recreate data in every iteration by putting a my line inside the loop.

package Cat {

  my @cats;

  # Both of these use the same @cats - the one above!
  sub one {
    push @cats, @_;
    return @cats;
  }

  sub two {
    @cats = ('default_cat'); # whups, overwrote the whole set!
    push @cats, @_;
    return @cats;
  }
}

@Cat::cats = ('cat_one', 'cat_two'); 

Here, @cats is available to be clobbered anywhere in the Cat package (footnote 1). However, because it is lexical, it is only available within that block (footnote 2). Line 18, the final assignment to @Cat::cats, appears to be altering the same variable (@cats within the package Cat), but in fact this is creating a new package variable in Cat (footnote 3).

The intent of using my to declare @cats therefore is to have a variable available throughout the package, but not to be available outside the package.

There is a subtler declaration of intent. The position of this my statement declares that this variable is intended to be used throughout the entire package; therefore it should be applicable to the majority of the behaviour in the package. Were this not the intention, the my statement could be put in a block that encapsulates the variable and any places it is supposed to be used.

our is a similar beast, but it adds the ability for outsiders to also alter the variable, so long as they do so explicitly. The following code differs only in the use of our:

package Cat {

  our @cats;

  sub one {
    push @cats, @_;
    return @cats;
  }

  sub two {
    @cats = ('default_cat');
    push @cats, @_;
    return @cats;
  }
}

@Cat::cats = ('cat_one', 'cat_two'); 

Now, the variable @cats inside the package's block can also be accessed as @Cat::cats from outside of it. This is the intent you declare when using our.

1 Normally, the package would be defined in its own file, but this format is common for single-use packages, especially in tests.

2 When the package is defined in its own file, the file itself is the scope for such variables.

3 The reader should be aware that this is the reasoning behind the message Global symbol "$foo" requires explicit package name when strictures tells you off for an undeclared variable. Any variable name can be used, so long as it explicitly declares a package name like in this example. The difference between a lexical variable and a package variable is not in scope of this blog post.

2013-10-18

Fixing PHP

PHP is not a bad language.

Come back. Let me rephrase that.

PHP is a terrible implementation of what under the surface is a perfectly adequate, dynamic scripting language. Unfortunately it is implemented as a poorly-thought-out, logically bereft templating language, peppered with pitfalls and irritating inconsistencies.

But it can be fixed. It can be fixed with some simple, non-backwardly-compatible, sensible, welcome-to-the-real-world, feasible alterations. Let us begin.

1. Get rid of <?php ?>

The fact that PHP used to be a templating language is archaeologically apparent in this vestigial remnant from a bygone era. These tags are still all over the place because PHP is trying to be two things at once: both a templating language and a scripting language.

Once you grow up (or metastasise) and become a real language, you have to put away childish things.

These break-in-break-out tags were fine when PHP was designed to be parsed by a Perl script and run as a simple if-this, for-each-that dynamic HTML page generator. They remain fine, if you want to use PHP as the templating language it is. But if PHP wants to be taken seriously, the first thing it needs to do is stop hanging on to that I-can-do-templates-me attitude, and hand over to one of the many modern alternatives that have come along since the Internet was still finding its feet.

In fact there's no real reason PHP should not remain a templating language. After all, Mason (and indeed Template Toolkit) allow you to inject actual Perl into your web templates for those times when you simply can't be arsed to abstract your logic to where it's supposed to go. However, if PHP is going to behave like this, it needs to understand there is a difference between a PHP template and a PHP script.

Therefore I propose

1a. Create .php and .phpt file types

Or suchlike. .php files would naturally be PHP scripts and do away with that ridiculous <?php header that persists throughout PHP projects like a blight. .phpt or suchlike would be recognised as text files containing PHP segments, and they can use the old break-in-break-out paradigm to inject program logic into the template.

Of course it is not recommended in Mason or TT2 that you use actual Perl in your actual templates, because then the temptation is just to merge your views with your controller logic, and then you get into a Right Mess. Better would be simply to have a PHP port of TT2 or Mason, or use Twig or Smarty, and allow those to have their own this-bit-is-PHP-and-I'm-sorry directives.

1b. Make it a decent templating language too

It's a bit of an issue that PHP is stupid, as well. Modern templating languages offer myriad text processing options as part of the language itself. An example is the way Template::Toolkit allows you to filter output text through, e.g., the HTML filter, sanitising the data just before it's output.

PHP's best answer to this so far is user-written PHP classes that render PHP templates (two entirely different things written in the same language) by sanitising the data assigned to them at some time or other just before the template file itself is actually rendered.

That's just one example. PHP is not really a templating language any more either, because templating languages have evolved past the very basic output-string behaviour that PHP was originally tasked with. PHPT would need to catch up as well, and separate itself from PHP proper.

2. Stop pretending everything is an HTTP request

That PHP never left its template roots shows when you try to write command-line interfaces into your business software. You realise that you've been assuming throughout the code that the $_SERVER variable actually contains a URI of some description; that there's a protocol; that you're outputting HTML.

As soon as the first file that started with <?php and didn't contain a ?> was created, PHP was broken. As soon as you create a file that contains utility functions, or classes, you have a file that you can run without a webserver. As soon as you have that, you have a scripting language. That was the point at which people should have stood back, taken a look, and dived in to PHP 4 or whatever with the attitude that this time we're going to do it right.

No one did.

PHP still outputs HTML whenever it feels like it - see var_dump. It still has global, HTTP-centred variables. It doesn't do exit codes properly. The fact that exit and die are the same damn thing just shows that someone somewhere has completely misunderstood the point of these things. Heck I don't even know whether error messages actually go on stderr.

At about the time PHP was swapping its soft teething toy for its first big-boy spoon, the rest of the world was discovering that if you interface your HTTP server with your scripting language via stdout, you can maintain a separation of interests wherein your entire business logic is a collection of useful modules or classes or whatever, which when used in a web environment can be wrapped in an HTML layer and called a website - the layer being swappable for a CLI one that outputs the same information in a salient format. Or a JSON one, for public APIs, or even private, socket-based APIs that don't touch either HTTP or even TCP!

Nope. In PHP's land of unicorns and rainbows the whole world is an HTTP request. The world springs into existence when the request begins and disappears when the response is sent, and if anything happens to be left around since the last universe's brief lifespan came and went then that's just something we have to deal with as part of our new one. Trying to leverage command-line support, or non-HTTP support, into this assembly of spit and chewing gum is baby's first steak knife to PHP.

3. Use your own exception mechanism

Nothing is as irritating while working with PHP as when it throws its toys out of the pram. Now, I'm quite happy to accept that a parsing error is completely unrecoverable, but that is it, and absolutely it. Anything and everything that happens at runtime should be tryable, and anything that ever goes wrong should be catchable.

This expected feature of the language should not be taken as a comment on the sense in doing so. Trying to call $app->run() and catching it when it fails is going to be a bit less useful than letting it fail and tell you what was wrong.

But being able to catch it - now that's a tool we need. Since the original error mechanism was put in place a new, superior nonlocal return is available, and one which puts control in the hands of the user (without horrible set_error_handler hacks). Might as well use it.

4. Tidy up the root namespace

We get it. You like functions. Well, take stock and look around you. Not only have you implemented exceptions and then completely failed to use them, you've also implemented classes, interfaces, namespaces, closures and traits and failed to use those as well!

Right. For a start, having all those functions is confusing because there's no consistency in them. I'm not going to rewrite the entirety of A Fractal Of Bad Design, but I'm going to borrow from it here. Some of the functions have underscores, some don't (strpos / str_rot13). Some take arguments one way, some the other (array_filter($input, $callback) / array_map($callback, $input)). Every time we use a built-in function we have to look up how it's spelled and what order the arguments are in and there are so. Damn. Many.

Secondly, certainly PHP has to look up every called symbol in both the user's own symbol tables as well as the language's. That sort of thing is surely expensive, especially if this language is aimed at beginner programmers who are only ever going to use 10% of the functions 90% of the time.

Thirdly, every single built-in function or class is just another name that the user can neither use for their own functions nor override to replace. Sure, PHP has modules that you can jump through hoops to install at the C level, but who needs that?

All of this might be forgivable if this overabundance of global functions covered literally every possible operation a user could conceivably want; but it doesn't! Worse still, a majority of them can trivially be abstracted into one generic function that takes a callable. All the array_* functions, for example: the sort functions are all just user sort with different sort procedures passed in. The filter functions are all the same with different identity functions passed in - and, for a specific example, recently I needed a version of array_search that took a custom identity function! How dare I want the key of a value that has a sub-value that matches my input! PHP says I may not do that and therefore I may not do that.
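
To show the shape of that abstraction, rather than any real PHP API, here is a sketch in JavaScript (the language of the modelling examples in the earlier post) of a single search function that takes a callable. findKey is a hypothetical name, and the cats data is made up.

```javascript
// One generic search: return the first key whose value passes the test.
function findKey(collection, test) {
    var keys = Object.keys(collection);
    for (var i = 0; i < keys.length; i++) {
        if (test(collection[keys[i]], keys[i])) {
            return keys[i];
        }
    }
    return undefined;   // nothing matched
}

var cats = {
    tom:   { colour: 'grey',  age: 3 },
    felix: { colour: 'black', age: 7 }
};

// Plain equality search is just one test among many...
findKey({ a: 1, b: 2 }, function (v) { return v === 2; });         // 'b'
// ...and matching on a sub-value is simply a different test.
findKey(cats, function (cat) { return cat.colour === 'black'; });  // 'felix'
```

Every specialised search then becomes the same verb with a different callable passed in.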

Ridiculous. The fact the PHP team haven't abstracted this stuff sensibly does not speak in favour of their ability to write the code behind PHP in the first place, does it? It doesn't take a genius to tidy all this up, and yet no one has - nor has anyone written the tidied version alongside. That attitude of constant implacability hurts the language and the community and the reputation of the people behind it, and damages confidence.

Hypothetical inefficiency aside it's just poor maintenance. The language has a mechanism by which to automatically find class files when a non-existent class is requested. So, put all the less-common functions in autoloaded classes and put those classes somewhere discoverable. Everyone else is modular these days. Is it stubbornness or incompetence that's leaving PHP behind?

Also, quit adding useless prefixes or suffixes to your functions. I know you're going to push onto an array because you push onto arrays. So call it push, not array_push.

Also also, don't fob us off with mb_ crap. Fix your Unicode. There's no excuse whatsoever for a language prevalent in the 21st century to be coded by people who can't cope with Unicode, or its various representations. I know, it's hard. Writing a language is hard. If you can't, don't.

5. Expressions, for the love of god

PHP's compiler is apparently written by chimps. Do we still really believe that there is a difference between a statement and an expression? Do we really still have to have "language constructs" (PHP's term) that are parsed and treated differently from any other expression?

No. Maybe back in the stone age we did things that way but here in the age of enlightenment we have come to realise that the only real difference between a statement and an expression is that a statement actually has a persistent effect.

In PHP, for example, the x or y construct has become possible. Except when y is not an expression - which is 90% of the bloody language. return is not an expression. continue is not an expression. die is not an expression, but it is special-cased to work with or, and has been since before we even had the x or y construct in the first place. Because Perl did it. exit is not an expression and does not have the same special-casing in the language that die does, even though it is the exact same thing.

Another example. Normally, () is used to group things, i.e. to override precedence. I'm quite OK with the way it's required for function calls, conditions etc. In PHP, however, these seem to form a magical, ref-breaking construct that is parsed under its own rules. That is to say, in PHP, $a is not guaranteed to be the same as ($a). That's because PHP is a language whose every feature is a special case in the parser. If $a is a ref, ($a) is not any more.

So what's the point of all these examples? Well hopefully they all bring up the obvious question: why? Why are these things different? For a given X, why does the way you use X have to be allowed by the compiler?

A language built out of expressions is obvious - expressions are what make the operands to operators. And an operator is itself another, larger expression. Suddenly the parsing should seem trivial; you look at a line of code, decide which operators and expressions it contains and run them in a well-defined order. You can see it in the language that when you use an expression it behaves exactly like you'd expect any other expression to behave. At least, it compiles like that - runtime behaviour may be bizarre.

It's trivial to draw up a simple table of PHP's main features in terms of expressions; in all of this the reader is invited to consider in what situations these do not work in PHP's current implementation, and what it means about the compiler for that to be the case. In the table, X and Y mean any expression, i.e. literally anything that compiles.

| Construct | Meaning | Examples | Notes |
|---|---|---|---|
| ${X} | The value referred to by X | ${$foo} # $$foo | When X returns a string, look up that variable. Otherwise, treat it as a reference. When X is another variable, the {} can be omitted. |
| | | ${f()} | |
| | | $a = &$b; ${$a} | |
| X[Y] | Return the element Y from the array X | $array['foo'] | This implements the "feature" that is "special" in PHP 5.5 of array literal dereferencing (the last example). |
| | | f()['foo'] | |
| | | x()[y()] | |
| | | ['a', 'b', 'c'][0] | |
| X() | Run the closure X | f()() | Actual functions like f() are separate, since f is not a valid expression. |
| | | $x() | |
| | | ['a' => function() {}, ...][$x]($y) | |
| X or Y | If X is false, run Y | $type = $config['type'] or continue; | |
| X and Y | If X is true, run Y | $val = $config['x'] and return $val; | |

The reader should take away from this at least the awareness that all of the examples in this table would already work if PHP used a proper expression-based grammar; but instead we have been sold these things piecemeal over the past few versions as new features important enough to go on the front page of the release notes.

6. Complete the complement of magic methods

__toString is a pretty good method. It uses an established consistent convention that double-underscore means special-to-PHP. It uses dynamic dispatch so that if it exists it's used, and if it doesn't there's no "default" behaviour - it just complains.

There are also __isset, __set, __get etc. These do what you'd expect: test for setness, default setter, default getter...

Where's __toInt? __toFloat? __toArray? Why is __toString represented and not the others? Furthermore, if you can use a string as an integer and only complain after this conversion, why don't you use __toString first and then try to turn the result into an integer?

Consistency is paramount in a structured, logical world such as programming. Expectations being formed and then violated is the worst of things. It's the Principle of Least Astonishment. Use it.

7. Stop pretending you have types. Or: Have proper types.

What in god's name is this? (int) $val

"Casting," I hear you cry. "It is casting the type of $val to int!"

"Rollocks," I reply in a PG way. For casting is the act of converting a type through known mechanisms to another type. But we don't have __toInt to convert all possible values of $val to int, and we don't have mechanisms to convert all possible types in place of int in the first place.

Nope, it is another special case in the PHP compiler, where someone saw another language doing something and implemented the same syntax but completely failed to understand what it was doing, and implement the theory rather than the practice.

What about this? function foo(array $arg)

"Type hinting!" comes the call from the thousands-strong crowd. But if I ask them to explain this mechanism they roll out the usual approximately-right answers they read in the documentation but cannot explain the concept.

PHP is a dynamic language; that's one of its strengths. Dynamic means that PHP exhibits certain runtime features that static languages require at compile time. For the purposes of this section the dynamic features we are interested in are:

  • Runtime method lookup. If an object can perform a method, the method will be performed. If not, a runtime exception is thrown. Inheritance introduces methods from other classes into the object's symbol table, assisting DRY, but otherwise there is no reason every method could not simply be dynamically dispatched to a function somewhere using magic.
  • Automatic type conversion. If an operation requires a string and an integer is provided, or an integer and a string is provided, or a string and an object is provided, PHP will transparently perform the conversion at runtime and only complain if it didn't work.
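Both features in the list above can be sketched in a few lines. The Anything class and the frobnicate method are made-up names for the purpose of the example, not anything from a real library.

```php
<?php
// Hypothetical class: every undeclared method call is routed through __call.
class Anything
{
    public function __call($name, $arguments)
    {
        // Report what was asked for instead of failing.
        return $name . '(' . implode(', ', $arguments) . ')';
    }
}

$thing = new Anything();

// Runtime method lookup: frobnicate() is not declared anywhere.
$result = $thing->frobnicate(1, 2);

// Automatic type conversion: the operator decides, the operands follow.
$sum    = "5" + 3;   // a string used as an integer
$joined = "id-" . 7; // an integer used as a string
```

Nothing here is checked before the line actually runs, which is the whole point of the two bullet points.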

Now apply your theories about type hinting to this. What can it do but cripple PHP's dynamicity? Duck typing is the principle by which, if you have dynamic method lookup, an object only has to be able to perform a task in order to be considered suitable for the task. That is, until runtime, until you actually try to run the method on the object, there is no way to know that the object cannot do it. If there were you would have sacrificed dynamic method lookup for static compilation already. Type hinting for classes is completely non-semantic if you have the option of duck typing, because there is literally nothing special about your particular class that makes it important that an object is of this type.
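A rough illustration of the conflict, using hypothetical Duck and Robot classes that share nothing but a method name. The sketch assumes PHP 7 or later, where a failed class hint throws a TypeError; on PHP 5 the same call raised a catchable fatal error instead.

```php
<?php
// Hypothetical classes; neither extends nor implements anything shared.
class Duck
{
    public function quack() { return 'Quack'; }
}

class Robot
{
    public function quack() { return 'Beep'; }
}

// Duck-typed: anything that can quack() is acceptable.
function speak($thing)
{
    return $thing->quack();
}

// Class-hinted: only a Duck (or a subclass of Duck) is let in.
function speakHinted(Duck $thing)
{
    return $thing->quack();
}

$loose = speak(new Robot());

try {
    $strict = speakHinted(new Robot());
} catch (TypeError $e) {
    // Assumes PHP 7+; the Robot could have done the job but is refused.
    $strict = 'rejected';
}
```

The Robot is perfectly capable of quacking. The hint rejects it for what it is, not for what it can do.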

How about non-object type hinting? Well you can't actually do that, because int and string aren't types to hint about - probably because any scalar can be used as a string! And any string can be used as an integer! So why enforce the check? Or, from the other perspective, why aren't they types? I can cast to them; why can't I require them?

And why can I require classes but not cast to them?

If we look at the whole type system of PHP as a looser concept than PHP makes it, it makes a lot more sense.

Classes are not some promissory aspect of a piece of data that ensure the datum can perform tasks, but an organisational structure allowing you to introduce functionality from other classes into new ones by inheritance or merging traits. From this perspective, duck typing makes sense - you don't need a specific class to ensure an object can perform tasks; any class can theoretically do it, especially if it consumes a trait that provides it. Type hinting for classes, from this perspective, is logically inconsistent with traits - which are considerably more useful - because you can't test for what a class can do, which is the only thing that's important.
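As a sketch of the mismatch with traits: the Swims trait and the two classes below are hypothetical. A trait is not a type as far as instanceof is concerned, so any check on capability has to be done by hand at runtime rather than expressed in a hint.

```php
<?php
// Hypothetical trait and classes for illustration.
trait Swims
{
    public function swim() { return get_class($this) . ' swims'; }
}

class Otter { use Swims; }
class Submarine { use Swims; }

$otter = new Otter();

// A trait is not a type: this is false even though Otter uses Swims.
$isSwims = $otter instanceof Swims;

// What the object can do has to be asked about explicitly.
$canSwim = method_exists($otter, 'swim');

// Where the ability came from is also only available by inspection.
$usesTrait = in_array('Swims', class_uses($otter));

$said = $otter->swim();
```

Otter and Submarine share behaviour without sharing any class a hint could name.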

Similarly, basic types are not remotely based on reality either: even if you could ask for a string or an integer, assuming we get the rest of the family of magic methods, any object could have __toString or __toInt. And even if we don't get __toInt, a string can be an int. So if you ask for an int, you could give a string, and you won't know the data the string contains are bad until you try to use it as an int. And you should be able to give an object to a parameter that wants an int simply by casting it to a string and then an int - something PHP should be doing for us already.

Hopefully the reader has spotted the inconsistency between type hinting and a dynamic language: the language cares about what the datum can be, but the type hinting cares about what the datum is. There is absolutely no logical association between what the datum is and what it can do, because Dynamic Point 1 allows for any object - independently of class, thanks to traits and __call - to be able to perform any task; and Dynamic Point 2 allows any type - thanks to __toString and the proposed __toInt and __toArray - to be any other type.

If you're going to have type hinting, therefore, you have to have statically compiled types: you have to enforce the relationship between type and behaviour; otherwise, your type hints are just extra bytes in a file that are going to appear in a commit log at some point in the future deleted by some frustrated developer trying to implement a trait and use it in a method that doesn't expect it.

That's all

I'm sure I could find many more examples of things PHP can fix at a basic level and stop being so irritating about simple things. You'll note I didn't complain about the tiresome conflation of array and dictionary, despite it being the biggest misunderstanding in programming history.

But surely this is a start? We can keep most of the PHP grammar; the syntax doesn't change (much); and so many of the pitfalls and gotchas that a programmer falls into will be resolved in one fell swoop!

As with many things, PHP has reached sufficient mass that nothing important will ever change, because the politics of the mailing lists drag everything down, with half-right people expressing their ill-informed opinions on stuff that really, actually matters.

And there's the rub; the alternative is to start again. Start a new, similar language, on the right foot. A language that doesn't have those tags; a language that interfaces with the standard streams properly; a language detached from the web server, that doesn't assume a web environment; a standalone, dynamic, modular language, easy to learn, easy to stick together, easy to run on any decent OS and the not-decent one.

But why? We already have Perl and Ruby and Python. The number of changes required to PHP means that literally the only reason to improve it at all is that it's associated with the name PHP. Installing it, upgrading it; these things would take the same amount of effort as simply using an alternative. It wouldn't be sufficiently backward compatible that existing PHP code would run, because all the crap you have to do in existing PHP code wouldn't be possible or necessary.

It can still be done, though. But it won't.

2013-07-11

Introducing Pod::Cats

You may notice the title of the blog has changed to Pod::Cats.

Pod::Cats is a module I wrote for the original incarnation of the blog at podcats.in (no longer a thing).

The module extends POD conceptually, allowing for arbitrary C<elements> and =commands, and adding new +begin and -end commands.

Check out the docs, and the GitHub repository if you want to help out.