The optimizations implemented in OPT, mostly concern the main parser class and the compiler. The first one is the part that is loaded every time the script wants to use templates, so it must provide huge quality standards. There are no sophisticated algorithms, so the speed improvements are:

  1. To minimize the size of the code.
  2. To minimize the amount of points that try to access the hard drive.
  3. To use the fastest elements of the language.

The point is this class must be as light as possible; I would even say: almost transparent for the script.

The situation is quite different, when it comes to the compiler. It is loaded from time to time, so it can be big and splitted for more files (of course without exaggeration). The key role is played by the limits set in the PHP configuration, because in the pesimistic situations, the compiler runs out of resources to complete the task. Unfortunately, some of the new features cost a lot, and if the users wanted XML tag parsing, they must face the consequences: every such tag requires its own node object in the document tree and they need some preparation tasks to be done. My role is to minimize the time and memory necessary to do this.

The elements that can be exhausted during the compilation are:

  1. The stack
  2. The memory

The algorithms used in the compiler operate on a data structure called tree, and the use lots of recursion. It consumes the stack, because the number of currently executed methods and functions grows quite fast. In PHP, the default limit for the stack depth is 64 and if you are going to use OPT, the worst you can do is to lower it to ridiculous values like "16" that would affect not only OPT, but also many other scripts :). However, every recursion has an imperative version, constructed for example by queues. Maybe it does not consume less memory, but at least it does not depend on the limits. Such changes have been applied to many compiler methods so far, but some of them still require my attention, because they are not so trivial:

  1. Processing the tree by the instruction processors
  2. Expression compilation

In the first one the situation is clear: we get a node of some instruction and want to process it. We find its processor and redirect the node to it. It generates some PHP code and decides, what to do with its children. By default, it wants to process them, too. Here comes the recursion and the process starts from the beginning. There are some difficulties. The algorithm is split over many classes, and the programmers may create their own. What is more, some instructions need to start the subnode processing immediately, because their next decision depends on the code generated by the children. Obviously, this eliminates queues. An elegant replacement could be made, if we had an access to the low level programming.

Let's take a look, how it affects the compilation process. We have a huge template, with the tags nested up to the level 30. OPT with the compiler can take another 10 calls, and in the deepest place we have a complicated expression with brackets. To sum up, the library uses approx. 45 calls from 64 available, which is more than 70% of the limit! I know this case is quite pesimistic and in normal life such HTML documents are very rare, but it shows, how demanding the compilation can be.

If we can't remove the recursion without ugly tricks which would be never accepted by the instruction authors, how we can get around it? The idea is not to remove recursion completely, but we keep it in some places. I use here the fact that the immediate access to the subnode compilation results is necessary only for some instructions, and for example the HTML tags do not need it at all. If we get a random node from the tree, we have a big chance that it would not need a recursion here. So, why don't we provide a dual version of this algorithm? By default, it is executed with a queue, but for more sophisticated needs, some nodes can switch it to recursion. Now it can happen that a tree with a depth 30 is processed in one single loop, with no recursion! I predict that in normal situations, the algorithm will consume only 3, 4 spare "items" in the stack, which is fully acceptable.

The second issue concerns the memory usage. It's not so tragic here, but we can notice a significant rise when compared to the OPT 1.x compiler, even with using some patents from that version. The most memory is consumed by the tree. I've measured some single nodes recently and the results were from 2 kilobytes for a single node to 3 kilobytes for a node with one attribute. When the instructions start to add the PHP code to the buffers, the things are getting worse, but it happens only in some nodes in the whole template. We can safe some memory here. The objects use various arrays, which are initialized just in the beginning, for example:

abstract class optScannable extends optNode implements Iterator
{
	protected $subnodes = array();

In most of the nodes, the arrays are empty, so it it no sense for them to exist all the time. I fixed the code so that it creates an array only when it is necessary. From the declarations, the = array() parts were removed. The effect is quite interesting. On a single array, we safe 400 bytes. The new results are:

  1. Single node with no attributes: 1400 b.
  2. Single node with one attribute: 2350 b.

The additional memory may be safed by releasing the huge data that will not be used anymore, for example the template content.

What do we have to remember, while using the new OPT, when it comes to the requirement issues:

  1. With the terribly low limits, the compiler will always crash. I thing 16 megabytes for the script (the default limit) is big enough both for the compiler and the script.
  2. The best place to run the template parsing is the end of the script. We can delete some unncessary data earlier to free some memory.
  3. The templates using the inheritance are more demanding, especially when overwriting snippets, because the compiler must keep also the old versions in the memory.
  4. The quirks mode will consume less memory (simpler tree - no XML tags), but it is not implemented yet.
  5. The compilation is run once for a long time. During the normal execution, we can forget about the limit issues etc.

I think this note shows, how difficult the library creation can be. The key is to know, what we can use in current place and that there is also the rest of the script which uses the same memory and has its own requirements.