Posts Downloads Tags About RSS feed of posts

Look Ma, No Data Structures!

Theory and Hand-waving

Data structures are an integral part of programming. As Niklaus Wirth put it, algorithms + data structures = computer programs. They are to computers like time and space are to us. We need both.

Well, both of them, or one that contains both. Turns out that our physical universe is actually better understood as a four dimensional space-time, in which events in time can actually be seen as static things spanning both time and space. This isn't even the final good point of view, but it suffices for our analogy.

We already have a similar concept in computation. It even has the same name, but we usually only mention it when talking about the space-time tradeoff. This refers to the situation, where development of an advanced algorithm or data structure hits a point where improvement in one leads to detriment of the other. It may not sound like it, but this is generally a good sign.

The physical space-time point of view makes it easy to see how data structures and algorithms could be unified. A data structure holds information spanning mainly in the space-direction (memory) with very short constant time steps required in the time (computation) direction. In the case of more algorithmically heavy space-time inhabitants, the same data is built during runtime in the time direction. In more traditional terms, one might for example construct a tree for traversing a graph, or one might devise a recursive algorithm for traversing in the same order, which ends up building the same tree implicitly to call frames or continuations.

The logical extreme of the temporal direction would be an algorithm which represents all data implicitly using simpler algorithms. Surprisingly, not only is this possible, but it is actually quite fun and even has somewhat practical applications.

Practice

We'll use functions as the building blocks of algorithms. We can only use functions which combine the arguments given to them. Such functions are called combinators. The simplest combinator is a function of one value, which just returns the value it was given. This is usually called the identity function.

let id = function (a) { return a; }
(define id (lambda (a) a))

The next step up the complexity ladder could be functions of two values, which only return one of them. These could be called projection functions. Below are the binary (two argument) projection functions.

let b1 = function (a, b) { return a; }
let b2 = function (a, b) { return b; }
b1(11, 22) // = 11
(define b1 (lambda (a b) a)) 
(define b2 (lambda (a b) b)) 
(b1 11 22) ;; = 11

Suppose we know a value to be a two-argument projection function. We could call this state of affairs knowing its type, if we are so inclined. We can use this knowledge to call it with two known arguments and figure out which projection function it is. We can therefore use binary projection functions to represent one bit of data. This would be a perfect way to represent boolean values. Lets rename the functions and define a few operations on them. We'll prefix the names with an l to to avoid clashing with existing values.

var ltrue  = function (a, b) { return a; }
var lfalse = function (a, b) { return b; }
var lnand  = function (a, b) { return a(b(lfalse, ltrue), ltrue); }
var lnot   = function (a)    { return lnand(a, a); }
var land   = function (a, b) { return lnot(lnand(a, b)); } 
var lor    = function (a, b) { return lnand(lnot(a), lnot(b)); } 
lor(lfalse, ltrue); // -> ltrue
(define ltrue    (lambda (a b) a))
(define lfalse   (lambda (a b) b))
(define lnand    (lambda (a b) (a (b lfalse ltrue) ltrue)))
(define lnot     (lambda (a) (lnand a a)))
(define land     (lambda (a b) (lnot (lnand a b))))
(define lor      (lambda (a b) (lnand (lnot a) (lnot b))))
(lor lfalse ltrue) ;; -> ltrue

This approach seems to work well for values that are just members of a finite set. Typically we need data structures that contain other values in them. The simplest such value is the n-tuple, which is something capable of holding n values. We will encode this as a function which receives the n values, and returns another function accepting an n-argument projection function. The projection function is then used to select which of the values to return. We'll use a 3-tuple as an example. Trinary (3-argument) projection functions are prefixed with t. The constructor for a 3-tuple is called tri.

var t1    = function (a, b, c) { return a; }
var t2    = function (a, b, c) { return b; }
var t3    = function (a, b, c) { return c; }
var tri   = function (a, b, c) { return function (p) { return p(a, b, c); }}
tri(1, tri(21, 22, 23), 3)(t2)(t1) // -> 21
(define t1  (lambda (a b c) a))
(define t2  (lambda (a b c) b))
(define t3  (lambda (a b c) c))
(define tri (lambda (a b c) (lambda (p) (p a b c))))
(((tri 1 (tri 21 22 23) 3) t2) t1) ;; -> 21

Now that we have projection functions, which are like enums, and tuples, which are like records or arrays, we can combine them to design yet more complex data structures. A simple example is a linked list, which can be used to hold zero or more values. It allows items to be added or removed from the front cheaply. A linked list node is either a value representing the empty list, or a value node holding one arbitrary value and the rest of the list. We therefore need a way to distinguish whether the node in question is the last node, and ways to extract the value and rest of the list from a value node.

Here we start to have more options for the implementation. An obvious choice would be to use the 3-tuples defined above. The first value could be a boolean denoting whether the node is the last one. If it is false, the second value could be the stored value and third one could be the rest of the list. We'll make a minor optimization to this and make the emptiness tag more implicit. lnull is the empty list, lpair constructs a value node, lnullp checks whether a list node is empty, lcar extracts the value of a value node and lcdr extracts the rest of the list. The names car and cdr are holy relics from LISP lore, and it was customary to suffix predicates, or functions returning a boolean, with a p before the question mark was allowed in code.

var ltrue    = function (a, b) { return a; }
var lfalse   = function (a, b) { return b; }
var t1       = function (a, b, c) { return a; }
var t2       = function (a, b, c) { return b; }
var t3       = function (a, b, c) { return c; }
var lnullp   = function (a) { return a(t1)(true, false); }
var lcar     = function (a) { return a(t2); }
var lcdr     = function (a) { return a(t3); }
var lnull    = function (a) { return ltrue; }
var lpair    = function (a, b) { return function (c) { return c(lfalse, a, b); }} 
lcar(lcdr(lpair(11, lpair(22, lpair(33, lnull))))); // -> 22
(define ltrue  (lambda (a b) a))
(define lfalse (lambda (a b) b))
(define t1     (lambda (a b c) a))
(define t2     (lambda (a b c) b))
(define t3     (lambda (a b c) c))
(define lnullp (lambda (a) ((a t1) #true #false)))
(define lcar   (lambda (a) (a t2)))
(define lcdr   (lambda (a) (a t3)))
(define lnull  (lambda (a) ltrue))
(define lpair  (lambda (a b) (lambda (c) (c lfalse a b))))
(lcar (lcdr (lpair 11 (lpair 22 (lpair 33 lnull))))) ;; -> 22

Enumerable values, tuples and lists are now available. Let's finish off by defining some numbers. Traditionally numbers are represented by functions by using Church encoding, which encodes the value n by applying a given function n times to an argument. In other words, they encode unary numbers resulting in O(n) computation time on the number n. We'll use a binary representation and optimize most of the operations on them to more logarithmic range.

A number will be a list of booleans. The first value of the list will be the least significant bit. This makes writing operations on numbers easier. For example 13 can be written as 2³ + 2² + 2⁰, so the bits 0, 2 and 3 are 1, making the binary representation 1101. We want the lowest bit first in the data structure, so 13 will be represented as lpair(ltrue, lpair(lfalse, lpair(ltrue, lpair(ltrue, lnull)))). 0 is just lnull.

Incrementing a binary number in this order is easy. 0 turns to 1, a 0-bit turns to 1, and a 1-bit turns to a 0-bit and recursively incrementing the rest of the number. This is left as an exercise. We'll go straight to addition, multiplication and exponentiation.

var ltrue    = function (a, b) { return a; }
var lfalse   = function (a, b) { return b; }
var t1       = function (a, b, c) { return a; }
var t2       = function (a, b, c) { return b; }
var t3       = function (a, b, c) { return c; }
var ltruep   = function (a) { return a(true, false); }
var lcar     = function (a) { return a(t2); }
var lcdr     = function (a) { return a(t3); }
var lnull    = function (a) { return ltrue; }
var lnullp   = function (a) { return (ltruep(a(t1))); }
var lpair    = function (a, b) { return function (c) { return c(lfalse, a, b); }} 
var zero     = lnull
var one      = lpair(ltrue, zero)
var thirteen = lpair(ltrue, lpair(lfalse, lpair(ltrue, lpair(ltrue, lnull))));
var lpair    = function (a, b) { return function (c) { return c(lfalse, a, b); }} 
var ladd     = function (a, b) { 
   if (lnullp(a)) { 
      return b; 
   } else if (lnullp(b)) { 
      return a; 
   } else if (ltruep(lcar(a))) {
      if (ltruep(lcar(b))) {
         return lpair(lfalse, ladd(lcdr(a), ladd(lcdr(b), one))); // carry
      } else {
         return lpair(ltrue, ladd(lcdr(a), lcdr(b)));
      }
   } else {
      return lpair(lcar(b), ladd(lcdr(a), lcdr(b)));
   } 
}
var lmul     = function (a, b) {
   if (lnullp(a)) {
      return zero;
   } else if (lnullp(b)) {
      return zero;
   } else if (ltruep(lcar(a))) {
      return ladd(lmul(lcdr(a), lpair(lfalse, b)), b);
   } else {
      return lmul(lcdr(a), lpair(lfalse, b));
   }
}
var lexptAux  = function(a, b, c) {
   if (lnullp(b)) {
      return c;
   } else {
      return lexptAux(lmul(a, a), lcdr(b), (lcar(b))(lmul(a,c), c));
   }
}
var lexpt    = function(a, b) { return lexptAux(a, b, one); }
function lnum2string(n) {
   if (lnullp(n)) {
      return '';
   } 
   return lnum2string(lcdr(n)) + (lcar(n))('1', '0');
}
parseInt(lnum2string(lexpt(lexpt(thirteen, thirteen), thirteen)), 2);   
// 1.8047894343662854e+188
(define ltrue    (lambda (a b) a))
(define lfalse   (lambda (a b) b))
(define t1       (lambda (a b c) a))
(define t2       (lambda (a b c) b))
(define t3       (lambda (a b c) c))
(define ltruep   (lambda (a) (a #true #false)))
(define lcar     (lambda (a) (a t2)))
(define lcdr     (lambda (a) (a t3)))
(define lnull    (lambda (a) ltrue))
(define lnullp   (lambda (a) (ltruep (a t1))))
(define lpair    (lambda (a b) (lambda (c) (c lfalse a b))))
(define zero     lnull)
(define one      (lpair ltrue zero))   
(define thirteen (lpair ltrue (lpair lfalse (lpair ltrue (lpair ltrue lnull)))))
(define (ladd a b)
   (cond
      ((lnullp a) b)
      ((lnullp b) a)
      ((ltruep (lcar a))
         (if (ltruep (lcar b))
            (lpair lfalse (ladd (lcdr a) (ladd (lcdr b) one)))
            (lpair ltrue  (ladd (lcdr a) (lcdr b)))))
      (else
         (lpair (lcar b) (ladd (lcdr a) (lcdr b))))))
(define (lmul a b)
   (cond
      ((lnullp a) zero)
      ((lnullp b) zero)
      ((ltruep (lcar a))
         (ladd (lmul (lcdr a) (lpair lfalse b)) b))
      (else
         (lmul (lcdr a) (lpair lfalse b)))))
(define (lexpt-aux a b c)
   (if (lnullp b)
      c
      (lexpt-aux (lmul a a) (lcdr b) ((lcar b) (lmul a c) c))))
(define (lexpt a b) 
   (lexpt-aux a b one))
(define (lnum->digits n)
   (if (lnullp n)
      '()
      (append (lnum->digits (lcdr n)) 
              (list ((lcar n) #\1 #\0)))))
(string->number (list->string (lnum->digits 
   (lexpt (lexpt thirteen thirteen) thirteen))) 2) 
;; 180478943436628555303336751114405012462595328815514062867747656044277
;; 930781610134138789049399256868999679563587940457575571005022396505002
;; 166104757041282978485607382742190819719615210838573

Notes

A similar technique is used to build most non-trivial data structures in owl.

Encoding of simple values as lambda expressions goes way back to work of Alonzo Church, Alan Turing and many others. Many of the best programming techniques predate computers.

You may have noticed that data is defined generally as lambda expressions, not applications of them. We want the expressions denoting data data to be static with regards to evaluation, so we need to use what is called a normal form. Programming languages typically evaluate functions, or lambda expressions, to weak head normal form. These expressions are lambdas, but may in their bodies contain expressions that could be evaluated.

Evaluation order and encoding of control structures are interesting enough to deserve their own post later.


Posted: 26.9.2020 by aoh

#post #programming