Data types#

All data types are encoded as one or more consecutive 16bit words (cells). Most significant bit of each 16bit word is reserved for memory manager usage and remaining 15bits are used for identifying data types and encoding data values. Most significant data bits of first word identifies data type. Most significant bit is referenced as bit 15 and least significant bit as bit 0.

Implementation mostly uses static inline functions defined in header file, instead of preprocessor definitions, to provide API more suitable for foreign function interface.

Number#

Data type representing signed integer values of arbitrary length. Although encoding itself doesn’t limit value size, to provide easier interface for data manipulation, values are limited to signed integers represented with 32bit dual complement encoding. Single number is encoded with one or more words:

address

data

n

m 0 1 s v v v v v v v v v v v v

n + 1

m 1 v v v v v v v v v v v v v v

n + i

m 1 v v v v v v v v v v v v v v

n + m

m 0 v v v v v v v v v v v v v v

where:

  • Bit 15 of each word (m) is reserved for memory management.

  • Bit 14 of first word (0) identifies number type.

  • Bit 13 of first word and bit 14 of other words represents “more follows”. Only in last word (n + m) is this bit set to 0.

  • Bit 12 of first word (s) identifies sign.

  • Rest of bits are used as dual complement encoded integer value where word n contains most significant bits and word n + m contains least significant bits.

Pair#

Data type representing two addresses referencing word locations (usually known as cons cell). Address values are limited to 14bit unsigned integers which enables encoding of this type with two words:

address

data

n

m 1 0 a a a a a a a a a a a a a

n + 1

m a b b b b b b b b b b b b b b

where:

  • Bit 15 of each word (m) is reserved for memory management.

  • Bits 14 and 13 of first word (10) identify pair type.

  • 14 a bits encode first address value.

  • 14 b bits encode second address value.

String#

Data type representing zero of more 8bit values. Single string is represented with one or more words:

address

data

n

m 1 1 0 0 s s s s s s s s s s s

n + 1

m a a a a a a a a b b b b b b b

n + 2

m b c c c c c c c c d d d d d d

where:

  • Bit 15 of each word (m) is reserved for memory management.

  • Bits 14, 13, 12 and 11 of first word (1100) identify string type.

  • 11 s bits represent string length (maximum string length is 2047).

  • Bits a, b, c, … represent 8bit values.

This encoding schema tries to optimize memory usage but at the same time introduces significant overhead in manipulating string data.

Symbol#

Symbols are used as human readable labels associated with data values. They are encoded as 8bit characters similarly as string data:

address

data

n

m 1 1 0 1 s s s s s s s s s s s

n + 1

m a a a a a a a a b b b b b b b

n + 2

m b c c c c c c c c d d d d d d

where:

  • Bit 15 of each word (m) is reserved for memory management.

  • Bits 14, 13, 12 and 11 of first word (1101) identify symbol type.

  • 11 s bits represent symbol length (maximum symbol length is 2047).

  • Bits a, b, c, … represent 8bit character values.

Builtin function#

Builtin functions are referenced by function’s index and encoded with single word:

address

data

n

m 1 1 1 0 0 i i i i i i i i i i

where:

  • Bit 15 (m) is reserved for memory management.

  • Bits 14, 13, 12, 11 and 10 (11100) identify builtin function type.

  • 10 i bits represent builtin function index.

Builtin syntax#

Builtin syntaxes are referenced by syntax’s index and encoded with single word:

address

data

n

m 1 1 1 0 1 i i i i i i i i i i

where:

  • Bit 15 (m) is reserved for memory management.

  • Bits 14, 13, 12, 11 and 10 (11101) identify builtin syntax type.

  • 10 i bits represent builtin syntax index.

Function#

Functions are defined by parent context, list of argument names and function body. Type identifier together with 14bit addressees of associated values are encoded within 4 words:

address

data

n

m 1 1 1 1 0 x x x x x x x x x x

n + 1

m x c c c c c c c c c c c c c c

n + 2

m x a a a a a a a a a a a a a a

n + 3

m x b b b b b b b b b b b b b b

where:

  • Bit 15 of each word (m) is reserved for memory management.

  • Bits 14, 13, 12, 11 and 10 of first word (11110) identify function type.

  • 14 c bits represent parent context address.

  • 14 a bits represent argument name list address.

  • 14 b bits represent body definition address.

  • x bits are not used.

Syntax#

Syntaxes are defined by parent context, list of argument names and syntax body. Type identifier together with 14bit addressees of associated values are encoded within 4 words:

address

data

n

m 1 1 1 1 0 x x x x x x x x x x

n + 1

m x c c c c c c c c c c c c c c

n + 2

m x a a a a a a a a a a a a a a

n + 3

m x b b b b b b b b b b b b b b

where:

  • Bit 15 of each word (m) is reserved for memory management.

  • Bits 14, 13, 12, 11 and 10 of first word (11110) identify syntax type.

  • 14 c bits represent parent context address.

  • 14 a bits represent argument name list address.

  • 14 b bits represent body definition address.

  • x bits are not used.

Source code#

cell.h#

#ifndef LISP16_CELL_H
#define LISP16_CELL_H

#include "arch.h"


typedef lsp_uint16_t lsp_cell_t;
typedef lsp_uint16_t lsp_addr_t; // 14 least significant bits used


lsp_uint16_t lsp_cell_get_size(lsp_cell_t *c);
lsp_uint16_t lsp_cell_get_number_size(lsp_int32_t value);
lsp_uint16_t lsp_cell_get_string_symbol_size(lsp_uint16_t len);


static inline lsp_bool_t lsp_cell_is_number(lsp_cell_t *c) {
    return (*c & 0x4000) == 0x0000;
}

static inline lsp_bool_t lsp_cell_is_pair(lsp_cell_t *c) {
    return (*c & 0x6000) == 0x4000;
}

static inline lsp_bool_t lsp_cell_is_string(lsp_cell_t *c) {
    return (*c & 0x7800) == 0x6000;
}

static inline lsp_bool_t lsp_cell_is_symbol(lsp_cell_t *c) {
    return (*c & 0x7800) == 0x6800;
}

static inline lsp_bool_t lsp_cell_is_builtin_function(lsp_cell_t *c) {
    return (*c & 0x7c00) == 0x7000;
}

static inline lsp_bool_t lsp_cell_is_builtin_syntax(lsp_cell_t *c) {
    return (*c & 0x7c00) == 0x7400;
}

static inline lsp_bool_t lsp_cell_is_function(lsp_cell_t *c) {
    return (*c & 0x7c00) == 0x7800;
}

static inline lsp_bool_t lsp_cell_is_syntax(lsp_cell_t *c) {
    return (*c & 0x7c00) == 0x7c00;
}

static inline lsp_bool_t lsp_cell_is_string_or_symbol(lsp_cell_t *c) {
    return (*c & 0x7000) == 0x6000;
}

static inline lsp_bool_t lsp_cell_is_builtin(lsp_cell_t *c) {
    return (*c & 0x7800) == 0x7000;
}

static inline lsp_bool_t lsp_cell_is_function_or_syntax(lsp_cell_t *c) {
    return (*c & 0x7800) == 0x7800;
}


static inline void lsp_cell_set_number(lsp_cell_t *c, lsp_int32_t value) {
    lsp_uint8_t size = lsp_cell_get_number_size(value);
    for (lsp_uint8_t i = 0; i < size; ++i) {
        lsp_uint8_t shift = (size - i - 1) * 14;
        c[i] = (value >> shift) & (i ? 0x3fff : 0x1fff);
        if (i != size - 1)
            c[i] |= (i ? 0x4000 : 0x2000);
    }
}

static inline void lsp_cell_set_pair(lsp_cell_t *c, lsp_addr_t first,
                                     lsp_addr_t second) {
    c[0] = 0x4000 | ((first >> 1) & 0x1fff);
    c[1] = ((first & 1) ? 0x4000 : 0) | (second & 0x3fff);
}

static inline void lsp_cell_set_string(lsp_cell_t *c, lsp_uint16_t data_len) {
    lsp_uint16_t size = lsp_cell_get_string_symbol_size(data_len);
    c[0] = 0x6000 | (data_len & 0x07ff);
    for (lsp_uint16_t i = 1; i < size; ++i)
        c[i] = 0;
}

static inline void lsp_cell_set_string_data(lsp_cell_t *c, lsp_uint16_t i,
                                            lsp_uint8_t data_i) {
    lsp_uint32_t bit_count = (lsp_uint32_t)i << 3;
    lsp_uint16_t start_cell = bit_count / 15;
    lsp_uint8_t bit_shift = bit_count % 15;
    lsp_uint16_t mask = 0x7f80 >> bit_shift;
    if (bit_shift < 8) {
        c[1 + start_cell] = (c[1 + start_cell] & ~mask) |
                            ((lsp_uint16_t)data_i << (7 - bit_shift));
    } else {
        c[1 + start_cell] =
            (c[1 + start_cell] & ~mask) | (data_i >> (bit_shift - 7));
        bit_shift = 22 - bit_shift;
        mask = (0xffff << bit_shift) & 0x7fff;
        c[2 + start_cell] = (c[2 + start_cell] & ~mask) |
                            (((lsp_uint16_t)data_i << bit_shift) & mask);
    }
}

static inline void lsp_cell_set_symbol(lsp_cell_t *c, lsp_uint16_t name_len) {
    lsp_uint16_t size = lsp_cell_get_string_symbol_size(name_len);
    c[0] = 0x6800 | (name_len & 0x07ff);
    for (lsp_uint16_t i = 1; i < size; ++i)
        c[i] = 0;
}

static inline void lsp_cell_set_symbol_name(lsp_cell_t *c, lsp_uint16_t i,
                                            lsp_uint8_t name_i) {
    lsp_cell_set_string_data(c, i, name_i);
}

static inline void lsp_cell_set_builtin_function(lsp_cell_t *c,
                                                 lsp_uint16_t index) {
    *c = 0x7000 | (index & 0x03ff);
}

static inline void lsp_cell_set_builtin_syntax(lsp_cell_t *c,
                                               lsp_uint16_t index) {
    *c = 0x7400 | (index & 0x03ff);
}

static inline void lsp_cell_set_function(lsp_cell_t *c, lsp_addr_t parent_ctx,
                                         lsp_addr_t args, lsp_addr_t body) {
    c[0] = 0x7800;
    c[1] = parent_ctx & 0x3fff;
    c[2] = args & 0x3fff;
    c[3] = body & 0x3fff;
}

static inline void lsp_cell_set_syntax(lsp_cell_t *c, lsp_addr_t parent_ctx,
                                       lsp_addr_t args, lsp_addr_t body) {
    c[0] = 0x7c00;
    c[1] = parent_ctx & 0x3fff;
    c[2] = args & 0x3fff;
    c[3] = body & 0x3fff;
}


static inline lsp_int32_t lsp_cell_get_number(lsp_cell_t *c) {
    lsp_int32_t v = ((c[0] & 0x1000) ? -1 : 0);
    v = (v << 12) | (c[0] & 0x0fff);
    if (!(c[0] & 0x2000))
        return v;
    for (lsp_uint8_t i = 1; 1; ++i) {
        v = (v << 14) | (c[1] & 0x3fff);
        if (!(c[i] & 0x4000))
            return v;
    }
}

static inline lsp_addr_t lsp_cell_get_pair_first(lsp_cell_t *c) {
    return ((c[0] & 0x1fff) << 1) + ((c[1] & 0x4000) ? 1 : 0);
}

static inline lsp_addr_t lsp_cell_get_pair_second(lsp_cell_t *c) {
    return c[1] & 0x3fff;
}

static inline lsp_uint16_t lsp_cell_get_string_len(lsp_cell_t *c) {
    return *c & 0x07ff;
}

static inline lsp_uint8_t lsp_cell_get_string_data(lsp_cell_t *c,
                                                   lsp_uint16_t i) {
    lsp_uint32_t bit_count = (lsp_uint32_t)i << 3;
    lsp_uint16_t start_cell = bit_count / 15;
    lsp_uint8_t start_bit = 14 - (bit_count % 15);
    if (start_bit >= 7)
        return (c[1 + start_cell] >> (start_bit - 7)) & 0xff;
    return ((c[1 + start_cell] << (7 - start_bit)) |
            ((c[2 + start_cell] & 0x7fff) >> (8 + start_bit))) &
           0xff;
}

static inline lsp_uint16_t lsp_cell_get_symbol_len(lsp_cell_t *c) {
    return lsp_cell_get_string_len(c);
}

static inline lsp_uint8_t lsp_cell_get_symbol_name(lsp_cell_t *c,
                                                   lsp_uint16_t i) {
    return lsp_cell_get_string_data(c, i);
}

static inline lsp_uint16_t lsp_cell_get_builtin_index(lsp_cell_t *c) {
    return *c & 0x03ff;
}

static inline lsp_addr_t lsp_cell_get_function_parent_ctx(lsp_cell_t *c) {
    return c[1] & 0x3fff;
}

static inline lsp_addr_t lsp_cell_get_function_args(lsp_cell_t *c) {
    return c[2] & 0x3fff;
}

static inline lsp_addr_t lsp_cell_get_function_body(lsp_cell_t *c) {
    return c[3] & 0x3fff;
}

static inline lsp_addr_t lsp_cell_get_syntax_parent_ctx(lsp_cell_t *c) {
    return lsp_cell_get_function_parent_ctx(c);
}

static inline lsp_addr_t lsp_cell_get_syntax_args(lsp_cell_t *c) {
    return lsp_cell_get_function_args(c);
}

static inline lsp_addr_t lsp_cell_get_syntax_body(lsp_cell_t *c) {
    return lsp_cell_get_function_body(c);
}

#endif

cell.c#

#include "cell.h"


lsp_uint16_t lsp_cell_get_size(lsp_cell_t *c) {
    if (lsp_cell_is_number(c)) {
        if (!(c[0] & 0x2000))
            return 1;

        for (lsp_uint8_t i = 1; true; ++i) {
            if (!(c[i] & 0x4000))
                return i + 1;
        }

    } else if (lsp_cell_is_pair(c)) {
        return 2;

    } else if (lsp_cell_is_string_or_symbol(c)) {
        return lsp_cell_get_string_symbol_size(*c & 0x07ff);

    } else if (lsp_cell_is_builtin(c)) {
        return 1;

    } else if (lsp_cell_is_function_or_syntax(c)) {
        return 4;
    }

    return 0;
}


lsp_uint16_t lsp_cell_get_number_size(lsp_int32_t value) {
    lsp_bool_t msb = ((value & 0x1000) ? true : false);
    value >>= 13;
    if (value == 0 || value == -1)
        return 1 + ((msb && !value) ? 1 : 0);

    for (lsp_uint8_t i = 0; 1; ++i) {
        msb = ((value & 0x4000) ? 1 : 0);
        value >>= 15;
        if (value == 0 || value == -1)
            return i + 2 + ((msb && !value) ? 1 : 0);
    }

    return 0;
}


lsp_uint16_t lsp_cell_get_string_symbol_size(lsp_uint16_t len) {
    return (((lsp_uint32_t)len << 3) / 15) + ((len % 15) ? 1 : 0) + 1;
}