mud/lib/doc/manual/chapter11

chapter 11 "Complex Data Types"
Intermediate LPC
Descartes of Borg
November 1993

                        Chapter 3: Complex Data Types

3.1 Simple Data Types
In the textbook LPC Basics, you learned about the common, basic LPC
data types: int, string, object, void.  Most important you learned that
many operations and functions behave differently based on the data type
of the variables upon which they are operating.  Some operators and
functions will even give errors if you use them with the wrong data
types.  For example, "a"+"b" is handled much differently than 1+1.
When you ass "a"+"b", you are adding "b" onto the end of "a" to get
"ab".  On the other hand, when you add 1+1, you do not get 11, you get
2 as you would expect.

I refer to these data types as simple data types, because they atomic in
that they cannot be broken down into smaller component data types.
The object data type is a sort of exception, but you really cannot refer
individually to the components which make it up, so I refer to it as a
simple data type.

This chapter introduces the concept of the complex data type, a data type
which is made up of units of simple data types.  LPC has two common
complex data types, both kinds of arrays.  First, there is the traditional
array which stores values in consecutive elements accessed by a number
representing which element they are stored in.  Second is an associative
array called a mapping.  A mapping associates to values together to
allow a more natural access to data.

3.2 The Values NULL and 0
Before getting fully into arrays, there first should be a full understanding
of the concept of NULL versus the concept of 0.  In LPC, a null value is
represented by the integer 0.  Although the integer 0 and NULL are often
freely interchangeable, this interchangeability often leads to some great
confusion when you get into the realm of complex data types.  You may
have even encountered such confusion while using strings.

0 represents a value which for integers means the value you add to
another value yet still retain the value added.  This for any addition
operation on any data type, the ZERO value for that data type is the value
that you can add to any other value and get the original value.  Thus:   A
plus ZERO equals A where A is some value of a given data type and
ZERO is the ZERO value for that data type.  This is not any sort of
official mathematical definition.  There exists one, but I am not a
mathematician, so I have no idea what the term is.  Thus for integers, 0
is the ZERO value since 1 + 0 equals 1.

NULL, on the other hand, is the absence of any value or meaning.  The
LPC driver will interpret NULL as an integer 0 if it can make sense of it
in that context.  In any context besides integer addition, A plus NULL
causes an error.  NULL causes an error because adding valueless fields
in other data types to those data types makes no sense.

Looking at this from another point of view, we can get the ZERO value
for strings by knowing what added to "a" will give us "a" as a result.
The answer is not 0, but instead "".  With integers, interchanging NULL
and 0 was acceptable since 0 represents no value with respect to the
integer data type.  This interchangeability is not true for other data types,
since their ZERO values do not represent no value.  Namely, ""
represents a string of no length and is very different from 0.

When you first declare any variable of any type, it has no value.  Any
data type except integers therefore must be initialized somehow before
you perform any operation on it.  Generally, initialization is done in the
create() function for global variables, or at the top of the local function
for local variables by assigning them some value, often the ZERO value
for that data type.  For example, in the following code I want to build a
string with random words:

string build_nonsense() {
    string str;
    int i;

    str = ""; /* Here str is initialized to the string
ZERO value */
    for(i=0; i<6; i++) {
        switch(random(3)+1) {
            case 1: str += "bing"; break;
            case 2: str += "borg"; break;
            case 3: str += "foo"; break;
        }
        if(i==5) str += ".\n";
        else str += " ";
    }
    return capitalize(str);
}

If we had not initialized the variable str, an error would have resulted
from trying to add a string to a NULL value.  Instead, this code first
initializes str to the ZERO value for strings, "".  After that, it enters a
loop which makes 6 cycles, each time randomly adding one of three
possible words to the string.  For all words except the last, an additional
blank character is added.  For the last word, a period and a return
character are added.  The function then exits the loop, capitalizes the
nonsense string, then exits.

3.3 Arrays in LPC
An array is a powerful complex data type of LPC which allows you to
access multiple values through a single variable.  For instance,
Nightmare has an indefinite number of currencies in which players may
do business.  Only five of those currencies, however, can be considered
hard currencies.  A hard currency for the sake of this example is a
currency which is readily exchangeable for any other hard currency,
whereas a soft currency may only be bought, but not sold.  In the bank,
there is a list of hard currencies to allow bank keepers to know which
currencies are in fact hard currencies.  With simple data types, we would
have to perform the following nasty operation for every exchange
transaction:

int exchange(string str) {
    string from, to;
    int amt;

    if(!str) return 0;
    if(sscanf(str, "%d %s for %s", amt, from, to) != 3)
      return 0;
    if(from != "platinum" && from != "gold" && from !=
      "silver" &&
      from != "electrum" && from != "copper") {
        notify_fail("We do not buy soft currencies!\n");
        return 0;
    }
    ...
}

With five hard currencies, we have a rather simple example.  After all it
took only two lines of code to represent the if statement which filtered
out bad currencies.  But what if you had to check against all the names
which cannot be used to make characters in the game?  There might be
100 of those; would you want to write a 100 part if statement?
What if you wanted to add a currency to the list of hard currencies?  That
means you would have to change every check in the game for hard
currencies to add one more part to the if clauses.  Arrays allow you
simple access to groups of related data so that you do not have to deal
with each individual value every time you want to perform a group
operation.

As a constant, an array might look like this:
    ({ "platinum", "gold", "silver", "electrum", "copper" })
which is an array of type string.  Individual data values in arrays are
called elements, or sometimes members.  In code, just as constant
strings are represented by surrounding them with "", constant arrays are
represented by being surrounded by ({ }), with individual elements of
the array being separated by a ,.

You may have arrays of any LPC data type, simple or complex.  Arrays
made up of mixes of values are called arrays of mixed type.  In most
LPC drivers, you declare an array using a throw-back to C language
syntax for arrays.  This syntax is often confusing for LPC coders
because the syntax has a meaning in C that simply does not translate into
LPC.  Nevertheless, if we wanted an array of type string, we would
declare it in the following manner:

string *arr;

In other words, the data type of the elements it will contain followed by
a space and an asterisk.  Remember, however, that this newly declared
string array has a NULL value in it at the time of declaration.

3.4 Using Arrays
You now should understand how to declare and recognize an array in
code.  In order to understand how they work in code, let's review the
bank code, this time using arrays:

string *hard_currencies;

int exchange(string str) {
    string from, to;
    int amt;

    if(!str) return 0;
    if(sscanf(str, "%d %s for %s", amt, from, to) != 3)
return 0;
    if(member_array(from, hard_currencies) == -1) {
        notify_fail("We do not buy soft currencies!\n");
        return 0;
    }
    ...
}

This code assumes hard_currencies is a global variable and is initialized
in create() as:
    hard_currencies = ({ "platinum", "gold", "electrum", "silver",
   "copper" });
Ideally, you would have hard currencies as a #define in a header file for
all objects to use, but #define is a topic for a later chapter.

Once you know what the member_array() efun does, this method
certainly is much easier to read as well as is much more efficient and
easier to code.  In fact, you can probably guess what the
member_array() efun does:  It tells you if a given value is a member of
the array in question.  Specifically here, we want to know if the currency
the player is trying to sell is an element in the hard_curencies array.
What might be confusing to you is, not only does member_array() tell us
if the value is an element in the array, but it in fact tells us which element
of the array the value is.

How does it tell you which element?  It is easier to understand arrays if
you think of the array variable as holding a number.  In the value above,
for the sake of argument, we will say that hard_currencies holds the
value 179000.  This value tells the driver where to look for the array
hard_currencies represents.  Thus, hard_currencies points to a place
where the array values may be found.  When someone is talking about
the first element of the array, they want the element located at 179000.
When the object needs the value of the second element of the array, it
looks at 179000 + one value, then 179000 plus two values for the third,
and so on.  We can therefore access individual elements of an array by
their index, which is the number of values beyond the starting point of
the array we need to look to find the value.  For the array
hard_currencies array:
"platinum" has an index of 0.
"gold" has an index of 1.
"electrum" has an index of 2.
"silver" has an index of 3.
"copper" has an index of 4.

The efun member_array() thus returns the index of the element being
tested if it is in the array, or -1 if it is not in the array.  In order to
reference an individual element in an array, you use its index number in
the following manner:
array_name[index_no]
Example:
hard_currencies[3]
where hard_currencies[3] would refer to "silver".

So, you now should now several ways in which arrays appear either as
a whole or as individual elements.  As a whole, you refer to an array
variable by its name and an array constant by enclosing the array in ({ })
and separating elements by ,.  Individually, you refer to array variables
by the array name followed by the element's index number enclosed in
[], and to array constants in the same way you would refer to simple data
types of the same type as the constant.  Examples:

Whole arrays:
variable:  arr
constant: ({ "platinum", "gold", "electrum", "silver", "copper" })

Individual members of arrays:
variable: arr[2]
constant: "electrum"

You can use these means of reference to do all the things you are used to
doing with other data types.  You can assign values, use the values in
operations, pass the values as parameters to functions, and use the
values as return types.  It is important to remember that when you are
treating an element alone as an individual, the individual element is not
itself an array (unless you are dealing with an array of arrays).  In the
example above, the individual elements are strings.  So that:
    str = arr[3] + " and " + arr[1];
will create str to equal "silver and gold".  Although this seems simple
enough, many people new to arrays start to run into trouble when trying
to add elements to an array.  When you are treating an array as a whole
and you wish to add a new element to it, you must do it by adding
another array.

Note the following example:
string str1, str2;
string *arr;

str1 = "hi";
str2 = "bye";
/* str1 + str2 equals "hibye" */
arr = ({ str1 }) + ({ str2 });
/* arr is equal to ({ str1, str2 }) */
Before going any further, I have to note that this example gives an
extremely horrible way of building an array.  You should set it: arr = ({
str1, str2 }).  The point of the example, however, is that you must add
like types together.  If you try adding an element to an array as the data
type it is, you will get an error.  Instead you have to treat it as an array of
a single element.

3.5 Mappings
One of the major advances made in LPMuds since they were created is
the mapping data type.  People alternately refer to them as associative
arrays.  Practically speaking, a mapping allows you freedom from the
association of a numerical index to a value which arrays require.
Instead, mappings allow you to associate values with indices which
actually have meaning to you, much like a relational database.

In an array of 5 elements, you access those values solely by their integer
indices which cover the range 0 to 4.  Imagine going back to the example
of money again.  Players have money of different amounts and different
types.  In the player object, you need a way to store the types of money
that exist as well as relate them to the amount of that currency type the
player has.  The best way to do this with arrays would have been to
store an array of strings representing money types and an array of
integers representing values in the player object.  This would result in
CPU-eating ugly code like this:

int query_money(string type) {
    int i;

    i = member_array(type, currencies);
    if(i>-1 && i < sizeof(amounts))  /* sizeof efun
returns # of elements */
        return amounts[i];
    else return 0;
}

And that is a simple query function.  Look at an add function:

void add_money(string type, int amt) {
    string *tmp1;
    int * tmp2;
    int i, x, j, maxj;

    i = member_array(type, currencies);
    if(i >= sizeof(amounts)) /*  corrupt data, we are in
      a bad way */
        return;
    else if(i== -1) {
        currencies += ({ type });
        amounts += ({ amt });
        return;
    }
    else {
        amounts[i] += amt;
        if(amounts[i] < 1) {
            tmp1 = allocate(sizeof(currencies)-1);
            tmp2 = allocate(sizeof(amounts)-1);
            for(j=0, x =0, maxj=sizeof(tmp1); j < maxj;
              j++) {
                if(j==i) x = 1;
                tmp1[j] = currencies[j+x];
                tmp2[j] = amounts[j+x];
            }
            currencies = tmp1;
            amounts = tmp2;
        }
    }
}

That is really some nasty code to perform the rather simple concept of
adding some money.  First, we figure out if the player has any of that
kind of money, and if so, which element of the currencies array it is.
After that, we have to check to see that the integrity of the currency data
has been maintained.  If the index of the type in the currencies array is
greater than the highest index of the amounts array, then we have a
problem since the indices are our only way of relating the two arrays.
Once we know our data is in tact, if the currency type is not currently
held by the player, we simply tack on the type as a new element to the
currencies array and the amount as a new element to the amounts array.
Finally, if it is a currency the player currently has, we just add the
amount to the corresponding index in the amounts array.  If the money
gets below 1, meaning having no money of that type, we want to clear
the currency out of memory.

Subtracting an element from an array is no simple matter.  Take, for
example, the result of the following:

string *arr;

arr = ({ "a", "b", "a" });
arr -= ({ arr[2] });

What do you think the final value of arr is? Well, it is:
    ({ "b", "a" })
Subtracting arr[2] from the original array does not remove the third
element from the array.  Instead, it subtracts the value of the third
element of the array from the array.  And array subtraction removes the
first instance of the value from the array.  Since we do not want to be
forced on counting on the elements of the array as being unique, we are
forced to go through some somersaults to remove the correct element
from both arrays in order to maintain the correspondence of the indices
in the two arrays.

Mappings provide a better way.  They allow you to directly associate the
money type with its value.  Some people think of mappings as arrays
where you are not restricted to integers as indices.  Truth is, mappings
are an entirely different concept in storing aggregate information.  Arrays
force you to choose an index which is meaningful to the machine for
locating the appropriate data.  The indices tell the machine how many
elements beyond the first value the value you desire can be found.  With
mappings, you choose indices which are meaningful to you without
worrying about how that machine locates and stores it.

You may recognize mappings in the following forms:

constant values:
whole: ([ index:value, index:value ]) Ex: ([ "gold":10, "silver":20 ])
element:  10

variable values:
whole:    map   (where map is the name of a mapping variable)
element: map["gold"]

So now my monetary functions would look like:

int query_money(string type) { return money[type]; }

void add_money(string type, int amt) {
    if(!money[type]) money[type] = amt;
    else money[type] += amt;
    if(money[type] < 1)
      map_delete(money, type);          /* this is for
          MudOS */
            ...OR...
            money = m_delete(money, type)  /* for some
          LPMud 3.* varieties */
            ... OR...
         m_delete(money, type);    /* for other LPMud 3.*
          varieties */
}

Please notice first that the efuns for clearing a mapping element from the
mapping vary from driver to driver.  Check with your driver's
documentation for the exact name an syntax of the relevant efun.

As you can see immediately, you do not need to check the integrity of
your data since the values which interest you are inextricably bound to
one another in the mapping.  Secondly, getting rid of useless values is a
simple efun call rather than a tricky, CPU-eating loop.  Finally, the
query function is made up solely of a return instruction.

You must declare and initialize any mapping before using it.
Declarations look like:
mapping map;
Whereas common initializations look like:
map = ([]);
map = allocate_mapping(10)   ...OR...   map = m_allocate(10);
map = ([ "gold": 20, "silver": 15 ]);

As with other data types, there are rules defining how they work in
common operations like addition and subtraction:
    ([ "gold":20, "silver":30 ]) + ([ "electrum":5 ])
gives:
    (["gold":20, "silver":30, "electrum":5])
Although my demonstration shows a continuity of order, there is in fact
no guarantee of the order in which elements of mappings will stored.
Equivalence tests among mappings are therefore not a good thing.

3.6 Summary
Mappings and arrays can be built as complex as you need them to be.
You can have an array of mappings of arrays.  Such a thing would be
declared like this:

mapping *map_of_arrs;
which might look like:
({ ([ ind1: ({ valA1, valA2}), ind2: ({valB1, valB2}) ]), ([ indX:
({valX1,valX2}) ]) })

Mappings may use any data type as an index, including objects.
Mapping indices are often referred to as keys as well, a term from
databases.  Always keep in mind that with any non-integer data type,
you must first initialize a variable before making use of it in common
operations such as addition and subtraction.  In spite of the ease and
dynamics added to LPC coding by mappings and arrays, errors caused
by failing to initialize their values can be the most maddening experience
for people new to these data types.  I would venture that a very high
percentage of all errors people experimenting with mappings and arrays
for the first time encounter are one of three error messages:
	Indexing on illegal type.
	Illegal index.
	Bad argument 1 to (+ += - -=) /* insert your favourite operator */
Error messages 1 and 3 are darn near almost always caused by a failure
to initialize the array or mapping in question.  Error message 2 is caused
generally when you are trying to use an index in an initialized array
which does not exist.  Also, for arrays, often people new to arrays will
get error message 3 because they try to add a single element to an array
by adding the initial array to the single element value instead of adding
an array of the single element to the initial array.  Remember, add only
arrays to arrays.

At this point, you should feel comfortable enough with mappings and
arrays to play with them.  Expect to encounter the above error messages
a lot when first playing with these.  The key to success with mappings is
in debugging all of these errors and seeing exactly what causes wholes
in your programming which allow you to try to work with uninitialized
mappings and arrays.  Finally, go back through the basic room code and
look at things like the SetExits() (or the equivalent on your mudlib)
function.  Chances are it makes use of mappings.  In some instances, it
will use arrays as well for compatibility with mudlib.n.

Copyright (c) George Reese 1993