Lesson 1.16: Maps, Semantic Lists, and Sets

The purpose of this lesson is to explain how to use the data structure sorts provided by K: maps, lists, and sets.

Maps

The most frequently used type of data structure in K is the map. The sort provided by K for this purpose is the Map sort, and it is provided in domains.md in the MAP module. This type is not (currently) polymorphic. All Map terms are maps that map terms of sort KItem to other terms of sort KItem. A KItem can contain any sort except a K sequence. If you need to store such a term in a map, you can always use a wrapper such as syntax KItem ::= kseq(K).

A Map pattern consists of zero or more map elements (as represented by the symbol syntax Map ::= KItem "|->" KItem), mixed in any order, separated by whitespace, with zero or one variables of sort Map. The empty map is represented by .Map. If all of the bindings for the variables in the keys of the map can be deterministically chosen, these patterns can be matched in O(1) time. If they cannot, then each map element that cannot be deterministically constructed contributes a single dimension of polynomial time to the cost of the matching. In other words, a single such element is linear, two are quadratic, three are cubic, etc.

Patterns like the above are the only type of Map pattern that can appear on the left-hand-side of a rule. In other words, you are not allowed to write a Map pattern on the left-hand-side with more than one variable of sort Map in it. You are, however, allowed to write such patterns on the right-hand-side of a rule. You can also write a function pattern in the key of a map element so long as all the variables in the function pattern can be deterministically chosen.

Note the meaning of matching on a Map pattern: a map pattern with no variables of sort Map will match if the map being matched has exactly as many bindings as |-> symbols in the pattern. It will then match if each binding in the map pattern matches exactly one distinct binding in the map being matched. A map pattern with one Map variable will also match any map that contains such a map as a subset. The variable of sort Map will be bound to whatever bindings are left over (.Map if there are no bindings left over).

Here is an example of a simple definition that implements a very basic variable declaration semantics using a Map to store the value of variables (lesson-16-a.k):

k
module LESSON-16-A-SYNTAX imports INT-SYNTAX imports ID-SYNTAX syntax Exp ::= Id | Int syntax Decl ::= "int" Id "=" Exp ";" [strict(2)] syntax Pgm ::= List{Decl,""} endmodule module LESSON-16-A imports LESSON-16-A-SYNTAX imports BOOL configuration <T> <k> $PGM:Pgm </k> <state> .Map </state> </T> // declaration sequence rule <k> D:Decl P:Pgm => D ~> P ...</k> rule <k> .Pgm => . ...</k> // variable declaration rule <k> int X:Id = I:Int ; => . ...</k> <state> STATE => STATE [ X <- I ] </state> // variable lookup rule <k> X:Id => I ...</k> <state>... X |-> I ...</state> syntax Bool ::= isKResult(K) [symbol, function] rule isKResult(_:Int) => true rule isKResult(_) => false [owise] endmodule

There are several new features in this definition. First, note we import the module ID-SYNTAX. This module is defined in domains.md and provides a basic syntax for identifiers. We are using the Id sort provided by this module in this definition to implement the names of program variables. This syntax is only imported when parsing programs, not when parsing rules. Later in this lesson we will see how to reference specific concrete identifiers in a rule.

Second, we introduce a single new function over the Map sort. This function, which is represented by the symbol syntax Map ::= Map "[" KItem "<-" KItem "]", represents the map update operation. Other functions over the Map sort can be found in domains.md.

Finally, we have used the ... syntax on a cell containing a Map. In this case, the meaning of <state>... Pattern ...</state>, <state>... Pattern </state>, and <state> Pattern ...</state> are the same: it is equivalent to writing <state> (Pattern) _:Map </state>.

Consider the following program (a.decl):

int x = 0;
int y = 1;
int a = x;

If we run this program with krun, we will get the following result:

<T>
  <k>
    .
  </k>
  <state>
    a |-> 0
    x |-> 0
    y |-> 1
  </state>
</T>

Note that krun has automatically sorted the collection for you. This doesn't happen at runtime, so you still get the performance of a hash map, but it will help make the output more readable.

Exercise

Create a sort Stmt that is a subsort of Decl. Create a production of sort Stmt for variable assignment in addition to the variable declaration production. Feel free to use the syntax syntax Stmt ::= Id "=" Exp ";". Write a rule that implements variable assignment using a map update function. Then write the same rule using a map pattern. Test your implementations with some programs to ensure they behave as expected.

Semantic Lists

In a previous lesson, we explained how to represent lists in the AST of a program. However, this is not the only context where lists can be used. We also frequently use lists in the configuration of an interpreter in order to represent certain types of program state. For this purpose, it is generally useful to have an associative-list sort, rather than the cons-list sorts provided in Lesson 1.12.

The type provided by K for this purpose is the List sort, and it is also provided in domains.md, in the LIST module. This type is also not (currently) polymorphic. Like Map, all List terms are lists of terms of the KItem sort.

A List pattern in K consists of zero or more list elements (as represented by the ListItem symbol), followed by zero or one variables of sort List, followed by zero or more list elements. An empty list is represented by .List. These patterns can be matched in O(log(N)) time. This is the only type of List pattern that can appear on the left-hand-side of a rule. In other words, you are not allowed to write a List pattern on the left-hand-side with more than one variable of sort List in it. You are, however, allowed to write such patterns on the right-hand-side of a rule.

Note the meaning of matching on a List pattern: a list pattern with no variables of sort List will match if the list being matched has exactly as many elements as ListItem symbols in the pattern. It will then match if each element in sequence matches the pattern contained in the ListItem symbol. A list pattern with one variable of sort List operates the same way, except that it can match any list with at least as many elements as ListItem symbols, so long as the prefix and suffix of the list match the patterns inside the ListItem symbols. The variable of sort List will be bound to whatever elements are left over (.List if there are no elements left over).

The ... syntax is allowed on cells containing lists as well. In this case, the meaning of <cell>... Pattern </cell> is the same as <cell> _:List (Pattern) </cell>, the meaning of <cell> Pattern ...</cell> is the same as <cell> (Pattern) _:List</cell>. Because list patterns with multiple variables of sort List are not allowed, it is an error to write <cell>... Pattern ...</cell>.

Here is an example of a simple definition that implements a very basic function-call semantics using a List as a function stack (lesson-16-b.k):

k
module LESSON-16-B-SYNTAX imports INT-SYNTAX imports ID-SYNTAX syntax Exp ::= Id "(" ")" | Int syntax Stmt ::= "return" Exp ";" [strict] syntax Decl ::= "fun" Id "(" ")" "{" Stmt "}" syntax Pgm ::= List{Decl,""} syntax Id ::= "main" [token] endmodule module LESSON-16-B imports LESSON-16-B-SYNTAX imports BOOL imports LIST configuration <T> <k> $PGM:Pgm ~> main () </k> <functions> .Map </functions> <fstack> .List </fstack> </T> // declaration sequence rule <k> D:Decl P:Pgm => D ~> P ...</k> rule <k> .Pgm => . ...</k> // function definitions rule <k> fun X:Id () { S } => . ...</k> <functions>... .Map => X |-> S ...</functions> // function call syntax KItem ::= stackFrame(K) rule <k> X:Id () ~> K => S </k> <functions>... X |-> S ...</functions> <fstack> .List => ListItem(stackFrame(K)) ...</fstack> // return statement rule <k> return I:Int ; ~> _ => I ~> K </k> <fstack> ListItem(stackFrame(K)) => .List ...</fstack> syntax Bool ::= isKResult(K) [function, symbol] rule isKResult(_:Int) => true rule isKResult(_) => false [owise] endmodule

Notice that we have declared the production syntax Id ::= "main" [token]. Since we use the ID-SYNTAX module, this declaration is necessary in order to be able to refer to the main identifier directly in the configuration declaration. Our <k> cell now contains a K sequence initially: first we process all the declarations in the program, then we call the main function.

Consider the following program (foo.func):

fun foo() { return 5; }
fun main() { return foo(); }

When we krun this program, we should get the following output:

<T>
  <k>
    5 ~> .
  </k>
  <functions>
    foo |-> return 5 ;
    main |-> return foo ( ) ;
  </functions>
  <fstack>
    .List
  </fstack>
</T>

Note that we have successfully put on the <k> cell the value returned by the main function.

Exercise

Add a term of sort Id to the stackFrame operator to keep track of the name of the function in that stack frame. Then write a function syntax String ::= printStackTrace(List) that takes the contents of the <fstack> cell and pretty prints the current stack trace. You can concatenate strings with +String in the STRING module in domains.md, and you can convert an Id to a String with the Id2String function in the ID module. Test this function by creating a new expression that returns the current stack trace as a string. Make sure to update isKResult and the Exp sort as appropriate to allow strings as values.

Sets

The final primary data structure sort in K is a set, i.e., an idempotent unordered collection where elements are deduplicated. The sort provided by K for this purpose is the Set sort and it is provided in domains.md in the SET module. Like maps and lists, this type is not (currently) polymorphic. Like Map and List, all Set terms are sets of terms of the KItem sort.

A Set pattern has the exact same restrictions as a Map pattern, except that its elements are treated like keys, and there are no values. It has the same performance characteristics as well. However, syntactically it is more similar to the List sort: An empty Set is represented by .Set, but a set element is represented by the SetItem symbol.

Matching behaves similarly to the Map sort: a set pattern with no variables of sort Set will match if the set has exactly as many bindings as SetItem symbols, and if each element pattern matches one distinct element in the set. A set with a variable of sort Set also matches any superset of such a set. As with map, the elements left over will be bound to the Set variable (or .Set if no elements are left over).

Like Map, the ... syntax on a set is syntactic sugar for an anonymous variable of sort Set.

Here is an example of a simple modification to LESSON-16-A which uses a Set to ensure that variables are never declared more than once. In practice, you would likely just use the in_keys symbol over maps to test for this, but it's still useful as an example of sets in practice:

k
module LESSON-16-C-SYNTAX imports LESSON-16-A-SYNTAX endmodule module LESSON-16-C imports LESSON-16-C-SYNTAX imports BOOL imports SET configuration <T> <k> $PGM:Pgm </k> <state> .Map </state> <declared> .Set </declared> </T> // declaration sequence rule <k> D:Decl P:Pgm => D ~> P ...</k> rule <k> .Pgm => . ...</k> // variable declaration rule <k> int X:Id = I:Int ; => . ...</k> <state> STATE => STATE [ X <- I ] </state> <declared> D => D SetItem(X) </declared> requires notBool X in D // variable lookup rule <k> X:Id => I ...</k> <state>... X |-> I ...</state> <declared>... SetItem(X) ...</declared> syntax Bool ::= isKResult(K) [symbol, function] rule isKResult(_:Int) => true rule isKResult(_) => false [owise] endmodule

Now if we krun a program containing duplicate declarations, it will get stuck on the declaration.

Exercises

  1. Modify your solution to Lesson 1.14, Exercise 2 and introduce the sorts Decls, Decl, and Stmt which include variable and function declaration (without function parameters), and return and assignment statements, as well as call expressions. Use List and Map to implement these operators, making sure to consider the interactions between components, such as saving and restoring the environment of variables at each call site. Don't worry about local function definitions or global variables for now. Make sure to test the resulting interpreter.

Next lesson

Once you have completed the above exercises, you can continue to Lesson 1.17: Cell Multiplicity and Cell Collections.