-
Notifications
You must be signed in to change notification settings - Fork 89
Nemerle for OOP Programmers Week 2
This week we will learn some basics of functional programming. You're not assumed to know anything about this topic, as we will go into much detail here. There is a lot of new material this week -- this is where we leave the safe and known OO land.
The basic idea of Functional Programming (FP) is to consider the entire program as an expression to be evaluated, and not to think about computation as being performed step by step and updating memory. So the functional program in general does not update this and that memory location, but rather creates new objects (this is why virtually all functional languages rely on the garbage collector.) This is how functions work in math -- they take values and return values, but don't change anything in particular.
There are pure functional languages, which prohibit any change to existing objects (in other words both fields of objects and variables are always immutable.) They are also very often lazy, which means parameters passed to a function are not evaluated until the function actually needs them (which may seem like an optimization to you, unless you've ever tried Haskell :-).
Nemerle is neither pure nor lazy, but it does support the key FP concept of immutability.
The second key concept is the ability to use functions as data, that is, to pass functions as parameters and store them in data structures. This feature is known as first class functions. Nemerle provides this.
There are other related concepts -- for example type inference and parametric polymorphism (generics) -- which are very often used in functional languages. Variant data structures and pattern matching are also notable concepts. We will find out more about these later, but for now we will focus on core features of FP.
To recap, the basic ideas behind FP in Nemerle are that:
- You can make fields of persistent data structures immutable (this is even the default, last week you saw that to make a field updatable, you need to use the
mutable
modifier). - You can define your temporary local variables to be immutable (
def
is shorter thanmutable
isn't it? :-) - You can use functions as values, as well as other FP goodies.
In addition to defining methods in classes or modules, one can define some small helper functions inside other functions. For example:
class LocalFunctionExample {
public static Main () : void
{
def say_hello () : void {
System.Console.WriteLine ("Hello!");
}
say_hello ();
say_hello ();
}
}
It shouldn't come as a big surprise to see "Hello!" written twice by this
program. The say_hello
local function is defined inside Main
,
and can only be referenced within Main
(it's local to it, right?).
You can also skip Main
altogether, as we've seen earlier:
def say_hello () : void {
System.Console.WriteLine ("Hello!");
}
say_hello ();
say_hello ();
Moreover, because say_hello
is local, type inference is
supported, so we can skip the void
type annotation:
def say_hello () {
System.Console.WriteLine ("Hello!");
}
say_hello ();
say_hello ();
Once you have defined a local function, you can pass it to another
function as a parameter. For example we can have a special function
that runs the passed function f
twice:
def run_twice (f) {
f ();
f ();
}
We didn't specify a type for f
, because type inference takes
care of this. Nemerle sees that f
is used as a function in
the body of run_twice
.
Now, we can define a function say_hello
, and pass it to run_twice
:
def say_hello () {
System.Console.WriteLine ("Hello!");
}
run_twice (say_hello);
Local functions can see all the identifiers defined in their parent function. For example:
def hello = "Hello!";
def say_hello () {
System.Console.WriteLine (hello);
}
say_hello ();
say_hello ();
When a local function references an identifier, it sees the nearest one
defined before it (in program text). This property is called lexical scoping.
So here, when say_hello
refers to hello
, it gets the
closest previous definition: "Hello!". Note that other hello
identifiers may be defined in other contexts and used without conflict.
The rules of lexical scoping keep them separate.
Lists are a very important data structure in FP, and are used even more often than arrays in imperative programming.
Lists are constructed out of list cells. Each cell has a head (the element held in the cell) and tail -- a pointer to another cell. There is a special cell called nil which serves as an empty list. List cells cannot be changed once they are constructed. On the one hand, it means you cannot update the cell number or order of a list once it is constructed, but on the other hand mischief can't be done to a list that you pass to others.
While lists cannot be changed once constructed, they can share the tail. For example:
42 | V 10-->34-->27-->nil
Here the lists [10, 34, 27]
and [42, 34, 27]
share
the list [34, 27]
. Because lists are immutable this sharing
cannot do any harm. All Nemerle lists share the empty list.
The list constructor is ::
, while nil is referred to as
[]
. Therefore 1 :: []
means we want a list cell
with 1 in the head and the empty list in the tail, which is another
way of saying we want a single element list with 1. After this object
is constructed we can make a two element list:
def l1 = 1 :: [];
def l2 = 42 :: l1;
l1
to l2
to make a longer list.
Of course, you can define this list in one expression: 42 :: 1 :: []
.
If you don't like all those colons, there is shorter notation:
[42, 1]
. This short form is convenient to use in
pattern matching.
The basic idea behind a matching like this:
match (some_value) {
| pattern1 => expr1
| pattern2 => expr2
...
}
is to inspect some_value
, check if it looks like
pattern1
, and if so, evaluate expr1
return it as
the result. Patterns are checked until a match is found, and the
corresponding expression is returned. If all patterns fail to match
(which is uncommon, because patterns should be exhaustive, that is,
cover all possible values), an exception is thrown.
There are several kinds of patterns, we shall briefly introduce some of them here.
Wildcards are written as _
and match any value, including
the null
pointer (don't confuse this with the nil
list cell). It's like the default:
case in switch
in C-like languages.
Each literal (like 42
, 3.1415
, "foobar"
,
and null
) is a valid pattern. This allows one to implement
switch
functionality with match
:
match (some_number) {
| 1 => "one"
| 2 => "two"
| 3 => "three"
| _ => "a lot"
}
When matching list patterns, both the ::
constructor and
the [1, 2]
syntax are available:
match (some_list) {
| [42, 42] => "two forty-two"
| [42, _] => "forty-two on first position of two-element list"
| [_, 42] => "forty-two on second position of two-element list"
| 42 :: _ => "forty-two on first position"
| _ :: 42 :: _ => "forty-two on second position"
| [] => "an empty list!"
| _ => "another list"
}
Note that the ordering of pattern matching clauses matters, because it is possible they can overlap (here the first case overlaps the second and third, while the second overlaps the fourth, and so on). In the case of multiple possible matches, the first match wins.
The variable pattern matches any value, just like the wildcard pattern,
but additionally binds the value matched to a specified identifier.
The identifiers used have to start with a lowercase letter (in order
to avoid conflicts with variant options).
They are always immutable (there is no mutable
modifier
allowed here).
For example a function to display all elements of a list would be:
using System.Console;
def display (l) {
match (l) {
| head :: tail =>
Write ($ "$head, ");
display (tail)
| [] =>
WriteLine ("end")
}
}
display ([1, 2, 3])
// Output: 1, 2, 3, end
The display
function takes a list and matches it against
the list head :: tail
. If the match list is not empty, the
element of the current cell is bound to head
and the rest
of the list is bound to tail
. The function prints the value
of head
and recursively calls itself to print out the
rest of the list.
When the function hits the end of the list, nil []
is
matched, end
is written, and the function terminates.
We now know the basic tools for list manipulation -- list construction and matching. Therefore we can proceed with some example functions.
filter
is a function that is given a list and a functional
value as a parameter. It will return the list of elements from the
original list for which the functional value returned true.
For example:
def is_even (x) { x % 2 == 0 }
System.Console.WriteLine (filter ([1, 2, 3, 4], is_even));
should print out [2, 4]
.
In Nemerle, all basic data structures have the ToString
method
overridden to return contents of the data structure in a human readable
form, therefore WriteLine does a good job displaying the list.
So we're back to the filter
code:
def filter (l, f) {
match (l) {
| x :: xs =>
def xs' = filter (xs, f);
if (f (x)) x :: xs'
else xs'
| [] => []
}
}
First, the function does a match on the list argument. For the non-
empty list case the head is bound to x
and the tail to xs
.
It is a common convention to call the head x
or y
and
the rest of the list xs
or ys
.
Once the head and tail are bound the function calls itself to filter
the rest of the list. The result is put in value xs'
.
In Nemerle the apostrophe ('
) character is a valid part
of identifier names. A transform of x
is often called x'
(say x-prime); this convention comes from mathematics.
Finally, depending on the return value of f (x)
, the function
adds element x
to the result, or skips it. If you're looking
for a return statement here, you won't find it! As you will recall from
Week 0, Nemerle functions simply return the the last value computed,
which in this example is either x :: xs'
or xs'
.
Of course when the list is empty, filtering it yields an empty list.
While testing the filter example, you might want to use a shorthand that allows you to filter on any arbitrary expression on the fly. To that end, it is possible to rewrite the even filter as:
System.Console.WriteLine (filter ([1, 2, 3, 4], fun (x) { x % 2 == 0 }));
If we use a local function just once, it is possible to make it anonymous. The expression:
fun (some_parms) { some_expression }
is equivalent to:
{
def tmp (some_parms) { some_expression }
tmp
}
that is, defining a local function and returning it right away. Anonymous functions allow us to skip the superflous naming of truly one time, 'throw-away' functions.
The first_n
function returns first N elements of the list passed.
def first_n (l, n) {
if (n == 0)
[]
else
match (l) {
| x :: xs => x :: first_n (xs, n - 1)
| [] => throw System.ArgumentException ("List too short")
}
}
System.Console.WriteLine (first_n (["foo", "bar", "baz"], 2))
// Output: ["foo", "bar"]
Note how both arguments to first_n
change as the function
executes: it recursively calls itself for the rest of the list with
the counter decremented. It is also interesting to see how the
function return value includes a call to itself.
Another new thing here is the throw
expression. It raises
an exception passed as an argument. System.ArgumentException
is a class from BCL. Handling exceptions will be discussed later
(next week probably).
We have used local functions which rely on type inference throughout this lesson. However type inference is not available for global functions; therefore it is good to know the exact way things get typed in Nemerle.
The list type is parametric (generic), so a list of integers has
type list [int]
and a list of strings has type list
[string]
. Note how square brackets ([]
) are used for
specifying the argument to the type. This is unlike C++, Java and C#,
where the triangle brackets (<>
) are used.
There are other generic types provided by the BCL and standard Nemerle
library. For example Nemerle.Collections.Hashtable [K, V]
is a mutable collection mapping values of type K
to values
of type V
(it inherits from System.Collections.Generic.Dictionary
[K, V]
).
Another important example of parametric types are arrays. Array is a
list-like type, so we felt there is no reason to make a special case of
it. The syntax for array types in Nemerle is array [''type'']
,
where type is the type of the elements in the array.
The type of a function taking a string
and returning an
int
, like the int.Parse
method, is: string ->
int
. The ->
symbol is called the arrow, so you
can read the previous type as string arrow int. When a function
returns nothing, we use the void
type. For example the
System.Threading.Thread.Sleep
method, which sleeps for
n
milliseconds has the type int -> void
. Similarly,
when a function takes no arguments, we use the void
type
before the arrow. So the type of Main
is void -> void
.
We will talk about types of multi-argument functions next week.
We now know how to write basic function types, so we can now write
a fully typed version of filter
:
def filter (l : list [int], f : int -> bool) : list [int] {
match (l) {
| x :: xs =>
def xs' = filter (xs, f);
if (f (x)) x :: xs'
else xs'
| [] => []
}
}
The body is exactly the same as previously, but we have provided explicit types for the arguments.
OK, the question here is: why type the function as int
?
Because we intend to use it with a list of integers. Fine, you might
respond, but why limit our function to just integers? As you correctly
observe, the structure of the filter
function would work
equally well for, say, a list of strings or a list of FooBars, if only
there was a way to type it less restrictively. This leads us naturally
to the idea of generic functions.
What we would really like for our filter
function is a
generic type, one that would allow the function to work on all
kinds of data. Unfortunately, because of subtyping and related
theoretically difficult problems, the type inferencer in Nemerle
cannot infer generic types for you. No worries, you can still
have generics, you just have to specify them yourself, using a
generic type declaration for your function:
def funname[''gentype0''] (parmlist) : funtype {body}
''gentype0''
, to be
used in the function. Think of ''gentype0''
as a placeholder
for any type. Substitute ''gentype0''
any place you would use
a single specific type in a function you want to make generic.
With this knowledge, let's make a generic filter
:
using System.Console;
def filter[T] (l : list [T], f : T -> bool) : list [T] {
match (l) {
| x :: xs =>
def xs' = filter (xs, f);
if (f (x)) x :: xs'
else xs'
| [] => []
}
}
WriteLine (filter ([1, 2, 3, 4], fun (x) { x % 2 == 0 }));
WriteLine (filter (["foo", "bar", "foox"], fun (x) { x.StartsWith ("foo") }));
/* Output:
[2, 4]
["foo", "foox"]
*/
The new, generic definition of filter
can be read: for any type
T
, the first parameter has the type list [T]
, the
second T -> bool
and the result type is list [T]
. This
allows a list of integers to be filtered as easily as a list of strings.
However, the function type passed has to match the type of the list (try
passing the even lambda expression and a list of strings). This is because
we use the same generic type parameter T
for list l
and
function f
.
Using the same syntax, you can define generic parameter types for these contexts:
class classname[''gentype0''] {body}
public methodname[''gentype0''] (parmlist) : methodtype {body}
-
variant varname[''gentype0''] {body}
(you will see more on variants in Week 3)
class classname[''gentype0'', ''gentype1'', ... , ''gentypeN''] {body}
There is a convention, originating from ML, to call type parameters
'a
, 'b
, 'c
... which reads alpha, beta,
gamma... As Nemerle supports apostrophes in identifiers, you're
free to follow it.
In C++ there is a convention to call generic type parameters T
,
while in C# one mostly uses descriptive names (something like
ElementType
). Nemerle is still new, so perhaps a generic
naming convention for it will emerge, and come into common use.
The standard library uses the ML convention.
In the first few exercises you should write a handful of short list manipulation functions. This is widely recognized as a good introduction to FP and I couldn't come out with anything better.
The functions should throw an exception for invalid data (empty lists, element not
found and so on). System.ArgumentException
will be OK.
Functions should come with one or two example invocations.
-
head[T] (l : list [T]) : T
returning first element ofl
-
tail[T] (l : list [T]) : list [T]
returningl
without the first element -
length[T] (l : list [T]) : int
computing the length ofl
-
max (l : list [int]) : int
returning maximal element of given list of integers -
last[T] (l : list [T]) : T
returning the last element of a list -
last_n[T] (l : list [T], n : int) : list [T]
returning lastn
elements -
nth[T] (l : list [T], n : int) : T
returningn
th element ofl
-
append[T] (l1 : list [T], l2 : list [T]) : list [T]
returning concatenation ofl1
andl2
(please don't use the overloaded+
operator that does exactly that ;-) -
range (l: list[int], from : int, to : int) : list [int]
returning list of all integers betweenfrom
andto
inclusive
-
iter[T] (l : list [T], f : T -> void) : void
which should apply given functionf
to each element ofl
and ignore the results (as there are none) -
map[A, B] (l : list [A], f : A -> B) : list [B]
applying specified functionf
to each element of the listl
and returning lists of results of applications -
forall[T] (l : list [T], p : T -> bool) : bool
returning true if given predicatep
is true for all elements of the listl
ofl
-
find[T] (l : list [T], p : T -> bool) : T
returning first element ofl
for which the conditionp
is true
Write the function: flatten[T] (l : list [list [T]]) : list [T]
concatenating given lists together. For example:
System.Console.WriteLine (flatten ([[1, 2], [3, 4], [5]]))
[1, 2, 3, 4, 5]
.
Take the Star Wars example from the previous week, possibly enriched with your Imperial Trooper and modify it to hold a lists of persons instead of 3 persons.