\Dchapter{Introduction}
\label{infer:chapter:intro}
%\input{infer-old-pldi-intro}
Consider the exercise of counting binary tree nodes using JavaScript.
With a class-based tree representation, we naturally add a method
to each kind of node like so.
\begin{lstlisting}[language=JavaScript]
class Node { nodes() { return 1 + this.left.nodes() + this.right.nodes(); } }
class Leaf { nodes() { return 1; } }
new Node(new Leaf(1), new Leaf(2)).nodes(); //=> 3 (constructors implicit)
\end{lstlisting}
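The snippet above leaves its constructors implicit.
A minimal runnable sketch follows, assuming each constructor simply stores its arguments;
the parameter names and the \js{val} field are assumptions chosen for illustration.
\begin{lstlisting}[language=JavaScript]
// Sketch only: the constructors are assumed, not taken from the original program.
class Node {
  constructor(left, right) { this.left = left; this.right = right; }
  nodes() { return 1 + this.left.nodes() + this.right.nodes(); }
}
class Leaf {
  constructor(val) { this.val = val; }
  nodes() { return 1; }
}
// One Node plus two Leafs.
const count = new Node(new Leaf(1), new Leaf(2)).nodes();
\end{lstlisting}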
An alternative ``ad-hoc'' representation uses plain JavaScript Objects
with explicit tags, which is less extensible but simpler.
Then, the method becomes a recursive function that explicitly takes a tree as input.
\begin{lstlisting}[language=JavaScript]
function nodes(t) { switch (t.op) {
case "node": return 1 + nodes(t.left) + nodes(t.right);
case "leaf": return 1; } }
nodes({op: "node", left:{op: "leaf", val: 1}, right:{op: "leaf", val: 2}})//=>3
\end{lstlisting}
Now, consider the problem of inferring type annotations for these programs.
The class-based representation is idiomatic to popular dynamic languages
like JavaScript and Python, and so many existing solutions support it.
%~\infercitep{saftoiu2010jstrace,pyannotate,typette18,An10dynamicinference,pytype,kristensen2017inference}
For example, TypeWiz~\infercitep{typewiz} uses dynamic analysis to generate
the following TypeScript annotations from the above example execution of \js{nodes}.
\begin{lstlisting}[language=JavaScript]
class Node { public left: @Leaf@; public right: @Leaf@; ... }
class Leaf { public val: @number@; ... }
\end{lstlisting}
The intuition behind inferring such a type is straightforward.
For example, an instance of \js{Leaf} was observed in \js{Node}'s \js{left} field,
and so the nominal type \js{Leaf} is used for its annotation.
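This intuition can be sketched in a few lines.
The helper \js{observeFields} below is hypothetical and is not how TypeWiz is implemented;
it only records, for a single object, the name of the type observed in each field.
The classes are redefined with assumed constructors so that the sketch is self-contained.
\begin{lstlisting}[language=JavaScript]
// Hypothetical sketch of nominal inference from one observed object.
class Leaf { constructor(val) { this.val = val; } }
class Node { constructor(left, right) { this.left = left; this.right = right; } }

// Map each field of obj to the name of the type observed in it.
function observeFields(obj) {
  const seen = {};
  for (const [field, value] of Object.entries(obj)) {
    const isObject = value !== null && typeof value === "object";
    // Class instances report their constructor; primitives report typeof.
    seen[field] = isObject ? value.constructor.name : typeof value;
  }
  return seen;
}

const nodeFields = observeFields(new Node(new Leaf(1), new Leaf(2)));
const leafFields = observeFields(new Leaf(1));
\end{lstlisting}
Here \js{nodeFields} maps both \js{left} and \js{right} to \js{"Leaf"}, and \js{leafFields}
maps \js{val} to \js{"number"}, matching the annotations above.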
The second, ``ad-hoc'' style of programming seems peculiar in JavaScript, Python, and, indeed,
in object-oriented programming in general.
Correspondingly, existing state-of-the-art automatic annotation tools are not designed
to support it.
Tools handle such cases in one of several trivial ways.
Some enumerate the tree representation ``verbatim'' in a union, like TypeWiz~\infercitep{typewiz}.
\begin{lstlisting}[language=JavaScript]
function nodes(t: {left: {op: string, val: number}, op: string,
right: {op: string, val: number}}
| {op: string, val: number}) ...
\end{lstlisting}
Others ``discard'' most (or all) structure, like Typette~\infercitep{typette18}
and PyType~\infercitep{pytype} for Python.
\begin{lstlisting}[language=Python]
def nodes(t: Dict[(Sequence, object)]) -> int: ... # Typette
def nodes(t) -> int: ... # PyType
\end{lstlisting}
Each annotation is clearly insufficient to meaningfully check both the function definition
and valid usages. To show a desirable annotation for the ``ad-hoc'' program,
we port it to Clojure~\infercitep{Hic08}, where it
enjoys full support from the built-in runtime verification library
clojure.spec and primary optional type system Typed Clojure~\infercitep{bonnaire2016practical}.
\begin{cljlisting}
(defn nodes [t] (case (:op t)
:node (+ 1 (nodes (:left t)) (nodes (:right t)))
:leaf 1))
(nodes {:op :node, :left {:op :leaf, :val 1}, :right {:op :leaf, :val 2}}) ;=>3
\end{cljlisting}
Making this style viable requires a harmony of language features, in particular
support for programming with functions and immutable values,
none of which comes at the expense of object-orientation. Clojure is hosted on the Java Virtual Machine
and has full interoperability with Java objects and classes; even Clojure's core design embraces
object-orientation by exposing a collection of Java interfaces for creating new kinds of data structures.
The \clj{\{k v ...\}} syntax creates a persistent and immutable Hash Array Mapped Trie~\cite{bagwell2001ideal},
which can be efficiently manipulated by dozens of built-in functions.
The leading-colon syntax, as in \clj{:op}, creates an interned \emph{keyword}. Keywords are ideal map keys
because of their fast equality checks, and they look themselves up in maps when used as functions
(e.g., \clj{(:op t)} is like JavaScript's \js{t.op}).
\emph{Multimethods} regain the extensibility we lost when abandoning methods, as in the following.
\begin{cljlisting}
(defmulti nodes-mm :op)
(defmethod nodes-mm :node [t] (+ 1 (nodes-mm (:left t)) (nodes-mm (:right t))))
(defmethod nodes-mm :leaf [t] 1)
\end{cljlisting}
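For readers more familiar with JavaScript, the dispatch mechanism can be imitated with a table of functions keyed by tag.
The helper \js{defmulti} below is a hypothetical sketch written for this comparison; it is not part of JavaScript or of any library,
and Clojure's multimethods offer more (for example, dispatch hierarchies).
\begin{lstlisting}[language=JavaScript]
// Hypothetical JavaScript imitation of a multimethod; Clojure's defmulti is richer.
function defmulti(dispatch) {
  const methods = new Map();
  const multi = (...args) => {
    const key = dispatch(...args);
    if (!methods.has(key)) throw new Error("no method for " + String(key));
    return methods.get(key)(...args);
  };
  // New cases can be registered later without editing existing ones.
  multi.defmethod = (key, fn) => { methods.set(key, fn); return multi; };
  return multi;
}

const nodesMm = defmulti(t => t.op);
nodesMm.defmethod("node", t => 1 + nodesMm(t.left) + nodesMm(t.right));
nodesMm.defmethod("leaf", t => 1);
const total = nodesMm({op: "node", left: {op: "leaf", val: 1},
                       right: {op: "leaf", val: 2}});
\end{lstlisting}
As with \clj{defmethod}, a new kind of tree node is supported by registering one more case, leaving the existing cases untouched.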
On the type system side, Typed Clojure supports a variety of heterogeneous types,
in particular for maps, along with occurrence typing \cite{TF10} to follow local control flow.
Many key features come together to represent our ``ad-hoc'' binary tree as the following type.
\begin{cljlisting}
(defalias Tree
(U '{:op ':node, :left Tree, :right Tree}
'{:op ':leaf, :val Int}))
\end{cljlisting}
The \clj{defalias} form introduces an equi-recursive type alias \clj{Tree},
\clj{U} builds a union type, \clj{'\{:kw Type ...\}} is a heterogeneous keyword map type,
and \clj{':node} is a keyword singleton type.
With the following function annotation, Typed Clojure can type check
both the definition and the usages of \clj{nodes}.
\begin{cljlisting}
(ann nodes [Tree -> Int])
\end{cljlisting}
This (manually written) Typed Clojure annotation involving \clj{Tree}
is significantly different from TypeWiz's ``verbatim'' annotation for \js{nodes}.
First, it is recursive, and so supports trees of arbitrary depth (TypeWiz's annotation
supports trees of height $<3$).
Second, it uses singleton types \clj{':leaf} and \clj{':node} to distinguish each case
(TypeWiz upcasts \js{"leaf"} and \js{"node"} to \js{string}).
Third, the tree type is factored out under a name to enhance readability and reusability.
On the other end of the spectrum, the ``discarding'' annotations of Typette and PyType
are too imprecise to use meaningfully (they include trees of arbitrary depth, but
also many other values).
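The three differences listed above can be imitated at run time in JavaScript.
The predicate \js{isTree} below is a hypothetical sketch, not the output of any tool discussed here:
it is recursive, it distinguishes cases by their exact tag, and it gives the tree shape a reusable name.
\begin{lstlisting}[language=JavaScript]
// Hypothetical run-time analogue of the Tree alias; not generated by any tool.
function isTree(t) {
  if (t === null || typeof t !== "object") return false;
  switch (t.op) {
    // Exact tags play the role of the singleton types for :node and :leaf.
    case "node": return isTree(t.left) && isTree(t.right);
    case "leaf": return Number.isInteger(t.val);
    default: return false;
  }
}

const leaf = {op: "leaf", val: 1};
const deep = {op: "node", left: {op: "node", left: leaf, right: leaf}, right: leaf};
const shallowOk = isTree({op: "node", left: leaf, right: leaf});
const deepOk = isTree(deep);                      // height 3
const badTagOk = isTree({op: "branch", val: 1});  // a string, but not a valid tag
\end{lstlisting}
Both \js{shallowOk} and \js{deepOk} are \js{true}, while \js{badTagOk} is \js{false},
even though its tag is a \js{string} and the value would therefore satisfy the ``verbatim'' annotation.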
The challenge we overcome in this research is to automatically generate
annotations like Typed Clojure's \clj{Tree}, in such a way that unresolvable ambiguities
and incomplete data collection only mildly reduce the ease of manual amendment.
%This research presents an approach based on dynamic analysis to automatically inferring
%recursive, structural, and compact annotations
%like the Typed Clojure annotation for \clj{nodes} along with \clj{Tree}.
%We show how our approach tolerates unresolvable ambiguities and incomplete data collection
%to generate close-enough annotations that are often straightforward to manually amend.