
\Dchapter{Evaluation}
\label{infer:chap:evaluation}

We performed a quantitative evaluation of our workflow on several open source programs
in three experiments.
We ported five programs to Typed Clojure with our workflow,
and merely generated types for one larger program that we deemed too difficult to port
but that features interesting data types.

Experiment 1 involves a manual inspection of the types from our automatic algorithm.
We detail our experience in generating types for part of an industrial-grade compiler which
we ultimately decided not to manually port to Typed Clojure.
This was because it uses many programming idioms beyond Typed Clojure's capabilities
(those detailed as ``Further Challenges'' by \infercitet{bonnaire2016practical}),
and so the final part of the workflow mostly involves working around its shortcomings.

Experiment 2 studies the kinds of manual changes needed to port our five programs
to Typed Clojure, starting from the automatically generated annotations.
Experiment 3 enforces the initially generated annotations for these programs at runtime
to check they are meaningfully underprecise.

%\paragraph{cljs.compiler}
%ClojureScript (CLJS) is a Clojure variant that runs on JavaScript
%virtual machines. We infer types for its compiler (written in Clojure)
%which emits JavaScript from
%a recursively defined map-based abstract syntax tree format.

\Dsection{Experiment 1: Manual inspection}
\label{infer:sec:experiment1}

For the first experiment, we manually inspect the types automatically generated by our tool.
We judge our tool's ability to
use recognizable names,
favor compact annotations, and
not overspecify types.

\begin{figure}
  % indented so line numbers can line up more tastefully
\begin{cljlistingnumbered}
  (defalias Op(*@\label{infer:listing:cljs:Op}@*) ; omitted some entries and 11 cases
    (U (HMap :mandatory(*@\label{infer:listing:cljs:Op:op:bindingStart}@*)
             {:op ':binding,(*@\label{infer:listing:cljs:Op:op:binding}@*) :info (U NameShadowMap(*@\label{infer:listing:cljs:Op:op:binding:NameShadowMap}@*) FnScopeFnSelfNameNsMap(*@\label{infer:listing:cljs:Op:op:binding:FnScopeFnSelfNameNsMap}@*)), ...}
             :optional(*@\label{infer:listing:cljs:Op:optional}@*)
             {:env ColumnLineContextMap, :init Op,(*@\label{infer:listing:cljs:Op:optional:init:Op}@*) :shadow (U nil Op),(*@\label{infer:listing:cljs:Op:optional:shadow:Op}@*) ...})(*@\label{infer:listing:cljs:Op:optionalEnd}@*)(*@\label{infer:listing:cljs:Op:op:bindingEnd}@*)
      '{:op ':const,(*@\label{infer:listing:cljs:Op:op:const}@*) :env HMap49305,(*@\label{infer:listing:cljs:Op:op:const:HMap49305}@*) ...}
      '{:op ':do,(*@\label{infer:listing:cljs:Op:op:do}@*) :env HMap49305,(*@\label{infer:listing:cljs:Op:op:do:HMap49305}@*) :ret Op,(*@\label{infer:listing:cljs:Op:op:do:Op}@*) :statements (Vec Nothing)(*@\label{infer:listing:cljs:Op:op:do:statements}@*), ...}
      ...))(*@\label{infer:listing:cljs:Op-End}@*)
  (defalias ColumnLineContextMap(*@\label{infer:listing:cljs:ColumnLineContextMap}@*)
    (HMap :mandatory {:column Int, :line Int} :optional {:context ':expr}(*@\label{infer:listing:cljs:ColumnLineContextMap:optional}@*)))(*@\label{infer:listing:cljs:ColumnLineContextMapEnd}@*)
  (defalias HMap49305 ; omitted some entries(*@\label{infer:listing:cljs:HMap49305}@*)
    (U nil
       '{:context ':statement, :column Int, ...}
       '{:context ':return, :column Int, ...}
       (HMap :mandatory {:context ':expr, :column Int, ...} :optional {...})))(*@\label{infer:listing:cljs:HMap49305End}@*)
  (ann emit [Op -> nil])(*@\label{infer:listing:cljs:emit}@*)
  (ann emit-dot [Op -> nil])(*@\label{infer:listing:cljs:emit-dot}@*)
\end{cljlistingnumbered}
\caption{Sample generated types for cljs.compiler.
}
\label{infer:fig:cljs}
  %(ann emit-let [Op Any -> Any])(*@\label{infer:listing:cljs:emit-let}@*)
%    '{:op ':fn-method,
%      :body Op,
%      :children '[':params ':body],
%      :env HMap49305,
%      :fixed-arity Int,
%      :form (Coll (Coll Any)),
%      :name Op,
%      :params '[Op],
%      :recurs nil,
%      :type nil,
%      :variadic? false}
%    '{:op ':host-call,
%      :args '[Op],
%      :children Any,
%      :env context-statement-tmp-HMap-alias20275,
%      :form (Coll Sym),
%      :method Sym,
%      :tag Any,
%      :target Op}
%    '{:op ':host-field,
%      :children '[':target],
%      :env context-statement-tmp-HMap-alias20275,
%      :field Sym,
%      :form (Coll Sym),
%      :tag Sym,
%      :target Op}
%    '{:op ':if,
%      :children '[':test ':then ':else],
%      :else Op,
%      :env context-statement-tmp-HMap-alias20275,
%      :form (Coll Any),
%      :tag (Set (U nil Sym)),
%      :test Op,
%      :then Op,
%      :unchecked Boolean}
%    '{:op ':invoke,
%      :args '[Op],
%      :children '[':fn ':args],
%      :env context-statement-tmp-HMap-alias20275,
%      :fn Op,
%      :form (Coll Any),
%      :tag Sym}
%    (HMap
%      :mandatory
%      {:op ':js,
%       :env context-statement-tmp-HMap-alias20275,
%       :form (Coll (U nil Str Sym)),
%       :js-op Sym,
%       :numeric nil,
%       :tag Sym}
%      :optional
%      {:args '[Op Op],
%       :children '[':args],
%       :code Str,
%       :segs (Coll Str)})
%    (HMap
%      :mandatory
%      {:op ':js-var, :name Sym, :ns Sym}
%      :optional
%      {:tag Sym})
%    '{:op ':let,
%      :bindings '[Op Op Any],
%      :body Any,
%      :children Any,
%      :env context-statement-tmp-HMap-alias20275,
%      :form Any,
%      :tag Any}
%    (HMap
%      :mandatory
%      {:op ':local,
%       :env context-statement-tmp-HMap-alias20275,
%       :form Sym,
%       :info Op,
%       :local (U ':arg ':let),
%       :name Sym}
%      :optional
%      {:arg-id Int, :init Op, :tag Sym})
%    '{:op ':map,
%      :children '[':keys ':vals],
%      :env context-statement-tmp-HMap-alias20275,
%      :form AMap,
%      :keys '[Op],
%      :tag Sym,
%      :vals '[Op]}
%    (HMap
%      :mandatory
%      {:op ':var, :name Sym, :ns Sym}
%      :optional
%      {:arglists (Coll Any),
%       :arglists-meta (Coll nil),
%       :column Int,
%       :doc Str,
%       :end-column Int,
%       :end-line Int,
%       :env context-statement-tmp-HMap-alias20275,
%       :file (U nil Str),
%       :fn-var Boolean,
%       :form Sym,
%       :info (U nil ColumnFileLineMap),
%       :line Int,
%       :max-fixed-arity Int,
%       :meta
%       (U
%         ColumnFileLineMap__0
%         FileArglistsColumnMap
%         ColumnEndColumnEndLineMap),
%       :method-params (Coll (Coll Sym)),
%       :protocol-impl nil,
%       :protocol-inline nil,
%       :ret-tag Sym,
%       :tag Sym,
%       :top-fn ArglistsArglistsMetaMaxFixedArityMap,
%       :variadic? Boolean})))
\end{figure}

We take this opportunity to juxtapose some strengths and weaknesses
of our tool by discussing a somewhat problematic benchmark,
a namespace from the ClojureScript compiler called cljs.compiler
(the code generation phase).
We generate 448 lines of type annotations
for the 1,776-line file, and present a sample
of our tool's output in \figref{infer:fig:cljs}.
We were unable to fully complete the porting to Typed Clojure due to
type system limitations, but the annotations yielded by this benchmark
are interesting nonetheless.

The compiler's AST format is inferred as \clj{Op} (lines \ref{infer:listing:cljs:Op}-\ref{infer:listing:cljs:Op-End})
with 22 recursive references
(like lines \ref{infer:listing:cljs:Op:optional:init:Op}, \ref{infer:listing:cljs:Op:optional:shadow:Op}, \ref{infer:listing:cljs:Op:op:do:Op})
and 14 cases distinguished by \clj{:op} (like lines \ref{infer:listing:cljs:Op:op:binding},
\ref{infer:listing:cljs:Op:op:const}, \ref{infer:listing:cljs:Op:op:do}),
5 of which have optional entries (like lines \ref{infer:listing:cljs:Op:optional}-\ref{infer:listing:cljs:Op:optionalEnd}).
To keep inference time manageable,
we exercised only the code emission unit tests (299 lines containing 39 assertions),
which normally take 40 seconds to run.
From these tests we
generated 448 lines of types and 517 lines of specs
in 2.5 minutes on a 2011 MacBook Pro (16GB RAM, 2.4GHz i5),
thanks in part to key optimizations discussed in \Dchapref{infer:sec:extensions}.

The main function of the code generation phase is \clj{emit}, which
effectfully converts a map-based AST
to JavaScript.
The AST is created by functions in cljs.analyzer,
a significantly larger 4,366 line Clojure file.
Without inspecting cljs.analyzer,
our tool annotates \clj{emit} on line \ref{infer:listing:cljs:emit}
with a recursive AST type \clj{Op} (lines \ref{infer:listing:cljs:Op}-\ref{infer:listing:cljs:Op-End}).

Similar to our opening example \clj{nodes}, it uses the \clj{:op}
key to disambiguate between its 14 cases, and has recursive
references (\clj{Op}).
We present only the first 3 cases.
The first case \clj{':binding} has 4 required
and 8 optional entries, whose
\clj{:info} and \clj{:env} entries refer to
other \clj{HMap} type aliases generated by the tool.
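
To make the shape of this data concrete, the following is a minimal sketch of a map-based AST
whose nodes are distinguished by \clj{:op}, together with a multimethod that dispatches on that key.
The node shapes and the name \clj{emit-sketch} are hypothetical simplifications for illustration
and are not taken from cljs.compiler; in particular, the real \clj{emit} is effectful,
whereas this sketch returns a string.

\begin{cljlisting}
;; Hypothetical, simplified AST nodes; real cljs.compiler nodes carry
;; many more entries (such as :env).
(def const-node {:op :const, :val 1})
(def do-node {:op :do, :statements [const-node], :ret const-node})

;; Dispatch on the :op entry, the same key the inferred Op alias
;; uses to distinguish its cases.
(defmulti emit-sketch :op)

(defmethod emit-sketch :const [node]
  (str (:val node)))

(defmethod emit-sketch :do [node]
  ;; The recursive calls mirror the recursive references to Op.
  (str (apply str (map (fn [s] (str (emit-sketch s) ";"))
                       (:statements node)))
       "return " (emit-sketch (:ret node)) ";"))

(emit-sketch do-node) ;=> "1;return 1;"
\end{cljlisting}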
%%deleted this code
%Similar to \clj{:op},
%the \clj{:local} entry maps to a keyword singleton
%type,
%however our tool wisely chose to cluster types 
%based on the \clj{:op} entry since it is common to all cases.

%\Dsection{Philosophy}

An important question to address is ``how accurate are these annotations?''.
Unlike previous work in this area~\infercitep{An10dynamicinference}, we do not aim for soundness guarantees
in our generated types. 
A significant contribution of our work is a tool that Clojure programmers
can use to help learn about and specify their programs.
In that spirit, we strive to generate annotations meeting more qualitative criteria.
Each guideline by itself helps generate more useful annotations, and
they combine in interesting ways to help make up for shortcomings.
%in generated annotations.
%which we outline along with a commentary
%judging \figref{infer:fig:cljs} along these lines.

\paragraph{Choose recognizable names}
%Typed Clojure and clojure.spec annotations are abundant
%with useful names for types.
Assigning a good name for a type increases
readability by succinctly conveying its purpose.
Along those lines, a good name for the AST representation
on lines \ref{infer:listing:cljs:Op}-\ref{infer:listing:cljs:Op-End}
might be \clj{AST} or \clj{Expr}.
However, these kinds of names can be very misleading when incorrect, so
instead of guessing them,
our tool takes a more consistent approach and generates \emph{easily recognizable}
names based on the type the name points to.
Then, those with a passing familiarity with the data flowing through the program
can quickly identify and rename them.
For example,
\begin{itemize}
  \item
    \clj{Op} (lines \ref{infer:listing:cljs:Op}-\ref{infer:listing:cljs:Op-End})
    is chosen because \clj{:op} is
    clearly the dispatch key (the \clj{:op} entry is also helpfully placed
    as the first entry in each case to aid discoverability),
  \item
    \clj{ColumnLineContextMap} (lines \ref{infer:listing:cljs:ColumnLineContextMap}-\ref{infer:listing:cljs:ColumnLineContextMapEnd})
    enumerates the keys of the map type it points to,
  \item
    \clj{NameShadowMap} and \clj{FnScopeFnSelfNameNsMap} (%referenced on
    line
    \ref{infer:listing:cljs:Op:op:binding:NameShadowMap}% and \ref{infer:listing:cljs:Op:op:binding:FnScopeFnSelfNameNsMap}
    )
    similarly, and
  \item
    \clj{HMap49305} (lines \ref{infer:listing:cljs:HMap49305}-\ref{infer:listing:cljs:HMap49305End})
    shows how our tool fails to give names to certain combinations
    of types (we now discuss the severity of this particular situation).
\end{itemize}
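
The key-enumerating names above can be approximated by a small function.
This is only a sketch under stated assumptions: \clj{camel-case} and \clj{alias-name} are hypothetical helpers,
the key lists passed to them are assumed so as to reproduce names from \figref{infer:fig:cljs},
and the rules our tool uses to order or truncate keys are not modeled here.

\begin{cljlisting}
(require '[clojure.string :as str])

;; Hypothetical helper: :fn-self-name becomes the string FnSelfName.
(defn camel-case [k]
  (->> (str/split (name k) #"-")
       (map str/capitalize)
       (apply str)))

;; Hypothetical helper: concatenate the camel-cased keys in the
;; order given and append the suffix Map.
(defn alias-name [ks]
  (str (apply str (map camel-case ks)) "Map"))

(alias-name [:column :line :context])
;=> "ColumnLineContextMap"
(alias-name [:fn-scope :fn-self-name :ns])
;=> "FnScopeFnSelfNameNsMap"
\end{cljlisting}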

One failure among cljs.compiler's
generated types is \clj{HMap49305},
which clearly fails to be a recognizable name.
However, all is not lost:
the compactness and recognizable names of adjacent annotations
make it plausible for a programmer with some
knowledge of the AST representation to
recover.
In particular, 13 of the 14 cases in \clj{Op}
map \clj{:env} to \clj{HMap49305}
(like lines \ref{infer:listing:cljs:Op:op:const:HMap49305} and \ref{infer:listing:cljs:Op:op:do:HMap49305}),
and the only exception (line \ref{infer:listing:cljs:Op:optional:init:Op})
maps it to \clj{ColumnLineContextMap}. From this information the user can
decide to combine these aliases.


%Good names can sometimes be reconstructed from the program source,
%like function or parameter names, and other times 
%we can use the shape of a type to summarize it.

\paragraph{Favor compact annotations}
Literally translating runtime observations into
annotations without compacting them
leads to unmaintainable and impractical types resembling
TypeWiz's ``verbatim'' annotation for \clj{nodes}.
To avoid this, we
  use optional keys where possible, like line \ref{infer:listing:cljs:ColumnLineContextMap:optional},
  infer recursive types like \clj{Op}, and
  reuse type aliases in function annotations, like
    \clj{emit} and \clj{emit-dot} (lines \ref{infer:listing:cljs:emit}, \ref{infer:listing:cljs:emit-dot}).
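
As a rough illustration of the first of these techniques, the sketch below splits the keys of several observed maps
into mandatory keys (present in every observation) and optional keys (present in only some).
The function \clj{split-keys} is hypothetical and much simpler than our tool's merging strategy;
it assumes at least one observed map.

\begin{cljlisting}
(require '[clojure.set :as set])

;; Hypothetical helper. Assumes observed-maps is non-empty.
(defn split-keys [observed-maps]
  (let [key-sets  (map (comp set keys) observed-maps)
        mandatory (apply set/intersection key-sets)
        optional  (set/difference (apply set/union key-sets)
                                  mandatory)]
    {:mandatory mandatory, :optional optional}))

(split-keys [{:column 1, :line 2}
             {:column 3, :line 4, :context :expr}])
;=> {:mandatory #{:column :line}, :optional #{:context}}
\end{cljlisting}

The result has the same split as \clj{ColumnLineContextMap}, whose \clj{:context} entry is optional.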

One remarkable success in the generated types
was the automatic inference of \clj{Op} (lines \ref{infer:listing:cljs:Op}-\ref{infer:listing:cljs:Op-End})
with 14 distinct cases, and other features described in \figref{infer:fig:cljs}.
Further investigation reveals that
the compiler actually features 36 distinct AST nodes; unsurprisingly, 39 assertions were not sufficient
test coverage to discover them all.
However, because of the recognizable name and organization of
\clj{Op}, it's clear where to add the missing nodes
if no further tests are available.

This process of compacting annotations often makes them more general,
which leads into our next goal.

%Idiomatic Clojure code rarely mixes certain types in the same position,
%unless the program is polymorphic. Using this knowledge---which we observed
%by the annotations and specs assigned to idiomatic Clojure 
%code---we can rule out certain combinations of types to compact our
%resulting output, without losing information that would help us
%type check our programs.

\paragraph{Don't overspecify types}
Poor test coverage can easily skew the results of dynamic analysis tools,
so we choose to err on the side of generalizing types
where possible.
Our opening example \clj{nodes}
is a good example of this---our inferred type
is recursive, despite \clj{nodes} only being tested with a tree of height 2.
This has several benefits.
\begin{itemize}
  \item We avoid exhausting the pool of easily recognizable names
    by generalizing types to communicate the general role
    of an argument or return position.
    For example, \clj{emit-dot} (line \ref{infer:listing:cljs:emit-dot})
    is annotated to take \clj{Op}, but in reality accepts only a subset
    of \clj{Op}.
    Programmers can combine the recognizability of \clj{Op} with the
    suggestive name of \clj{emit-dot} (the dot operator in Clojure handles host interoperability) to decide whether, for instance,
    to split \clj{Op} into smaller type aliases
    or add type casts in the definition of \clj{emit-dot} to please 
    the type checker
    (some libraries require more casts than others to type check, as discussed in \secref{infer:sec:experiment2}).
  \item Generated Clojure spec annotations (an extension discussed in \secref{infer:sec:spec-extension})
        are more likely to accept valid input with specs enabled, even with incomplete unit tests
        (we enable generated specs on several libraries in \secref{infer:sec:experiment3}).
  \item Our approach becomes more amenable to extensions improving the running time
        of runtime observation without significantly deteriorating annotation quality,
        like lazy tracking (\secref{infer:sec:lazy-tracking}).
\end{itemize}

Several instances of overspecification are evident,
such as the \clj{:statements} entry of a \clj{:do} AST node being inferred as an always-empty vector
(line \ref{infer:listing:cljs:Op:op:do:statements}).
In some ways, this is useful information, showing that
test coverage for \clj{:do} nodes could be improved.
To fix the annotation, we could rerun the tool with better tests.
If no such test exists, we would have to fall back
to reverse-engineering code to identify the correct
type of \clj{:statements}, which is \clj{(Vec Op)}.

Finally, 19 functions in cljs.compiler are annotated to 
take or return \clj{Op} (like lines \ref{infer:listing:cljs:emit}, \ref{infer:listing:cljs:emit-dot}).
This kind of alias reuse enables annotations
to be relatively compact (only 16 type aliases are used by the
49 functions that were exercised).

%
%We rate the quality of generated annotations
%on several axes.
%
%\paragraph{Compactness} Type annotations should be succinct,
%        but without sacrificing too much accuracy.
%        Are our type aliases intelligently combined
%        with good choices for optional keys?
%
%  \paragraph{Accuracy} Would executing a program with these
%      type annotations cause an error?
%      Have we too eagerly erased information in favor
%      of compactness?
%
%  \paragraph{Organization} Have we chosen good recursive types?
%      Do they have good names?
%
%
%\figref{infer:fig:gentype} shows our results.
%Our first program is an implementation of a
%1971 Star Trek game.
%It comes with minimal tests, so to complete this experiment,
%we instead played the game for 30 seconds.

%\begin{figure*}
%  \footnotesize
%\begin{tabular}
%{|         l   || l   | l  | l   || l  | l | l | l | l | l | l | l | l | l | l | l | l | l |}
%  Lib           & LOC  & GT  & LA & MD      & C  & I & P & L & S & O & U & N & V & R & K & F & H \\ 
%  \hline
%  \hline
%  sc            & 166  & 133 & 3  & 70/41   & 5  & 0 & 0 & 2 & 13& 1 & 5 & 1 & 1 & 2 & 0 & 0 & 0 \\
%  mc            & 923  & 395 & 147& 124/120 & 23 & 1 & 11& 19& 2 & 5 & 0 & 9 & 3 & 2 & 4 & 1 & 3 \\
%  fs            & 588  & 157 & 1  & 119/86  & 50 & 0 & 0 & 2 & 3 & 4 & 4 & 11& 2 & 9 & 0 & 0 & 0 \\
%  dj            & 528  & 168 & 9  & 94/125 \\
%  mo            & 530  & 49  & 1  & 46/26%\\
% %data.xml      &      & \\
% % cc            & 1776 & 448 & 4  & N/A 
%  %\\
%\end{tabular}
%  \caption{\emph{The number of type annotations generated for each program}:
%  Lib = Abbreviated library names in the order we introduce them on page \pageref{infer:chap:evaluation},
%  LOC = Number of lines of code we generate types for,
%  GT = Total number of lines of generated types after running our tool,
%  LA = The number of local annotations generated by our tools.
%  \emph{Number of manual changes needed to type check, and why they were needed}:
%  MD = Lines added/removed diff from git comparing initial generated types to
%       the manual amendments needed to
%       type check with Typed Clojure (unless it was too difficult to port),
%  C = Casts,
%  I = Instantiation,
%  P = Polymorphic annotation,
%  L = Local annotation,
%  S = Work around type system Shortcoming,
%  O = Overprecise argument type,
%  U = Uncalled function due to bad test coverage,
%  N = Add No-check annotation to skip checking function,
%  V = Add Variable arity argument type,
%  R = Overprecise return type,
%  K = Add Keyword argument types,
%  F = Added filter annotation,
%  H = Erase/upcast HVec annotation.
%  }
%\end{figure*}

\Dsection{Experiment 2: Changes needed to type check}
\label{infer:sec:experiment2}
% TODO examples for all kinds of things
% TODO bucket how many changes are needed for each kind of thing
%      - eg. varargs, polymorphism
% TODO how many lines of code were skipped

We used our workflow to port the following open source Clojure programs to Typed Clojure.

\paragraph{startrek-clojure}
A reimplementation of a Star Trek text adventure game,
created as a way to learn Clojure.

\paragraph{math.combinatorics}
The core library for common combinatorial functions
on collections,
with implementations based on Knuth's Art of Computer
Programming, Volume 4.

\paragraph{fs}
A Clojure wrapper library over common file-system operations.

\paragraph{data.json}
A library for working with JSON.

%\paragraph{data.xml} A library for manipulating and outputting XML in Clojure.

\paragraph{mini.occ}
A model of occurrence typing by an author of the
current paper. It utilizes three mutually recursive
ad-hoc structures to represent expressions, types,
and propositions.

In this experiment, we first generated types with our algorithm
by running the tests, then amended the program so that it
type checks.
\figref{infer:fig:gentype} summarizes our results.
After the lines of code we generate types for, the next two columns show how many lines of
types were generated and the lines manually changed, respectively.
The latter is a git line diff between commits of the initial
generated types and the final manually amended annotations.
While an objectively fair measurement,
it is not a good indication of the effort needed to port annotations
(a one-character change on a line is counted as one line addition and one line deletion).
The rest of the table enumerates the different kinds of changes needed 
and their frequency.

\begin{figure*}
\begin{tabular}{|r||c|c|c||c|c|c|c|c|c|c|c|c|c|}
  Library       & \rotatebox{270}{Lines of code}
                & \rotatebox{270}{Lines of Generated Global/Local Types}
                & \rotatebox{270}{Lines manually added/removed}
                & \rotatebox{270}{Casts/Instantiations}
                & \rotatebox{270}{Polymorphic annotation}
                & \rotatebox{270}{Local annotation}
                & \rotatebox{270}{Type System Workaround/no-check}
                & \rotatebox{270}{Overprecise argument/return type}
                & \rotatebox{270}{Uncalled function (bad test coverage)}
                & \rotatebox{270}{Variable-arity/keyword arg type}
                & \rotatebox{270}{Add occurrence typing annotation}
                & \rotatebox{270}{Erase or upcast HVec annotation}
                & \rotatebox{270}{Add missing case in defalias}
                \\ 
  \hline
  \hline
  startrek   & 166  & 133/3   & 70/41    & 5  / 0 & 0 & 2 & 13/1 & 1 /2 & 5 &  1 /  0 & 0 & 0 & 0\\
  math.comb  & 923  & 395/147 & 124/120  & 23 / 1 & 11& 19& 2 /9 & 5 /2 & 0 &  3 /  4 & 1 & 3 & 0\\
  fs         & 588  & 157/1   & 119/86   & 50 / 0 & 0 & 2 & 3 /11& 4 /9 & 4 &  2 /  0 & 0 & 0 & 0\\
  data.json  & 528  & 168/9   & 94/125   & 6  / 0 & 0 & 2 & 4 /5 & 11/7 & 5 &  0 /  20& 0 & 0 & 0\\
  mini.occ   & 530  & 49/1    & 46/26    & 7  / 0 & 0 & 2 & 5 /2 & 4 /2 & 6 &  0 /  0 & 0 & 1 & 5\\
 % cc            & 1776 & 448 & 4  & N/A 
  %\\
\end{tabular}
  \caption{Lines of generated annotations, git line diff for total manual changes to type check the program,
  and the kinds of manual changes.
  }
  \label{infer:fig:gentype}
\end{figure*}

\paragraph{Uncalled functions}
A function without tests receives a broad type annotation that
must be amended.
%
For example, the startrek-clojure game has several exit
conditions, one of which is running out of time.
Since the tests do not specifically call this function,
nor play the game long enough to invoke this condition,
no useful type is inferred.

\begin{cljlisting}
(ann game-over-out-of-time AnyFunction)
\end{cljlisting}

In this case, minimal effort is needed to amend this
type signature: the appropriate type alias
already exists:

\begin{cljlisting}
(defalias CurrentKlingonsCurrentSectorEnterpriseMap
  (HMap :mandatory
    {:current-klingons (Vec EnergySectorMap),
     :current-sector (Vec Int), ...}
    :optional {:lrs-history (Vec Str)}))
\end{cljlisting}
%\begin{cljlisting}
%(defalias CurrentKlingonsCurrentSectorEnterpriseMap
%  (HMap :mandatory
%    {:current-klingons (Vec EnergySectorMap),
%     :current-sector (Vec Int), 
%     :enterprise EnergyIsDockedQuadrantMap,
%     :quads (Vec BasesKlingonsQuadrantMap), 
%     :stardate CurrentEndStartMap,
%     :starting-klingons Int}
%    :optional {:lrs-history (Vec Str)}))
%\end{cljlisting}

So we amend the signature as

\begin{cljlisting}
(ann game-over-out-of-time
  [(Atom1 CurrentKlingonsCurrentSectorEnterpriseMap) 
   -> Boolean])
\end{cljlisting}


\paragraph{Over-precision}
Function types are often too restrictive due to
insufficient unit tests.

There are several instances of this in math.combinatorics.
The \clj{all-different?} function
takes a collection and returns true only if the collection
contains distinct elements.
As evidenced in the generated type, the tests exercise
this function with collections of integers, atoms,
keywords, and characters.

\begin{cljlisting}
(ann all-different?
  [(Coll (U Int (Atom1 Int) ':a ':b Character)) 
   -> Boolean])
\end{cljlisting}

In our experience, the union is very rarely a good candidate
for a Typed Clojure type signature, so a useful heuristic to improve
the generated types would be to upcast such unions to a more permissive
type, like \clj{Any}.
When we performed this case study, we had not yet added that heuristic
to our tool,
so in this case, we manually amend the signature as

\begin{cljlisting}
(ann all-different? [(Coll Any) -> Boolean])
\end{cljlisting}
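
One way such a heuristic could look is sketched below, operating on types represented as quoted Typed Clojure syntax.
Both the function \clj{upcast-wide-union} and its \clj{max-members} threshold are hypothetical;
they are not part of our tool, and a real heuristic would likely also consider how related the union members are.

\begin{cljlisting}
;; Hypothetical heuristic: a union with more than max-members
;; members is replaced by Any; every other type is left unchanged.
(defn upcast-wide-union [t max-members]
  (if (and (seq? t)
           (= 'U (first t))
           (> (count (rest t)) max-members))
    'Any
    t))

(upcast-wide-union '(U Int (Atom1 Int) ':a ':b Character) 2)
;=> Any
(upcast-wide-union '(U nil (Vec Int)) 2)
;=> (U nil (Vec Int))
\end{cljlisting}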

Another example of overprecision is the generated type
of \clj{initial-perm-numbers}, a helper function
taking a \emph{frequency map}---a hash map from values
to the number of times they occur---which is the shape
of the return value of the core \clj{frequencies}
function.

The generated type shows that the tests exercise it only with frequency maps
whose keys (the values being counted) are integers.
%
\begin{cljlisting}
(ann initial-perm-numbers
  [(Map Int Int) -> (Coll Int)])
\end{cljlisting}
%
A more appropriate type instead takes \clj{(Map Any Int)}.
%
%\begin{cljlisting}
%(ann initial-perm-numbers
%  [(Map Any Int) -> (Coll Int)])
%\end{cljlisting}
%
In many examples of overprecision, while the generated
types might not be immediately useful for checking programs,
they serve as valuable starting points and also provide
an interesting summary of test coverage.
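
The key type of a frequency map matters because \clj{frequencies} accepts collections of arbitrary values.
A short illustration using only clojure.core (the inputs are arbitrary examples):

\begin{cljlisting}
;; Keys are the counted values; the counts are always integers.
(frequencies [1 1 2])       ;=> {1 2, 2 1}
(frequencies [:a :b :a])    ;=> {:a 2, :b 1}
(frequencies ["x" :y "x"])  ;=> {"x" 2, :y 1}
\end{cljlisting}

Only the first call produces a map that fits \clj{(Map Int Int)}, while all three fit \clj{(Map Any Int)}.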

\paragraph{Missing polymorphism}

We do not attempt to infer polymorphic function types, 
so these amendments are expected. However, it is useful
to compare the optimal types with our generated ones.

For example, the \clj{remove-nth} function in \clj{math.combinatorics}
performs a functional delete operation on its argument.
Here we can see the tests only exercise this function with
collections of integers.

\begin{cljlisting}
(ann remove-nth [(Coll Int) Int -> (Vec Int)])
\end{cljlisting}

However, the overall shape of the function is intact,
and the manually amended type only requires a few 
keystrokes.

\begin{cljlisting}
(ann remove-nth
  (All [a] [(Coll a) Int -> (Vec a)]))
\end{cljlisting}
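
To see why the polymorphic type is appropriate, consider the following sketch of a functional delete.
The definition \clj{remove-nth-sketch} is hypothetical and is not the one used in math.combinatorics;
it only illustrates that such an operation never inspects the elements it copies.

\begin{cljlisting}
;; Hypothetical implementation, for illustration only.
(defn remove-nth-sketch [coll n]
  (into (vec (take n coll)) (drop (inc n) coll)))

(remove-nth-sketch [10 20 30] 1)   ;=> [10 30]
(remove-nth-sketch [:a "b" :c] 0)  ;=> ["b" :c]
\end{cljlisting}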

Similarly, \clj{iter-perm} could be polymorphic, 
but its type is generated as

\begin{cljlisting}
(ann iter-perm [(Vec Int) -> (U nil (Vec Int))])
\end{cljlisting}

We decided this function actually works over any number,
and bounded polymorphism was more appropriate, encoding
the fact that the elements of the output collection
are from the input collection.

\begin{cljlisting}
(ann iter-perm
  (All [a]
    [(Vec (I a Num)) -> (U nil (Vec (I a Num)))]))
\end{cljlisting}
%
%\paragraph{Missing return}
%Sometimes a function never returns, because of infinite loops
%or exceptions.

\paragraph{Missing argument counts}
Often, variable-argument functions are given overly precise types.
Our algorithm does not apply any heuristics to approximate
variable arguments; instead, we emit types that reflect
only the arities that were called during the unit tests.

The \clj{math.combinatorics} experiment contains
a good example of this phenomenon in the type inferred
for the \clj{plus} helper function.
From the generated type, we can see the tests exercise this function with 2, 6,
and 7 arguments.

\begin{cljlisting}
(ann plus (IFn [Int Int Int Int Int Int Int -> Int]
               [Int Int Int Int Int Int -> Int]
               [Int Int -> Int]))
\end{cljlisting}

Instead, \clj{plus} is actually variadic and works over any number of arguments.
It is better annotated as the following, which is easy to guess based on
both the generated type and a manual reading of the function's implementation.

\begin{cljlisting}
(ann plus [Int * -> Int])
\end{cljlisting}
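
The gap between observed and supported arities is easy to reproduce.
In the sketch below, \clj{plus-sketch} is a hypothetical variadic function (not the definition in math.combinatorics);
calling it with only 2, 6, and 7 arguments would yield the three observed arities, even though every argument count is accepted.

\begin{cljlisting}
;; Hypothetical variadic definition, for illustration only.
(defn plus-sketch [& xs]
  (reduce + 0 xs))

(plus-sketch)                ;=> 0
(plus-sketch 1 2)            ;=> 3
(plus-sketch 1 2 3 4 5 6)    ;=> 21
(plus-sketch 1 2 3 4 5 6 7)  ;=> 28
\end{cljlisting}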

A similar issue occurs with \clj{mult}.

\begin{cljlisting}
(ann mult [Int Int -> Int]) ;; generated
(ann mult [Int * -> Int])   ;; amended
\end{cljlisting}

A related issue arises when inferring keyword arguments. Clojure implements
keyword arguments with normal variadic arguments. Notice
the generated type for \clj{lex-partitions-H},
which takes a fixed argument, followed by some optional integer keyword
arguments. 

\begin{cljlisting}
(ann lex-partitions-H
  (IFn [Int -> (Coll (Coll (Vec Int)))]
       [Int ':min Int ':max Int 
        -> (Coll (Coll (Coll Int)))]))
\end{cljlisting}

While the arity of the generated type is too specific,
we can conceivably use the type to help us write a better one.

\begin{cljlisting}
(ann lex-partitions-H
  [Int & :optional {:min Int :max Int}
   -> (Coll (Coll (Coll Int)))])
\end{cljlisting}
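
For readers unfamiliar with the idiom, the sketch below shows how keyword arguments are expressed as
a destructured rest argument in Clojure.
The function \clj{partitions-sketch} is hypothetical; only its argument-passing convention is meant to resemble \clj{lex-partitions-H}.

\begin{cljlisting}
;; Hypothetical function: one fixed argument followed by optional
;; keyword arguments, destructured from the variadic rest argument.
(defn partitions-sketch [n & {:keys [min max]}]
  {:n n, :min min, :max max})

(partitions-sketch 3)
;=> {:n 3, :min nil, :max nil}
(partitions-sketch 3 :min 1 :max 2)
;=> {:n 3, :min 1, :max 2}
\end{cljlisting}

At runtime the second call is simply a five-argument call, which is why the generated type lists
\clj{':min} and \clj{':max} as positional singleton types.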

\paragraph{Weaknesses in Typed Clojure}

We encountered several known weaknesses in Typed Clojure's type system
that we worked around.
%
The most invasive change needed was in startrek-clojure, which
strongly updated the global mutable configuration map on initial
play. We instead initialized the map with a dummy
value when it is first created.
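
The workaround can be sketched as follows.
The var names and the particular entries are hypothetical and do not reproduce startrek-clojure's code;
the point is only that the atom holds a value of its final shape from the moment it is created.

\begin{cljlisting}
;; Before (hypothetical): the atom starts empty and only later
;; receives a map with the full set of keys, a strong update.
(def state-before (atom {}))
(reset! state-before {:current-klingons [], :current-sector [1 2]})

;; After (hypothetical): the atom is created with a dummy value that
;; already has the final shape, so later writes keep the same type.
(def state-after (atom {:current-klingons [], :current-sector [0 0]}))
(reset! state-after {:current-klingons [], :current-sector [1 2]})
\end{cljlisting}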

\paragraph{Missing \clj{defalias} cases}

With insufficient test coverage, our tool can miss cases in a recursively defined
type.
In particular, mini.occ features three recursive types---for the representation
of types \clj{T}, propositions \clj{P}, and expressions \clj{E}.
For \clj{T}, three cases were missing, along with having to upcast the \clj{:params}
entry from the singleton vector \clj{'[NameTypeMap]}.
Two cases were missing from \clj{E}.
The manual changes are highlighted (\clj{P} required no changes with five cases).

\begin{minipage}[t]{0.54\linewidth}
\begin{cljlisting}
(defalias T
  (U (*@\colorbox{pink}{'\{:T ':not, :type T\}}@*)
     (*@\colorbox{pink}{'\{:T ':refine, :name t/Sym, :prop P\}}@*)
     (*@\colorbox{pink}{'\{:T ':union, :types (t/Set T)\}}@*)
     '{:T ':false}
     '{:T ':fun,
       :params (*@\colorbox{pink}{(t/Vec}@*) NameTypeMap(*@\colorbox{pink}{)}@*),
       :return T}
     '{:T ':intersection, :types (Set T)}
     '{:T ':num}))
\end{cljlisting}
\end{minipage}
%
\begin{minipage}[t]{0.4\linewidth}
\begin{cljlisting}
(defalias E
  (U (*@\colorbox{pink}{'\{:E ':add1\}}@*)
     (*@\colorbox{pink}{'\{:E ':n?\}}@*)
     '{:E ':app, :args (Vec E),
       :fun E}
     '{:E ':false}
     '{:E ':if, :else E,
       :test E, :then E}
     '{:E ':lambda, :arg Sym,
       :arg-type T, :body E}
     '{:E ':var, :name Sym}))
\end{cljlisting}
\end{minipage}


%cljs.compiler uses many polymorphic idioms that Typed Clojure is
%poor at checking, so we deemed it too difficult to attempt to
%type check. In particular, there are many of usages of the
%core functions
%\clj{get-in} and \clj{update-in} (functions that deeply lookup
%and manipulate maps) which are not even assigned types
%in Typed Clojure.
%Many function definitions would need to be ignored by the type
%checker to work around this.
%Furthermore, many manual instantiations
%would be needed to check transducers and polymorphic functions
%passed to other polymorphic functions.

%\begin{verbatim}
%  - get/get-in
%  - apply + kw args
%  - strong updates
%\end{verbatim}

%\paragraph{Possible errors in programs}


\Dsection{Experiment 3: Specs pass unit tests}
\label{infer:sec:experiment3}

Our final experiment uses our tool to
generate specs (\secref{infer:sec:spec-extension})
instead of types.
Specs are checked at runtime,
so to verify the utility of generated specs,
we enable spec checking while
rerunning the unit tests that were used
in the process of creating them.

\begin{figure*}
\begin{tabular}
{|         l   || l   | l || l  | l  | l || l |}
  Library       & LOC  &  Lines of specs  & Recursive & Instance & Het. Map & Passed Tests?\\ 
  \hline
  \hline
  startrek      & 166  &  25  & 0  & 10   & 0  & Yes\\
  math.comb     & 923  &  601 & 0  & 320  & 0  & Yes\\
  fs            & 588  &  543 & 0  & 215  & 0  & Yes\\
  data.json     & 528  &  401 & 0  & 174  & 0  & No (1/79 failed)\\ % pprinting related test
  mini.occ      & 530  &  131 & 3  & 25   & 15 & Yes\\
 %data.xml      &      & \\
 % cc            & 1776 & 448 & 4  & N/A 
  %\\
\end{tabular}
  \caption{Summary of the quantity and kinds of generated specs and whether they passed
  unit tests when enabled.
  The one failing test was related to pretty-printing JSON, and seems to be an artifact
  of our testing environment, as it still fails with all specs removed.
  }
\label{infer:fig:genspec}
\end{figure*}


At first this might seem like a trivial property, but it serves as
a valuable test of our inference algorithm.
The aggressive merging strategies to minimize aliases and
maximize recognizability, while unsound transformations,
are based on hypotheses about Clojure idioms and how
Clojure programs are constructed.
If, hypothetically, we generated singleton specs for numbers
like we do for keywords and did not eventually upcast
them to \clj{number?}, the specs might be too strict
to pass their unit tests.
Some function specs also perform generative testing based on
the argument and return types provided.
If we collapse a spec too much and include it in such
a spec, it might feed a function invalid input.

Thankfully, we avoid such pitfalls, and so
our generated specs pass their tests for the benchmarks
we tried.
\figref{infer:fig:genspec} shows
our preliminary results. All inferred specs pass the unit
tests when enforced, which tells us they are at least well formed.
We had some seemingly unrelated difficulty with a test in data.json which we explain
in the caption.
Since hundreds of invariants are checked---mostly ``instance'' checks that a value is of a particular class or interface---we can also be more confident
that the specs are useful.


%\Dsubsection{Experiment 3: Generating generative tests}

% We should generate the card playing specs in this guide:
% http://clojure.org/guides/spec

% # How evaluate
% ## qualitative
% Does it make sense??
% 
% 1. Don't run, gen type, manual inspection
%   - done on something small but real
%   - star trek game?
% 
% - Try different eval methods on different programs
%   - try different projects on different methods
%
% 2. Generate types, try type checking programs
%   - record what changes needed to get it to
%     type check 
%   - (on a different program than 1.) 
% 
% 3. Generate spec, insert the spec, run the test
%    with the spec on, also generate tests
%   - does spec ignore the input??
%     or just generate tests
%   - best situation:
%     - spec all passes
%     - then types check with minimal changes
%   - Q: can we use spec's tests to improve
%        types, iteratively?
%        (could throw away exceptions, throw
%         away bad input etc., different options
%         here)
% (optional)
% 4. Generate types, use gradual typing