\Dchapter{Evaluation}
\label{infer:chap:evaluation}
We performed a quantitative evaluation of our workflow on several open source programs
in three experiments.
We ported five programs to Typed Clojure with our workflow,
and merely generated types for one larger program that we deemed too difficult to port
but that features interesting data types.
Experiment 1 involves a manual inspection of the types from our automatic algorithm.
We detail our experience in generating types for part of an industrial-grade compiler which
we ultimately decided not to manually port to Typed Clojure.
This was because it uses many programming idioms beyond Typed Clojure's capabilities
(those detailed as ``Further Challenges'' by \infercitet{bonnaire2016practical}),
and so the final part of the workflow mostly involves working around its shortcomings.
Experiment 2 studies the kinds of manual changes needed to port our five programs
to Typed Clojure, starting from the automatically generated annotations.
Experiment 3 enforces the initially generated annotations for these programs at runtime
to check they are meaningfully underprecise.
%\paragraph{cljs.compiler}
%ClojureScript (CLJS) is a Clojure variant that runs on JavaScript
%virtual machines. We infer types for its compiler (written in Clojure)
%which emits JavaScript from
%a recursively defined map-based abstract syntax tree format.
\Dsection{Experiment 1: Manual inspection}
\label{infer:sec:experiment1}
For the first experiment, we manually inspect the types automatically generated by our tool.
We judge our tool's ability to
use recognizable names,
favor compact annotations, and
not overspecify types.
\begin{figure}
% indented so line numbers can line up more tastefully
\begin{cljlistingnumbered}
(defalias Op(*@\label{infer:listing:cljs:Op}@*) ; omitted some entries and 11 cases
(U (HMap :mandatory(*@\label{infer:listing:cljs:Op:op:bindingStart}@*)
{:op ':binding,(*@\label{infer:listing:cljs:Op:op:binding}@*) :info (U NameShadowMap(*@\label{infer:listing:cljs:Op:op:binding:NameShadowMap}@*) FnScopeFnSelfNameNsMap(*@\label{infer:listing:cljs:Op:op:binding:FnScopeFnSelfNameNsMap}@*)), ...}
:optional(*@\label{infer:listing:cljs:Op:optional}@*)
{:env ColumnLineContextMap, :init Op,(*@\label{infer:listing:cljs:Op:optional:init:Op}@*) :shadow (U nil Op),(*@\label{infer:listing:cljs:Op:optional:shadow:Op}@*) ...})(*@\label{infer:listing:cljs:Op:optionalEnd}@*)(*@\label{infer:listing:cljs:Op:op:bindingEnd}@*)
'{:op ':const,(*@\label{infer:listing:cljs:Op:op:const}@*) :env HMap49305,(*@\label{infer:listing:cljs:Op:op:const:HMap49305}@*) ...}
'{:op ':do,(*@\label{infer:listing:cljs:Op:op:do}@*) :env HMap49305,(*@\label{infer:listing:cljs:Op:op:do:HMap49305}@*) :ret Op,(*@\label{infer:listing:cljs:Op:op:do:Op}@*) :statements (Vec Nothing)(*@\label{infer:listing:cljs:Op:op:do:statements}@*), ...}
...))(*@\label{infer:listing:cljs:Op-End}@*)
(defalias ColumnLineContextMap(*@\label{infer:listing:cljs:ColumnLineContextMap}@*)
(HMap :mandatory {:column Int, :line Int} :optional {:context ':expr}(*@\label{infer:listing:cljs:ColumnLineContextMap:optional}@*)))(*@\label{infer:listing:cljs:ColumnLineContextMapEnd}@*)
(defalias HMap49305 ; omitted some entries(*@\label{infer:listing:cljs:HMap49305}@*)
(U nil
'{:context ':statement, :column Int, ...}
'{:context ':return, :column Int, ...}
(HMap :mandatory {:context ':expr, :column Int, ...} :optional {...})))(*@\label{infer:listing:cljs:HMap49305End}@*)
(ann emit [Op -> nil])(*@\label{infer:listing:cljs:emit}@*)
(ann emit-dot [Op -> nil])(*@\label{infer:listing:cljs:emit-dot}@*)
\end{cljlistingnumbered}
\caption{Sample generated types for cljs.compiler.
}
\label{infer:fig:cljs}
%(ann emit-let [Op Any -> Any])(*@\label{infer:listing:cljs:emit-let}@*)
% '{:op ':fn-method,
% :body Op,
% :children '[':params ':body],
% :env HMap49305,
% :fixed-arity Int,
% :form (Coll (Coll Any)),
% :name Op,
% :params '[Op],
% :recurs nil,
% :type nil,
% :variadic? false}
% '{:op ':host-call,
% :args '[Op],
% :children Any,
% :env context-statement-tmp-HMap-alias20275,
% :form (Coll Sym),
% :method Sym,
% :tag Any,
% :target Op}
% '{:op ':host-field,
% :children '[':target],
% :env context-statement-tmp-HMap-alias20275,
% :field Sym,
% :form (Coll Sym),
% :tag Sym,
% :target Op}
% '{:op ':if,
% :children '[':test ':then ':else],
% :else Op,
% :env context-statement-tmp-HMap-alias20275,
% :form (Coll Any),
% :tag (Set (U nil Sym)),
% :test Op,
% :then Op,
% :unchecked Boolean}
% '{:op ':invoke,
% :args '[Op],
% :children '[':fn ':args],
% :env context-statement-tmp-HMap-alias20275,
% :fn Op,
% :form (Coll Any),
% :tag Sym}
% (HMap
% :mandatory
% {:op ':js,
% :env context-statement-tmp-HMap-alias20275,
% :form (Coll (U nil Str Sym)),
% :js-op Sym,
% :numeric nil,
% :tag Sym}
% :optional
% {:args '[Op Op],
% :children '[':args],
% :code Str,
% :segs (Coll Str)})
% (HMap
% :mandatory
% {:op ':js-var, :name Sym, :ns Sym}
% :optional
% {:tag Sym})
% '{:op ':let,
% :bindings '[Op Op Any],
% :body Any,
% :children Any,
% :env context-statement-tmp-HMap-alias20275,
% :form Any,
% :tag Any}
% (HMap
% :mandatory
% {:op ':local,
% :env context-statement-tmp-HMap-alias20275,
% :form Sym,
% :info Op,
% :local (U ':arg ':let),
% :name Sym}
% :optional
% {:arg-id Int, :init Op, :tag Sym})
% '{:op ':map,
% :children '[':keys ':vals],
% :env context-statement-tmp-HMap-alias20275,
% :form AMap,
% :keys '[Op],
% :tag Sym,
% :vals '[Op]}
% (HMap
% :mandatory
% {:op ':var, :name Sym, :ns Sym}
% :optional
% {:arglists (Coll Any),
% :arglists-meta (Coll nil),
% :column Int,
% :doc Str,
% :end-column Int,
% :end-line Int,
% :env context-statement-tmp-HMap-alias20275,
% :file (U nil Str),
% :fn-var Boolean,
% :form Sym,
% :info (U nil ColumnFileLineMap),
% :line Int,
% :max-fixed-arity Int,
% :meta
% (U
% ColumnFileLineMap__0
% FileArglistsColumnMap
% ColumnEndColumnEndLineMap),
% :method-params (Coll (Coll Sym)),
% :protocol-impl nil,
% :protocol-inline nil,
% :ret-tag Sym,
% :tag Sym,
% :top-fn ArglistsArglistsMetaMaxFixedArityMap,
% :variadic? Boolean})))
\end{figure}
We take this opportunity to juxtapose some strengths and weaknesses
of our tool by discussing a somewhat problematic benchmark,
a namespace from the ClojureScript compiler called cljs.compiler
(the code generation phase).
We generate 448 lines of type annotations
for the 1,776-line file, and present a sample
of our tool's output as \figref{infer:fig:cljs}.
We were unable to fully complete the porting to Typed Clojure due to
type system limitations, but the annotations yielded by this benchmark
are interesting nonetheless.
The compiler's AST format is inferred as \clj{Op} (lines \ref{infer:listing:cljs:Op}-\ref{infer:listing:cljs:Op-End})
with 22 recursive references
(like lines \ref{infer:listing:cljs:Op:optional:init:Op}, \ref{infer:listing:cljs:Op:optional:shadow:Op}, \ref{infer:listing:cljs:Op:op:do:Op})
and 14 cases distinguished by \clj{:op} (like lines \ref{infer:listing:cljs:Op:op:binding},
\ref{infer:listing:cljs:Op:op:const}, \ref{infer:listing:cljs:Op:op:do}),
5 of which have optional entries (like lines \ref{infer:listing:cljs:Op:optional}-\ref{infer:listing:cljs:Op:optionalEnd}).
To improve inference time,
we exercised only the code emission unit tests (299 lines containing 39 assertions),
which normally take 40 seconds to run.
From these tests we generated 448 lines of types and 517 lines of specs
in 2.5 minutes on a 2011 MacBook Pro (16GB RAM, 2.4GHz i5),
in part because of key optimizations discussed in \Dchapref{infer:sec:extensions}.
The main function of the code generation phase is \clj{emit}, which
effectfully converts a map-based AST
to JavaScript.
The AST is created by functions in cljs.analyzer,
a significantly larger 4,366-line Clojure file.
Without inspecting cljs.analyzer,
our tool annotates \clj{emit} on line \ref{infer:listing:cljs:emit}
with a recursive AST type \clj{Op} (lines \ref{infer:listing:cljs:Op}-\ref{infer:listing:cljs:Op-End}).
Similar to our opening example \clj{nodes}, it uses the \clj{:op}
key to disambiguate between its 14 cases, and has recursive
references (\clj{Op}).
We present only three of these cases in \figref{infer:fig:cljs}.
The first case \clj{':binding} has 4 required
and 8 optional entries, whose
\clj{:info} and \clj{:env} entries refer to
other \clj{HMap} type aliases generated by the tool.
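To make the shape of such data concrete, the following minimal sketch dispatches on the \clj{:op} entry of a map-based AST and recurs through nested nodes.
The node shapes and the \clj{emit-str} function are hypothetical simplifications for illustration only; they are not taken from cljs.compiler.
\begin{cljlisting}
;; Hypothetical miniature AST; not the real cljs.compiler representation.
(defmulti emit-str
  "Convert a map-based AST node to a string, dispatching on :op."
  :op)

(defmethod emit-str :const [{:keys [val]}]
  (pr-str val))

(defmethod emit-str :do [{:keys [statements ret]}]
  ;; :statements and :ret hold nested nodes (recursive references)
  (apply str (concat (map #(str (emit-str %) ";") statements)
                     [(emit-str ret)])))

(emit-str {:op :do
           :statements [{:op :const :val 1}]
           :ret {:op :const :val 2}})
;=> "1;2"
\end{cljlisting}
Each \clj{defmethod} corresponds to one case of a union such as \clj{Op}, which is why a dispatch key common to all cases is a natural basis for naming and organizing the inferred type.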
%%deleted this code
%Similar to \clj{:op},
%the \clj{:local} entry maps to a keyword singleton
%type,
%however our tool wisely chose to cluster types
%based on the \clj{:op} entry since it is common to all cases.
%\Dsection{Philosophy}
An important question to address is ``how accurate are these annotations?''.
Unlike previous work in this area~\infercitep{An10dynamicinference}, we do not aim for soundness guarantees
in our generated types.
A significant contribution of our work is a tool that Clojure programmers
can use to help learn about and specify their programs.
In that spirit, we strive to generate annotations meeting more qualitative criteria.
Each guideline by itself helps generate more useful annotations, and
they combine in interesting ways to help make up for shortcomings.
%in generated annotations.
%which we outline along with a commentary
%judging \figref{infer:fig:cljs} along these lines.
\paragraph{Choose recognizable names}
%Typed Clojure and clojure.spec annotations are abundant
%with useful names for types.
Assigning a good name for a type increases
readability by succinctly conveying its purpose.
Along those lines, a good name for the AST representation
on lines \ref{infer:listing:cljs:Op}-\ref{infer:listing:cljs:Op-End}
might be \clj{AST} or \clj{Expr}.
However, these kinds of names can be very misleading when incorrect, so
instead of guessing them,
our tool takes a more consistent approach and generates \emph{easily recognizable}
names based on the type the name points to.
Then, those with a passing familiarity with the data flowing through the program
can quickly identify and rename them.
For example,
\begin{itemize}
\item
\clj{Op} (lines \ref{infer:listing:cljs:Op}-\ref{infer:listing:cljs:Op-End})
is chosen because \clj{:op} is
clearly the dispatch key (the \clj{:op} entry is also helpfully placed
as the first entry in each case to aid discoverability),
\item
\clj{ColumnLineContextMap} (lines \ref{infer:listing:cljs:ColumnLineContextMap}-\ref{infer:listing:cljs:ColumnLineContextMapEnd})
enumerates the keys of the map type it points to,
\item
\clj{NameShadowMap} and \clj{FnScopeFnSelfNameNsMap} (%referenced on
line
\ref{infer:listing:cljs:Op:op:binding:NameShadowMap}% and \ref{infer:listing:cljs:Op:op:binding:FnScopeFnSelfNameNsMap}
)
similarly, and
\item
\clj{HMap49305} (lines \ref{infer:listing:cljs:HMap49305}-\ref{infer:listing:cljs:HMap49305End})
shows how our tool fails to give names to certain combinations
of types (we now discuss the severity of this particular situation).
\end{itemize}
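As a rough illustration of key-based naming, the sketch below builds an alias name by camel-casing a sequence of map keys and appending a \clj{Map} suffix.
The helper \clj{keys->alias-name} is hypothetical, and the choice and order of keys is left to the caller; the tool's actual naming procedure may differ.
\begin{cljlisting}
(require '[clojure.string :as str])

;; Hypothetical helper: builds an alias name from map keys,
;; in the order given by the caller.
(defn keys->alias-name [ks]
  (let [camel (fn [k]
                (->> (str/split (name k) #"-")
                     (map str/capitalize)
                     (apply str)))]
    (str (apply str (map camel ks)) "Map")))

(keys->alias-name [:column :line :context])
;=> "ColumnLineContextMap"
\end{cljlisting}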
A failure of cljs.compiler's
generated types was \clj{HMap49305}.
It clearly fails to be a recognizable name.
However, all is not lost:
the compactness and recognizable names of other adjacent annotations
make it plausible for a programmer with some
knowledge of the AST representation to
recover.
In particular, 13 of the 14 cases in \clj{Op}
map \clj{:env} to \clj{HMap49305}
(like lines \ref{infer:listing:cljs:Op:op:const:HMap49305} and \ref{infer:listing:cljs:Op:op:do:HMap49305}),
and the only exception (line \ref{infer:listing:cljs:Op:optional:init:Op})
maps to \clj{ColumnLineContextMap}. From this information the user can
decide to combine these aliases.
%Good names can sometimes be reconstructed from the program source,
%like function or parameter names, and other times
%we can use the shape of a type to summarize it.
\paragraph{Favor compact annotations}
Literally translating runtime observations into
annotations without compacting them
leads to unmaintainable and impractical types resembling
TypeWiz's ``verbatim'' annotation for \clj{nodes}.
To avoid this, we
use optional keys where possible, like line \ref{infer:listing:cljs:ColumnLineContextMap:optional},
infer recursive types like \clj{Op}, and
reuse type aliases in function annotations, like
\clj{emit} and \clj{emit-dot} (lines \ref{infer:listing:cljs:emit}, \ref{infer:listing:cljs:emit-dot}).
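The use of optional keys can be sketched as follows, under the simplifying assumption that a key is mandatory exactly when it appears in every observed map.
The function \clj{split-keys} is a hypothetical illustration, not the tool's implementation.
\begin{cljlisting}
(require '[clojure.set :as set])

;; Hypothetical sketch: keys seen in every observed map become
;; mandatory, the remaining keys become optional.
;; Assumes at least one observed map.
(defn split-keys [observed-maps]
  (let [key-sets  (map (comp set keys) observed-maps)
        mandatory (apply set/intersection key-sets)
        optional  (set/difference (apply set/union key-sets)
                                  mandatory)]
    {:mandatory mandatory, :optional optional}))

(split-keys [{:column 1, :line 2}
             {:column 3, :line 4, :context :expr}])
;=> {:mandatory #{:column :line}, :optional #{:context}}
\end{cljlisting}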
One remarkable success in the generated types
was the automatic inference of \clj{Op} (lines \ref{infer:listing:cljs:Op}-\ref{infer:listing:cljs:Op-End})
with 14 distinct cases, and other features described in \figref{infer:fig:cljs}.
Further investigation reveals that
the compiler actually features 36 distinct AST nodes. Unsurprisingly, 39 assertions were not sufficient
test coverage to discover them all.
However, because of the recognizable name and organization of
\clj{Op}, it's clear where to add the missing nodes
if no further tests are available.
These processes of compacting annotations often make them more general,
which leads into our next goal.
%Idiomatic Clojure code rarely mixes certain types in the same position,
%unless the program is polymorphic. Using this knowledge---which we observed
%by the annotations and specs assigned to idiomatic Clojure
%code---we can rule out certain combinations of types to compact our
%resulting output, without losing information that would help us
%type check our programs.
\paragraph{Don't overspecify types}
Poor test coverage can easily skew the results of dynamic analysis tools,
so we choose to err on the side of generalizing types
where possible.
Our opening example \clj{nodes}
is a good example of this---our inferred type
is recursive, despite \clj{nodes} only being tested with a tree of height 2.
This has several benefits.
\begin{itemize}
\item We avoid exhausting the pool of easily recognizable names
by generalizing types to communicate the general role
of an argument or return position.
For example, \clj{emit-dot} (line \ref{infer:listing:cljs:emit-dot})
is annotated to take \clj{Op}, but in reality accepts only a subset
of \clj{Op}.
Programmers can combine the recognizability of \clj{Op} with the
suggestive name of \clj{emit-dot} (the dot operator in Clojure handles host interoperability) to decide whether, for instance,
to split \clj{Op} into smaller type aliases
or add type casts in the definition of \clj{emit-dot} to please
the type checker
(some libraries require more casts than others to type check, as discussed in \secref{infer:sec:experiment2}).
\item Generated Clojure spec annotations (an extension discussed in \secref{infer:sec:spec-extension})
are more likely to accept valid input with specs enabled, even with incomplete unit tests
(we enable generated specs on several libraries in \secref{infer:sec:experiment3}).
\item Our approach becomes more amenable to extensions improving the running time
of runtime observation without significantly deteriorating annotation quality,
like lazy tracking (\secref{infer:sec:lazy-tracking}).
\end{itemize}
Several instances of overspecification are evident,
such as the \clj{:statements} entry of a \clj{:do} AST node being inferred as an always-empty vector
(line \ref{infer:listing:cljs:Op:op:do:statements}).
In some ways, this is useful information, showing that
test coverage for \clj{:do} nodes could be improved.
To fix the annotation, we could rerun the tool with better tests.
If no such test exists, we would have to fall back
to reverse-engineering code to identify the correct
type of \clj{:statements}, which is \clj{(Vec Op)}.
Finally, 19 functions in cljs.compiler are annotated to
take or return \clj{Op} (like lines \ref{infer:listing:cljs:emit}, \ref{infer:listing:cljs:emit-dot}).
This kind of alias reuse enables annotations
to be relatively compact (only 16 type aliases are used by the
49 functions that were exercised).
%
%We rate the quality of generated annotations
%on several axes.
%
%\paragraph{Compactness} Type annotations should be succinct,
% but without sacrificing too much accuracy.
% Are our type aliases intelligently combined
% with good choices for optional keys?
%
% \paragraph{Accuracy} Would executing a program with these
% type annotations cause an error?
% Have we too eagerly erased information in favor
% of compactness?
%
% \paragraph{Organization} Have we chosen good recursive types?
% Do they have good names?
%
%
%\figref{infer:fig:gentype} shows our results.
%Our first program is an implementation of a
%1971 Star Trek game.
%It comes with minimal tests, so to complete this experiment,
%we instead played the game for 30 seconds.
%\begin{figure*}
% \footnotesize
%\begin{tabular}
%{| l || l | l | l || l | l | l | l | l | l | l | l | l | l | l | l | l | l |}
% Lib & LOC & GT & LA & MD & C & I & P & L & S & O & U & N & V & R & K & F & H \\
% \hline
% \hline
% sc & 166 & 133 & 3 & 70/41 & 5 & 0 & 0 & 2 & 13& 1 & 5 & 1 & 1 & 2 & 0 & 0 & 0 \\
% mc & 923 & 395 & 147& 124/120 & 23 & 1 & 11& 19& 2 & 5 & 0 & 9 & 3 & 2 & 4 & 1 & 3 \\
% fs & 588 & 157 & 1 & 119/86 & 50 & 0 & 0 & 2 & 3 & 4 & 4 & 11& 2 & 9 & 0 & 0 & 0 \\
% dj & 528 & 168 & 9 & 94/125 \\
% mo & 530 & 49 & 1 & 46/26%\\
% %data.xml & & \\
% % cc & 1776 & 448 & 4 & N/A
% %\\
%\end{tabular}
% \caption{\emph{The number of type annotations generated for each program}:
% Lib = Abbreviated library names in the order we introduce them on page \pageref{infer:chap:evaluation},
% LOC = Number of lines of code we generate types for,
% GT = Total number of lines of generated types after running our tool,
% LA = The number of local annotations generated by our tools.
% \emph{Number of manual changes needed to type check, and why they were needed}:
% MD = Lines added/removed diff from git comparing initial generated types to
% the manual amendments needed to
% type check with Typed Clojure (unless it was too difficult to port),
% C = Casts,
% I = Instantiation,
% P = Polymorphic annotation,
% L = Local annotation,
% S = Work around type system Shortcoming,
% O = Overprecise argument type,
% U = Uncalled function due to bad test coverage,
% N = Add No-check annotation to skip checking function,
% V = Add Variable arity argument type,
% R = Overprecise return type,
% K = Add Keyword argument types,
% F = Added filter annotation,
% H = Erase/upcast HVec annotation.
% }
%\end{figure*}
\Dsection{Experiment 2: Changes needed to type check}
\label{infer:sec:experiment2}
% TODO examples for all kinds of things
% TODO bucket how many changes are needed for each kind of thing
% - eg. varargs, polymorphism
% TODO how many lines of code were skipped
We used our workflow to port the following open source Clojure programs to Typed Clojure.
\paragraph{startrek-clojure}
A reimplementation of a Star Trek text adventure game,
created as a way to learn Clojure.
\paragraph{math.combinatorics}
The core library for common combinatorial functions
on collections,
with implementations based on Knuth's Art of Computer
Programming, Volume 4.
\paragraph{fs}
A Clojure wrapper library over common file-system operations.
\paragraph{data.json}
A library for working with JSON.
%\paragraph{data.xml} A library for manipulating and outputting XML in Clojure.
\paragraph{mini.occ}
A model of occurrence typing by an author of the
current paper. It utilizes three mutually recursive
ad-hoc structures to represent expressions, types,
and propositions.
In this experiment, we first generated types with our algorithm
by running the tests, then amended the program so that it
type checks.
\figref{infer:fig:gentype} summarizes our results.
After the lines of code we generate types for, the next two columns show how many lines of
types were generated and the lines manually changed, respectively.
The latter is a git line diff between commits of the initial
generated types and the final manually amended annotations.
While an objectively fair measurement,
it is not a good indication of the effort needed to port annotations
(a 1-character change on a line is represented by 1 line addition and 1 line deletion).
The rest of the table enumerates the different kinds of changes needed
and their frequency.
\begin{figure*}
\begin{tabular}{|r||c|c|c||c|c|c|c|c|c|c|c|c|c|}
Library & \rotatebox{270}{Lines of code}
& \rotatebox{270}{Lines of Generated Global/Local Types}
& \rotatebox{270}{Lines manually added/removed}
& \rotatebox{270}{Casts/Instantiations}
& \rotatebox{270}{Polymorphic annotation}
& \rotatebox{270}{Local annotation}
& \rotatebox{270}{Type System Workaround/no-check}
& \rotatebox{270}{Overprecise argument/return type}
& \rotatebox{270}{Uncalled function (bad test coverage)}
& \rotatebox{270}{Variable-arity/keyword arg type}
& \rotatebox{270}{Add occurrence typing annotation}
& \rotatebox{270}{Erase or upcast HVec annotation}
& \rotatebox{270}{Add missing case in defalias}
\\
\hline
\hline
startrek & 166 & 133/3 & 70/41 & 5 / 0 & 0 & 2 & 13/1 & 1 /2 & 5 & 1 / 0 & 0 & 0 & 0\\
math.comb & 923 & 395/147 & 124/120 & 23 / 1 & 11& 19& 2 /9 & 5 /2 & 0 & 3 / 4 & 1 & 3 & 0\\
fs & 588 & 157/1 & 119/86 & 50 / 0 & 0 & 2 & 3 /11& 4 /9 & 4 & 2 / 0 & 0 & 0 & 0\\
data.json & 528 & 168/9 & 94/125 & 6 / 0 & 0 & 2 & 4 /5 & 11/7 & 5 & 0 / 20& 0 & 0 & 0\\
mini.occ & 530 & 49/1 & 46/26 & 7 / 0 & 0 & 2 & 5 /2 & 4 /2 & 6 & 0 / 0 & 0 & 1 & 5\\
% cc & 1776 & 448 & 4 & N/A
%\\
\end{tabular}
\caption{Lines of generated annotations, git line diff for total manual changes to type check the program,
and the kinds of manual changes.
}
\label{infer:fig:gentype}
\end{figure*}
\paragraph{Uncalled functions}
A function without tests receives a broad type annotation that
must be amended.
%
For example, the startrek-clojure game has several exit
conditions, one of which is running out of time.
Since the tests do not specifically call this function,
nor play the game long enough to invoke this condition,
no useful type is inferred.
\begin{cljlisting}
(ann game-over-out-of-time AnyFunction)
\end{cljlisting}
In this case, minimal effort is needed to amend this
type signature, because the appropriate type alias
already exists:
\begin{cljlisting}
(defalias CurrentKlingonsCurrentSectorEnterpriseMap
(HMap :mandatory
{:current-klingons (Vec EnergySectorMap),
:current-sector (Vec Int), ...}
:optional {:lrs-history (Vec Str)}))
\end{cljlisting}
%\begin{cljlisting}
%(defalias CurrentKlingonsCurrentSectorEnterpriseMap
% (HMap :mandatory
% {:current-klingons (Vec EnergySectorMap),
% :current-sector (Vec Int),
% :enterprise EnergyIsDockedQuadrantMap,
% :quads (Vec BasesKlingonsQuadrantMap),
% :stardate CurrentEndStartMap,
% :starting-klingons Int}
% :optional {:lrs-history (Vec Str)}))
%\end{cljlisting}
So we amend the signature as
\begin{cljlisting}
(ann game-over-out-of-time
[(Atom1 CurrentKlingonsCurrentSectorEnterpriseMap)
-> Boolean])
\end{cljlisting}
\paragraph{Over-precision}
Function types are often too restrictive due to
insufficient unit tests.
There are several instances of this in math.combinatorics.
The \clj{all-different?} function
takes a collection and returns true only if the collection
contains distinct elements.
As evidenced in the generated type, the tests exercise
this function with collections of integers, atoms,
keywords, and characters.
\begin{cljlisting}
(ann all-different?
[(Coll (U Int (Atom1 Int) ':a ':b Character))
-> Boolean])
\end{cljlisting}
In our experience, such a union is very rarely a good candidate
for a Typed Clojure type signature, so a useful heuristic to improve
the generated types would be to upcast such unions to a more permissive
type, like \clj{Any}.
When we performed this case study, we had not yet added that heuristic
to our tool,
so in this case, we manually amend the signature as
\begin{cljlisting}
(ann all-different? [(Coll Any) -> Boolean])
\end{cljlisting}
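The amended element type is plausible because a distinctness check needs only equality on its elements.
The following definition is a hypothetical stand-in written for illustration; the library's own definition may differ.
\begin{cljlisting}
;; Hypothetical stand-in for all-different?.
;; distinct? requires at least one argument, so the empty
;; collection is handled separately.
(defn all-different? [coll]
  (or (empty? coll) (apply distinct? coll)))

(all-different? [1 :a "b"]) ;=> true
(all-different? [1 :a 1])   ;=> false
\end{cljlisting}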
Another example of overprecision is the generated type
of \clj{initial-perm-numbers}, a helper function
taking a \emph{frequency map} (a hash map from values
to the number of times they occur), which is the shape
of the return value of the core \clj{frequencies}
function.
The generated type shows that the tests exercise only frequency maps
whose counted values (the keys of the map) are integers.
%
\begin{cljlisting}
(ann initial-perm-numbers
[(Map Int Int) -> (Coll Int)])
\end{cljlisting}
%
A more appropriate type instead takes \clj{(Map Any Int)}.
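For instance, the core \clj{frequencies} function places no restriction on the values it counts, so the keys of a frequency map can be of any type while the counts are always integers:
\begin{cljlisting}
;; Keys are the counted values (any type); vals are counts.
(frequencies [:a :b :a "x"])
;=> {:a 2, :b 1, "x" 1}
\end{cljlisting}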
%
%\begin{cljlisting}
%(ann initial-perm-numbers
% [(Map Any Int) -> (Coll Int)])
%\end{cljlisting}
%
In many examples of overprecision, while the generated
types might not be immediately useful to check programs,
they serve as valuable starting points and also provide
an interesting summary of test coverage.
\paragraph{Missing polymorphism}
We do not attempt to infer polymorphic function types,
so these amendments are expected. However, it is useful
to compare the optimal types with our generated ones.
For example, the \clj{remove-nth} function in \clj{math.combinatorics}
performs a functional delete operation on its argument.
Here we can see the tests only exercise this function with
collections of integers.
\begin{cljlisting}
(ann remove-nth [(Coll Int) Int -> (Vec Int)])
\end{cljlisting}
However, the overall shape of the function is intact,
and the manually amended type only requires a few
keystrokes.
\begin{cljlisting}
(ann remove-nth
(All [a] [(Coll a) Int -> (Vec a)]))
\end{cljlisting}
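A functional delete never inspects the elements themselves, which is why the polymorphic type fits.
The definition below is a hypothetical sketch for illustration and is not necessarily the library's implementation.
\begin{cljlisting}
;; Hypothetical implementation: returns a new vector without
;; the element at index n; the input is left unchanged.
(defn remove-nth [coll n]
  (let [v (vec coll)]
    (into (subvec v 0 n) (subvec v (inc n)))))

(remove-nth [10 20 30] 1)  ;=> [10 30]
(remove-nth [:a "b" :c] 0) ;=> ["b" :c]
\end{cljlisting}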
Similarly, \clj{iter-perm} could be polymorphic,
but its type is generated as
\begin{cljlisting}
(ann iter-perm [(Vec Int) -> (U nil (Vec Int))])
\end{cljlisting}
We decided this function actually works over any number,
and bounded polymorphism was more appropriate, encoding
the fact that the elements of the output collection
are from the input collection.
\begin{cljlisting}
(ann iter-perm
(All [a]
[(Vec (I a Num)) -> (U nil (Vec (I a Num)))]))
\end{cljlisting}
%
%\paragraph{Missing return}
%Sometimes a function never returns, because of infinite loops
%or exceptions.
\paragraph{Missing argument counts}
Often, variable argument functions are given very precise types.
Our algorithm does not apply any heuristics to approximate
variable arguments --- instead we emit types that reflect
only the arities that were called during the unit tests.
The \clj{math.combinatorics} experiment contains
a good example of this phenomenon in the type inferred
for the \clj{plus} helper function.
From the generated type, we can see the tests exercise this function with 2, 6,
and 7 arguments.
\begin{cljlisting}
(ann plus (IFn [Int Int Int Int Int Int Int -> Int]
[Int Int Int Int Int Int -> Int]
[Int Int -> Int]))
\end{cljlisting}
Instead, \clj{plus} is actually variadic and works over any number of arguments.
It is better annotated as the following, which is easy to guess based on
both the annotated type and manually viewing the function implementation.
\begin{cljlisting}
(ann plus [Int * -> Int])
\end{cljlisting}
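A variadic helper of this kind might be defined as in the following sketch.
The definition is hypothetical and only illustrates why every arity, including those never called by the tests, is valid.
\begin{cljlisting}
;; Hypothetical variadic definition: accepts any number
;; of arguments, including none.
(defn plus [& nums]
  (apply + nums))

(plus)               ;=> 0
(plus 1 2)           ;=> 3
(plus 1 2 3 4 5 6 7) ;=> 28
\end{cljlisting}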
A similar issue occurs with \clj{mult}.
\begin{cljlisting}
(ann mult [Int Int -> Int]) ;; generated
(ann mult [Int * -> Int]) ;; amended
\end{cljlisting}
A related issue arises when inferring keyword arguments. Clojure implements
keyword arguments with normal variadic arguments. Notice
the generated type for \clj{lex-partitions-H},
which takes a fixed argument, followed by some optional integer keyword
arguments.
\begin{cljlisting}
(ann lex-partitions-H
(IFn [Int -> (Coll (Coll (Vec Int)))]
[Int ':min Int ':max Int
-> (Coll (Coll (Coll Int)))]))
\end{cljlisting}
While the arity of the generated type is too specific,
we can conceivably use the type to help us write a better one.
\begin{cljlisting}
(ann lex-partitions-H
[Int & :optional {:min Int :max Int}
-> (Coll (Coll (Coll Int)))])
\end{cljlisting}
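The calling convention behind such a type can be sketched with map destructuring of the rest arguments.
The function \clj{partitions-info} is hypothetical and merely echoes its arguments; it is not part of math.combinatorics.
\begin{cljlisting}
;; Hypothetical function with one fixed argument followed by
;; optional keyword arguments, passed as ordinary rest args.
(defn partitions-info [n & {:keys [min max]}]
  {:n n, :min min, :max max})

(partitions-info 5)
;=> {:n 5, :min nil, :max nil}
(partitions-info 5 :min 1 :max 3)
;=> {:n 5, :min 1, :max 3}
\end{cljlisting}
Because the keywords and their values arrive as plain positional arguments, runtime observation sees only the specific arities that were called, which matches the shape of the generated type.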
\paragraph{Weaknesses in Typed Clojure}
We encountered several known weaknesses in Typed Clojure's type system
that we worked around.
%
The most invasive change needed was in startrek-clojure, which
strongly updated the global mutable configuration map on initial
play. We instead initialized the map with a dummy
value when it is first created.
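One possible reading of this workaround is sketched below.
The names \clj{game-state} and \clj{new-game!} and the map entries are hypothetical, and the actual change made to startrek-clojure may look different.
\begin{cljlisting}
;; Hypothetical names; the real game state has more entries.
;; Before: the atom starts empty and is filled in on first
;; play, so its contents change shape.
;; (def game-state (atom {}))

;; After: the atom starts with a dummy value that already
;; has the final shape, so later updates preserve the type.
(def game-state
  (atom {:current-sector [0 0], :current-klingons []}))

(defn new-game! []
  (reset! game-state
          {:current-sector [1 2], :current-klingons []}))
\end{cljlisting}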
\paragraph{Missing \clj{defalias} cases}
With insufficient test coverage, our tool can miss cases in a recursively defined
type.
In particular, mini.occ features three recursive types---for the representation
of types \clj{T}, propositions \clj{P}, and expressions \clj{E}.
For \clj{T}, three cases were missing, along with having to upcast the \clj{:params}
entry from the singleton vector \clj{'[NameTypeMap]}.
Two cases were missing from \clj{E}.
The manual changes are highlighted (\clj{P}, with five cases, required no changes).
\begin{minipage}[t]{0.54\linewidth}
\begin{cljlisting}
(defalias T
(U (*@\colorbox{pink}{'\{:T ':not, :type T\}}@*)
(*@\colorbox{pink}{'\{:T ':refine, :name t/Sym, :prop P\}}@*)
(*@\colorbox{pink}{'\{:T ':union, :types (t/Set T)\}}@*)
'{:T ':false}
'{:T ':fun,
:params (*@\colorbox{pink}{(t/Vec}@*) NameTypeMap(*@\colorbox{pink}{)}@*),
:return T}
'{:T ':intersection, :types (Set T)}
'{:T ':num}))
\end{cljlisting}
\end{minipage}
%
\begin{minipage}[t]{0.4\linewidth}
\begin{cljlisting}
(defalias E
(U (*@\colorbox{pink}{'\{:E ':add1\}}@*)
(*@\colorbox{pink}{'\{:E ':n?\}}@*)
'{:E ':app, :args (Vec E),
:fun E}
'{:E ':false}
'{:E ':if, :else E,
:test E, :then E}
'{:E ':lambda, :arg Sym,
:arg-type T, :body E}
'{:E ':var, :name Sym}))
\end{cljlisting}
\end{minipage}
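To illustrate how limited coverage leads to missing cases, the following sketch collects
the \clj{:E} tags that occur in a set of sample expressions.
The helper functions and the sample data are hypothetical; only the field names follow the alias for \clj{E} above.
\begin{cljlisting}
(defn sub-exprs-sketch
  "Direct subexpressions of an E-shaped map."
  [e]
  (case (:E e)
    :app    (cons (:fun e) (:args e))
    :if     [(:test e) (:then e) (:else e)]
    :lambda [(:body e)]
    []))

(defn observed-tags-sketch
  "Set of :E tags reachable from the given expressions."
  [exprs]
  (set (map :E (mapcat (fn [e] (tree-seq map? sub-exprs-sketch e))
                       exprs))))

;; Hypothetical test input that exercises only three cases.
(def sample-exprs
  [{:E :if
    :test {:E :false}
    :then {:E :var :name 'x}
    :else {:E :var :name 'y}}])

(observed-tags-sketch sample-exprs) ;=> #{:if :false :var}
\end{cljlisting}
Any tag absent from the observed set, such as \clj{:add1} here, cannot appear in an alias inferred from these samples alone.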
%cljs.compiler uses many polymorphic idioms that Typed Clojure is
%poor at checking, so we deemed it too difficult to attempt to
%type check. In particular, there are many of usages of the
%core functions
%\clj{get-in} and \clj{update-in} (functions that deeply lookup
%and manipulate maps) which are not even assigned types
%in Typed Clojure.
%Many function definitions would need to be ignored by the type
%checker to work around this.
%Furthermore, many manual instantiations
%would be needed to check transducers and polymorphic functions
%passed to other polymorphic functions.
%\begin{verbatim}
% - get/get-in
% - apply + kw args
% - strong updates
%\end{verbatim}
%\paragraph{Possible errors in programs}
\Dsection{Experiment 3: Specs pass unit tests}
\label{infer:sec:experiment3}
Our final experiment uses our tool to
generate specs (\secref{infer:sec:spec-extension})
instead of types.
Specs are checked at runtime,
so to verify the utility of generated specs,
we enable spec checking while
rerunning the unit tests that were used
in the process of creating them.
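One way to enable such checking is sketched below, assuming that instrumentation from
\clj{clojure.spec.test.alpha} is used; the function \clj{add1-sketch} and its spec are hypothetical.
Note that \clj{instrument} checks only the \clj{:args} part of a function spec.
\begin{cljlisting}
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

;; Hypothetical function and function spec.
(defn add1-sketch [n] (inc n))

(s/fdef add1-sketch
  :args (s/cat :n int?)
  :ret int?)

;; From here on, calls are checked against the :args spec.
(stest/instrument `add1-sketch)

(add1-sketch 41)      ; conforms to the :args spec
;; (add1-sketch "41") ; would throw an ExceptionInfo
\end{cljlisting}
Rerunning an existing test suite after instrumentation then checks each call's arguments against the generated specs.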
\begin{figure*}
\begin{tabular}
{| l || l | l || l | l | l || l |}
Library & LOC & Lines of specs & Recursive & Instance & Het. Map & Passed Tests?\\
\hline
\hline
startrek & 166 & 25 & 0 & 10 & 0 & Yes\\
math.comb & 923 & 601 & 0 & 320 & 0 & Yes\\
fs & 588 & 543 & 0 & 215 & 0 & Yes\\
data.json & 528 & 401 & 0 & 174 & 0 & No (1/79 failed)\\ % pprinting related test
mini.occ & 530 & 131 & 3 & 25 & 15 & Yes\\
%data.xml & & \\
% cc & 1776 & 448 & 4 & N/A
%\\
\end{tabular}
\caption{Summary of the quantity and kinds of generated specs and whether they passed
unit tests when enabled.
The one failing test was related to pretty-printing JSON, and seems to be an artifact
of our testing environment, as it still fails with all specs removed.
}
\label{infer:fig:genspec}
\end{figure*}
At first this might seem like a trivial property, but it serves as
a valuable test of our inference algorithm.
The aggressive merging strategies that minimize aliases and
maximize recognizability are unsound transformations, but they
are based on hypotheses about Clojure idioms and how
Clojure programs are constructed.
If, hypothetically, we generated singleton specs for numbers
like we do for keywords and did not eventually upcast
them to \clj{number?}, the specs might be too strict
to pass their unit tests.
Some function specs also perform generative testing based on
the argument and return types provided.
If we collapse a spec too aggressively and use it in such
a function spec, generative testing might feed the function invalid input.
Thankfully, we avoid such pitfalls, and so
our generated specs pass their tests for the benchmarks
we tried.
\figref{infer:fig:genspec} shows
our preliminary results. With one exception, all unit tests pass
with the inferred specs enforced, which tells us the specs are at least well formed.
The exception is a single data.json test whose failure appears unrelated to the specs,
as explained in the caption.
Since hundreds of invariants are checked (mostly ``instance'' checks that a value is of a particular class or interface), we can also be more confident
that the specs are useful.
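The hypothetical scenario of singleton specs for numbers can be made concrete with a small sketch.
The spec names are hypothetical and are not output of our tool.
\begin{cljlisting}
(require '[clojure.spec.alpha :as s])

;; A set is a valid spec; a one-element set is a singleton spec.
(s/def ::tag-sketch #{:fun})
(s/def ::singleton-number-sketch #{42})
(s/def ::upcast-number-sketch number?)

(s/valid? ::tag-sketch :fun)            ; accepted
(s/valid? ::singleton-number-sketch 42) ; accepted, the observed value
(s/valid? ::singleton-number-sketch 43) ; rejected, too strict
(s/valid? ::upcast-number-sketch 43)    ; accepted after upcasting
\end{cljlisting}
A singleton spec is reasonable for a keyword tag, but for numbers it would reject any value the earlier test run did not happen to produce.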
%\Dsubsection{Experiment 3: Generating generative tests}
% We should generate the card playing specs in this guide:
% http://clojure.org/guides/spec
% # How evaluate
% ## qualitative
% Does it make sense??
%
% 1. Don't run, gen type, manual inspection
% - done on something small but real
% - star trek game?
%
% - Try different eval methods on different programs
% - try different projects on different methods
%
% 2. Generate types, try type checking programs
% - record what changes needed to get it to
% type check
% - (on a different program than 1.)
%
% 3. Generate spec, insert the spec, run the test
% with the spec on, also generate tests
% - does spec ignore the input??
% or just generate tests
% - best situation:
% - spec all passes
% - then types check with minimal changes
% - Q: can we use spec's tests to improve
% types, iteratively?
% (could throw away exceptions, throw
% away bad input etc., different options
% here)
% (optional)
% 4. Generate types, use gradual typing