/*
AutoHotkey
Copyright 2003-2008 Chris Mallett ([email protected])
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
*/
//////////////////////////////////////////////////////////////////////////////////////
// v1.0.40.02: This is now a separate file to allow its compiler optimization settings
// to be set independently of those of the other modules. In one benchmark, this
// improved performance of expressions and function calls by 9% (that is, when the
// other modules are set to "minimize size" such as for the AutoHotkeySC.bin file).
// This gain in performance is at the cost of a 1.5 KB increase in the size of the
// compressed code, which seems well worth it given how often expressions and
// function-calls are used (such as in loops).
//
// ExpandArgs() and related functions were also put into this file because that
// further improves performance across the board -- even for AutoHotkey.exe despite
// the fact that the only thing that changed for it was the module move, not the
// compiler settings. Apparently, the butterfly effect can cause even minor
// modifications to impact the overall performance of the generated code by as much as
// 7%. However, this might have more to do with cache hits and misses in the CPU than
// with the nature of the code produced by the compiler.
// UPDATE 10/18/2006: There's not much difference anymore -- in fact, using min size
// for everything makes compiled scripts slightly faster in basic benchmarks, probably
// due to the recent addition of the linker optimization that physically orders
// functions in a better order inside the EXE. Therefore, script_expression.cpp no
// longer has a separate "favor fast code" option.
//////////////////////////////////////////////////////////////////////////////////////
#include "stdafx.h" // pre-compiled headers
#include "script.h"
#include "globaldata.h" // for a lot of things
#include "qmath.h" // For ExpandExpression()
// __forceinline: Decided against it for this function because although it's only called by one caller,
// testing shows that it wastes stack space (room for its automatic variables would be unconditionally
// reserved in the stack of its caller). Also, the performance benefit of inlining this is too slight.
// Here's a simple way to verify wasted stack space in a caller that calls an inlined function:
// DWORD stack;
// _asm mov stack, esp
// MsgBox(stack);
char *Line::ExpandExpression(int aArgIndex, ResultType &aResult, char *&aTarget, char *&aDerefBuf
, size_t &aDerefBufSize, char *aArgDeref[], size_t aExtraSize)
// Caller should ignore aResult unless this function returns NULL.
// Returns a pointer to this expression's result, which can be one of the following:
// 1) NULL, in which case aResult will be either FAIL or EARLY_EXIT to indicate the means by which the current
// quasi-thread was terminated as a result of a function call.
// 2) The constant empty string (""), in which case we do not alter aTarget for our caller.
// 3) Some persistent location not in aDerefBuf, namely the mContents of a variable or a literal string/number,
// such as a function-call that returns "abc", 123, or a variable.
// 4) At position aTarget inside aDerefBuf (note that aDerefBuf might have been reallocated by us).
// aTarget is left unchanged except in case #4, in which case aTarget has been adjusted to the position after our
// result-string's terminator. In addition, in case #4, aDerefBuf, aDerefBufSize, and aArgDeref[] have been adjusted
// for our caller if aDerefBuf was too small and needed to be enlarged.
//
// Thanks to Joost Mulders for providing the expression evaluation code upon which this function is based.
{
ExprTokenType right;
char *target = aTarget; // "target" is used to track our usage (current position) within the aTarget buffer.
// The following must be defined early so that mem_count is initialized and the array is guaranteed to be
// "in scope" in case of early "goto" (goto substantially boosts performance and reduces code size here).
#define MAX_EXPR_MEM_ITEMS 200 // v1.0.47.01: Raised from 100 because a line consisting entirely of concat operators can exceed it. However, there's probably not much point to going much above MAX_TOKENS/2 because then it would reach the MAX_TOKENS limit first.
char *mem[MAX_EXPR_MEM_ITEMS]; // No init necessary. In most cases, it will never be used.
int mem_count = 0; // The actual number of items in use in the above array.
char *result_to_return = ""; // By contrast, NULL is used to tell the caller to abort the current thread. That isn't done for normal syntax errors, just critical conditions such as out-of-memory.
Var *output_var = (mActionType == ACT_ASSIGNEXPR) ? OUTPUT_VAR : NULL; // Resolve early because it's similar in usage/scope to the above. Plus MUST be resolved prior to calling any script-functions since they could change the values in sArgVar[].
// Having a precedence array is required at least for SYM_POWER (since the order of evaluation
// of something like 2**1**2 does matter). It also helps performance by avoiding unnecessary pushing
// and popping of operators to the stack. This array must be kept in sync with "enum SymbolType".
// Also, dimensioning explicitly by SYM_COUNT helps enforce that at compile-time:
static UCHAR sPrecedence[SYM_COUNT] = // Performance: UCHAR vs. INT benches a little faster, perhaps due to the slight reduction in code size it causes.
{
0,0,0,0,0,0,0 // SYM_STRING, SYM_INTEGER, SYM_FLOAT, SYM_VAR, SYM_OPERAND, SYM_DYNAMIC, SYM_BEGIN (SYM_BEGIN must be lowest precedence).
, 82, 82 // SYM_POST_INCREMENT, SYM_POST_DECREMENT: Highest precedence operator so that it will work even though it comes *after* a variable name (unlike other unaries, which come before).
, 4, 4 // SYM_CPAREN, SYM_OPAREN (to simplify the code, parentheses must be lower than all operators in precedence).
, 6 // SYM_COMMA -- Must be just above SYM_OPAREN so it doesn't pop OPARENs off the stack.
, 7,7,7,7,7,7,7,7,7,7,7,7 // SYM_ASSIGN_*. THESE HAVE AN ODD NUMBER to indicate right-to-left evaluation order, which is necessary for cascading assignments such as x:=y:=1 to work.
// , 8 // THIS VALUE MUST BE LEFT UNUSED so that the one above can be promoted to it by the infix-to-postfix routine.
, 11, 11 // SYM_IFF_ELSE, SYM_IFF_THEN (ternary conditional). HAS AN ODD NUMBER to indicate right-to-left evaluation order, which is necessary for ternaries to perform traditionally when nested in each other without parentheses.
// , 12 // THIS VALUE MUST BE LEFT UNUSED so that the one above can be promoted to it by the infix-to-postfix routine.
, 16 // SYM_OR
, 20 // SYM_AND
, 25 // SYM_LOWNOT (the word "NOT": the low precedence version of logical-not). HAS AN ODD NUMBER to indicate right-to-left evaluation order so that things like "not not var" are supported (which can be used to convert a variable into a pure 1/0 boolean value).
// , 26 // THIS VALUE MUST BE LEFT UNUSED so that the one above can be promoted to it by the infix-to-postfix routine.
, 30, 30, 30 // SYM_EQUAL, SYM_EQUALCASE, SYM_NOTEQUAL (lower prec. than the below so that "x < 5 = var" means "result of comparison is the boolean value in var").
, 34, 34, 34, 34 // SYM_GT, SYM_LT, SYM_GTOE, SYM_LTOE
, 38 // SYM_CONCAT
, 42 // SYM_BITOR -- Seems more intuitive to have these three higher in prec. than the above, unlike C and Perl, but like Python.
, 46 // SYM_BITXOR
, 50 // SYM_BITAND
, 54, 54 // SYM_BITSHIFTLEFT, SYM_BITSHIFTRIGHT
, 58, 58 // SYM_ADD, SYM_SUBTRACT
, 62, 62, 62 // SYM_MULTIPLY, SYM_DIVIDE, SYM_FLOORDIVIDE
, 67,67,67,67,67 // SYM_NEGATIVE (unary minus), SYM_HIGHNOT (the high precedence "!" operator), SYM_BITNOT, SYM_ADDRESS, SYM_DEREF
// NOTE: THE ABOVE MUST BE AN ODD NUMBER to indicate right-to-left evaluation order, which was added in v1.0.46 to support consecutive unary operators such as !*var and !!var (!!var can be used to convert a value into a pure 1/0 boolean).
// , 68 // THIS VALUE MUST BE LEFT UNUSED so that the one above can be promoted to it by the infix-to-postfix routine.
, 72 // SYM_POWER (see note below). Associativity kept as left-to-right for backward compatibility (e.g. 2**2**3 is 4**3=64 not 2**8=256).
, 77, 77 // SYM_PRE_INCREMENT, SYM_PRE_DECREMENT (higher precedence than SYM_POWER because it doesn't make sense to evaluate power first because that would cause ++/-- to fail due to operating on a non-lvalue).
// , 78 // THIS VALUE MUST BE LEFT UNUSED so that the one above can be promoted to it by the infix-to-postfix routine.
// , 82, 82 // RESERVED FOR SYM_POST_INCREMENT, SYM_POST_DECREMENT (which are listed higher above for the performance of YIELDS_AN_OPERAND()).
, 86 // SYM_FUNC -- Must be of highest precedence so that it stays tightly bound together as though it's a single operand for use by other operators.
};
// Most programming languages give exponentiation a higher precedence than unary minus and logical-not.
// For example, -2**2 is evaluated as -(2**2), not (-2)**2 (the latter is unsupported by qmathPow anyway).
// However, this rule requires a small workaround in the postfix-builder to allow 2**-2 to be
// evaluated as 2**(-2) rather than being seen as an error. v1.0.45: A similar thing is required
// to allow the following to work: 2**!1, 2**not 0, 2**~0xFFFFFFFE, 2**&x.
// On a related note, the right-to-left tradition of something like 2**3**4 is not implemented (maybe in v2).
// Instead, the expression is evaluated from left-to-right (like other operators) to simplify the code.
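// A minimal, standalone sketch of the odd/even convention used by the precedence table above. It is
// not part of this function. ToyPrecedence(), ToPostfix() and CheckAssociativity() are hypothetical
// names, and the rule "compare the stack's precedence against prec + (prec % 2)" is an assumption
// about how the infix-to-postfix routine applies the promotion; the real routine may differ in detail.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy table: even = left-to-right, odd = right-to-left.
static int ToyPrecedence(char op)
{
    switch (op)
    {
    case '=': return 7;   // Odd, like SYM_ASSIGN: right-to-left.
    case '-': return 58;  // Even, like SYM_SUBTRACT: left-to-right.
    case '*': return 62;  // Even, like SYM_MULTIPLY: left-to-right.
    default:  return 0;   // Anything else is treated as a one-character operand.
    }
}

// Converts an infix string of one-character operands and the operators above to postfix.
static std::string ToPostfix(const std::string &infix)
{
    std::string out;
    std::vector<char> stack;
    for (char c : infix)
    {
        int prec = ToyPrecedence(c);
        if (!prec) // Operand: goes straight to the output.
        {
            out += c;
            continue;
        }
        // An odd precedence is promoted to the unused even value just above it, so an
        // equal-precedence operator already on the stack is NOT popped (right-to-left).
        int promoted = prec + (prec % 2);
        while (!stack.empty() && ToyPrecedence(stack.back()) >= promoted)
        {
            out += stack.back();
            stack.pop_back();
        }
        stack.push_back(c);
    }
    while (!stack.empty()) // Flush the remaining operators.
    {
        out += stack.back();
        stack.pop_back();
    }
    return out;
}

// Usage: each expected string follows from tracing the loop above by hand.
static void CheckAssociativity()
{
    assert(ToPostfix("a-b-c") == "ab-c-"); // (a-b)-c: left-to-right.
    assert(ToPostfix("x=y=z") == "xyz=="); // x=(y=z): right-to-left.
    assert(ToPostfix("a-b*c") == "abc*-"); // Higher precedence binds first.
}
```

// Leaving the next even value unused is what makes the promotion safe: no other operator can
// occupy the slot that an odd precedence is promoted into.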
#define MAX_TOKENS 512 // Max number of operators/operands. Seems enough to handle anything realistic, while conserving call-stack space.
ExprTokenType infix[MAX_TOKENS], *postfix[MAX_TOKENS], *stack[MAX_TOKENS + 1]; // +1 for SYM_BEGIN on the stack.
int infix_count = 0, postfix_count = 0, stack_count = 0;
// Above dimensions the stack to be as large as the infix/postfix arrays to cover worst-case
// scenarios and avoid having to check for overflow. For the infix-to-postfix conversion, the
// stack must be large enough to hold a malformed expression consisting entirely of operators
// (though other checks might prevent this). It must also be large enough for use by the final
// expression evaluation phase, the worst case of which is unknown but certainly not larger
// than MAX_TOKENS.
int i, j, s, actual_param_count, delta;
SymbolType right_is_number, left_is_number, result_symbol;
double right_double, left_double;
__int64 right_int64, left_int64;
char *right_string, *left_string;
char *right_contents, *left_contents;
size_t right_length, left_length;
char left_buf[MAX_FORMATTED_NUMBER_LENGTH + 1]; // BIF_OnMessage and SYM_DYNAMIC rely on this one being large enough to hold MAX_VAR_NAME_LENGTH.
char right_buf[MAX_FORMATTED_NUMBER_LENGTH + 1]; // Only needed for holding numbers
char *result; // "result" is used for return values and also the final result.
VarSizeType result_length;
size_t result_size, alloca_usage = 0; // v1.0.45: Track amount of alloca mem to avoid stress on stack from extreme expressions (mostly theoretical).
BOOL done, done_and_have_an_output_var, make_result_persistent, left_branch_is_true
, left_was_negative, is_pre_op; // BOOL vs. bool benchmarks slightly faster, and is slightly smaller in code size (or maybe it's cp1's int vs. char that shrunk it).
ExprTokenType *circuit_token;
Var *sym_assign_var, *temp_var;
VarBkp *var_backup = NULL; // If needed, it will hold an array of VarBkp objects. v1.0.40.07: Initialized to NULL to facilitate an approach that's more maintainable.
int var_backup_count; // The number of items in the above array (when it's non-NULL).
SymbolType stack_symbol, infix_symbol, sym_prev;
ExprTokenType *fwd_infix, *this_infix = infix;
int functions_on_stack = 0;
///////////////////////////////////////////////////////////////////////////////////////////////
// TOKENIZE THE INFIX EXPRESSION INTO AN INFIX ARRAY: Avoids the performance overhead of having
// to re-detect whether each symbol is an operand vs. operator at multiple stages.
///////////////////////////////////////////////////////////////////////////////////////////////
// In v1.0.46.01, this section was simplified to avoid transcribing the entire expression into the
// deref buffer. In addition to improving performance and reducing code size, this also solves
// obscure timing bugs caused by functions that have side-effects, especially in comma-separated
// sub-expressions. In these cases, one part of an expression could change a built-in variable
// (indirectly or in the case of Clipboard, directly), an environment variable, or a double-deref.
// For example the dynamic components of a double-deref can be changed by other parts of an
// expression, even one without commas. Another example is: fn(clipboard, func_that_changes_clip()).
// So now, built-in & environment variables and double-derefs are resolved when they're actually
// encountered during the final/evaluation phase.
// Another benefit to deferring the resolution of these types of items is that they become eligible
// for short-circuiting, which further helps performance (they're quite similar to built-in
// functions in this respect).
char *op_end, *cp;
DerefType *deref, *this_deref, *deref_start, *deref_alloca;
int derefs_in_this_double;
int cp1; // int vs. char benchmarks slightly faster, and is slightly smaller in code size.
for (cp = mArg[aArgIndex].text, deref = mArg[aArgIndex].deref // Start at the beginning of this arg's text and look for the next deref.
;; ++deref, ++infix_count) // FOR EACH DEREF IN AN ARG:
{
this_deref = deref && deref->marker ? deref : NULL; // A deref with a NULL marker terminates the list (i.e. the final deref isn't a deref, merely a terminator of sorts).
// BEFORE PROCESSING "this_deref" ITSELF, MUST FIRST PROCESS ANY LITERAL/RAW TEXT THAT LIES TO ITS LEFT.
if (this_deref && cp < this_deref->marker // There's literal/raw text to the left of the next deref.
|| !this_deref && *cp) // ...or there's no next deref, but there's some literal raw text remaining to be processed.
{
for (;; ++infix_count) // FOR EACH TOKEN INSIDE THIS RAW/LITERAL TEXT SECTION.
{
// Because neither the postfix array nor the stack can ever wind up with more tokens than were
// contained in the original infix array, only the infix array need be checked for overflow:
if (infix_count > MAX_TOKENS - 1) // No room for this operator or operand to be added.
goto abnormal_end;
// Only spaces and tabs are considered whitespace, leaving newlines and other whitespace characters
// for possible future use:
cp = omit_leading_whitespace(cp);
if (!*cp // Very end of expression...
|| this_deref && cp >= this_deref->marker) // ...or no more literal/raw text left to process at the left side of this_deref.
break; // Break out of inner loop so that bottom of the outer loop will process this_deref itself.
ExprTokenType &this_infix_item = infix[infix_count]; // Might help reduce code size since it's referenced many places below.
// CHECK IF THIS CHARACTER IS AN OPERATOR.
cp1 = cp[1]; // Improves performance by nearly 5% and appreciably reduces code size (at the expense of being less maintainable).
switch (*cp)
{
// The most common cases are kept up top to enhance performance if switch() is implemented as if-else ladder.
case '+':
if (cp1 == '=')
{
++cp; // An additional increment to have loop skip over the operator's second symbol.
this_infix_item.symbol = SYM_ASSIGN_ADD;
}
else
{
if (infix_count && YIELDS_AN_OPERAND(infix[infix_count - 1].symbol))
{
if (cp1 == '+')
{
// For consistency, assume that since the previous item is an operand (even if it's
// ')'), this is a post-op that applies to that operand. For example, the following
// are all treated the same for consistency (implicit concatenation where the '.'
// is omitted is rare anyway).
// x++ y
// x ++ y
// x ++y
// The following implicit concat is deliberately unsupported:
// "string" ++x
// The ++ above is seen as applying to the string because it doesn't seem worth
// the complexity to distinguish between expressions that can accept a post-op
// and those that can't (operands other than variables can have a post-op;
// e.g. (x:=y)++).
++cp; // An additional increment to have loop skip over the operator's second symbol.
this_infix_item.symbol = SYM_POST_INCREMENT;
}
else
this_infix_item.symbol = SYM_ADD;
}
else if (cp1 == '+') // Pre-increment.
{
++cp; // An additional increment to have loop skip over the operator's second symbol.
this_infix_item.symbol = SYM_PRE_INCREMENT;
}
else // Remove unary pluses from consideration since they do not change the calculation.
--infix_count; // Counteract the loop's increment.
}
break;
case '-':
if (cp1 == '=')
{
++cp; // An additional increment to have loop skip over the operator's second symbol.
this_infix_item.symbol = SYM_ASSIGN_SUBTRACT;
break;
}
// Otherwise (since above didn't "break"):
// Must allow consecutive unary minuses because otherwise, the following example
// would not work correctly when y contains a negative value: var := 3 * -y
if (infix_count && YIELDS_AN_OPERAND(infix[infix_count - 1].symbol))
{
if (cp1 == '-')
{
// See comments at SYM_POST_INCREMENT about this section.
++cp; // An additional increment to have loop skip over the operator's second symbol.
this_infix_item.symbol = SYM_POST_DECREMENT;
}
else
this_infix_item.symbol = SYM_SUBTRACT;
}
else if (cp1 == '-') // Pre-decrement.
{
++cp; // An additional increment to have loop skip over the operator's second symbol.
this_infix_item.symbol = SYM_PRE_DECREMENT;
}
else // Unary minus.
{
// Set default for cases where the processing below this line doesn't determine
// it's a negative numeric literal:
this_infix_item.symbol = SYM_NEGATIVE;
// v1.0.40.06: The smallest signed 64-bit number (-0x8000000000000000) wasn't properly
// supported in previous versions because its unary minus was being seen as an operator,
// and thus the raw number was being passed as a positive to _atoi64() or _strtoi64(),
// neither of which would recognize it as a valid value. To correct this, a unary
// minus followed by a raw numeric literal is now treated as a single literal number
// rather than unary minus operator followed by a positive number.
//
// To be a valid "literal negative number", the character immediately following
// the unary minus must not be:
// 1) Whitespace (atoi() and such don't support it, nor is it at all conventional).
// 2) An open-parenthesis such as the one in -(x).
// 3) Another unary minus or operator such as --x (which is the pre-decrement operator).
// To cover the above and possibly other unforeseen things, insist that the first
// character be a digit (even a hex literal must start with 0).
if ((cp1 >= '0' && cp1 <= '9') || cp1 == '.') // v1.0.46.01: Recognize dot too, to support numbers like -.5.
{
for (op_end = cp + 2; !strchr(EXPR_OPERAND_TERMINATORS, *op_end); ++op_end); // Find the end of this number (can be '\0').
// 1.0.46.11: Due to obscurity, no changes have been made here to support scientific
// notation followed by the power operator; e.g. -1.0e+1**5.
if (!this_deref || op_end < this_deref->marker) // Detect numeric double derefs such as one created via "12%i% = value".
{
// Because the power operator takes precedence over unary minus, don't collapse
// unary minus into a numeric literal if the number is immediately
// followed by the power operator. This is correct behavior even for
// -0x8000000000000000 because -0x8000000000000000**2 would in fact be undefined
// because ** is higher precedence than unary minus and +0x8000000000000000 is
// beyond the signed 64-bit range. SEE ALSO the comments higher above.
// Use a temp variable because numeric_literal requires that op_end be set properly:
char *pow_temp = omit_leading_whitespace(op_end);
if (!(pow_temp[0] == '*' && pow_temp[1] == '*'))
goto numeric_literal; // Goto is used for performance and also as a patch to minimize the chance of breaking other things via redesign.
//else it's followed by pow. Since pow is higher precedence than unary minus,
// leave this unary minus as an operator so that it will take effect after the pow.
}
//else possible double deref, so leave this unary minus as an operator.
}
} // Unary minus.
break;
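// A minimal, standalone sketch of why the unary minus is folded into the literal (see the
// v1.0.40.06 comments above). It is not part of this function. ParseInt64() is a hypothetical
// helper, and std::strtoll is used here as a portable stand-in for _atoi64()/_strtoi64(),
// assuming long long is 64 bits wide.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: parses a whole decimal literal and reports whether it fit in 64 bits.
static bool ParseInt64(const char *text, std::int64_t &value)
{
    assert(text != nullptr);
    errno = 0;
    char *end;
    long long parsed = std::strtoll(text, &end, 10);
    if (errno == ERANGE || end == text || *end) // Out of range, no digits, or trailing junk.
        return false;
    value = parsed;
    return true;
}
```

// With the sign attached, "-9223372036854775808" is in range and parses to the smallest signed
// 64-bit value. Without it, "9223372036854775808" is one past the largest signed value, so it
// cannot be parsed first and negated afterward.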
case ',':
this_infix_item.symbol = SYM_COMMA; // Used to separate sub-statements and function parameters.
break;
case '/':
if (cp1 == '=')
{
++cp; // An additional increment to have loop skip over the operator's second symbol.
this_infix_item.symbol = SYM_ASSIGN_DIVIDE;
}
else if (cp1 == '/')
{
if (cp[2] == '=')
{
cp += 2; // An additional increment to have loop skip over the operator's 2nd & 3rd symbols.
this_infix_item.symbol = SYM_ASSIGN_FLOORDIVIDE;
}
else
{
++cp; // An additional increment to have loop skip over the second '/' too.
this_infix_item.symbol = SYM_FLOORDIVIDE;
}
}
else
this_infix_item.symbol = SYM_DIVIDE;
break;
case '*':
if (cp1 == '=')
{
++cp; // An additional increment to have loop skip over the operator's second symbol.
this_infix_item.symbol = SYM_ASSIGN_MULTIPLY;
}
else
{
if (cp1 == '*') // Python, Perl, and other languages also use ** for power.
{
++cp; // An additional increment to have loop skip over the second '*' too.
this_infix_item.symbol = SYM_POWER;
}
else
{
// Differentiate between unary dereference (*) and the "multiply" operator:
// See '-' above for more details:
this_infix_item.symbol = (infix_count && YIELDS_AN_OPERAND(infix[infix_count - 1].symbol))
? SYM_MULTIPLY : SYM_DEREF;
}
}
break;
case '!':
if (cp1 == '=') // i.e. != is synonymous with <>, which is also already supported by legacy.
{
++cp; // An additional increment to have loop skip over the '=' too.
this_infix_item.symbol = SYM_NOTEQUAL;
}
else
// If what lies to its left is a CPAREN or OPERAND, SYM_CONCAT is not auto-inserted because:
// 1) Allows ! and ~ to potentially be overloaded to become binary and unary operators in the future.
// 2) Keeps the behavior consistent with unary minus, which could never auto-concat since it would
// always be seen as the binary subtract operator in such cases.
// 3) Simplifies the code.
this_infix_item.symbol = SYM_HIGHNOT; // High-precedence counterpart of the word "not".
break;
case '(':
// The below should not hurt any future type-casting feature because the type-cast can be checked
// for prior to checking the below. For example, if what immediately follows the open-paren is
// the string "int)", this symbol is not open-paren at all but instead the unary type-cast-to-int
// operator.
if (infix_count && YIELDS_AN_OPERAND(infix[infix_count - 1].symbol)) // If it's an operand, at this stage it can only be SYM_OPERAND or SYM_STRING.
{
if (infix_count > MAX_TOKENS - 2) // -2 to ensure room for this operator and the operand further below.
goto abnormal_end;
this_infix_item.symbol = SYM_CONCAT;
++infix_count;
}
infix[infix_count].symbol = SYM_OPAREN; // MUST NOT REFER TO this_infix_item IN CASE ABOVE DID ++infix_count.
break;
case ')':
this_infix_item.symbol = SYM_CPAREN;
break;
case '=':
if (cp1 == '=')
{
++cp; // An additional increment to have loop skip over the other '=' too.
this_infix_item.symbol = SYM_EQUALCASE;
}
else
this_infix_item.symbol = SYM_EQUAL;
break;
case '>':
switch (cp1)
{
case '=':
++cp; // An additional increment to have loop skip over the '=' too.
this_infix_item.symbol = SYM_GTOE;
break;
case '>':
if (cp[2] == '=')
{
cp += 2; // An additional increment to have loop skip over the operator's 2nd & 3rd symbols.
this_infix_item.symbol = SYM_ASSIGN_BITSHIFTRIGHT;
}
else
{
++cp; // An additional increment to have loop skip over the second '>' too.
this_infix_item.symbol = SYM_BITSHIFTRIGHT;
}
break;
default:
this_infix_item.symbol = SYM_GT;
}
break;
case '<':
switch (cp1)
{
case '=':
++cp; // An additional increment to have loop skip over the '=' too.
this_infix_item.symbol = SYM_LTOE;
break;
case '>':
++cp; // An additional increment to have loop skip over the '>' too.
this_infix_item.symbol = SYM_NOTEQUAL;
break;
case '<':
if (cp[2] == '=')
{
cp += 2; // An additional increment to have loop skip over the operator's 2nd & 3rd symbols.
this_infix_item.symbol = SYM_ASSIGN_BITSHIFTLEFT;
}
else
{
++cp; // An additional increment to have loop skip over the second '<' too.
this_infix_item.symbol = SYM_BITSHIFTLEFT;
}
break;
default:
this_infix_item.symbol = SYM_LT;
}
break;
case '&':
if (cp1 == '&')
{
++cp; // An additional increment to have loop skip over the second '&' too.
this_infix_item.symbol = SYM_AND;
}
else if (cp1 == '=')
{
++cp; // An additional increment to have loop skip over the operator's second symbol.
this_infix_item.symbol = SYM_ASSIGN_BITAND;
}
else
{
// Differentiate between unary "take the address of" and the "bitwise and" operator:
// See '-' above for more details:
this_infix_item.symbol = (infix_count && YIELDS_AN_OPERAND(infix[infix_count - 1].symbol))
? SYM_BITAND : SYM_ADDRESS;
}
break;
case '|':
if (cp1 == '|')
{
++cp; // An additional increment to have loop skip over the second '|' too.
this_infix_item.symbol = SYM_OR;
}
else if (cp1 == '=')
{
++cp; // An additional increment to have loop skip over the operator's second symbol.
this_infix_item.symbol = SYM_ASSIGN_BITOR;
}
else
this_infix_item.symbol = SYM_BITOR;
break;
case '^':
if (cp1 == '=')
{
++cp; // An additional increment to have loop skip over the operator's second symbol.
this_infix_item.symbol = SYM_ASSIGN_BITXOR;
}
else
this_infix_item.symbol = SYM_BITXOR;
break;
case '~':
// If what lies to its left is a CPAREN or OPERAND, SYM_CONCAT is not auto-inserted because:
// 1) Allows ! and ~ to potentially be overloaded to become binary and unary operators in the future.
// 2) Keeps the behavior consistent with unary minus, which could never auto-concat since it would
// always be seen as the binary subtract operator in such cases.
// 3) Simplifies the code.
this_infix_item.symbol = SYM_BITNOT;
break;
case '?':
this_infix_item.symbol = SYM_IFF_THEN;
break;
case ':':
if (cp1 == '=')
{
++cp; // An additional increment to have loop skip over the '=' too.
this_infix_item.symbol = SYM_ASSIGN;
}
else
this_infix_item.symbol = SYM_IFF_ELSE;
break;
case '"': // QUOTED/LITERAL STRING.
// Note that single and double-derefs are impossible inside string-literals
// because the load-time deref parser would never detect anything inside
// of quotes -- even non-escaped percent signs -- as derefs.
if (infix_count && YIELDS_AN_OPERAND(infix[infix_count - 1].symbol)) // If it's an operand, at this stage it can only be SYM_OPERAND or SYM_STRING.
{
if (infix_count > MAX_TOKENS - 2) // -2 to ensure room for this operator and the operand further below.
goto abnormal_end;
this_infix_item.symbol = SYM_CONCAT;
++infix_count;
}
// MUST NOT REFER TO this_infix_item IN CASE ABOVE DID ++infix_count:
infix[infix_count].symbol = SYM_STRING; // Marked explicitly as string vs. SYM_OPERAND to prevent it from being seen as a number, e.g. if (var == "12.0") would be false if var contains "12" with no trailing ".0".
infix[infix_count].marker = target; // Point it to its position in the buffer (populated below).
// The following section is nearly identical to one in DefineFunc().
// Find the end of this string literal, noting that a pair of double quotes is
// a literal double quote inside the string:
for (++cp;;) // Omit the starting-quote from consideration, and from the resulting/built string.
{
if (!*cp) // No matching end-quote. Probably impossible due to load-time validation.
goto abnormal_end;
if (*cp == '"') // And if it's not followed immediately by another, this is the end of it.
{
++cp;
if (*cp != '"') // String terminator or some non-quote character.
break; // The previous char is the ending quote.
//else a pair of quotes, which resolves to a single literal quote. So fall through
// to the below, which will copy one quote character to the buffer. Then this pair
// is skipped over and the loop continues until the real end-quote is found.
}
//else some character other than '\0' or '"'.
*target++ = *cp++;
}
*target++ = '\0'; // Terminate it in the buffer.
continue; // Continue vs. break to avoid the ++cp at the bottom. Above has already set cp to be the character after this literal string's close-quote.
default: // NUMERIC-LITERAL, DOUBLE-DEREF, RELATIONAL OPERATOR SUCH AS "NOT", OR UNRECOGNIZED SYMBOL.
if (*cp == '.') // This one must be done here rather than as a "case". See comment below.
{
if (cp1 == '=')
{
++cp; // An additional increment to have loop skip over the operator's second symbol.
this_infix_item.symbol = SYM_ASSIGN_CONCAT;
break;
}
if (IS_SPACE_OR_TAB(cp1))
{
this_infix_item.symbol = SYM_CONCAT;
break;
}
//else this is a '.' that isn't followed by a space, tab, or '='. So it's probably
// a number without a leading zero like .2, so continue on below to process it.
}
// Find the end of this operand or keyword, even if that end extended into the next deref.
// StrChrAny() is not used because if *op_end is '\0', the strchr() below will find it too:
for (op_end = cp + 1; !strchr(EXPR_OPERAND_TERMINATORS, *op_end); ++op_end);
// Now op_end marks the end of this operand or keyword. That end might be the zero terminator
// or the next operator in the expression, or just a whitespace.
if (this_deref && op_end >= this_deref->marker)
goto double_deref; // This also serves to break out of the inner for(), equivalent to a break.
// Otherwise, this operand is a normal raw numeric-literal or a word-operator (and/or/not).
// The section below is very similar to the one used at load-time to recognize and/or/not,
// so it should be maintained with that section. UPDATE for v1.0.45: The load-time parser
// now resolves "OR" to || and "AND" to && to improve runtime performance and reduce code size here.
// However, "NOT" must still be parsed here at runtime because it's not quite the same as the "!"
// operator (different precedence), and it seemed too much trouble to invent some special
// operator symbol for load-time to insert as a placeholder/substitute (especially since that
// symbol would appear in ListLines).
if (op_end-cp == 3
&& (cp[0] == 'n' || cp[0] == 'N')
&& ( cp1 == 'o' || cp1 == 'O')
&& (cp[2] == 't' || cp[2] == 'T')) // "NOT" was found.
{
this_infix_item.symbol = SYM_LOWNOT;
cp = op_end; // Have the loop process whatever lies at op_end and beyond.
continue; // Continue vs. break to avoid the ++cp at the bottom (though it might not matter in this case).
}
numeric_literal:
// Since above didn't "continue", this item is probably a raw numeric literal (either SYM_FLOAT
// or SYM_INTEGER, to be differentiated later) because just about every other possibility has
// been ruled out above. For example, unrecognized symbols should be impossible at this stage
// because load-time validation would have caught them. And any kind of unquoted alphanumeric
// characters (other than "NOT", which was detected above) wouldn't have reached this point
// because load-time pre-parsing would have marked it as a deref/var, not raw/literal text.
if ( toupper(op_end[-1]) == 'E' // v1.0.46.11: It looks like scientific notation...
&& !(cp[0] == '0' && toupper(cp[1]) == 'X') // ...and it's not a hex number (this check avoids falsely detecting hex numbers that end in 'E' as exponents). This line fixed in v1.0.46.12.
&& !(cp[0] == '-' && cp[1] == '0' && toupper(cp[2]) == 'X') // ...and it's not a negative hex number (this check avoids falsely detecting hex numbers that end in 'E' as exponents). This line added as a fix in v1.0.47.03.
)
{
// Since op_end[-1] is the 'E' of an exponent, the only valid things for op_end[0] to be
// are + or - (it can't be a digit because the loop above would never have stopped op_end
// at a digit). If it isn't + or -, it's some kind of syntax error, so doing the following
// seems harmless in any case:
do // Skip over the sign and its exponent; e.g. the "+1" in "1.0e+1". There must be a sign in this particular sci-notation number or we would never have arrived here.
++op_end;
while (*op_end >= '0' && *op_end <= '9'); // Avoid isdigit() because it sometimes causes a debug assertion failure at: (unsigned)(c + 1) <= 256 (probably only in debug mode), and maybe only when bad data got in it due to some other bug.
}
if (infix_count && YIELDS_AN_OPERAND(infix[infix_count - 1].symbol)) // If it's an operand, at this stage it can only be SYM_OPERAND or SYM_STRING.
{
if (infix_count > MAX_TOKENS - 2) // -2 to ensure room for this operator and the operand further below.
goto abnormal_end;
this_infix_item.symbol = SYM_CONCAT;
++infix_count;
}
// MUST NOT REFER TO this_infix_item IN CASE ABOVE DID ++infix_count:
infix[infix_count].symbol = SYM_OPERAND;
infix[infix_count].marker = target; // Point it to its position in the buffer (populated below).
memcpy(target, cp, op_end - cp);
target += op_end - cp;
*target++ = '\0'; // Terminate it in the buffer.
cp = op_end; // Have the loop process whatever lies at op_end and beyond.
continue; // "Continue" to avoid the ++cp at the bottom.
} // switch() for type of symbol/operand.
++cp; // i.e. increment only if a "continue" wasn't encountered somewhere above. Although maintainability is reduced to do this here, it avoids dozens of ++cp in other places.
} // for each token in this section of raw/literal text.
} // End of processing of raw/literal text (such as operators) that lie to the left of this_deref.
if (!this_deref) // All done because the above just processed all the raw/literal text (if any) that
break; // lay to the right of the last deref.
// THE ABOVE HAS NOW PROCESSED ANY/ALL RAW/LITERAL TEXT THAT LIES TO THE LEFT OF this_deref.
// SO NOW PROCESS THIS_DEREF ITSELF.
if (infix_count > MAX_TOKENS - 1) // No room for the deref item below to be added.
goto abnormal_end;
//DerefType &this_deref_ref = *this_deref; // Boosts performance slightly.
if (this_deref->is_function) // Above has ensured that at this stage, this_deref!=NULL.
{
if (infix_count && YIELDS_AN_OPERAND(infix[infix_count - 1].symbol)) // If it's an operand, at this stage it can only be SYM_OPERAND or SYM_STRING.
{
if (infix_count > MAX_TOKENS - 2) // -2 to ensure room for this operator and the operand further below.
goto abnormal_end;
infix[infix_count++].symbol = SYM_CONCAT;
}
infix[infix_count].symbol = SYM_FUNC;
infix[infix_count].deref = deref;
}
else // this_deref is a variable.
{
if (*this_deref->marker == g_DerefChar) // A double-deref because normal derefs don't start with '%'.
{
// Find the end of this operand, even if that end extended into the next deref.
// StrChrAny() is not used because if *op_end is '\0', the strchr() below will find it too:
for (op_end = this_deref->marker + this_deref->length; !strchr(EXPR_OPERAND_TERMINATORS, *op_end); ++op_end);
goto double_deref;
}
else
{
if (infix_count && YIELDS_AN_OPERAND(infix[infix_count - 1].symbol)) // If it's an operand, at this stage it can only be SYM_OPERAND or SYM_STRING.
{
if (infix_count > MAX_TOKENS - 2) // -2 to ensure room for this operator and the operand further below.
goto abnormal_end;
infix[infix_count++].symbol = SYM_CONCAT;
}
if (this_deref->var->Type() == VAR_NORMAL // VAR_ALIAS is taken into account (and resolved) by Type().
&& (g_NoEnv || this_deref->var->Length())) // v1.0.43.08: Added g_NoEnv. Relies on short-circuit boolean order.
// "!this_deref->var->Get()" isn't checked here. See comments in SYM_DYNAMIC evaluation.
{
// DllCall() and possibly others rely on this having been done to support changing the
// value of a parameter (similar to by-ref).
infix[infix_count].symbol = SYM_VAR; // Type() is always VAR_NORMAL as verified above. This is relied upon in several places such as built-in functions.
}
else // It's either a built-in variable (including clipboard) OR a possible environment variable.
{
infix[infix_count].symbol = SYM_DYNAMIC;
infix[infix_count].buf = NULL; // SYM_DYNAMIC requires that buf be set to NULL for vars (since there are two different types of SYM_DYNAMIC).
}
infix[infix_count].var = this_deref->var;
}
} // Handling of the var or function in this_deref.
// Finally, jump over the dereference text. Note that in the case of an expression, there might not
// be any percent signs within the text of the dereference, e.g. x + y, not %x% + %y% (unless they're
// deliberately double-derefs).
cp += this_deref->length;
// The outer loop will now do ++infix for us.
continue; // To avoid falling into the label below. The label below is only reached by explicit goto.
double_deref: // Caller has set cp to be start and op_end to be the character after the last one of the double deref.
if (infix_count && YIELDS_AN_OPERAND(infix[infix_count - 1].symbol)) // If it's an operand, at this stage it can only be SYM_OPERAND or SYM_STRING.
{
if (infix_count > MAX_TOKENS - 2) // -2 to ensure room for this operator and the operand further below.
goto abnormal_end;
infix[infix_count++].symbol = SYM_CONCAT;
}
infix[infix_count].symbol = SYM_DYNAMIC;
infix[infix_count].buf = target; // Point it to its position in the buffer (populated below).
memcpy(target, cp, op_end - cp); // "target" is incremented and string-terminated later below.
// Set "deref" properly for the loop to resume processing at the item after this double deref.
// Callers of double_deref have ensured that deref!=NULL and deref->marker!=NULL (because it
// doesn't make sense to have a double-deref unless caller discovered the first deref that
// belongs to this double deref, such as the "i" in Array%i%).
for (deref_start = deref, ++deref; deref->marker && deref->marker < op_end; ++deref);
derefs_in_this_double = (int)(deref - deref_start);
--deref; // Compensate for the outer loop's ++deref.
// There's insufficient room to shoehorn all the necessary data into the token (since circuit_token probably
// can't be safely overloaded at this stage), so allocate a little bit of stack memory, just enough for the
// number of derefs (variables) whose contents comprise the name of this double-deref variable (typically
// there's only one; e.g. the "i" in Array%i%).
deref_alloca = (DerefType *)_alloca((derefs_in_this_double + 1) * sizeof(DerefType)); // Provides one extra at the end as a terminator.
memcpy(deref_alloca, deref_start, derefs_in_this_double * sizeof(DerefType));
deref_alloca[derefs_in_this_double].marker = NULL; // Put a NULL in the last item, which terminates the array.
for (deref_start = deref_alloca; deref_start->marker; ++deref_start)
deref_start->marker = target + (deref_start->marker - cp); // Point each to its position in the *new* buf.
infix[infix_count].var = (Var *)deref_alloca; // Postfix evaluation uses this to build the variable's name dynamically.
target += op_end - cp; // Must be done only after the above, since it uses the old value of target.
if (*op_end == '(') // i.e. dynamic function call
{
if (infix_count > MAX_TOKENS - 2) // No room for the following symbol to be added (plus the ++infix that will be done by the outer loop).
goto abnormal_end;
++infix_count;
// As a result of a prior loop, deref_start = the null-marker deref which terminates the deref list.
deref_start->is_function = true;
// param_count was set when the derefs were parsed.
deref_start->param_count = deref_alloca->param_count;
infix[infix_count].symbol = SYM_FUNC;
infix[infix_count].deref = deref_start;
// postfix processing of SYM_DYNAMIC will update deref->func before SYM_FUNC is processed.
}
else
deref_start->is_function = false;
*target++ = '\0'; // Terminate the name, which looks something like "Array%i%".
cp = op_end; // Must be done only after above is done using cp: Set things up for the next iteration.
// The outer loop will now do ++infix for us.
} // For each deref in this expression, and also for the final literal/raw text to the right of the last deref.
// Terminate the array with a special item. This allows infix-to-postfix conversion to do a faster
// traversal of the infix array.
if (infix_count > MAX_TOKENS - 1) // No room for the following symbol to be added.
goto abnormal_end;
infix[infix_count].symbol = SYM_INVALID;
////////////////////////////
// CONVERT INFIX TO POSTFIX.
////////////////////////////
#define STACK_PUSH(token_ptr) stack[stack_count++] = token_ptr
#define STACK_POP stack[--stack_count] // To be used as the r-value for an assignment.
// SYM_BEGIN is the first item to go on the stack. It's a flag to indicate that conversion to postfix has begun:
ExprTokenType token_begin;
token_begin.symbol = SYM_BEGIN;
STACK_PUSH(&token_begin);
this_infix = infix;
functions_on_stack = 0;
for (;;) // While SYM_BEGIN is still on the stack, continue iterating.
{
ExprTokenType *&this_postfix = postfix[postfix_count]; // Resolve early, especially for use by "goto". Reduces code size a bit, though it doesn't measurably help performance.
infix_symbol = this_infix->symbol;
stack_symbol = stack[stack_count - 1]->symbol; // Frequently used, so resolve only once to help performance.
// Put operands into the postfix array immediately, then move on to the next infix item:
if (IS_OPERAND(infix_symbol)) // At this stage, operands consist of only SYM_OPERAND and SYM_STRING.
{
if (infix_symbol == SYM_DYNAMIC && SYM_DYNAMIC_IS_VAR_NORMAL_OR_CLIP(this_infix)) // Ordered for short-circuit performance.
{
// v1.0.46.01: If an environment variable is being used as an lvalue -- regardless
// of whether that variable is blank in the environment -- treat it as a normal
// variable instead. This is because most people would want that, and also because
// it's traditional not to directly support assignments to environment variables
// (only EnvSet can do that, mostly for code simplicity). In addition, things like
// EnvVar.="string" and EnvVar+=2 aren't supported due to obscurity/rarity (instead,
// such expressions treat EnvVar as blank). In light of all this, convert environment
// variables that are targets of ANY assignments into normal variables so that they
// can be seen as valid lvalues when the time comes to do the assignment.
// IMPORTANT: VAR_CLIPBOARD is made into SYM_VAR here, but only for assignments.
// This allows built-in functions and other places in the code to treat SYM_VAR
// as though it's always VAR_NORMAL, which reduces code size and improves maintainability.
sym_prev = this_infix[1].symbol; // Resolve to help macro's code size and performance.
if (IS_ASSIGNMENT_OR_POST_OP(sym_prev) // Post-op must be checked for VAR_CLIPBOARD (by contrast, it seems unnecessary to check it for others; see comments below).
|| stack_symbol == SYM_PRE_INCREMENT || stack_symbol == SYM_PRE_DECREMENT) // Stack *not* infix.
this_infix->symbol = SYM_VAR; // Convert clipboard or environment variable into SYM_VAR.
// POST-INC/DEC: It seems unnecessary to check for these except for VAR_CLIPBOARD because
// those assignments (and indeed any assignment other than .= and :=) will have no effect
// on a SYM_DYNAMIC environment variable. This is because by definition, such
// variables have an empty Var::Contents(), and AutoHotkey v1 does not allow
// math operations on blank variables. Thus, the result of doing a math-assignment
// operation on a blank lvalue is almost the same as doing it on an invalid lvalue.
// The main difference is that with the exception of post-inc/dec, assignments
// wouldn't produce an lvalue unless we explicitly check for them all above.
// An lvalue should be produced so that the following features are consistent
// even for variables whose names happen to match those of environment variables:
// - Pass an assignment byref or take its address; e.g. &(++x).
// - Cascading assignments; e.g. (++var) += 4 (rare to be sure).
// - Possibly other lvalue behaviors that rely on SYM_VAR being present.
// Above logic might not be perfect because it doesn't check for parens such as (var):=x,
// and possibly other obscure types of assignments. However, it seems adequate given
// the rarity of such things and also because env vars are being phased out (scripts can
// use #NoEnv to avoid all such issues).
}
this_postfix = this_infix++;
this_postfix->circuit_token = NULL; // Set default. It's only ever overridden after it's in the postfix array.
++postfix_count;
continue; // Doing a goto to a hypothetical "standard_postfix_circuit_token" (in lieu of these last 3 lines) reduced performance and didn't help code size.
}
// Since above didn't "continue", the current infix symbol is not an operand, but an operator or other symbol.
switch(infix_symbol)
{
case SYM_CPAREN: // Listed first for performance. It occurs frequently while emptying the stack to search for the matching open-parenthesis.
if (stack_symbol == SYM_OPAREN) // See comments near the bottom of this case. The first open-paren on the stack must be the one that goes with this close-paren.
{
--stack_count; // Remove this open-paren from the stack, since it is now complete.
++this_infix; // Since this pair of parentheses is done, move on to the next token in the infix expression.
// There should be no danger of stack underflow in the following because SYM_BEGIN always
// exists at the bottom of the stack:
if (stack[stack_count - 1]->symbol == SYM_FUNC) // i.e. topmost item on stack is SYM_FUNC.
{
--functions_on_stack;
goto standard_pop_into_postfix; // Within the postfix list, a function-call should always immediately follow its params.
}
}
else if (stack_symbol == SYM_BEGIN) // Paren is closed without having been opened (currently impossible due to load-time balancing, but kept for completeness).
goto abnormal_end;
else // This stack item is an operator.
{
goto standard_pop_into_postfix;
// By not incrementing this_infix, the loop will continue to encounter SYM_CPAREN and thus
// continue to pop things off the stack until the corresponding OPAREN is reached.
}
break;
case SYM_FUNC:
++functions_on_stack; // This technique performs well but prevents multi-statements from being nested inside function calls (seems too obscure to worry about); e.g. fn((x:=5, y+=3), 2)
STACK_PUSH(this_infix++);
// NOW FALL INTO THE OPEN-PAREN BELOW because load-time validation has ensured that each SYM_FUNC
// is followed by a '('.
// ABOVE CASE FALLS INTO BELOW.
case SYM_OPAREN:
// Open-parentheses always go on the stack to await their matching close-parentheses.
STACK_PUSH(this_infix++);
break;
case SYM_IFF_ELSE: // i.e. this infix symbol is ':'.
if (stack_symbol == SYM_BEGIN) // ELSE with no matching IF/THEN (load-time currently doesn't validate/detect this).
goto abnormal_end; // Below relies on the above check having been done, to avoid underflow.
// Otherwise:
this_postfix = STACK_POP; // There should be no danger of stack underflow in the following because SYM_BEGIN always exists at the bottom of the stack.
if (stack_symbol == SYM_IFF_THEN) // See comments near the bottom of this case. The first found "THEN" on the stack must be the one that goes with this "ELSE".
{
this_postfix->circuit_token = this_infix; // Point this "THEN" to its "ELSE" for use by short-circuit. This simplifies short-circuit by means such as avoiding the need to take notice of nested IFF's when discarding a branch (a different stage points the IFF's condition to its "THEN").
STACK_PUSH(this_infix++); // Push the ELSE onto the stack so that its operands will go into the postfix array before it.
// Above also does ++this_infix since this ELSE found its matching IF/THEN, so it's time to move on to the next token in the infix expression.
}
else // This stack item is an operator, INCLUDING some other THEN's ELSE (all such ELSE's should be purged from the stack so that 1 ? 1 ? 2 : 3 : 4 creates postfix 112?3:?4: and not something like 112?3?4::).
{
this_postfix->circuit_token = NULL; // Set default. It's only ever overridden after it's in the postfix array.
// By not incrementing this_infix, the loop will continue to encounter SYM_IFF_ELSE and thus
// continue to pop things off the stack until the corresponding SYM_IFF_THEN is reached.
}
++postfix_count;
break;
case SYM_INVALID:
if (stack_symbol == SYM_BEGIN) // Stack is basically empty, so stop the loop.
{
--stack_count; // Remove SYM_BEGIN from the stack, leaving the stack empty for use in postfix eval.
goto end_of_infix_to_postfix; // Both infix and stack have been fully processed, so move on to the postfix evaluation phase.
}
else if (stack_symbol == SYM_OPAREN) // Open paren is never closed (currently impossible due to load-time balancing, but kept for completeness).
goto abnormal_end;
else // Pop item off the stack, AND CONTINUE ITERATING, which will hit this line until stack is empty.
goto standard_pop_into_postfix;
// ALL PATHS ABOVE must continue or goto.
default: // This infix symbol is an operator, so act according to its precedence.
// If the symbol waiting on the stack has a lower precedence than the current symbol, push the
// current symbol onto the stack so that it will be processed sooner than the waiting one.
// Otherwise, pop waiting items off the stack (by means of this_infix not being incremented) until their
// precedence falls below the current item's precedence, or the stack is emptied.
// Note: BEGIN and OPAREN are the lowest precedence items ever to appear on the stack (CPAREN
// never goes on the stack, so can't be encountered there).
if ( sPrecedence[stack_symbol] < sPrecedence[infix_symbol] + (sPrecedence[infix_symbol] % 2) // Performance: An sPrecedence2[] array could be made in lieu of the extra add+indexing+modulo, but it benched only 0.3% faster, so the extra code size it caused didn't seem worth it.
|| IS_ASSIGNMENT_EXCEPT_POST_AND_PRE(infix_symbol) && stack_symbol != SYM_DEREF // See note 1 below. Ordered for short-circuit performance.
|| stack_symbol == SYM_POWER && (infix_symbol >= SYM_NEGATIVE && infix_symbol <= SYM_DEREF // See note 2 below. Check lower bound first for short-circuit performance.
|| infix_symbol == SYM_LOWNOT) )
{
// NOTE 1: v1.0.46: The IS_ASSIGNMENT_EXCEPT_POST_AND_PRE line above was added in conjunction with
// the new assignment operators (e.g. := and +=). Here's what it does: Normally, the assignment
// operators have the lowest precedence of all (except for commas) because things that lie
// to the *right* of them in the infix expression should be evaluated first to be stored
// as the assignment's result. However, if what lies to the *left* of the assignment
// operator isn't a valid lvalue/variable (and not even a unary like -x can produce an lvalue
// because they're not supposed to alter the contents of the variable), obeying the normal
// precedence rules would produce a syntax error due to "assigning to non-lvalue".
// So instead, apply any pending operator on the stack (which lies to the left of the lvalue
// in the infix expression) *after* the assignment by leaving it on the stack. For example,
// C++ and probably other languages (but not the older ANSI C) evaluate "true ? x:=1 : y:=1"
// as a pair of assignments rather than as who-knows-what (probably a syntax error if you
// strictly followed precedence). Similarly, C++ evaluates "true ? var1 : var2 := 3" not as
// "(true ? var1 : var2) := 3" (like ANSI C) but as "true ? var1 : (var2 := 3)". Other examples:
// -> not var:=5 ; It's evaluated contrary to precedence as: not (var:=5) [PHP does this too,
// and probably others]
// -> 5 + var+=5 ; It's evaluated contrary to precedence as: 5 + (var+=5) [not sure if other
// languages do ones like this]
// -> ++i := 5 ; Silly since increment has no lasting effect; so assign the 5 then do the pre-inc.
// -> ++i /= 5 ; Valid, but maybe too obscure and inconsistent to treat it differently than
// the others (especially since not many people will remember that unlike i++, ++i produces
// an lvalue); so divide by 5 then do the increment.
// -> i++ := 5 (and i++ /= 5) ; Postfix operator can't produce an lvalue, so do the assignment
// first and then the postfix op.
// SYM_DEREF is the only exception to the above because there's a slight chance that
// *Var:=X (evaluated strictly according to precedence as (*Var):=X) will be used someday.
// Also, SYM_FUNC seems unaffected by any of this due to its enclosing parentheses (i.e. even
// if a function-call can someday generate an lvalue [SYM_VAR], the current rules probably
// already support it).
// Performance: Adding the above behavior reduced the expressions benchmark by only 0.6%; so
// it seems worth it.
//
// NOTE 2: The SYM_POWER line above is a workaround to allow 2**-2 (and others in v1.0.45) to be
// evaluated as 2**(-2) rather than being seen as an error. However, as of v1.0.46, consecutive
// unary operators are supported via the right-to-left evaluation flag above (formerly, consecutive
// unaries produced a failure [blank value]). For example:
// !!x ; Useful to convert a blank value into a zero for use with uninitialized variables.
// not not x ; Same as above.
// Other examples: !-x, -!x, !&x, -&Var, ~&Var
// And these deref ones (which worked even before v1.0.46 by different means: giving
// '*' a higher precedence than the other unaries): !*Var, -*Var and ~*Var
// !x ; Supported even if X contains a negative number, since x is recognized as an isolated operand and not something containing unary minus.
//
// To facilitate short-circuit boolean evaluation, right before an AND/OR/IFF is pushed onto the
// stack, point the end of its left branch to it. Note that the following postfix token
// can itself be of type AND/OR/IFF, a simple example of which is "if (true and true and true)",
// in which the first and's parent (in an imaginary tree) is the second "and".
// But how is it certain that this is the final operator or operand of an AND/OR/IFF's left branch?
// Here is the explanation:
// Everything higher precedence than the AND/OR/IFF came off the stack right before it, resulting in
// what must be a balanced/complete sub-postfix-expression in and of itself (unless the expression
// has a syntax error, which is caught in various places). Because it's complete, during the
// postfix evaluation phase, that sub-expression will result in a new operand for the stack,
// which must then be the left side of the AND/OR/IFF because the right side immediately follows it
// within the postfix array, which in turn is immediately followed by its operator (namely AND/OR/IFF).
// Also, the final result of an IFF's condition-branch must point to the IFF/THEN symbol itself
// because that's the means by which the condition is merely "checked" rather than becoming an
// operand itself.
if (infix_symbol <= SYM_AND && infix_symbol >= SYM_IFF_THEN && postfix_count) // Check upper bound first for short-circuit performance.
postfix[postfix_count - 1]->circuit_token = this_infix; // In the case of IFF, this points the final result of the IFF's condition to its SYM_IFF_THEN (a different stage points the THEN to its ELSE).
if (infix_symbol != SYM_COMMA)
STACK_PUSH(this_infix);
else // infix_symbol == SYM_COMMA, but which type of comma (function vs. statement-separator).
{
// KNOWN LIMITATION: Although the functions_on_stack method is simple and efficient, it isn't
// capable of detecting commas that separate statements inside a function call such as:
// fn(x, (y:=2, fn2()))
// Thus, such attempts will cause the expression as a whole to fail and evaluate to ""
// (though individual parts of the expression may execute before it fails).
// C++ and possibly other C-like languages seem to allow such expressions as shown by the
// following simple example: MsgBox((1, 2)); // In which MsgBox sees (1, 2) as a single arg.
// Perhaps this could be solved someday by checking/tracking whether there is a non-function
// open-paren on the stack above/prior to the first function-call-open-paren on the stack.
// That rule seems flexible enough to work even for things like f1((f2(), X)). Perhaps a
// simple stack traversal could be done to find the first OPAREN. If it's a function's OPAREN,
// this is a function-comma. Otherwise, this comma is a statement-separator nested inside a
// function call. But the performance impact of that doesn't seem worth it given rarity of use.
if (!functions_on_stack) // This comma separates statements rather than function parameters.
{
STACK_PUSH(this_infix);
// v1.0.46.01: Treat ", var = expr" as though the "=" is ":=", even if there's a ternary
// on the right side (for consistency and since such a ternary would be stand-alone,
// which is a rare use for ternary). Also cascade to the right to treat things like
// x=y=z as assignments because its intuitiveness seems to outweigh other considerations.
// In a future version, these transformations could be done at loadtime to improve runtime
// performance; but currently that seems more complex than it's worth (and loadtime
// performance and code size shouldn't be entirely ignored).
for (fwd_infix = this_infix + 1;; fwd_infix += 2)
{
// The following is checked first to simplify things and avoid any chance of reading
// beyond the last item in the array. This relies on the fact that a SYM_INVALID token
// exists at the end of the array as a terminator.
if (fwd_infix->symbol == SYM_INVALID || fwd_infix[1].symbol != SYM_EQUAL) // Relies on short-circuit boolean order.
break; // No further checking needed because there's no qualified equal-sign.
// Otherwise, check what lies to the left of the equal-sign.
if (fwd_infix->symbol == SYM_VAR)
{
fwd_infix[1].symbol = SYM_ASSIGN;
continue; // Cascade to the right until the last qualified '=' operator is found.