<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.15.2: http://docutils.sourceforge.net/" />
<title>Intel® ISPC User's Guide</title>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-1486404-4']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<link rel="stylesheet" href="css/style.css" type="text/css" />
</head>
<body>
<div class="document" id="intel-ispc-user-s-guide">
<div id="wrap">
<div id="wrap2">
<div id="header">
<h1 id="logo">Intel® Implicit SPMD Program Compiler</h1>
<div id="slogan">An open-source compiler for high-performance SIMD programming on
the CPU</div>
</div>
<div id="nav">
<div id="nbar">
<ul>
<li><a href="index.html">Overview</a></li>
<li><a href="features.html">Features</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li id="selected"><a href="documentation.html">Documentation</a></li>
<li><a href="perf.html">Performance</a></li>
<li><a href="contrib.html">Contributors</a></li>
</ul>
</div>
</div>
<div id="content-wrap">
<div id="sidebar">
<div class="widgetspace">
<h1>Resources</h1>
<ul class="menu">
<li><a href="http://github.com/ispc/ispc/">ispc page on github</a></li>
<li><a href="http://groups.google.com/group/ispc-users/">ispc
users mailing list</a></li>
<li><a href="http://groups.google.com/group/ispc-dev/">ispc
developers mailing list</a></li>
<li><a href="http://github.com/ispc/ispc/wiki/">Wiki</a></li>
<li><a href="http://github.com/ispc/ispc/issues/">Bug tracking</a></li>
</ul>
</div>
</div>
<h1 class="title">Intel® ISPC User's Guide</h1>
<div id="content">
<p>The Intel® Implicit SPMD Program Compiler (Intel® ISPC) is a compiler for
writing SPMD (single program multiple data) programs to run on the CPU.
The SPMD
programming approach is widely known to graphics and GPGPU programmers; it
is used for GPU shaders and CUDA* and OpenCL* kernels, for example. The
main idea behind SPMD is that one writes programs as if they were operating
on a single data element (a pixel for a pixel shader, for example), but
then the underlying hardware and runtime system executes multiple
invocations of the program in parallel with different inputs (the values
for different pixels, for example).</p>
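<p>As a rough illustration of this idea, the following C++ sketch emulates the model serially. It is not <tt class="docutils literal">ispc</tt> code: the names <tt class="docutils literal">brighten</tt>, <tt class="docutils literal">run_gang</tt>, and <tt class="docutils literal">kGangSize</tt> are hypothetical, and the plain loop stands in for what the hardware and runtime would execute in parallel.</p>

```cpp
#include <array>
#include <cstddef>

// Hypothetical number of program invocations that run together;
// the real value depends on the compilation target.
constexpr std::size_t kGangSize = 8;

// The "program": written as if it operated on a single data element.
float brighten(float pixel) {
    return pixel * 2.0f + 1.0f;
}

// Stand-in for the runtime: one invocation of the program per input
// element. A serial loop is used here only to show the mapping.
std::array<float, kGangSize> run_gang(const std::array<float, kGangSize>& in) {
    std::array<float, kGangSize> out{};
    for (std::size_t lane = 0; lane < kGangSize; ++lane) {
        out[lane] = brighten(in[lane]);
    }
    return out;
}
```

<p>The point of the sketch is the division of labor: the per-element function contains no loop over the data, and the mapping of invocations to inputs lives outside it.</p>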
<p>The main goals behind <tt class="docutils literal">ispc</tt> are to:</p>
<ul class="simple">
<li>Build a variant of the C programming language that delivers good
performance to performance-oriented programmers who want to run SPMD
programs on CPUs.</li>
<li>Provide a thin abstraction layer between the programmer and the
hardware--in particular, to follow the lesson from C for serial programs
of having an execution and data model where the programmer can cleanly
reason about the mapping of their source program to compiled assembly
language and the underlying hardware.</li>
<li>Harness the computational power of the Single Instruction, Multiple Data (SIMD) vector
units without the very low programmer productivity of writing
intrinsics directly.</li>
<li>Explore opportunities from close-coupling between C/C++ application code
and SPMD <tt class="docutils literal">ispc</tt> code running on the same processor--lightweight function
calls between the two languages, sharing data directly via pointers without
copying or reformatting, etc.</li>
</ul>
<p><strong>We are very interested in your feedback and comments about ispc and
in hearing your experiences using the system. We are especially interested
in hearing if you try using ispc but see results that are not as you
were expecting or hoping for.</strong> We encourage you to send a note with your
experiences or comments to the <a class="reference external" href="http://groups.google.com/group/ispc-users">ispc-users</a> mailing list or to file bug reports or
feature requests with the <tt class="docutils literal">ispc</tt> <a class="reference external" href="https://github.com/ispc/ispc/issues?state=open">bug tracker</a>. (Thanks!)</p>
<p>Contents:</p>
<ul class="simple">
<li><a class="reference internal" href="#recent-changes-to-ispc">Recent Changes to ISPC</a><ul>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-1">Updating ISPC Programs For Changes In ISPC 1.1</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-2">Updating ISPC Programs For Changes In ISPC 1.2</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-3">Updating ISPC Programs For Changes In ISPC 1.3</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-5-0">Updating ISPC Programs For Changes In ISPC 1.5.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-6-0">Updating ISPC Programs For Changes In ISPC 1.6.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-7-0">Updating ISPC Programs For Changes In ISPC 1.7.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-8-2">Updating ISPC Programs For Changes In ISPC 1.8.2</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-9-0">Updating ISPC Programs For Changes In ISPC 1.9.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-9-1">Updating ISPC Programs For Changes In ISPC 1.9.1</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-9-2">Updating ISPC Programs For Changes In ISPC 1.9.2</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-10-0">Updating ISPC Programs For Changes In ISPC 1.10.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-11-0">Updating ISPC Programs For Changes In ISPC 1.11.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-12-0">Updating ISPC Programs For Changes In ISPC 1.12.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-13-0">Updating ISPC Programs For Changes In ISPC 1.13.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-14-0">Updating ISPC Programs For Changes In ISPC 1.14.0</a></li>
</ul>
</li>
<li><a class="reference internal" href="#getting-started-with-ispc">Getting Started with ISPC</a><ul>
<li><a class="reference internal" href="#installing-ispc">Installing ISPC</a></li>
<li><a class="reference internal" href="#compiling-and-running-a-simple-ispc-program">Compiling and Running a Simple ISPC Program</a></li>
</ul>
</li>
<li><a class="reference internal" href="#using-the-ispc-compiler">Using The ISPC Compiler</a><ul>
<li><a class="reference internal" href="#basic-command-line-options">Basic Command-line Options</a></li>
<li><a class="reference internal" href="#selecting-the-compilation-target">Selecting The Compilation Target</a></li>
<li><a class="reference internal" href="#selecting-32-or-64-bit-addressing">Selecting 32 or 64 Bit Addressing</a></li>
<li><a class="reference internal" href="#the-preprocessor">The Preprocessor</a></li>
<li><a class="reference internal" href="#debugging">Debugging</a></li>
<li><a class="reference internal" href="#other-ways-of-passing-arguments-to-ispc">Other ways of passing arguments to ISPC</a></li>
</ul>
</li>
<li><a class="reference internal" href="#the-ispc-parallel-execution-model">The ISPC Parallel Execution Model</a><ul>
<li><a class="reference internal" href="#basic-concepts-program-instances-and-gangs-of-program-instances">Basic Concepts: Program Instances and Gangs of Program Instances</a></li>
<li><a class="reference internal" href="#control-flow-within-a-gang">Control Flow Within A Gang</a><ul>
<li><a class="reference internal" href="#control-flow-example-if-statements">Control Flow Example: If Statements</a></li>
<li><a class="reference internal" href="#control-flow-example-loops">Control Flow Example: Loops</a></li>
<li><a class="reference internal" href="#gang-convergence-guarantees">Gang Convergence Guarantees</a></li>
</ul>
</li>
<li><a class="reference internal" href="#uniform-data">Uniform Data</a><ul>
<li><a class="reference internal" href="#uniform-control-flow">Uniform Control Flow</a></li>
<li><a class="reference internal" href="#uniform-variables-and-varying-control-flow">Uniform Variables and Varying Control Flow</a></li>
</ul>
</li>
<li><a class="reference internal" href="#data-races-within-a-gang">Data Races Within a Gang</a></li>
<li><a class="reference internal" href="#tasking-model">Tasking Model</a></li>
</ul>
</li>
<li><a class="reference internal" href="#the-ispc-language">The ISPC Language</a><ul>
<li><a class="reference internal" href="#relationship-to-the-c-programming-language">Relationship To The C Programming Language</a></li>
<li><a class="reference internal" href="#lexical-structure">Lexical Structure</a></li>
<li><a class="reference internal" href="#types">Types</a><ul>
<li><a class="reference internal" href="#basic-types-and-type-qualifiers">Basic Types and Type Qualifiers</a></li>
<li><a class="reference internal" href="#uniform-and-varying-qualifiers">"uniform" and "varying" Qualifiers</a></li>
<li><a class="reference internal" href="#defining-new-names-for-types">Defining New Names For Types</a></li>
<li><a class="reference internal" href="#pointer-types">Pointer Types</a></li>
<li><a class="reference internal" href="#function-pointer-types">Function Pointer Types</a></li>
<li><a class="reference internal" href="#reference-types">Reference Types</a></li>
<li><a class="reference internal" href="#enumeration-types">Enumeration Types</a></li>
<li><a class="reference internal" href="#short-vector-types">Short Vector Types</a></li>
<li><a class="reference internal" href="#array-types">Array Types</a></li>
<li><a class="reference internal" href="#struct-types">Struct Types</a><ul>
<li><a class="reference internal" href="#operators-overloading">Operators Overloading</a></li>
</ul>
</li>
<li><a class="reference internal" href="#structure-of-array-types">Structure of Array Types</a></li>
</ul>
</li>
<li><a class="reference internal" href="#declarations-and-initializers">Declarations and Initializers</a></li>
<li><a class="reference internal" href="#expressions">Expressions</a><ul>
<li><a class="reference internal" href="#dynamic-memory-allocation">Dynamic Memory Allocation</a></li>
</ul>
</li>
<li><a class="reference internal" href="#control-flow">Control Flow</a><ul>
<li><a class="reference internal" href="#conditional-statements-if">Conditional Statements: "if"</a></li>
<li><a class="reference internal" href="#conditional-statements-switch">Conditional Statements: "switch"</a></li>
<li><a class="reference internal" href="#iteration-statements">Iteration Statements</a><ul>
<li><a class="reference internal" href="#basic-iteration-statements-for-while-and-do">Basic Iteration Statements: "for", "while", and "do"</a></li>
<li><a class="reference internal" href="#iteration-over-active-program-instances-foreach-active">Iteration over active program instances: "foreach_active"</a></li>
<li><a class="reference internal" href="#iteration-over-unique-elements-foreach-unique">Iteration over unique elements: "foreach_unique"</a></li>
<li><a class="reference internal" href="#parallel-iteration-statements-foreach-and-foreach-tiled">Parallel Iteration Statements: "foreach" and "foreach_tiled"</a></li>
<li><a class="reference internal" href="#parallel-iteration-with-programindex-and-programcount">Parallel Iteration with "programIndex" and "programCount"</a></li>
</ul>
</li>
<li><a class="reference internal" href="#unstructured-control-flow-goto">Unstructured Control Flow: "goto"</a></li>
<li><a class="reference internal" href="#coherent-control-flow-statements-cif-and-friends">"Coherent" Control Flow Statements: "cif" and Friends</a></li>
<li><a class="reference internal" href="#functions-and-function-calls">Functions and Function Calls</a><ul>
<li><a class="reference internal" href="#function-overloading">Function Overloading</a></li>
</ul>
</li>
<li><a class="reference internal" href="#re-establishing-the-execution-mask">Re-establishing The Execution Mask</a></li>
<li><a class="reference internal" href="#task-parallel-execution">Task Parallel Execution</a><ul>
<li><a class="reference internal" href="#task-parallelism-launch-and-sync-statements">Task Parallelism: "launch" and "sync" Statements</a></li>
<li><a class="reference internal" href="#task-parallelism-runtime-requirements">Task Parallelism: Runtime Requirements</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#the-ispc-standard-library">The ISPC Standard Library</a><ul>
<li><a class="reference internal" href="#basic-operations-on-data">Basic Operations On Data</a><ul>
<li><a class="reference internal" href="#logical-and-selection-operations">Logical and Selection Operations</a></li>
<li><a class="reference internal" href="#bit-operations">Bit Operations</a></li>
</ul>
</li>
<li><a class="reference internal" href="#math-functions">Math Functions</a><ul>
<li><a class="reference internal" href="#basic-math-functions">Basic Math Functions</a></li>
<li><a class="reference internal" href="#transcendental-functions">Transcendental Functions</a></li>
<li><a class="reference internal" href="#pseudo-random-numbers">Pseudo-Random Numbers</a></li>
<li><a class="reference internal" href="#random-numbers">Random Numbers</a></li>
</ul>
</li>
<li><a class="reference internal" href="#output-functions">Output Functions</a></li>
<li><a class="reference internal" href="#assertions">Assertions</a></li>
<li><a class="reference internal" href="#cross-program-instance-operations">Cross-Program Instance Operations</a><ul>
<li><a class="reference internal" href="#reductions">Reductions</a></li>
</ul>
</li>
<li><a class="reference internal" href="#data-movement">Data Movement</a><ul>
<li><a class="reference internal" href="#setting-and-copying-values-in-memory">Setting and Copying Values In Memory</a></li>
<li><a class="reference internal" href="#packed-load-and-store-operations">Packed Load and Store Operations</a></li>
<li><a class="reference internal" href="#streaming-load-and-store-operations">Streaming Load and Store Operations</a></li>
</ul>
</li>
<li><a class="reference internal" href="#data-conversions">Data Conversions</a><ul>
<li><a class="reference internal" href="#converting-between-array-of-structures-and-structure-of-arrays-layout">Converting Between Array-of-Structures and Structure-of-Arrays Layout</a></li>
<li><a class="reference internal" href="#conversions-to-and-from-half-precision-floats">Conversions To and From Half-Precision Floats</a></li>
<li><a class="reference internal" href="#converting-to-srgb8">Converting to sRGB8</a></li>
</ul>
</li>
<li><a class="reference internal" href="#systems-programming-support">Systems Programming Support</a><ul>
<li><a class="reference internal" href="#atomic-operations-and-memory-fences">Atomic Operations and Memory Fences</a></li>
<li><a class="reference internal" href="#prefetches">Prefetches</a></li>
<li><a class="reference internal" href="#system-information">System Information</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#interoperability-with-the-application">Interoperability with the Application</a><ul>
<li><a class="reference internal" href="#interoperability-overview">Interoperability Overview</a></li>
<li><a class="reference internal" href="#data-layout">Data Layout</a></li>
<li><a class="reference internal" href="#data-alignment-and-aliasing">Data Alignment and Aliasing</a></li>
<li><a class="reference internal" href="#restructuring-existing-programs-to-use-ispc">Restructuring Existing Programs to Use ISPC</a></li>
</ul>
</li>
<li><a class="reference internal" href="#notices-disclaimers">Notices &amp; Disclaimers</a></li>
<li><a class="reference internal" href="#optimization-notice">Optimization Notice</a></li>
</ul>
<div class="section" id="recent-changes-to-ispc">
<h1>Recent Changes to ISPC</h1>
<p>See the file <a class="reference external" href="https://raw.github.com/ispc/ispc/master/docs/ReleaseNotes.txt">ReleaseNotes.txt</a> in the <tt class="docutils literal">ispc</tt> distribution for a list
of recent changes to the compiler.</p>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-1">
<h2>Updating ISPC Programs For Changes In ISPC 1.1</h2>
<p>The major changes introduced in the 1.1 release of <tt class="docutils literal">ispc</tt> are first-class
support for pointers in the language and new parallel loop constructs.
Adding this functionality required a number of syntactic changes to the
language. These changes should generally lead to straightforward minor
modifications of existing <tt class="docutils literal">ispc</tt> programs.</p>
<p>These are the relevant changes to the language:</p>
<ul class="simple">
<li>The syntax for reference types has been changed to match C++'s syntax for
references and the <tt class="docutils literal">reference</tt> keyword has been removed. (A diagnostic
message is issued if <tt class="docutils literal">reference</tt> is used.)<ul>
<li>Declarations like <tt class="docutils literal">reference float foo</tt> should be changed to <tt class="docutils literal">float &amp;foo</tt>.</li>
<li>Any array parameter in a function declaration with a <tt class="docutils literal">reference</tt>
qualifier should just have <tt class="docutils literal">reference</tt> removed: <tt class="docutils literal">void foo(reference
float <span class="pre">bar[])</span></tt> becomes <tt class="docutils literal">void foo(float <span class="pre">bar[])</span></tt>.</li>
</ul>
</li>
<li>It is now a compile-time error to assign an entire array to another
array.</li>
<li>A number of standard library routines have been updated to take
pointer-typed parameters, rather than references or arrays and index
offsets, as appropriate. For example, the <tt class="docutils literal">atomic_add_global()</tt>
function previously took a reference to the variable to be updated
atomically but now takes a pointer. In a similar fashion,
<tt class="docutils literal">packed_store_active()</tt> takes a pointer to a <tt class="docutils literal">uniform unsigned int</tt>
as its first parameter rather than taking a <tt class="docutils literal">uniform unsigned int[]</tt> as
its first parameter and a <tt class="docutils literal">uniform int</tt> offset as its second parameter.</li>
<li>It is no longer legal to pass a varying lvalue to a function that takes a
reference parameter; references can only be to uniform lvalue types. In
this case, the function should be rewritten to take a varying pointer
parameter.</li>
<li>There are new iteration constructs for looping over computation domains,
<tt class="docutils literal">foreach</tt> and <tt class="docutils literal">foreach_tiled</tt>. In addition to being syntactically
cleaner than regular <tt class="docutils literal">for</tt> loops, these can provide performance
benefits in many cases when iterating over data and mapping it to program
instances. See the Section <a class="reference internal" href="#parallel-iteration-statements-foreach-and-foreach-tiled">Parallel Iteration Statements: "foreach" and
"foreach_tiled"</a> for more information about these.</li>
</ul>
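<p>Because the new reference syntax follows C++, the difference between a reference parameter and a pointer parameter can be sketched in plain C++. This is only an analogy under that assumption: the function names are hypothetical, and C++ has no notion of uniform or varying values.</p>

```cpp
// Reference parameter, spelled the C++ way that ispc 1.1 adopted:
// the callee updates the caller's variable directly.
void add_one_ref(float &foo) {
    foo += 1.0f;
}

// Pointer parameter: the caller passes an address explicitly. This is
// the shape the updated standard library routines take, and the shape
// suggested for functions that previously took a reference to a
// varying lvalue.
void add_one_ptr(float *foo) {
    *foo += 1.0f;
}
```

<p>At the call site the two differ only in how the argument is written: <tt class="docutils literal">add_one_ref(x)</tt> versus passing the address of <tt class="docutils literal">x</tt> to <tt class="docutils literal">add_one_ptr</tt>.</p>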
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-2">
<h2>Updating ISPC Programs For Changes In ISPC 1.2</h2>
<p>The following changes were made to the language syntax and semantics for
the <tt class="docutils literal">ispc</tt> 1.2 release:</p>
<ul class="simple">
<li>Syntax for the "launch" keyword has been cleaned up; it is no longer
necessary to bracket the launched function call with angle brackets. (In
other words, now use <tt class="docutils literal">launch <span class="pre">foo();</span></tt>, rather than <tt class="docutils literal">launch &lt; foo() &gt;;</tt>.)</li>
<li>When using pointers, the pointed-to data type is now "uniform" by
default. Use the varying keyword to specify varying pointed-to types
when needed. (i.e. <tt class="docutils literal">float *ptr</tt> is a varying pointer to uniform float
data, whereas previously it was a varying pointer to varying float
values.) Use <tt class="docutils literal">varying float *</tt> to specify a varying pointer to varying
float data, and so forth.</li>
<li>The details of "uniform" and "varying" and how they interact with struct
types have been cleaned up. Now, when a struct type is declared, if the
struct elements don't have explicit "uniform" or "varying" qualifiers,
they are said to have "unbound" variability. When a struct type is
instantiated, any unbound variability elements inherit the variability of
the parent struct type. See <a class="reference internal" href="#struct-types">Struct Types</a> for more details.</li>
<li><tt class="docutils literal">ispc</tt> has a new language feature that makes it much easier to use the
efficient "(array of) structure of arrays" (AoSoA, or SoA) memory layout
of data. A new <tt class="docutils literal">soa&lt;n&gt;</tt> qualifier can be applied to structure types to
specify an n-wide SoA version of the corresponding type. Array indexing
and pointer operations on arrays of SoA types automatically handle the
two-stage indexing calculation to access the data. See <a class="reference internal" href="#structure-of-array-types">Structure of
Array Types</a> for more details.</li>
</ul>
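<p>The two-stage indexing calculation mentioned for SoA types can be written out by hand. The C++ sketch below assumes an 8-wide layout, roughly what <tt class="docutils literal">soa&lt;8&gt;</tt> would describe for a three-float struct; the type and function names are hypothetical, and the code shows the indexing arithmetic only, not the layout <tt class="docutils literal">ispc</tt> is guaranteed to generate.</p>

```cpp
#include <cstddef>
#include <vector>

// Plays the role of n in the soa qualifier; 8 is an arbitrary choice.
constexpr std::size_t kSoaWidth = 8;

// Ordinary (array-of-structures) element.
struct Point {
    float x, y, z;
};

// Hand-written 8-wide SoA block: each member becomes an array of
// kSoaWidth values, so eight x values sit next to each other in memory.
struct PointSoa8 {
    float x[kSoaWidth];
    float y[kSoaWidth];
    float z[kSoaWidth];
};

// Stage 1 selects the block (i / width); stage 2 selects the lane
// inside that block (i % width).
Point load_point(const std::vector<PointSoa8>& pts, std::size_t i) {
    const PointSoa8& block = pts[i / kSoaWidth];
    const std::size_t lane = i % kSoaWidth;
    return Point{block.x[lane], block.y[lane], block.z[lane]};
}

void store_point(std::vector<PointSoa8>& pts, std::size_t i, const Point& p) {
    PointSoa8& block = pts[i / kSoaWidth];
    const std::size_t lane = i % kSoaWidth;
    block.x[lane] = p.x;
    block.y[lane] = p.y;
    block.z[lane] = p.z;
}
```

<p>With this layout, logical element 11 lives in block 1 at lane 3. The language feature exists so that this arithmetic does not have to be written at every access.</p>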
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-3">
<h2>Updating ISPC Programs For Changes In ISPC 1.3</h2>
<p>This release adds a number of new iteration constructs, which in turn use
new reserved words: <tt class="docutils literal">unmasked</tt>, <tt class="docutils literal">foreach_unique</tt>, <tt class="docutils literal">foreach_active</tt>,
and <tt class="docutils literal">in</tt>. Any program that happens to have a variable or function with
one of these names must be modified to rename that symbol.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-5-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.5.0</h2>
<p>This release adds support for double precision floating point constants.
A double precision floating point constant is a floating point number with a
<tt class="docutils literal">d</tt> suffix and an optional exponent part. Here are some examples: 3.14d,
31.4d-1, 1.d, 1.0d, 1d-2. Note that a floating point number without a suffix is
treated as a single precision constant.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-6-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.6.0</h2>
<p>This release adds support for <a class="reference internal" href="#operators-overloading">Operators Overloading</a>, so the word <tt class="docutils literal">operator</tt>
is now a keyword and may conflict with existing user
functions. A new library function, packed_store_active2(), was also introduced
and may likewise conflict with existing user functions.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-7-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.7.0</h2>
<p>This release contains several changes that may affect compatibility with
older versions:</p>
<ul class="simple">
<li>The algorithm for selecting overloaded functions was extended to cover more
types of overloading, and handling of reference types was fixed. At the same
time the old scheme, which blindly used the function with "the best score"
summed for all arguments, was switched to the C++ approach, which requires
"the best score" for each argument. If the best function doesn't exist, a
warning is issued in this version. It will be turned into an error in the
next version. A simple example: Suppose we have two functions: max(int, int)
and max(unsigned int, unsigned int). The new rules lead to an error when
calling max(int, unsigned int), as the best choice is ambiguous.</li>
<li>Implicit casts from a pointer to a const type to void* are no longer allowed. Use an
explicit cast if needed.</li>
<li>A bug which prevented "const" qualifiers from appearing in emitted .h files
was fixed. Consequently, "const" qualifiers that now properly appear in emitted
.h files may cause compile errors in pre-existing code.</li>
<li>get_ProgramCount() was moved from stdlib to the examples/util/util.isph file. You
need to include this file to be able to use this function.</li>
</ul>
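<p>Since the new overload selection follows the C++ approach, the ambiguity in the example above can be reproduced in C++ itself. The sketch below uses the hypothetical name <tt class="docutils literal">pick_max</tt> in place of <tt class="docutils literal">max</tt> and shows how an explicit cast on one argument selects an overload; it illustrates the C++ rule and is not a statement about every detail of the <tt class="docutils literal">ispc</tt> implementation.</p>

```cpp
// Hypothetical stand-ins for the max(int, int) and
// max(unsigned int, unsigned int) overloads described in the text.
int pick_max(int a, int b) { return a > b ? a : b; }
unsigned int pick_max(unsigned int a, unsigned int b) { return a > b ? a : b; }

// A direct call pick_max(x, y) with an int and an unsigned int is
// ambiguous in C++: each overload is the best match for exactly one
// argument, so neither is best for every argument.
// An explicit cast on one argument removes the ambiguity.
int max_as_signed(int x, unsigned int y) {
    return pick_max(x, static_cast<int>(y));
}

unsigned int max_as_unsigned(int x, unsigned int y) {
    return pick_max(static_cast<unsigned int>(x), y);
}
```

<p>The choice of cast matters: for the inputs -1 and 1, the signed version returns 1, while the unsigned version returns the large value that -1 converts to.</p>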
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-8-2">
<h2>Updating ISPC Programs For Changes In ISPC 1.8.2</h2>
<p>This release contains no language changes that may affect compatibility with
older versions. However, you may want to be aware of the following:</p>
<ul class="simple">
<li>Mangling of uniform types was changed to not include varying width, so now you
may use uniform structures and pointers to uniform types as return types in
export functions in multi-target compilation.</li>
</ul>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-9-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.9.0</h2>
<p>This release contains no language changes that may affect compatibility with
older versions. It introduces a new AVX512 target: avx512knl-i32x16.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-9-1">
<h2>Updating ISPC Programs For Changes In ISPC 1.9.1</h2>
<p>This release contains no language changes that may affect compatibility with
older versions. It introduces a new AVX512 target: avx512skx-i32x16.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-9-2">
<h2>Updating ISPC Programs For Changes In ISPC 1.9.2</h2>
<p>This release contains no language changes that may affect compatibility with
older versions.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-10-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.10.0</h2>
<p>This release adds several new language features that do not affect compatibility:
new streaming stores, aos_to_soa/soa_to_aos intrinsics for 64-bit types,
and a "#pragma ignore".</p>
<p>One change may affect compatibility: the size of short vector
types has changed. If you use short vector types for data passed between C/C++ and ISPC, you
may want to pay attention to it.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-11-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.11.0</h2>
<p>This release redefines the -O1 compiler option to optimize for size, so you may need to
adjust your build system accordingly.</p>
<p>Starting with version 1.11.0, auto-generated headers use <tt class="docutils literal">#pragma once</tt>. In the unlikely
case that your C/C++ compiler does not support it, use the <tt class="docutils literal"><span class="pre">--no-pragma-once</span></tt>
<tt class="docutils literal">ispc</tt> switch.</p>
<p>This release also introduces a new AVX512 target, avx512skx-i32x8. It produces code
that doesn't use ZMM registers.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-12-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.12.0</h2>
<p>This release contains the following changes that may affect compatibility with
older versions:</p>
<ul class="simple">
<li><tt class="docutils literal">noinline</tt> keyword was added.</li>
<li>Standard library functions <tt class="docutils literal">rsqrt_fast()</tt> and <tt class="docutils literal">rcp_fast()</tt> were added.</li>
<li>AVX1.1 (IvyBridge) targets and generic KNC and KNL targets were removed.
Note that KNL is still supported through avx512knl-i32x16.</li>
</ul>
<p>The release also introduces static initialization for varying variables, which
should not affect compatibility.</p>
<p>This release introduces experimental cross OS compilation support and ARM/AARCH64
support. It also contains a new 128-bit AVX2 target (avx2-i32x4) and a CPU
definition for Ice Lake client (--cpu=icl).</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-13-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.13.0</h2>
<p>This release contains the following changes that may affect compatibility with
older versions:</p>
<ul class="simple">
<li>The storage representation of the <tt class="docutils literal">bool</tt> type was changed from target-specific to
one byte per boolean value. So the size of <tt class="docutils literal">varying bool</tt> is the target width (in
bytes), and the size of <tt class="docutils literal">uniform bool</tt> is one. This definition is compatible
with C/C++, and hence improves interoperability.</li>
<li>Type aliases for unsigned types were added: <tt class="docutils literal">uint8</tt>, <tt class="docutils literal">uint16</tt>, <tt class="docutils literal">uint32</tt>,
<tt class="docutils literal">uint64</tt>, and <tt class="docutils literal">uint</tt>. To detect whether these types are supported, check
whether the ISPC_UINT_IS_DEFINED macro is defined; this is handy for writing code
that works with older versions of <tt class="docutils literal">ispc</tt>.</li>
<li><tt class="docutils literal">extract()</tt>/<tt class="docutils literal">insert()</tt> for boolean arguments, and <tt class="docutils literal">abs()</tt> for all integer and
FP types were added to the standard library.</li>
</ul>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-14-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.14.0</h2>
<p>This release contains the following changes that may affect compatibility with
older versions:</p>
<ul class="simple">
<li>"generic" targets were removed. Please use native targets instead.</li>
</ul>
<p>New i8 and i16 targets were introduced: avx2-i8x32, avx2-i16x16, avx512skx-i8x64,
and avx512skx-i16x32.</p>
<p>The Windows x86_64 target now supports the <tt class="docutils literal">__vectorcall</tt> calling convention.
It is off by default and can be enabled with the <tt class="docutils literal"><span class="pre">--vectorcall</span></tt> command-line switch.</p>
</div>
</div>
<div class="section" id="getting-started-with-ispc">
<h1>Getting Started with ISPC</h1>
<div class="section" id="installing-ispc">
<h2>Installing ISPC</h2>
<p>The <a class="reference external" href="http://ispc.github.io/downloads.html">ispc downloads web page</a> has prebuilt executables for Windows*,
Linux* and macOS* available for download. Alternatively, you can
download the source code from that page and build it yourself; see the
<a class="reference external" href="http://github.com/ispc/ispc/wiki">ispc wiki</a> for instructions about building <tt class="docutils literal">ispc</tt> from source.</p>
<p>Once you have an executable for your system, copy it into a directory
that's in your <tt class="docutils literal">PATH</tt>. Congratulations--you've now installed <tt class="docutils literal">ispc</tt>.</p>
</div>
<div class="section" id="compiling-and-running-a-simple-ispc-program">
<h2>Compiling and Running a Simple ISPC Program</h2>
<p>The directory <tt class="docutils literal">examples/simple</tt> in the <tt class="docutils literal">ispc</tt> distribution includes a
simple example of how to use <tt class="docutils literal">ispc</tt> with a short C++ program. See the
file <tt class="docutils literal">simple.ispc</tt> in that directory (also reproduced here.)</p>
<pre class="literal-block">
export void simple(uniform float vin[], uniform float vout[],
                   uniform int count) {
    foreach (index = 0 ... count) {
        float v = vin[index];
        if (v < 3.)
            v = v * v;
        else
            v = sqrt(v);
        vout[index] = v;
    }
}
</pre>
<p>This program loops over an array of values in <tt class="docutils literal">vin</tt> and computes an
output value for each one. For each value in <tt class="docutils literal">vin</tt>, if its value is less
than three, the output is the value squared, otherwise it's the square root
of the value.</p>
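<p>When checking the output of <tt class="docutils literal">simple</tt>, a scalar version written directly in C++ can
serve as a baseline. The following is a minimal sketch of such a reference; the name
<tt class="docutils literal">simple_reference</tt> is hypothetical and is not part of the <tt class="docutils literal">ispc</tt> distribution.</p>
<pre class="literal-block">
#include <cmath>

// Scalar reference for simple(): one element at a time, no SIMD.
// (Hypothetical helper, shown only for comparison with the ispc version.)
void simple_reference(const float vin[], float vout[], int count) {
    for (int i = 0; i < count; ++i) {
        float v = vin[i];
        if (v < 3.0f)
            v = v * v;         // values below three are squared
        else
            v = std::sqrt(v);  // all other values get their square root
        vout[i] = v;
    }
}
</pre>
<p>For the inputs 0, 1, 2, and 3 this computes 0, 1, 4, and approximately 1.732051, in line
with the description above.</p>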
<p>The first thing to notice in this program is the presence of the <tt class="docutils literal">export</tt>
keyword in the function definition; this indicates that the function should
be made available to be called from application code. The <tt class="docutils literal">uniform</tt>
qualifiers on the parameters to <tt class="docutils literal">simple</tt> indicate that the corresponding
variables are non-vector quantities--this concept is discussed in detail in the
<a class="reference internal" href="#uniform-and-varying-qualifiers">"uniform" and "varying" Qualifiers</a> section.</p>
<p>Each iteration of the <tt class="docutils literal">foreach</tt> loop works on a number of input values in
parallel--depending on the compilation target chosen, it may be 4, 8, or
even 16 elements of the <tt class="docutils literal">vin</tt> array, processed efficiently with the CPU's
SIMD hardware. Here, the variable <tt class="docutils literal">index</tt> takes all values from 0 to
<tt class="docutils literal"><span class="pre">count-1</span></tt>. After the load from the array to the variable <tt class="docutils literal">v</tt>, the
program can then proceed, doing computation and control flow based on the
values loaded. The result from the running program instances is written to
the <tt class="docutils literal">vout</tt> array before the next iteration of the <tt class="docutils literal">foreach</tt> loop runs.</p>
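<p>One way to picture this is as a loop that advances one gang-sized chunk at a time. The
C++ sketch below is only a conceptual model, under the assumption of a gang size of 8
(the real value depends on the compilation target). The names <tt class="docutils literal">kGangSize</tt> and
<tt class="docutils literal">simple_gang_model</tt> are hypothetical, and the inner loop over lanes stands in for
work that the compiler would express with SIMD instructions.</p>
<pre class="literal-block">
#include <cmath>

constexpr int kGangSize = 8;  // assumed gang size, for illustration only

// Conceptual model of "foreach (index = 0 ... count)": each outer iteration
// covers one gang-sized chunk, and lanes whose index falls past the end of
// the range are switched off, so they produce no side effects.
void simple_gang_model(const float vin[], float vout[], int count) {
    for (int base = 0; base < count; base += kGangSize) {
        for (int lane = 0; lane < kGangSize; ++lane) {
            int index = base + lane;
            bool active = index < count;
            if (!active)
                continue;  // inactive lane: no load, no store
            float v = vin[index];
            v = (v < 3.0f) ? v * v : std::sqrt(v);
            vout[index] = v;
        }
    }
}
</pre>
<p>With <tt class="docutils literal">count</tt> equal to 11, the second chunk has only three active lanes, and elements of
<tt class="docutils literal">vout</tt> at index 11 and beyond are left untouched.</p>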
<p>To build and run the examples, go to the <tt class="docutils literal">examples</tt> directory and create a <tt class="docutils literal">build</tt> folder.
From that folder, run <tt class="docutils literal">cmake <span class="pre">-DISPC_EXECUTABLE=<path_to_ispc_binary></span> ../</tt>. On Linux* and
macOS*, a makefile will be generated in that directory. On Windows*,
a Microsoft Visual Studio solution, <tt class="docutils literal">ispc_examples.sln</tt>, will be created. In
either case, build it now! (We'll walk through the details of the compilation
steps in the following section, <a class="reference internal" href="#using-the-ispc-compiler">Using The ISPC Compiler</a>.) In addition to
compiling the <tt class="docutils literal">ispc</tt> program, in this case the <tt class="docutils literal">ispc</tt> compiler also
generates a small header file, <tt class="docutils literal">simple.h</tt>. This header file includes the
declaration for the C-callable function that the above <tt class="docutils literal">ispc</tt> program is
compiled to. The relevant parts of this file are:</p>
<pre class="literal-block">
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
extern void simple(float vin[], float vout[], int32_t count);
#ifdef __cplusplus
}
#endif // __cplusplus
</pre>
<p>It's not mandatory to <tt class="docutils literal">#include</tt> the generated header file in your C/C++
code (you can alternatively use a manually-written <tt class="docutils literal">extern</tt> declaration
of the <tt class="docutils literal">ispc</tt> functions you use), but it's a helpful check to ensure that
the function signatures are as expected on both sides.</p>
<p>Here is the main program, <tt class="docutils literal">simple.cpp</tt>, which calls the <tt class="docutils literal">ispc</tt> function
above.</p>
<pre class="literal-block">
#include <stdio.h>
#include "simple.h"

int main() {
    float vin[16], vout[16];
    for (int i = 0; i < 16; ++i)
        vin[i] = i;

    simple(vin, vout, 16);

    for (int i = 0; i < 16; ++i)
        printf("%d: simple(%f) = %f\n", i, vin[i], vout[i]);
}
</pre>
<p>Note that the call to the <tt class="docutils literal">ispc</tt> function in the middle of <tt class="docutils literal">main()</tt> is
a regular function call. (And it has the same overhead as a C/C++ function
call, for that matter.)</p>
<p>When the executable <tt class="docutils literal">simple</tt> runs, it generates the expected output:</p>
<pre class="literal-block">
0: simple(0.000000) = 0.000000
1: simple(1.000000) = 1.000000
2: simple(2.000000) = 4.000000
3: simple(3.000000) = 1.732051
...
</pre>
<p>For a slightly more complex example of using <tt class="docutils literal">ispc</tt>, see the <a class="reference external" href="http://ispc.github.com/example.html">Mandelbrot
set example</a> page on the <tt class="docutils literal">ispc</tt> website for a walk-through of an <tt class="docutils literal">ispc</tt>
implementation of that algorithm. After reading through that example, you
may want to examine the source code of the various examples in the
<tt class="docutils literal">examples/</tt> directory of the <tt class="docutils literal">ispc</tt> distribution.</p>
</div>
</div>
<div class="section" id="using-the-ispc-compiler">
<h1>Using The ISPC Compiler</h1>
<p>To go from an <tt class="docutils literal">ispc</tt> source file to an object file that can be linked
with application code, enter the following command:</p>
<pre class="literal-block">
ispc foo.ispc -o foo.o
</pre>
<p>(On Windows, you may want to specify <tt class="docutils literal">foo.obj</tt> as the output filename.)</p>
<div class="section" id="basic-command-line-options">
<h2>Basic Command-line Options</h2>
<p>The <tt class="docutils literal">ispc</tt> executable can be run with <tt class="docutils literal"><span class="pre">--help</span></tt> to print a list of
accepted command-line arguments. By default, the compiler compiles the
provided program (and issues warnings and errors), but doesn't
generate any output.</p>
<p>If the <tt class="docutils literal"><span class="pre">-o</span></tt> flag is given, it will generate an output file (a native
object file by default).</p>
<pre class="literal-block">
ispc foo.ispc -o foo.obj
</pre>
<p>To generate a text assembly file, pass <tt class="docutils literal"><span class="pre">--emit-asm</span></tt>:</p>
<pre class="literal-block">
ispc foo.ispc -o foo.asm --emit-asm
</pre>
<p>To generate LLVM bitcode, use the <tt class="docutils literal"><span class="pre">--emit-llvm</span></tt> flag.
To generate LLVM bitcode in textual form, use the <tt class="docutils literal"><span class="pre">--emit-llvm-text</span></tt> flag.</p>
<p>Optimizations are on by default; they can be turned off with <tt class="docutils literal"><span class="pre">-O0</span></tt>:</p>
<pre class="literal-block">
ispc foo.ispc -o foo.obj -O0
</pre>
<p>There is support for generating debugging symbols; this is enabled with the
<tt class="docutils literal"><span class="pre">-g</span></tt> command-line flag. Using <tt class="docutils literal"><span class="pre">-g</span></tt> doesn't affect the optimization level;
to debug unoptimized code, also pass the <tt class="docutils literal"><span class="pre">-O0</span></tt> flag.</p>
<p>The <tt class="docutils literal"><span class="pre">-h</span></tt> flag can also be used to direct <tt class="docutils literal">ispc</tt> to generate a C/C++
header file that includes C/C++ declarations of the C-callable <tt class="docutils literal">ispc</tt>
functions and the types passed to them.</p>
<p>The <tt class="docutils literal"><span class="pre">-D</span></tt> option can be used to specify definitions to be passed along to
the pre-processor, which runs over the program input before it's compiled.
For example, including <tt class="docutils literal"><span class="pre">-DTEST=1</span></tt> defines the pre-processor symbol
<tt class="docutils literal">TEST</tt> to have the value <tt class="docutils literal">1</tt> when the program is compiled.</p>
<p>The compiler issues a number of performance warnings for code constructs
that compile to relatively inefficient code. These warnings can be
silenced with the <tt class="docutils literal"><span class="pre">--wno-perf</span></tt> flag (or by using <tt class="docutils literal"><span class="pre">--woff</span></tt>, which turns
off all compiler warnings.) Furthermore, <tt class="docutils literal"><span class="pre">--werror</span></tt> can be provided to
direct the compiler to treat any warnings as errors.</p>
<p>Position-independent code (for use in shared libraries) is generated if the
<tt class="docutils literal"><span class="pre">--pic</span></tt> command-line argument is provided.</p>
</div>
<div class="section" id="selecting-the-compilation-target">
<h2>Selecting The Compilation Target</h2>
<p>There are four options that affect the compilation target: <tt class="docutils literal"><span class="pre">--arch</span></tt>,
which sets the target architecture, <tt class="docutils literal"><span class="pre">--cpu</span></tt>, which sets the target CPU,
<tt class="docutils literal"><span class="pre">--target</span></tt>, which sets the target instruction set, and <tt class="docutils literal"><span class="pre">--target-os</span></tt>,
which sets the target operating system.</p>
<p>If none of these options is specified, <tt class="docutils literal">ispc</tt> generates code for the host
OS and for the architecture of the system the compiler is running on (i.e.,
64-bit x86-64 (<tt class="docutils literal"><span class="pre">--arch=x86-64</span></tt>) on x86 systems and ARM NEON on ARM systems).</p>
<p>To compile to a 32-bit x86 target, for example, supply <tt class="docutils literal"><span class="pre">--arch=x86</span></tt> on
the command line:</p>
<pre class="literal-block">
ispc foo.ispc -o foo.obj --arch=x86
</pre>
<p>Currently-supported architectures are <tt class="docutils literal"><span class="pre">x86-64</span></tt>, <tt class="docutils literal">x86</tt>, and <tt class="docutils literal">arm</tt>.</p>
<p>The target CPU determines both the default instruction set used as well as
which CPU architecture the code is tuned for. <tt class="docutils literal">ispc <span class="pre">--help</span></tt> provides a
list of all of the supported CPUs. By default, the CPU type of the system
on which you're running <tt class="docutils literal">ispc</tt> is used to determine the target CPU.</p>
<pre class="literal-block">
ispc foo.ispc -o foo.obj --cpu=corei7-avx
</pre>
<p>Next, <tt class="docutils literal"><span class="pre">--target</span></tt> selects the target instruction set. The target
string is of the form <tt class="docutils literal"><span class="pre">[ISA]-i[mask</span> size]x[gang size]</tt>. For example,
<tt class="docutils literal"><span class="pre">--target=avx2-i32x16</span></tt> specifies a target with the AVX2 instruction set,
a mask size of 32 bits, and a gang size of 16.</p>
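<p>To make the naming scheme concrete, the following C++ sketch splits a target string of
this form into its three parts. It is an illustration only: <tt class="docutils literal">TargetInfo</tt> and
<tt class="docutils literal">parse_target</tt> are hypothetical names, not part of <tt class="docutils literal">ispc</tt>, and strings that do not
follow the <tt class="docutils literal"><span class="pre">[ISA]-i[mask</span> size]x[gang size]</tt> pattern are simply reported as invalid.</p>
<pre class="literal-block">
#include <cstdlib>
#include <string>

// Parts of a target string such as "avx2-i32x16" (hypothetical helper type).
struct TargetInfo {
    std::string isa;      // e.g. "avx2"
    int maskBits = 0;     // e.g. 32
    int gangSize = 0;     // e.g. 16
    bool valid = false;   // false if the string does not match the pattern
};

// Splits "[ISA]-i[mask size]x[gang size]" into its components.
TargetInfo parse_target(const std::string &target) {
    TargetInfo info;
    std::string::size_type dash = target.rfind("-i");
    if (dash == std::string::npos)
        return info;
    // Search for 'x' only after "-i", since ISA names may contain an 'x'.
    std::string::size_type x = target.find('x', dash + 2);
    if (x == std::string::npos)
        return info;
    info.isa = target.substr(0, dash);
    info.maskBits = std::atoi(target.substr(dash + 2, x - dash - 2).c_str());
    info.gangSize = std::atoi(target.substr(x + 1).c_str());
    info.valid = !info.isa.empty() && info.maskBits > 0 && info.gangSize > 0;
    return info;
}
</pre>
<p>For example, <tt class="docutils literal"><span class="pre">parse_target("avx2-i32x16")</span></tt> yields the ISA <tt class="docutils literal">avx2</tt>, a mask size of 32,
and a gang size of 16. Whether a given combination is actually supported still has to be
checked against the output of <tt class="docutils literal">ispc <span class="pre">--help</span></tt>.</p>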
<p>The following target ISAs are supported:</p>
<table border="1" class="docutils">
<colgroup>
<col width="17%" />
<col width="83%" />
</colgroup>
<tbody valign="top">
<tr><td>Target</td>
<td>Description</td>
</tr>
<tr><td>avx, avx1</td>
<td>AVX (2010-2011 era Intel CPUs)</td>
</tr>
<tr><td>avx2</td>
<td>AVX 2 target (2013- Intel "Haswell" CPUs)</td>
</tr>
<tr><td>avx512knl</td>
<td>AVX 512 target (Xeon Phi chips codename Knights Landing)</td>
</tr>
<tr><td>avx512skx</td>
<td>AVX 512 target (future Xeon CPUs)</td>
</tr>
<tr><td>neon</td>
<td>ARM NEON</td>
</tr>
<tr><td>sse2</td>
<td>SSE2 (early 2000s era x86 CPUs)</td>
</tr>
<tr><td>sse4</td>
<td>SSE4 (generally 2008-2010 Intel CPUs)</td>
</tr>
</tbody>
</table>
<p>Consult your CPU's manual for specifics on which vector instruction set it
supports.</p>
<p>The mask size may be 8, 16, or 32 bits, though not all combinations of ISAs
and mask sizes are supported. For best performance, a good general
approach is to choose a mask size equal to the size of the most common
datatype in your programs. For example, if most of your computation is on
32-bit floating-point values, an <tt class="docutils literal">i32</tt> target is appropriate. However,
if you're mostly doing computation on 8-bit images, <tt class="docutils literal">i8</tt> is a better choice.</p>
<p>See <a class="reference internal" href="#basic-concepts-program-instances-and-gangs-of-program-instances">Basic Concepts: Program Instances and Gangs of Program Instances</a> for
more discussion of the "gang size" and its implications for program
execution.</p>
<p>Running <tt class="docutils literal">ispc <span class="pre">--help</span></tt> and looking at the output for the <tt class="docutils literal"><span class="pre">--target</span></tt>
option gives the most up-to-date documentation about which targets your
compiler binary supports.</p>
<p>The naming scheme for compilation targets changed in August 2013; the
following table shows the relationship between names in the old scheme and
in the new scheme:</p>
<table border="1" class="docutils">
<colgroup>
<col width="54%" />
<col width="46%" />
</colgroup>
<tbody valign="top">
<tr><td>Target</td>
<td>Former Name</td>
</tr>
<tr><td>avx1-i32x8</td>
<td>avx, avx1</td>
</tr>
<tr><td>avx1-i32x16</td>
<td>avx-x2</td>
</tr>
<tr><td>avx2-i32x8</td>
<td>avx2</td>
</tr>
<tr><td>avx2-i32x16</td>
<td>avx2-x2</td>
</tr>
<tr><td>neon-8</td>
<td>n/a</td>
</tr>
<tr><td>neon-16</td>
<td>n/a</td>
</tr>
<tr><td>neon-32</td>
<td>n/a</td>
</tr>
<tr><td>sse2-i32x4</td>
<td>sse2</td>
</tr>
<tr><td>sse2-i32x8</td>
<td>sse2-x2</td>
</tr>
<tr><td>sse4-i32x4</td>
<td>sse4</td>
</tr>
<tr><td>sse4-i32x8</td>
<td>sse4-x2</td>
</tr>
<tr><td>sse4-i8x16</td>
<td>n/a</td>
</tr>
<tr><td>sse4-i16x8</td>
<td>n/a</td>
</tr>
</tbody>
</table>
<p>By default, the target instruction set is chosen based on the most capable
one supported by the system on which you're running <tt class="docutils literal">ispc</tt>. You can
override this choice with the <tt class="docutils literal"><span class="pre">--target</span></tt> flag; for example, to select
Intel® SSE2 with a 32-bit mask and 4 program instances in a gang, use
<tt class="docutils literal"><span class="pre">--target=sse2-i32x4</span></tt>. (As with the other options in this section, see
the output of <tt class="docutils literal">ispc <span class="pre">--help</span></tt> for a full list of supported targets.)</p>
<p>Finally, <tt class="docutils literal"><span class="pre">--target-os</span></tt> selects the target operating system. Depending on
your host, <tt class="docutils literal">ispc</tt> may support Windows, Linux, macOS, Android, iOS, and PS4
targets. Running <tt class="docutils literal">ispc <span class="pre">--help</span></tt> and looking at the output for the <tt class="docutils literal"><span class="pre">--target-os</span></tt>
option gives the list of supported targets. By default, <tt class="docutils literal">ispc</tt> produces
code for your host operating system.</p>
<pre class="literal-block">
ispc foo.ispc -o foo.obj --target-os=android
</pre>
<p>Note that cross-OS compilation is experimental. We encourage you to
try it and send us a note with your experiences, or to file a bug report or feature
request with the <tt class="docutils literal">ispc</tt> <a class="reference external" href="https://github.com/ispc/ispc/issues?state=open">bug tracker</a>.</p>
</div>
<div class="section" id="selecting-32-or-64-bit-addressing">
<h2>Selecting 32 or 64 Bit Addressing</h2>
<p>By default, <tt class="docutils literal">ispc</tt> uses 32-bit arithmetic for performing addressing
calculations, even when using a 64-bit compilation target like x86-64.
This implementation approach can provide substantial performance benefits
by reducing the cost of addressing calculations. (Note that pointers
themselves are still maintained as 64-bit quantities for 64-bit targets.)</p>
<p>If you need to be able to address more than 4GB of memory from your
<tt class="docutils literal">ispc</tt> programs, the <tt class="docutils literal"><span class="pre">--addressing=64</span></tt> command-line argument can be
provided to cause the compiler to generate 64-bit arithmetic for addressing
calculations. Note that it is safe to mix object files where some were
compiled with the default <tt class="docutils literal"><span class="pre">--addressing=32</span></tt> and others were compiled with
<tt class="docutils literal"><span class="pre">--addressing=64</span></tt>.</p>
</div>
<div class="section" id="the-preprocessor">
<h2>The Preprocessor</h2>
<p><tt class="docutils literal">ispc</tt> automatically runs the C preprocessor on your input program before
compiling it. Thus, you can use <tt class="docutils literal">#ifdef</tt>, <tt class="docutils literal">#define</tt>, and so forth in
your ispc programs. (This functionality can be disabled with the <tt class="docutils literal"><span class="pre">--nocpp</span></tt>
command-line argument.)</p>
<p>A number of preprocessor symbols are automatically defined before the
preprocessor runs:</p>
<table border="1" class="docutils">
<caption>Predefined Preprocessor symbols and their values</caption>
<colgroup>
<col width="33%" />
<col width="33%" />
<col width="33%" />
</colgroup>
<tbody valign="top">
<tr><td>Symbol name</td>
<td>Value</td>
<td>Use</td>
</tr>
<tr><td>ISPC</td>
<td>1</td>
<td>Detecting that the <tt class="docutils literal">ispc</tt> compiler is processing the file</td>
</tr>
<tr><td>ISPC_TARGET_{NEON, SSE2, SSE4, AVX, AVX2, AVX512KNL, AVX512SKX}</td>
<td>1</td>
<td>One of these will be set, depending on the compilation target.</td>
</tr>
<tr><td>ISPC_POINTER_SIZE</td>
<td>32 or 64</td>
<td>Number of bits used to represent a pointer for the target architecture.</td>
</tr>
<tr><td>ISPC_MAJOR_VERSION</td>
<td>1</td>
<td>Major version of the <tt class="docutils literal">ispc</tt> compiler/language</td>
</tr>
<tr><td>ISPC_MINOR_VERSION</td>
<td>13</td>
<td>Minor version of the <tt class="docutils literal">ispc</tt> compiler/language</td>
</tr>
<tr><td>PI</td>
<td>3.1415926535</td>
<td>Mathematics</td>
</tr>
<tr><td>TARGET_WIDTH</td>
<td>Vector width of the target, e.g., 8 for sse2-i32x8.</td>
<td>Static varying initialization.</td>
</tr>
<tr><td>TARGET_ELEMENT_WIDTH</td>
<td>Element width in bytes, e.g., 4 for i32.</td>
<td>Static varying initialization.</td>
</tr>
<tr><td>ISPC_UINT_IS_DEFINED</td>
<td>1</td>
<td>Detecting if uint8/uint16/uint32/uint64 types are defined in the ISPC version.</td>
</tr>
</tbody>
</table>
<p><tt class="docutils literal">ispc</tt> also provides <tt class="docutils literal">#pragma ignore warning</tt> directives to ignore compiler warnings for individual lines.</p>
<table border="1" class="docutils">
<caption><tt class="docutils literal">#pragma ignore warning</tt> directives and their functions:</caption>
<colgroup>
<col width="50%" />
<col width="50%" />
</colgroup>
<tbody valign="top">
<tr><td><tt class="docutils literal">#pragma</tt> name</td>
<td>Use</td>
</tr>
<tr><td><tt class="docutils literal">#pragma ignore warning(all)</tt></td>
<td>Turns off all <tt class="docutils literal">ispc</tt> compiler warnings including performance warnings for the following line of code.</td>
</tr>
<tr><td><tt class="docutils literal">#pragma ignore warning(perf)</tt></td>
<td>Turns off only performance warnings for the following line of code.</td>
</tr>
<tr><td><tt class="docutils literal">#pragma ignore warning</tt></td>
<td>Turns off all <tt class="docutils literal">ispc</tt> compiler warnings including performance warnings for the following line of code.</td>
</tr>
</tbody>
</table>
<p>When using <tt class="docutils literal">#pragma ignore warning</tt> before a call to a macro, it suppresses warnings from the expanded macro code.</p>
</div>
<div class="section" id="debugging">
<h2>Debugging</h2>
<p>The <tt class="docutils literal"><span class="pre">-g</span></tt> command-line flag can be supplied to the compiler, which causes
it to generate debugging symbols. The debug info is emitted in DWARF format
on Linux* and macOS*. The DWARF version can be controlled with the
command-line switch <tt class="docutils literal"><span class="pre">--dwarf-version={2,3,4}</span></tt>. On Windows*, the CodeView format
is used (not PDB); it is natively supported by Microsoft Visual Studio*.
Running <tt class="docutils literal">ispc</tt> programs in the debugger, setting breakpoints, and printing out
variables work just as they do when debugging C/C++ programs. Similarly, you can
directly step up and down the call stack between <tt class="docutils literal">ispc</tt> code and C/C++
code.</p>
<p>One limitation of the current debugging support is that the debugger
provides a window into an entire gang's worth of program instances, rather
than just a single program instance. (These concepts will be introduced
shortly, in <a class="reference internal" href="#basic-concepts-program-instances-and-gangs-of-program-instances">Basic Concepts: Program Instances and Gangs of Program Instances</a>.)
Thus, when a <tt class="docutils literal">varying</tt> variable is printed, the values for
each of the program instances are displayed. Along similar lines, the path
the debugger follows through program source code passes each statement that
any program instance wants to execute (see <a class="reference internal" href="#control-flow-within-a-gang">Control Flow Within A Gang</a>
for more details on control flow in <tt class="docutils literal">ispc</tt>.)</p>
<p>While debugging, the variable <tt class="docutils literal">__mask</tt> is available; it holds the
execution mask at the current point in the program.</p>
<p>Another option for debugging is
to use the <tt class="docutils literal">print</tt> statement for <tt class="docutils literal">printf()</tt> style debugging. (See
<a class="reference internal" href="#output-functions">Output Functions</a> for more information.) You can also use the ability to
call back to application code at particular points in the program, passing
a set of variable values to be logged or otherwise analyzed from there.</p>
</div>
<div class="section" id="other-ways-of-passing-arguments-to-ispc">
<h2>Other ways of passing arguments to ISPC</h2>
<p>In addition to specifying arguments on the command line, you can set the <tt class="docutils literal">ISPC_ARGS</tt>
environment variable. If it is set, it is split into arguments, and these arguments
are appended to any provided on the command line.</p>
<p>It is also possible to pass arguments to <tt class="docutils literal">ispc</tt> in a file. If an argument has
the form <tt class="docutils literal">@<filename></tt>, where <tt class="docutils literal"><filename></tt> exists and is readable, it is
replaced with the content of the file split into arguments. Note that it <em>is</em>
allowed for a file to contain a further <tt class="docutils literal">@<filename></tt> argument.</p>
<p>Where a file or environment variable is split into arguments, this is done based on
the arguments being separated by one or more whitespace characters, including tabs
and newlines. There is no means of escaping or quoting a character to allow an
argument to contain a whitespace character.</p>
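<p>The splitting rule can be modeled in a few lines of C++. This is a sketch of the rule as
described here, not the code <tt class="docutils literal">ispc</tt> itself uses; <tt class="docutils literal">split_args</tt> is a hypothetical name.</p>
<pre class="literal-block">
#include <sstream>
#include <string>
#include <vector>

// Splits text into arguments on runs of whitespace (spaces, tabs, newlines).
// Quotes and backslashes get no special treatment, mirroring the rule above.
std::vector<std::string> split_args(const std::string &text) {
    std::vector<std::string> args;
    std::istringstream stream(text);
    std::string arg;
    while (stream >> arg)  // operator>> skips leading whitespace
        args.push_back(arg);
    return args;
}
</pre>
<p>Under this rule, an input such as <tt class="docutils literal"><span class="pre">-DNAME="a</span> b"</tt> becomes two arguments,
<tt class="docutils literal"><span class="pre">-DNAME="a</span></tt> and <tt class="docutils literal">b"</tt>, which is why values containing whitespace cannot be passed
this way.</p>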
</div>
</div>
<div class="section" id="the-ispc-parallel-execution-model">
<h1>The ISPC Parallel Execution Model</h1>
<p>Though <tt class="docutils literal">ispc</tt> is a C-based language, it is inherently a language for
parallel computation. Understanding the details of <tt class="docutils literal">ispc</tt>'s parallel
execution model that are introduced in this section is critical for writing
efficient and correct programs in <tt class="docutils literal">ispc</tt>.</p>
<p><tt class="docutils literal">ispc</tt> supports two types of parallelism: task parallelism to parallelize
across multiple processor cores and SPMD parallelism to parallelize across
the SIMD vector lanes on a single core. Most of this section focuses on
SPMD parallelism, but see <a class="reference internal" href="#tasking-model">Tasking Model</a> at the end of this section for
discussion of task parallelism in <tt class="docutils literal">ispc</tt>.</p>
<p>This section will use some snippets of <tt class="docutils literal">ispc</tt> code to illustrate various
concepts. Given <tt class="docutils literal">ispc</tt>'s relationship to C, these should be
understandable on their own, but you may want to refer to the section <a class="reference internal" href="#the-ispc-language">The ISPC
Language</a> for details on language syntax.</p>
<div class="section" id="basic-concepts-program-instances-and-gangs-of-program-instances">
<h2>Basic Concepts: Program Instances and Gangs of Program Instances</h2>
<p>Upon entry to an <tt class="docutils literal">ispc</tt> function called from C/C++ code, the execution
model switches from the application's serial model to <tt class="docutils literal">ispc</tt>'s execution
model. Conceptually, a number of <tt class="docutils literal">ispc</tt> <em>program instances</em> start
running concurrently. The group of running program instances is
called a <em>gang</em> (harkening to "gang scheduling", since <tt class="docutils literal">ispc</tt> provides
certain guarantees about the control flow coherence of program instances
running in a gang, detailed in <a class="reference internal" href="#gang-convergence-guarantees">Gang Convergence Guarantees</a>.) An
<tt class="docutils literal">ispc</tt> program instance is thus similar to a CUDA* "thread" or an OpenCL*
"work-item", and an <tt class="docutils literal">ispc</tt> gang is similar to a CUDA* "warp".</p>
<p>An <tt class="docutils literal">ispc</tt> program expresses the computation performed by a gang of
program instances, using an "implicit parallel" model, where the <tt class="docutils literal">ispc</tt>
program generally describes the behavior of a single program instance, even
though a gang of them is actually executing together. This implicit model
is the same that is used for shaders in programmable graphics pipelines,
OpenCL* kernels, and CUDA*. For example, consider the following <tt class="docutils literal">ispc</tt>
function:</p>
<pre class="literal-block">
float func(float a, float b) {
    return a + b / 2.;
}
</pre>
<p>In C, this function describes a simple computation on two individual
floating-point values. In <tt class="docutils literal">ispc</tt>, this function describes the
computation to be performed by each program instance in a gang. Each
program instance has distinct values for the variables <tt class="docutils literal">a</tt> and <tt class="docutils literal">b</tt>, and
thus each program instance generally computes a different result when
executing this function.</p>
<p>The gang of program instances starts executing in the same hardware thread
and context as the application code that called the <tt class="docutils literal">ispc</tt> function; no
thread creation or context switching is done under the covers by <tt class="docutils literal">ispc</tt>.
Rather, the set of program instances is mapped to the SIMD lanes of the
current processor, leading to excellent utilization of hardware SIMD units
and high performance.</p>
<p>The number of program instances in a gang is relatively small; in practice,
it's no more than 2-4x the native SIMD width of the hardware it is
executing on. (Thus, four or eight program instances in a gang on a CPU
using the 4-wide SSE instruction set, and eight or sixteen on a CPU
using 8-wide AVX.)</p>
</div>
<div class="section" id="control-flow-within-a-gang">
<h2>Control Flow Within A Gang</h2>
<p>Almost all the standard control-flow constructs are supported by <tt class="docutils literal">ispc</tt>;
program instances are free to follow different program execution paths than
other ones in their gang. For example, consider a simple <tt class="docutils literal">if</tt> statement
in <tt class="docutils literal">ispc</tt> code:</p>
<pre class="literal-block">
float x = ..., y = ...;
if (x < y) {
    // true statements
}
else {
    // false statements
}
</pre>
<p>In general, the test <tt class="docutils literal">x < y</tt> may have a different result for different
program instances in the gang: some of the currently running program
instances want to execute the statements for the "true" case and some want
to execute the statements for the "false" case.</p>
<p>Complex control flow in <tt class="docutils literal">ispc</tt> programs generally works as expected,
computing the same results for each program instance in a gang as would
have been computed if the equivalent code ran serially in C to compute each
program instance's result individually. However, here we define the
execution model for control flow more precisely, so that the language's
behavior in specific situations can be stated exactly.</p>
<p>We will specify the notion of a <em>program counter</em> and how it is updated to
step through the program, and an <em>execution mask</em> that indicates which
program instances want to execute the instruction at the current program
counter. The program counter is shared by all of the
program instances in the gang; it points to a single instruction to be
executed next. The execution mask is a per-program-instance boolean value
that indicates whether or not side effects from the current instruction
should affect each program instance. Thus, for example, if a statement
were to be executed with an "all off" mask, there should be no observable
side effects.</p>
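<p>A minimal sketch of what the execution mask means for a single statement, written as a scalar emulation in C. The function name <tt class="docutils literal">masked_assign</tt>, the use of <tt class="docutils literal">int</tt> for mask entries, and the gang size of four are assumptions made for illustration; actual <tt class="docutils literal">ispc</tt> targets typically use hardware masking or blending instead.</p>

```c
/* Hypothetical gang size for this sketch. */
#define GANG_SIZE 4

/* Emulates one assignment statement, "dst = src", executed under an
   execution mask. A lane observes the side effect only if its mask
   entry is nonzero ("on"). */
void masked_assign(float dst[GANG_SIZE], const float src[GANG_SIZE],
                   const int mask[GANG_SIZE]) {
    for (int lane = 0; lane < GANG_SIZE; ++lane) {
        if (mask[lane]) {
            dst[lane] = src[lane];
        }
    }
}
```

<p>Calling this with every mask entry set to zero leaves <tt class="docutils literal">dst</tt> untouched, which corresponds to the "all off" case described above.</p>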
<p>Upon entry to an <tt class="docutils literal">ispc</tt> function called by the application, the execution
mask is "all on" and the program counter points at the first statement in
the function. The following two statements describe the required behavior
of the program counter and the execution mask over the course of execution
of an <tt class="docutils literal">ispc</tt> function.</p>
<blockquote>
<p>1. The program counter will have a sequence of values corresponding to a
conservative execution path through the function, wherein if <em>any</em>
program instance wants to execute a statement, the program counter will
pass through that statement.</p>
<p>2. At each statement the program counter passes through, the execution
mask will be set such that its value for a particular program instance is
"on" if and only if the program instance wants to execute that statement.</p>
</blockquote>
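<p>The two rules can be sketched for the earlier <tt class="docutils literal">if</tt>/<tt class="docutils literal">else</tt> example with a scalar emulation in C. This is one possible realization under stated assumptions, not the code the compiler generates: the names <tt class="docutils literal">any_on</tt> and <tt class="docutils literal">gang_if_else</tt>, the <tt class="docutils literal">visited</tt> output, the gang size of four, and the choice of branch bodies are all hypothetical.</p>

```c
/* Hypothetical gang size for this sketch. */
#define GANG_SIZE 4

/* Returns 1 if at least one program instance has its mask entry on. */
int any_on(const int mask[GANG_SIZE]) {
    for (int lane = 0; lane < GANG_SIZE; ++lane) {
        if (mask[lane]) {
            return 1;
        }
    }
    return 0;
}

/* Emulates "if (x < y) out = x; else out = y;" entered with an
   "all on" mask. visited[0] and visited[1] record whether the program
   counter passed through the true and false blocks, respectively. */
void gang_if_else(const float x[GANG_SIZE], const float y[GANG_SIZE],
                  float out[GANG_SIZE], int visited[2]) {
    int true_mask[GANG_SIZE];
    int false_mask[GANG_SIZE];

    /* Rule 2: a lane's mask is on for a block if and only if that
       program instance wants to execute the block. */
    for (int lane = 0; lane < GANG_SIZE; ++lane) {
        true_mask[lane] = x[lane] < y[lane];
        false_mask[lane] = !true_mask[lane];
    }

    /* Rule 1: pass through a block if any instance wants to run it.
       Skipping a block that no instance wants is a choice made here. */
    visited[0] = any_on(true_mask);
    if (visited[0]) {
        for (int lane = 0; lane < GANG_SIZE; ++lane) {
            if (true_mask[lane]) {
                out[lane] = x[lane];
            }
        }
    }

    visited[1] = any_on(false_mask);
    if (visited[1]) {
        for (int lane = 0; lane < GANG_SIZE; ++lane) {
            if (false_mask[lane]) {
                out[lane] = y[lane];
            }
        }
    }
}
```

<p>When the instances disagree on the test, both blocks are visited with complementary masks; when they all agree, this sketch visits only one block.</p>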
<p>Note that these definitions provide the compiler some latitude; for example,
the program counter is allowed to pass through a series of statements with the