<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Sajid Zubair]]></title><description><![CDATA[Compilers | LLVM | MLIR | Graphics | Web Development]]></description><link>https://sajidzubair.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!A1n9!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc494b6ac-2fbc-4ce9-9a39-4125204a296f_1170x1170.png</url><title>Sajid Zubair</title><link>https://sajidzubair.substack.com</link></image><generator>Substack</generator><lastBuildDate>Tue, 02 Jun 2026 20:04:56 GMT</lastBuildDate><atom:link href="https://sajidzubair.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Sajid Zubair]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[sajidzubair@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[sajidzubair@substack.com]]></itunes:email><itunes:name><![CDATA[Sajid Zubair]]></itunes:name></itunes:owner><itunes:author><![CDATA[Sajid Zubair]]></itunes:author><googleplay:owner><![CDATA[sajidzubair@substack.com]]></googleplay:owner><googleplay:email><![CDATA[sajidzubair@substack.com]]></googleplay:email><googleplay:author><![CDATA[Sajid Zubair]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Polyhedral Compilation in MLIR]]></title><description><![CDATA[How MLIR's Affine Dialect Brings Polyhedral Analysis Inside the Compiler]]></description><link>https://sajidzubair.substack.com/p/polyhedral-compilation-in-mlir</link><guid isPermaLink="false">https://sajidzubair.substack.com/p/polyhedral-compilation-in-mlir</guid><dc:creator><![CDATA[Sajid 
Zubair]]></dc:creator><pubDate>Sun, 31 May 2026 04:16:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HNiX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec2286e8-474a-474f-867f-65e1298a8e24_1560x1286.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the previous article, we explored the mathematics behind polyhedral compilation. In this article, we&#8217;ll shift our focus from the mathematics to the infrastructure. Instead of asking how polyhedral compilation works, we&#8217;ll ask how those ideas are represented inside a modern compiler.</p><p>If you&#8217;re unfamiliar with concepts such as SCoPs, iteration domains, schedules, access functions, tiling, or dependence analysis, I&#8217;d recommend reading my previous article, <em><a href="https://sajidzubair.substack.com/p/polyhedral-compilation-how-compilers">Understanding Polyhedral Compilation</a></em>, where I explain these ideas in detail. This will make the rest of this article much easier to follow.</p><p>Before we dive in, let&#8217;s briefly recap the traditional polyhedral compilation workflow.</p><p>At a high level, polyhedral optimization was not a single pass living entirely inside the compiler. Instead, it involved multiple stages, each responsible for a different task.</p><p>The first step was identifying a region of code that could be represented in the polyhedral model. These regions, known as Static Control Parts (SCoPs), contain loop nests and memory accesses that can be expressed using affine functions.</p><p>Once a valid SCoP was identified, the optimization problem was translated into a mathematical representation consisting of iteration domains, schedules, and access functions.</p><p>The next step was scheduling, where a scheduler analyzed this representation and searched for a better execution order. 
This is where transformations such as loop interchange, tiling, fusion, and skewing were determined. The goal here was to improve locality, expose parallelism, and reduce communication between different parts of the computation.</p><p>After a new schedule was found, the transformed mathematical representation had to be converted back into executable code. A code generator would scan the transformed iteration space and produce the final loop nest, including any boundary conditions required to preserve correctness.</p><p>Conceptually, the workflow looked something like this:</p><pre><code><code>LLVM IR
&#8595;
Extract affine loop nest (SCoP)
&#8595;
Build polyhedral representation
&#8595;
Compute an optimized schedule
&#8595;
Generate transformed loop nest
&#8595;
Continue compilation
</code></code></pre><p>The problem was that these tools were independent of each other and ran at different times. This independence is exactly where the problems started.</p><p>Before looking at how MLIR solves them, let&#8217;s first understand the major problems of the traditional polyhedral workflow.</p><h2>Problems in Polyhedral Compilation</h2><h3>1. The extraction-transformation-reimport problem</h3><p>This is one of the fundamental problems, and it affects everything else around it. To understand it clearly, we need to know what the compiler&#8217;s internal representation looks like.</p><p>When LLVM compiles our C code, it converts it into something called LLVM IR, which is basically an internal language that the compiler uses to optimize our code. All of LLVM&#8217;s optimizations operate on this IR. This IR is so important because it carries information like types, variable names, control flow, and everything else the compiler knows about your program.</p><p>Now a couple of things happen in the classical workflow:</p><ol><li><p><strong>Extraction</strong>: Polly&#8217;s job is to find the loop nests. When it scans the LLVM IR and finds an affine loop nest, it extracts it. To hand the loop to a scheduler such as Pluto, it first has to convert it from LLVM IR into a polyhedral mathematical representation, which consists of the iteration domain, access functions, and dependence constraints. This conversion is the main problem, because we lose information here. Some information that exists in the LLVM IR simply doesn&#8217;t fit into the polyhedral model, so it gets dropped.</p></li><li><p><strong>Transformation</strong>: After the conversion, Pluto receives the mathematical representation and solves an integer linear program (ILP) over it. It finds the best schedule it can and produces a transformed polyhedron. Then CLooG scans the polyhedron and produces optimized C code. 
At this point the loop exists as a string of C code.</p></li><li><p><strong>Reimport</strong>: At the end, the C code string is fed back to the compiler. The compiler has to parse it again, rebuild the LLVM IR, and figure out how to connect it back to the surrounding code. This honestly is the most painful part.</p></li></ol><p>To be fair, Polly itself has moved away from this: it now uses isl for both scheduling and code generation and emits LLVM IR directly instead of C text. But it still has to rediscover the affine structure from low-level IR, and the classical source-to-source flow built around Pluto and CLooG worked much like the steps above.</p><p>The problem is that the compiler had already built up a careful understanding of your program: it knew which variables were live, which values were constant, and what the surrounding control flow looked like. Extraction and reimport throw much of that away, and it has to be rebuilt from scratch.</p><h3>2. Targeting multiple hardware backends was painful</h3><p>This problem became even more critical when ML workloads became important. A matrix multiplication in a neural network needs to run efficiently on CPUs, GPUs, and TPUs, but with classical tools each target required a separate pipeline.</p><p>What I mean by a separate pipeline is basically this.</p><p><strong>For CPU</strong>, the workflow was:</p><pre><code><code>Loop &#8594; Polly &#8594; Pluto &#8594; CLooG &#8594; C code &#8594; LLVM &#8594; x86 binary
</code></code></pre><p><strong>For GPU</strong>, CLooG produces C code, but GPUs need something like CUDA. So you needed a completely different tool:</p><pre><code><code>Loop &#8594; some GPU polyhedral tool &#8594; CUDA code &#8594; nvcc &#8594; GPU binary
</code></code></pre><p>For TPUs, the hardware is different again, and there was no standard polyhedral tool that targeted them at all.</p><h3>3. Interleaving passes was impossible</h3><p>This might be the most subtle problem, but in some ways it is the most damaging for real-world performance.</p><p>Modern compilers don&#8217;t optimize our programs in one big step. They run hundreds of small passes, each one making the code a little better and, more importantly, each one building on what the previous pass found. This interleaving is what makes today&#8217;s compilers so powerful.</p><p>For example, a real optimization sequence might look like this:</p><pre><code><code>constant propagation        &#8592; figures out that N=64 in this context
    &#8595;
loop bounds simplification  &#8592; simplifies "i &lt; N" to "i &lt; 64"
    &#8595;
loop tiling                 &#8592; now tiles with knowledge that N=64
    &#8595;
vectorization               &#8592; uses the tiled structure to apply SIMD
    &#8595;
register allocation         &#8592; allocates registers knowing the vector width
</code></code></pre><p>Now let&#8217;s see what happens when polyhedral optimization enters this pipeline:</p><pre><code><code>constant propagation
    &#8595;
loop bounds simplification
    &#8595;
STOP &#8212; extract loop &#8212; leave compiler &#8212; run Pluto -
run CLooG &#8212; get C code &#8212; re-enter compiler &#8212; re-parse
    &#8595;
vectorization
    &#8595;
register allocation
</code></code></pre><p>That hard STOP in the pipeline destroys the flow of information in both directions: the polyhedral tools cannot see what earlier passes learned, and later passes start from freshly parsed code without that context.</p><p>The result is that each pass is now working with less information than it could have, and the overall optimization is weaker than it could have been.</p><p>So far, we know what the problem with polyhedral compilation is. It&#8217;s not that Pluto&#8217;s scheduler was weak or that CLooG generated bad code. In fact, these tools are mathematically very powerful. The real problem was the architecture.</p><p>Polyhedral compilation existed outside the compiler instead of inside it, and every time a polyhedral transformation was needed, the code had to leave the compiler, go through the steps I explained, and be regenerated.</p><p>At the same time, machine learning workloads were rapidly changing compiler requirements.</p><p>Compilers no longer targeted just CPUs. They had to lower computations across GPUs, TPUs, tensor accelerators, vector units, and domain-specific hardware, all while preserving high-level information like tensor semantics, parallel dimensions, and memory layouts for as long as possible.</p><p>Traditional compiler IRs like LLVM IR were simply too low level for this.</p><p>And this is exactly the problem MLIR was built to solve.</p><h2>What is MLIR and why was it built</h2><p>As I mentioned earlier, traditionally we only had one IR layer. 
That worked extremely well for languages like C/C++ because:</p><ul><li><p>the abstraction level was relatively low</p></li><li><p>loops/maps/pointers were still visible in LLVM IR</p></li><li><p>optimization goals were mostly CPU-oriented</p></li></ul><p>But modern systems evolved, and now the compiler had to handle things like:</p><ul><li><p>tensor algebra</p></li><li><p>neural networks</p></li><li><p>GPU kernels</p></li><li><p>TPUs</p></li><li><p>sparse tensors</p></li><li><p>vector hardware</p></li><li><p>domain-specific languages</p></li><li><p>distributed execution</p></li><li><p>polyhedral transformations</p></li><li><p>quantization</p></li><li><p>graph-level optimizations</p></li></ul><p>Trying to represent all of this directly in LLVM IR became painful. It is painful because these concepts are high level, while LLVM IR is intentionally low level. Let us understand this with an LLVM IR example:</p><pre><code><code>%1 = load float, ptr %ptr
%2 = fmul float %1, %1
store float %2, ptr %out
</code></code></pre><p>As you can see, at this level:</p><ul><li><p>tensor semantics are gone</p></li><li><p>loop structure is flattened</p></li><li><p>matrix multiplication no longer looks like matrix multiplication</p></li><li><p>high-level information disappears too early</p></li></ul><p>This causes a huge issue because many optimizations need high-level structure.</p><p>For example:</p><ul><li><p>tensor fusion</p></li><li><p>operator fusion</p></li><li><p>loop tiling</p></li><li><p>layout transformations</p></li><li><p>hardware mapping</p></li><li><p>dependence analysis</p></li></ul><p>All of these become much harder once the code is fully lowered to LLVM IR, and that is where MLIR comes into play.</p><h2>What is MLIR?</h2><p>The key idea of MLIR is simple: &#8220;Instead of having ONE IR for the whole compiler, allow MANY IR levels to coexist.&#8221; That is literally what the name means: <strong>Multi-Level Intermediate Representation</strong>. So instead of forcing everything into one representation, MLIR allows:</p><ul><li><p>high-level IRs</p></li><li><p>mid-level IRs</p></li><li><p>low-level IRs</p></li><li><p>hardware-specific IRs</p></li></ul><p>all inside the same infrastructure.</p><p>The crucial thing to understand here is that MLIR is not a compiler. It is a framework for building compilers. It gives you the tools, the infrastructure, and the conventions so that when Google builds their TPU compiler or when Apple builds their ML compiler, they are both building on the same foundation. This allows their IRs to talk to each other, and their passes can be shared because their tools are compatible.</p><p>One way to understand MLIR is as a shared platform, the way Android is a shared platform for phone manufacturers. 
Each manufacturer builds their own phone, but they share the underlying OS. MLIR is the underlying OS for compiler infrastructure.</p><h3>Why this matters for polyhedral compilation</h3><p>Remember that the problems we had all came down to the same root cause: classical polyhedral tools lived outside the compiler and spoke a different language. MLIR fixes this by giving polyhedral compilation its own native IR level, the affine dialect. We will discuss this dialect in more detail, but remember that it lives inside the compiler with everything else. There is no need for extraction, reimport, or translation.</p><h3>What are Dialects and Operations</h3><p>Dialects and operations are the concepts that make MLIR what it is.</p><p>Let us start with what an op (operation) is. An op is the basic building block of MLIR. Nearly everything in MLIR is an op. A loop is an op. An addition is an op. A function definition is an op. A matrix multiply is an op.</p><p>Think of an op as a self-contained unit of computation that has:</p><ul><li><p>A name that identifies what it does</p></li><li><p>Inputs (called operands)</p></li><li><p>Outputs (called results)</p></li><li><p>Attributes (fixed properties known at compile time)</p></li><li><p>A region (optional &#8212; some ops contain other ops inside them)</p></li></ul><p>Let&#8217;s take an example of a simple addition op:</p><pre><code><code>%result = arith.addi %a, %b : i32
</code></code></pre><p>Here, as we can see:</p><ul><li><p><code>%result</code> &#8212; the output (result)</p></li><li><p><code>arith.addi</code> &#8212; the name of the op (<code>addi</code> means integer addition, in the <code>arith</code> dialect)</p></li><li><p><code>%a, %b</code> &#8212; the inputs (operands)</p></li><li><p><code>i32</code> &#8212; the type of the operands and the result, a 32-bit integer</p></li></ul><p>A loop op looks like this:</p><pre><code><code>affine.for %i = 0 to 10 {
    // other ops live inside here
}
</code></code></pre><p>This op produces no output here, its inputs are the loop bounds, and it has a region: the loop body, which contains other ops inside it.</p><p>The thing to understand is that ops are composable. We build programs by nesting ops inside ops inside ops. For example, a function op contains a loop op, and the loop op contains arithmetic ops. This nesting is how MLIR represents complex programs.</p><h3>Now dialects</h3><p>Think of a dialect as a collection of ops that belong together, plus the rules about how those ops behave.</p><p>It is basically like a vocabulary for a specific domain. The English language has specialized vocabulary for medicine, law, and engineering. A doctor and a lawyer both speak English, but they use the specialized terms of their own domain. Dialects in MLIR work the same way.</p><p>Each dialect defines ops that make sense for its domain:</p><ul><li><p>The <code>arith</code> dialect defines arithmetic ops - addition, subtraction, multiplication, comparison</p></li><li><p>The <code>affine</code> dialect defines ops for affine loop nests - <code>affine.for</code>, <code>affine.if</code>, <code>affine.load</code>, <code>affine.store</code></p></li><li><p>The <code>linalg</code> dialect defines named linear algebra ops - <code>linalg.matmul</code>, <code>linalg.conv_2d</code></p></li><li><p>The <code>llvm</code> dialect defines ops that map directly to LLVM IR instructions</p></li><li><p>The <code>gpu</code> dialect defines ops for GPU computation</p></li></ul><p>The important thing to notice here is the naming convention: every op is prefixed with its dialect name. <code>affine.for</code> is the <code>for</code> op in the <code>affine</code> dialect. <code>linalg.matmul</code> is the <code>matmul</code> op in the <code>linalg</code> dialect. 
This makes it immediately clear which dialect an op belongs to.</p><h3>Why multiple dialects instead of one</h3><p>We have multiple dialects because earlier we had only one IR level, LLVM IR, which was too low level, so we were missing out on high-level optimizations.</p><p>Different dialects now represent different levels of abstraction.</p><p>At the highest level, a matrix multiply is best represented as a single <code>linalg.matmul</code> op. It&#8217;s clean, it carries semantic meaning, and it&#8217;s easy to apply high-level optimizations to it. But at the low level, that same matrix multiply needs to be broken down into explicit loops, memory accesses, and arithmetic, because that&#8217;s what the hardware actually executes.</p><p>Since we have multiple dialects, we can represent the same computation at multiple levels and gradually transform it from one level to the next.</p><p>Let&#8217;s take an example to understand what I mean by that:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Oicy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3cc07b-5eaf-456a-b03a-6e2ecd33129e_1688x1180.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Oicy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3cc07b-5eaf-456a-b03a-6e2ecd33129e_1688x1180.png 424w, https://substackcdn.com/image/fetch/$s_!Oicy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3cc07b-5eaf-456a-b03a-6e2ecd33129e_1688x1180.png 848w, 
https://substackcdn.com/image/fetch/$s_!Oicy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3cc07b-5eaf-456a-b03a-6e2ecd33129e_1688x1180.png 1272w, https://substackcdn.com/image/fetch/$s_!Oicy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3cc07b-5eaf-456a-b03a-6e2ecd33129e_1688x1180.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Oicy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3cc07b-5eaf-456a-b03a-6e2ecd33129e_1688x1180.png" width="1456" height="1018" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c3cc07b-5eaf-456a-b03a-6e2ecd33129e_1688x1180.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1018,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:245952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/199889911?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3cc07b-5eaf-456a-b03a-6e2ecd33129e_1688x1180.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Oicy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3cc07b-5eaf-456a-b03a-6e2ecd33129e_1688x1180.png 424w, 
https://substackcdn.com/image/fetch/$s_!Oicy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3cc07b-5eaf-456a-b03a-6e2ecd33129e_1688x1180.png 848w, https://substackcdn.com/image/fetch/$s_!Oicy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3cc07b-5eaf-456a-b03a-6e2ecd33129e_1688x1180.png 1272w, https://substackcdn.com/image/fetch/$s_!Oicy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c3cc07b-5eaf-456a-b03a-6e2ecd33129e_1688x1180.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><p><strong>High level - linalg dialect:</strong></p><pre><code><code>linalg.matmul ins(%A, %B) outs(%C)
</code></code></pre><p>One op. Clean. Carries the full semantic meaning &#8220;this is a matrix multiply.&#8221;</p><p><strong>Mid level - affine dialect:</strong></p><pre><code><code>affine.for %i = 0 to %N {
    affine.for %j = 0 to %N {
        affine.for %k = 0 to %N {
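            // Element type f32 and dynamic shapes are assumed for illustration
            %a = affine.load %A[%i, %k] : memref&lt;?x?xf32&gt;
            %b = affine.load %B[%k, %j] : memref&lt;?x?xf32&gt;
            %c = affine.load %C[%i, %j] : memref&lt;?x?xf32&gt;
            %mul = arith.mulf %a, %b : f32
            %add = arith.addf %c, %mul : f32
            affine.store %add, %C[%i, %j] : memref&lt;?x?xf32&gt;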
        }
    }
}
</code></code></pre><p>Explicit loops. This is where polyhedral analysis and tiling happen.</p><p><strong>Low level - llvm dialect:</strong></p><pre><code><code>// Individual load instructions, pointer arithmetic,
// SIMD vector operations, branch instructions
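//
// A rough sketch of what one inner-loop step might look like at this level.
// Assumptions (for illustration only): f32 elements, opaque !llvm.ptr pointers,
// hypothetical base pointers %A_base, %B_base, %C_base, and hypothetical
// i64 offsets %ik, %kj, %ij that have already been linearized.
%pa = llvm.getelementptr %A_base[%ik] : (!llvm.ptr, i64) -&gt; !llvm.ptr, f32
%pb = llvm.getelementptr %B_base[%kj] : (!llvm.ptr, i64) -&gt; !llvm.ptr, f32
%pc = llvm.getelementptr %C_base[%ij] : (!llvm.ptr, i64) -&gt; !llvm.ptr, f32
%a = llvm.load %pa : !llvm.ptr -&gt; f32
%b = llvm.load %pb : !llvm.ptr -&gt; f32
%c = llvm.load %pc : !llvm.ptr -&gt; f32
%mul = llvm.fmul %a, %b : f32
%add = llvm.fadd %c, %mul : f32
llvm.store %add, %pc : f32, !llvm.ptr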
</code></code></pre><p>Hardware-level ops. Ready for code generation.</p><p>All three represent the same computation. The compiler moves from one level to the next through a process called <strong>lowering</strong>.</p><h2>How lowering works</h2><p>Lowering is essentially the process of taking ops from a higher-level dialect and replacing them with equivalent ops from a lower-level dialect. We are not changing what the program computes. We are just changing how it&#8217;s represented, making it more concrete and closer to what the hardware actually executes.</p><p>For example, the <code>convert-linalg-to-affine-loops</code> pass finds every <code>linalg.matmul</code> op and replaces it with the explicit <code>affine.for</code> loop nest we showed above. The <code>lower-affine</code> pass finds every <code>affine.for</code> and replaces it with a more basic <code>scf.for</code> (structured control flow). The <code>convert-scf-to-cf</code> pass replaces structured loops with plain branches between basic blocks.</p><p>Each step makes the representation more explicit and less abstract.</p><h3>The full lowering stack for matmul</h3><p>Here is what the complete journey looks like for a matrix multiply going from PyTorch all the way to hardware:</p><pre><code><code>PyTorch: model(x)
    &#8595;  torch.compile
torch dialect: torch.aten.mm
    &#8595;  conversion pass
linalg dialect: linalg.matmul
    &#8595;  tiling pass (polyhedral!)
linalg dialect: tiled linalg.matmul
    &#8595;  lowering pass
affine dialect: affine.for loops    &#8592; We are here
    &#8595;  affine optimization passes (interchange, fusion)
affine dialect: optimized affine.for loops
    &#8595;  lowering pass
vector dialect: vector.contract ops (SIMD)
    &#8595;  lowering pass (choose target)
llvm dialect &#8594; x86 binary (CPU)
nvvm dialect &#8594; PTX code (NVIDIA GPU)
rocdl dialect &#8594; GPU binary (AMD GPU)
</code></code></pre><p>If you look at that stack carefully, the polyhedral world we&#8217;ve been talking about lives in the affine dialect section. Everything above it is a higher-level abstraction coming down toward it. Everything below it is a lower-level abstraction moving toward hardware.</p><p>Let&#8217;s take one more example of lowering and see how the dialects connect with each other:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HNiX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec2286e8-474a-474f-867f-65e1298a8e24_1560x1286.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HNiX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec2286e8-474a-474f-867f-65e1298a8e24_1560x1286.png 424w, https://substackcdn.com/image/fetch/$s_!HNiX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec2286e8-474a-474f-867f-65e1298a8e24_1560x1286.png 848w, https://substackcdn.com/image/fetch/$s_!HNiX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec2286e8-474a-474f-867f-65e1298a8e24_1560x1286.png 1272w, https://substackcdn.com/image/fetch/$s_!HNiX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec2286e8-474a-474f-867f-65e1298a8e24_1560x1286.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HNiX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec2286e8-474a-474f-867f-65e1298a8e24_1560x1286.png" width="692" 
height="570.3296703296703" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ec2286e8-474a-474f-867f-65e1298a8e24_1560x1286.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:1200,&quot;width&quot;:1456,&quot;resizeWidth&quot;:692,&quot;bytes&quot;:220602,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/199889911?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec2286e8-474a-474f-867f-65e1298a8e24_1560x1286.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HNiX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec2286e8-474a-474f-867f-65e1298a8e24_1560x1286.png 424w, https://substackcdn.com/image/fetch/$s_!HNiX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec2286e8-474a-474f-867f-65e1298a8e24_1560x1286.png 848w, https://substackcdn.com/image/fetch/$s_!HNiX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec2286e8-474a-474f-867f-65e1298a8e24_1560x1286.png 1272w, https://substackcdn.com/image/fetch/$s_!HNiX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec2286e8-474a-474f-867f-65e1298a8e24_1560x1286.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is MLIR&#8217;s multi-level lowering pipeline for a matrix multiplication. High-level operations such as <code>linalg.matmul</code> are progressively lowered through different dialects, each representing the computation at a different level of abstraction. Polyhedral optimizations such as tiling, loop interchange, fusion, and dependence analysis are performed in the affine dialect, where iteration domains and memory accesses are explicitly represented. 
Once optimized, the same transformed program can be lowered to multiple hardware targets (CPU, GPU, TPU) without requiring separate polyhedral optimization pipelines for each backend.</p><h3>Why this is so much better than classical tools</h3><p>Remember the problem in classical polyhedral compilation: the polyhedral tool lived outside the compiler, spoke a different language, and required extraction and reimport.</p><p>In MLIR, lowering removes this problem. Every step in the stack above is just a pass operating on MLIR IR. There is no external tool. There is no extraction. There is no reimport.</p><p>So far, we know that MLIR is a framework of dialects, each one representing computation at a different level of abstraction, and that lowering moves code from one dialect to the next. But what exactly is the <strong>affine dialect</strong>, what does it look like, and how does it connect to everything I wrote about in my previous blog: iteration domains, schedules, and access functions? That is what we will discuss next.</p><h2>What is the affine dialect and why it exists</h2><p>If you remember from my previous blog, polyhedral compilation only works on <strong>affine</strong> loops, which means loops whose bounds and array accesses are affine functions (linear plus a constant) of the loop variables and program parameters. The reason is that affine expressions create flat geometric shapes that the compiler can reason about mathematically.</p><p>The problem we had in LLVM IR was that there was no way to look at a loop and immediately know whether it&#8217;s affine or not. LLVM IR is too low level: it represents loops as basic blocks and branch instructions. To do polyhedral analysis at the LLVM level, Polly had to work hard just to figure out which loops were even candidates for optimization.</p><p>The affine dialect solves this by making affineness a <strong>guarantee built into the IR itself</strong>. 
If a loop is expressed using <code>affine.for</code>, the compiler already knows it&#8217;s affine by definition. You can&#8217;t write a non-affine loop using affine dialect ops. The dialect simply doesn&#8217;t allow it.</p><p>This is the core reason the affine dialect exists: to give polyhedral compilation a native home in the compiler where the affine property is enforced structurally, rather than discovered after the fact.</p><h3><strong>What the affine dialect covers</strong></h3><p>The affine dialect is basically a collection of ops specifically designed for affine loop nests. It covers:</p><ul><li><p>Loop control &#8212; <code>affine.for</code> and <code>affine.if</code></p></li><li><p>Memory access &#8212; <code>affine.load</code> and <code>affine.store</code></p></li><li><p>Mathematical mappings &#8212; called <code>affine maps</code></p></li></ul><p>Everything in the affine dialect is guaranteed to be analysable by polyhedral tools. The moment our code is in the affine dialect, the compiler can immediately start asking polyhedral questions - what are the dependencies, what&#8217;s the best schedule, where should we tile - without any prior analysis to check eligibility.</p><p>If you remember the three mathematical objects from the previous blog, here is how the affine dialect represents them:</p><ul><li><p><strong>Iteration domain</strong> &#8212; represented by <code>affine.for</code> bounds</p></li><li><p><strong>Schedule</strong> &#8212; reflected by the nesting order of <code>affine.for</code> ops (though schedules are more generally represented as ordering functions over iterations)</p></li><li><p><strong>Access function</strong> &#8212; represented by affine maps in <code>affine.load</code> and <code>affine.store</code></p></li></ul><h3>Now let us look at some of the ops in the affine dialect</h3><h3>1. affine.for</h3><p><code>affine.for</code> is the affine dialect&#8217;s loop op. 
It represents exactly the kind of affine loop we&#8217;ve been writing throughout my previous blog.</p><p>The basic structure looks like this :</p><pre><code><code>affine.for %i = 0 to 10 {
    // loop body
}
</code></code></pre><p>Here <code>%i</code> is the loop induction variable; this is the <code>i</code> from our iteration domain.</p><p><code>0</code> is the lower bound (inclusive). <code>10</code> is the upper bound (exclusive), so <code>%i</code> takes the values 0 through 9.</p><p>The loop body contains other ops that execute for each value of <code>%i</code>.</p><p>A nested loop looks like this:</p><pre><code><code>affine.for %i = 0 to %N {
    affine.for %j = 0 to %N {
        // loop body
    }
}
</code></code></pre><p>This is our 2D iteration domain <code>0 &#8804; i &lt; N, 0 &#8804; j &lt; N</code> expressed directly as IR.</p><p>The nesting of the two <code>affine.for</code> ops encodes the schedule, i.e. <code>%i</code> is the outer loop and <code>%j</code> is the inner loop. If we want to interchange the loops, we can swap the nesting order, provided the dependences between iterations allow it.</p><p>The important thing to remember here is that the <strong>bounds must be affine</strong>. You can&#8217;t have an <code>affine.for</code> with non-affine bounds. The upper and lower bounds of an <code>affine.for</code> must be affine functions of surrounding loop variables, loop-invariant symbols such as <code>%N</code>, and constants.</p><p>Let&#8217;s take an example:</p><p>This is valid since the lower bound depends linearly on <code>%i</code>:</p><pre><code><code>affine.for %j = %i to %N {   // lower bound depends linearly on %i
</code></code></pre><p>But this is not allowed in the affine dialect :</p><pre><code><code>affine.for %j = 0 to %i * %i {   // quadratic &#8212; not affine
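// Sketch, not from the original post: a bound like this has to leave the affine
// dialect. One option is to compute it with arith and loop with scf.for, whose
// bounds are ordinary SSA values. %c0 and %c1 are hypothetical index constants.
// The price is that polyhedral analysis no longer applies to this loop.
//   %ub = arith.muli %i, %i : index
//   scf.for %j = %c0 to %ub step %c1 { ... }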
</code></code></pre><p>The dialect enforces affineness at the IR level. This is what makes polyhedral analysis possible without any preliminary checking.</p><p></p><h3>2. affine.if</h3><p><code>affine.if</code> is the conditional op in the affine dialect. It allows conditional execution inside affine loops, but with an important restriction: the condition must be a conjunction of affine constraints, meaning linear equalities or inequalities over loop variables and symbols.</p><p>Let&#8217;s take an example to understand this.</p><p>In C, a condition like this inside a nested loop is written as:</p><pre><code><code>for(i=0; i&lt;N; i++)
{
    for(j=0; j&lt;N; j++)
    {
        if(i &lt;= j)
        {
            ...
        }
    }
}
</code></code></pre><p>In MLIR, the same thing is written as:</p><pre><code><code>affine.for %i = 0 to %N {
  affine.for %j = 0 to %N {

    affine.if affine_set&lt;(i,j):(j - i &gt;= 0)&gt;(%i,%j) {

      ...
    }
  }
}
</code></code></pre><p>Now the thing to focus on is the <code>affine.if</code> op.</p><pre><code><code>affine.if affine_set&lt;(i,j):(j - i &gt;= 0)&gt;(%i,%j)
</code></code></pre><p>Think of it as :</p><pre><code><code>affine_set&lt;(dimensions):(constraints on those dimensions)&gt;(actual values)
</code></code></pre><p>The names in the first pair of parentheses are the set&#8217;s <strong>dimensions</strong>: placeholder variables used only inside the affine set definition. They are not the actual MLIR values.</p><p>Think of them like variables in a math equation.</p><p>For example:</p><pre><code><code>(i,j)
</code></code></pre><p>means: I am defining constraints involving two dimensions called i and j.</p><p>Coming to the constraint</p><pre><code><code>(j - i &gt;= 0)
</code></code></pre><p>This is the affine constraint.</p><p>Equivalent to:</p><pre><code><code>j &gt;= i
</code></code></pre><p>This defines the valid region.</p><p>Geometrically it means All points on or above the diagonal j=i</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZfDd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f07008c-d88d-45c1-a87e-e64454c5cba9_1354x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZfDd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f07008c-d88d-45c1-a87e-e64454c5cba9_1354x960.png 424w, https://substackcdn.com/image/fetch/$s_!ZfDd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f07008c-d88d-45c1-a87e-e64454c5cba9_1354x960.png 848w, https://substackcdn.com/image/fetch/$s_!ZfDd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f07008c-d88d-45c1-a87e-e64454c5cba9_1354x960.png 1272w, https://substackcdn.com/image/fetch/$s_!ZfDd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f07008c-d88d-45c1-a87e-e64454c5cba9_1354x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZfDd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f07008c-d88d-45c1-a87e-e64454c5cba9_1354x960.png" width="1354" height="960" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f07008c-d88d-45c1-a87e-e64454c5cba9_1354x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:960,&quot;width&quot;:1354,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:120405,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/199889911?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f07008c-d88d-45c1-a87e-e64454c5cba9_1354x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZfDd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f07008c-d88d-45c1-a87e-e64454c5cba9_1354x960.png 424w, https://substackcdn.com/image/fetch/$s_!ZfDd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f07008c-d88d-45c1-a87e-e64454c5cba9_1354x960.png 848w, https://substackcdn.com/image/fetch/$s_!ZfDd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f07008c-d88d-45c1-a87e-e64454c5cba9_1354x960.png 1272w, https://substackcdn.com/image/fetch/$s_!ZfDd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f07008c-d88d-45c1-a87e-e64454c5cba9_1354x960.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2><code>affine_set&lt;...&gt;</code></h2><p>Putting the above parts together:</p><pre><code><code>affine_set&lt;(i,j):(j-i&gt;=0)&gt;
</code></code></pre><p>means: Create a set of all integer points (i,j) that satisfy j-i&#8805;0.</p><h3>Actual Values</h3><h2><code>(%i,%j)</code></h2><p>Now we get to the important part.</p><pre><code><code>(%i,%j)
</code></code></pre><p>These are the <strong>actual MLIR SSA values</strong>.</p><p>Example:</p><pre><code><code>affine.for %i = 0 to 100 {
  affine.for %j = 0 to 100 {

    affine.if affine_set&lt;(i,j):(j-i&gt;=0)&gt;(%i,%j) {
      ...
    }

  }
}
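// Sketch, not from the original post: because the guard is affine, it can be
// folded into the inner loop's lower bound. Assuming the affine.if is the only
// op in the %j loop, this nest visits exactly the same (i, j) points:
//   affine.for %i = 0 to 100 {
//     affine.for %j = %i to 100 {
//       ...
//     }
//   }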
</code></code></pre><p>Here:</p><pre><code><code>symbolic dimension i  &#8592; actual loop variable %i
symbolic dimension j  &#8592; actual loop variable %j
</code></code></pre><p>The compiler substitutes:</p><pre><code><code>i = %i
j = %j
</code></code></pre><p>and checks whether:</p><pre><code><code>%j - %i &gt;= 0
</code></code></pre><p>is true or not.</p><p>The most important thing to remember is that the constraint <code>j - i &#8805; 0</code> is a linear inequality, and it defines a half-space in the iteration domain. This is exactly the kind of condition polyhedral tools can reason about. The compiler can look at this and say &#8220;for these iterations the condition is true, for those iterations it&#8217;s false&#8221; purely from the geometry.</p><p></p><h3>3. Affine Maps</h3><p>An affine map is the mathematical heart of the affine dialect. It is a function that maps a list of input dimensions to a list of output dimensions using only affine expressions. The notation looks like this:</p><pre><code><code>affine_map&lt;(d0, d1) -&gt; (d0, d1 + 1)&gt;
</code></code></pre><p>Here there are two input dimensions <code>d0</code> and <code>d1</code>, which produce two output dimensions <code>d0</code> (unchanged) and <code>d1 + 1</code> (shifted by one).</p><p>Let&#8217;s understand this in detail with an example:</p><h4>First forget MLIR for a minute</h4><p>Suppose you have this loop:</p><pre><code><code>for(int i = 0; i &lt; N; i++) {
&#9;&#9;A[i+1] = A[i] + 1;
}
</code></code></pre><p>Let&#8217;s look at the first few iterations:</p><pre><code><code>i = 0  &#8594; accesses A[0] and A[1]
i = 1  &#8594; accesses A[1] and A[2]
i = 2  &#8594; accesses A[2] and A[3]
</code></code></pre><p>Notice what&#8217;s happening here.</p><p>The loop variable is:</p><pre><code><code>i
</code></code></pre><p>But the memory location being written is:</p><pre><code><code>i + 1
</code></code></pre><p>So there is a function:</p><pre><code><code>iteration point &#8594; memory location
</code></code></pre><p>which is:</p><pre><code><code>i &#8594; i + 1
</code></code></pre><p>This is exactly what an affine map represents.</p><p>Think of affine maps as coordinate translators. Suppose someone asks: &#8220;At iteration i = 5, which memory location are you writing?&#8221;</p><p>The answer is:</p><pre><code><code>i &#8594; i + 1
5 &#8594; 6
</code></code></pre><p>The affine map is simply the rule that performs this translation.</p><p>Let us take a 2D example</p><p>Now consider:</p><pre><code><code>for(int i = 0; i &lt; N; i++){
&#9;for(int j = 0; j &lt; N; j++){
&#9;&#9;&#9;A[i][j+1]=0;
&#9; }
}
</code></code></pre><p>The iteration space is:</p><pre><code><code>(i,j)
</code></code></pre><p>For example:</p><pre><code><code>(2,3)
</code></code></pre><p>means:</p><pre><code><code>i=2
j=3
</code></code></pre><p>Which memory location do we touch?</p><pre><code><code>A[2][4]
</code></code></pre><p>because:</p><pre><code><code>(i,j) &#8594; (i,j+1)
</code></code></pre><p>This transformation is represented in MLIR as:</p><pre><code><code>affine_map&lt;(d0,d1) -&gt; (d0,d1+1)&gt;
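// Sketch, not from the original post: a map can be bound to a name and
// evaluated on SSA values with affine.apply, which takes a single-result map.
// #col is a hypothetical alias for the column part of the map above.
//   #col = affine_map&lt;(d0, d1) -&gt; (d1 + 1)&gt;
//   %c = affine.apply #col(%i, %j)   // %c holds j + 1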
</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RlmV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81ae878a-d5e6-457e-a7f6-80068dfa03b2_1596x728.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RlmV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81ae878a-d5e6-457e-a7f6-80068dfa03b2_1596x728.png 424w, https://substackcdn.com/image/fetch/$s_!RlmV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81ae878a-d5e6-457e-a7f6-80068dfa03b2_1596x728.png 848w, https://substackcdn.com/image/fetch/$s_!RlmV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81ae878a-d5e6-457e-a7f6-80068dfa03b2_1596x728.png 1272w, https://substackcdn.com/image/fetch/$s_!RlmV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81ae878a-d5e6-457e-a7f6-80068dfa03b2_1596x728.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RlmV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81ae878a-d5e6-457e-a7f6-80068dfa03b2_1596x728.png" width="1456" height="664" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/81ae878a-d5e6-457e-a7f6-80068dfa03b2_1596x728.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:664,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:138907,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/199889911?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81ae878a-d5e6-457e-a7f6-80068dfa03b2_1596x728.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RlmV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81ae878a-d5e6-457e-a7f6-80068dfa03b2_1596x728.png 424w, https://substackcdn.com/image/fetch/$s_!RlmV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81ae878a-d5e6-457e-a7f6-80068dfa03b2_1596x728.png 848w, https://substackcdn.com/image/fetch/$s_!RlmV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81ae878a-d5e6-457e-a7f6-80068dfa03b2_1596x728.png 1272w, https://substackcdn.com/image/fetch/$s_!RlmV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81ae878a-d5e6-457e-a7f6-80068dfa03b2_1596x728.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Affine maps are how the compiler represents access functions precisely and mathematically inside the IR. Because these are affine functions, the compiler can do algebra on them directly: it can check whether two accesses can touch the same memory location, and from that derive the dependence vectors.</p><p></p><h3>Common affine map patterns</h3><p><strong>Identity access &#8212; </strong><code>A[i][j]</code><strong>:</strong></p><pre><code><code>affine_map&lt;(d0, d1) -&gt; (d0, d1)&gt;
</code></code></pre><p><strong>Shifted access &#8212; </strong><code>A[i][j-1]</code><strong>:</strong></p><pre><code><code>affine_map&lt;(d0, d1) -&gt; (d0, d1 - 1)&gt;
</code></code></pre><p><strong>Strided access &#8212; </strong><code>A[2*i][j]</code><strong>:</strong></p><pre><code><code>affine_map&lt;(d0, d1) -&gt; (d0 * 2, d1)&gt;
</code></code></pre><p><strong>Tiled access &#8212; </strong><code>A[ii + i][jj + j]</code><strong>:</strong></p><pre><code><code>affine_map&lt;(d0, d1, d2, d3) -&gt; (d0 + d2, d1 + d3)&gt;
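// Sketch, not from the original post: a loop nest that produces this access.
// Assumptions: a hypothetical 1024x1024 buffer %A and a tile size of 32, so the
// tile loops step by 32 and the intra-tile loops run from 0 to 32.
//   affine.for %ii = 0 to 1024 step 32 {
//     affine.for %jj = 0 to 1024 step 32 {
//       affine.for %i = 0 to 32 {
//         affine.for %j = 0 to 32 {
//           %v = affine.load %A[%ii + %i, %jj + %j] : memref&lt;1024x1024xf32&gt;
//         }
//       }
//     }
//   }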
</code></code></pre><p>That last one is the tiled matmul access pattern &#8212; four loop variables (tile indices <code>ii</code>, <code>jj</code> and intra-tile indices <code>i</code>, <code>j</code>) mapping to a 2D array location.</p><p></p><h3>4. affine.load and affine.store</h3><p><code>affine.load</code> and <code>affine.store</code> are the memory access ops in the affine dialect. They are how iterations read from and write to arrays.</p><p>A load looks like this :</p><pre><code><code>%val = affine.load %A[%i, %j] : memref&lt;?x?xf32&gt;
</code></code></pre><p>This loads the element of <code>%A</code> at position <code>[%i, %j]</code>. The type <code>memref&lt;?x?xf32&gt;</code> says <code>%A</code> is a 2D buffer of 32-bit floats with dynamic sizes, so the result is a 32-bit float.</p><p>You might ask: why not just use a plain load?</p><blockquote><p>LLVM already has load instructions. Why do we need affine.load?</p></blockquote><p>Because the affine dialect wants memory accesses to be <strong>mathematically analyzable</strong>.</p><p>For example:</p><pre><code><code>affine.load %A[%i + %j]
</code></code></pre><p>The compiler immediately knows:</p><pre><code><code>Access Function:
(i,j) &#8594; i+j
</code></code></pre><p>which is affine.</p><p>Remember what we learned about affine maps?</p><pre><code><code>Iteration Point
      &#8595;
Memory Location
</code></code></pre><p><code>affine.load</code> is where that mapping is actually used.</p><p>A store looks like this:</p><pre><code><code>affine.store %val, %C[%i, %j] : memref&lt;?x?xf32&gt;
</code></code></pre><p>Reading this: store the value <code>%val</code> into array <code>%C</code> at position <code>[%i, %j]</code>.</p><h3>Why Polyhedral Analysis Cares</h3><p>Imagine:</p><pre><code><code>A[i][j]=A[i][j-1]+1;
</code></code></pre><p>In MLIR:</p><pre><code><code>%prev = affine.load %A[%i, %j - 1]

affine.store %new, %A[%i, %j]
</code></code></pre><p>Compiler now sees:</p><pre><code><code>Load:
(i,j) &#8594; (i,j-1)

Store:
(i,j) &#8594; (i,j)
</code></code></pre><p>Immediately it can ask:</p><pre><code><code>Can a store from one iteration become a load in another iteration?
</code></code></pre><p>This is exactly the question dependence analysis answers.</p><p>The indices in a load or store are not arbitrary expressions; each one must be an affine expression of loop variables and symbols. Under the hood, every <code>affine.load</code> and <code>affine.store</code> carries an explicit affine map that describes the access pattern.</p><p>So when we write:</p><pre><code><code>%val = affine.load %A[%i, %j - 1]
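</code></code></pre><p>MLIR internally stores this as an affine map attached to the op, applied to the operands <code>%i</code> and <code>%j</code>. Written out in illustrative notation (not the exact textual syntax), that is:</p><pre><code><code>%val = affine.load %A[affine_map&lt;(d0,d1)-&gt;(d0, d1-1)&gt;(%i, %j)]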
</code></code></pre><p>The affine map <code>(d0, d1) -&gt; (d0, d1-1)</code> is the access function from my previous blog - <code>(i,j) &#8594; (i, j-1)</code>. It&#8217;s stored explicitly in the IR, not inferred from text.</p><h3>Why this matters for dependence analysis</h3><p>Because every <code>affine.load</code> and <code>affine.store</code> carries an explicit affine map, the compiler can inspect any two memory ops and immediately ask &#8220;can these access the same location?&#8221; It compares the two affine maps mathematically.</p><p>For example consider these two ops inside a loop:</p><pre><code><code>affine.store %val, %A[%i, %j]        // writes A[i][j]
%x = affine.load %A[%i, %j - 1]     // reads A[i][j-1]
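</code></code></pre><p>The compiler compares the maps:</p><ul><li><p>Store map: <code>(i,j) &#8594; (i, j)</code></p></li><li><p>Load map: <code>(i,j) &#8594; (i, j-1)</code></p></li></ul><p>Within a single iteration the two ops never touch the same element: solving <code>(i, j) = (i, j-1)</code> gives <code>j = j-1</code>, which is impossible. But dependence analysis has to compare <em>different</em> iterations too. It asks: &#8220;is there a pair of iterations where the store and the load touch the same element?&#8221; That means: does <code>(i&#8321;, j&#8321;) = (i&#8322;, j&#8322; - 1)</code> have a solution within the loop bounds? Yes, whenever <code>i&#8322; = i&#8321;</code> and <code>j&#8322; = j&#8321; + 1</code>. The element written in iteration <code>(i, j)</code> is read one step later in iteration <code>(i, j+1)</code>, so there is a flow (read-after-write) dependence with distance vector <code>(0, 1)</code>.</p><p>Now consider the mirrored access:</p><pre><code><code>affine.store %val, %A[%i, %j]        // iteration (i,j) writes A[i][j]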
%x = affine.load %A[%i, %j + 1]     // iteration (i,j) reads A[i][j+1]
</code></code></pre><p>The compiler asks: &#8220;is there any pair of iterations where the store and load touch the same element?&#8221; That means: does <code>(i&#8321;, j&#8321;) = (i&#8322;, j&#8322; + 1)</code> have a solution within the loop bounds? Yes, whenever <code>i&#8321; = i&#8322;</code> and <code>j&#8321; = j&#8322; + 1</code>. The load in iteration <code>(i, j)</code> reads the element before the store in iteration <code>(i, j+1)</code> overwrites it, so this is an anti (write-after-read) dependence with distance vector <code>(0, 1)</code>.</p><p>This is essentially the same integer feasibility question that the Omega test from my previous blog answers, but now it&#8217;s running natively inside the compiler on IR nodes, and not on extracted text handed to an external tool.</p><p>Let&#8217;s take a complete example to summarise all of this:</p><pre><code><code>// for (i=0; i&lt;N; i++)
//   for (j=1; j&lt;N; j++)
//     A[i][j] = A[i][j-1] + 1

affine.for %i = 0 to %N {
    affine.for %j = 1 to %N {
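        // %A is assumed to be a memref&lt;?x?xi32&gt; here
        %val = affine.load %A[%i, %j - 1] : memref&lt;?x?xi32&gt;   // access function: (i,j) &#8594; (i, j-1)
        %c1 = arith.constant 1 : i32
        %new = arith.addi %val, %c1 : i32
        affine.store %new, %A[%i, %j] : memref&lt;?x?xi32&gt;        // access function: (i,j) &#8594; (i, j)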
    }
}
</code></code></pre><p>Every polyhedral concept is present here:</p><ul><li><p><strong>Iteration domain</strong> &#8212; encoded in the <code>affine.for</code> bounds: <code>0 &#8804; i &lt; N, 1 &#8804; j &lt; N</code></p></li><li><p><strong>Schedule</strong> &#8212; encoded in the nesting order: <code>i</code> outer, <code>j</code> inner</p></li><li><p><strong>Access functions</strong> &#8212; encoded in the affine maps of <code>affine.load</code> and <code>affine.store</code></p></li><li><p><strong>Dependence</strong> &#8212; derivable by comparing the two affine maps mathematically</p></li></ul><p>The compiler has everything it needs to run polyhedral analysis, apply tiling, check legality, and generate optimized code, all natively, without leaving the IR, without external tools, without extraction or reimport.</p><p>This is why the affine dialect exists.</p><h2>Conclusion</h2><p>Polyhedral Compilation is a mathematically powerful technique, but its traditional implementation had a fundamental architectural limitation, which was that the tools lived outside the compiler. Every time a transformation was needed, the compiler had to extract the loop, hand it to an external tool, and reimport the result. During this whole process there was information loss at every boundary, passes could not interleave, and targeting multiple hardware backends required completely separate pipelines.</p><p>MLIR addressed these issues by making polyhedral analysis native to the compiler IR through the affine dialect.</p><p>The three objects that polyhedral compilation depends on - iteration domains, schedules, and access functions - are no longer computed from the code after the fact. They are encoded directly in the IR. Every <code>affine.for</code> bound is an iteration domain. Every nesting order is a schedule. Every <code>affine.load</code> and <code>affine.store</code> carries its access function as an explicit affine map. 
This means the compiler can perform dependence analysis, apply transformations, and generate optimized code without ever leaving the IR or crossing a tool boundary.</p><p>This is what makes MLIR&#8217;s approach to polyhedral compilation fundamentally different from what came before: it is not better math, but better infrastructure around the same math.</p><p><em>I wrote this as part of my ongoing effort to understand polyhedral compilation and how it fits into modern compiler infrastructure. The goal of this particular post was to build a clear intuition for why classical polyhedral tools had architectural limitations and how MLIR addresses them through the affine dialect. It is not meant as a rigorous treatment of the subject, but as an honest attempt to explain the concepts in a way that actually makes sense.</em></p><p><em>If you notice any mistakes, oversimplifications, or missing context, I&#8217;d genuinely appreciate you pointing them out. There is a lot of depth to this topic and I am still working through it.</em></p>]]></content:encoded></item><item><title><![CDATA[Polyhedral Compilation: How Compilers Turn Loops Into Geometry]]></title><description><![CDATA[A deep intuition-first dive into affine loops, dependence analysis, Pluto scheduling, tiling, and how modern compilers mathematically optimize code.]]></description><link>https://sajidzubair.substack.com/p/polyhedral-compilation-how-compilers</link><guid isPermaLink="false">https://sajidzubair.substack.com/p/polyhedral-compilation-how-compilers</guid><dc:creator><![CDATA[Sajid Zubair]]></dc:creator><pubDate>Mon, 25 May 2026 05:37:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YJ3O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa48d187c-2417-40c4-8656-6fcc587cc38f_1420x766.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>What is Polyhedral Compilation ?</h2><p>The core of polyhedral compilation is basically 
a mathematical framework for analysing and transforming loops in programs. Before diving into the math, let us first understand what problem this is solving.</p><h2>The Problem It Solves</h2><p>Modern CPUs are fast but memory is slow. What I mean by that is that when a program runs nested loops like matrix multiplication, the pattern in which you access memory determines whether your program is going to run in 1 second or 10 seconds. The question is: &#8220;can a compiler automatically figure out the best loop structure for a given piece of hardware?&#8221; That&#8217;s exactly the question polyhedral compilation tries to answer.</p><p>Before we dive in, I think there are a couple of concepts we should be aware of. So let&#8217;s understand those first.</p><h2>What is an &#8220;Affine Program&#8221;?</h2><p>The first step is to understand what an &#8220;Affine Program&#8221; is. Polyhedral compilation works on a restricted but very common class of loops called affine loops. These have:</p><ol><li><p>Loop bounds that are linear functions of outer loop variables or constants (e.g., <code>i = 0 to N</code>, <code>j = i to N</code>)</p></li><li><p>Array accesses that are linear (affine) functions of loop variables (e.g.,  <code>A[i+j]</code>, not <code>A[i*j]</code>)</p></li><li><p>No data-dependent control flow inside the loop body</p></li></ol><p>These are basically programs where everything behaves in a <strong>predictable, straight-line way</strong> (linear).</p><p>Example:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G3P4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bdbdca0-bc68-4f76-8947-7580efa04f09_500x250.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!G3P4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bdbdca0-bc68-4f76-8947-7580efa04f09_500x250.png 424w, https://substackcdn.com/image/fetch/$s_!G3P4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bdbdca0-bc68-4f76-8947-7580efa04f09_500x250.png 848w, https://substackcdn.com/image/fetch/$s_!G3P4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bdbdca0-bc68-4f76-8947-7580efa04f09_500x250.png 1272w, https://substackcdn.com/image/fetch/$s_!G3P4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bdbdca0-bc68-4f76-8947-7580efa04f09_500x250.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G3P4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bdbdca0-bc68-4f76-8947-7580efa04f09_500x250.png" width="500" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4bdbdca0-bc68-4f76-8947-7580efa04f09_500x250.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20743,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/199097747?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bdbdca0-bc68-4f76-8947-7580efa04f09_500x250.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!G3P4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bdbdca0-bc68-4f76-8947-7580efa04f09_500x250.png 424w, https://substackcdn.com/image/fetch/$s_!G3P4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bdbdca0-bc68-4f76-8947-7580efa04f09_500x250.png 848w, https://substackcdn.com/image/fetch/$s_!G3P4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bdbdca0-bc68-4f76-8947-7580efa04f09_500x250.png 1272w, https://substackcdn.com/image/fetch/$s_!G3P4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bdbdca0-bc68-4f76-8947-7580efa04f09_500x250.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Here, as you can see, everything is linear :</p><ol><li><p><code>i &lt; N</code> &#8594; simple</p></li><li><p><code>j = i to N</code> &#8594; depends linearly on <code>i</code></p></li><li><p><code>A[i + j]</code> &#8594; linear</p></li></ol><p>Everything is built from addition, subtraction, and multiplication by constants.</p><p>Most scientific, ML, and image-processing kernels (matrix multiply, convolution, stencil computations) fall squarely into this class.</p><p>You might wonder: &#8220;why do we restrict ourselves to linear/affine expressions?&#8221; The simple answer is that affine bounds describe shapes with flat faces (polygons, boxes, and polyhedra in general), whereas non-linear bounds would give curved surfaces that are much harder to reason about.</p><h3>Why Does This Matter</h3><p>Computers (compilers) can easily understand and manipulate <strong>straight shapes</strong>, but not curved ones.</p><div><hr></div><h3>Affine case:</h3><p>Like a <strong>grid or rectangle</strong></p><p>You can:</p><ul><li><p>divide it into blocks (tiling)</p></li><li><p>reorder it</p></li><li><p>analyze it easily</p></li></ul><div><hr></div><h3>Non-affine case:</h3><p>Like a <strong>weird curved shape</strong></p><p>You can&#8217;t:</p><ul><li><p>divide it cleanly</p></li><li><p>reason about dependencies easily</p></li><li><p>apply math optimizations safely</p></li></ul><p>We restrict ourselves to affine programs because they create <strong>simple, straight-line patterns</strong> that can be analyzed and optimized mathematically.</p><h2>The Three Mathematical Objects</h2><p>Now that we know what affine loops are, let&#8217;s see how they are represented. 
Every affine loop nest gets represented as three things :</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CX0l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F242f09ac-fc81-4f1f-83af-3ca999bda07e_1472x580.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CX0l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F242f09ac-fc81-4f1f-83af-3ca999bda07e_1472x580.png 424w, https://substackcdn.com/image/fetch/$s_!CX0l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F242f09ac-fc81-4f1f-83af-3ca999bda07e_1472x580.png 848w, https://substackcdn.com/image/fetch/$s_!CX0l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F242f09ac-fc81-4f1f-83af-3ca999bda07e_1472x580.png 1272w, https://substackcdn.com/image/fetch/$s_!CX0l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F242f09ac-fc81-4f1f-83af-3ca999bda07e_1472x580.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CX0l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F242f09ac-fc81-4f1f-83af-3ca999bda07e_1472x580.png" width="1456" height="574" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/242f09ac-fc81-4f1f-83af-3ca999bda07e_1472x580.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:574,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:149770,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/199097747?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F242f09ac-fc81-4f1f-83af-3ca999bda07e_1472x580.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CX0l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F242f09ac-fc81-4f1f-83af-3ca999bda07e_1472x580.png 424w, https://substackcdn.com/image/fetch/$s_!CX0l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F242f09ac-fc81-4f1f-83af-3ca999bda07e_1472x580.png 848w, https://substackcdn.com/image/fetch/$s_!CX0l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F242f09ac-fc81-4f1f-83af-3ca999bda07e_1472x580.png 1272w, https://substackcdn.com/image/fetch/$s_!CX0l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F242f09ac-fc81-4f1f-83af-3ca999bda07e_1472x580.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p>Iteration Domain &#8594; WHERE computations happen</p></li><li><p>Schedule &#8594; WHEN computations happen</p></li><li><p>Access Function &#8594; WHAT memory each computation touches</p></li></ul><h3>Let Us Understand All Three Objects In Detail</h3><h4>Iteration Domain</h4><p>Let us take the following loop as an example :</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L6Bs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d364b-b961-44f5-9b06-980f84aaa811_486x172.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!L6Bs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d364b-b961-44f5-9b06-980f84aaa811_486x172.png 424w, https://substackcdn.com/image/fetch/$s_!L6Bs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d364b-b961-44f5-9b06-980f84aaa811_486x172.png 848w, https://substackcdn.com/image/fetch/$s_!L6Bs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d364b-b961-44f5-9b06-980f84aaa811_486x172.png 1272w, https://substackcdn.com/image/fetch/$s_!L6Bs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d364b-b961-44f5-9b06-980f84aaa811_486x172.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L6Bs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d364b-b961-44f5-9b06-980f84aaa811_486x172.png" width="486" height="172" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f74d364b-b961-44f5-9b06-980f84aaa811_486x172.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:172,&quot;width&quot;:486,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19569,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/199097747?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d364b-b961-44f5-9b06-980f84aaa811_486x172.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!L6Bs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d364b-b961-44f5-9b06-980f84aaa811_486x172.png 424w, https://substackcdn.com/image/fetch/$s_!L6Bs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d364b-b961-44f5-9b06-980f84aaa811_486x172.png 848w, https://substackcdn.com/image/fetch/$s_!L6Bs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d364b-b961-44f5-9b06-980f84aaa811_486x172.png 1272w, https://substackcdn.com/image/fetch/$s_!L6Bs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74d364b-b961-44f5-9b06-980f84aaa811_486x172.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The iteration domain is the set of all (i,j) values for which the loop body runs.</p><p>So : <strong>0 &#8804; i &lt; N</strong> and <strong>0 &#8804; j &lt; N</strong></p><p>Think of each iteration as a point in the plane. So you get (0,0), (0,1), &#8230;, (N-1,N-1), 
which is just an N &#215; N grid of points.</p><p>In short, the iteration domain is the set of all points where computation happens.</p><h4>Schedule</h4><p>The schedule tells us in what order these points should be executed.</p><p>The default schedule is</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0FKX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1380faec-97e1-49b0-bf3e-d5cd903d9a0d_308x60.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0FKX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1380faec-97e1-49b0-bf3e-d5cd903d9a0d_308x60.png 424w, https://substackcdn.com/image/fetch/$s_!0FKX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1380faec-97e1-49b0-bf3e-d5cd903d9a0d_308x60.png 848w, https://substackcdn.com/image/fetch/$s_!0FKX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1380faec-97e1-49b0-bf3e-d5cd903d9a0d_308x60.png 1272w, https://substackcdn.com/image/fetch/$s_!0FKX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1380faec-97e1-49b0-bf3e-d5cd903d9a0d_308x60.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0FKX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1380faec-97e1-49b0-bf3e-d5cd903d9a0d_308x60.png" width="308" height="60" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1380faec-97e1-49b0-bf3e-d5cd903d9a0d_308x60.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:60,&quot;width&quot;:308,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7815,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/199097747?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1380faec-97e1-49b0-bf3e-d5cd903d9a0d_308x60.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0FKX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1380faec-97e1-49b0-bf3e-d5cd903d9a0d_308x60.png 424w, https://substackcdn.com/image/fetch/$s_!0FKX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1380faec-97e1-49b0-bf3e-d5cd903d9a0d_308x60.png 848w, https://substackcdn.com/image/fetch/$s_!0FKX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1380faec-97e1-49b0-bf3e-d5cd903d9a0d_308x60.png 1272w, https://substackcdn.com/image/fetch/$s_!0FKX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1380faec-97e1-49b0-bf3e-d5cd903d9a0d_308x60.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Which means sort by :</p><ol><li><p>i</p></li><li><p>then j</p></li></ol><p>Execution order :</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Y2Wc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f2dc9a5-39e3-4d5f-a378-2ba11d116c4b_450x108.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y2Wc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f2dc9a5-39e3-4d5f-a378-2ba11d116c4b_450x108.png 424w, https://substackcdn.com/image/fetch/$s_!Y2Wc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f2dc9a5-39e3-4d5f-a378-2ba11d116c4b_450x108.png 848w, https://substackcdn.com/image/fetch/$s_!Y2Wc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f2dc9a5-39e3-4d5f-a378-2ba11d116c4b_450x108.png 1272w, https://substackcdn.com/image/fetch/$s_!Y2Wc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f2dc9a5-39e3-4d5f-a378-2ba11d116c4b_450x108.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y2Wc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f2dc9a5-39e3-4d5f-a378-2ba11d116c4b_450x108.png" width="450" height="108" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2f2dc9a5-39e3-4d5f-a378-2ba11d116c4b_450x108.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:108,&quot;width&quot;:450,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12224,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/199097747?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f2dc9a5-39e3-4d5f-a378-2ba11d116c4b_450x108.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y2Wc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f2dc9a5-39e3-4d5f-a378-2ba11d116c4b_450x108.png 424w, https://substackcdn.com/image/fetch/$s_!Y2Wc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f2dc9a5-39e3-4d5f-a378-2ba11d116c4b_450x108.png 848w, https://substackcdn.com/image/fetch/$s_!Y2Wc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f2dc9a5-39e3-4d5f-a378-2ba11d116c4b_450x108.png 1272w, https://substackcdn.com/image/fetch/$s_!Y2Wc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f2dc9a5-39e3-4d5f-a378-2ba11d116c4b_450x108.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>This is a row-by-row execution.</p><p>Loop transformations such as tiling and interchange 
are just changes to the schedule: the iteration domain stays the same, and only the order in which its points are visited changes.</p><h4>Access Function</h4><p>Now we ask: when we are at (i,j), which memory location do we access?</p><p>The access function tells us which memory location each iteration touches. Access functions are important because this is where dependencies come from.</p><p><strong>The access function</strong> maps an iteration point to an array element.</p><p><code>A[i][j]</code> is the function <code>(i,j) &#8594; (i,j)</code>.</p><p><code>A[i][j+1]</code> is <code>(i,j) &#8594; (i, j+1)</code>.</p><h2>Example:</h2><pre><code><code>(i, j) reads A[i][j-1]
</code></code></pre><p>Which was written by:</p><pre><code><code>(i, j-1)
</code></code></pre><div><hr></div><h2>So the dependency is:</h2><pre><code><code>(i, j-1) &#8594; (i, j)
</code></code></pre><p>This is how the compiler knows:</p><ol><li><p>what must come before what</p></li><li><p>which transformations are safe</p></li></ol><h2>Dependency Analysis</h2><p>Before transforming loops, the compiler must ensure it doesn&#8217;t break the required execution order between iterations. In other words, if it reorders loops, the program should still produce the correct result.</p><p>That&#8217;s it, the entire topic revolves around this one thing.</p><p>Modern compilers reorder loops to make programs run faster, and they do this by :</p><ul><li><p>parallelizing loops</p></li><li><p>tiling loops for cache efficiency</p></li><li><p>interchanging loops</p></li><li><p>fusing loops</p></li><li><p>vectorizing loops</p></li></ul><p>For example, the compiler may transform:</p><pre><code><code>for (i=0; i&lt;N; i++)
  for (j=0; j&lt;M; j++)
</code></code></pre><p>into:</p><pre><code><code>for (j=0; j&lt;M; j++)
  for (i=0; i&lt;N; i++)
</code></code></pre><p>But before doing this, the compiler has to make sure the change doesn&#8217;t alter the meaning of the program, because some iterations may depend on others.</p><h4>When are Iterations Independent ?</h4><p>We can understand this using a very simple example :</p><pre><code><code>for (i = 0; i &lt; 4; i++)
   A[i] = 10;
</code></code></pre><p>Here each iteration writes a completely different memory location and reads none, so no iteration affects another. Because of this, the compiler can reorder the iterations safely, or even run them in parallel.</p><h4>When do Iterations Depend on Each Other ?</h4><p>Let&#8217;s take another example for this :</p><pre><code><code>for (i = 1; i &lt; 4; i++)
  A[i] = A[i-1] + 1;
</code></code></pre><p>This might look similar, but it is very different in terms of how memory is accessed. Let&#8217;s understand this in more detail.</p><h2>Iteration i = 1</h2><pre><code><code>A[1]=A[0]+1;
</code></code></pre><p>Touches:</p><ul><li><p>READ <code>A[0]</code></p></li><li><p>WRITE <code>A[1]</code></p></li></ul><h2>Iteration i = 2</h2><pre><code><code>A[2]=A[1]+1;
</code></code></pre><p>Touches:</p><ul><li><p>READ <code>A[1]</code></p></li><li><p>WRITE <code>A[2]</code></p></li></ul><p>The important thing to notice here is :</p><ol><li><p>At iteration i = 1, we write A[1]</p></li><li><p>At iteration i = 2, we read A[1]</p></li></ol><p>So both iterations touch the same memory location, and one of them writes it. This creates a relationship :</p><p>i = 2 depends on i = 1</p><p>because iteration 2 needs the value produced by iteration 1.</p><h4>Dependence Vector</h4><p>Now that we know that some iterations depend on each other, the dependence vector tells us how far apart these dependent iterations are.</p><p>Let&#8217;s take the same loop again, this time with one more iteration :</p><pre><code><code>for (i = 1; i &lt; 5; i++)
    A[i] = A[i-1] + 1;
</code></code></pre><p>We already know that iteration <code>i=2</code> depends on iteration <code>i=1</code> because <code>i=2</code> reads <code>A[1]</code> which was written by <code>i=1</code>.</p><p>When iteration <code>q</code> depends on iteration <code>p</code> (meaning <code>p</code> must run before <code>q</code>), the dependence vector is simply:</p><p><code>d = q - p</code></p><p>It&#8217;s the difference in coordinates between the two iterations. That&#8217;s it. For the loop above, <code>p = 1</code> and <code>q = 2</code>, so:</p><p><code>d = 2 - 1 = (1)</code></p><p>The vector <code>(1)</code> means: <em>&#8220;there is a dependence that jumps exactly 1 step forward in i.&#8221;</em> For all pairs in this loop:</p><ul><li><p><code>i=2</code> depends on <code>i=1</code> &#8594; <code>d = (1)</code></p></li><li><p><code>i=3</code> depends on <code>i=2</code> &#8594; <code>d = (1)</code></p></li><li><p><code>i=4</code> depends on <code>i=3</code> &#8594; <code>d = (1)</code></p></li></ul><p>Every dependence has the same distance. 
The dependence vector is uniform here.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C2hF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecdad4c6-42e9-4538-a920-ce831a613a3a_1400x854.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C2hF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecdad4c6-42e9-4538-a920-ce831a613a3a_1400x854.png 424w, https://substackcdn.com/image/fetch/$s_!C2hF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecdad4c6-42e9-4538-a920-ce831a613a3a_1400x854.png 848w, https://substackcdn.com/image/fetch/$s_!C2hF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecdad4c6-42e9-4538-a920-ce831a613a3a_1400x854.png 1272w, https://substackcdn.com/image/fetch/$s_!C2hF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecdad4c6-42e9-4538-a920-ce831a613a3a_1400x854.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C2hF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecdad4c6-42e9-4538-a920-ce831a613a3a_1400x854.png" width="1400" height="854" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ecdad4c6-42e9-4538-a920-ce831a613a3a_1400x854.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:854,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98568,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/199097747?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecdad4c6-42e9-4538-a920-ce831a613a3a_1400x854.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C2hF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecdad4c6-42e9-4538-a920-ce831a613a3a_1400x854.png 424w, https://substackcdn.com/image/fetch/$s_!C2hF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecdad4c6-42e9-4538-a920-ce831a613a3a_1400x854.png 848w, https://substackcdn.com/image/fetch/$s_!C2hF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecdad4c6-42e9-4538-a920-ce831a613a3a_1400x854.png 1272w, https://substackcdn.com/image/fetch/$s_!C2hF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecdad4c6-42e9-4538-a920-ce831a613a3a_1400x854.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4>Dependence Direction Vector</h4><p>The dependence vector tells us the exact numeric distance, e.g. +1, +2 or (0, +1). The dependence direction vector tells us only the direction in each loop dimension: forward, backward, or same.</p><p>Instead of numbers, it uses symbols :</p><ol><li><p><code>&lt;</code> means forward (the distance in that dimension is positive)</p></li><li><p><code>&gt;</code> means backward (the distance is negative)</p></li><li><p><code>=</code> means same (the distance is zero)</p></li></ol><p>Let&#8217;s take a simple example to understand this clearly :</p><pre><code><code>for (i = 0; i &lt; N; i++)
  for (j = 1; j &lt; M; j++)
    A[i][j] = A[i][j-1] + 1;
</code></code></pre><p>At iteration (i,j) it reads A[i][j-1], which was written by (i,j-1). So the dependence is (i,j-1) &#8594; (i,j)</p><p>Dependence Vector (Consumer - Producer) gives us :</p><p>(i,j) - (i,j-1) = (0, +1)</p><p>Meaning there is :</p><ol><li><p>No movement in i</p></li><li><p>Forward movement in j</p></li></ol><p>So the direction vector is (=,&lt;).</p><h4>Why direction vectors matter more</h4><p>Exact distances may change after a transformation, but every dependence must still point forward for the transformation to be legal.</p><p>Let us take loop interchange, which swaps the two components of the vector :</p><p>Example 1 :</p><p>Suppose the direction is (=,&lt;)</p><p>After interchange it is (&lt;,=)</p><p>The first component that is not = is &lt;, so the dependence still points forward and the interchange is legal.</p><p>Example 2 :</p><p>Suppose the direction is (&lt;,&gt;). After interchange it becomes (&gt;,&lt;). The first component is now &gt;, which means a backward dependence, so the interchange is illegal. <strong>The consumer would execute before the producer.</strong></p><p>So in short, the dependence vector tells us &#8220;exactly how far the data flows&#8221; and the direction vector tells us &#8220;which way the data flows&#8221;. A loop transformation is legal if it preserves the producer-before-consumer ordering, which means every transformed dependence direction vector must remain lexicographically forward: its first component that is not = must be &lt;.</p><h2>The Omega Test and Fourier&#8211;Motzkin Elimination</h2><p>So far we have understood dependency analysis conceptually:</p><ul><li><p>Two iterations depend on each other if they access the same memory location</p></li><li><p>At least one access is a write</p></li><li><p>The compiler represents accesses mathematically</p></li><li><p>Then it checks whether valid iteration pairs exist</p></li></ul><p>But now comes the important question : &#8220;How does the compiler actually solve these equations?&#8221;</p><p>This is where the &#8220;geometry&#8221; of polyhedral compilation comes into play. 
Underneath all the high-level compiler transformations is a math engine that answers questions like : &#8220;Do these constraints have a valid integer solution?&#8221;</p><h4>Let&#8217;s understand the whole pipeline :</h4><ol><li><p>Let&#8217;s start with a loop example</p></li></ol><pre><code><code>for (i = 1; i &lt; 5; i++)
    A[i] = A[i - 1] + 1;
</code></code></pre><p>We already know that iteration 2 depends on iteration 1, so there is a dependence here.</p><ol start="2"><li><p>What the compiler wants to know</p></li></ol><p>The compiler asks &#8220;can two different iterations access the same array location?&#8221;. To reason about this mathematically, it creates two copies of the loop variable :</p><ul><li><p><code>i&#8321;</code> &#8594; the iteration that writes</p></li><li><p><code>i&#8322;</code> &#8594; the iteration that reads</p></li></ul><p>Note : At this point the compiler does not yet know whether there is a dependence.</p><p>Iteration i&#8321; writes A[i&#8321;] and iteration i&#8322; reads A[i&#8322; - 1]. For a dependence they must access the same location, so the compiler writes : i&#8321; = i&#8322; - 1</p><p>This equation says that a value written by one iteration is read by another.</p><ol start="3"><li><p>Why loop bounds matter</p></li></ol><p>The compiler also knows :</p><pre><code><code>for (i = 1; i &lt; 5; i++)
</code></code></pre><p>So, because the loop condition is the strict <code>i &lt; 5</code> :</p><pre><code><code>1 &#8804; i&#8321; &lt; 5
1 &#8804; i&#8322; &lt; 5
</code></code></pre><p>Now the question the compiler asks is : do integer values exist that satisfy all of these constraints at once?</p><ol start="4"><li><p>Solve it manually</p></li></ol><p>We have :</p><pre><code><code>i&#8321; = i&#8322; - 1
1 &#8804; i&#8321; &lt; 5
1 &#8804; i&#8322; &lt; 5
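</code></code></pre><p>For bounds this small, the question can also be answered by simply trying every pair. The sketch below does that for the loop above. <code>find_dependence</code> is a hypothetical helper, not part of any compiler, and a real compiler solves the system symbolically instead of enumerating it.</p>

```c
#include <assert.h>

/* Hypothetical helper: brute-force version of the dependence question for
 *     for (i = lo; i < hi; i++) A[i] = A[i - 1] + 1;
 * Iteration i1 writes A[i1]; iteration i2 reads A[i2 - 1].
 * Returns 1 and stores a witness pair if some i1, i2 in [lo, hi) satisfy
 * i1 == i2 - 1, otherwise returns 0. */
int find_dependence(int lo, int hi, int *w1, int *w2)
{
    assert(w1 != 0 && w2 != 0);
    for (int i1 = lo; i1 < hi; i1++) {
        for (int i2 = lo; i2 < hi; i2++) {
            if (i1 == i2 - 1) {   /* both iterations touch the same element */
                *w1 = i1;
                *w2 = i2;
                return 1;
            }
        }
    }
    return 0;   /* no pair satisfies the system: no dependence */
}
```

<p>Called with the bounds of our loop, it stops at the first pair that satisfies the system:</p><pre><code><code>find_dependence(1, 5, &amp;i1, &amp;i2);   /* returns 1 with i1 = 1, i2 = 2 */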
</code></code></pre><p>If i&#8322; = 2 then i&#8321; = 1, and both values lie within the bounds. So a dependence exists.</p><h4>The Entire Compiler Problem is Basically This</h4><p>It repeatedly asks &#8220;Can I find valid integer values?&#8221;</p><p>If yes : Dependency Exists</p><p>If no : No Dependency</p><h4>Fourier-Motzkin Elimination</h4><p>This is a way of deciding whether a system of linear inequalities has a solution by eliminating variables one by one until the problem becomes trivial.</p><p>Suppose we have:</p><pre><code><code>x + y &#8804; 10
x &#8805; 2
</code></code></pre><p>Rewrite the first inequality as an upper bound on x:</p><pre><code><code>x &#8804; 10 - y
</code></code></pre><p>Now:</p><pre><code><code>2 &#8804; x &#8804; 10 - y
</code></code></pre><p>For this to be possible:</p><pre><code><code>2 &#8804; 10 - y
</code></code></pre><p>So:</p><pre><code><code>y &#8804; 8
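</code></code></pre><p>The same elimination step can be written as a small C sketch. It assumes every inequality has the form a&#183;x + b&#183;y &#8804; c with integer coefficients. The <code>Ineq</code> type and <code>eliminate_x</code> are names made up for illustration, not the API of any real library.</p>

```c
#include <assert.h>

/* One inequality of the form  a*x + b*y <= c  (integer coefficients). */
typedef struct { int a, b, c; } Ineq;

/* Eliminate x from a system, Fourier-Motzkin style (illustrative sketch).
 * Every upper bound on x (a > 0) is paired with every lower bound on x
 * (a < 0), scaled so that the x terms cancel. Rows without x are copied
 * unchanged. Returns the number of rows written to out. */
int eliminate_x(const Ineq *in, int n, Ineq *out, int max_out)
{
    int m = 0;
    for (int u = 0; u < n; u++) {
        if (in[u].a == 0) {               /* x does not appear: keep the row */
            assert(m < max_out);
            out[m++] = in[u];
            continue;
        }
        if (in[u].a < 0) continue;        /* lower bounds are used as partners below */
        for (int l = 0; l < n; l++) {
            if (in[l].a >= 0) continue;   /* partner must be a lower bound on x */
            int su = -in[l].a;            /* positive scale for the upper-bound row */
            int sl = in[u].a;             /* positive scale for the lower-bound row */
            assert(m < max_out);
            out[m].a = 0;                 /* su*a_u + sl*a_l is zero by construction */
            out[m].b = su * in[u].b + sl * in[l].b;
            out[m].c = su * in[u].c + sl * in[l].c;
            m++;
        }
    }
    return m;
}
```

<p>Feeding it the two rows {1, 1, 10} (x + y &#8804; 10) and {-1, 0, -2} (x &#8805; 2 rewritten as -x &#8804; -2) produces a single row without x:</p><pre><code><code>0*x + 1*y &#8804; 8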
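</code></code></pre><p>x has been eliminated, and the remaining condition on y is exactly what is needed for some x to exist. The problem is that Fourier-Motzkin works over REAL numbers, but loop iterations are integers. This difference matters a lot.</p><p>Example :</p><pre><code><code>2x = 1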
</code></code></pre><p>Over real numbers:</p><pre><code><code>x = 0.5
</code></code></pre><p>Valid answer, but loop iterations cannot be:</p><pre><code><code>i = 0.5
</code></code></pre><p>Iterations must be integers.</p><h4>Omega Test Intuition</h4><p>The Omega Test addresses exactly this weakness. It is essentially an integer-aware elimination algorithm: it extends Fourier-Motzkin while carefully handling the cases where a real solution exists but an integer one does not.</p><p>Example :</p><p>Suppose the compiler gets:</p><pre><code><code>2i&#8321; = 2i&#8322; + 1
</code></code></pre><p>Basically means:</p><pre><code><code>even = odd
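</code></code></pre><p>The parity argument is a special case of the classic GCD test: a&#183;x + b&#183;y = c has an integer solution exactly when gcd(a, b) divides c. The sketch below implements only that check. It is much simpler than the Omega Test and ignores the loop bounds, and <code>has_integer_solution</code> is a name chosen for illustration.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Greatest common divisor of |a| and |b| (Euclid's algorithm). */
static int gcd(int a, int b)
{
    a = abs(a);
    b = abs(b);
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* GCD test: a*x + b*y = c has an integer solution exactly when
 * gcd(a, b) divides c. Bounds on x and y are not considered. */
int has_integer_solution(int a, int b, int c)
{
    int g = gcd(a, b);
    assert(g != 0);   /* a and b must not both be zero */
    return c % g == 0;
}
```

<p>For the equation above, written as 2i&#8321; - 2i&#8322; = 1, the call <code>has_integer_solution(2, -2, 1)</code> returns 0 because gcd(2, 2) = 2 does not divide 1. Put differently, a solution would need integers with:</p><pre><code><code>2(i&#8321; - i&#8322;) = 1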
</code></code></pre><p>That is impossible for integers, so no dependence exists. The Omega Test detects this quickly.</p><p>This matters because the compiler now knows whether a dependence exists, and can therefore decide whether it is safe to parallelize or reorder the loops.</p><p>So far we know that the compiler builds an iteration domain, detects dependences, and checks whether a transformation is legal. But detecting <strong>whether a transformation is legal</strong> is different from <strong>finding the best one</strong>.</p><p>There is a huge number of legal loop orderings for a given program. The compiler has to figure out which one to pick. That is the scheduling problem.</p><h2>Scheduling Problem</h2><p>As I mentioned earlier, a schedule is just a function that assigns a timestamp to every iteration point.</p><p>The default schedule for</p><p><code>for i: for j:</code> is <code>&#952;(i,j) = (i, j)</code></p><p>meaning sort first by &#8220;i&#8221;, then by &#8220;j&#8221;.</p><p>Now imagine you are allowed to change that function. You can try :</p><ul><li><p><code>&#952;(i,j) = (j, i)</code> &#8594; loop interchange</p></li><li><p><code>&#952;(i,j) = (i/B, j/B, i, j)</code> &#8594; tiling with tile size B (integer division)</p></li><li><p><code>&#952;(i,j) = (i+j, j)</code> &#8594; skewing</p></li></ul><p>These schedules are all different, but each one visits the same set of points, just in a different order. The compiler is searching through this huge space of possibilities and asking : <strong>which one runs faster on this hardware?</strong></p><p>A good schedule must satisfy two things simultaneously :</p><ol><li><p>Legality : Every dependence has to be respected. If iteration &#8220;p&#8221; must run before iteration &#8220;q&#8221;, then the schedule must assign &#8220;p&#8221; a timestamp that comes before &#8220;q&#8221;. 
In short the schedule must keep all dependences pointing &#8220;Forward&#8221;.</p></li><li><p>Optimality : Among all the legal schedules, pick the one that minimizes execution time. On a modern CPU this mostly means: minimize how often you fetch data from RAM, and maximize how much work you do with data while it&#8217;s still in cache. On a multi-core machine it also means: maximize the number of iterations that can run in parallel.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YJ3O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa48d187c-2417-40c4-8656-6fcc587cc38f_1420x766.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YJ3O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa48d187c-2417-40c4-8656-6fcc587cc38f_1420x766.png 424w, https://substackcdn.com/image/fetch/$s_!YJ3O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa48d187c-2417-40c4-8656-6fcc587cc38f_1420x766.png 848w, https://substackcdn.com/image/fetch/$s_!YJ3O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa48d187c-2417-40c4-8656-6fcc587cc38f_1420x766.png 1272w, https://substackcdn.com/image/fetch/$s_!YJ3O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa48d187c-2417-40c4-8656-6fcc587cc38f_1420x766.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!YJ3O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa48d187c-2417-40c4-8656-6fcc587cc38f_1420x766.png" width="1420" height="766" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a48d187c-2417-40c4-8656-6fcc587cc38f_1420x766.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:766,&quot;width&quot;:1420,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:154525,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/199097747?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa48d187c-2417-40c4-8656-6fcc587cc38f_1420x766.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YJ3O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa48d187c-2417-40c4-8656-6fcc587cc38f_1420x766.png 424w, https://substackcdn.com/image/fetch/$s_!YJ3O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa48d187c-2417-40c4-8656-6fcc587cc38f_1420x766.png 848w, https://substackcdn.com/image/fetch/$s_!YJ3O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa48d187c-2417-40c4-8656-6fcc587cc38f_1420x766.png 1272w, https://substackcdn.com/image/fetch/$s_!YJ3O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa48d187c-2417-40c4-8656-6fcc587cc38f_1420x766.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The compiler cannot try all of these options one by one, because the space is not just large: with arbitrary integer coefficients it is infinite. So we need a smarter way to search it, which is exactly what <strong>Pluto</strong> provides.</p><h2>The Pluto Algorithm</h2><p>Pluto&#8217;s central idea is clever and straightforward. 
Instead of saying &#8220;let me try all the different loop transformations and see which one is fastest&#8221;, it says : &#8220;let me directly search for a schedule that maximizes data reuse, subject to legality constraints, and let the loop transformations fall out as a consequence.&#8221;</p><h4>What Pluto is actually searching for</h4><p>Pluto is looking for a schedule of the form <code>&#952;(i,j) = c&#8321;&#183;i + c&#8322;&#183;j + c&#8320;</code>, an affine function of the loop variables. The unknowns are the coefficients <code>c&#8321;, c&#8322;, c&#8320;</code>.</p><p>Finding the right coefficients means finding the right schedule.</p><p>For example:</p><ul><li><p>If <code>c&#8321;=1, c&#8322;=0</code> &#8594; sort by <code>i</code> first &#8594; original order</p></li><li><p>If <code>c&#8321;=0, c&#8322;=1</code> &#8594; sort by <code>j</code> first &#8594; interchange</p></li><li><p>If <code>c&#8321;=1, c&#8322;=1</code> &#8594; sort by <code>i+j</code> &#8594; skewing</p></li></ul><p>The transformation comes from these coefficients. Pluto doesn&#8217;t have a list of &#8220;try interchange, try tiling or try skewing&#8221;. It just solves for the coefficients, and whatever the solution is, that is the transformation.</p><p>Let&#8217;s take a short detour to understand why this is worth doing.</p><p>As I mentioned at the beginning of this blog, today&#8217;s CPUs are fast but memory is slow. What I mean by that is, whenever a program runs, it needs to fetch data. 
There are a couple of places it can fetch the data from: RAM or cache.</p><p>Whenever a program needs data, the CPU first checks whether that data is already present in the cache.</p><ul><li><p>If the data is already there &#8594; it is retrieved very fast (<strong>cache hit</strong>)</p></li><li><p>If not &#8594; it has to go all the way to RAM &#8594; which is slow (<strong>cache miss</strong>)</p></li></ul><p>So the biggest goal in loop optimization is:</p><blockquote><p>&#8220;Use data again before it gets kicked out of the cache.&#8221;</p></blockquote><p>This is what Pluto tries to optimize.</p><p>Let us look at an example of what &#8220;Use data again before it gets kicked out of the cache&#8221; means.</p><p>Example 1 : Consider this simple loop</p><pre><code><code>for (i = 0; i &lt; 3; i++) {
    for (j = 0; j &lt; 3; j++) {
        use(A[i]);
    }
}
</code></code></pre><p>For i = 0, notice that</p><pre><code><code>use(A[0])
use(A[0])
use(A[0])
</code></code></pre><p>The same value <code>A[0]</code> is reused several times in a row.</p><p>That is good for the cache because once <code>A[0]</code> is loaded, it stays in cache while the following iterations keep using it, so no time is spent fetching it again.</p><p>Example 2 :</p><pre><code><code>for (j = 0; j &lt; 3; j++) {
    for (i = 0; i &lt; 3; i++) {
        use(A[i]);
    }
}
</code></code></pre><p>Now the accesses become:</p><pre><code><code>A[0], A[1], A[2],
A[0], A[1], A[2],
A[0], A[1], A[2]
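</code></code></pre><p>One way to put a number on this is the reuse distance: how many iterations pass between two consecutive uses of the same element. The sketch below measures the worst case for both loop orders. <code>max_reuse_distance</code> is a hypothetical helper, and counting iterations is only a rough stand-in for real cache behaviour.</p>

```c
#include <assert.h>

/* Largest gap, counted in loop iterations, between two consecutive uses
 * of the same element A[x] when an n-by-n loop nest executes use(A[i]).
 *   i_outer != 0:  for i { for j { use(A[i]); } }   (Example 1)
 *   i_outer == 0:  for j { for i { use(A[i]); } }   (Example 2)
 * Illustrative helper only; n is limited to 64 to keep the table small. */
int max_reuse_distance(int n, int i_outer)
{
    int last_use[64];
    int step = 0, worst = 0;
    assert(n > 0 && n <= 64);
    for (int x = 0; x < n; x++)
        last_use[x] = -1;                      /* -1 means "not used yet" */

    for (int outer = 0; outer < n; outer++) {
        for (int inner = 0; inner < n; inner++) {
            int i = i_outer ? outer : inner;   /* index actually used in A[i] */
            if (last_use[i] >= 0 && step - last_use[i] > worst)
                worst = step - last_use[i];
            last_use[i] = step;
            step++;
        }
    }
    return worst;
}
```

<p>For the 3x3 loops above:</p><pre><code><code>max_reuse_distance(3, 1)   /* Example 1 (i outer): 1 */
max_reuse_distance(3, 0)   /* Example 2 (j outer): 3 */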
</code></code></pre><p>As we can see, between two uses of <code>A[0]</code> the loop touches other values. By the time we come back to <code>A[0]</code>, the cache may already have evicted it, and we would have to go to RAM again. That is worse locality.</p><h4>What Pluto wants :</h4><p>Pluto basically asks &#8220;Can I reorder iterations so reused data is accessed closer together in time?&#8221;</p><p>Because :</p><ul><li><p>close together in time = likely still in cache</p></li><li><p>far apart in time = probably evicted</p></li></ul><p>So Pluto tries to make each reuse happen as soon as possible after the previous use.</p><h4><strong>The Cost Function (the actual optimization target) :</strong></h4><p>The cost function answers the question &#8220;what number are we trying to minimize or maximize?&#8221;</p><p>For Pluto it is : <strong>Cost = total dependence distance</strong></p><p>Smaller cost means:</p><ul><li><p>producer and consumer iterations execute closer together</p></li><li><p>reused data stays in cache</p></li><li><p>fewer cache misses</p></li><li><p>all of which leads to a faster program</p></li></ul><p>So Pluto searches for loop schedules that make this number as small as possible.</p><h2>How the Compiler Actually Solves It</h2><p>Now that we understand why Pluto wants better schedules, let&#8217;s try to understand how it actually computes them mathematically using <strong>ISL (Integer Set Library)</strong>.</p><p>Think of <strong>Pluto</strong> as an optimization strategy and <strong>ISL</strong> as the mathematical engine underneath all of this.</p><p>When Pluto decides &#8220;I want a schedule that minimizes reuse distance&#8221;, ISL is the thing that actually manipulates the equations, solves the constraints, and computes the schedule.</p><p>Let us tie all of this together with an example :</p><h3>1. Represent Loops Mathematically</h3><pre><code><code>for (i = 0; i &lt; N; i++)
    for (j = 0; j &lt; N; j++)
        S(i,j);
</code></code></pre><p>As we know, Pluto converts this loop into an iteration space in which each iteration becomes a point (i,j). So the entire loop becomes a 2D grid of points.</p><p>Let&#8217;s take N = 3 :</p><pre><code><code>(0,0) (0,1) (0,2)
(1,0) (1,1) (1,2)
(2,0) (2,1) (2,2)
</code></code></pre><p>ISL represents this as a polyhedron :</p><p>0 &#8804; i &lt; N, 0 &#8804; j &lt; N</p><p>This is the mathematical representation of the loop.</p><h3>2. Dependencies Become Constraints</h3><p>Now suppose iteration (i,j) produces a value used by (i, j+1). That means (i,j) has to be executed before (i,j+1), otherwise the program becomes invalid. Pluto encodes this as the legality constraint.</p><h3>3. Pluto Introduces An Unknown Schedule</h3><p>As we discussed earlier, Pluto doesn&#8217;t enumerate all the legal loop transformations. Instead it solves for the coefficients of a schedule.</p><p>Pluto assumes an unknown affine schedule:</p><p>&#952;(i,j) = c&#8321;&#183;i + c&#8322;&#183;j + c&#8320;</p><p>The coefficients :</p><pre><code><code>c&#8321;, c&#8322;, c&#8320;
</code></code></pre><p>are unknown. Finding these coefficients means finding the execution order.</p><h3>4. Converting Correctness Into Equations</h3><p>We already know that the <strong>Producer must be executed before the Consumer</strong>.</p><p>So Pluto writes :</p><p>&#952;(i, j+1) &#8722; &#952;(i, j) &#8805; 1</p><p>Now substitute the schedule equation. After substitution we get :</p><p>(c&#8321;&#183;i + c&#8322;&#183;(j+1) + c&#8320;) &#8722; (c&#8321;&#183;i + c&#8322;&#183;j + c&#8320;) &#8805; 1</p><p>Everything cancels except :</p><p>c&#8322; &#8805; 1</p><p>This is really important because the compiler just transformed program correctness into a linear inequality, and ISL is really good at solving systems of inequalities like this.</p><h3>5. Locality Becomes The Optimization Objective</h3><p>At this point, Pluto has ensured that the schedule is correct, but that alone is not enough. Many schedules satisfy the legality constraints. Now Pluto asks &#8220;Among all legal schedules, which one gives the best cache locality?&#8221;</p><p>This becomes the actual optimization objective.</p><p>Consider the slightly modified loop :</p><pre><code><code>for (i = 0; i &lt; N; i++)
    for (j = 0; j &lt; N; j++)
        use(A[i]);
</code></code></pre><p>Here you can notice the accessed value only depends on i and not j.</p><p>That means :</p><pre><code><code>(i=0,j=0) &#8594; A[0]
(i=0,j=1) &#8594; A[0]
(i=0,j=2) &#8594; A[0]
</code></code></pre><p>The same value is reused as <code>j</code> changes.</p><p>So reuse happens when : j changes while i stays fixed.</p><p>Because of this, Pluto wants iterations that differ only in &#8220;j&#8221; to execute close together in time, because accesses that are close in time are likely to still be in cache.</p><p>Now consider the schedule again :</p><p>&#952;(i, j) = c&#8321;&#183;i + c&#8322;&#183;j</p><p>The coefficient <strong>c&#8322;</strong> controls how quickly the timestamp changes as &#8220;j&#8221; changes. If <strong>c&#8322;</strong> is very large, reused accesses become far apart in schedule time.</p><p>For example:</p><p>&#952;(i, j) = i + 100&#183;j</p><p>gives:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QOFi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3af3cc5-b898-4ea1-b810-0b4d0edaebc6_918x752.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QOFi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3af3cc5-b898-4ea1-b810-0b4d0edaebc6_918x752.png 424w, https://substackcdn.com/image/fetch/$s_!QOFi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3af3cc5-b898-4ea1-b810-0b4d0edaebc6_918x752.png 848w, https://substackcdn.com/image/fetch/$s_!QOFi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3af3cc5-b898-4ea1-b810-0b4d0edaebc6_918x752.png 1272w, https://substackcdn.com/image/fetch/$s_!QOFi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3af3cc5-b898-4ea1-b810-0b4d0edaebc6_918x752.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QOFi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3af3cc5-b898-4ea1-b810-0b4d0edaebc6_918x752.png" width="918" height="752" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e3af3cc5-b898-4ea1-b810-0b4d0edaebc6_918x752.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:752,&quot;width&quot;:918,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37198,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/199097747?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3af3cc5-b898-4ea1-b810-0b4d0edaebc6_918x752.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QOFi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3af3cc5-b898-4ea1-b810-0b4d0edaebc6_918x752.png 424w, https://substackcdn.com/image/fetch/$s_!QOFi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3af3cc5-b898-4ea1-b810-0b4d0edaebc6_918x752.png 848w, https://substackcdn.com/image/fetch/$s_!QOFi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3af3cc5-b898-4ea1-b810-0b4d0edaebc6_918x752.png 1272w, https://substackcdn.com/image/fetch/$s_!QOFi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3af3cc5-b898-4ea1-b810-0b4d0edaebc6_918x752.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>Even though all these iterations reuse A[0], they execute very far apart in schedule time. That increases the chance that A[0] gets evicted from cache before we get to reuse it.</p><p>So Pluto prefers smaller coefficients for dimensions where reuse occurs frequently. This intuition becomes the cost function (this is a simplified intuition for understanding locality optimization, not Pluto&#8217;s exact ILP objective) :</p><p><strong>Cost = c&#8321; + c&#8322;</strong></p><p>Pluto now asks ISL to minimize the cost while still satisfying all legality constraints.</p><h3>6. 
ISL Solves The Integer Linear Program (ILP)</h3><p>At this point, Pluto has transformed the scheduling problem into pure mathematics.</p><p>It now has:</p><h4>Constraints (correctness)</h4><p>c&#8322; &#8805; 1 (together with c&#8321;, c&#8322; &#8805; 0, since otherwise the minimum below would be unbounded)</p><h4>Objective (locality)</h4><p>min (c&#8321; + c&#8322;)</p><p>This is a standard Integer Linear Program (ILP). ISL solves it and returns the best legal coefficients.</p><h3>7. The schedule becomes a loop transformation</h3><p>Suppose ISL returns :</p><p>c&#8321; = 0</p><p>c&#8322; = 1</p><p>Then the schedule becomes : &#952;(i, j) = j</p><p>This means sort the iterations primarily by j, which corresponds to loop interchange.</p><p>For a different loop with a diagonal dependence, ISL might return <code>c&#8321;=1, c&#8322;=1</code> giving:</p><p>&#952;(i, j) = i + j</p><p>Execution then proceeds diagonally, which corresponds to loop skewing.</p><p>The important thing to know here is that Pluto never explicitly says :</p><p>&#8220;Perform interchange.&#8221;</p><p>&#8220;Perform skewing.&#8221;</p><p>&#8220;Perform tiling.&#8221;</p><p>It only tries to find the schedule coefficients that minimize reuse distance while preserving correctness.</p><p>Now Pluto has found the best loop order, but for large matrices that alone is not enough. Even the best ordering still causes cache misses. To solve this problem, let us dive into tiling.</p><h2>What Problem Does Tiling Solve</h2><p>Let us take naive matmul as an example and see why it destroys our cache :</p><pre><code><code>for (i = 0; i &lt; N; i++)
    for (j = 0; j &lt; N; j++)
        for (k = 0; k &lt; N; k++)
            C[i][j] += A[i][k] * B[k][j];
</code></code></pre><p>To understand the cache problem we need to know how arrays are stored in memory. They are stored in a <strong>row-major order.</strong> A 2D array A[row][col] is stored as one long line in memory. First all row 0, then all of row 1, and so on.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fgz4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c35754e-7cd4-45bf-823d-8b91eeb825cf_1496x290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fgz4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c35754e-7cd4-45bf-823d-8b91eeb825cf_1496x290.png 424w, https://substackcdn.com/image/fetch/$s_!fgz4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c35754e-7cd4-45bf-823d-8b91eeb825cf_1496x290.png 848w, https://substackcdn.com/image/fetch/$s_!fgz4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c35754e-7cd4-45bf-823d-8b91eeb825cf_1496x290.png 1272w, https://substackcdn.com/image/fetch/$s_!fgz4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c35754e-7cd4-45bf-823d-8b91eeb825cf_1496x290.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fgz4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c35754e-7cd4-45bf-823d-8b91eeb825cf_1496x290.png" width="1456" height="282" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c35754e-7cd4-45bf-823d-8b91eeb825cf_1496x290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:282,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:65949,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/199097747?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c35754e-7cd4-45bf-823d-8b91eeb825cf_1496x290.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fgz4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c35754e-7cd4-45bf-823d-8b91eeb825cf_1496x290.png 424w, https://substackcdn.com/image/fetch/$s_!fgz4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c35754e-7cd4-45bf-823d-8b91eeb825cf_1496x290.png 848w, https://substackcdn.com/image/fetch/$s_!fgz4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c35754e-7cd4-45bf-823d-8b91eeb825cf_1496x290.png 1272w, https://substackcdn.com/image/fetch/$s_!fgz4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c35754e-7cd4-45bf-823d-8b91eeb825cf_1496x290.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The important thing to understand is that the problem is NOT the multiplication itself, the problem is HOW memory is accessed.</p><p>We now know that the actual memory layout is like this 
:</p><pre><code><code>[B00 B01 B02 B03 B10 B11 B12 B13 B20 B21 B22 B23]
</code></code></pre><p>Suppose we access :</p><pre><code><code>B[0][0]
B[0][1]
B[0][2]
B[0][3]
</code></code></pre><p>These are adjacent in memory, and the CPU cache loves this. The reason is that the cache loads memory in chunks called <strong>cache lines</strong>, usually 64 bytes at once.</p><p>So when the CPU loads B[0][0], it also fetches the neighbouring values for free :</p><pre><code><code>B[0][1]
B[0][2]
B[0][3]
</code></code></pre><p>This is called <strong>Spatial Locality</strong>: memory near a recent access gets used soon afterwards, which is very efficient.</p><h3>Now Let&#8217;s Understand Why Columns Are Terrible</h3><p>Let&#8217;s say we access :</p><pre><code><code>B[0][1]
B[1][1]
B[2][1]
</code></code></pre><p>Notice that we are walking down a column. But remember our memory layout :</p><pre><code><code>[B00 B01 B02 B03 B10 B11 B12 B13 B20 B21 B22 B23]
</code></code></pre><p>If we look carefully, we find that</p><pre><code><code>B[0][1] -&gt; position 1
B[1][1] -&gt; position 5
B[2][1] -&gt; position 9
</code></code></pre><p>These are far apart in memory: each step down the column skips a whole row.</p><p>This becomes a huge problem for realistic sizes. Say the matrix size is :</p><p>N = 1024 and each float occupies 4 bytes. Then B[0][1] and B[1][1] are</p><pre><code><code>1024 x 4 = 4096 bytes apart
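</code></code></pre><p>The same arithmetic as a small C sketch. The helper names are made up for illustration. The only assumption is a row-major matrix with <code>cols</code> elements per row.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Position, counted in elements, of M[row][col] in a row-major matrix
 * that has `cols` columns. */
size_t row_major_index(size_t row, size_t col, size_t cols)
{
    assert(col < cols);
    return row * cols + col;
}

/* Distance in bytes between M[row][col] and M[row + 1][col]:
 * one full row of `cols` elements of `elem_size` bytes each. */
size_t column_stride_bytes(size_t cols, size_t elem_size)
{
    return cols * elem_size;
}
```

<p><code>row_major_index</code> reproduces the positions 1, 5 and 9 for the 4-column layout shown earlier. For the 1024-column float matrix:</p><pre><code><code>column_stride_bytes(1024, 4)   /* 4096 bytes = 64 cache lines of 64 bytes */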
</code></code></pre><p>But a cache line is only 64 bytes, so every access lands far outside the current cache line. Every access therefore touches a new cache line, which may have to come from a slower cache level or from RAM, and that is extremely inefficient.</p><p>Now coming back to our original example of matrix multiplication :</p><pre><code><code>for (i = 0; i &lt; N; i++)
    for (j = 0; j &lt; N; j++)
        for (k = 0; k &lt; N; k++)
            C[i][j] += A[i][k] * B[k][j];
</code></code></pre><p>If you look carefully, <strong>B[k][j]</strong> sits inside the innermost loop <strong>for(k)</strong>, and k is its row index. So as k advances we are going to access :</p><pre><code><code>B[0][j]
B[1][j]
B[2][j]
</code></code></pre><p>That means walking down a column, which we now know is cache-unfriendly.</p><p>Pluto&#8217;s loop interchange definitely helps here. It may discover :</p><pre><code><code>for(k)
    for(i)
        for(j)
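</code></code></pre><p>Written out in full, the interchanged loop nest might look like the sketch below. It assumes square n-by-n matrices stored as flat row-major arrays of <code>double</code>. The function name <code>matmul_kij</code> is made up for illustration, and this is a hand-written version, not actual Pluto output.</p>

```c
#include <assert.h>

/* C += A * B on n-by-n row-major matrices stored as flat arrays, using
 * the k, i, j loop order. For fixed k and i, the innermost loop walks
 * B[k][0..n-1] and C[i][0..n-1] from left to right, so both are accessed
 * along rows. */
void matmul_kij(int n, const double *A, const double *B, double *C)
{
    assert(n > 0);
    for (int k = 0; k < n; k++) {
        for (int i = 0; i < n; i++) {
            double a_ik = A[i * n + k];   /* does not change inside the j loop */
            for (int j = 0; j < n; j++)
                C[i * n + j] += a_ik * B[k * n + j];
        }
    }
}
```

<p>In the 2D notation used above, its innermost statement is still:</p><pre><code><code>C[i][j] += A[i][k] * B[k][j];   /* but now j moves fastest */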
</code></code></pre><p>Now for fixed k and i, the innermost j loop touches :</p><pre><code><code>B[k][0]
B[k][1]
B[k][2]
</code></code></pre><p>These are accessed sequentially, which means walking across a row instead of down a column. That is a HUGE improvement.</p><p>But even with a perfect loop order, A, B, and C are still HUGE. Entire matrices cannot fit in cache, so eventually cache eviction still happens and we still need to reload data repeatedly. That is where tiling comes into play.</p><h2>The Gap That Tiling Fills</h2><p>Tiling says &#8220;Don&#8217;t work on the whole matrix at once.&#8221; Instead, work on small blocks that fit in cache. For example, instead of processing a whole 1024 x 1024 matrix, process one 64 x 64 block at a time.</p><h3>What the tiled loop looks like</h3><p>The naive loop:</p><pre><code><code>for (i = 0; i &lt; N; i++)
  for (j = 0; j &lt; N; j++)
    for (k = 0; k &lt; N; k++)
      C[i][j] += A[i][k] * B[k][j];
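</code></code></pre><p>The tiled loop with tile size B (in the loop bounds, B is the tile size, not the matrix B; real code would need a distinct name for it):</p><pre><code><code>for (ii = 0; ii &lt; N; ii += B)       // step through row-tiles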
  for (jj = 0; jj &lt; N; jj += B)     // step through col-tiles
    for (kk = 0; kk &lt; N; kk += B)   // step through depth-tiles
      for (i = ii; i &lt; ii+B; i++)   // work inside tile
        for (j = jj; j &lt; jj+B; j++)
          for (k = kk; k &lt; kk+B; k++)
            C[i][j] += A[i][k] * B[k][j];
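</code></code></pre><p>A compilable version of the same idea could look like the sketch below. The tile size is called <code>tile</code> here so it cannot be confused with the matrix B. The function name <code>matmul_tiled</code> and the flat row-major layout are assumptions, and the code only handles the case where n is a multiple of the tile size.</p><pre><code><code>#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;

/* Hypothetical helper: tiled C += A * B for n x n row-major matrices.
   Assumes n is a multiple of tile. */
void matmul_tiled(size_t n, size_t tile,
                  const float *A, const float *B, float *C) {
    assert(tile &gt; 0);
    assert(n % tile == 0);
    for (size_t ii = 0; ii &lt; n; ii += tile)          /* row-tiles */
        for (size_t jj = 0; jj &lt; n; jj += tile)      /* col-tiles */
            for (size_t kk = 0; kk &lt; n; kk += tile)  /* depth-tiles */
                for (size_t i = ii; i &lt; ii + tile; i++)
                    for (size_t j = jj; j &lt; jj + tile; j++)
                        for (size_t k = kk; k &lt; kk + tile; k++)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
}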
</code></code></pre><p>The outer three loops (<code>ii</code>, <code>jj</code>, <code>kk</code>) move between tiles. The inner three loops (<code>i</code>, <code>j</code>, <code>k</code>) work inside one tile. The computation is identical: same multiplications, same additions, same result. But the memory access pattern is completely different. (As written, the inner bounds assume N is a multiple of the tile size.)</p><p>Let&#8217;s try to understand this with a small example :</p><h3>1. Take a tiny matrix</h3><p>Suppose N = 4, so the matrices are 4x4. We&#8217;ll use tile size B = 2, so each matrix gets divided into four 2x2 blocks.</p><pre><code><code>+----+----+
| T1 | T2 |
+----+----+
| T3 | T4 |
+----+----+
</code></code></pre><p>Each tile is a small square.</p><h3>2. What We Do In Normal Multiplication</h3><p>To compute C[0][0] we load :</p><pre><code><code>A[0][0] A[0][1] A[0][2] A[0][3]
</code></code></pre><p>and</p><pre><code><code>B[0][0] B[1][0] B[2][0] B[3][0]
</code></code></pre><p>Later, to compute C[0][1], we AGAIN need the same row of A. But by the time those values are needed again, the cache may have already evicted them. So we reload them, which is wasted work.</p><h3>3. What Tiling Does</h3><p>Instead of computing just C[0][0] we compute :</p><pre><code><code>C[0][0] C[0][1]
C[1][0] C[1][1]
</code></code></pre><p>All together. That 2x2 square is one tile.</p><h3>4. Now Visualize The Reuse</h3><p>To compute the output tile :</p><pre><code><code>+---------+
| C00 C01 |
| C10 C11 |
+---------+
</code></code></pre><p>we load one small tile from A and one from B.</p><p>A tile :</p><pre><code><code>A00 A01
A10 A11
</code></code></pre><p>B tile :</p><pre><code><code>B00 B01
B10 B11
</code></code></pre><p>Once A00 is loaded into cache it gets reused for BOTH C00 and C01. Similarly B00 gets reused for BOTH C00 and C10.</p><p>Tiling is one of those concepts that becomes instantly obvious the moment you see it animated. If the memory access pattern isn&#8217;t clicking yet, searching for &#8220;cache tiling visualization&#8221; or &#8220;loop blocking animation&#8221; on YouTube will make it snap into place in seconds.</p><p>As for where tiling comes from in the compiler, it is not an unrelated step bolted on after Pluto. The ILP that Pluto solves looks for schedule dimensions (hyperplanes) along which every dependence distance is non-negative and as small as possible. Consecutive dimensions with that property form a permutable band, and a permutable band is exactly what can be tiled legally. Pluto then tiles those bands, splitting each loop into a tile-index dimension and an intra-tile dimension. The ILP doesn&#8217;t know the word &#8220;tiling&#8221;; it just minimizes dependence distances, and short dependence distances are what make the resulting tiles good for reuse when the full matrix doesn&#8217;t fit in cache.</p><h2>Scanning The Polyhedron</h2><p>After Pluto runs, the compiler has a transformed polyhedron, a mathematical object described as a system of inequalities. The question this section answers is how that object becomes code again.</p><p>You can think of these inequalities as describing a geometric region containing all valid loop iterations.</p><p>The inequalities can be something like :</p><pre><code><code>0 &#8804; ii &lt; N, step B
0 &#8804; jj &lt; N, step B  
ii &#8804; i &lt; ii+B
jj &#8804; j &lt; jj+B
i &lt; N
j &lt; N
</code></code></pre><p>This is obviously not code, and a CPU cannot execute a system of inequalities. Someone has to look at this math object and write out the corresponding <code>for</code> loops with correct bounds. That process of turning a polyhedron back into loop code is called <strong>scanning the polyhedron</strong>. That job is done by CLooG or the ISL code generator.</p><h3>How do they work ?</h3><p>Now the question is:</p><blockquote><p>&#8220;How does ISL/CLooG look at inequalities and actually generate nested for loops?&#8221;</p></blockquote><p>The answer is simple :</p><p>They repeatedly ask: &#8220;For this dimension, what are the valid integer values?&#8221; That is the core idea of the algorithm.</p><h3>Let&#8217;s start with the simplest inequalities</h3><p>Suppose ISL receives : 0 &#8804; i &lt; 4</p><p>This describes the valid values of i, so the compiler asks &#8220;What values can i take ?&#8221;</p><p>And the answer is : 0, 1, 2, 3</p><p>So the code generator emits : for(i = 0; i &lt; 4; i++)</p><p>That&#8217;s it.</p><h3>Now Comes Tiling</h3><p>Suppose the inequalities are :</p><p>0 &#8804; ii &lt; N</p><p>ii &#8804; i &lt; ii + B</p><p>i &lt; N</p><p>Now the compiler must generate loops for this system of inequalities.</p><p>So the compiler asks : &#8220;What values can ii take?&#8221;</p><p>Answer : 0 &#8804; ii &lt; N with step size B.</p><p>So it emits :</p><pre><code><code>for(ii = 0; ii &lt; N; ii += B)
</code></code></pre><p>Now the compiler asks : &#8220;For this <strong>ii,</strong> what values can <strong>i</strong> take?&#8221; The compiler sees TWO upper bounds :</p><p>Constraint 1 : i &lt; ii + B (tile boundary)</p><p>Constraint 2 : i &lt; N (matrix boundary)</p><p>To satisfy BOTH inequalities, i has to stay below both limits, so the effective upper bound is the <strong>smaller of the two.</strong></p><p>Which becomes : i &lt; min(ii + B, N)</p><p>The lower bound comes from : ii &#8804; i</p><p>So the generated loop becomes :</p><pre><code><code>for(i = ii; i &lt; min(ii+B, N); i++)
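</code></code></pre><p>To see the min() bound at work, here is a small sketch that scans a one-dimensional tiled domain and records every i it visits. The names <code>scan_tiled_1d</code> and <code>min_size</code> and the output buffer are illustrative assumptions; a real code generator emits the loops directly rather than calling a helper like this. With n = 10 and a tile size of 4, the last tile starts at 8 and stops at 10 instead of running on to 12.</p><pre><code><code>#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;

static size_t min_size(size_t a, size_t b) { return a &lt; b ? a : b; }

/* Scans 0 &lt;= i &lt; n in tiles of size `tile`, writing each visited i
   into out[] (which must have room for n entries).
   Returns the number of iterations executed. */
size_t scan_tiled_1d(size_t n, size_t tile, size_t *out) {
    assert(tile &gt; 0);
    size_t count = 0;
    for (size_t ii = 0; ii &lt; n; ii += tile) {
        /* upper bound is the smaller of the tile end and the domain end */
        for (size_t i = ii; i &lt; min_size(ii + tile, n); i++) {
            out[count++] = i;
        }
    }
    return count;
}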
</code></code></pre><p>The min() function matters because the tile wants to continue until ii + B, but the matrix itself ends at N. Most tiles fit perfectly, but when N is not a multiple of B the last tile partially falls outside the matrix.</p><p>So the compiler chooses whichever boundary comes first. That is how min() appears: the compiler sees two competing upper-bound constraints and takes the smaller one.</p><p>So CLooG/ISL are basically geometric interpreters. They look at a geometric region and ask : &#8220;What are the legal integer coordinates?&#8221; and then they systematically turn those coordinates into nested for loops by :</p><ul><li><p>picking one dimension</p></li><li><p>finding its lower bound</p></li><li><p>finding its upper bound</p></li><li><p>emitting a loop</p></li><li><p>moving inward recursively</p></li></ul><p>This is the complete pipeline, from source code to optimized loops. The compiler builds the iteration domain, finds dependences, computes a good schedule via Pluto, and finally scans the resulting polyhedron to emit runnable code. In the next post we&#8217;ll see where all of this lives in modern compilers: MLIR.</p><p></p><h1>Conclusion</h1><p>Polyhedral compilation looks intimidating at first because of all the math, inequalities, schedules, and optimization problems. But underneath all of it, the core idea is surprisingly simple:</p><p>A loop nest can be viewed as a geometric space of computations.</p><p>Once the compiler converts loops into mathematical objects, transformations that normally feel complicated, like loop interchange, skewing, fusion, parallelization, or tiling, become mathematical operations on that space.</p><p>The compiler is no longer &#8220;guessing&#8221; optimizations.
It is proving which transformations are legal and then searching for schedules that improve locality and parallelism.</p><p>That is what makes the polyhedral model so powerful.</p><p>Modern systems like Pluto, ISL, and MLIR use these ideas to optimize scientific computing, machine learning kernels, stencil computations, and high-performance code generation. Even though the theory can become very deep, the entire pipeline fundamentally revolves around one question:</p><p>&#8220;How can we reorder computations without changing the meaning of the program, while making the hardware happier?&#8221;</p><p>And surprisingly, the answer turns out to be geometry.</p><p><em>I wrote this blog as part of my own journey learning polyhedral compilation and compiler optimizations. My goal here was not to give a fully rigorous research-level treatment, but to build an intuition-first understanding of how the entire pipeline fits together.</em></p><p><em>If you notice any mistakes, oversimplifications, or inaccuracies, please feel free to point them out. 
I&#8217;d genuinely appreciate corrections and deeper insights from people more experienced in this space.</em></p>]]></content:encoded></item><item><title><![CDATA[Building a Loop Invariant Code Motion (LICM) Pass in LLVM ]]></title><description><![CDATA[A walkthrough of writing a custom LICM pass using the new pass manager - what it does, how it works, and how to verify it.]]></description><link>https://sajidzubair.substack.com/p/building-a-loop-invariant-code-motion</link><guid isPermaLink="false">https://sajidzubair.substack.com/p/building-a-loop-invariant-code-motion</guid><dc:creator><![CDATA[Sajid Zubair]]></dc:creator><pubDate>Sat, 25 Apr 2026 17:49:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!PMGs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F097d25ef-0144-43fe-be6a-f13099c1b544_1406x1216.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before we go into what Loop Invariant Code Motion (LICM) is and what it solves, we first need to understand the problem that we are facing. 
Lets understand this with an example program.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bcy3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f46d1e-c1fe-4f75-9fbe-367449f143f3_1028x280.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bcy3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f46d1e-c1fe-4f75-9fbe-367449f143f3_1028x280.png 424w, https://substackcdn.com/image/fetch/$s_!Bcy3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f46d1e-c1fe-4f75-9fbe-367449f143f3_1028x280.png 848w, https://substackcdn.com/image/fetch/$s_!Bcy3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f46d1e-c1fe-4f75-9fbe-367449f143f3_1028x280.png 1272w, https://substackcdn.com/image/fetch/$s_!Bcy3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f46d1e-c1fe-4f75-9fbe-367449f143f3_1028x280.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bcy3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f46d1e-c1fe-4f75-9fbe-367449f143f3_1028x280.png" width="1028" height="280" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25f46d1e-c1fe-4f75-9fbe-367449f143f3_1028x280.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:280,&quot;width&quot;:1028,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:44970,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f46d1e-c1fe-4f75-9fbe-367449f143f3_1028x280.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bcy3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f46d1e-c1fe-4f75-9fbe-367449f143f3_1028x280.png 424w, https://substackcdn.com/image/fetch/$s_!Bcy3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f46d1e-c1fe-4f75-9fbe-367449f143f3_1028x280.png 848w, https://substackcdn.com/image/fetch/$s_!Bcy3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f46d1e-c1fe-4f75-9fbe-367449f143f3_1028x280.png 1272w, https://substackcdn.com/image/fetch/$s_!Bcy3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f46d1e-c1fe-4f75-9fbe-367449f143f3_1028x280.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here we can clearly see that the value of &#8220;x&#8221; never changes inside the loop. So x*4 produces the same value for every single iteration. 1,2,3&#8230;..all the way to 1,000,000. We are multiplying the same two numbers a million times and then not using 999,999 of the answers.</p><p>The CPU doesn&#8217;t know this, it just executes the instructions that we give. 
It has no notion of &#8220;I already computed this&#8221;.</p><p></p><p>The fix is obvious once you see it :</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ykK-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62e47a2b-a586-4e94-85ad-094ddbbfb3d4_1430x242.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ykK-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62e47a2b-a586-4e94-85ad-094ddbbfb3d4_1430x242.png 424w, https://substackcdn.com/image/fetch/$s_!ykK-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62e47a2b-a586-4e94-85ad-094ddbbfb3d4_1430x242.png 848w, https://substackcdn.com/image/fetch/$s_!ykK-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62e47a2b-a586-4e94-85ad-094ddbbfb3d4_1430x242.png 1272w, https://substackcdn.com/image/fetch/$s_!ykK-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62e47a2b-a586-4e94-85ad-094ddbbfb3d4_1430x242.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ykK-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62e47a2b-a586-4e94-85ad-094ddbbfb3d4_1430x242.png" width="1430" height="242" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/62e47a2b-a586-4e94-85ad-094ddbbfb3d4_1430x242.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:242,&quot;width&quot;:1430,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51263,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62e47a2b-a586-4e94-85ad-094ddbbfb3d4_1430x242.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ykK-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62e47a2b-a586-4e94-85ad-094ddbbfb3d4_1430x242.png 424w, https://substackcdn.com/image/fetch/$s_!ykK-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62e47a2b-a586-4e94-85ad-094ddbbfb3d4_1430x242.png 848w, https://substackcdn.com/image/fetch/$s_!ykK-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62e47a2b-a586-4e94-85ad-094ddbbfb3d4_1430x242.png 1272w, https://substackcdn.com/image/fetch/$s_!ykK-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62e47a2b-a586-4e94-85ad-094ddbbfb3d4_1430x242.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>Moving the computation of &#8220;t&#8221; outside of the loop. Same result just 999,999 fewer multiplications. For a million iterations of a tight loop this is a meaningful speedup. 
That&#8217;s what this entire project is about: teaching the compiler to do this transformation automatically.</p><h2>What does a compiler actually do ? (the IR Layer)</h2><p>We write C code, but the CPU executes machine code (raw binary instructions). Between these two, compilers have an intermediate layer called IR, short for Intermediate Representation. LLVM&#8217;s IR has a textual form that is stored in <strong>.ll</strong> files.</p><p>IR exists because transformations like LICM are much easier to do on IR than on C source code (too high-level, too many syntactic complications) or on machine code (too low-level, architecture-specific). IR sits in the sweet spot: abstract enough to reason about, concrete enough to optimize.</p><p>If you want to see the IR for the above code yourself, run the command below :</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F-it!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10ce0e7a-3bb2-4462-9c05-27e82257c907_864x116.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F-it!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10ce0e7a-3bb2-4462-9c05-27e82257c907_864x116.png 424w, https://substackcdn.com/image/fetch/$s_!F-it!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10ce0e7a-3bb2-4462-9c05-27e82257c907_864x116.png 848w, https://substackcdn.com/image/fetch/$s_!F-it!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10ce0e7a-3bb2-4462-9c05-27e82257c907_864x116.png 1272w, 
https://substackcdn.com/image/fetch/$s_!F-it!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10ce0e7a-3bb2-4462-9c05-27e82257c907_864x116.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F-it!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10ce0e7a-3bb2-4462-9c05-27e82257c907_864x116.png" width="864" height="116" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10ce0e7a-3bb2-4462-9c05-27e82257c907_864x116.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:116,&quot;width&quot;:864,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20854,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10ce0e7a-3bb2-4462-9c05-27e82257c907_864x116.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F-it!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10ce0e7a-3bb2-4462-9c05-27e82257c907_864x116.png 424w, https://substackcdn.com/image/fetch/$s_!F-it!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10ce0e7a-3bb2-4462-9c05-27e82257c907_864x116.png 848w, https://substackcdn.com/image/fetch/$s_!F-it!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10ce0e7a-3bb2-4462-9c05-27e82257c907_864x116.png 
1272w, https://substackcdn.com/image/fetch/$s_!F-it!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10ce0e7a-3bb2-4462-9c05-27e82257c907_864x116.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><h2>Static Single Assignment (SSA)</h2><p>Before we dive into the details, it&#8217;s important to understand SSA and what it does.</p><p>The rule is simple : every variable is defined exactly once, in exactly one place in the code. New values are represented as new variables.</p><p>In normal C code, this is constantly violated :</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c2P7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243601a7-29ce-43e5-907c-6869cd683b58_1160x176.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c2P7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243601a7-29ce-43e5-907c-6869cd683b58_1160x176.png 424w, https://substackcdn.com/image/fetch/$s_!c2P7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243601a7-29ce-43e5-907c-6869cd683b58_1160x176.png 848w, https://substackcdn.com/image/fetch/$s_!c2P7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243601a7-29ce-43e5-907c-6869cd683b58_1160x176.png 1272w, https://substackcdn.com/image/fetch/$s_!c2P7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243601a7-29ce-43e5-907c-6869cd683b58_1160x176.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!c2P7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243601a7-29ce-43e5-907c-6869cd683b58_1160x176.png" width="1160" height="176" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/243601a7-29ce-43e5-907c-6869cd683b58_1160x176.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:176,&quot;width&quot;:1160,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42082,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243601a7-29ce-43e5-907c-6869cd683b58_1160x176.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c2P7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243601a7-29ce-43e5-907c-6869cd683b58_1160x176.png 424w, https://substackcdn.com/image/fetch/$s_!c2P7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243601a7-29ce-43e5-907c-6869cd683b58_1160x176.png 848w, https://substackcdn.com/image/fetch/$s_!c2P7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243601a7-29ce-43e5-907c-6869cd683b58_1160x176.png 1272w, https://substackcdn.com/image/fetch/$s_!c2P7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243601a7-29ce-43e5-907c-6869cd683b58_1160x176.png 1456w" 
sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In SSA form (which is what LLVM IR uses), this becomes :</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!U790!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11a52f9a-33e4-4325-b8e1-39e13ac86e6f_926x178.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U790!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11a52f9a-33e4-4325-b8e1-39e13ac86e6f_926x178.png 424w, https://substackcdn.com/image/fetch/$s_!U790!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11a52f9a-33e4-4325-b8e1-39e13ac86e6f_926x178.png 848w, https://substackcdn.com/image/fetch/$s_!U790!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11a52f9a-33e4-4325-b8e1-39e13ac86e6f_926x178.png 1272w, https://substackcdn.com/image/fetch/$s_!U790!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11a52f9a-33e4-4325-b8e1-39e13ac86e6f_926x178.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!U790!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11a52f9a-33e4-4325-b8e1-39e13ac86e6f_926x178.png" width="926" height="178" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11a52f9a-33e4-4325-b8e1-39e13ac86e6f_926x178.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:178,&quot;width&quot;:926,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:35604,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11a52f9a-33e4-4325-b8e1-39e13ac86e6f_926x178.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!U790!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11a52f9a-33e4-4325-b8e1-39e13ac86e6f_926x178.png 424w, https://substackcdn.com/image/fetch/$s_!U790!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11a52f9a-33e4-4325-b8e1-39e13ac86e6f_926x178.png 848w, https://substackcdn.com/image/fetch/$s_!U790!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11a52f9a-33e4-4325-b8e1-39e13ac86e6f_926x178.png 1272w, https://substackcdn.com/image/fetch/$s_!U790!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11a52f9a-33e4-4325-b8e1-39e13ac86e6f_926x178.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Every new value gets a new name, and its value are permanently bonded i.e &#8220;%x1&#8221; will always mean 5 not something else. 
This concept is going to be useful in our LICM pass.</p><p></p><h2>Def-Use chain</h2><p>This is another important concept that will help us later. Let us understand it with a proper example.</p><p>Consider the C code below :</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bju8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c638022-e68e-4f4c-8f9a-265056dcfa6a_500x118.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bju8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c638022-e68e-4f4c-8f9a-265056dcfa6a_500x118.png 424w, https://substackcdn.com/image/fetch/$s_!Bju8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c638022-e68e-4f4c-8f9a-265056dcfa6a_500x118.png 848w, https://substackcdn.com/image/fetch/$s_!Bju8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c638022-e68e-4f4c-8f9a-265056dcfa6a_500x118.png 1272w, https://substackcdn.com/image/fetch/$s_!Bju8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c638022-e68e-4f4c-8f9a-265056dcfa6a_500x118.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bju8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c638022-e68e-4f4c-8f9a-265056dcfa6a_500x118.png" width="500" height="118" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c638022-e68e-4f4c-8f9a-265056dcfa6a_500x118.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:118,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13408,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c638022-e68e-4f4c-8f9a-265056dcfa6a_500x118.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bju8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c638022-e68e-4f4c-8f9a-265056dcfa6a_500x118.png 424w, https://substackcdn.com/image/fetch/$s_!Bju8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c638022-e68e-4f4c-8f9a-265056dcfa6a_500x118.png 848w, https://substackcdn.com/image/fetch/$s_!Bju8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c638022-e68e-4f4c-8f9a-265056dcfa6a_500x118.png 1272w, https://substackcdn.com/image/fetch/$s_!Bju8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c638022-e68e-4f4c-8f9a-265056dcfa6a_500x118.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In LLVM IR, this becomes :</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!d-FY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7046b4a-01e6-489c-8061-81297b4367c1_1206x124.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d-FY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7046b4a-01e6-489c-8061-81297b4367c1_1206x124.png 424w, https://substackcdn.com/image/fetch/$s_!d-FY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7046b4a-01e6-489c-8061-81297b4367c1_1206x124.png 848w, https://substackcdn.com/image/fetch/$s_!d-FY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7046b4a-01e6-489c-8061-81297b4367c1_1206x124.png 1272w, https://substackcdn.com/image/fetch/$s_!d-FY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7046b4a-01e6-489c-8061-81297b4367c1_1206x124.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d-FY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7046b4a-01e6-489c-8061-81297b4367c1_1206x124.png" width="1206" height="124" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7046b4a-01e6-489c-8061-81297b4367c1_1206x124.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:124,&quot;width&quot;:1206,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37383,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7046b4a-01e6-489c-8061-81297b4367c1_1206x124.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!d-FY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7046b4a-01e6-489c-8061-81297b4367c1_1206x124.png 424w, https://substackcdn.com/image/fetch/$s_!d-FY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7046b4a-01e6-489c-8061-81297b4367c1_1206x124.png 848w, https://substackcdn.com/image/fetch/$s_!d-FY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7046b4a-01e6-489c-8061-81297b4367c1_1206x124.png 1272w, https://substackcdn.com/image/fetch/$s_!d-FY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7046b4a-01e6-489c-8061-81297b4367c1_1206x124.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Now forget about loops for a second. Just focus on these two instructions and how LLVM remembers them internally.</p><p>In LLVM every instruction is a <strong>Value object</strong>. 
That means %t is a Value, %res is a Value and %x is a Value. Even the constant 4 is a Value.</p><p>Every single Value object in LLVM holds two things :</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5WPU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122328b-aada-47c5-a236-0c0e708fa564_1224x242.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5WPU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122328b-aada-47c5-a236-0c0e708fa564_1224x242.png 424w, https://substackcdn.com/image/fetch/$s_!5WPU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122328b-aada-47c5-a236-0c0e708fa564_1224x242.png 848w, https://substackcdn.com/image/fetch/$s_!5WPU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122328b-aada-47c5-a236-0c0e708fa564_1224x242.png 1272w, https://substackcdn.com/image/fetch/$s_!5WPU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122328b-aada-47c5-a236-0c0e708fa564_1224x242.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5WPU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122328b-aada-47c5-a236-0c0e708fa564_1224x242.png" width="1224" height="242" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6122328b-aada-47c5-a236-0c0e708fa564_1224x242.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:242,&quot;width&quot;:1224,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42228,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122328b-aada-47c5-a236-0c0e708fa564_1224x242.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5WPU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122328b-aada-47c5-a236-0c0e708fa564_1224x242.png 424w, https://substackcdn.com/image/fetch/$s_!5WPU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122328b-aada-47c5-a236-0c0e708fa564_1224x242.png 848w, https://substackcdn.com/image/fetch/$s_!5WPU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122328b-aada-47c5-a236-0c0e708fa564_1224x242.png 1272w, https://substackcdn.com/image/fetch/$s_!5WPU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6122328b-aada-47c5-a236-0c0e708fa564_1224x242.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>This isn&#8217;t something that we have to do, LLVM maintains this automatically for us at all times, for every Value in our IR.</p><p>The first is its definition, the one 
instruction that produced it. For &#8220;%t&#8221; that&#8217;s the <strong>mul</strong> instruction. For &#8220;%x&#8221; that&#8217;s whatever defined it outside the loop, such as an earlier instruction or a function argument.</p><p>The second is its users list: every instruction that consumes this value as an input. For &#8220;%t&#8221; that&#8217;s the <strong>add</strong> instruction that uses it. For &#8220;%x&#8221; that&#8217;s the <strong>mul</strong> instruction.</p><p>These two pieces of information together form what&#8217;s called the <strong>def-use chain</strong>. You can traverse it in either direction.</p><h3>The two directions of traversal</h3><p>Backward direction : From an instruction, to its inputs, to where those inputs were defined. You call <code>I.operands()</code> to walk this direction. You&#8217;re basically asking &#8220;where did my inputs come from?&#8221;</p><p>Forward direction : From a definition, to everything that consumes it. You call <code>I.users()</code> to walk this direction. You&#8217;re asking &#8220;who depends on my result?&#8221;</p><p>Our invariance check walks the backward (use-def) direction: for each instruction in the loop, it looks at the operands and asks where they were defined.</p><p></p><h2>Basic Block in IR</h2><p>IR is not organised as a flat list of instructions. It&#8217;s organised into <strong>basic blocks</strong>. A basic block is a sequence of instructions with one strict rule: execution enters at the top and exits at the bottom. There are no branches into or out of the middle.</p><p>Every basic block ends with exactly one <strong>terminator instruction</strong>, such as a branch (<code>br</code>), a return (<code>ret</code>), or a switch. A branch terminator says &#8220;after this block, go to block X, or go to block Y depending on a condition.&#8221;</p><p>All the basic blocks are connected through terminators. One block&#8217;s terminator points to other blocks as its successors. 
These connections form a graph called the <strong>Control Flow Graph</strong> (CFG).</p><h2>What do loops look like in the CFG ?</h2><p>A loop in IR is just a cycle in the CFG, a path from some block back to a block you&#8217;ve already visited. The specific structure that LLVM recognises is called a natural loop, and it has three parts (the header block, the back edge, and the body blocks) :</p><p><strong>The header block :</strong></p><p>The single entry point of the loop. Every time the loop starts (or restarts after an iteration), execution enters through this one block. This is where the loop condition is typically checked (<code>i &lt; n</code>).</p><p><strong>The back edge :</strong></p><p>An edge in the CFG that goes from a block inside the loop back to the header. This is what makes the loop a cycle. At the end of each iteration, execution follows the back edge to the header to check the condition again.</p><p><strong>The body blocks :</strong></p><p>These are all the blocks that are &#8220;inside&#8221; the loop: the blocks that execution can pass through on its way from the header to the back edge. For a simple loop this might be just one block. 
For a complex loop with if-statements inside, there could be many.</p><p>Visually a simple loop looks like :</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6MMk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfae058e-3212-4239-8472-49eece67033e_1042x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6MMk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfae058e-3212-4239-8472-49eece67033e_1042x592.png 424w, https://substackcdn.com/image/fetch/$s_!6MMk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfae058e-3212-4239-8472-49eece67033e_1042x592.png 848w, https://substackcdn.com/image/fetch/$s_!6MMk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfae058e-3212-4239-8472-49eece67033e_1042x592.png 1272w, https://substackcdn.com/image/fetch/$s_!6MMk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfae058e-3212-4239-8472-49eece67033e_1042x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6MMk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfae058e-3212-4239-8472-49eece67033e_1042x592.png" width="1042" height="592" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfae058e-3212-4239-8472-49eece67033e_1042x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:592,&quot;width&quot;:1042,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72718,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfae058e-3212-4239-8472-49eece67033e_1042x592.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6MMk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfae058e-3212-4239-8472-49eece67033e_1042x592.png 424w, https://substackcdn.com/image/fetch/$s_!6MMk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfae058e-3212-4239-8472-49eece67033e_1042x592.png 848w, https://substackcdn.com/image/fetch/$s_!6MMk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfae058e-3212-4239-8472-49eece67033e_1042x592.png 1272w, https://substackcdn.com/image/fetch/$s_!6MMk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfae058e-3212-4239-8472-49eece67033e_1042x592.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In LLVM, the <strong>LoopInfo</strong> analysis finds all of these structures automatically. 
If you ask &#8220;what loops exist in this function?&#8221;, it hands you <strong>Loop*</strong> objects with methods like <code>L-&gt;getHeader()</code>, <code>L-&gt;getBlocks()</code>, <code>L-&gt;getExitBlocks()</code>, and <code>L-&gt;contains(BB)</code>.</p><p>Now, the compiler turns our loop into IR and breaks it into three separate blocks :</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PMGs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F097d25ef-0144-43fe-be6a-f13099c1b544_1406x1216.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PMGs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F097d25ef-0144-43fe-be6a-f13099c1b544_1406x1216.png 424w, https://substackcdn.com/image/fetch/$s_!PMGs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F097d25ef-0144-43fe-be6a-f13099c1b544_1406x1216.png 848w, https://substackcdn.com/image/fetch/$s_!PMGs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F097d25ef-0144-43fe-be6a-f13099c1b544_1406x1216.png 1272w, https://substackcdn.com/image/fetch/$s_!PMGs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F097d25ef-0144-43fe-be6a-f13099c1b544_1406x1216.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PMGs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F097d25ef-0144-43fe-be6a-f13099c1b544_1406x1216.png" width="1406" height="1216" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/097d25ef-0144-43fe-be6a-f13099c1b544_1406x1216.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1216,&quot;width&quot;:1406,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184396,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F097d25ef-0144-43fe-be6a-f13099c1b544_1406x1216.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PMGs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F097d25ef-0144-43fe-be6a-f13099c1b544_1406x1216.png 424w, https://substackcdn.com/image/fetch/$s_!PMGs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F097d25ef-0144-43fe-be6a-f13099c1b544_1406x1216.png 848w, https://substackcdn.com/image/fetch/$s_!PMGs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F097d25ef-0144-43fe-be6a-f13099c1b544_1406x1216.png 1272w, https://substackcdn.com/image/fetch/$s_!PMGs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F097d25ef-0144-43fe-be6a-f13099c1b544_1406x1216.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Notice that the loop header has two different kinds of predecessors: one edge arrives from outside the loop (from entry), and one arrives from inside it (the back edge from loop.body).</p><p></p><h2>Now the actual problem that our pass needs to solve</h2><p>We have identified the computation <strong>%t = mul i32 %x, 4</strong> as loop-invariant, so we want to move it somewhere it executes only once. Where ?</p><p><strong>Option A : Put it in the entry block.</strong> That might work for this simple case, but the entry block could be doing a hundred other things. More importantly, there could be multiple loops in the function, each needing its own hoisted instructions. Dumping everything into entry will create problems in the long run.</p><p><strong>Option B : Put it at the top of loop.header.</strong> This seems natural. The header is the first thing that runs before the body. 
But if we look at the diagram again, the header runs on <strong>every iteration</strong>, not just once. The back edge from loop.body goes back to the header. So if you put the instruction in the header, it still executes a million times.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r4Qc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd675c53c-5f42-42a9-bca3-2e1ee5686370_1242x1018.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r4Qc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd675c53c-5f42-42a9-bca3-2e1ee5686370_1242x1018.png 424w, https://substackcdn.com/image/fetch/$s_!r4Qc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd675c53c-5f42-42a9-bca3-2e1ee5686370_1242x1018.png 848w, https://substackcdn.com/image/fetch/$s_!r4Qc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd675c53c-5f42-42a9-bca3-2e1ee5686370_1242x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!r4Qc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd675c53c-5f42-42a9-bca3-2e1ee5686370_1242x1018.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r4Qc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd675c53c-5f42-42a9-bca3-2e1ee5686370_1242x1018.png" width="1242" height="1018" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d675c53c-5f42-42a9-bca3-2e1ee5686370_1242x1018.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1018,&quot;width&quot;:1242,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:153349,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd675c53c-5f42-42a9-bca3-2e1ee5686370_1242x1018.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!r4Qc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd675c53c-5f42-42a9-bca3-2e1ee5686370_1242x1018.png 424w, https://substackcdn.com/image/fetch/$s_!r4Qc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd675c53c-5f42-42a9-bca3-2e1ee5686370_1242x1018.png 848w, https://substackcdn.com/image/fetch/$s_!r4Qc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd675c53c-5f42-42a9-bca3-2e1ee5686370_1242x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!r4Qc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd675c53c-5f42-42a9-bca3-2e1ee5686370_1242x1018.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Option C : Create a new dedicated block that sits between entry and the header.</strong> The pre-header is basically a brand new block that you <strong>insert between entry and the header</strong>. The key point is that the back edge still goes to the header and not to this new block. 
So the new block only gets executed once, when entry jumps to it at the very beginning.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!v53r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ddc9f20-20e2-4587-8b66-c22898d0c291_1460x1182.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!v53r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ddc9f20-20e2-4587-8b66-c22898d0c291_1460x1182.png 424w, https://substackcdn.com/image/fetch/$s_!v53r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ddc9f20-20e2-4587-8b66-c22898d0c291_1460x1182.png 848w, https://substackcdn.com/image/fetch/$s_!v53r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ddc9f20-20e2-4587-8b66-c22898d0c291_1460x1182.png 1272w, https://substackcdn.com/image/fetch/$s_!v53r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ddc9f20-20e2-4587-8b66-c22898d0c291_1460x1182.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!v53r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ddc9f20-20e2-4587-8b66-c22898d0c291_1460x1182.png" width="1456" height="1179" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ddc9f20-20e2-4587-8b66-c22898d0c291_1460x1182.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1179,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:191435,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ddc9f20-20e2-4587-8b66-c22898d0c291_1460x1182.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!v53r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ddc9f20-20e2-4587-8b66-c22898d0c291_1460x1182.png 424w, https://substackcdn.com/image/fetch/$s_!v53r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ddc9f20-20e2-4587-8b66-c22898d0c291_1460x1182.png 848w, https://substackcdn.com/image/fetch/$s_!v53r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ddc9f20-20e2-4587-8b66-c22898d0c291_1460x1182.png 1272w, https://substackcdn.com/image/fetch/$s_!v53r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ddc9f20-20e2-4587-8b66-c22898d0c291_1460x1182.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h3>Now look at this in actual IR</h3><p>Before our pass :</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yAeI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489be98e-c747-43ce-8f46-85731f2e67a8_1340x728.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yAeI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489be98e-c747-43ce-8f46-85731f2e67a8_1340x728.png 424w, 
https://substackcdn.com/image/fetch/$s_!yAeI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489be98e-c747-43ce-8f46-85731f2e67a8_1340x728.png 848w, https://substackcdn.com/image/fetch/$s_!yAeI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489be98e-c747-43ce-8f46-85731f2e67a8_1340x728.png 1272w, https://substackcdn.com/image/fetch/$s_!yAeI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489be98e-c747-43ce-8f46-85731f2e67a8_1340x728.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yAeI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489be98e-c747-43ce-8f46-85731f2e67a8_1340x728.png" width="1340" height="728" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/489be98e-c747-43ce-8f46-85731f2e67a8_1340x728.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1340,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:148367,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489be98e-c747-43ce-8f46-85731f2e67a8_1340x728.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!yAeI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489be98e-c747-43ce-8f46-85731f2e67a8_1340x728.png 424w, https://substackcdn.com/image/fetch/$s_!yAeI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489be98e-c747-43ce-8f46-85731f2e67a8_1340x728.png 848w, https://substackcdn.com/image/fetch/$s_!yAeI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489be98e-c747-43ce-8f46-85731f2e67a8_1340x728.png 1272w, https://substackcdn.com/image/fetch/$s_!yAeI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489be98e-c747-43ce-8f46-85731f2e67a8_1340x728.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>After our pass :</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5sVm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dcf6c27-101d-4e24-b2ec-8978a29304b8_1518x862.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5sVm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dcf6c27-101d-4e24-b2ec-8978a29304b8_1518x862.png 424w, https://substackcdn.com/image/fetch/$s_!5sVm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dcf6c27-101d-4e24-b2ec-8978a29304b8_1518x862.png 848w, https://substackcdn.com/image/fetch/$s_!5sVm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dcf6c27-101d-4e24-b2ec-8978a29304b8_1518x862.png 1272w, https://substackcdn.com/image/fetch/$s_!5sVm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dcf6c27-101d-4e24-b2ec-8978a29304b8_1518x862.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5sVm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dcf6c27-101d-4e24-b2ec-8978a29304b8_1518x862.png" width="1456" height="827" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4dcf6c27-101d-4e24-b2ec-8978a29304b8_1518x862.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:827,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:204579,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dcf6c27-101d-4e24-b2ec-8978a29304b8_1518x862.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5sVm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dcf6c27-101d-4e24-b2ec-8978a29304b8_1518x862.png 424w, https://substackcdn.com/image/fetch/$s_!5sVm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dcf6c27-101d-4e24-b2ec-8978a29304b8_1518x862.png 848w, https://substackcdn.com/image/fetch/$s_!5sVm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dcf6c27-101d-4e24-b2ec-8978a29304b8_1518x862.png 1272w, https://substackcdn.com/image/fetch/$s_!5sVm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4dcf6c27-101d-4e24-b2ec-8978a29304b8_1518x862.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A couple of things to notice in the updated IR:</p><ul><li><p><code>%t = mul i32 %x, 4</code> has moved from <code>loop.body</code> to <code>loop.preheader</code>. It now runs only once.</p></li><li><p>The back edge <code>br label %loop.header</code> at the bottom of <code>loop.body</code> is unchanged. It still targets the header, so the back edge skips the preheader entirely, which is exactly what we want: <code>%t</code> is never recomputed.</p></li><li><p><code>%t</code> is still used in <code>loop.body</code> by <code>%res = add i32 %val, %t</code>. This works because the preheader runs before the loop starts, so <code>%t</code> is already defined by the time any iteration uses it. 
The SSA def-use chains remain valid: every use still references <code>%t</code>, and <code>%t</code> is now defined in the preheader, which runs before any block of the loop.</p></li></ul><p></p><h3>Why the two-predecessors problem matters</h3><p>Here&#8217;s the specific thing that makes the preheader necessary. Look at <code>loop.header</code> in the before case. It has two predecessors: <code>entry</code> and <code>loop.body</code>. The PHI node at the top of the header exists because of this:</p><p><code>%i = phi i32 [ 0, %entry ], [ %i.next, %loop.body ]</code></p><p>This PHI node says: if I arrived from <code>%entry</code>, then <code>%i = 0</code>. If I arrived from <code>%loop.body</code>, then <code>%i = %i.next</code>. The header has two predecessors, so values that flow into it need a PHI node to merge them.</p><p>Now imagine you put your hoisted instruction <code>%t = mul i32 %x, 4</code> directly in the header, right after the PHI nodes (LLVM requires PHI nodes to come first in a block). It would still execute every time the header runs: once when arriving from entry, and once more every time the back edge fires. A million iterations means a million multiplications. Nothing gained.</p><p>The preheader solves this by being a block that <strong>only entry can reach</strong>. No back edge. No PHI node is needed. 
It executes once, produces <code>%t</code>, and the loop body reuses that value on every iteration.</p><p></p><p></p><h2>Writing the LICM pass</h2><p>Now that we understand what is happening conceptually, let&#8217;s dive into actually writing the pass.</p><p>Implementation-wise, our LICM pass boils down to <strong>3 phases</strong>:</p><ol><li><p><strong>Detect loops</strong> &#8594; using LoopAnalysis</p></li><li><p><strong>Find invariant instructions</strong> &#8594; <code>isInvariant()</code></p></li><li><p><strong>Check safety + hoist</strong> &#8594; move to preheader</p></li></ol><h3>Let us start with the <strong>&#8220;run&#8221;</strong> function:</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J7mP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6b29b15-7950-4b1b-93f2-0a0d6918494b_1260x902.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J7mP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6b29b15-7950-4b1b-93f2-0a0d6918494b_1260x902.png 424w, https://substackcdn.com/image/fetch/$s_!J7mP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6b29b15-7950-4b1b-93f2-0a0d6918494b_1260x902.png 848w, https://substackcdn.com/image/fetch/$s_!J7mP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6b29b15-7950-4b1b-93f2-0a0d6918494b_1260x902.png 1272w, 
https://substackcdn.com/image/fetch/$s_!J7mP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6b29b15-7950-4b1b-93f2-0a0d6918494b_1260x902.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J7mP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6b29b15-7950-4b1b-93f2-0a0d6918494b_1260x902.png" width="1260" height="902" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d6b29b15-7950-4b1b-93f2-0a0d6918494b_1260x902.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:902,&quot;width&quot;:1260,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:194098,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6b29b15-7950-4b1b-93f2-0a0d6918494b_1260x902.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J7mP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6b29b15-7950-4b1b-93f2-0a0d6918494b_1260x902.png 424w, https://substackcdn.com/image/fetch/$s_!J7mP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6b29b15-7950-4b1b-93f2-0a0d6918494b_1260x902.png 848w, 
https://substackcdn.com/image/fetch/$s_!J7mP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6b29b15-7950-4b1b-93f2-0a0d6918494b_1260x902.png 1272w, https://substackcdn.com/image/fetch/$s_!J7mP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6b29b15-7950-4b1b-93f2-0a0d6918494b_1260x902.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This run() function is essentially your control center of the LICM pass. 
This is the entry point of our pass: LLVM calls it once for every function and expects it to transform the function if needed.</p><p>The <code>run()</code> function takes two arguments:</p><ol><li><p><code>Function &amp;F</code>: the function our pass is working on.</p></li><li><p><code>FunctionAnalysisManager &amp;AM</code>: our handle to analysis results, which LLVM computes on demand and caches. Instead of recomputing things like loops or dominance yourself, you just ask:</p></li></ol><ul><li><p><code>auto &amp;LI = AM.getResult&lt;LoopAnalysis&gt;(F);</code></p></li><li><p><code>auto &amp;DT = AM.getResult&lt;DominatorTreeAnalysis&gt;(F);</code></p></li><li><p><code>auto &amp;AA = AM.getResult&lt;AAManager&gt;(F);</code></p></li></ul><h3><code>LoopAnalysis</code> &#8594; <code>LI</code></h3><ul><li><p>All loops in the function</p></li><li><p>Structure of loops</p></li><li><p>Blocks inside loops</p></li></ul><p>Without this, you can&#8217;t even detect loops.</p><div><hr></div><h3><code>DominatorTreeAnalysis</code> &#8594; <code>DT</code></h3><ul><li><p>Which blocks dominate others</p></li></ul><p>Used for:</p><ul><li><p>Safety check (is the instruction guaranteed to execute?)</p></li></ul><div><hr></div><h3><code>AAManager</code> &#8594; <code>AA</code></h3><ul><li><p>Alias analysis</p></li></ul><p>Used for:</p><ul><li><p>Memory safety (can two pointers refer to the same memory?)</p></li></ul><p>Then we use a variable called <code>changed</code> to track whether we have modified the IR.</p><p><code>LI.getLoopsInPreorder()</code> gives us the list of all loops in preorder, so an outer loop always comes before the loops nested inside it. We need to process inner loops first, so that anything hoisted out of an inner loop can be considered again when we reach the enclosing loop. To do that we simply iterate in reverse using <code>loops.rbegin()</code> and <code>loops.rend()</code>.</p><p>At the end you will see <code>PreservedAnalyses::none()</code> and <code>PreservedAnalyses::all()</code>. 
LLVM analyses such as loop structure, dominance, and alias information are expensive to compute, so LLVM caches the results and reuses them across passes.</p><p>The problem is that when our pass changes the IR, it can invalidate previously computed analyses. So LLVM asks us: &#8220;Did you change anything that might invalidate the analyses?&#8221;</p><p>Case 1: We changed the IR, so we return <code>PreservedAnalyses::none()</code>, which means: we modified the program, so don&#8217;t trust any cached analysis.</p><p>Case 2: Nothing changed, so we return <code>PreservedAnalyses::all()</code>, which means: we didn&#8217;t touch anything, so everything is still valid.</p><p></p><h3>Moving on to the <strong>&#8220;processLoop()&#8221;</strong> function:</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B2ai!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be68bf1-597f-4185-871f-aa41aa10b5e7_1326x1244.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B2ai!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be68bf1-597f-4185-871f-aa41aa10b5e7_1326x1244.png 424w, https://substackcdn.com/image/fetch/$s_!B2ai!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be68bf1-597f-4185-871f-aa41aa10b5e7_1326x1244.png 848w, https://substackcdn.com/image/fetch/$s_!B2ai!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be68bf1-597f-4185-871f-aa41aa10b5e7_1326x1244.png 1272w, 
https://substackcdn.com/image/fetch/$s_!B2ai!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be68bf1-597f-4185-871f-aa41aa10b5e7_1326x1244.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B2ai!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be68bf1-597f-4185-871f-aa41aa10b5e7_1326x1244.png" width="1326" height="1244" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4be68bf1-597f-4185-871f-aa41aa10b5e7_1326x1244.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1244,&quot;width&quot;:1326,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:239542,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be68bf1-597f-4185-871f-aa41aa10b5e7_1326x1244.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B2ai!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be68bf1-597f-4185-871f-aa41aa10b5e7_1326x1244.png 424w, https://substackcdn.com/image/fetch/$s_!B2ai!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be68bf1-597f-4185-871f-aa41aa10b5e7_1326x1244.png 848w, 
https://substackcdn.com/image/fetch/$s_!B2ai!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be68bf1-597f-4185-871f-aa41aa10b5e7_1326x1244.png 1272w, https://substackcdn.com/image/fetch/$s_!B2ai!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be68bf1-597f-4185-871f-aa41aa10b5e7_1326x1244.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Now this is our core optimization engine which takes a single loop, finds invariant + safe instructions, and moves them to the loop 
preheader.</p><p>The arguments we pass to <code>processLoop()</code> are:</p><ul><li><p><code>Loop *L</code> &#8594; the loop to optimize</p></li><li><p><code>DominatorTree &amp;DT</code> &#8594; for safety (execution guarantee)</p></li><li><p><code>LoopInfo &amp;LI</code> &#8594; needed for CFG updates (preheader creation)</p></li><li><p><code>AAResults &amp;AA</code> &#8594; for memory safety</p></li></ul><p>The high-level workflow looks like this:</p><ol><li><p>Get or create the preheader</p></li><li><p>Scan the loop &#8594; find candidates</p></li><li><p>Store the candidates</p></li><li><p>Move them outside the loop</p></li></ol><p>First we get the preheader for the given loop; if it doesn&#8217;t exist, we create it using:</p><pre><code><code>preheader = InsertPreheaderForLoop(L, &amp;DT, &amp;LI, nullptr, false);
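// Hedged sketch, not the author's exact code: this assumes `preheader` is a
// BasicBlock * first read from L-&gt;getLoopPreheader(), and that processLoop()
// returns bool. InsertPreheaderForLoop() can return nullptr when it cannot
// build the block, so stop before hoisting in that case.
if (!preheader)
&#9;return false;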
</code></code></pre><p>Creating the preheader transforms our CFG:</p><p>Before: entry &#8594; loop.header</p><p>After: entry &#8594; preheader &#8594; loop.header</p><p>This is what makes LICM possible: without a preheader, hoisting has no safe destination.</p><p>Then we create storage for the candidates:</p><pre><code><code>std::vector&lt;Instruction *&gt; toHoist;
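// Hedged sketch of the collection scan (assumed shape, not the author's exact
// code; the helper signatures are hypothetical). Candidates are only recorded
// here; nothing is moved while the blocks are being iterated.
for (BasicBlock *BB : L-&gt;blocks())
&#9;for (Instruction &amp;I : *BB)
&#9;&#9;if (isInvariant(&amp;I, L) &amp;&amp; isSafeToHoist(&amp;I, L, DT, AA))
&#9;&#9;&#9;toHoist.push_back(&amp;I);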
</code></code></pre><p>We collect the candidates first because moving an instruction while iterating over its block would invalidate the iterator: never modify the IR while iterating over it.</p><p>Then we iterate over every block in the loop, and over each instruction in each block. Inside those loops we have to check two things:</p><ol><li><p><code>isInvariant()</code>: is the instruction independent of the loop?</p></li><li><p><code>isSafeToHoist()</code>: is it free of side effects and aliasing hazards?</p></li></ol><p>If both checks pass, we push the instruction into the <strong>toHoist</strong> vector that we created for storage.</p><p>Hoisting phase:</p><pre><code><code>for (Instruction *I : toHoist){
&#9;I-&gt;moveBefore(preheader-&gt;getTerminator());
}
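// Hedged sketch: report whether anything moved, assuming processLoop()
// returns bool so that run() can fold the result into its `changed` flag.
return !toHoist.empty();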
</code></code></pre><p>This moves each collected instruction out of the loop body and places it just before the preheader&#8217;s terminator, so it runs once before the loop starts.</p><h3>Let&#8217;s move on to the isInvariant() and isSafeToHoist() functions</h3><p>LICM has two filters:</p><ol><li><p>Invariant? &#8594; Does the value stay the same across iterations?</p></li><li><p>Safe? &#8594; Can we move it without breaking the program?</p></li></ol><p>If the answer to both is yes, we hoist.</p><h4>1. isInvariant():</h4><p>This checks whether the instruction produces the same value on every iteration. In our pass, an instruction is treated as invariant if all of its operands are defined outside the loop.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ULJd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f4f3b1f-91b9-4847-9d0b-3f1f3ae2719e_746x350.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ULJd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f4f3b1f-91b9-4847-9d0b-3f1f3ae2719e_746x350.png 424w, https://substackcdn.com/image/fetch/$s_!ULJd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f4f3b1f-91b9-4847-9d0b-3f1f3ae2719e_746x350.png 848w, https://substackcdn.com/image/fetch/$s_!ULJd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f4f3b1f-91b9-4847-9d0b-3f1f3ae2719e_746x350.png 1272w, https://substackcdn.com/image/fetch/$s_!ULJd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f4f3b1f-91b9-4847-9d0b-3f1f3ae2719e_746x350.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ULJd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f4f3b1f-91b9-4847-9d0b-3f1f3ae2719e_746x350.png" width="746" height="350" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f4f3b1f-91b9-4847-9d0b-3f1f3ae2719e_746x350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:350,&quot;width&quot;:746,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:65675,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f4f3b1f-91b9-4847-9d0b-3f1f3ae2719e_746x350.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ULJd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f4f3b1f-91b9-4847-9d0b-3f1f3ae2719e_746x350.png 424w, https://substackcdn.com/image/fetch/$s_!ULJd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f4f3b1f-91b9-4847-9d0b-3f1f3ae2719e_746x350.png 848w, https://substackcdn.com/image/fetch/$s_!ULJd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f4f3b1f-91b9-4847-9d0b-3f1f3ae2719e_746x350.png 1272w, https://substackcdn.com/image/fetch/$s_!ULJd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f4f3b1f-91b9-4847-9d0b-3f1f3ae2719e_746x350.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><ol><li><p>First, reject the instruction if it is a <code>PHINode</code>, since PHI nodes represent values that change between iterations.</p></li><li><p>Then iterate over the operands and check where each operand is defined.</p></li></ol><pre><code><code>if (auto *defInst = dyn_cast&lt;Instruction&gt;(op))</code></code></pre><ol start="3"><li><p>Check whether the defining instruction is inside the loop. If it is, return false, since operands computed inside the loop may change on each iteration.</p></li><li><p>If all operands pass, return true.</p></li></ol><p></p><h4>2. 
isSafeToHoist() :</h4><p>This basically makes sure that the instruction does not change program behaviour.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xKL7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a316537-12c4-4d81-980a-592e9b94d9c3_878x966.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xKL7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a316537-12c4-4d81-980a-592e9b94d9c3_878x966.png 424w, https://substackcdn.com/image/fetch/$s_!xKL7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a316537-12c4-4d81-980a-592e9b94d9c3_878x966.png 848w, https://substackcdn.com/image/fetch/$s_!xKL7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a316537-12c4-4d81-980a-592e9b94d9c3_878x966.png 1272w, https://substackcdn.com/image/fetch/$s_!xKL7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a316537-12c4-4d81-980a-592e9b94d9c3_878x966.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xKL7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a316537-12c4-4d81-980a-592e9b94d9c3_878x966.png" width="878" height="966" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a316537-12c4-4d81-980a-592e9b94d9c3_878x966.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:966,&quot;width&quot;:878,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:174948,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/195457791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a316537-12c4-4d81-980a-592e9b94d9c3_878x966.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xKL7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a316537-12c4-4d81-980a-592e9b94d9c3_878x966.png 424w, https://substackcdn.com/image/fetch/$s_!xKL7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a316537-12c4-4d81-980a-592e9b94d9c3_878x966.png 848w, https://substackcdn.com/image/fetch/$s_!xKL7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a316537-12c4-4d81-980a-592e9b94d9c3_878x966.png 1272w, https://substackcdn.com/image/fetch/$s_!xKL7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a316537-12c4-4d81-980a-592e9b94d9c3_878x966.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Now this function is just asking us &#8220;If I move this instruction out the loop, will the program still behave the same?&#8221;</p><p>To check this we have 3 conditions that it has to follow :</p><ol><li><p>No side effects</p></li><li><p>Memory should be safe</p></li><li><p>It always executes anyway</p></li></ol><h4>Side Effect check :</h4><p>This check basically tells to reject anything that writes to memory, calls functions or changes state.</p><h4>Memory Safety (alias analysis) :</h4><p>Lets understand with a simple example first.</p><p>Loads read memory:</p><pre><code><code>t = *ptr;
</code></code></pre><p>But what if the loop also contains a store to the same location:</p><pre><code><code>*ptr = i;
</code></code></pre><p>Then the value of <code>*ptr</code> changes on each iteration, so the load cannot be hoisted.</p><p>So for each load, our code scans every store in the loop and checks whether the two may refer to the same memory location.</p><pre><code><code>if (auto *loadInst = dyn_cast&lt;LoadInst&gt;(&amp;I))
</code></code></pre><p>Only loads need this check.</p><div><hr></div><h4>Scan entire loop</h4><pre><code><code>for (BasicBlock *BB : L-&gt;getBlocks()) {
    for (Instruction &amp;other : *BB) {
</code></code></pre><p>Look at every instruction in the loop.</p><div><hr></div><h4>Find stores</h4><pre><code><code>if (auto *storeInst = dyn_cast&lt;StoreInst&gt;(&amp;other))
</code></code></pre><div><hr></div><h4>Check alias</h4><p>This asks: can these two pointers refer to the same memory?</p><pre><code><code>AliasResult result = AA.alias(
&#9;loadInst-&gt;getPointerOperand(),
&#9;storeInst-&gt;getPointerOperand()
);
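// Assumed continuation, not part of the snippet above: treat anything other
// than a definite NoAlias as a possible conflict and refuse to hoist the load.
if (result != AliasResult::NoAlias)
    return false;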
</code></code></pre><div><hr></div><h4>Dominance Check :</h4><p>We need this check because of one problem. Let&#8217;s understand it using an example:</p><pre><code><code>for (...) {
    if (i % 2 == 0) {
        t = x * 4;
    }
}
</code></code></pre><p>Here the assignment to &#8220;t&#8221; only runs on some iterations. If it were hoisted:</p><pre><code><code>t = x * 4;
for (...) {
    if (...) { }
}
</code></code></pre><p>Now it runs unconditionally, even if the branch is never taken, which is not safe in general.</p><p>So we collect all the exit blocks of the loop:</p><pre><code><code>SmallVector&lt;BasicBlock *, 4&gt; exitBlocks;
L-&gt;getExitBlocks(exitBlocks);
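// Assumed continuation, not part of the snippet above: DT is taken to be a
// DominatorTree for the current function. The block holding I has to dominate
// every exit block, otherwise I may not run on some path out of the loop.
for (BasicBlock *exitBB : exitBlocks)
    if (!DT.dominates(I.getParent(), exitBB))
        return false;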
</code></code></pre><p>and then check that the block containing the instruction dominates ALL exit blocks; in other words, the instruction must execute on every path that leaves the loop. If it does not, the instruction is conditional and cannot be hoisted.</p><p></p><p>This implementation is intentionally conservative. It avoids unsafe transformations, but it also misses some opportunities that a production compiler like LLVM would handle. Real-world LICM includes more advanced techniques such as:</p><ul><li><p>Handling chains of dependent invariant instructions</p></li><li><p>Speculative execution checks</p></li><li><p>More precise memory analysis (like MemorySSA)</p></li></ul><p>That said, the goal here was not to replicate LLVM&#8217;s full complexity, but to understand the core idea deeply enough to build it ourselves.</p><p></p><p><em>The full implementation is on <a href="https://github.com/Sajid-Zubair/LICM-Pass">GitHub</a> - feel free to explore or build on top of it.</em></p><p><em>I&#8217;m still learning LLVM myself, so if you spot anything off or have suggestions, I&#8217;d love to hear them. 
This is as much a learning log as it is a tutorial.</em></p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[Dead Code Elimination Pass]]></title><description><![CDATA[What exactly is Dead Code Elimination (DCE) ?]]></description><link>https://sajidzubair.substack.com/p/dead-code-elimination-pass</link><guid isPermaLink="false">https://sajidzubair.substack.com/p/dead-code-elimination-pass</guid><dc:creator><![CDATA[Sajid Zubair]]></dc:creator><pubDate>Wed, 15 Apr 2026 05:33:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!a8AA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe771da15-11e2-48da-aec8-6ffee27009ab_1538x1004.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>What exactly is Dead Code Elimination (DCE) ?</h3><p>It&#8217;s a compiler optimization that removes code that does not affect the final output of a program. Any code whose result is never used can be removed by a DCE pass.</p><p>For example:</p><pre><code><code>int main() {

  int x = 10; //dead code

  return 5;
}
</code></code></pre><p>Here we can clearly see that the variable &#8220;x&#8221; doesn&#8217;t affect the final output of the program, so the compiler can remove that piece of code.</p><p>Note: Not all unused-looking code can be removed. Instructions that have side effects (such as memory writes or function calls) must be preserved.</p><div><hr></div><h3>Why is there a need for this ?</h3><p>A CPU has to execute every instruction it is given, so fewer instructions generally means less work and less time. Let&#8217;s understand this in more detail:</p><p>Example:</p><pre><code><code>int main() {
&#9;int x = 10; // unused
&#9;int y = 20; // unused
&#9;return 5;
}
</code></code></pre><p>As you can see, the variables &#8220;x&#8221; and &#8220;y&#8221; are never used, but if nothing removes those lines, the compiler will still generate instructions for them (shown here as pseudo-assembly):</p><pre><code><code>load 10 into register
store into x
load 20 into register
store into y
return 5
</code></code></pre><p>While individual load/store operations may seem small in isolation, redundant instructions still increase total execution time and memory traffic.</p><p>More instructions usually means more cycles.</p><div><hr></div><h3>Now that we know the what and why of DCE, let us understand where DCE actually happens.</h3><h4>High-Level View : What is actually happening and where ?</h4><p>When we write a C++ program, the computer doesn&#8217;t understand the C++ code directly. This is where the compiler comes into play.</p><p>The compiler works in stages :</p><pre><code><code>Your C++ code
      &#8595;
Convert to an intermediate form (IR)
      &#8595;
Optimize the IR (DCE happens here)
      &#8595;
Convert to machine code
</code></code></pre><p>The compiler lowers your code into a standard internal form called the Intermediate Representation (IR) so it can analyze and optimize it easily.</p><p>Now you might wonder why we don&#8217;t optimize the C++ code directly.</p><p>The reason is that C++ source is:</p><ul><li><p>complex</p></li><li><p>ambiguous</p></li><li><p>high-level</p></li></ul><p>Source code involves operator precedence, function calls, and overloads, which make it very hard to optimize reliably.</p><div><hr></div><h3>So now let us try to understand what LLVM IR is and how it actually helps us optimize.</h3><p>LLVM IR is like a low-level, simplified version of your program.</p><p>Example :</p><p>C++:</p><pre><code><code>int main() {
   int x = 5 + 10;
   return 0;
}
</code></code></pre><div><hr></div><p>LLVM IR (simplified):</p><pre><code><code>%1 = add i32 5, 10
ret i32 0
</code></code></pre><ul><li><p>Here <code>%1</code> represents a temporary value</p></li><li><p><code>add i32 5, 10</code> computes 5 + 10</p></li><li><p><code>ret i32 0</code> returns 0</p></li></ul><p>Here &#8220;i32&#8221; denotes a 32-bit integer.</p><p>IR is a very good place for DCE for the following reasons:</p><p>In IR:</p><pre><code><code>%1 = add i32 5, 10
</code></code></pre><ol><li><p>Everything is explicit:</p><ul><li><p>Every operation and type is clearly defined</p></li><li><p>no hidden behaviour</p></li></ul></li><li><p>Data flow is clear :</p><p>You can easily track:</p><ul><li><p>where a value is used</p></li><li><p>where it is not</p></li></ul></li><li><p>Language-Independent</p><p>Same DCE works for:</p><ul><li><p>C++</p></li><li><p>Rust</p></li><li><p>Swift</p></li></ul><p>Because all of them are lowered to LLVM IR</p></li><li><p>Easier than machine code :</p><p>Machine code is:</p><ul><li><p>too low-level</p></li><li><p>tied to hardware</p></li></ul><p>IR is:</p></li></ol><blockquote><p>a good balance between abstraction and control</p></blockquote><p></p><h3>Write your own DCE Pass</h3><p>Now we have enough background to actually write a DCE pass.</p><p>We start by creating a simple project structure :</p><pre><code><code>DCEPass/
&#9500;&#9472;&#9472; test.cpp       &#8592; Input C++ program
&#9500;&#9472;&#9472; test.ll        &#8592; LLVM IR (generated)
&#9500;&#9472;&#9472; DCEPass.cpp    &#8592; Our custom optimization pass
&#9500;&#9472;&#9472; CMakeLists.txt &#8592; Build configuration
&#9492;&#9472;&#9472; build/         &#8592; Compiled plugin
</code></code></pre><div><hr></div><h3>1. Writing the Input Program (test.cpp)</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LQYD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff424f4e8-e9bf-4f75-b820-3d9c69378cbe_438x344.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LQYD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff424f4e8-e9bf-4f75-b820-3d9c69378cbe_438x344.png 424w, https://substackcdn.com/image/fetch/$s_!LQYD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff424f4e8-e9bf-4f75-b820-3d9c69378cbe_438x344.png 848w, https://substackcdn.com/image/fetch/$s_!LQYD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff424f4e8-e9bf-4f75-b820-3d9c69378cbe_438x344.png 1272w, https://substackcdn.com/image/fetch/$s_!LQYD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff424f4e8-e9bf-4f75-b820-3d9c69378cbe_438x344.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LQYD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff424f4e8-e9bf-4f75-b820-3d9c69378cbe_438x344.png" width="438" height="344" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f424f4e8-e9bf-4f75-b820-3d9c69378cbe_438x344.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:344,&quot;width&quot;:438,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21291,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/194202185?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff424f4e8-e9bf-4f75-b820-3d9c69378cbe_438x344.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LQYD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff424f4e8-e9bf-4f75-b820-3d9c69378cbe_438x344.png 424w, https://substackcdn.com/image/fetch/$s_!LQYD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff424f4e8-e9bf-4f75-b820-3d9c69378cbe_438x344.png 848w, https://substackcdn.com/image/fetch/$s_!LQYD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff424f4e8-e9bf-4f75-b820-3d9c69378cbe_438x344.png 1272w, https://substackcdn.com/image/fetch/$s_!LQYD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff424f4e8-e9bf-4f75-b820-3d9c69378cbe_438x344.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here</p><ul><li><p><code>x</code> is never used</p></li><li><p>So everything related to <code>x</code> is <strong>dead code</strong></p></li></ul><h3>2. Converting C++ &#8594; LLVM IR</h3><p>Now we know Computers don&#8217;t optimize C++ directly. They optimize an intermediate form called <strong>LLVM IR</strong>.</p><p>We generate IR using:</p><pre><code><code>clang -S -emit-llvm -O0 test.cpp -o test.ll
</code></code></pre><p>Type this in the terminal and a &#8220;test.ll&#8221; file will be generated.</p><h3>3. Understanding the IR (test.ll)</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ano_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e9df41-c8ca-4815-87f7-089e1b3c9642_586x322.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ano_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e9df41-c8ca-4815-87f7-089e1b3c9642_586x322.png 424w, https://substackcdn.com/image/fetch/$s_!Ano_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e9df41-c8ca-4815-87f7-089e1b3c9642_586x322.png 848w, https://substackcdn.com/image/fetch/$s_!Ano_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e9df41-c8ca-4815-87f7-089e1b3c9642_586x322.png 1272w, https://substackcdn.com/image/fetch/$s_!Ano_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e9df41-c8ca-4815-87f7-089e1b3c9642_586x322.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ano_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e9df41-c8ca-4815-87f7-089e1b3c9642_586x322.png" width="586" height="322" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4e9df41-c8ca-4815-87f7-089e1b3c9642_586x322.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:322,&quot;width&quot;:586,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:44308,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/194202185?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e9df41-c8ca-4815-87f7-089e1b3c9642_586x322.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ano_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e9df41-c8ca-4815-87f7-089e1b3c9642_586x322.png 424w, https://substackcdn.com/image/fetch/$s_!Ano_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e9df41-c8ca-4815-87f7-089e1b3c9642_586x322.png 848w, https://substackcdn.com/image/fetch/$s_!Ano_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e9df41-c8ca-4815-87f7-089e1b3c9642_586x322.png 1272w, https://substackcdn.com/image/fetch/$s_!Ano_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e9df41-c8ca-4815-87f7-089e1b3c9642_586x322.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><code>alloca</code> &#8594; allocate memory (like variables)</p></li><li><p><code>store</code> &#8594; assign values</p></li><li><p><code>ret</code> &#8594; return</p></li></ul><h3>Problem that we are trying to solve:</h3><ul><li><p><code>%1</code>, <code>%2</code>, and their stores are <strong>never used</strong></p></li><li><p>This is exactly what DCE should remove</p></li></ul><div><hr></div><h3>4. Writing the DCE Pass (DCEPass.cpp)</h3><p>Before we dive into the Pass, we have to understand what a basic block is. 
A Basic block is a straight-line sequence of instructions with one entry and exit point and no jumps in between.</p><p>Now we have to write our core logic which basically removes instructions that :</p><ul><li><p>have no users</p></li><li><p>are not control flow</p></li><li><p>have no side effects</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a8AA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe771da15-11e2-48da-aec8-6ffee27009ab_1538x1004.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a8AA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe771da15-11e2-48da-aec8-6ffee27009ab_1538x1004.png 424w, https://substackcdn.com/image/fetch/$s_!a8AA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe771da15-11e2-48da-aec8-6ffee27009ab_1538x1004.png 848w, https://substackcdn.com/image/fetch/$s_!a8AA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe771da15-11e2-48da-aec8-6ffee27009ab_1538x1004.png 1272w, https://substackcdn.com/image/fetch/$s_!a8AA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe771da15-11e2-48da-aec8-6ffee27009ab_1538x1004.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a8AA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe771da15-11e2-48da-aec8-6ffee27009ab_1538x1004.png" width="1456" height="950" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e771da15-11e2-48da-aec8-6ffee27009ab_1538x1004.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:950,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:186752,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/194202185?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe771da15-11e2-48da-aec8-6ffee27009ab_1538x1004.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a8AA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe771da15-11e2-48da-aec8-6ffee27009ab_1538x1004.png 424w, https://substackcdn.com/image/fetch/$s_!a8AA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe771da15-11e2-48da-aec8-6ffee27009ab_1538x1004.png 848w, https://substackcdn.com/image/fetch/$s_!a8AA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe771da15-11e2-48da-aec8-6ffee27009ab_1538x1004.png 1272w, https://substackcdn.com/image/fetch/$s_!a8AA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe771da15-11e2-48da-aec8-6ffee27009ab_1538x1004.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here the function traverses over all the Basic Blocks present in the IR. Inside each basic block it iterates on the instructions. We have to check these instructions and see if they affect our final output or not. If they don&#8217;t we simply delete them.</p><p>One important thing to note is that we are incrementing the iterator before we delete the instruction to safely perform the check and to not get a segmentation fault.</p><p>Some Important functions used here are :</p><h3><code>I.use_empty()</code></h3><pre><code><code>No instruction is using this value &#8594; safe to delete

This means the result produced by this instruction is never used anywhere else in the program.</code></code></pre><div><hr></div><h3><code>I.isTerminator()</code></h3><pre><code><code>Do NOT delete return/branch instructions.

These instructions control the flow of the program, so removing them would break execution.</code></code></pre><div><hr></div><h3><code>I.mayHaveSideEffects()</code></h3><pre><code><code>Do NOT delete:
- stores
- function calls

Even if their result is unused, they still affect program state (like modifying memory or printing something).</code></code></pre><h3>5. Building the Pass</h3><p>We use CMake to compile our pass into a plugin:</p><pre><code><code>cmake -S . -B build
cmake --build build
</code></code></pre><h3>Output:</h3><pre><code><code>build/libDCEPass.dylib
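</code></code></pre><p>What gets compiled into this plugin is the pass itself plus a small registration hook. The following is only a minimal sketch of what such a file could look like, assuming LLVM&#8217;s new pass manager. The struct name <code>DCEPass</code> and the use of <code>make_early_inc_range</code> and <code>isRequired()</code> are assumptions of this sketch, not necessarily what the code in the screenshot does; only the pipeline name <code>dce-pass</code> is taken from the <code>opt</code> command used in the next step.</p><pre><code><code>#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Passes/PassPlugin.h"

using namespace llvm;

namespace {
// Hypothetical pass struct, named after the project directory.
struct DCEPass : PassInfoMixin&lt;DCEPass&gt; {
  PreservedAnalyses run(Function &amp;F, FunctionAnalysisManager &amp;) {
    bool changed = false;
    for (BasicBlock &amp;BB : F) {
      // make_early_inc_range advances the iterator before the loop body runs,
      // so erasing I does not invalidate the traversal.
      for (Instruction &amp;I : make_early_inc_range(BB)) {
        if (I.use_empty() &amp;&amp; !I.isTerminator() &amp;&amp; !I.mayHaveSideEffects()) {
          I.eraseFromParent();
          changed = true;
        }
      }
    }
    return changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
  }

  // clang -O0 marks functions optnone; a required pass still runs on them.
  static bool isRequired() { return true; }
};
} // namespace

// Entry point that opt looks up when it loads the plugin.
extern "C" LLVM_ATTRIBUTE_WEAK PassPluginLibraryInfo llvmGetPassPluginInfo() {
  return {LLVM_PLUGIN_API_VERSION, "DCEPass", LLVM_VERSION_STRING,
          [](PassBuilder &amp;PB) {
            PB.registerPipelineParsingCallback(
                [](StringRef Name, FunctionPassManager &amp;FPM,
                   ArrayRef&lt;PassBuilder::PipelineElement&gt;) {
                  // Only claim the pipeline name this plugin provides.
                  if (Name != "dce-pass")
                    return false;
                  FPM.addPass(DCEPass());
                  return true;
                });
          }};
}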
</code></code></pre><h3>6. Running the Pass</h3><p>We run our pass using LLVM&#8217;s <code>opt</code> tool:</p><pre><code><code>opt -S -load-pass-plugin ./build/libDCEPass.dylib \
-passes="dce-pass" test.ll -o out.ll
</code></code></pre><p>This makes an out.ll file which removes the unwanted stores that we saw earlier in the test.ll file</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!We59!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7ecd95-7f15-4dd0-a8dd-e070b08a79c4_604x256.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!We59!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7ecd95-7f15-4dd0-a8dd-e070b08a79c4_604x256.png 424w, https://substackcdn.com/image/fetch/$s_!We59!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7ecd95-7f15-4dd0-a8dd-e070b08a79c4_604x256.png 848w, https://substackcdn.com/image/fetch/$s_!We59!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7ecd95-7f15-4dd0-a8dd-e070b08a79c4_604x256.png 1272w, https://substackcdn.com/image/fetch/$s_!We59!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7ecd95-7f15-4dd0-a8dd-e070b08a79c4_604x256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!We59!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7ecd95-7f15-4dd0-a8dd-e070b08a79c4_604x256.png" width="604" height="256" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c7ecd95-7f15-4dd0-a8dd-e070b08a79c4_604x256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:256,&quot;width&quot;:604,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30707,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sajidzubair.substack.com/i/194202185?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7ecd95-7f15-4dd0-a8dd-e070b08a79c4_604x256.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!We59!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7ecd95-7f15-4dd0-a8dd-e070b08a79c4_604x256.png 424w, https://substackcdn.com/image/fetch/$s_!We59!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7ecd95-7f15-4dd0-a8dd-e070b08a79c4_604x256.png 848w, https://substackcdn.com/image/fetch/$s_!We59!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7ecd95-7f15-4dd0-a8dd-e070b08a79c4_604x256.png 1272w, https://substackcdn.com/image/fetch/$s_!We59!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7ecd95-7f15-4dd0-a8dd-e070b08a79c4_604x256.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In practice, DCE is often applied repeatedly, because removing one instruction can leave its operands with no remaining uses, making them dead as well.</p><p><strong>A quick note</strong>: modern compilers such as Clang and GCC already do this automatically when you compile with <code>-O1</code> or higher. You don&#8217;t need to write this pass to get DCE in production. The goal here was to understand what&#8217;s happening under the hood: stripping away the magic and seeing exactly how the compiler identifies and removes dead instructions at the IR level. 
Think of it as reading the compiler&#8217;s source code, but from scratch.</p><p>If you&#8217;d like to dive deeper and experiment with the implementation yourself, I&#8217;ve uploaded the complete code for this DCE pass to GitHub.</p><p>GitHub repo: <a href="https://github.com/Sajid-Zubair/DCEPass">https://github.com/Sajid-Zubair/DCEPass</a></p>]]></content:encoded></item></channel></rss>