Advent of Code 2023
- Piotr Nosek
- 1st Dec 2023
- 80 min of reading time
Hello! I’m Piotr from Erlang Solutions Poland and I have the pleasure of saving Christmas this year with the power of Erlang for you!
This is the second time we participate in the amazing event called the Advent of Code. Last year’s edition was solved by my colleague Aleksander and as far as I know – many of you enjoyed following his efforts. I hope you’ll like my tale of helping Santa too!
I’m going to publish my solutions in my GitHub repository. They will be accompanied by a commentary, added to this page on a daily basis. I will add solutions for each day in an individual folder, with my input file downloaded from the AoC website.
I’m also going to include a bit of microbenchmarking in every solution, with and without JIT. Firstly, to measure the overall performance of the code and secondly to see how much the efficiency improves thanks to JIT. I’m going to measure the computation time only with `timer:tc/3`
call, as I consider time needed to compile a module and load the input file irrelevant. By “load” I mean: read it and split it into lines. Any further processing of individual lines is considered a computation. I will provide min, max and arithmetic average of 100 runs.
I’ll be implementing solutions as EScripts, so running them is a bit more straightforward. And frankly – I think they are underrated and sometimes I prefer them over writing BASH scripts. I’ll always include `-mode(compile).`
directive to avoid the interpretation performance penalty. For those who are not aware of this capability, I’ll also run Day 1 without this option to show you how the timings change.
I’m going to run every piece of the code on Linux Mint 21.2 VirtualBox machine with 4 cores and 8GB of memory, hosted on my personal PC with Ryzen 3700X and DDR4 at 3200MHz. I will use OTP 26.1.1.
I would never suspect that I’ll begin the AoC challenge with being loaded onto a trebuchet. I’d better do the math properly! Or rather – have Erlang do the calibration for me.
FYI: I do have some extra motivation to repair the snow production: my kids have been singing “Do You Want to Build a Snowman?” for a couple of days already and there is still nowhere enough of it where I live.
I considered three approaches to the first part of the puzzle:
I chose the last one, as (1) felt like shooting a mosquito with a M61 Vulcan Cannon. Second one felt kind of less Erlang-ish than the third one. After all, matching binaries and recursive solutions are very natural in this language.
Min | Avg | Max | |
Compiled + JIT | 0.000091s | 0.000098s | 0.000202s |
Compiled + no JIT | 0.000252s | 0.000268s | 0.000344s |
Interpreted | 0.091494s | 0.094965s | 0.111017s |
By choosing the method of matching binaries, I was able to add support for digits as words pretty easily. If there were more mappings than just nine, I’d probably use a map to store all possible conversions and maybe even compile a regular expression from them.
Eventually, the temptation of violating the DRY rule a bit was too strong and I just went for individual function clauses.
And my solution was invalid. Shame on me but I admit I needed a hint from other participants – it turned out that some words can overlap and they have to be treated as individual digits. It wasn’t explicitly specified and ignoring overlaps in the example did not lead to an invalid result – a truly evil decision of AoC maintainers!
Simply put, at first I thought such a code would be enough:
parse(<<"one", Rest/binary>>, First, _Last) -> store(1, Rest, First);
But the actual Rest
must be defined as <<_:8, Rest/binary>>.
Min | Avg | Max | |
Compiled + JIT | 0.000212s | 0.000225s | 0.000324s |
Compiled + no JIT | 0.000648s | 0.000679s | 0.000778s |
Interpreted | 0.207670s | 0.213344s | 0.242223s |
JIT does make a difference, doesn’t it?
I’m not complaining but maybe if the elves spent more time on polishing engineering skills than inventing games, perhaps the Snow Island would have never broken. However, an army of technically skilled elves could render our jobs obsolete. All of our customers and employers would only have to write a letter to Santa. …don’t you dare to mention AI here.
Lucky for us, there were no traps in the puzzle input and the specification today. Actually, parsing each line was probably more challenging than computing the result itself.
In part 1 I decided to retain all information from the file in the internal structure. I was expecting that I may need it in part 2. Plus, I obviously began the work with testing the parser itself and printing a complete structure that was friendly to a human debugger’s eye.
Using classic regular expressions was tempting, just like in Day 1, but I was a bit afraid that a small mistake in the pattern or my reasoning could lead to hours of debugging and losing a bit of sanity. Laziness won and I just decomposed each line with a chain of `binary:split`
calls.
Min | Avg | Max | |
JIT | 0.000692s | 0.000732s | 0.000802s |
No JIT | 0.000799s | 0.000849s | 0.000987s |
JIT influence was less noticeable this time but still measurable.
Thanks to building the internal data structure with complete information from the input, the second part was very straightforward. For every game I simply needed to find the biggest number of red, blue and green draws and multiply them.
Implementing it was very fast so I decided it’s a nice occasion to experiment with performance of splitting vs. scanning. To be more specific: the code so far used lots of `binary:split`
calls and I’ve replaced it with sequential parsing of a binary, (almost) character after character. My intuition told me that it will be most likely similar or even a bit slower but I wanted to have an empirical proof.
I’ve also tried to merge parsing and computations to avoid unnecessary iterations over the full game list. After all, I needed only the “power” of a game and maximums of each color could be found at the time of reading the input. Actually, there even was no need to distinguish between individual draws!
Min | Avg | Max | |
Binary split, JIT | 0.000703s | 0.000730s | 0.000790s |
Recursive scan, JIT | 0.000136s | 0.000156s | 0.000206s |
Scan and compute, JIT | 0.000097s | 0.000105s | 0.000148s |
Binary split, no JIT | 0.000868s | 0.000938s | 0.001229s |
Recursive scan, no JIT | 0.000250s | 0.000278s | 0.000470s |
Scan and compute, no JIT | 0.000203s | 0.000224s | 0.000331s |
Maybe it’s just me but I think it was definitely a surprise! Recursive scan of a binary was about 4-5x faster than `binary:split`
approach, even with such short lines. It seems that more efficient usage of sub binaries and match contexts makes quite a difference.
Computing the solution in one pass also brought measurable improvement but the code might be considered less understandable if read without any comments. Well, we do sometimes have to balance between readability and performance, right?
I did try regular expressions this time.
Today’s puzzle had a very interesting input to parse. Its “syntax” was simple but at first I wasn’t sure what would be the most suitable representation of the provided data internally. The most straightforward one, that came to my mind, was to simply keep two maps: for locations of numbers and engine parts. The value in the numbers map was the number itself and its length.
Since the task was to find isolated part numbers, my code iterates over all of them and checks all possible part locations candidates in a bit naive way – because it doesn’t bother to check the engine plan boundaries and doesn’t take into account that parts cannot be located inside the number. It is a small waste of CPU cycles but I’ve considered part 1 just a warmup – the optimizations will come later.
Min | Avg | Max | |
JIT | 0.001688s | 0.001915s | 0.002737s |
No JIT | 0.002685s | 0.002879s | 0.003540s |
My expectation was that the task for part 2 will be to find the correct engine elements’ numbers in a manner similar to the Minesweeper. Fortunately, the actual puzzle was easier.
So, the data model that I’ve chosen for part 1 wasn’t that great in the second problem. It was possible to solve it nevertheless, but it was a bit awkward, because it required filtering out all parts that are not ’*’ and then collecting lists of all adjacent numbers for each possible gear. It did work and allowed me to pass the second stage of the puzzle but I definitely wanted to do it better, with significantly better performance.
So I decided to give regular expressions a try, since I was confident I can get it right with such simple objects as “nonempty series of digits” or “anything other than digits and periods”. And I did but unfortunately the performance got significantly worse! It may be partially caused by a fact, that `re:run`
returns either `{Offset, Length}`
pairs or the matched string but not both. In this case I had to take the former and use `binary:part`
and `binary:at`
to get the actual values. I tried to precompile the expressions too, as I thought that repeated compilation might accumulate into a significant delay but it didn’t bring that much improvement.
The next version involved iterating over ‘*’ instead of numbers. Also, this time the map of part numbers became an index: for each number there are as many keys as digits in it. As an example: the previous implementation encoded `2137`
at coordinates `{4, 2}`
as `{4, 2} => {2137, 4}`
, while this time it is stored in the map as: `{4, 2} => 2137, {5, 2} => 2137, {6, 2} => 2137, {7, 2} => 2137`
. Now, for each potential gear the following coordinates are checked:
Final optimisation step introduced different means of storage than two maps. For the gears, a plain list of coordinates is used (as the values from the map weren’t even used in the previous version) and in case of the numbers index – an ETS table. I suspected that the latter will be especially beneficial, as the index can become quite a large map.
Min | Avg | Max | |
No optimisations | 0.002262s | 0.002519s | 0.003287s |
RegEx | 0.004212s | 0.004541s | 0.006180s |
Precompiled RegEx | 0.003902s | 0.004199s | 0.005541s |
Numbers index | 0.001990s | 0.002254s | 0.002577s |
Index in ETS | 0.001355s | 0.001487s | 0.002081s |
No optimisations, no JIT | 0.003488s | 0.003774s | 0.005147s |
RegEx, no JIT | 0.004811s | 0.005070s | 0.008774s |
Precompiled RegEx, no JIT | 0.004687s | 0.005065s | 0.006503s |
Numbers index, no JIT | 0.002694s | 0.003005s | 0.003677s |
Index in ETS, no JIT | 0.002032s | 0.002156s | 0.002382s |
Once again, my intuition misled me. I expected the RegEx version to be at least as good as plain scan but likely faster… but it was almost 2x slower! Precompiling the expression did help a bit but it was still nowhere near scan’s performance.
Also, using the numbers index brought much smaller improvement than I thought, given that iterating over engine parts was bound to have less misses than iterating over numbers. Also, it allowed the code to calculate the sum of “gear ratios” on the fly. Still, it did perform slightly better.
Replacing the map with the ETS table gave the calculations a significant boost. It was expected, as it avoided repeated copying of the map in the parsing stage.
Well, off we go! The gondola lift works again and maybe we’ll finally find out what’s broken in the water source of the Snow Island. Ironically, there is a water outage where I live at the moment of writing these words…
OK, I won’t try parsing with regular expressions in the next days. Binary matching.
Until today, I considered two options of retiring as a developer one day: become a truck driver or a teacher at school. But now I’m seriously considering the career of gambling addiction therapist for elves. A game on day 2, scratchcards on day 4… are we going to deal with Lapland Hold’em and slot machines too?
Overall, it was a pretty easy task – the only performance risk was the fact that for every game card an intersection of two sets of numbers was needed. However, thanks to them being relatively short, the result was calculated quite fast.
Min | Avg | Max | |
JIT | 0.000452s | 0.000495s | 0.000603s |
No JIT | 0.001110s | 0.001167s | 0.001368s |
JIT is particularly strong today, bringing over 2x improvement!
I’ve chosen to keep the counts of each card in a list, because it would be sequentially processed anyway:
I’ve tried once again to make the parsing faster with regular expressions, assuming that with as simple pattern as `\\d+`
, it could be even faster than matching binaries one character at a time. Eventually, binary matching won once again.
It was an interesting experiment to check if the new type of sets (“version 2”), introduced in OTP 24, could provide better performance. I wasn’t sure about it, since we’re operating on very short lists. Empirical proof has shown that indeed the overhead of creating new structures, finding the intersection and retrieving its size is more expensive after all than plain membership check in short lists.
Min | Avg | Max | |
Initial, JIT | 0.000431s | 0.000460s | 0.000511s |
Regular expression, JIT | 0.006323s | 0.006593s | 0.008358s |
Sets V2, JIT | 0.000618s | 0.000721s | 0.000943s |
Initial, no JIT | 0.001104s | 0.001144s | 0.001325s |
Regular expression, no JIT | 0.007620s | 0.008014s | 0.011486s |
Sets V2, no JIT | 0.001285s | 0.001420s | 0.001826s |
I never really liked gardening. Too many spiders and insects. I’m glad that AoC only expects me to solve a puzzle, not weed the weeds or dig through an acre of land.
When I successfully solved the example and saw the real input file, I was SO HAPPY that I did not decide to store the mappings as discrete pairs and kept them as tuples. First of all, the calculations would be less efficient and secondly they might not even fit in the memory.
I implemented the parsing of mappings in a way that would allow just any series of transitions, as long as they begin with seeds and finish with locations. After all, I was very suspicious of AoC maintainers’ malicious ideas (Day 1, part 2 took its toll!) and I preferred to write a code that can handle mappings provided out of order and that are not restricted to the ones from specification (seed, soil, fertilizer, water, light, temperature, humidity and location), instead of being forced to make any necessary adjustments later.
Min | Avg | Max | |
JIT | 0.000086s | 0.000095s | 0.000126s |
No JIT | 0.000216s | 0.000240s | 0.000389s |
I was scared that I might not be able to describe and present the solution to you on time. Spoiler alert: I made two trivial mistakes and I was able to spot them after a careful analysis of the code.
To keep the code efficient, I did not expand the seed ranges and took the challenge to keep them as `{Start, End}`
tuples. My implementation gave the correct answer for the example input but was wrong for the real one. In order to get it right, I had to realize two facts:
I’ve included the detailed explanation of the recursive solution inside the code as comments, so I encourage you to take a look. It does feel nice to have a solution with sub-millisecond execution time.
Min | Avg | Max | |
JIT | 0.000107s | 0.000125s | 0.000196s |
No JIT | 0.000262s | 0.000283s | 0.000331s |
Upon inspection, you’ll notice that the secret ingredient is the decomposition of the problem: when you consider all possible relative “positions” of a range, it’s pretty straightforward to deal with one case at a time and just delegate (possibly split the range) the rest to the other function clauses.
Hello boys and girls! Do you remember math classes at school? Good, because we’ll be using the square function today!
I had a hunch that besides a brute force approach, today’s puzzle should be solvable with an analytical solution. Initially I thought I’ll have to dig out knowledge about function derivatives but eventually much simpler tools were needed.
So, ignore units completely and assume the following symbols:
t = tP + tM
tM = t – tP
tP * tM = d
tp * (t – tP) = d
-tP2 + t * tP – d = 0
Now, we need to know the lowest and the highest tP, that match the race’s best distance. We calculate this by finding the square function zeros.
Δ = t2 – 4 * (-1) * (-d) = t2 – 4 * d
tPmin = ( -t + sqrt(t2 – 4 * d) ) / -2
tPmax = ( -t – sqrt(t2 – 4 * d) ) / -2
Of course this gives us the answer in real numbers space, so we need to get a ceiling of the minimum and floor of the maximum.
Today’s task doesn’t seem to be that hard to implement with brute force too, as the numbers aren’t as big as they sometimes tend to be in these challenges. However, it’s always a good idea to avoid this approach in AoC because usually there is a smarter way to crack the problem.
Min | Avg | Max | |
JIT | 0.000000s | 0.000001s | 0.000023s |
No JIT | 0.000001s | 0.000002s | 0.000010s |
Thanks to the analytical solution, part 2 was as easy and fast as part 1 and it required only minor changes to the code.
Min | Avg | Max | |
JIT | 0.000000s | 0.000001s | 0.000030s |
No JIT | 0.000000s | 0.000001s | 0.000007s |
It’s interesting that in both parts there was at least one execution that took more with JIT than without it. It may be caused by the initial cost of native compilation of the code.
I did mention “Lapland Hold’em” on day 4, didn’t I? They do have gambling issues and when I become a therapist for elves – I’m going to be insanely rich!
When parsing, I decided to convert everything to numbers for easy comparisons (and there will be lots to match and compare!). I suspect that the AoC invented their own poker rules, so people won’t be able to reuse existing poker libraries (I’m sure there are some out there!). Lucky for us, because these are fairly simple. Hierarchy (not to mention card colors – we don’t have these in this puzzle!) in the classic game is more complex and would be more challenging to implement.
Min | Avg | Max | |
JIT | 0.000593s | 0.000649s | 0.000832s |
No JIT | 0.001191s | 0.001299s | 0.001626s |
It was a real fun to figure out the correct logic to take Joker into account. At first I tried to match on the counts list, just like in part 1 but I quickly realized that it should be easier with Jokers’ count taken from the list. With these pieces of data:
Min | Avg | Max | |
JIT | 0.000626s | 0.000675s | 0.000858s |
No JIT | 0.001250s | 0.001303s | 0.001440s |
When I started reading the description of the puzzle, I expected that we’ll need to do a graph search to find the correct route. Fortunately, it was way easier, as we are provided with the sequence of movements, it only has to be repeated until we reach the goal.
Min | Avg | Max | |
JIT | 0.000885s | 0.000935s | 0.001099s |
No JIT | 0.000975s | 0.001030s | 0.001167s |
My workmate Dennis wrote today: “They should call this the Advent of Math.”
As one of the steps towards the final solution, I made the code print the individual paths’ lengths for each starting node. In my case these were [19637, 21251, 12643, 15871, 19099, 11567]. It was very likely it will require lots of iterations for these to “meet”, so I tried to find a shortcut. I decided to use school math again and this time it was even more basic knowledge than square functions: lowest common multiple.
Lucky for me – it did work! Later, my colleagues made me aware that if the input was more tricky, this wouldn’t work. One of them was even disappointed that there were no ugly traps in the input data today.
To be more specific – LCM can be used only with paths repeating the same way every time. If, for example, going from AAA to ZZZ took 12345 steps in the first iteration and 54321 in the next one, things would get more complicated.
Min | Avg | Max | |
JIT | 0.005665s | 0.005769s | 0.005987s |
No JIT | 0.006812s | 0.006934s | 0.007138s |
… is it going to be a gambling puzzle again tomorrow? Perhaps dice or roulette simulator? Blackjack?
Huh? No gambling today? I’m almost disappointed. Well, today we’re analysing the environment of an oasis and I suppose tomorrow we’ll be calculating how to properly use hot air to reach the metal island with hang glider… so perhaps on Monday…?
The core of the solution was the proper decomposition of the problem into recursive steps. At the top, we need to know what is the next value in the sequence of numbers (the original input line). In order to compute it, we need to add the next delta to the last number of the sequence. The next delta can be acquired by treating the list of deltas as a normal input sequence, of course if there is at least one non-zero value.
I fell into two traps today:
Min | Avg | Max | |
JIT | 0.000398s | 0.000414s | 0.000520s |
No JIT | 0.000961s | 0.000985s | 0.001103s |
This was most likely one of the easiest modifications required to solve part 2. All I had to do was to reverse the input lists.
Min | Avg | Max | |
JIT | 0.000402s | 0.000427s | 0.000571s |
No JIT | 0.000966s | 0.000985s | 0.001118s |
Maybe it’s just my impression but the difficulty progression this year is seriously broken. After quite trivial day 9, today it took me several hours to get everything right. Obviously, traveling by train and programming isn’t the most concentration-friendly combination, but still I kind of expected to complete the puzzle in 3 hours at most.
I do think that probably there is a more efficient data structure for the parsed input of today’s task. Certain pieces of code can most likely be refactored to not attempt being unnecessarily generic. For example: we do know that we won’t have more than 4 paths outgoing from the starting point.
Ultimately, I believe that the way I used elaborate structures and a certain dose of boilerplate, will make the whole implementation easier to understand while maintaining decent performance.
Min | Avg | Max | |
JIT | 0.004062s | 0.004421s | 0.005392s |
No JIT | 0.005224s | 0.005607s | 0.006058s |
Oooh, this one gave me more than a bit of a headache. I suspect that the problem of calculating the area of an arbitrary 2D shape has already been long solved but I took the challenge to figure this one out by myself.
First of all I had to modify the code from part 1 to store the full path traversed, so all pipes that belong to the loop are known.
I decided that it’s probably a good idea to check the area inside the loop by “tracing” one vertical slice at a time, as it’s possible to come up with rules to identify which parts are “inner” and which are “outer”.
This approach did work but I rewrote the logic several times from scratch until I figured out the correct collection of function clauses:
Of course the actual implementation is a bit more nuanced but these are the core rules to follow.
The most significant mistakes I made today:
Min | Avg | Max | |
JIT | 0.010449s | 0.010908s | 0.011680s |
No JIT | 0.012210s | 0.012841s | 0.013636s |
There was nothing overly complex with today’s puzzle. After the classic operation of reading the locations of galaxies into the map, applying a proper shift is pretty straightforward. Then all we have to do is get the taxicab distance between every pair.
Since today’s description is pretty short and the puzzle is about observing the galaxies, let’s spend a minute or two on contemplating November picture of the month from JWST:
[source: ESA/Webb]
I suppose I’m not the only one to see a serpent in the lower part. It seems that part of our Universe is used by some cosmic entity to play Snake and a fraction of it is in fact a huge Nokia 3310 emulator.
Min | Avg | Max | |
JIT | 0.008142s | 0.008496s | 0.011024s |
No JIT | 0.017631s | 0.018599s | 0.020832s |
As easy as part 2 on day 9. Offset used for calculating final Xs and Ys has to be increased by 999,999 instead of 1 for every “expansion”, so it’s not even a full one-liner.
Min | Avg | Max | |
JIT | 0.008112s | 0.008603s | 0.010933s |
No JIT | 0.017386s | 0.018262s | 0.023179s |
I started today’s challenge with a brute force approach – I tend to begin the puzzle solving early in the morning, before it’s time to wake up my kids to school. At the same time – it’s definitely not my optimal time of the day to think. Therefore, brute force was the best I could come up with at first. And I hoped for the best.
Foolish…
While this approach was very slow but acceptable for part 1, obviously part 2 had to include a much more complex input. The superficial simplicity of the raw input file was a trap!
I came up with a recursive solution that takes the first “checksum” (number of damaged hot springs) and finds out all possible ways it can fit into the sequence. For each of them, the rest of the sequence is checked against the rest of the checksums. And so on, until we exhaust all checksums – if the remaining sequence doesn’t contain any “bad” springs, we’re good.
This approach was very fast in part 1 but when I tried it for part 2, it was painfully slow. With some extra debugging, I saw that it’s chewing through each line but some of them take many seconds to process.
I came to the conclusion that I should try memoization – there was a very good chance that certain sequence+checksums combinations do repeat. Adding a simple cache in ETS was very straightforward. And BOOM! Part 2 cracked! Although, I admit, when I saw 850504257483930 as my output, I was seriously afraid it’s way too high.
Min | Avg | Max | |
Brute, JIT | 0.970235s | 1.005418s | 1.101696s |
Brute, No JIT | 1.595984s | 1.672389s | 1.925581s |
Efficient traverse, JIT | 0.003736s | 0.004082s | 0.005317s |
Efficient traverse, No JIT | 0.008716s | 0.009372s | 0.011776s |
Memoized traverse, JIT | 0.009096s | 0.010023s | 0.013193s |
Memoized traverse, No JIT | 0.011752s | 0.012488s | 0.014494s |
It’s noteworthy that the memoized version is noticeably slower in this case. It’s a very good example that adding a cache doesn’t always guarantee improved performance (although often it does). In unlucky cases – it may even slow things down!
Min | Avg | Max | |
Memoized traverse, JIT | 0.771392s | 0.823118s | 0.909711s |
Memoized traverse, No JIT | 0.887931s | 0.911738s | 1.002154s |
It’s a bit late when I’m writing it but I’m going to check tomorrow how the memoized version performs with parallel processing! I’d like to try the following:
Do you have your favourite? My bet is on “4 processors, shared cache” – less context switching and it’s probably more likely to find a value cached by another processor. Of course on the other hand we’ll have a certain overhead of parallel access to ETS. Perhaps it’s a good idea to try Persistent Terms too?
Since the task involved lots of comparisons between lines that consisted only of two possible values, I decided to play with bitmaps today. Simply put, “..#.” would be converted to 0010. Integer comparisons are way cheaper than matching two lists, maps or other complex data types.
Min | Avg | Max | |
JIT | 0.000969s | 0.001066s | 0.001594s |
No JIT | 0.001432s | 0.001507s | 0.001705s |
At first, it seemed to be very expensive to introduce mirror cleaning to the current logic. However, when I examined the provided examples once again, I checked if there was exactly one reflection in each pattern in part 1 after all. Since it turned out that it is the case – it made the process way easier, as I could discard all the ideas I had about multiple reflection candidates, looking for the longest one etc.
I chose to continue my fun with binary operations, as verification whether two lines/rows have only one symbol different can be performed by XORing the values and checking if the result has only 1 bit set (i.e. only one bit was different in the input integers).
Min | Avg | Max | |
JIT | 0.000995s | 0.001046s | 0.001255s |
No JIT | 0.001513s | 0.001561s | 0.001741s |
I promised to benchmark the parallelized version of day 12 part 2 solution. To keep the table at a reasonable size, I’m not going to test “No JIT” variants this time. Here it comes!
Note: Persistent term has only “Avg”, as it is very time consuming to clean the created persistent terms. Therefore, I’d have to start the script from scratch on every attempt and it won’t be comparable to the other results.
Min | Avg | Max | |
Original | 0.817489s | 0.896030s | 1.093369s |
4 processors, shared cache | 0.676303s | 0.832595s | 1.285685s |
4 processors, shared cache, read concurrency | 0.735774s | 0.858655s | 1.215173s |
4 processors, shared cache,write concurrency | 0.293589s | 0.329503s | 0.404031s |
4 processors, individual cache | 0.195259s | 0.237037s | 0.267828s |
4 processors, persistent term | – | 1.227170s | – |
one per line, shared cache | 0.617219s | 0.727700s | 1.056201s |
one per line, shared cache, read concurrency | 0.619448s | 0.832244s | 1.472503s |
one per line, shared cache, write concurrency | 0.265507s | 0.304517s | 0.419459s |
one per line, individual cache | 0.164978s | 0.181643s | 0.208810s |
one per line, persistent term | – | 19.110257s | – |
Mirrors’ alignment, just like after the launch of JWST! Well, the telescope is not calibrated with rocks. Fortunately.
Min | Avg | Max | |
JIT | 0.000548s | 0.000627s | 0.000764s |
No JIT | 0.000680s | 0.000787s | 0.001001s |
I had enough off-by-one mistakes for a lifetime.
So we have to perform lots of rotations in this part. It’s quite obvious that 4 billion transformations of a large table is not something we can brute force. At least not with an average hardware.
We had to implement moving the rocks in all 4 directions. I decided it’s probably better to rotate the platform clockwise and always slide the rocks “up”. Now I’m pretty sure that I made my life harder with this, since rotating arrays represented as a list of lists of positions is possible but requires more mental effort than just sliding in 4 different directions.
Nevertheless, the number of iterations clearly indicated that the platform arrangements had to loop at some point. But the important “gotcha” today was the fact that the loop doesn’t begin from the first iteration – there is an offset.
And this is where I made lots of tiny mistakes in my implementation: offset was off by one, the loop length was off by one, the number of remaining steps was off by one. Or by two or three at some point too.
The good news is that this is the last island! Now we’re slowly going to retrace our steps!
Min | Avg | Max | |
JIT | 0.251737s | 0.268959s | 0.315577s |
No JIT | 0.296287s | 0.318066s | 0.370740s |
Today’s puzzle was so easy that I’m seriously concerned about the possible difficulty level on Saturday. I learned my lesson yesterday and I did not try to over engineer the solution with ultra efficient data structures and I went for a plain map of lists and used standard list manipulation functions from OTP. Lucky for me, there were absolutely no performance traps in the input (or at least I didn’t encounter any).
Since the solution is pretty straightforward, let’s visit our science corner again and in reference to Day 14, talk about “Archimedes’ Heat Ray”. Some of you may be familiar with the concept but let’s recap.
Certain sources describe a powerful weapon used during the Siege of Syracuse by the Greek to defend against Roman ships. It is claimed that 70 mirrors were used to focus the sunlight in order to set enemy vessels on fire. It was verified more than once by “Mythbusters” that this was not only unlikely (only a minor damage was done over a relatively long time and under perfect conditions) but also impractical (it’s easier to burn a ship with more conventional means).
I suppose that the usage of focusing lenses could help to a degree but I think we can conclude that the Lava Island’s devices are heavily enchanted to enhance sun rays power output!
Sources:
Min | Avg | Max | |
JIT | 0.000231s | 0.000276s | 0.000444s |
No JIT | 0.000549s | 0.000569s | 0.000709s |
Min | Avg | Max | |
JIT | 0.002239s | 0.002878s | 0.004472s |
No JIT | 0.003118s | 0.003938s | 0.007493s |
Due to the relaxed mood of Saturday and the nature of today’s puzzle, I decided to have some fun.
Have you ever played these puzzle games where you have to use various optical instruments to manipulate the laser light to reach certain targets? Usually it involved not only plain mirrors or splitters but also prisms, filters etc. Although I don’t remember any specific name, I did enjoy solving them and AoC brought back these memories!
I am aware that probably it would be more optimal to keep a list of concurrent rays (their position and movement vector) but we’re implementing the solution in Erlang – let’s model each ray as a process!
We begin with a single ray/process. Bouncing off the mirrors is easy. In case of encountering a splitter, two new processes are spawned, heading in opposite directions. The “old” process waits for them to finish. A common ETS table is maintained as a global list of energized tiles (if a ray passes a tile that was energized in the same direction – the ray may be terminated).
Min | Avg | Max | |
JIT | 0.017631s | 0.105174s | 0.221526s |
No JIT | 0.018842s | 0.024031s | 0.045180s |
Part 2 uses exactly the same ray tracing method, with the only difference being that now we have many more initial, parallel rays! Each initial ray has an individual ETS table assigned, so states do not collide.
Solution is picked from the complete list of all results from all initial rays.
Min | Avg | Max | |
JIT | 5.526677s | 5.796180s | 6.208749s |
No JIT | 5.783146s | 6.083252s | 6.757137s |
The 17th puzzle made me question my CS degree. Although I understood pretty quickly that I’m dealing with a directed weighted graph and I’m going to need an algorithm to find the shortest path. It’s been too long since I needed one but a quick search refreshed my memory – I had to use Dijkstra’s Algorithm.
However, Mr Dijkstra didn’t have to solve Advent coding problems. At least I think he didn’t. I struggled quite a lot with finding a good modification of the base version of the algorithm. I did suspect that I can’t operate purely on `{X, Y}` vertices but what would be a good representation that won’t explode the memory?
With a trial and error and a bit of inspiration from Reddit, I understood that each vertex consists of:
This means that each 2D coordinate can have 4 (up, left, right, down) * 3 (crucible can move in a specific direction only between 1 and 3 tiles) = 12 variations. Fortunately we don’t have to consider all of them but instead of naively populating the priority queue with all combinations we need to generate neighbors of the current node on the fly: one along the movement vector if less than 3 moves were made, plus node to the left and to the right of the vector (with new vectors and tile counter reset to 1).
When all vertices are visited (i.e. no more valid neighbors to generate), we have to pick the lowest distance of all “flavors” of the bottom right corner from the distance table.
Min | Avg | Max | |
JIT | 10.285737s | 10.386970s | 11.041682s |
No JIT | 13.660166s | 13.760896s | 14.031004s |
Fortunately the only required change here is the way neighbors are generated. Instead of always generating “left” and “right” turns plus “forward” until the 3rd tile in a row, now we don’t generate “left” and “right” until the 4th tile (exclusive). Also, “forward” moves are generated until the 10th tile (exclusive).
While the solution worked, I considered its performance unacceptable. My primary suspect was the priority queue implemented on top of an ordinary sorted list. When I replaced it with the implementation that uses `gb_trees`, the script became over 30x faster!
Min | Avg | Max | |
Priority queue on list, JIT* | – | 95.956005s | – |
GBT priority queue, JIT | 2.857002s | 3.075149s | 3.250199s |
GBT priority queue, No JIT | 3.127588s | 3.467205s | 3.767187s |
* Since a single execution of this version takes about 90-100s, I’m providing the result of just a single run as “Avg”.
I’ve started solving today’s puzzle with printing my input trench in the terminal – just to ensure there are no nasty edge cases. Lucky for me – I saw a “boring” polygon with no surprises like intersecting or overlapping edges.
Staying true to my recent conclusion to avoid premature optimizations, I’ve reused the algorithm from day 10. Actually, I’ve rewritten it to look nicer, but the core remained almost the same. The important difference is the fact that edges count into the area.
Min | Avg | Max | |
JIT | 0.002791s | 0.003466s | 0.005258s |
No JIT | 0.003157s | 0.003568s | 0.004584s |
It’s SO OBVIOUS that AoC deities love throwing huge integers at us out of nowhere. I knew these hexadecimal “colors” in the input file look suspicious but I admit this caught me a bit off guard.
Human beings learn every day and after today I’m richer with knowledge about Shoelace Formula and Pick’s Theorem. It is possible I came across them in the past but well… it’s easy to forget about things we don’t use every now and then.
After all, it wasn’t rocket science – simply a matter of translating a well-known and proven formula into Erlang.
Min | Avg | Max | |
JIT | 0.000226s | 0.000272s | 0.000474s |
No JIT | 0.000353s | 0.000396s | 0.000511s |
When an Erlang developer sees the requirement to “send parts between workflows”, it’s very hard to resist the temptation to model each workflow as an independent process. I couldn’t.
I chose to register each workflow process under its name from the input file. Besides the defined ones, my code also spawns special ‘A’ and ‘R’ processes and they forward all parts to the parent including their name as a tag.
Min | Avg | Max | |
JIT | 0.003807s | 0.005190s | 0.010223s |
No JIT | 0.004412s | 0.007273s | 0.015742s |
I’ve kept the original architecture of running a process per workflow but their implementation had to significantly change.
Instead of a stream of parts from the input we send a single part to the “in” process and instead of specific ratings, it includes valid ranges for each parameter (so at first all ranges are [1, 4000]). Then, each workflow for each condition in its list checks if:
The last condition is the one that I got wrong at first and it led to wasted hour or two on the attempts to properly calculate the number of combinations. The part can “partially” satisfy the condition when the compared value is inside the range. In this case the part needs to be “split” into a variant that passes the check and one that fully fails it. I made the mistake of using the unchanged range for the latter one, while we need two new ranges. As a result, I got lots of overlapping ranges.
Collection of results is less elegant than in part 1, because there is only one input part and a completely unknown number of output ones. I solved it by keeping a shared counter in ETS. It keeps the number of parts still processed by workflows.
Min | Avg | Max | |
JIT | 0.003988s | 0.006356s | 0.012356s |
No JIT | 0.004298s | 0.008020s | 0.022086s |
It was so tempting to implement a parallel solution today as well! Unfortunately due to the cycles in the input and the explicit requirement to process the signals in order, I had to stick to the boring loop with a queue of signals.
Min | Avg | Max | |
JIT | 0.014104s | 0.014916s | 0.017333s |
No JIT | 0.023655s | 0.024460s | 0.028135s |
Once again my attempt to be too clever and cautious backfired.
Ultimately, all of these approaches failed. Then I saw a visualisation of the input made by one of the Redditors and it looked just like the graph I started sketching on a sheet of paper. I realised that the layout of the graph has a pattern and the whole problem can be decomposed into simulating 4 subgraphs.
Conveniently, each subgraph emitted a single low signal after a long series of high signals. Also conveniently, modules between `rx` and subgraphs are always connected in a way that emits low signal to `rx` only when all subgraphs emit low at the same time.
It means that everything narrows down to calculating Least Common Multiple of all 4 cycles. Generic solutions are absolutely not necessary!
Min | Avg | Max | |
JIT | 0.027580s | 0.030409s | 0.037126s |
No JIT | 0.054880s | 0.057583s | 0.063418s |
Breadth First Search again but with a twist: we can go backwards. Naive solution was efficient enough.
Min | Avg | Max | |
JIT | 0.055365s | 0.067503s | 0.099729s |
No JIT | 0.063782s | 0.069720s | 0.086406s |
This was probably the most exhausting task so far.
At first, I tried to figure out a generic solution. Well, I had one but it was absolutely not fast enough. I saw at least one claim in a Reddit megathread, that one exists but I didn’t investigate the provided code. I focused on my implementation.
It became quite clear that certain properties of the input data have to be exploited. These are:
For performance reasons, instead of keeping the list of complete states of each garden, I’m using the list of numbers of the flip-flops added in each simulation step. It’s way more compact and it’s relatively easy to calculate the final state after an arbitrary number of steps. You only need to be VERY careful about “off by one” mistakes.
Then, in order to get the result, we need a sum of “on” flip-flops in:
Min | Avg | Max | |
JIT | 0.132134s | 0.150313s | 0.209061s |
No JIT | 0.158194s | 0.176707s | 0.241650s |
The key to the correct solution is to understand what “snapshot” means. It’s the state of all bricks in the same point in time, not the list of subsequent falling bricks. It means that in order to perform the correct 3D Tetris simulation, we need to sort the bricks by their bottom Z coordinates.
For the space model I concluded that I don’t need to store a complete map of occupied locations, because to find where a brick stops and what bricks are going to support it, I only need to remember the highest Z for every {X, Y} pair and what brick is already there.
This worked pretty efficiently, because the sizes of the bricks were not that big. Otherwise, we’d probably need to play around with ranges again.
Also, for the purpose of this solution my code maintains the complete map from brick ID to the list of bricks that support it and those that are supported. What heavily payed off in part 2.
Min | Avg | Max | |
JIT | 0.002166s | 0.002955s | 0.009769s |
No JIT | 0.003778s | 0.004995s | 0.011249s |
Remember kids – always ensure you’re executing the correct file!
My first implementation had a bug that caused it to run over 2 minutes (although it produced a correct result!). I thought the slowness is caused by using lists a lot and I wanted to try to improve the performance by replacing them with sets and maybe to add some caching. So I copied the original to the new file. I’m sure you can guess which one I kept running by mistake and what level of frustration I reached?
By the way: The faulty line was the recursive call to the fall function. I forgot to deduplicate the list of fall candidates for the next iteration.
All in all, given the second parts of the recent puzzles, this one was really benign! The core of the solution is to check if the supporting bricks for the current candidates are already removed. Then the candidate is removed as well.
Min | Avg | Max | |
JIT | 0.065106s | 0.075434s | 0.100548s |
No JIT | 0.093616s | 0.109218s | 0.138036s |
Naive BFS with traversing one tile at a time worked reasonably fast. I did not perform any extra processing of the input in this part.
Min | Avg | Max | |
JIT | 0.223056s | 0.272931s | 0.359681s |
No JIT | 0.288227s | 0.410099s | 0.576530s |
I underestimated how the “slopes” reduced the complexity of the problem. The major optimisation I could think of was to transform the “map” into an undirected weighted graph. The vertices are the “crossroads” and, obviously, the weights are the distances between them (excluding starting tile, including the next crossroad one). It is possible to acquire both vertices and distances in a single pass but I didn’t want to introduce the extra complexity and increase the chance of making a mistake.
The solution was produced in a bit over 20s with sets being used to store visited vertices and with a somewhat better result with plain lists. There aren’t that many vertices so we’re still in the sweet spot where list operations (adding a head and checking membership) are faster than sets’ API.
I was wondering if there is any optimization I can introduce. I even drew the complete graph and I couldn’t see any reasonable pattern. Then I checked Reddit megathread and it seems that there is no other way than to perform a plain brute force DFS. In the generic case the longest path problem is NP-hard.
Note: The benchmark of today’s solution has a sample of 10 runs instead of 100.
Min | Avg | Max | |
JIT | 14.803970s | 15.28991s | 15.694286s |
No JIT | 20.148084s | 21.007705s | 22.846742s |
In order to solve this puzzle, we just need to play around a bit with algebraic expressions. Since we only need to take ‘x’ and ‘y’ coordinates into account, we need to find where each pair meets after adding a vector multiplied by a certain factor.
When there is no solution, then it means the lines are parallel. If either ‘a’ or ‘b’ (or both) are negative, it means the lines intersect in the “past”.
Min | Avg | Max | |
JIT | 0.003238s | 0.003538s | 0.005159s |
No JIT | 0.008181s | 0.008480s | 0.011210s |
This time the expressions are way more convoluted!
Min | Avg | Max | |
JIT | 0.000162s | 0.000181s | 0.000266s |
No JIT | 0.000469s | 0.000515s | 0.000615s |
It may be difficult to track what’s happening so let’s explain step by step:
Min | Avg | Max | |
JIT | 0.000162s | 0.000181s | 0.000266s |
No JIT | 0.000469s | 0.000515s | 0.000615s |
I’ve immediately discarded the idea of choosing random edges and testing the graph without them. In the case of my input file there were over 1500 edges! It means that in the worst case I would have to try over 560 million triplets!
At some point I tried to find all adjacent triangles, because this is what would work with the example data. Guess what – it failed miserably with the real file…
My next idea was to perform BFS from every vertex and count how many each edge was traversed in total. With a bit of luck – the three bridges would stand out more or less significantly. After all, they would have to be traversed more often than other edges to reach all vertices.
However, before I implemented this concept, I decided to google a bit about this problem and I learned about Karger’s algorithm. The problem is, it is probabilistic (I don’t think that AoC deities would like such a solution) and has potentially quite high complexity. I saw someone employ algebra and matrix operations to get the solutions but I decided that I left such advanced math long behind me in my university days and it would probably take too much time to relearn all the necessary information.
So, eventually, I tried the “edge heat” approach. To my delight – the bridges were quite nicely visible in the example input and were clearly outstanding in the real one! With a set of three edges it was easy to adjust the incidence map and perform BFS one last time to get the size of both subgraphs.
Min | Avg | Max | |
JIT | 3.564212s | 3.867032s | 5.548451s |
No JIT | 4.533271s | 4.794619s | 5.873216s |
It was quite a ride!
Some days were pure fun, some were a pure headache. If I were to choose the ones I enjoyed the most, these would be: ex aequo first place goes to day 6 (boat race) and day 24 (hailstorm) – for an amusing usage of algebra. On the second place we have day 12 (hot springs) – because it reminds us about the vast benefits of clever caching. The third puzzle I’d like to highlight is day 19 – parts workflows, because the parallel solution was very natural and looked cool in Erlang code.
Among the most dreadful ones, the ultimate winner is the day 21 (infinite gardens), as it relied heavily on the properties of the input and was very difficult to solve without painful and obscure off by one errors. All kinds of 2D plane operations were tiresome too, as these aren’t the kind of problems I deal with in my daily work so I definitely lack practice and knowledge about essential algorithms. Second place belongs to day 17 and its “Clumsy Crucible” – a very unorthodox usage of the classic Dijkstra algorithm was mindblowing in both good and bad ways. Last but not least on the podium is day 14 – dish calibration with rocks. Again, it was too easy to make “off by one” mistakes and it took me a long time to eliminate them.
Also I need to mention that almost every time JIT brough more or less significant improvement. Although most of the software we write in Erlang has a rather different focus than AoC puzzles, it is quite obvious that there is a very high chance your applications will benefit from switching to the most recent OTP releases!
Will we see each other again in the journey through the Advent of Code? Who knows… a lot depends on you and your feedback! Don’t hesitate to share on LinkedIn and other social media and tell us what you think!
Join Lorena in this years Advent of Code 2024. She'll be solving daily puzzles throughout the month of December.
Attila Sragli explores the BEAM VM's inner workings, comparing them to the JVM to highlight their importance.
Pawel Chrząszcz introduces MongooseIM 6.3.0 with Prometheus monitoring and CockroachDB support for greater scalability and flexibility.