Migrating Theodo's tech blog from Gatsby to Astro


A little context about the tech blog and the migration

I was working at Theodo, a high-end service company. Across our varied client projects, we often had to solve interesting business challenges with interesting technical solutions. To share those learnings with the world, we have a technical blog at https://blog.theodo.com/.

Over the years, the blog was migrated from WordPress to Gatsby for blazing-fast performance and nice features. “Blazing” here meant a middling Lighthouse score of about 60 for a mostly static site, so not very blazing. Worse, writing a new article meant sitting through a slightly annoying 5-minute boot time before any change could be previewed. This was about 2 years after we had started using Vite in production for most client SPAs, and by comparison the slowness felt terrible.

A nice workaround was to load only the most recent month of articles in development, which reduced the work done by Gatsby’s internal query resolution and brought the boot time down to something reasonable.
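As a rough illustration of that workaround, here is a minimal sketch in plain TypeScript. It is not the blog's actual code: the `PostMeta` shape, the function names, and the 30-day window are assumptions, and how the filter was wired into Gatsby is not shown in this article.

```typescript
// Hypothetical post shape; the real blog's fields are not shown in this article.
interface PostMeta {
  slug: string;
  date: Date;
}

// Keep only the posts published within `days` days before `now`.
function recentPosts(posts: PostMeta[], now: Date, days = 30): PostMeta[] {
  const cutoff = now.getTime() - days * 24 * 60 * 60 * 1000;
  return posts.filter((post) => post.date.getTime() >= cutoff);
}

// In development, load a small slice of the content; in production, load everything.
function postsToLoad(posts: PostMeta[], isDev: boolean, now: Date): PostMeta[] {
  return isDev ? recentPosts(posts, now) : posts;
}
```

The trade-off is that older posts cannot be previewed locally unless the window is widened.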

More reasons to quit Gatsby

After writing a few articles for the blog, I got curious. Could I improve the developer experience while keeping the user experience at least at the same level? I wanted to dive into our Gatsby blog internals to find out whether I could improve query resolution and dev server boot time. Sadly, after learning about Gatsby and its abstraction of using a GraphQL layer to aggregate all data from any data source, I was at a loss. GraphQL can be great in some cases, like using Relay to send only one request from a client to load a full page worth of data. GraphQL as an abstraction layer to query data over markdown files is not as great. Looking at those queries felt like watching an ant being squashed by a steamroller.

The long build time also hurt our continuous deployment pipeline: the site was deployed on Cloudflare Pages, and builds often ran long enough to hit timeouts.

Gatsby itself seemed less and less maintained, with very little activity on the main repository.

Lastly, for a blog consisting mainly of text and images, we were shipping the full power of React.js. Quite overkill to toggle a search bar.

Reasons to consider Astro

Astro, on the other hand, had excellent press, starting with pristine documentation that includes a guide for migrating from Gatsby. From a few experiments on other projects, I knew it to be a solid framework with a great developer experience. Importantly, there was almost no glass ceiling in terms of framework features: we would get to keep React.js and JSX.

It offers a far simpler model for markdown files: blog posts are entries in a content collection, with built-in validation of frontmatter, excellent typing, etc.
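To make "validation of frontmatter" concrete, here is a minimal sketch of the idea in plain TypeScript. It is deliberately not Astro's API (Astro declares a schema in its content configuration instead), and the field names and the `validateFrontmatter` helper are assumptions for illustration only.

```typescript
// Hypothetical frontmatter fields; the blog's real schema is not shown in this article.
interface PostFrontmatter {
  title: string;
  author: string;
  date: Date;
  tags: string[];
}

type ValidationResult =
  | { ok: true; data: PostFrontmatter }
  | { ok: false; errors: string[] };

// Check raw frontmatter (as parsed from a markdown file) and collect every problem found.
function validateFrontmatter(raw: Record<string, unknown>): ValidationResult {
  const errors: string[] = [];
  const title = raw["title"];
  const author = raw["author"];
  const tags = raw["tags"] ?? []; // tags are optional and default to an empty list
  const date = new Date(String(raw["date"]));

  if (typeof title !== "string" || title.trim() === "") errors.push("title is required");
  if (typeof author !== "string" || author.trim() === "") errors.push("author is required");
  if (isNaN(date.getTime())) errors.push("date is not a valid date");
  if (!Array.isArray(tags) || !tags.every((tag) => typeof tag === "string")) {
    errors.push("tags must be a list of strings");
  }

  if (errors.length > 0) return { ok: false, errors };
  // The casts are safe here because every field was checked above.
  return { ok: true, data: { title: title as string, author: author as string, date, tags: tags as string[] } };
}
```

Failing loudly at build time, with the list of offending fields, is what turns a silently broken page into an error the author sees before publishing.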

It is built on top of Vite, which means a faster boot time in development (spoiler: about 20s for 200 entries with no optimizations) and fast hot module replacement.

It provides a golden path for performance with the islands architecture, meaning no client-side JavaScript for static content.

The migration from Gatsby to Astro

Migrating from styled-components to Tailwind

First of all, as the codebase was quite old and I didn’t want to bring in more tech than required, I started by migrating my few React components on Gatsby from styled-components (a great CSS-in-JS solution) to Tailwind CSS. The first reason was that I wanted to see if I could measure the impact of moving from CSS-in-JS to plain CSS. The second was to allow Astro to run without client-side JS: I either needed to set up styled-components in Astro or migrate to Tailwind. Tailwind is well documented and widely used on most projects now, and I was curious about the performance impact.

The migration was actually easier than anticipated. It was mostly done within a few days, helped by the fact that I could allow myself breaking changes on parts of the design (like using Tailwind prose for blog posts).

One last game of “spot the 7 differences” later, the migration was live.

For curiosity’s sake, I measured the performance of some pages. The verdict was both surprising and anticlimactic: for such simple pages, there was absolutely no difference performance-wise. User experience-wise, I used the opportunity to fix a few flashes of incorrectly styled content (think responsive layout handled in JS), meaning a slight improvement unrelated to Tailwind.

With my biggest known dependency out of the way, it was time to write some Astro!

Creating an Astro shell

After a quick setup with Astro’s project scaffolding command (npm create astro@latest) and adding the React and Tailwind dependencies, it was time to recreate the shell pages. I copied 2 posts and got to work on the main homepage, which is just a basic post listing page. Let’s go!

Alright, I copied over my React component pages, commenting out the Gatsby GraphQL page queries to convert them into Astro content collection syntax… And it’s already screaming at me “this blog post has no author”! Quick check: did I forget to copy an author? No, it’s indeed missing and not accounted for in Gatsby. Thanks to TypeScript and Astro’s strict typing, 5 minutes in, I already know of an unaccounted-for edge case.

After working out how to fetch my posts and display them on my homepage, some more grunt work was needed to match the current blog features: I had to create post excerpts.

Alright, to do so, I simply needed to render the post and take the first K words of the content. Some sanitizing later and it’s done.
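A minimal sketch of that excerpt step, assuming the post has already been rendered to an HTML string. The function names are hypothetical, and the tag stripping is deliberately naive: it is only reasonable for trusted content processed at build time, and it does not decode HTML entities.

```typescript
// Naive sanitizing: drop script/style blocks, strip remaining tags, collapse whitespace.
function htmlToText(html: string): string {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Keep the first `maxWords` words and mark the cut with an ellipsis.
function excerpt(html: string, maxWords: number): string {
  const words = htmlToText(html).split(" ").filter((word) => word.length > 0);
  if (words.length <= maxWords) return words.join(" ");
  return words.slice(0, maxWords).join(" ") + "…";
}
```

For example, `excerpt("<p>Hello <strong>brave</strong> new world</p>", 3)` returns `"Hello brave new…"`.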

I have a few more pages to write: posts by author, by category.

And those are quite simple to write as well: it’s just a JS .filter() on a list after all. Astro even documents this use case and, better yet, shows how to create filters from an arbitrary list of tags and paginate the results.
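The filter-then-paginate idea can be sketched in a few lines of plain TypeScript. This is an illustration only, not Astro's own pagination helper; the `BlogPost` and `PostPage` shapes and the `pagesFor` function are assumptions.

```typescript
// Hypothetical shapes for illustration.
interface BlogPost {
  slug: string;
  author: string;
  category: string;
}

interface PostPage<T> {
  items: T[];
  currentPage: number; // 1-based
  lastPage: number;
}

// Split a list into fixed-size pages; an empty list still yields one empty page.
function paginate<T>(items: T[], pageSize: number): PostPage<T>[] {
  const lastPage = Math.max(1, Math.ceil(items.length / pageSize));
  const pages: PostPage<T>[] = [];
  for (let page = 1; page <= lastPage; page++) {
    const start = (page - 1) * pageSize;
    pages.push({ items: items.slice(start, start + pageSize), currentPage: page, lastPage });
  }
  return pages;
}

// "Posts by author" or "posts by category" is a filter followed by pagination.
function pagesFor(
  posts: BlogPost[],
  keep: (post: BlogPost) => boolean,
  pageSize: number,
): PostPage<BlogPost>[] {
  return paginate(posts.filter(keep), pageSize);
}
```

A call such as `pagesFor(posts, (post) => post.author === "alice", 10)` yields the pages for one author, and the same function covers categories by swapping the predicate.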

Okay, I just need to create an RSS feed. There is an Astro integration for that, so it’s quite trivial.

Migrating data

Okay, so I have a nice blog with 2 posts. I had to adapt those posts a bit for them to render correctly: image paths had to be relative rather than a bare image slug (think ./my_image.png vs my_image.png), which isn’t shocking.

And that’s where shit hit the fan. Astro (in TypeScript strict mode) is so much stricter than Gatsby that migrating those files is an incredible pain. There are over 200 blog posts, and while image paths can be fixed automatically (basic glob + regex stuff), some changes need to be dealt with by hand.

A ton of posts simply reference missing images. A quick check on the live blog shows how Gatsby deals with it: it just renders a broken image, which is not the best default. Having a “missing image” error block the build is a pain when you are migrating all those posts, but it is incredibly useful for catching typos when contributing.
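During a migration, it can help to list every missing image up front instead of discovering them one failed build at a time. The sketch below is one hypothetical way to do that in plain TypeScript: it assumes standard markdown image syntax and takes the list of files that exist next to the post as input, so reading the file system is left to the caller.

```typescript
// Return the image targets referenced in `markdown` that are not in `existingFiles`.
// `existingFiles` holds file names relative to the post's folder, without a "./" prefix.
function findMissingImages(markdown: string, existingFiles: readonly string[]): string[] {
  const missing: string[] = [];
  // Matches ![alt](target) and ![alt](target "title"), capturing the target.
  const imagePattern = /!\[[^\]]*\]\(([^)\s]+)[^)]*\)/g;
  let match: RegExpExecArray | null;
  while ((match = imagePattern.exec(markdown)) !== null) {
    const target = match[1] ?? "";
    // Remote images (http:, https:, data:, ...) cannot be checked locally, so skip them.
    if (/^[a-z][a-z0-9+.-]*:/i.test(target)) continue;
    const normalized = target.replace(/^\.\//, "");
    if (existingFiles.indexOf(normalized) === -1) missing.push(target);
  }
  return missing;
}
```

Running this over all posts gives a to-do list of broken references before the strict build is even attempted.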

Some posts opt out of markdown almost completely and use a ton of HTML markup to create custom layouts. With Astro, those could simply have used MDX, but I’m faced with a migration issue: I need to keep those posts looking similar, and the rework is tedious because MDX is quite a strict flavor of markdown. I highly recommend the VS Code MDX extension, which helps quite a bit in detecting broken syntax.

My recommendation is to make those changes with a script. With 200+ posts to change, doing this by hand is tedious and error-prone, and any post published in the meantime needs to be imported as well. A script takes a bit longer to set up, but having every modification in one place lets you check your data migration and replay it easily on the latest posts when releasing the migration.
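As an example of one step in such a script, here is a minimal sketch of the image path fix (my_image.png to ./my_image.png). It assumes standard markdown image syntax, the function name is hypothetical, and the glob that feeds it the post files is omitted.

```typescript
// Prefix bare image targets with "./" so they become relative paths.
// Targets that are already relative, absolute, anchors, or full URLs are left untouched.
function relativizeImagePaths(markdown: string): string {
  return markdown.replace(
    /!\[([^\]]*)\]\(([^)\s]+)([^)]*)\)/g,
    (match: string, alt: string, target: string, rest: string) => {
      const alreadyQualified = /^(\.{1,2}\/|\/|#|[a-z][a-z0-9+.-]*:)/i.test(target);
      return alreadyQualified ? match : `![${alt}](./${target}${rest})`;
    },
  );
}
```

Because already-relative paths are skipped, the transformation is idempotent: running it again on migrated posts changes nothing, which is what makes replaying the script on release day safe.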

Post-migration nice-to-haves

After migrating the data, I still needed to address a few features that I hadn’t implemented yet:

  • Search: Reusing our old search was neither possible nor desirable, and I found a great static search library in Pagefind. There is even a simple Astro integration to avoid boilerplate code. The search is faster than before, accessible through the / shortcut, and shows a nice excerpt as well as a picture for each result. Nice win!

  • By default, there is no React.js on the client (see the results for the impact), which is clearly a better golden path for static sites. I even chose to keep JSX only in Astro components and to opt in to Alpine.js, a very light client-side library, for small interactions like the search and the header.

  • I also needed to keep the infinite scroll on the homepage, author, and category pages, which lets readers scroll through all 200 posts, served as pages of about 10 posts each. To do so, I used partial HTML routes that Astro renders at build time (like any other page) and HTMX to fetch them on scroll. You can try it on the blog homepage: as you scroll, requests go out to load the next page and inject it into the current document.
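To give an idea of what such a partial can look like, here is a minimal sketch that renders one page of posts as an HTML fragment. It follows the infinite scroll pattern from the HTMX documentation: the last item of a page carries hx-get, hx-trigger="revealed" and hx-swap="afterend", so the next fragment is fetched when that item scrolls into view. The /partials/posts/N URL scheme, the types, and the function names are assumptions, not the blog's actual routes.

```typescript
// Hypothetical summary of a post as shown in a listing.
interface PostSummary {
  title: string;
  url: string;
}

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one page of posts. Only the last item of a non-final page triggers the next load.
function renderPostsPartial(posts: PostSummary[], page: number, lastPage: number): string {
  return posts
    .map((post, index) => {
      const isLastItem = index === posts.length - 1;
      const loadNext =
        isLastItem && page < lastPage
          ? ` hx-get="/partials/posts/${page + 1}" hx-trigger="revealed" hx-swap="afterend"`
          : "";
      return `<article${loadNext}><a href="${escapeHtml(post.url)}">${escapeHtml(post.title)}</a></article>`;
    })
    .join("\n");
}
```

Since every fragment is a static file produced at build time, the infinite scroll needs no API server: the browser just requests the next prebuilt fragment and HTMX inserts it after the triggering element.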

Result of migrating from Gatsby to Astro

My goals with this migration were to:

  • ✅ Simplify feature contributions to the blog: Using basic JS/JSX helps drive down complexity, and TypeScript helps catch edge cases.

  • ✅ At least keep the user experience on par: The blog’s Lighthouse performance score increased from 69 to a comfortable 99 (migration to Astro + using Partytown for analytics), a few features were added, and some ugly content flashes were fixed.

  • ✅ Improve developer experience when contributing articles: All posts are rendered in development, live page updates are a lot faster on content change, and TypeScript helps catch typos that would lead to missing images.

What I’m less proud of and might have been too much of an experiment:

  • ❓ Using Alpine.js/HTMX: Both are quite simple tools, but HTMX especially requires thinking a bit differently about how to update the frontend UI. This might make contributing a bit harder. However, it needs to be weighed against the amount of logic that a React.js hook + component, plus the corresponding REST API routes, would have required to build the same infinite scroll.