Extending remark with custom plugins
A deep dive into Markdown-to-HTML conversion with remark

As developers, we often find ourselves in situations where we need to extend or customize the functionality of existing tools to fit our specific needs or use case.

If you're like me, and recently decided to add Markdown support to your (new) blog (again), then maybe you're in one of those situations right now too.

The Need for Customization

My blog is built with Next.js, and I wanted to be able to write posts directly in my repository, in Markdown. As with most times I've worked with Markdown in the past, I reached for remark as my Markdown processor. And as far as processing Markdown into HTML, it does the job well.

But I had two specific requirements, neither of which remark could handle out of the box:

  • Add a unique ID attribute to every heading
  • Wrap each table in a div with a specific class

Adding unique IDs to headings is important for creating anchor links. This makes it easier for readers to navigate the post and lets anyone link directly to a specific section, which may also help with SEO because important elements are more clearly identified. This has always been a subtle nicety that I've appreciated when reading (and particularly sharing) any sort of content on the web.

The table wrapper is necessary for styling. Okay, well it's not necessary, but it makes life a lot easier. In most cases you'll want your table to match the width of the content. And if there's a lot of content, you'll want the table to scroll horizontally. This is especially important for mobile devices, where the screen width is limited. Without that wrapper, making that happen is cumbersome. The wrapper makes it easy.

Building Custom Remark Plugins

So what's involved with building custom plugins for remark? It's not as complicated as it might sound. Think of the pipeline that the Markdown travels on during its journey from Markdown to HTML as a straight line from point A to point B. Plugins are like points along the line where you can add, remove, or modify the content.

As the Markdown travels from point A to point B, remark first converts it into what's called an Abstract Syntax Tree (referred to as the AST from here on). This is a tree structure where each node represents a part of the document, and eventually a part of the HTML.
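To make that concrete, here is a hand-written approximation of the tree remark builds for a two-line document. Real trees also carry position data on every node, which is left out here for brevity, and the name exampleTree is only for illustration.

```javascript
// Markdown source:
//
//   # Hello
//
//   Some *text*.
//
// Hand-written approximation of the resulting tree (position info omitted).
const exampleTree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [{ type: 'text', value: 'Hello' }],
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Some ' },
        { type: 'emphasis', children: [{ type: 'text', value: 'text' }] },
        { type: 'text', value: '.' },
      ],
    },
  ],
};

// Every node has a `type`. Parents have `children`; leaves such as `text` have a `value`.
console.log(exampleTree.children.map(node => node.type)); // [ 'heading', 'paragraph' ]
```

Notice that the emphasized word lives one level deeper than the plain text around it. That nesting is why plugins have to walk the tree rather than loop over a flat list.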

With that in mind, our plugin will need to do three things:

  1. Traverse the AST, and find the nodes we're looking for
  2. Modify the AST, adding, removing, or modifying nodes according to our needs
  3. Return the modified AST, passing it back into the pipeline for the next plugin
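Those three steps can be sketched as a tiny plugin. This one is purely hypothetical (remarkShoutText is not part of my blog or of remark); it upper-cases every text node, and the demo calls the transformer directly on a hand-built tree so it runs without remark installed.

```javascript
// Hypothetical plugin for illustration: upper-cases every text node.
function remarkShoutText() {
  // remark calls the returned function (the transformer) with the parsed tree.
  return tree => {
    const visit = node => {
      // 1. Traverse: look for the nodes we care about
      if (node.type === 'text') {
        // 2. Modify: change the node in place
        node.value = node.value.toUpperCase();
      }
      (node.children || []).forEach(visit);
    };
    visit(tree);
    // 3. Return: hand the tree on to the next plugin
    return tree;
  };
}

// Exercise the transformer directly on a hand-built tree.
const demoTree = {
  type: 'root',
  children: [{ type: 'paragraph', children: [{ type: 'text', value: 'hello' }] }],
};
remarkShoutText()(demoTree);
console.log(demoTree.children[0].children[0].value); // HELLO
```

Mutating the tree in place is enough on its own: when a transformer returns nothing, the pipeline keeps working with the same (now modified) tree. That is why the plugins later in this post have no return statement.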

With this context, let's take a closer look at the plugins I built for my blog.

Plugin 1: Adding IDs to Headings

The first plugin I created was for adding unique IDs to each heading in the Markdown content. As we know, this can be a useful feature for both users and SEO.

First, I needed to traverse the AST and find all the heading nodes. Although the recursive function might look a bit intimidating, it was a simple matter of checking the type property of each node. If the type is a heading, then the text content is extracted and converted into a slug. The slug can then be assigned to the id property of that same node.

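export function remarkAddIdsToHeadings() {
  return tree => {
    const addIdToHeading = node => {
      if (node.type === 'heading') {
        // Join the values of the heading's direct children. Children without a
        // `value` (e.g. emphasis or links, which nest their text) contribute nothing.
        const textContent = node.children.map(n => n.value).join('');
        // Lowercase, turn whitespace into hyphens, then drop everything else
        // that isn't a word character or a hyphen.
        const slug = textContent
          .toLowerCase()
          .replace(/\s+/g, '-')
          .replace(/[^\w\-]+/g, '');
        // hProperties become HTML attributes when the node is converted to HTML.
        node.data = node.data || {};
        node.data.hProperties = node.data.hProperties || {};
        node.data.hProperties.id = slug;
      }
    };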

    const traverseTree = nodes => {
      nodes.forEach(node => {
        addIdToHeading(node);
        if (node.children) {
          traverseTree(node.children);
        }
      });
    };

    traverseTree(tree.children);
  };
}

But let's dig a bit further into what's happening in this plugin, especially in the recursive traverseTree function.

  1. Near the last line, the traverseTree function is called with an argument of tree.children. Here tree is the root node of the AST, and tree.children is an array of all the top-level nodes in the document.
  2. The traverseTree function then iterates over each node in the array, using a forEach loop.
  3. In the loop, the function checks if the current iteration's node is a heading. If it is, it extracts the text content and generates a slug from it. If it's not, it does nothing.
  4. If the current node has children of its own, then the traverseTree function is called again, this time with the current node's children as the argument. This is where the recursion happens. The function will continue to call itself until it reaches a node that has no children, and has moved all the way down the tree, thus processing all the data in the AST.
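One caveat with the version above: two headings with the same text get the same ID, and text nested inside emphasis or links is skipped. Below is a sketch of one way to handle both. It is a variant, not what runs on my blog, and collectText, slugify, and remarkAddUniqueIdsToHeadings are names made up for this example.

```javascript
// Recursively gather text from a node and all of its descendants.
function collectText(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children || []).map(collectText).join('');
}

function slugify(text) {
  return text
    .toLowerCase()
    .trim()
    .replace(/\s+/g, '-')
    .replace(/[^\w-]+/g, '');
}

// Hypothetical variant of the heading plugin.
function remarkAddUniqueIdsToHeadings() {
  return tree => {
    const seen = new Map(); // slug -> number of times it has been used so far

    const visit = node => {
      if (node.type === 'heading') {
        const base = slugify(collectText(node));
        const count = seen.get(base) || 0;
        seen.set(base, count + 1);
        node.data = node.data || {};
        node.data.hProperties = node.data.hProperties || {};
        // First use keeps the plain slug; repeats get a numeric suffix.
        node.data.hProperties.id = count === 0 ? base : `${base}-${count}`;
      }
      (node.children || []).forEach(visit);
    };

    visit(tree);
  };
}

// Two headings with the same text, one of them using emphasis.
const headingTree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 2,
      children: [
        { type: 'text', value: 'Getting ' },
        { type: 'emphasis', children: [{ type: 'text', value: 'Started' }] },
      ],
    },
    { type: 'heading', depth: 2, children: [{ type: 'text', value: 'Getting Started' }] },
  ],
};

remarkAddUniqueIdsToHeadings()(headingTree);
console.log(headingTree.children.map(n => n.data.hProperties.id));
// [ 'getting-started', 'getting-started-1' ]
```

The suffix scheme is deliberately simple. It could still collide with a heading whose own text happens to slugify to getting-started-1, so treat it as a starting point.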

Plugin 2: Wrapping Tables in a Div

The second plugin was to wrap tables in a div with a specific class for styling purposes. This was a bit more challenging as it involved manipulating the tree structure more significantly.

export function remarkWrapTables() {
  return tree => {
    const processNode = (node, index, parent) => {
      if (node.type === 'table') {
        const wrapper = {
          type: 'div',
          data: {
            hName: 'div',
            hProperties: { className: 'overflow-x-auto' },
          },
          children: [node],
        };
        parent.children.splice(index, 1, wrapper);
      }
    };

    const traverseTree = (nodes, parent) => {
      nodes.forEach((node, index) => {
        if (node.children) {
          traverseTree(node.children, node);
        }
        processNode(node, index, parent);
      });
    };

    traverseTree(tree.children, tree);
  };
}

Let's break down what's happening in the processNode portion of this plugin.

  1. The function is called with three arguments: the current node (which we get from the recursive traverseTree function), the index of the current node in the parent node's children array, and the parent node itself.
  2. The function checks if the current node is a table. If it is, it creates a new node, which will be the wrapper div element. This node has a type of div, a data.hName of div (which tells the HTML conversion which element to render), and a children array with the current node as the only item.
  3. The function then replaces the current node in the parent node's children array with the new wrapper node.
  4. Since the wrapper node's children is actually the current node that the wrapper is 'replacing', the output is actually a wrapping of the current node.

Say that five times fast.
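If the splice is easier to follow in code than in words, here is the same replacement run against a tiny hand-built tree. The wrapInDiv helper is hypothetical; it is just the body of processNode pulled out so it can be called directly.

```javascript
// Same wrapping logic as processNode, pulled into a hypothetical helper.
function wrapInDiv(parent, index, className) {
  const node = parent.children[index];
  const wrapper = {
    type: 'div',
    data: { hName: 'div', hProperties: { className } },
    children: [node],
  };
  // Remove one item at `index` and put the wrapper in its place.
  parent.children.splice(index, 1, wrapper);
}

const root = {
  type: 'root',
  children: [
    { type: 'paragraph', children: [] },
    { type: 'table', children: [] },
  ],
};

wrapInDiv(root, 1, 'overflow-x-auto');

console.log(root.children.map(n => n.type));             // [ 'paragraph', 'div' ]
console.log(root.children[1].children.map(n => n.type)); // [ 'table' ]
```

Because splice swaps one item for one item, the array length and the indexes of the sibling nodes stay the same, so it is safe to do this while forEach is still iterating.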

Implementing the Plugins

Okay. We built some cool shit, learned about Abstract Syntax Trees (ASTs), and are about to make our rendered Markdown content look oh-so-fresh. It's time to plug in the plugins.

Below is a simplified example of my Markdown-to-HTML pipeline, with some extra comments to better explain what's actually happening with each step.

import fs from 'fs';
import path from 'path';
import { remark } from 'remark';
import html from 'remark-html';
import gfm from 'remark-gfm';
import matter from 'gray-matter';
import { remarkWrapTables } from './customRemarkPlugins/remarkWrapTables';
import { remarkAddIdsToHeadings } from './customRemarkPlugins/remarkAddIdsToHeadings';

// Read the Markdown file's raw contents from the file system
const fullPath = path.join(postsDirectory, `${fileName}.md`);
const fileContents = fs.readFileSync(fullPath, 'utf8');

// Extract frontmatter (metadata) and Markdown content using gray-matter
const matterResult = matter(fileContents);

// Process the Markdown content with remark
const processedContent = await remark()
  // remark() creates a processor; the Markdown is parsed into an AST (Abstract Syntax Tree) once .process() runs
  .use(html) // Register the compiler that turns the final AST into an HTML string
  // Note: depending on your remark-html version, its default sanitizing may rewrite or drop
  // custom ids and classes; check the `sanitize` option if they go missing
  .use(gfm) // Add GitHub-Flavored Markdown support (tables, strikethrough, task lists, etc.)
  .use(remarkAddIdsToHeadings) // Custom plugin to manipulate the AST by adding IDs to headings
  .use(remarkWrapTables) // Custom plugin to manipulate the AST by wrapping tables
  .process(matterResult.content); // Parse the Markdown, run the plugins in order, and compile to HTML
const contentHtml = processedContent.toString(); // Final HTML content, ready for use

Keep in mind, if you try to replicate this example in Next.js with the Pages Router, you'll need to use getStaticPaths and getStaticProps to generate the HTML content at build time.
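For the Pages Router, that wiring might look roughly like the sketch below. Everything here besides the getStaticPaths and getStaticProps return shapes is an assumption: fakePosts, getAllSlugs, and getPostHtml are stand-ins for the file-system and remark code above, and the pages/posts/[slug].js path is only an example.

```javascript
// Hypothetical stand-ins for the file-system and remark code shown earlier.
const fakePosts = { 'hello-world': '<h1 id="hello-world">Hello World</h1>' };
async function getAllSlugs() {
  return Object.keys(fakePosts);
}
async function getPostHtml(slug) {
  return fakePosts[slug];
}

// In a real page file (for example pages/posts/[slug].js) both functions
// below would be exported so Next.js can call them at build time.
async function getStaticPaths() {
  const slugs = await getAllSlugs();
  return {
    paths: slugs.map(slug => ({ params: { slug } })),
    fallback: false, // slugs not listed here produce a 404
  };
}

async function getStaticProps({ params }) {
  const contentHtml = await getPostHtml(params.slug);
  // Whatever is returned under `props` is passed to the page component.
  return { props: { contentHtml } };
}
```

The page component then receives contentHtml as a prop and can render it however it likes.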

Conclusion

These custom plugins not only solved a problem for me, but also gave me a much-welcomed chance to really dive into how remark and its plugins work behind the scenes. Although remark was what I reached for first, this flexibility ensures it's what I'll reach for next time too.