How we built a cross-platform ImageNode for Lexical — design decisions and trade-offs

Storyie Engineering Team

Lexical has no built-in image node. Here is how we designed one that works identically on Next.js and Expo — shared serialization format, platform-specific renderers, upload lifecycle management, and DOM-free Markdown export.

Storyie uses Lexical for rich-text editing on both web (Next.js) and mobile (Expo). When we moved from text-only diary entries to photo-supported ones, we had to design a custom ImageNode from scratch. Lexical does not ship one.

This post covers the three decisions that shaped the design: why we split the node class by platform, why the serialization format is the one thing we never split, and how we modeled the upload lifecycle inside the editor state.

TL;DR

  • Lexical has no built-in image node. You build the whole thing.
  • Web uses DecoratorNode (returns React JSX from decorate()). Mobile uses ElementNode (DOM manipulation via createDOM() / updateDOM()). The split is forced by the WebView runtime on mobile.
  • The JSON serialization shape is defined once in lexical-common and imported by both platforms. Neither side can drift without a type error.
  • Upload state lives on the node: uploadStatus (uploading / completed / failed) and uploadProgress. A tempId field lets the upload callback find the right node to update.
  • Markdown export skips the headless Lexical editor entirely and parses JSON directly — no platform imports, no DOM globals required.

Layer         Package                                  What it contains
------------  ---------------------------------------  --------------------------------------------------------
Shared types  packages/lexical-common                  SerializedImageNode, commands, Markdown conversion
Web node      packages/lexical-editor                  ImageNode extends DecoratorNode, ImageRenderer component
Mobile node   apps/expo/…/dom/nodes/ImageNode.dom.ts   ImageNode extends ElementNode, DOM-based rendering

Why Lexical does not have an image node

Lexical's built-in node set is intentionally headless-safe: text, paragraph, heading, list, link, code block. These nodes can render to a DOM or serialize to JSON without pulling in any rendering framework.

An image node cannot be headless in the same sense — you have to decide how to render it, and the right answer is different on every platform. Lexical leaves the decision to you, which is the correct call.

There are two base classes to choose from:

  • DecoratorNode — lets decorate() return a React element (or any framework component) that Lexical inserts into its rendering tree.
  • ElementNode — renders by constructing DOM nodes directly in createDOM() and updating them in updateDOM().

Which one fits depends on what your runtime gives you.

Web: DecoratorNode

On the web, the Lexical editor runs inside a React tree, so DecoratorNode is the natural fit. decorate() returns a React element; Lexical mounts it into the editor. We get the full React model — hooks, state, context — for the image renderer.

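import * as React from "react";
import { DecoratorNode } from "lexical";
import { ImageRenderer } from "../components/ImageRenderer.web";

class ImageNode extends DecoratorNode<React.ReactElement> {
  // Fields (__src, __uploadStatus, __uploadProgress), getType(), clone(),
  // createDOM() and the JSON methods are omitted here for brevity.
  decorate(): React.ReactElement {
    // Lexical mounts the returned element inside the editor.
    return React.createElement(ImageRenderer, {
      src: this.__src,
      uploadStatus: this.__uploadStatus,
      uploadProgress: this.__uploadProgress,
      nodeKey: this.__key,
    });
  }
}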

The ImageRenderer component handles everything visual: a skeleton placeholder while the image loads (using aspect-ratio so the layout does not jump), reduced opacity while uploading, and a red error badge on failure.

<img
  style={{
    aspectRatio: width && height ? `${width / height}` : "auto",
    opacity: uploadStatus === "uploading" ? 0.6 : 1,
    maxHeight: "600px",
    objectFit: "contain",
  }}
/>

The maxHeight: 600px cap is a diary-specific UX call — we do not want a portrait photo from a phone filling the entire viewport. objectFit: "contain" preserves the aspect ratio within that cap.

Mobile: ElementNode

Mobile is where it gets more constrained. Storyie's Expo editor runs Lexical inside a "use dom" WebView. The WebView provides a DOM, but it does not run the same React reconciler that drives DecoratorNode.decorate() in the main Lexical tree.

The solution is ElementNode. Instead of returning React JSX, we build the image container imperatively:

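import { ElementNode, type EditorConfig } from "lexical";

class ImageNode extends ElementNode {
  // Fields, getType(), clone(), updateDOM() and createProgressBar() omitted for brevity.
  createDOM(_config: EditorConfig): HTMLElement {
    const container = document.createElement("div");
    const img = document.createElement("img");
    img.src = this.__src;
    // apply styles
    container.appendChild(img);

    // The progress bar only exists while the upload is in flight.
    if (this.__uploadStatus === "uploading") {
      container.appendChild(this.createProgressBar());
    }
    return container;
  }
}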

Progress bars, error overlays, and retry buttons are all built by hand as DOM nodes. It is more verbose than the React path and updateDOM() requires manual diffing to avoid unnecessary DOM mutations — but it runs reliably in any WebView environment.
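
One way to keep the manual diffing in updateDOM() manageable is to compute the list of required changes as a pure function first and apply them to the DOM in a separate step. The sketch below assumes that approach; ImageViewState, DomPatch, and diffImageState are hypothetical names for illustration, not part of Lexical or of our codebase as described above.

```typescript
type ImageUploadStatus = "uploading" | "completed" | "failed";

// Hypothetical snapshot of the node fields that affect rendering.
interface ImageViewState {
  src: string;
  uploadStatus?: ImageUploadStatus;
  uploadProgress?: number;
}

// Hypothetical list of DOM operations the mobile node would apply.
type DomPatch =
  | { op: "setSrc"; src: string }
  | { op: "showProgressBar" }
  | { op: "hideProgressBar" }
  | { op: "setProgress"; value: number }
  | { op: "showError" }
  | { op: "hideError" };

// Compare the previous and next state and emit only the changes that are needed.
function diffImageState(prev: ImageViewState, next: ImageViewState): DomPatch[] {
  const patches: DomPatch[] = [];
  if (prev.src !== next.src) patches.push({ op: "setSrc", src: next.src });

  const wasUploading = prev.uploadStatus === "uploading";
  const isUploading = next.uploadStatus === "uploading";
  if (!wasUploading && isUploading) patches.push({ op: "showProgressBar" });
  if (wasUploading && !isUploading) patches.push({ op: "hideProgressBar" });
  if (isUploading && (!wasUploading || prev.uploadProgress !== next.uploadProgress)) {
    patches.push({ op: "setProgress", value: next.uploadProgress ?? 0 });
  }

  const wasFailed = prev.uploadStatus === "failed";
  const isFailed = next.uploadStatus === "failed";
  if (!wasFailed && isFailed) patches.push({ op: "showError" });
  if (wasFailed && !isFailed) patches.push({ op: "hideError" });
  return patches;
}
```

Inside updateDOM(prevNode, dom), the node would apply each patch to the existing container and return false, which tells Lexical to keep the element rather than recreate it.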

Shared serialization

The node classes differ. The JSON format does not.

// packages/lexical-common/src/types.ts
interface SerializedImageNode {
  type: "image";
  version: 1;
  src: string;           // local URI while uploading, CDN URL after
  alt?: string;
  width?: number;
  height?: number;
  caption?: string;
  uploadStatus?: ImageUploadStatus;
  uploadProgress?: number;
  tempId?: string;       // upload correlation ID
}

Both the web ImageNode and the mobile ImageNode import this type and implement exportJSON() / importJSON() against it. TypeScript enforces that they stay in sync. A diary serialized on web opens correctly on mobile, and vice versa, because the wire format is a shared contract — not a convention.

The package layout that makes this possible:

packages/
  lexical-common/        # platform-agnostic
    src/types.ts         # SerializedImageNode, ImageInsertPayload, ImageUpdatePayload
    src/commands/        # INSERT_IMAGE_COMMAND, UPDATE_IMAGE_COMMAND, DELETE_IMAGE_COMMAND, RETRY_IMAGE_COMMAND
    src/markdown.ts      # Lexical JSON → Markdown (image-aware)
  lexical-editor/        # web only
    src/nodes/ImageNode.ts
    src/components/ImageRenderer.web.tsx
    src/plugins/ImagePlugin.tsx
apps/
  expo/
    components/lexical/dom/editor/
      nodes/ImageNode.dom.ts
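
The shared type only protects us at compile time. JSON loaded from storage is untyped, so a runtime guard next to the type is a natural companion. The following is a minimal sketch under that assumption: isSerializedImageNode and isOptional are hypothetical helpers, and the interface is repeated here only so the example is self-contained.

```typescript
type ImageUploadStatus = "uploading" | "completed" | "failed";

// Local copy of the shared shape, repeated so the sketch stands alone.
interface SerializedImageNode {
  type: "image";
  version: 1;
  src: string;
  alt?: string;
  width?: number;
  height?: number;
  caption?: string;
  uploadStatus?: ImageUploadStatus;
  uploadProgress?: number;
  tempId?: string;
}

const UPLOAD_STATUSES: readonly string[] = ["uploading", "completed", "failed"];

// True when `value` is absent or has the given primitive type.
function isOptional(value: unknown, kind: "string" | "number"): boolean {
  return value === undefined || typeof value === kind;
}

// Hypothetical guard: narrows untrusted JSON to SerializedImageNode.
function isSerializedImageNode(value: unknown): value is SerializedImageNode {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const status = v.uploadStatus;
  return (
    v.type === "image" &&
    v.version === 1 &&
    typeof v.src === "string" &&
    isOptional(v.alt, "string") &&
    isOptional(v.width, "number") &&
    isOptional(v.height, "number") &&
    isOptional(v.caption, "string") &&
    isOptional(v.uploadProgress, "number") &&
    isOptional(v.tempId, "string") &&
    (status === undefined ||
      (typeof status === "string" && UPLOAD_STATUSES.indexOf(status) !== -1))
  );
}
```

A guard like this could run in importJSON() on both platforms, so a malformed payload is rejected the same way on web and on mobile.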

Commands as the abstraction layer

Image operations are dispatched through Lexical's command system, defined in lexical-common:

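import { createCommand } from "lexical";
import type { ImageInsertPayload, ImageUpdatePayload } from "../types";

// Typing the payload lets dispatchCommand() check every call site.
export const INSERT_IMAGE_COMMAND = createCommand<ImageInsertPayload>("INSERT_IMAGE");
export const UPDATE_IMAGE_COMMAND = createCommand<ImageUpdatePayload>("UPDATE_IMAGE");
export const DELETE_IMAGE_COMMAND = createCommand("DELETE_IMAGE");
export const RETRY_IMAGE_COMMAND = createCommand("RETRY_IMAGE");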

Any code that inserts an image — a toolbar button, a drag-and-drop handler, the upload callback — calls editor.dispatchCommand(INSERT_IMAGE_COMMAND, payload) without knowing anything about the node implementation. The node's command handler is registered at editor setup time and is platform-specific. The dispatch site is not.

Upload lifecycle

The image node carries its upload state directly. The lifecycle looks like this:

User selects image
  → INSERT_IMAGE_COMMAND (src: file://, status: "uploading", tempId)
  → local preview appears immediately + progress bar renders
  → upload runs in background
  → UPDATE_IMAGE_COMMAND (uploadProgress: 0.6)
  → UPDATE_IMAGE_COMMAND (src: https://…, status: "completed")
  → final image renders

Upload fails
  → UPDATE_IMAGE_COMMAND (status: "failed")
  → error overlay + retry button appear
  → RETRY_IMAGE_COMMAND → back to "uploading" state

The key UX decision is showing the local preview immediately at insert time. The user sees their image in the editor the moment they pick it, regardless of network conditions. The upload happens behind that preview.
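
The lifecycle above is a small state machine, and it can help to write the allowed transitions down as a pure function. This is an illustrative sketch, not our command handler: UploadState, UploadEvent, and nextUploadState are hypothetical names, and ignoring out-of-order progress events is an assumption about the desired behavior.

```typescript
type ImageUploadStatus = "uploading" | "completed" | "failed";

interface UploadState {
  status: ImageUploadStatus;
  progress: number; // fraction between 0 and 1
}

type UploadEvent =
  | { kind: "progress"; value: number }
  | { kind: "success" }
  | { kind: "failure" }
  | { kind: "retry" };

// Returns the next state, or the same state when the event is not valid here.
function nextUploadState(state: UploadState, event: UploadEvent): UploadState {
  switch (event.kind) {
    case "progress":
      if (state.status !== "uploading") return state;
      // Never move the bar backwards if events arrive out of order.
      return { status: "uploading", progress: Math.min(1, Math.max(state.progress, event.value)) };
    case "success":
      return state.status === "uploading" ? { status: "completed", progress: 1 } : state;
    case "failure":
      return state.status === "uploading" ? { status: "failed", progress: state.progress } : state;
    case "retry":
      // Only a failed upload can be retried; it starts again from zero.
      return state.status === "failed" ? { status: "uploading", progress: 0 } : state;
  }
}
```

Keeping the transitions in one function makes it harder for a late progress event to overwrite a completed or failed node.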

The tempId pattern

Lexical assigns node keys internally — you cannot specify one from outside. That creates a problem for the upload callback: when the upload finishes, how does it know which node to update?

tempId is the answer. At insert time, the caller generates a correlation ID:

import type { LexicalEditor } from "lexical";

async function insertImageWithUpload(
  editor: LexicalEditor,
  asset: { uri: string },
  uploadFn: (asset: { uri: string }) => Promise<string>,
): Promise<void> {
  // Correlation ID that tells concurrent uploads apart.
  const tempId = `temp_${Date.now()}_${Math.random().toString(36).substring(7)}`;

  editor.dispatchCommand(INSERT_IMAGE_COMMAND, {
    src: asset.uri,
    uploadStatus: "uploading",
    tempId,
  });

  try {
    const finalUrl = await uploadFn(asset);
    // The command handler walks the editor state and updates the node whose __tempId matches.
    editor.dispatchCommand(UPDATE_IMAGE_COMMAND, { tempId, src: finalUrl, status: "completed" });
  } catch {
    editor.dispatchCommand(UPDATE_IMAGE_COMMAND, { tempId, status: "failed" });
  }
}

The tempId travels with the node and is serialized to JSON (useful if the user saves mid-upload and reopens). The upload function holds a closure over tempId and uses it to route the result to the correct node. Without this, concurrent uploads — common when pasting multiple images — have no reliable way to update the right node.
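
To make the routing concrete, here is a sketch of the lookup on serialized JSON rather than on live nodes. In the editor the handler works on node instances inside an update; the tree walk is the same idea. SerializedNodeLike, ImagePatch, and patchImageByTempId are hypothetical names used only for this example.

```typescript
// Minimal stand-in for a serialized Lexical node.
interface SerializedNodeLike {
  type: string;
  children?: SerializedNodeLike[];
  [key: string]: unknown;
}

// Fields an upload callback may want to change.
interface ImagePatch {
  src?: string;
  uploadStatus?: "uploading" | "completed" | "failed";
  uploadProgress?: number;
}

// Returns a new tree in which the image node carrying `tempId` has `patch` applied.
// Nodes that do not match are returned unchanged.
function patchImageByTempId(
  node: SerializedNodeLike,
  tempId: string,
  patch: ImagePatch,
): SerializedNodeLike {
  if (node.type === "image" && node.tempId === tempId) {
    return { ...node, ...patch };
  }
  if (!node.children) return node;
  return {
    ...node,
    children: node.children.map((child) => patchImageByTempId(child, tempId, patch)),
  };
}
```

With two uploads in flight, each callback passes its own tempId, so the result of the second upload cannot land on the first image even if it finishes earlier.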

importDOM / exportDOM

Both ImageNode implementations include importDOM and exportDOM. These handle clipboard interop: copying from a web page and pasting into the editor, or copying from the editor and pasting into another app as HTML.

static importDOM(): DOMConversionMap | null {
  return {
    img: (_node: Node) => ({
      conversion: convertImageElement,
      priority: 0,
    }),
  };
}

When Lexical encounters an <img> element during a paste, convertImageElement constructs an ImageNode from it. That is the entirety of "paste images from a web page" — the DOM conversion handles the rest.
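
The attribute-reading half of such a conversion can be sketched without a DOM. The version below is an assumption about what a converter like convertImageElement needs to extract, not our actual implementation: ImgLike, ImagePayload, parseDimension, and extractImagePayload are hypothetical names, and ImgLike stands in for an HTMLImageElement so the example runs anywhere.

```typescript
// Structural stand-in for the parts of an <img> element we read.
interface ImgLike {
  getAttribute(name: string): string | null;
}

// Hypothetical payload handed to the node constructor.
interface ImagePayload {
  src: string;
  alt?: string;
  width?: number;
  height?: number;
}

// Parses a width/height attribute; returns undefined for missing or invalid values.
function parseDimension(raw: string | null): number | undefined {
  if (raw === null) return undefined;
  const n = Number.parseInt(raw, 10);
  return Number.isFinite(n) && n > 0 ? n : undefined;
}

// Returns null when the element has no usable src, so the paste can skip it.
function extractImagePayload(img: ImgLike): ImagePayload | null {
  const src = img.getAttribute("src");
  if (!src) return null;

  const payload: ImagePayload = { src };
  const alt = img.getAttribute("alt");
  if (alt) payload.alt = alt;
  const width = parseDimension(img.getAttribute("width"));
  if (width !== undefined) payload.width = width;
  const height = parseDimension(img.getAttribute("height"));
  if (height !== undefined) payload.height = height;
  return payload;
}
```

Carrying width and height through the paste matters for the renderers, since the aspect-ratio placeholder depends on them.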

Markdown export without the headless editor

Storyie can export diary entries as Markdown. The conversion from Lexical JSON to Markdown runs server-side (and sometimes inside AI pipelines), so it has to work without a DOM.

The naive approach would be to spin up a headless Lexical editor, register all nodes, and call $convertToMarkdownString from @lexical/markdown. The problem: ImageNode is platform-specific. Registering the web version server-side pulls in React DOM. Registering neither leaves the headless editor with no class for the "image" type, so image nodes never make it into the export.

We sidestep this by parsing the serialized JSON directly rather than going through a headless editor:

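import type { SerializedImageNode } from "./types";

// Takes the raw serialized JSON object, not a live LexicalNode instance.
function convertImage(node: SerializedImageNode): string {
  const alt = node.alt ?? "";
  const src = node.src ?? "";
  return `![${alt}](${src})`;
}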

No node registration, no DOM globals, no framework imports. The conversion function receives the raw JSON object and returns a Markdown string. It works in Node, in a serverless function, in a Bun script — anywhere.
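
For context, here is a minimal sketch of how an image conversion like this can sit inside a walker over the whole serialized tree. It is a simplified illustration, not the contents of our markdown.ts: JsonNode, escapeAlt, and toMarkdown are hypothetical names, only a few node types are handled, and escaping brackets in the alt text is an added assumption.

```typescript
// Loose view of a serialized Lexical node; only the fields this sketch reads.
interface JsonNode {
  type: string;
  children?: JsonNode[];
  text?: string;
  tag?: string;
  src?: string;
  alt?: string;
}

// Escape characters that would break the ![alt](src) syntax.
function escapeAlt(alt: string): string {
  return alt.replace(/[\[\]\\]/g, "\\$&");
}

// Recursively converts a serialized node to Markdown. No DOM, no node registration.
function toMarkdown(node: JsonNode): string {
  const children = (node.children ?? []).map((child) => toMarkdown(child));
  switch (node.type) {
    case "root":
      // Top-level blocks are separated by a blank line; empty blocks are dropped.
      return children.filter((block) => block !== "").join("\n\n");
    case "text":
      return node.text ?? "";
    case "image":
      return `![${escapeAlt(node.alt ?? "")}](${node.src ?? ""})`;
    case "heading": {
      // Serialized headings carry their level as a tag such as "h2".
      const level = Number((node.tag ?? "h1").slice(1)) || 1;
      return `${"#".repeat(level)} ${children.join("")}`;
    }
    default:
      // Paragraphs and unknown containers just concatenate their children.
      return children.join("");
  }
}
```

Because the walker only reads plain objects, the same function can run in a server route, a batch script, or an AI pipeline without any editor setup.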

What we would do differently

What worked well

Locking the serialization format first. The web and mobile implementations can evolve independently as long as they read and write the same JSON. We defined SerializedImageNode in lexical-common before writing a single line of node code. Getting this wrong after the fact would have required a migration.

Commands as the insert/update API. Callers dispatch a command and do not care how the node handles it. This kept the upload logic, the toolbar, and the node implementation decoupled. Swapping the node implementation on one platform (we changed the mobile renderer once) required zero changes to the upload code.

tempId. It is a minor addition to the serialized shape, but it is load-bearing. Without it, the results of concurrent uploads cannot be reliably routed to the right node.

What was harder than expected

DOM-based UI in the mobile ElementNode. Implementing a progress bar, an error overlay, and a retry button as raw DOM operations is tedious. Writing updateDOM() to diff between the old and new node state — without a virtual DOM — is especially error-prone. The React path on web is significantly more comfortable.

ImageNode staying out of lexicalCommonNodes. Because ImageNode is platform-specific, it is not in the shared node list exported from lexical-common. That means every place that needs to handle images — Markdown export, the read-only viewer, the AI prompt builder — has to account for image nodes as a special case rather than treating them as first-class registered nodes. The JSON-parse approach handles this, but it is an extra convention to maintain.

Takeaways

The core design principle: split what has to be split, share what must not drift.

  • Types, commands, serialization format → lexical-common. One source of truth, imported by both platforms.
  • Node class and renderer → per platform. DecoratorNode + React on web; ElementNode + DOM on mobile.
  • Markdown export → JSON directly. Headless Lexical would require node registration that drags in platform dependencies.

Cross-platform rich-text with images is tractable when you are clear about which layer each concern belongs to. The wire format is the one thing that cannot diverge. Everything else is implementation detail.

Try Storyie

If you want to see the result from the user side: write a diary with photos on the web at storyie.com and open it on the iOS app. Same content, same images, same formatting. The platform split is invisible from the outside, which is exactly what it should be.