Home > Others > Converting Plain Text To Encoded HTML With Vanilla JavaScript

Converting Plain Text To Encoded HTML With Vanilla JavaScript

April 17th, 2024 Leave a comment Go to comments

When copying text from a website to your device’s clipboard, there’s a good chance that you will get the formatted HTML when pasting it. Some apps and operating systems have a “Paste Special” feature that will strip those tags out for you to maintain the current style, but what do you do if that’s unavailable?

Same goes for converting plain text into formatted HTML. One of the closest ways we can convert plain text into HTML is writing in Markdown as an abstraction. You may have seen examples of this in many comment forms in articles just like this one. Write the comment in Markdown and it is parsed as HTML.

Even better would be no abstraction at all! You may have also seen (and used) a number of online tools that take plainly written text and convert it into formatted HTML. The UI makes the conversion and previews the formatted result in real time.

Providing a way for users to author basic web content — like comments — without knowing even the first thing about HTML, is a novel pursuit as it lowers barriers to communicating and collaborating on the web. Saying it helps “democratize” the web may be heavy-handed, but it doesn’t conflict with that vision!

We can build a tool like this ourselves. I’m all for using existing resources where possible, but I’m also for demonstrating how these things work and maybe learning something new in the process.

Defining The Scope

There are plenty of assumptions and considerations that could go into a plain-text-to-HTML converter. For example, should we assume that the first line of text entered into the tool is a title that needs corresponding

tags? Is each new line truly a paragraph, and how does linking content fit into this?

Again, the idea is that a user should be able to write without knowing Markdown or HTML syntax. This is a big constraint, and there are far too many HTML elements we might encounter, so it’s worth knowing the context in which the content is being used. For example, if this is a tool for writing blog posts, then we can limit the scope of which elements are supported based on those that are commonly used in long-form content:

,

, , and . In other words, it will be possible to include top-level headings, body text, linked text, and images. There will be no support for bulleted or ordered lists, tables, or any other elements for this particular tool.

The front-end implementation will rely on vanilla HTML, CSS, and JavaScript to establish a small form with a simple layout and functionality that converts the text to HTML. There is a server-side aspect to this if you plan on deploying it to a production environment, but our focus is purely on the front end.

Looking At Existing Solutions

There are existing ways to accomplish this. For example, some libraries offer a WYSIWYG editor. Import a library like TinyMCE with a single and you’re good to go. WYSIWYG editors are powerful and support all kinds of formatting, even applying CSS classes to content for styling.

But TinyMCE isn’t the most efficient package at about 500 KB minified. That’s not a criticism as much as an indication of how much functionality it covers. We want something more “barebones” than that for our simple purpose. Searching GitHub surfaces more possibilities. The solutions, however, seem to fall into one of two categories:

  • The input accepts plain text, but the generated HTML only supports the HTML

    and

    tags.

  • The input converts plain text into formatted HTML, but by ”plain text,” the tool seems to mean “Markdown” (or a variety of it) instead. The txt2html Perl module (from 1994!) would fall under this category.

Even if a perfect solution for what we want was already out there, I’d still want to pick apart the concept of converting text to HTML to understand how it works and hopefully learn something new in the process. So, let’s proceed with our own homespun solution.

Setting Up The HTML

We’ll start with the HTML structure for the input and output. For the input element, we’re probably best off using a