Convert Web Article to EPUB

I went down a bit of a rabbit hole today and I came out the other side knowing how to convert web articles into EPUB files, making them easy to read on my mobile devices.

The four main libraries utilized are:

  • @mozilla/readability - A standalone version of the readability library used for Firefox Reader View.
  • got - HTTP request library
  • jsdom - A JavaScript implementation of various web standards, for use with Node.js
  • epub-gen - Generate EPUB books from HTML with simple API in Node.js.

The Code

Install dependencies

bash
npm install @mozilla/readability epub-gen got jsdom
npm install --save-dev @types/jsdom

We use got to call the web page we are interested in converting to EPUB. In this case, I am getting Paul Graham's famous Ramen Profitable article.

ts
import got from 'got'

got('https://paulgraham.com/ramenprofitable.html').then(response => {
	// response.body holds the page's HTML
	console.log(response.body)
})

The response body contains the page's HTML as one large string. We can use JSDOM to parse it back into a browser-like DOM object, which lets us work with the markup on the server.

ts
import got from 'got'
import { JSDOM } from 'jsdom'

got('https://paulgraham.com/ramenprofitable.html').then(response => {
	const dom = new JSDOM(response.body)
})

With the DOM in place, we can create a Readability object and parse the page:

ts
import { Readability } from '@mozilla/readability'
import got from 'got'
import { JSDOM } from 'jsdom'

got('https://paulgraham.com/ramenprofitable.html').then(response => {
	const dom = new JSDOM(response.body)
	const reader = new Readability(dom.window.document)
	const article = reader.parse()
})

The parsed result exposes fields such as the article's title and content, which we can arrange into the options object that epub-gen expects.

ts
import { Readability } from '@mozilla/readability'
import got from 'got'
import { JSDOM } from 'jsdom'

got('https://paulgraham.com/ramenprofitable.html').then(response => {
	const dom = new JSDOM(response.body)
	const reader = new Readability(dom.window.document)
	const article = reader.parse()

	// parse() returns null when no article could be extracted
	if (article) {
		const option = {
			title: article.title,
			author: article.siteName,
			content: [
				{
					title: article.title,
					data: article.content,
				},
			],
		}
	}
})
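
The options object can also be built by a small helper that copes with missing fields. This is only a sketch, not part of the original code: `ArticleLike`, `EpubOptions`, and `buildEpubOptions` are hypothetical names, and preferring Readability's `byline` field over `siteName` for the author, along with the 'Untitled' and 'Unknown' fallbacks, is an assumption of mine.

```typescript
// Only the fields used here; Readability's parse() result carries more.
interface ArticleLike {
  title?: string | null
  byline?: string | null
  siteName?: string | null
  content?: string | null
}

// The subset of epub-gen options this post uses.
interface EpubOptions {
  title: string
  author: string
  content: { title: string; data: string }[]
}

// Hypothetical helper: returns null when there is no article body to convert.
function buildEpubOptions(article: ArticleLike | null): EpubOptions | null {
  if (!article || !article.content) return null

  const title = article.title?.trim() || 'Untitled'
  // Assumption: prefer the byline (author credit), then fall back to the site name.
  const author = article.byline?.trim() || article.siteName?.trim() || 'Unknown'

  return {
    title,
    author,
    content: [{ title, data: article.content }],
  }
}
```

Inside the `then` callback this would be called as `const option = buildEpubOptions(reader.parse())`, followed by a null check before generating the book.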

Now we can pass the formatted data into epub-gen to output our .epub file!

ts
import { Readability } from '@mozilla/readability'
import got from 'got'
import { JSDOM } from 'jsdom'
import Epub from 'epub-gen'

got('https://paulgraham.com/ramenprofitable.html').then(response => {
  const dom = new JSDOM(response.body)
  const domDocument = dom.window.document
  const reader = new Readability(domDocument)
  const article = reader.parse()

  if (article) {
    const option = {
      title: article.title,
      author: article.siteName,
      content: [
        {
          title: article.title,
          data: article.content,
        },
      ],
    }

    new Epub(option, './output.epub')
  }
})
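
The script above always writes to `./output.epub`, so converting a second article overwrites the first. One way around that, sketched here under my own assumptions, is to derive the file name from the article title. `epubFileName` is a hypothetical helper, and the slug rules (lowercase ASCII letters and digits joined by hyphens) are just one reasonable choice.

```typescript
// Hypothetical helper: turns an article title into a file-system-friendly .epub path.
function epubFileName(title: string | null | undefined): string {
  const slug = (title ?? '')
    .toLowerCase()
    .normalize('NFKD')                // split accented letters into base letter + mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .replace(/[^a-z0-9]+/g, '-')      // collapse every other run of characters into a hyphen
    .replace(/^-+|-+$/g, '')          // trim leading and trailing hyphens

  // Fall back to the original name when nothing usable is left.
  return `./${slug || 'output'}.epub`
}
```

With this in place, the last step could become `new Epub(option, epubFileName(article.title))`.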

Resources