Convert Web Article to EPUB
I went down a bit of a rabbit hole today and I came out the other side knowing how to convert web articles into EPUB files, making them easy to read on my mobile devices.
The four main libraries utilized are:
- @mozilla/readability - A standalone version of the readability library used for Firefox Reader View.
- got - HTTP request library
- jsdom - A JavaScript implementation of various web standards, for use with Node.js
- epub-gen - Generate EPUB books from HTML with simple API in Node.js.
The Code
Install dependencies
npm install @mozilla/readability epub-gen got jsdom
npm install --save-dev @types/got
We use got to fetch the web page we want to convert to EPUB. In this case, I am getting Paul Graham's famous Ramen Profitable article.
import got from 'got'

got('https://paulgraham.com/ramenprofitable.html').then(response => {
  // response.body holds the page's HTML as a string
  console.log(response.body)
})
The response body is the page's HTML as one massive string. We can use JSDOM to convert it back into a browser-like DOM object, allowing us to easily query and traverse the markup on the server.
import got from 'got'
import { JSDOM } from 'jsdom'

got('https://paulgraham.com/ramenprofitable.html').then(response => {
  const dom = new JSDOM(response.body)
})
With the dom in place, we're able to create a Readability object:
import { Readability } from '@mozilla/readability'
import got from 'got'
import { JSDOM } from 'jsdom'

got('https://paulgraham.com/ramenprofitable.html').then(response => {
  const dom = new JSDOM(response.body)
  // Readability works on the parsed document, not the raw HTML string
  const reader = new Readability(dom.window.document)
  const article = reader.parse()
})
Parsing the content gives us information such as the article's title and the article's content, which we can arrange into an options object for epub-gen.
import { Readability } from '@mozilla/readability'
import got from 'got'
import { JSDOM } from 'jsdom'

got('https://paulgraham.com/ramenprofitable.html').then(response => {
  const dom = new JSDOM(response.body)
  const reader = new Readability(dom.window.document)
  const article = reader.parse()
  // parse() can return null when no article is found
  if (article) {
    const option = {
      title: article.title,
      author: article.siteName,
      // A single chapter holding the whole article
      content: [
        {
          title: article.title,
          data: article.content,
        },
      ],
    }
  }
})
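Readability may not find every field on every page, so it can help to fill in placeholders before handing the data over. Here is a minimal sketch of that mapping as a standalone function. The `toEpubOptions` name, the two interfaces, and the 'Untitled' and 'Unknown' fallbacks are my own assumptions for illustration, not part of either library.

```typescript
// Assumed subset of the fields read from Readability's parse() result.
interface ParsedArticle {
  title?: string | null
  siteName?: string | null
  content?: string | null
}

// Shape of the options object built in the snippet above.
interface EpubOptions {
  title: string
  author: string
  content: { title: string; data: string }[]
}

// Hypothetical helper: map a parsed article onto the options object,
// substituting placeholders when a field is missing or blank.
function toEpubOptions(article: ParsedArticle): EpubOptions {
  const title = article.title?.trim() || 'Untitled'
  const author = article.siteName?.trim() || 'Unknown'
  return {
    title,
    author,
    // A single chapter holding the whole article
    content: [{ title, data: article.content ?? '' }],
  }
}
```

With this in place, the body of the if block could shrink to const option = toEpubOptions(article).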
Now we can pass the formatted data into epub-gen to output our .epub file!
import { Readability } from '@mozilla/readability'
import got from 'got'
import { JSDOM } from 'jsdom'
import Epub from 'epub-gen'

got('https://paulgraham.com/ramenprofitable.html').then(response => {
  // Turn the HTML string into a DOM document
  const dom = new JSDOM(response.body)
  const domDocument = dom.window.document
  // Extract the readable article from the page
  const reader = new Readability(domDocument)
  const article = reader.parse()
  if (article) {
    const option = {
      title: article.title,
      author: article.siteName,
      content: [
        {
          title: article.title,
          data: article.content,
        },
      ],
    }
    // Write the book to ./output.epub
    new Epub(option, './output.epub')
  }
})
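The output path above is hard-coded, so converting a second article would overwrite the first. One possible tweak is to derive the file name from the article's title. This is only a sketch: `epubFileName` is a hypothetical helper of mine, and the slug rules (lowercase ASCII letters and digits joined by dashes) are an assumption you may want to adjust.

```typescript
// Hypothetical helper: turn an article title into a safe .epub file name.
function epubFileName(title: string): string {
  const slug = title
    .toLowerCase()
    // Collapse every run of non-alphanumeric characters into one dash
    .replace(/[^a-z0-9]+/g, '-')
    // Trim dashes left over at the start or end
    .replace(/^-+|-+$/g, '')
  // Fall back to "output" when nothing usable remains
  return `${slug || 'output'}.epub`
}

console.log(epubFileName('Ramen Profitable')) // ramen-profitable.epub
```

The result can then be used in place of the fixed path, for example by passing './' + epubFileName(option.title) as the second argument to Epub.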