Audiobook Factory

The server is currently Ip-172-26-13-40, providing conversion of up to ~50KB of text at a time.

This is a quick hack to convert common ebook, image, and text formats into audiobooks, using open source OCR and voice synthesis software. Please don't abuse or overload this test service: a giant 100+ page djvu file will destroy the server unless you specify a subset of a few pages covering a chapter.

Please be patient after pressing "Download", as it takes time to process the input and prepare a file for you. Please do not press the "Download" button repeatedly after submitting a job. This will not make anything go faster, and is likely to bring the server to a grinding halt!

A single line of text translates to about a 15-20KiB MP3 file, but the intermediate waveform held in memory is much bigger, so you can imagine that massive amounts of text make the server cry. Someone recently uploaded a 50KiB Asimov book in plain text, which produced a 5MiB (and 25 minute long!) MP3. This is the rough upper limit of what I want to see being uploaded. If anyone wants to offer me some uber-hosting I would be very grateful ;D
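To put the figures above in perspective, here is a rough back-of-the-envelope sketch. It only uses the numbers quoted in the paragraph (50KiB of text, a 5MiB MP3, 25 minutes); the function names are made up for illustration, and the linear extrapolation is an assumption, not a measurement of how the server actually behaves.

```python
# Figures quoted in the text: a 50 KiB plain-text upload produced
# a 5 MiB MP3 lasting about 25 minutes.
TEXT_KIB = 50
MP3_KIB = 5 * 1024
MINUTES = 25


def expansion_ratio() -> float:
    """How many KiB of MP3 came out per KiB of text in the quoted example."""
    return MP3_KIB / TEXT_KIB


def average_bitrate_kbps() -> float:
    """Average MP3 bitrate implied by the quoted size and duration."""
    bits = MP3_KIB * 1024 * 8
    seconds = MINUTES * 60
    return bits / seconds / 1000


def estimate_mp3_kib(text_kib: float) -> float:
    """Assumed linear extrapolation from the single quoted example."""
    return text_kib * expansion_ratio()


def within_limit(text_bytes: int, limit_kib: int = TEXT_KIB) -> bool:
    """True if an upload is no bigger than the rough 50 KiB upper limit."""
    return text_bytes <= limit_kib * 1024


if __name__ == "__main__":
    print(f"expansion: about {expansion_ratio():.0f}x")
    print(f"bitrate:   about {average_bitrate_kbps():.0f} kbit/s")
    print(f"10 KiB of text: roughly {estimate_mp3_kib(10):.0f} KiB of MP3")
```

So the quoted example works out to roughly a hundredfold growth from text to MP3, at an average bitrate of about 28 kbit/s, before counting the much larger uncompressed waveform.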

/!\ Page under construction. Functionality depends on page completion and web hosting status.

Not available on this host:

java - Executes markov parody text generation.
festival - Synthesizes machine spoken waveform from text input.
gocr - Converts rasterized text into plain text (OCR) when creating audiobooks.
html2txt - Converts HTML to formatted ASCII text.
djvu-libre - Manipulates DjVu file input so that we can extract data for OCR or direct voice synthesis.
pdf-tools - Manipulates PDF file input so that we can extract data for OCR or direct voice synthesis.
pdfistext - Test if a PDF only contains scanned images or has text content.
imagemagick - Converts supported image formats into an intermediate format (PNM) used in OCR.
pstotext - Converts PostScript files into plain ASCII text as if they were being printed.
antiword - Converts older Word (<=2003) .doc files to plain text.
lame - Encodes synthesized voice waveform to MP3.
mencoder - Encodes from one mplayer-supported video format to another.
oggenc - Encodes synthesized voice waveform to Ogg Vorbis.
unrtf - Converts RTF files to ASCII text.
youtube-dl - Downloads a .flv Youtube flash movie, described by a given URL.
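The tool list above suggests a simple chain: extract text (with OCR for raster input), synthesize speech, then encode. The sketch below only plans which of the listed tools would run for a given file; it does not invoke them, and it is a guess at the flow rather than the site's actual backend. The function name, the `scanned` flag (standing in for whatever pdfistext would report), and the extension tables are all assumptions for illustration.

```python
from pathlib import Path

# Tool names are taken from the list above; the mapping itself is assumed.
TEXT_EXTRACTORS = {
    ".doc": "antiword",
    ".rtf": "unrtf",
    ".ps": "pstotext",
    ".html": "html2txt",
}
PAGED_FORMATS = {".pdf": "pdf-tools", ".djvu": "djvu-libre"}
IMAGE_EXTENSIONS = {".jpg", ".png"}
ENCODERS = {"mp3": "lame", "ogg": "oggenc"}


def plan_pipeline(filename, audio_format="mp3", scanned=False):
    """Return the ordered list of tools a conversion would pass through."""
    if audio_format not in ENCODERS:
        raise ValueError(f"unsupported audio format: {audio_format}")

    ext = Path(filename).suffix.lower()
    steps = []
    if ext == ".txt":
        pass  # plain text goes straight to speech synthesis
    elif ext in TEXT_EXTRACTORS:
        steps.append(TEXT_EXTRACTORS[ext])
    elif ext in PAGED_FORMATS:
        steps.append(PAGED_FORMATS[ext])
        if scanned:
            # Scanned pages carry no text layer, so they need OCR.
            steps += ["imagemagick", "gocr"]
    elif ext in IMAGE_EXTENSIONS:
        # Images are converted to PNM, then run through OCR.
        steps += ["imagemagick", "gocr"]
    else:
        raise ValueError(f"unsupported input format: {ext or filename}")

    steps.append("festival")            # text to waveform
    steps.append(ENCODERS[audio_format])  # waveform to compressed audio
    return steps


if __name__ == "__main__":
    for name in ("chapter.doc", "scan.png", "notes.txt"):
        print(name, "->", " | ".join(plan_pipeline(name)))
```

Keeping the plan as a plain list of names makes it easy to compare against whatever is actually installed on the host before accepting a job.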

Currently available on this host:

perl - Scrapes website/wikipedia online sources when creating audiobooks.
host - I
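The available/not-available split above could be produced by probing the host for each command. Here is a minimal sketch using Python's standard `shutil.which`; it is not how this page generates its lists, and it assumes each list entry is also the name of an executable, which is not true for every package (ImageMagick and DjVuLibre, for instance, install differently named commands, so they are left out here).

```python
import shutil

# Entries from the lists above whose names are assumed to match an executable.
COMMANDS = [
    "java", "festival", "gocr", "pstotext", "antiword", "lame",
    "mencoder", "oggenc", "unrtf", "youtube-dl", "perl", "host",
]


def split_by_availability(commands):
    """Return (available, missing) according to the current PATH."""
    available, missing = [], []
    for name in commands:
        if shutil.which(name) is not None:
            available.append(name)
        else:
            missing.append(name)
    return available, missing


if __name__ == "__main__":
    available, missing = split_by_availability(COMMANDS)
    print("Currently available on this host:", ", ".join(available) or "none")
    print("Not available on this host:", ", ".join(missing) or "none")
```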

Current features include:

Various input formats and sources supported: .doc .rtf .html .pdf .djvu .ps .txt and most kinds of images.
Various encoded audio outputs.
Use of completely open source technology: ImageMagick, mencoder, Lame, OGGEnc, GOCR, DjVu-libre, xpdf, Festival, Antiword and many other command-line GNU/Linux tools.

Future features might include:

Download the plain text file generated by backend processing as an output option instead of audio.
Support for more input formats including OpenOffice.org documents and epubs.
Batch ordering and processing system to let you thrash out whatever volume of data you want!
Multi-language support, which is problematic with current OCR and text-to-speech technology.
Automatic language detection and machine translation of source text before audio is generated.
Ability to choose the sex and nationality of the speaker's voice, and tweak audio properties.

Create Audiobook

/!\ .epub .mobi and .azw not currently supported, but coming soon!
/!\ Image support not available.
/!\ DjVu support not available.
/!\ PDF support not available.
/!\ PS support not available.
/!\ HTML support not available.
/!\ DOC support not available.
/!\ RTF support not available.

From An E-Book Or File

Upload an ebook, image, or plain-text file to the server and have it converted to audio. Currently supports the following formats: .djvu .pdf .ps .doc .rtf .html .txt .jpg .png and all the other image formats supported by ImageMagick.
Uploading images or raster-based documents, rather than text-based files, requires Optical Character Recognition to reconstruct flat text. This is not perfect, but should still give reasonable results. Future development will improve the postprocessing of the text generated from OCR.
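As an idea of what OCR postprocessing might involve, here is a small sketch that tidies raw OCR output before it reaches the speech synthesizer. It is purely illustrative and not the cleanup this service performs; the function name is hypothetical, and the rules (rejoin words hyphenated across line breaks, unwrap lines, keep paragraph breaks) are assumptions about what would help.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Tidy raw OCR output into flowing paragraphs for voice synthesis."""
    # Rejoin words split by a hyphen at the end of a line:
    # "recog-\nnition" becomes "recognition".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)

    # Blank lines mark paragraph breaks; keep those, unwrap everything else.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    sample = "Optical char-\nacter  recog-\nnition\nworks.\n\n\nNext   para."
    print(clean_ocr_text(sample))
```

Note that the hyphen rule will also merge genuine compounds that happen to break at a line end (for example "open-\nsource" becomes "opensource"), so a real version would probably want a dictionary check.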

Input file:

Include pages from: to: Used with PDF and DjVu only.

Encode audio as:

The information provided on this and other pages by me, Matt Oates (mattoates at gmail.com), is under my own personal responsibility and not that of Aberystwyth University or the University of Bristol. Similarly, any opinions expressed are my own and are in no way to be taken as those of either institution.

MattOates.co.uk
