[IIAB] Gutenberg epubs and html

Braddock braddock at braddock.com
Sat Jun 29 14:55:42 PDT 2013


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Emmanuel,
I am seeding a torrent of all the Gutenberg epubs without images.

http://braddock.com/~braddock/expire/gutenberg-epub-201306.torrent

I have very limited upload bandwidth.  Let's start with the epubs
without images and see how it goes uploading across my DSL.  I've also
got epubs with images, and html with and without images.

The torrent size is about 7GB.

- -Braddock


On 06/21/2013 02:26 PM, Emmanuel Engelhart wrote:
> Hi Braddock
> 
> Le 21/06/2013 22:04, Braddock a écrit :
>> I finally got around to distilling the Gutenberg collection.
>> I've extracted all 40,000 books in epub format, and also
>> converted them to zipped html ("htmlz").  I've made collections
>> both with and without images.
>> 
>> The result is four collections (sizes in gigabytes):
>>
>>   6.9G   gutenberg-htmlz
>>   23G    gutenberg-htmlz-images
>>   7.0G   gutenberg-epub
>>   23G    gutenberg-epub-images
>> 
>> There is no metadata in these collections, just the books.  We
>> could generate some meta using our database.  I'm not sure what
>> you would need to make a usable zim.
>> 
>> We are probably just going to keep them in this format (instead
>> of zimmifying them all) for the Internet-in-a-Box.
>> 
>> I can make these available via torrents.
> 
> This is really great news! I cannot wait to see what it looks
> like. So, yes, a torrent URL would be perfect for the download.
> 
>> Other news, I wrote a from-scratch pure python ZIM file reader
>> I'm calling "zimpy".
>> https://github.com/braddockcg/internet-in-a-box/blob/master/iiab/zimpy.py
>>
>> I'm now using the zimpy code for reading zims for Internet-in-a-Box, and
>> if it gets a bit more mature I'll release it as a separate
>> project.  It doesn't currently do anything more than I need.  The
>> existing openzim bindings did not support any read capability.
> 
> It is great to see you were able to make your own ZIM reader. I
> have added it to our list:
> http://www.openzim.org/wiki/Readers#Without_user_interface_.2F_Console
>
> Regards,
> Emmanuel
> 
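
The "htmlz" collections mentioned in the quoted message are described
there as zipped html.  As a rough sketch only, here is one way such a
file might be opened with Python's standard zipfile module.  The member
names ("index.html", "style.css") and both helper functions are
assumptions made for illustration; the thread does not say how the
archives are laid out inside, and this is not code from Internet-in-a-Box.

```python
import io
import zipfile


def read_first_html(htmlz_source):
    """Return (member_name, text) for the first HTML member of a zip archive.

    htmlz_source may be a path or a file-like object.  The assumption that
    the archive holds at least one .html/.htm member is ours, not the thread's.
    """
    with zipfile.ZipFile(htmlz_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".html", ".htm")):
                raw = archive.read(name)
                # Decode leniently: the real encoding of a book is unknown here.
                return name, raw.decode("utf-8", errors="replace")
    raise ValueError("no HTML member found in archive")


def make_sample_htmlz():
    """Build a tiny in-memory archive so the sketch runs without a real book."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("style.css", "body { margin: 1em; }")
        archive.writestr("index.html", "<html><body><h1>Sample</h1></body></html>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    name, text = read_first_html(make_sample_htmlz())
    print(name, len(text))
```

To try it on a real file, pass the path of a book from the
gutenberg-htmlz collection to read_first_html() in place of the
in-memory sample.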

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJRz1feAAoJEHWLR/DQzlZu6YUH/1zcYAQL1ZlKreU2E0SypbDV
Kf8humSwnPGO7dewsEM0U6WWAtyeTwdPjSMa+8uClcoXSpNhCl+UoYxMsm1aW9pM
r0TVSSDpfGISQm5hseE7f1UVMVg+d33oGJ+5Yl0Pf8GwlVP1ubSI4tj79dKA7zZR
ef0NXDTBZG4NRVPmjh+Jmk5iYsWBnctc8IArlfmtZxcZmTsnqNYVCY3QEq6tWpQY
To9ct1KM6hPf5V0kO/bZcvPSANNVzvw6zsl6RTWV/rsRmUra9oZ/JMwrZiavuUBz
xJg+OzLuPH3AMLbnCRcDFXf1tB1Pa64XtkvaWUHGc0hZYj6MjwWJLigdz/DL4HQ=
=admh
-----END PGP SIGNATURE-----


