[IIAB] Gutenberg epubs and html

Emmanuel Engelhart kelson at kiwix.org
Fri Jun 21 14:26:11 PDT 2013


Hi Braddock

On 21/06/2013 22:04, Braddock wrote:
> I finally got around to distilling the Gutenberg collection.  I've
> extracted all 40,000 books in epub format, and also converted them to
> zipped html ("htmlz").  I've made collections both with and without images.
> 
> The result is four collections (sizes in gigabytes):
> 6.9G    gutenberg-htmlz
> 23G    gutenberg-htmlz-images
> 7.0G    gutenberg-epub
> 23G    gutenberg-epub-images
> 
> There is no metadata in these collections, just the books.  We could
> generate some meta using our database.  I'm not sure what you would need
> to make a usable zim.
> 
> We are probably just going to keep them in this format (instead of
> zimmifying them all) for the Internet-in-a-Box.
> 
> I can make these available via torrents.

This is really great news! I cannot wait to see what it looks like.
So, yes, a torrent URL would be perfect for the download.
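
For anyone who wants to peek inside one of the htmlz files, here is a
minimal Python sketch. It assumes an htmlz file is an ordinary zip archive
holding at least one HTML page, and that the main page is usually named
"index.html" (an assumption about the converter output, not something
checked against these collections). The function name read_htmlz_main_page
is made up for this example.

```python
import zipfile


def read_htmlz_main_page(source):
    """Return (member_name, html_text) for the main page of an htmlz archive.

    `source` may be a file path or a binary file-like object.
    Assumption: the archive is a plain zip and the main page is
    "index.html" when present; otherwise the first HTML member is used.
    """
    with zipfile.ZipFile(source) as archive:
        html_names = [
            name for name in archive.namelist()
            if name.lower().endswith((".html", ".htm"))
        ]
        if not html_names:
            raise ValueError("no HTML page found in archive")
        name = "index.html" if "index.html" in html_names else html_names[0]
        # Decode leniently: the encoding of the page is not known in advance.
        text = archive.read(name).decode("utf-8", errors="replace")
        return name, text
```

Called with a path such as read_htmlz_main_page("some-book.htmlz") (a
placeholder file name), it returns the page name and its text.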

> Other news, I wrote a from-scratch pure python ZIM file reader I'm
> calling "zimpy".
> https://github.com/braddockcg/internet-in-a-box/blob/master/iiab/zimpy.py
> 
> I'm now using the zimpy code for reading zims for Internet-in-a-Box, and
> if it gets a bit more mature I'll release it as a separate project.  It
> doesn't currently do anything more than I need.  The existing openzim
> bindings did not support any read capability.

It is great to see you were able to write your own ZIM reader. I have
added it to our list:
http://www.openzim.org/wiki/Readers#Without_user_interface_.2F_Console
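
To give an idea of what such a reader has to do first, here is a minimal
Python sketch that parses the fixed-size ZIM header. It is not taken from
zimpy. It assumes the 80-byte little-endian header layout as described in
the openZIM file format documentation, with the version stored as two
16-bit fields (older files may treat these four bytes as a single
version number). The names ZimHeader and parse_zim_header are made up for
this example.

```python
import struct
from collections import namedtuple

ZIM_MAGIC = 72173914  # 0x044D495A, expected at the start of the file

# Little-endian, no padding: magic, major/minor version, 16-byte UUID,
# article and cluster counts, four 64-bit positions, main and layout
# page indexes, checksum position. Assumed layout, 80 bytes in total.
HEADER_FORMAT = "<IHH16sIIQQQQIIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

ZimHeader = namedtuple(
    "ZimHeader",
    "magic major_version minor_version uuid article_count cluster_count "
    "url_ptr_pos title_ptr_pos cluster_ptr_pos mime_list_pos "
    "main_page layout_page checksum_pos",
)


def parse_zim_header(data):
    """Parse the leading bytes of a ZIM file into a ZimHeader tuple."""
    if len(data) < HEADER_SIZE:
        raise ValueError("not enough data for a ZIM header")
    header = ZimHeader._make(struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))
    if header.magic != ZIM_MAGIC:
        raise ValueError("bad magic number, not a ZIM file")
    return header
```

A reader would typically open the file in binary mode, pass
f.read(HEADER_SIZE) to parse_zim_header, and then seek to url_ptr_pos or
cluster_ptr_pos to find the directory entries and the content clusters.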

Regards
Emmanuel
-- 
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication




More information about the IIAB mailing list