[IIAB] Gutenberg epubs and html
Emmanuel Engelhart
kelson at kiwix.org
Fri Jun 21 14:26:11 PDT 2013
Hi Braddock
On 21/06/2013 22:04, Braddock wrote:
> I finally got around to distilling the Gutenberg collection. I've
> extracted all 40,000 books in epub format, and also converted them to
> zipped html ("htmlz"). I've made collections both with and without images.
>
> The result is four collections (sizes in gigabytes):
> 6.9G gutenberg-htmlz
> 23G gutenberg-htmlz-images
> 7.0G gutenberg-epub
> 23G gutenberg-epub-images
>
> There is no metadata in these collections, just the books. We could
> generate some meta using our database. I'm not sure what you would need
> to make a usable zim.
>
> We are probably just going to keep them in this format (instead of
> zimmifying them all) for the Internet-in-a-Box.
>
> I can make these available via torrents.
This is really great news! I cannot wait to see what it looks like.
So, yes, a torrent URL would be perfect for the download.
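As an illustration of how one of the "htmlz" books could be opened, here is
a minimal sketch. It assumes an htmlz file is an ordinary zip archive whose
main page is stored as "index.html" (the layout Calibre produces); the
function names read_htmlz_index and list_htmlz_members are hypothetical and
not part of any existing tool.

```python
import io
import zipfile


def read_htmlz_index(path_or_file, index_name="index.html"):
    """Return the main HTML page of an htmlz archive as text.

    Assumption: the archive is a plain zip file and the book's main page
    is stored under `index_name`.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        raw = archive.read(index_name)
    # Decode leniently, since the encoding of each book is not known here.
    return raw.decode("utf-8", errors="replace")


def list_htmlz_members(path_or_file):
    """List every file stored in the archive (HTML, images, ...)."""
    with zipfile.ZipFile(path_or_file) as archive:
        return archive.namelist()


if __name__ == "__main__":
    # Build a tiny in-memory archive so the sketch runs without real data.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("index.html", "<html><body>Hello</body></html>")
    print(list_htmlz_members(buffer))
    print(read_htmlz_index(buffer))
```

The same functions accept a file path, so they could be pointed at a book
from the gutenberg-htmlz collection once it is downloaded.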
> Other news, I wrote a from-scratch pure python ZIM file reader I'm
> calling "zimpy".
> https://github.com/braddockcg/internet-in-a-box/blob/master/iiab/zimpy.py
>
> I'm now using the zimpy code for reading zims for Internet-in-a-Box, and
> if it gets a bit more mature I'll release it as a separate project. It
> doesn't currently do anything more than I need. The existing openzim
> bindings did not support any read capability.
It is great to see that you were able to write your own ZIM reader. I have
added it to our list:
http://www.openzim.org/wiki/Readers#Without_user_interface_.2F_Console
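To give an idea of what a pure Python reader has to do first, here is a
minimal sketch of parsing the fixed-size ZIM header. It is not taken from
zimpy. It assumes the 80-byte little-endian header layout described in the
openZIM file format documentation, with the version stored as two 16-bit
fields (older descriptions show a single 32-bit version field). The names
ZimHeader and parse_zim_header are hypothetical.

```python
import struct
from collections import namedtuple

# Magic number 72173914, stored little-endian as the bytes "ZIM\x04".
ZIM_MAGIC = 0x044D495A

# "<" selects little-endian with no padding, which gives 80 bytes in total.
HEADER_FORMAT = "<IHH16sIIQQQQIIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

ZimHeader = namedtuple("ZimHeader", [
    "magic_number", "major_version", "minor_version", "uuid",
    "article_count", "cluster_count",
    "url_ptr_pos", "title_ptr_pos", "cluster_ptr_pos", "mime_list_pos",
    "main_page", "layout_page", "checksum_pos",
])


def parse_zim_header(data):
    """Parse the first HEADER_SIZE bytes of a ZIM file into a ZimHeader."""
    if len(data) < HEADER_SIZE:
        raise ValueError("not enough data for a ZIM header")
    header = ZimHeader._make(struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))
    if header.magic_number != ZIM_MAGIC:
        raise ValueError("bad magic number, not a ZIM file")
    return header


if __name__ == "__main__":
    # Synthetic header with made-up values, so no real ZIM file is needed.
    raw = struct.pack(HEADER_FORMAT, ZIM_MAGIC, 5, 0, b"\x00" * 16,
                      10, 2, 80, 160, 240, 320, 0, 0xFFFFFFFF, 400)
    print(parse_zim_header(raw))
```

With a real file, the input would be the first 80 bytes read in binary
mode; the pointer positions in the header then tell a reader where the URL
list, title list and clusters begin.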
Regards
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication
More information about the IIAB
mailing list