[IIAB] Gutenberg epubs and html

Emmanuel Engelhart kelson at kiwix.org
Sun Jun 30 05:32:25 PDT 2013


Dear Braddock

I'm more interested in gutenberg-htmlz-images than in gutenberg-epub.
The reason is that ZIM files should be directly usable without using an
additional EPUB reader.
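
To illustrate why the zipped-html form is convenient here, below is a
minimal Python sketch that pulls the main page out of an htmlz archive
using only the standard library. It assumes the archive is a plain ZIP
whose main page is a member named "index.html"; that member name, and the
helper names read_htmlz_index and build_sample_htmlz, are assumptions made
for the example and are not taken from Braddock's collections.

```python
import io
import zipfile


def read_htmlz_index(source, index_name="index.html"):
    """Return the main HTML page of an htmlz (zipped html) archive as text.

    `source` is a path or a file-like object. `index_name` is an assumed
    member name; if it is missing, the first HTML member (sorted by name)
    is used instead.
    """
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        if index_name not in names:
            html_members = sorted(
                n for n in names if n.lower().endswith((".html", ".htm"))
            )
            if not html_members:
                raise ValueError("no HTML member found in archive")
            index_name = html_members[0]
        # Decode leniently, since the encoding of a given book is not known here.
        return archive.read(index_name).decode("utf-8", errors="replace")


def build_sample_htmlz():
    """Build a tiny in-memory archive so the sketch can run without real data."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("index.html", "<html><body><h1>Sample</h1></body></html>")
        archive.writestr("images/cover.png", b"\x89PNG")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(read_htmlz_index(build_sample_htmlz()))
```

The extracted HTML and image members could then be handed to whatever
tool builds the ZIM file; that step is not shown here.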

The best case would be to have both in one ZIM file (I have no big
problem with the redundancy) to give the choice to the reader.

Do you think you can put this file online instead of the epub one? If you
need some place to host it, I can probably help you.

Kind regards
Emmanuel

Thank you for your feedback.
On 29/06/2013 23:55, Braddock wrote:
> Hi Emmanuel,
> I am seeding a torrent of all the gutenberg epubs without images.
> 
> http://braddock.com/~braddock/expire/gutenberg-epub-201306.torrent
> 
> I have very limited upload bandwidth.  Let's start with the epubs
> without images and see how it goes uploading across my DSL.  I've also
> got epubs with images, html with and without images.
> 
> Torrent size is about 7 GB.
> 
> -Braddock
> 
> 
> On 06/21/2013 02:26 PM, Emmanuel Engelhart wrote:
>> Hi Braddock
> 
>> On 21/06/2013 22:04, Braddock wrote:
>>> I finally got around to distilling the Gutenberg collection.
>>> I've extracted all 40,000 books in epub format, and also
>>> converted them to zipped html ("htmlz").  I've made collections
>>> both with and without images.
>>>
>>> The result is four collections (sizes in gigabytes):
>>>
>>>   6.9G   gutenberg-htmlz
>>>   23G    gutenberg-htmlz-images
>>>   7.0G   gutenberg-epub
>>>   23G    gutenberg-epub-images
>>>
>>> There is no metadata in these collections, just the books.  We
>>> could generate some meta using our database.  I'm not sure what
>>> you would need to make a usable zim.
>>>
>>> We are probably just going to keep them in this format (instead
>>> of zimmifying them all) for the Internet-in-a-Box.
>>>
>>> I can make these available via torrents.
> 
>> This is really great news! I cannot wait to see what it looks
>> like. So, yes, a torrent URL would be perfect for the download.
> 
>>> Other news, I wrote a from-scratch pure python ZIM file reader
>>> I'm calling "zimpy". 
>>> https://github.com/braddockcg/internet-in-a-box/blob/master/iiab/zimpy.py
>>>
>>> I'm now using the zimpy code for reading zims for Internet-in-a-Box, and
>>> if it gets a bit more mature I'll release it as a separate
>>> project.  It doesn't currently do anything more than I need.  The
>>> existing openzim bindings did not support any read capability.
> 
>> It is great to see you were able to make your own ZIM reader. I
>> have added it to our list: 
>> http://www.openzim.org/wiki/Readers#Without_user_interface_.2F_Console
> 
>>  Regards Emmanuel
> 
> 
> 

-- 
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication


