[IIAB] [KIWIX][INTERNET-IN-A-BOX] Nice project!

Tue Mar 26 13:56:07 PDT 2013

Hi Emmanuel,

I'm preparing a dump of Gutenberg as EPUB and converted HTML files 
that you could use to build a ZIM (I may or may not get it done this week, 
however).

How should I handle metadata so that you can create a ZIM?  Title, author, 
illustrator, year, language, etc.?  Is it possible to search on those fields?

By default the Gutenberg files do not have descriptive file names 
(pg30532-image.epub, for example).  The metadata is stored in the obtuse 
catalog.rdf XML file, indexed by the id number of the text.  We have all 
of this parsed and broken out into a SQLite database for our own use.
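To make the metadata question concrete, here is a minimal Python sketch of loading catalog records into SQLite, searching on a field, and deriving readable file names. It assumes a simplified, hypothetical catalog fragment: the namespace URIs and element names (`pgterms:etext`, `dc:title`, `dc:creator`, `dc:language`) are modelled loosely on the old catalog.rdf layout, and the real file nests some values more deeply. The `books` table and the `load_catalog`, `search` and `friendly_name` helpers are illustrative only, not our actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespaces ASSUMED for this sketch (modelled loosely on the old catalog.rdf).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pgterms": "http://www.gutenberg.org/rdfterms/",
}

# Hypothetical, simplified catalog fragment; the real file nests some
# values (e.g. language) more deeply than this.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:pgterms="http://www.gutenberg.org/rdfterms/">
  <pgterms:etext rdf:ID="etext30532">
    <dc:title>Example Title</dc:title>
    <dc:creator>Example Author</dc:creator>
    <dc:language>en</dc:language>
  </pgterms:etext>
</rdf:RDF>"""

RDF_ID = "{%s}ID" % NS["rdf"]


def load_catalog(xml_text, conn):
    """Parse etext records and store them in a hypothetical books table keyed by id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "id INTEGER PRIMARY KEY, title TEXT, author TEXT, language TEXT)"
    )
    root = ET.fromstring(xml_text)
    for etext in root.findall("pgterms:etext", NS):
        # rdf:ID looks like "etext30532"; keep only the numeric part.
        book_id = int(etext.get(RDF_ID).removeprefix("etext"))
        conn.execute(
            "INSERT OR REPLACE INTO books VALUES (?, ?, ?, ?)",
            (
                book_id,
                etext.findtext("dc:title", default="", namespaces=NS),
                etext.findtext("dc:creator", default="", namespaces=NS),
                etext.findtext("dc:language", default="", namespaces=NS),
            ),
        )
    conn.commit()


def search(conn, field, value):
    """Substring search on one whitelisted column (LIKE is ASCII case-insensitive)."""
    if field not in {"title", "author", "language"}:
        raise ValueError(f"unsupported field: {field}")
    rows = conn.execute(
        f"SELECT id FROM books WHERE {field} LIKE ? ORDER BY id", (f"%{value}%",)
    )
    return [row[0] for row in rows]


def friendly_name(conn, book_id):
    """Map an id to a readable name instead of pg<id>-image.epub (no sanitizing)."""
    title, author = conn.execute(
        "SELECT title, author FROM books WHERE id = ?", (book_id,)
    ).fetchone()
    return f"{author} - {title}.epub"
```

For the sample above, `friendly_name(conn, 30532)` gives `"Example Author - Example Title.epub"`. A real version would also need to strip characters such as `/` from titles before using them as file names.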

thanks,
braddock

On 03/26/2013 10:22 AM, Emmanuel Engelhart wrote:
> Hi Braddock
>
> On 03/26/2013 03:01 PM, Braddock wrote:
>> Hey, great! I've been meaning to reach out to you Kiwix guys. I think we
>> chatted briefly on IRC a couple of weeks ago (as "dock").
>
> Oh! Sorry, I didn't make the connection!
>
>> I have been particularly interested in making a ZIM of Gutenberg. We
>> have our own early-stage Gutenberg browser now, but ZIM and Kiwix just
>> seem like the natural place for it, and I'm sure it would be useful to
>> others.
>
> Great! Let me know as soon as you have something that a normal user 
> can use, and I will build the corresponding ZIM file ASAP.
>
>> We are in a mad rush right now because Tony Anderson of OLPC wants to
>> take what we have to Africa in less than two weeks. He has been using
>> Kiwix for OLPC server deployments in Africa and Nepal, as you may know.
>
> That's great, I met Tony a couple of times at LinuxTag. Nice to see 
> he finally managed to run kiwix-serve on the OLPC server (last time we 
> had some difficulties with a really old version of the OLPC 
> server).
>
>> The Gutenberg project is a real mess. But they already have scripts to
>> generate EPUB output for all their books, and I thought the
>> simplest path to a Gutenberg ZIM would be to use Calibre to convert the
>> EPUBs back into HTML, with scaled images, and put that in a ZIM.
>
> Yes, this is certainly a good way... or you could integrate a JS EPUB 
> reader like "Monocle". Two weeks ago I made a first ZIM file of 600 
> books from Wikisource with both formats (HTML and EPUB) inside. I think 
> this is the best approach, although it duplicates your data volume.
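Since an EPUB is already a ZIP archive of XHTML files plus an OPF manifest, the HTML side of such a combined ZIM can in principle be pulled straight out of the EPUB. The Python sketch below does that with only the standard library. It is a swapped-in alternative to the Calibre round-trip, not the pipeline discussed in this thread, and the helper name `xhtml_chapters` is hypothetical. It deliberately skips image scaling, URL-encoded hrefs, and reading order (the OPF `spine`).

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def xhtml_chapters(epub_bytes):
    """Return {path: xhtml_text} for every XHTML item listed in the OPF manifest."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        # META-INF/container.xml points at the OPF package document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        # Manifest hrefs are relative to the OPF file's directory.
        base = posixpath.dirname(opf_path)
        chapters = {}
        for item in opf.findall("opf:manifest/opf:item", OPF_NS):
            if item.get("media-type") == "application/xhtml+xml":
                path = posixpath.join(base, item.get("href"))
                chapters[path] = zf.read(path).decode("utf-8")
        return chapters
```

Stylesheets and images (e.g. `text/css`, `image/jpeg` items) are ignored here; a real conversion would copy them alongside the XHTML, scaling the images first.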
>
>> However, I haven't been able to find any documentation on creating ZIM
>> files from HTML. Do you have a starting place for me?
>
> This is a mess too... But if you give me a directory with the content, I 
> will download it and build the corresponding ZIM file. The best we 
> currently have is a VM with the necessary tools preinstalled:
> http://download.kiwix.org/dev/ZIMmakerVMv3.ova
>
> We are really trying to fix this problem, and I'm pretty confident we 
> will manage to do so soon.
>
>> Another thing we really need is combined search across all document ZIMs
>> under Kiwix control, with language filters. This is perhaps something we
>> could contribute back to Kiwix if it doesn't already exist.
>
> Yes, this is an open feature request on our side... One of the problems 
> with Kiwix is that the software is currently not designed to deal with 
> many contents at the same time.
>
> For kiwix-serve this should be easier to implement: it could simply 
> iterate over all the indexes (we have one search index per ZIM file).
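A minimal Python sketch of the iterate-over-all-indexes approach: query each per-ZIM index, skip ZIM files whose language was not requested, and merge the hits by score. `ZimIndex` and `combined_search` are hypothetical names, and `ZimIndex` is an in-memory stand-in with a naive term-count score; a real implementation would query each ZIM file's own search index through Kiwix instead.

```python
from dataclasses import dataclass, field


@dataclass
class ZimIndex:
    """Hypothetical stand-in for one ZIM file's full-text search index."""
    name: str
    language: str
    docs: dict = field(default_factory=dict)  # title -> body text

    def search(self, query):
        """Yield (score, title); score = total occurrences of the query terms."""
        terms = query.lower().split()
        for title, body in self.docs.items():
            words = body.lower().split()
            score = sum(words.count(t) for t in terms)
            if score:
                yield score, title


def combined_search(indexes, query, languages=None, limit=10):
    """Search every per-ZIM index, filter whole ZIMs by language, merge by score."""
    hits = []
    for idx in indexes:
        if languages and idx.language not in languages:
            continue  # language filter applied per ZIM file
        for score, title in idx.search(query):
            hits.append((score, idx.name, title))
    # Highest score first; ties ordered by ZIM name, then title.
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))
    return hits[:limit]
```

Raw scores from separate indexes are not necessarily comparable, so a real merge would likely need some per-index normalisation; filtering at the ZIM level also assumes each ZIM carries a single language tag.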
>
>> I haven't dug deep into the Kiwix code yet, but will soon.
>>
>> I am CC'ing our mailing list (there are about six of us). You are also
>> welcome to join.
>> http://sgvhak.org/mailman/listinfo/iiab
>
> I have joined.
>
> Kind regards
> Emmanuel