[IIAB] [KIWIX][INTERNET-IN-A-BOX] Nice project!

Emmanuel Engelhart kelson at kiwix.org
Tue Mar 26 14:50:26 PDT 2013


Hi Braddock

On 03/26/2013 09:56 PM, Braddock wrote:
> I'm preparing to do a dump of gutenberg to epub and converted html files
> that you could use to build a ZIM. (may or may not get it done this week
> however)

On my side this is really not urgent! I don't want to put you under 
pressure :)

> How should I handle metadata for you to create a zim? Title, author,
> illustrator, year, language, etc? Is it possible to search on those fields?

I'm not sure I understand this question exactly; maybe we have a 
misunderstanding. My wish is to have one ZIM file with all the 
Project Gutenberg books, so I don't see a big problem with the ZIM file 
metadata. My proposal would be: publisher=yourprojectname, 
creator=Project Gutenberg. Language is a bigger challenge because you 
have books in many languages. Maybe it's better to make one ZIM file per 
language? With the Kiwix full-text search engine, it should be pretty 
easy to find stuff. Good index pages should also help; cf. my next 
comment.
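
To make the per-language idea concrete, here is a minimal sketch in 
Python. It assumes each book record carries an "id" and a "language" 
field; the function name plan_zim_files, the record layout and the 
sample records are all invented for illustration and are not part of 
any Kiwix or ZIM tool. It only plans the grouping and the metadata 
values suggested above; it does not write a ZIM file.

```python
from collections import defaultdict


def plan_zim_files(books, publisher="yourprojectname"):
    """Group book records by language and attach the proposed metadata.

    `books` is assumed to be an iterable of dicts with "id" and
    "language" keys (a hypothetical layout, not a real catalog schema).
    """
    by_language = defaultdict(list)
    for book in books:
        by_language[book["language"]].append(book["id"])

    # One planned ZIM file per language, with the metadata values
    # proposed in the message: publisher, creator, language.
    return {
        lang: {
            "publisher": publisher,
            "creator": "Project Gutenberg",
            "language": lang,
            "book_ids": sorted(ids),
        }
        for lang, ids in by_language.items()
    }


# Made-up sample records, for illustration only.
books = [
    {"id": 30532, "language": "en"},
    {"id": 1234, "language": "fr"},
    {"id": 11, "language": "en"},
]
plan = plan_zim_files(books)
```

Each entry in plan could then be handed to whatever tool actually 
builds the ZIM file for that language.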

> By default the gutenberg files do not have interesting file names
> (pg30532-image.epub for example). The meta data is stored in the obtuse
> catalog.rdf XML file, indexed by the id number of the text. We have all
> this parsed and broken out into a SQLite database for our own use.

This is great. The most important thing is to have a good HTML title tag 
and, if possible, a meta tag with good keywords. Do you plan to create 
custom HTML indexes from this data (like lists of books per author, 
language, title)?
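
As a rough sketch of what is meant by a good title tag, a keywords 
meta tag and a per-author index, something like the following Python 
could work. The record fields ("id", "title", "author", "language"), 
the function names render_head and author_index, the "<id>.html" link 
scheme and the sample record are all assumptions made for 
illustration; the real data would come from the SQLite database 
mentioned above.

```python
import html
from collections import defaultdict


def render_head(book):
    """Build a <head> with a descriptive title and a keywords meta tag."""
    title = html.escape(f'{book["title"]} by {book["author"]}')
    keywords = html.escape(
        ", ".join([book["title"], book["author"], book["language"]])
    )
    return (
        "<head>\n"
        f"  <title>{title}</title>\n"
        f'  <meta name="keywords" content="{keywords}">\n'
        "</head>"
    )


def author_index(books):
    """Build a nested HTML list: authors, then their books by title."""
    by_author = defaultdict(list)
    for book in books:
        by_author[book["author"]].append(book)

    lines = ["<ul>"]
    for author in sorted(by_author):
        lines.append(f"  <li>{html.escape(author)}<ul>")
        for book in sorted(by_author[author], key=lambda b: b["title"]):
            # The "<id>.html" link target is a hypothetical naming scheme.
            link = f'{book["id"]}.html'
            lines.append(
                f'    <li><a href="{link}">{html.escape(book["title"])}</a></li>'
            )
        lines.append("  </ul></li>")
    lines.append("</ul>")
    return "\n".join(lines)


# Made-up sample record, for illustration only.
books = [
    {"id": 1, "title": "Alice & Bob", "author": "A. Writer", "language": "en"},
]
head = render_head(books[0])
index = author_index(books)
```

The same grouping approach would work for indexes per language or per 
title initial by changing the key used in by_author.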

Kind regards
Emmanuel


