[IIAB] gutenberg cached files

Braddock braddock at braddock.com
Fri Mar 1 09:07:42 PST 2013


Hi Joel,

Those files do not exist because I had rsync exclude them.

Gutenberg has an ENORMOUS quantity of duplicated content in various 
forms (including MP3s of speech synthesizers reading books!). It is as if 
they were intentionally trying to bloat the archive. It is a mess. My 
assumption was that we would stick to text or HTML formats.

-braddock

On 03/01/2013 09:02 AM, Joel Steres wrote:
> Greetings,
>
> The Gutenberg project references a lot of "generated" files which are
> not in the ibiblio mirror we use.  The mirror just contains a broken
> symlink at cache/generated.  Much of it appears to be various ebook
> formats.  I will continue based on the assumption that cached content
> is unavailable and so I will filter it out.  If we should try to
> obtain/include the cached content let me know.
>
> Joel
