[IIAB] gutenberg cached files
Braddock
braddock at braddock.com
Fri Mar 1 09:07:42 PST 2013
Hi Joel,
Those files do not exist because I had rsync exclude them.
Gutenberg has an ENORMOUS quantity of duplicated content in various
forms (including MP3s of speech synthesizers reading books!). It is as if
they were intentionally trying to bloat the archive. It is a mess. My
assumption was that we would stick to text or HTML formats.
-braddock
On 03/01/2013 09:02 AM, Joel Steres wrote:
> Greetings,
>
> The Gutenberg project references a lot of "generated" files which are
> not in the ibiblio mirror we use. The mirror just contains a broken
> symlink at cache/generated. Much of it appears to be various ebook
> formats. I will continue based on the assumption that cached content
> is unavailable and so I will filter it out. If we should try to
> obtain/include the cached content let me know.
>
> Joel
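
The filtering discussed in this thread could be sketched roughly as below. This is only an illustration, not the code actually used for IIAB: the function names `is_wanted` and `filter_refs`, the list of kept suffixes, and the sample paths are all assumptions. The only detail taken from the thread is that generated files sit under cache/generated and that the intent is to keep text or HTML formats.

```python
from pathlib import PurePosixPath

# Assumption: these suffixes stand in for "text or HTML formats".
KEPT_SUFFIXES = {".txt", ".htm", ".html"}


def is_wanted(ref: str) -> bool:
    """Return True if a referenced file should be kept (hypothetical helper)."""
    path = PurePosixPath(ref.lstrip("/"))
    # Skip anything under cache/generated, which is only a broken
    # symlink in the mirror described in this thread.
    if path.parts[:2] == ("cache", "generated"):
        return False
    # Keep only text and HTML files; drop audio and other ebook formats.
    return path.suffix.lower() in KEPT_SUFFIXES


def filter_refs(refs):
    """Return the references that pass is_wanted, preserving order."""
    return [ref for ref in refs if is_wanted(ref)]


if __name__ == "__main__":
    # Hypothetical paths, made up for illustration only.
    sample = [
        "cache/generated/1342/pg1342.epub",
        "1/3/4/1342/1342.txt",
        "1/3/4/1342/1342-h/1342-h.htm",
        "1/3/4/1342/1342-m/1342-m-001.mp3",
    ]
    for ref in filter_refs(sample):
        print(ref)
```

Filtering on the path prefix first means a text file that happens to live under cache/generated is still dropped, which matches the assumption that cached content is unavailable.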