[IIAB] gutenberg files

Braddock braddock at braddock.com
Thu Mar 14 13:37:15 PDT 2013


Hi Joel,

So is cache/generated something you want or need?  If so I'll complete 
the mirror (ibiblio kicks my script off for a while every few gigabytes 
so it might easily take a week).

It will be much, much faster if we don't need to download the .epub and 
.mobi files (which contain images).
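
For illustration, here is a minimal sketch of what a mirror run that skips those two formats and retries after being kicked off might look like. It reuses the rsync flags from the command quoted below and adds the standard --exclude option. The function name, the RSYNC override variable, the retry count and the wait time are assumptions made for this sketch, not part of any existing IIAB script.

```shell
#!/bin/sh
# Sketch only: mirror the gutenberg-epub rsync module without .epub/.mobi
# files, retrying when the server drops the connection.
# Assumed for illustration (not from the thread): the function name, the
# RSYNC override variable, the default retry count and the default wait time.

RSYNC="${RSYNC:-rsync}"

mirror_generated() {
    src="$1"                 # e.g. ftp.ibiblio.org::gutenberg-epub
    dest="$2"                # e.g. generated
    max_tries="${3:-50}"     # hypothetical default
    wait_secs="${4:-600}"    # hypothetical default

    try=1
    while [ "$try" -le "$max_tries" ]; do
        # Same flags as the original command, plus two exclude patterns.
        # Files already transferred are skipped on the next attempt, so a
        # retry continues roughly where the previous run stopped.
        if "$RSYNC" -avHS --delete --delete-after \
               --exclude='*.epub' --exclude='*.mobi' \
               "$src" "$dest"; then
            return 0
        fi
        echo "rsync attempt $try failed" >&2
        # Sleep between attempts, but not after the final one.
        if [ "$try" -lt "$max_tries" ]; then
            sleep "$wait_secs"
        fi
        try=$((try + 1))
    done
    return 1
}
```

It would be called as `mirror_generated ftp.ibiblio.org::gutenberg-epub generated`. Since --delete-excluded is not given, any .epub or .mobi files already in the destination are left in place rather than deleted.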

-braddock

On 03/13/2013 10:46 PM, Joel Steres wrote:
> The gutenberg db and index have been removed from the repository.
>
> On closer inspection it does look like the cache/generated content is
> the content the index shows under cache/epub.  The problems I
> encountered are probably just due to the incomplete mirroring.
>
> Hope you're doing well.
>
> -Joel
>
>
> On Tue, Mar 12, 2013 at 8:04 PM, Braddock <braddock at braddock.com> wrote:
>> Hi Joel,
>>
>> Thanks for your activity.  I haven't been able to keep up completely
>> over the last few days.
>>
>> I mirrored some of cache/generated to another server using:
>> rsync -avHS --delete --delete-after ftp.ibiblio.org::gutenberg-epub generated
>>
>> I've copied that incomplete download (only 5.7 GB) to zhen in
>> /knowledge/data/gutenberg/cached now.
>>
>> If you want a symlink from within static/ that is fine with me.
>>
>> I've seen no sign of a cache/epub/ directory.
>>
>> I've been trying to keep the path /knowledge universal across devices (zhen,
>> the Satellite, the GoFlex Home, and my personal server) so links into it
>> should work anywhere.
>>
>> On a side note, the 100MB gutenberg.db should probably not be in the git
>> repo.  I'd prefer if it lived under /knowledge/processed/, which is where
>> I'm keeping all processed data.
>>
>> I hope to have some time to get back into IIAB in the next couple days.  We
>> had the funeral today, so things should begin to return to normal.
>>
>> -braddock
>>
>>
>>
>>
>> On 03/12/2013 09:58 AM, Joel Steres wrote:
>>> Hi Braddock,
>>>
>>>> I am also mirroring cache/generated - the gutenberg mirrors seem to block
>>>> access to it via ftp etc, but I can get it via rsync. Maybe those files
>>>> will be more consistent.
>>> Thanks for mirroring cache/generated. In the current catalog all files
>>> referencing 'cache' point to cache/epub/... rather than
>>> cache/generated/ and the contents of the two paths differ. I looked at
>>> the rsync script from git but it does not seem to include the addition
>>> for gutenberg.org/cache mirroring.  Could you either make the
>>> adjustment or show me where to do so?
>>>
>>> Also, I found that html files include images.  It might be easier to
>>> put the gutenberg files into the flask static directory and permit the
>>> existing paths to work.  No objections if I symlink to
>>> /knowledge/data/gutenberg/gutenberg/ from iiab/static/gutenberg/data/?
>>>
>>> -Joel
>>
>>



