[IIAB] gutenberg files

Braddock braddock at braddock.com
Thu Mar 14 16:07:51 PDT 2013


Hi Joel,

The target I have in mind is a user on a cheap tablet or OLPC laptop 
with only a basic web browser, maybe even a Wi-Fi-enabled phone.  Perhaps 
without even JavaScript, so we should degrade gracefully (as jQuery Mobile 
does).

I don't want to assume any e-book reader software.  For a first pass we 
should provide the ebooks in plain HTML or text.  The ebook formats 
may be of interest if we can convert them to HTML with images (which 
does not seem to be available in the cache). Obviously we could add 
e-book formats later fairly easily.
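
As a rough sketch of that first pass (an illustration only, not existing 
IIAB code), the browser-friendly files could be picked out of the cache by 
MIME type. The function names and the sample paths are hypothetical; the 
MIME types are the ones from Joel's list of cache contents.

```python
# Hypothetical sketch: choose which cached Gutenberg files a basic
# browser can display without any e-book reader software.

# MIME types a plain web browser renders directly (taken from the
# list of formats found in the indexed cache).
BROWSER_FRIENDLY = {"text/html", "text/plain", "image/jpeg"}


def is_browser_friendly(mime_type):
    """Return True if a basic browser can show this MIME type."""
    # Drop parameters such as '; charset="utf-8"' before comparing.
    base_type = mime_type.split(";", 1)[0].strip().lower()
    return base_type in BROWSER_FRIENDLY


def select_first_pass(files):
    """Keep only the (path, mime_type) pairs suitable for the first pass."""
    return [(path, mime) for path, mime in files if is_browser_friendly(mime)]


if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = [
        ("example/book.html", "text/html"),
        ("example/book.txt", 'text/plain; charset="utf-8"'),
        ("example/book.epub", "application/epub+zip"),
        ("example/cover.jpg", "image/jpeg"),
    ]
    for path, mime in select_first_pass(sample):
        print(path, mime)
```

The e-book types are simply left out of the set for now, so adding them 
later would be a one-line change.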

-braddock


On 03/14/2013 02:04 PM, Joel Steres wrote:
> Hi Braddock,
>
> What formats we need is a question/discussion about the vision for IIAB.
> The indexed cache contents consist of 243k files in the following
> formats.
>
> application/epub+zip
> application/pdf
> application/prs.plucker
> application/x-mobipocket-ebook
> application/x-qioo-ebook
> image/jpeg
> text/html
> text/plain
> text/plain; charset="utf-8"
>
> (I have not looked but it is possible that some cached content is
> referenced by non-cached files perhaps not represented above.)
>
> Who is the user?  I don't have good insight into the intended user
> base.  Might ebooks be valuable to them?  I think the generated text,
> HTML and probably images are an obvious choice for inclusion.  Ebook
> formats seem kind of cool, but I can't assess their value without
> knowing more about the audience.
>
> -Joel
>
>
> On Thu, Mar 14, 2013 at 1:37 PM, Braddock <braddock at braddock.com> wrote:
>> Hi Joel,
>>
>> So is cache/generated something you want or need?  If so I'll complete the
>> mirror (ibiblio kicks my script off for a while every few gigabytes so it
>> might easily take a week).
>>
>> It will be much much faster if we don't need to download the .epub and .mobi
>> files (which contain images).
>>
>> -braddock
>>
>>
>> On 03/13/2013 10:46 PM, Joel Steres wrote:
>>> The gutenberg db and index have been removed from the repository.
>>>
>>> On closer inspection it does look like the cache/generated content is
>>> the content the index shows under cache/epub.  The problems I
>>> encountered are probably just due to the incomplete mirroring.
>>>
>>> Hope you're doing well.
>>>
>>> -Joel
>>>
>>>
>>> On Tue, Mar 12, 2013 at 8:04 PM, Braddock <braddock at braddock.com> wrote:
>>>> Hi Joel,
>>>>
>>>> Thanks for your activity.  I haven't been able to keep up completely
>>>> over the last few days.
>>>>
>>>> I mirrored some of cache/generated to another server using:
>>>> rsync -avHS --delete --delete-after ftp.ibiblio.org::gutenberg-epub generated
>>>>
>>>> I've copied that incomplete download (only 5.7 GB) to zhen in
>>>> /knowledge/data/gutenberg/cached now.
>>>>
>>>> If you want a symlink from within static/ that is fine with me.
>>>>
>>>> I've seen no sign of a cache/epub/ directory.
>>>>
>>>> I've been trying to keep the path /knowledge universal across devices
>>>> (zhen, the Satellite, the GoFlex Home, and my personal server) so links
>>>> into it should work anywhere.
>>>>
>>>> On a side note, the 100MB gutenberg.db should probably not be in the git
>>>> repo.  I'd prefer if it lived under /knowledge/processed/, which is where
>>>> I'm keeping all processed data.
>>>>
>>>> I hope to have some time to get back into IIAB in the next couple days.
>>>> We had the funeral today, so things should begin to return to normal.
>>>>
>>>> -braddock
>>>>
>>>>
>>>>
>>>>
>>>> On 03/12/2013 09:58 AM, Joel Steres wrote:
>>>>> Hi Braddock,
>>>>>
>>>>>> I am also mirroring cache/generated - the gutenberg mirrors seem to
>>>>>> block access to it via ftp etc, but I can get it via rsync. Maybe
>>>>>> those files will be more consistent.
>>>>> Thanks for mirroring cache/generated. In the current catalog all files
>>>>> referencing 'cache' point to cache/epub/... rather than
>>>>> cache/generated/ and the contents of the two paths differ. I looked at
>>>>> the rsync script from git but it does not seem to include the addition
>>>>> for gutenberg.org/cache mirroring.  Could you either make the
>>>>> adjustment or show me where to do so?
>>>>>
>>>>> Also, I found that html files include images.  It might be easier to
>>>>> put the gutenberg files into the flask static directory and permit the
>>>>> existing paths to work.  No objections if I symlink to
>>>>> /knowledge/data/gutenberg/gutenberg/ from iiab/static/gutenberg/data/?
>>>>>
>>>>> -Joel
>>>>
>>>>
>>
