[IIAB] Large indices on 32-bit machines

Braddock braddock at braddock.com
Mon Sep 30 13:35:35 PDT 2013


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Whoosh community,

We are trying to build large Whoosh indices (all of Wikipedia) and use
them on 32-bit machines.

Currently, we get an mmap2 error on 32-bit machines when we attempt to
search, because Whoosh tries to mmap the entire index file, which hits
the signed 32-bit address space limit for the process.  Our index
files are between 2GB and 8GB.
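
To illustrate the difference between mapping a whole file and mapping
only part of it, here is a minimal Python sketch that maps a small
window around the bytes it needs.  The helper name read_range is
hypothetical and this is not how Whoosh reads its compound file; it
only shows that mmap accepts a length and an offset, so a single
mapping does not have to cover a multi-gigabyte file.  It assumes the
interpreter was built with large file support, so that offsets past
2GB are accepted.

```python
import mmap
import os


def read_range(path, start, size):
    """Return `size` bytes starting at byte `start` of the file at `path`.

    Hypothetical helper for illustration: only a window of the file is
    mapped, so the mapping stays small even if the file is several GB.
    """
    # mmap offsets must be a multiple of the allocation granularity,
    # so round the start down and remember how far we rounded.
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (start // gran) * gran
    delta = start - aligned

    with open(path, "rb") as f:
        file_size = os.fstat(f.fileno()).st_size
        if start < 0 or size < 0 or start + size > file_size:
            raise ValueError("requested range lies outside the file")
        if size == 0:
            return b""  # mmap rejects a zero-length window at an offset

        # Map only delta + size bytes instead of the whole file.
        window = mmap.mmap(f.fileno(), delta + size,
                           access=mmap.ACCESS_READ, offset=aligned)
        try:
            return window[delta:delta + size]
        finally:
            window.close()
```

Each call maps and unmaps its own window, which keeps the example
short; real code would probably cache or reuse windows.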

The strace and stack trace of the error are below.

We generated the index files on a 64-bit machine.

Is it possible to use large indices on 32-bit machines?  Can they
somehow be split up?  Or are we doing something wrong?

This is for the Internet-in-a-Box project, if you are interested in
getting content to the developing world: http://internet-in-a-box.org

Thanks,
Braddock


strace:

18097 mmap2(NULL, 2136145874, PROT_READ, MAP_SHARED, 6, 0) = -1 ENOMEM (Cannot allocate memory)

The size 2136145874 is just shy of the signed 32-bit limit of
2147483648 (2^31), and I assume already-allocated memory puts the
process over the limit.
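
As a quick check of that arithmetic, the sketch below computes how
much room is left between the requested mapping and 2^31 bytes, and
shows one common way to detect a 32-bit Python build.  The function
names are hypothetical, and treating 2^31 as the ceiling is the
assumption made above; the space actually available to a 32-bit
process depends on the kernel and on what is already mapped.

```python
import sys

TWO_GIB = 2 ** 31  # 2147483648, the limit assumed in the message above


def is_32bit_python():
    """True when the interpreter uses 32-bit sizes (sys.maxsize is 2**31 - 1)."""
    return sys.maxsize <= 2 ** 31 - 1


def headroom_below_2gib(mapping_size):
    """Bytes left under 2**31 after a single mapping of `mapping_size` bytes."""
    return TWO_GIB - mapping_size


requested = 2136145874  # length argument from the strace line
print(headroom_below_2gib(requested))  # 11337774 bytes, roughly 10.8 MiB
```

So under that assumption, less than 11 MiB would remain for everything
else the process maps, which fits the ENOMEM result.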


Python stack trace:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1687, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1360, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1358, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1344, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/root/repo/iiab/zim_views.py", line 86, in search
    (pagination, suggestion) = paginated_search(index_dir, ["title", "content"], query, page)
  File "/root/repo/iiab/whoosh_search.py", line 61, in paginated_search
    with ix.searcher() as searcher:
  File "/usr/lib/python2.7/site-packages/whoosh/index.py", line 322, in searcher
    return Searcher(self.reader(), fromindex=self, **kwargs)
  File "/usr/lib/python2.7/site-packages/whoosh/filedb/fileindex.py", line 334, in reader
    info.generation, reuse=reuse)
  File "/usr/lib/python2.7/site-packages/whoosh/filedb/fileindex.py", line 315, in reader
    return segreader(segments[0])
  File "/usr/lib/python2.7/site-packages/whoosh/filedb/fileindex.py", line 310, in segreader
    generation=generation)
  File "/usr/lib/python2.7/site-packages/whoosh/filedb/filereading.py", line 70, in __init__
    self.files = OverlayStorage(segment.open_compound_file(storage),
  File "/usr/lib/python2.7/site-packages/whoosh/codec/base.py", line 656, in open_compound_file
    return CompoundStorage(storage, name)
  File "/usr/lib/python2.7/site-packages/whoosh/filedb/compound.py", line 58, in __init__
    self.source = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ)
error: [Errno 12] Cannot allocate memory
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJSSeCXAAoJEHWLR/DQzlZueN0H/1+kaoP15h2C2kLPQ0wDjSMI
rpV9a/I7JO8kABb5vmt8jl1OSnYl7HG527QtTLkIuA8WG7vxIfPk3VdDDBE7ltRU
s2eLMSGHhN0iUHnkww+PYwPD8F8fp+SphT0IkxP0Pf/1lhEpXh8X5KNYTMRXoNvn
yksq2ZofTNr8CcUQVSLRReEV6gso8R4PG7JqXZIH1b2aciP74BjGyCx3lBsI/mkz
ZBxDJUVVHYOHKMfdf6AppB+2vxJC+gAvJWiJ1bFXKOnOtdeV3m/gx+DF4L/Irj+T
78hFraaqhx8hWWtUf8ngvKBHCrOzxaI0TY+1zTdBtN+88ylwGtofAGo//IPfmyU=
=8s53
-----END PGP SIGNATURE-----


