Pulling Java Lucene into Python.notes
Friday, March 25, 2005
TITLE OF PAPER: Pulling Java Lucene into Python
URL OF PRESENTATION: _URL_of_powerpoint_presentation_
PRESENTED BY: Andi Vajda
REPRESENTING: Open Source Applications Foundation
CONFERENCE: PyCon 2005
DATE: 3/25/2005
LOCATION: Grand Ballroom
--------------------------------------------------------------------------
REAL-TIME NOTES / ANNOTATIONS OF THE PAPER:
{If you've contributed, add your name, e-mail & URL at the bottom}
Why PyLucene? Needed a probabilistic search engine, and Lucene (written
in Java) is a good, open source one - but they were not willing to
use Java.
A SWIG interface file declares all the wrapped interfaces. gcj compiles
Java Lucene to native code, SWIG produces a giant C++ wrapper file, and
various magic ensues to create a Python extension.
[incomplete diagram]
Java Lucene --> gcj --lucene.o------> PyLucene_wrap.o
| |
| \/
\/ PyLucene.pyd
gcjh makes C++ headers
Impedance mismatches
SWIG thinks you're wrapping C++, but you're wrapping Java, so you have to
match the memory models. Once you pass an object from Java to Python, Java's
garbage collector no longer knows about the reference and may reclaim the
object; this has to be worked around. The same problem manifests when
returning an object via JNI from C++ to Java.
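A minimal sketch of the general workaround, in pure Python rather than
PyLucene's actual C++/SWIG code: the wrapper pins the foreign object in a
registry for as long as the other side holds it, so a collector cannot
reclaim it early. The names PinRegistry, Handle, pin, and unpin are
hypothetical and do not come from the talk.

```python
class PinRegistry:
    """Keeps objects alive while a foreign runtime still refers to them."""

    def __init__(self):
        # id(obj) -> [obj, pin_count]; storing obj here is what keeps it alive
        self._pinned = {}

    def pin(self, obj):
        entry = self._pinned.setdefault(id(obj), [obj, 0])
        entry[1] += 1
        return obj

    def unpin(self, obj):
        entry = self._pinned[id(obj)]
        entry[1] -= 1
        if entry[1] == 0:
            # Last foreign reference is gone; normal collection may proceed.
            del self._pinned[id(obj)]

    def is_pinned(self, obj):
        return id(obj) in self._pinned


class Handle:
    """Hypothetical wrapper: pins its target on creation, unpins on release."""

    def __init__(self, registry, target):
        self._registry = registry
        self._target = registry.pin(target)

    def release(self):
        # Safe to call more than once; only the first call unpins.
        if self._target is not None:
            self._registry.unpin(self._target)
            self._target = None
```

Two handles on the same object keep it pinned until both are released,
which is the behavior the wrapper needs at the language boundary.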
Thread/concurrency models differ. Python is more flexible. Java really
wants to control thread creation, so delegate it to Java.
He made a class called PythonThread that delegates thread management to
Java so it won't freak out.
This changed PyLucene from being kinda crashy to very stable.
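A rough illustration of why a dedicated thread class helps, using only the
standard library. This is not PyLucene's PythonThread: FakeRuntime and
ManagedThread are hypothetical stand-ins, and the sketch models the simpler
idea of registering each thread with the foreign runtime, not actual thread
creation by Java.

```python
import threading


class FakeRuntime:
    """Stand-in for a runtime that must know every thread that calls it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._attached = set()

    def attach_current_thread(self):
        with self._lock:
            self._attached.add(threading.get_ident())

    def detach_current_thread(self):
        with self._lock:
            self._attached.discard(threading.get_ident())

    def call(self, value):
        # Refuse calls from threads the runtime was never told about.
        with self._lock:
            if threading.get_ident() not in self._attached:
                raise RuntimeError("thread unknown to runtime")
        return value * 2


class ManagedThread(threading.Thread):
    """Thread that registers itself with the runtime around its body."""

    def __init__(self, runtime, **kwargs):
        super().__init__(**kwargs)
        self._runtime = runtime

    def run(self):
        self._runtime.attach_current_thread()
        try:
            super().run()  # runs the target passed to the constructor
        finally:
            self._runtime.detach_current_thread()
```

Code that uses ManagedThread everywhere never hits the "unknown thread"
failure, while a plain threading.Thread calling into the runtime does.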
Extending Java classes from Python
He made SWIG recognize that Python class X implements a protocol that lets
it be wrapped by a Java class. It's SWIG in reverse: GIWS! ;-) The Java
class then delegates to the Python methods by calling them via the JNI
and Python's C interface.
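The delegation pattern described above can be sketched in pure Python.
AnalyzerAdapter stands in for the Java-side wrapper class and is
hypothetical, as is the simplified tokenStream signature (it takes a string
here, not a reader); the real mechanism goes through SWIG, the JNI, and
Python's C API.

```python
class ProtocolError(TypeError):
    """Raised when a Python object lacks a method the wrapper needs."""


class AnalyzerAdapter:
    """Stand-in for the Java-side wrapper; delegates to a Python object."""

    REQUIRED = ("tokenStream",)

    def __init__(self, py_obj):
        # Check the protocol up front: the object only needs the right
        # methods, it does not need to inherit from anything.
        for name in self.REQUIRED:
            if not callable(getattr(py_obj, name, None)):
                raise ProtocolError("missing method: " + name)
        self._py_obj = py_obj

    def tokenStream(self, field_name, text):
        # The "Java" method simply forwards to the Python implementation.
        return self._py_obj.tokenStream(field_name, text)


class WhitespaceAnalyzer:
    """Plain Python class that satisfies the protocol by shape alone."""

    def tokenStream(self, field_name, text):
        return text.lower().split()
```

Wrapping WhitespaceAnalyzer() in AnalyzerAdapter lets code written against
the wrapper call the Python method, and an object without tokenStream is
rejected when it is wrapped instead of failing later in a callback.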
Supporting downcasting
[SK Note: See Michel Salib's presentation "Indexing the US Patent DB with
Python and Xapian" yesterday. They tried to use Lucene (and PyLucene) first,
but had a lot of trouble with builds in their very multi-platform
environment. Then they discovered Xapian, which was much easier to
wrap and build. They thought Xapian provided faster queries and better
precision/recall. It comes out of Cambridge University (UK).
They will be releasing their pyXapWrap (or something like that) project
to open source.]
Cross-language error reporting
Looks like they wrapped it in both directions. I'm not sure, but it looks
like Python exceptions are wrapped into Java exceptions and vice versa.
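If that tentative reading is right, the two-way wrapping could look roughly
like the sketch below. Everything here is an assumption made for
illustration: SimulatedJavaException stands in for a Java Throwable, and
JavaError, call_java, and call_python are hypothetical names for the two
boundary crossings.

```python
class JavaError(Exception):
    """Python-side wrapper around an exception from the simulated Java side."""

    def __init__(self, java_exception):
        super().__init__(str(java_exception))
        self.java_exception = java_exception


class SimulatedJavaException(Exception):
    """Stand-in for a Java Throwable; may carry a wrapped Python exception."""

    def __init__(self, message, python_cause=None):
        super().__init__(message)
        self.python_cause = python_cause


def call_java(func, *args):
    """Python -> Java boundary: foreign exceptions surface as JavaError."""
    try:
        return func(*args)
    except SimulatedJavaException as exc:
        if exc.python_cause is not None:
            # The error started in a Python callback; hand back the original.
            raise exc.python_cause
        raise JavaError(exc)


def call_python(func, *args):
    """Java -> Python boundary: Python exceptions become simulated Java ones."""
    try:
        return func(*args)
    except JavaError as exc:
        # It was Java's own exception all along; unwrap it.
        raise exc.java_exception
    except Exception as exc:
        raise SimulatedJavaException(repr(exc), python_cause=exc)
```

The point of carrying the original exception across the boundary is that an
error raised in a Python callback, passed through Java code, arrives back in
Python as the same exception type the callback raised.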
Samples
Future work
Pulling other libraries into Python in the same way
800 line makefile!
PyLucene may soon end up under the Apache Software Foundation, along with
Lucene itself. (Yay!)
--------------------------------------------------------------------------
REFERENCES: {as documents / sites are referenced add them below}
There's a book about Java Lucene.
Jakarta Lucene project: http://lucene.apache.org/java/docs/
PyLucene project: http://pylucene.osafoundation.org/
--------------------------------------------------------------------------
QUOTES:
--------------------------------------------------------------------------
CONTRIBUTORS: {add your name, e-mail address and URL below}
Erik Rose <corp@grinchcentral.com>
Sally Kleinfeldt <skleinfeldt@tnc.org>
--------------------------------------------------------------------------
E-MAIL BOUNCEBACK: {add your e-mail address separated by commas }
--------------------------------------------------------------------------
Copyright shared between all the participants unless otherwise stated...