Ted Leung on the air
Ted Leung on the air: Open Source, Java, Python, and ...
Wed, 23 Jun 2004
pylucene is a separate project
Today OSAF is breaking out pylucene as a separate project. Here is the announcement. We covered this today during our IRC office hours. For those of you who missed the session, here is the IRC log for today's discussion.
The pylucene project page has the usual information, including a Subversion repository, mailing list info, and a bugzilla component. We're experimenting with the use of Subversion as a version control system.
A lot of people have asked for pylucene as a separate project. I hope that this will be useful to folks out there in the broader Python community. I also hope that some folks will get involved in helping to improve pylucene. There are a bunch of things that need doing, a distutils based install being high on my list.
From my perspective, this is a good case study in how to leverage other projects. We were interested in full-text indexing from Python. We had a few options: Managing Gigabytes, Lupy (a pure Python port of Lucene), CLucene (a C++ port of Lucene), and the original Java Lucene. Comparing Managing Gigabytes with the Lucene-related libraries, most of the Lucene ports are being actively maintained, while the Managing Gigabytes software hasn't been updated since 1999. That leaves you with a Lucene variant. Lupy is much slower than either Lucene or CLucene, so that rules out Lupy. Only Lucene and CLucene are left.
Coming from Python, it would seem that CLucene is the obvious choice since it's written in C++. The problem is that it's a port, so you are always trailing Lucene. Further, the real search engine expertise lies with Doug Cutting, who is working on Lucene. So if you have a search-related problem in CLucene, it might have to go all the way back to Lucene anyway. The biggest problem with Lucene is that it's written in Java, and the last thing we want in Chandler is to have people also install Java.
Enter gcj. Thanks to the hard work of the gcj team, you can compile Lucene into native code, which can then be wrapped with SWIG (just as CLucene needs to be for Python). As both gcj and gcc get better, pylucene will benefit (of course, CLucene would also benefit from improvements to gcc).
[18:02] | [computers/open_source/osaf] | 36 Comments
I just had my own personal Lucene conversion this week, so now I'm really excited to know it'll be waiting for me in any pythony stuff I need to do. And gcj really seems to be breaking out as a way to take the fabulous library work that has been going on in the Java community and spread it to other enclaves. Great news!
Posted by Wilhelm at Wed Jun 23 20:23:48 2004
Ted Leung has an interesting announcement: pylucene is a separate project. Here is the press release. Here is the Project Page. This is very interesting both for how it is done and because it is going to be so useful.
Posted by Trackback from 42 at Wed Jun 23 20:23:54 2004
I find myself feeling a little conflicted about this announcement. As much as I'm very pleased to hear about pylucene as a separate project, I'm also a bit disappointed that someone didn't just take over the mg code base and write a Python wrapper. It's an academic project and I doubt Witten, Moffat and Bell would object. I've read all of its source code -- it's pretty straightforward and simple.
Ted, you say mg was rejected because it hasn't been updated since 1999. Maybe that's because it just works. It certainly works for me and has for years. And being C, it burns rubber on inverted files. Who wants to wait for long periods for indexing to finish?
But oh well. Thanks for pylucene!
Posted by patrick at Wed Jun 23 23:16:57 2004
There's also the train of thought that lucene is becoming the de-facto open source search engine api (source)
So pylucene developers will be able to utilize the expertise of existing lucene developers.
Posted by Darryl at Thu Jun 24 05:38:05 2004
I am puzzled by this comment: "The biggest problem with Lucene is that it's written in Java, and the last thing we want in Chandler is to have people also install Java."
Why not? What is the big deal with installing Java anyway? Everybody is very fancy when it comes to, let's say, updating your desktop environment, or kernel, or web browser; is it a big trouble to install a JRE? Sorry, but I find this comment shameful for a Java developer.
regards.
Ahmet
Posted by aaa at Thu Jun 24 06:34:11 2004
It's official. The OSAF just announced that Pylucene has been packaged as a separate project, which can be found here: pylucene.osafoundation.org. It's a neat project building on some other cool open source projects.
Posted by Trackback from randomthoughts at Thu Jun 24 06:48:13 2004
Ahmet,
Programming languages are tools not religions.
Posted by Ted Leung at Thu Jun 24 21:57:22 2004
CLucene-well,PyLucene-? What do You think about VBLucene (Visual Basic)? :-)
Posted by Vladimir at Fri Jun 25 01:14:40 2004
Ted, I feel the same about languages (yet I prefer Java in most cases, and specifically do not like the platform dependencies of some other languages), but I criticized your comment because its reasoning is wrong.
People download the latest JVM if you make software they like. Look at Azureus, a Java BitTorrent client: it requires JVM 1.4.2 and it was downloaded 1 million times in one week. Nobody complained "ah.. but but i have to download 10MB JRE for that.. forget it". Even novice people were trying the JRE 1.5 betas if it made a difference.
Plus, Lucene is mostly a server-side component, and please forgive my ignorance, but scripting languages are not really cut out for speed.
Posted by ahmet at Fri Jun 25 04:26:58 2004
Ted - good news. I am writing a chapter about Lucene Ports for Lucene in Action (...) in the other window, and I'm about to add a piece about PyLucene. Welcome to the Lucene Ports Club! :)
Posted by Otis at Fri Jun 25 06:47:34 2004
Moving target. Typically a very fast one. Case in point: I'm writing a chapter about Lucene Ports (there is a whole army of them!) for Lucene in Action, and just the other day I heard of a new Lucene port: PyLucene. By the way, this is a second L
Posted by Trackback from Otis' Chichi Michi at Fri Jun 25 07:16:39 2004
Ahmet,
Our app, Chandler, is already written in Python. We felt that including a JRE and doing all the work of building a Python/Java bridge ourselves was more work than creating PyLucene. Your original comment was "I find this comment shameful for a java developer". I am a Lisp developer, a Python developer, a Java developer, and have been a C/C++ developer. I use the tool at hand. For Chandler, the tools of choice are Python and C/C++ libraries wrapped with Python.
Posted by Ted Leung at Fri Jun 25 11:03:35 2004