Ted Leung on the air
Ted Leung on the air: Open Source, Java, Python, and ...
Wed, 04 Feb 2004
Chandler happenings
Here are a few items from the Chandler repository that might be of interest. (There's lots happening in the CPIA and applications area as well, but I don't want to spoil their fun...)
[17:13] |
[computers/open_source/osaf/chandler] |
14 Comments |
- BerkeleyDB store for Lucene: It looks like we are going to use a natively compiled (gcj) version of Lucene to do free-text searching in Chandler. To do this, we needed to adapt Lucene to use a BerkeleyDB database as its store. Andi wrote this code and contributed it back to the Lucene project, which has taken it into the Lucene sandbox. There's also a Python layer that exposes the Lucene functionality to Python. People have wondered why we didn't just use Lupy. The problem with Lupy is that it doesn't perform as well as Lucene, and it's basically a port of Lucene anyway.
- Busy Developer's Guide to the Chandler Repository: This document is supposed to help people get started using the repository. If you read it and it doesn't help you, let us know.
- Proposal for Chandler Query System: This proposal is the start of our query facility, which will probably be consuming my working life for the next few months. If you're interested in this sort of thing, now is the time to drop in and make your opinions known. We got some really good feedback in today's IRC, so don't be shy. I'm not only looking for feedback, I'm also interested in finding people who'd like to work on building this. So please stop by dev@osafoundation.org or irc://irc.osafoundation.org:6667/chandler and say hi.
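To give a feel for what "using a database as a store" for a search index can involve, here is a minimal sketch. It is not Andi's code and not Lucene's API: `BlockStore` and `BLOCK_SIZE` are hypothetical names, and a plain Python dict stands in for a BerkeleyDB table. The idea shown is only that each index "file" can be split into fixed-size blocks and kept under keys in a key-value store instead of on the file system.

```python
# Illustrative sketch only: a dict stands in for a BerkeleyDB table, and
# BlockStore is a hypothetical name, not part of Lucene or Chandler.
BLOCK_SIZE = 4  # tiny on purpose so the example splits data into several blocks


class BlockStore:
    """Stores named 'files' as fixed-size blocks in a key-value mapping."""

    def __init__(self, table=None):
        self.table = {} if table is None else table

    def write_file(self, name, data):
        """Replace any existing file called name with the given bytes."""
        self.delete_file(name)
        for number, start in enumerate(range(0, len(data), BLOCK_SIZE)):
            self.table[(name, number)] = data[start:start + BLOCK_SIZE]
        self.table[(name, "length")] = len(data)

    def read_file(self, name):
        """Reassemble a file from its blocks, in block order."""
        length = self.table[(name, "length")]
        count = (length + BLOCK_SIZE - 1) // BLOCK_SIZE
        return b"".join(self.table[(name, n)] for n in range(count))

    def delete_file(self, name):
        """Remove every key that belongs to the named file."""
        for key in [k for k in self.table if k[0] == name]:
            del self.table[key]

    def list_files(self):
        """Return the names of all stored files."""
        return sorted({k[0] for k in self.table})


store = BlockStore()
store.write_file("segments", b"hello world")
```

With a real database in place of the dict, the writes for one index update could presumably be grouped into a single transaction, which is one reason a database-backed store is attractive.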
How much slower is Lupy? It seems like it might be possible to code key portions in C and get some improved results -- indexing seems like it would lend itself to a mixed-language approach. Though with a binary/non-VM Lucene and Python bindings, maybe it's the same idea just factored differently.
Posted by Ian Bicking at Wed Feb 4 17:41:39 2004
how about a path syntax, since you're already using a path to get the collection you're querying.
Posted by bryan at Thu Feb 5 03:41:22 2004
Lupy is supposedly 10x slower than Lucene. I tried to do a Lupy search plugin for pyblosxom (the current one uses Lucene), and it was noticeably slower so I gave up.
Rewriting pieces of Lucene/Lupy in C is always a possibility, but there's a lot more work there, and then you have to try and track changes in the Lucene codebase, since the original expertise in searching is working on Lucene. Starts to become a daunting proposition. If (and that's the key word) the gcj compiler and runtime are good, then it's less work to compile with gcj and link to Python.
Posted by Ted Leung at Thu Feb 5 10:32:51 2004
What's wbem provider? Do you mean web based e-mail? We have no plans to do that before a 1.0 release, but someone could certainly write a parcel to provide that functionality.
Posted by Ted Leung at Thu Feb 5 10:34:11 2004
I'm not sure what you mean by a path syntax, and where you think it should be used. If you look at the sample queries, we already provide navigational access to items via their attributes.
Posted by Ted Leung at Thu Feb 5 10:35:35 2004
wbem (web based enterprise management) http://www.dmtf.org/home is implemented in windows as wmi. There are a couple of open source wbem implementations out there; I prefer the following:
http://www.openpegasus.org/
among the various things that wbem gives you is a unified way of querying what services a program provides, as well as registering for events from that program
Note: under wmi one has WQL, an sql syntax, which is how your busy developer's guide brought it to mind:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wmisdk/wmi/querying_with_wql.asp
All Microsoft Office tools currently have a wmi provider as far as I'm aware, eases the life of administrators no end. Exchange has all sorts of nifty wmi stuff going for it.
The path syntax:
the path //userdata/roots/email/inbox looks like an xpath, I suppose it probably is, but a simplified one. Then you move to the sql-like syntax: FOR i IN inbox WHERE "Foo" in i.Projects
I'll have to wait till the weekend to read the spec though, but I was wondering if the SQL-like syntax could be combined into the path syntax. If indeed it were a full xpath then it would be possible to do queries that are just as complex.
Posted by bryan at Thu Feb 5 11:16:29 2004
Bryan,
Thanks for the pointer to wbem. I'll pass it on to the folks working in the applications layer.
As far as path syntax goes, the SQL-like portion of the syntax is actually lifted fairly directly from XQuery. The path/location steps are delimited using '.' rather than '/', but are semantically equivalent. I did this on purpose because not all Python programmers will know XPath syntax. I suppose that we could do an XPath-compatible syntax if people really ask for it, but so far you are the first person to notice/complain. The other issue with using an XPath-like path syntax is that people will think that they can do queries that will work in XPath/XQuery but don't make sense in the Chandler data model.
Posted by Ted Leung at Thu Feb 5 11:38:47 2004
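For readers who think in Python, the sample query from this thread can be read roughly as a filtered loop with dotted attribute access. The sketch below is an illustration only, not the Chandler query implementation: `resolve`, `query`, `Item`, and the sample inbox data are all hypothetical names made up for this example.

```python
from functools import reduce


def resolve(item, path):
    """Follow a dotted attribute path such as 'sender.name' from item."""
    return reduce(getattr, path.split("."), item)


def query(collection, predicate):
    """Rough Python reading of: FOR i IN collection WHERE predicate(i)."""
    return [i for i in collection if predicate(i)]


class Item:
    """Stand-in for a repository item: just a bag of attributes."""

    def __init__(self, **attributes):
        self.__dict__.update(attributes)


# Made-up sample data for the illustration.
inbox = [
    Item(subject="Status", Projects=["Foo", "Bar"], sender=Item(name="Andi")),
    Item(subject="Lunch", Projects=[], sender=Item(name="Ted")),
]

# FOR i IN inbox WHERE "Foo" in i.Projects
matches = query(inbox, lambda i: "Foo" in resolve(i, "Projects"))
```

Each '.' in a path is one navigation step from an item to one of its attributes, which is the role a '/' step plays in an XPath-style path.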
It's nice to see Ted Leung's blurb about the Open Source Applications Foundation's (OSAF) recent contribution to Jakarta Lucene , a project I work on.
A side-effect of adding a Berkeley DB-based Lucene indices is that Lucene index operations can no
Posted by Trackback from Chichi Michi at Fri Feb 6 03:23:36 2004
Have you considered using a query language that's completely parsable by Python? One thing that would make me particularly happy is if I could write my database queries and the rest of my software in the same language.. which, in my opinion, is one of the best things about using BerkeleyDB based storage or ZODB in the first place. It should be perfectly possible and efficient if you're willing to do egregious operator overloading like these guys: http://www.langdale.com.au/GraphPath/ or in nevow (creating XML using Python syntax, which is currently in Divmod Quotient, but is the evolution of something Donovan Preston and I prototyped at PyCon 2003 for the hell of it).
Posted by Bob Ippolito at Fri Feb 6 04:40:53 2004
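As a rough idea of what a query language "completely parsable by Python" could look like with operator overloading, here is a small sketch. It is not how GraphPath, nevow, or Chandler do it; `Attr`, `Expr`, `select`, and the `Mail` sample class are hypothetical names invented for the illustration.

```python
import operator


class Expr:
    """A deferred test, evaluated later against each item."""

    def __init__(self, evaluate):
        self.evaluate = evaluate

    def __and__(self, other):
        return Expr(lambda item: self.evaluate(item) and other.evaluate(item))

    def __or__(self, other):
        return Expr(lambda item: self.evaluate(item) or other.evaluate(item))


class Attr:
    """Names an attribute; comparisons build Expr objects instead of booleans."""

    def __init__(self, name):
        self.name = name

    def _compare(self, op, value):
        return Expr(lambda item: op(getattr(item, self.name), value))

    def __eq__(self, value):
        return self._compare(operator.eq, value)

    def __gt__(self, value):
        return self._compare(operator.gt, value)

    def contains(self, value):
        # operator.contains(a, b) is the same test as "b in a".
        return self._compare(operator.contains, value)


def select(collection, expr):
    """Return the items for which the built-up expression holds."""
    return [item for item in collection if expr.evaluate(item)]


class Mail:
    """Made-up item type for the example."""

    def __init__(self, subject, projects, priority):
        self.subject = subject
        self.Projects = projects
        self.priority = priority


inbox = [Mail("Status", ["Foo"], 3), Mail("Lunch", [], 1), Mail("Old", ["Foo"], 1)]

# Parentheses are needed because & binds more tightly than > in Python.
hits = select(inbox, Attr("Projects").contains("Foo") & (Attr("priority") > 2))
```

The query is an ordinary Python expression, so it gets syntax checking from the Python parser for free. The cost is some surprising precedence (the parentheses above) and the fact that Python's `in` and `and` cannot be overloaded to build expressions this way, hence the `contains` method and `&`.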
You mentioned a "Python layer that exposes the Lucene functionality to Python". As far as I know, Lucene is written in Java. Do you mean that by compiling it with gcj, one can build Python extensions in Java? Could you elaborate on this? TIA.
Posted by hyao at Fri Feb 6 06:54:13 2004
Bob,
Having a python based "api" in addition to the string API is definitely a goal. I'll take a look at GraphPath to see what they are doing. Any additional suggestions or pointers that you have would be great.
Posted by Ted Leung at Fri Feb 6 10:21:17 2004
Hyao,
There's no rocket science (or very little) going on here. gcj compiles Java into machine code which can be linked to from Python just like you can link to C code from Python. So yes, you can use Java to build Python extensions.
Posted by Ted Leung at Fri Feb 6 10:26:22 2004
Someone else did a C++ port of Lucene, which may be worth looking at. It might be a little.. nicer.. than using gcj: http://sourceforge.net/projects/clucene/
Posted by Bob Ippolito at Sat Feb 7 21:20:03 2004