Ted Leung on the air
Ted Leung on the air: Open Source, Java, Python, and ...
Wed, 14 May 2003
Maybe the future of the semantic web is LSI
Stefano did what I did last night and dug into all the papers on Latent Semantic Indexing (LSI). Actually, he did more than I did, because he got the papers and the math. But I'm not sure if he got something that I got: LSI is patented. So all that cool multidimensional linear algebra cannot be used in open source software. Does that kill the semantic web? Of course not. But it's going to prevent community development of systems that use LSI.
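For anyone wondering what that linear algebra looks like, here is a minimal sketch of the textbook idea behind LSI: build a term-document matrix, take its singular value decomposition, and keep only the top few dimensions. The toy corpus, the raw-count weighting, the choice of k = 2, and the helper name `cosine` are all my own assumptions for illustration; this is not the procedure described in the patent or in the papers Stefano read.

```python
import numpy as np

# Toy corpus, made up for illustration only.
docs = [
    "human computer interface",
    "computer user interface system",
    "graph tree minors",
    "graph minors survey",
]

# Vocabulary and term-document matrix of raw counts:
# rows are terms, columns are documents.
vocab = sorted({word for doc in docs for word in doc.split()})
A = np.array([[doc.split().count(term) for doc in docs] for term in vocab],
             dtype=float)

# Singular value decomposition: A = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the k largest singular values (k is an arbitrary choice here).
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document


def cosine(a, b):
    """Cosine similarity between two vectors in the reduced space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Documents 0 and 1 share vocabulary; documents 0 and 2 share none.
print(cosine(doc_vecs[0], doc_vecs[1]))
print(cosine(doc_vecs[0], doc_vecs[2]))
```

In this tiny example the two "interface" documents end up pointing the same way in the reduced space, while the "graph" documents land on a separate axis. A real system would need term weighting and a far larger matrix, which is where the scaling problems come in.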
[00:44] |
[computers/internet] |
5 Comments
First of all, LSI is simply based on the concept of spectral decomposition, something that has been around for centuries; you can't patent that. Period. Also, you can't patent how I create a matrix of data, and if you do, there is always another way of doing it which can be equivalent (or even potentially better). The patent covers ONE possible way of using that math. I'll simply implement another one, show it's better, and route around the patent obstacle (which shows why patents on algorithms will never work unless they are extremely precise, like RSA public key or LZW encoding, for example).
For example, the patent DOES NOT cover the use of the underlying markup structure of text. This has been proven effective by Google. I'm currently writing the math to do that. It's hard as hell (did you ever apply spectral decomposition to a hyper-hyper-dimensional matrix?) and I can't find literature on it, so this means I'm doing something new, innovative. The patent system was made exactly for that: stop blatant stealing but foster innovation. I'm doing exactly that.
Even more: the patent was filed in 1983, it won't take long to expire anyway.
Moreover, US patent policies allow a patent to be used for research. If you don't use the software commercially, there is no problem in releasing the software as open source. And if you want to use it commercially, you'll have to ask the patent holder for a license (just like it's not illegal to redistribute GIFs in our open source packages). Nothing new under the sun: you can bet your ass that Microsoft will do the same with Mono as soon as it emerges as a potential problem for their .NET marketing.
And if the above was not enough, remember that only the US has to comply with US patent restrictions. The world is much bigger and more diverse than the US.
Posted by Stefano at Wed May 14 07:54:04 2003
ERRATA: the patent was filed on September 15, 1988. Sorry.
Posted by Stefano at Wed May 14 07:56:10 2003
I whipped up a quick implementation of something like LSI (see http://www.perl.com/pub/a/2003/02/19/engine.html). The (very rough) code is on my website.
Posted by Nick at Thu May 15 05:58:46 2003
Some things you might be interested in:
Maciej Ceglowski, through his work at NITLE (nitle.org) has developed an LSI search tool. You can read about it at http://javelina.cet.middlebury.edu/lsa/out/lsa_intro.htm
There's also a prototype accessible somewhere from there. This approach has all the limitations of LSI: it doesn't scale beyond a few thousand docs. And of course, there's the patent.
He and the fellow he works with have developed an alternative, which is described here. http://www.idlewords.com/weblog.04.2003.html#157
I don't know of any implementations.
And of course, if you haven't yet, you need to check out Waypath (waypath.com), which implements a high-precision semantic search using something completely different from LSI: scalable, portable, mergeable, the list goes on.
Posted by Steven Nieker at Fri May 16 21:18:22 2003
Some things you might be interested in:
Maciej Ceglowski, through his work at NITLE (nitle.org) has developed an LSI search tool. You can read about it at http://javelina.cet.middlebury.edu/lsa/out/lsa_intro.htm
There's also a prototype accessible somewhere from there. This approach has all the limitations of LSI: it doesn't scale beyond a few thousand docs. And of course, there's the patent.
He and the fellow he works with have developed an alternative, which is described here. http://www.idlewords.com/weblog.04.2003.html#157
I don't know of any implementations.
And of course, if you haven't yet, you need to check out Waypath (waypath.com), which implements a high-precision semantic search using something completely different from LSI: scalable, portable, mergeable, the list goes on.
Posted by Steven Nieker at Sat May 17 11:04:38 2003