
Talk:Kernel trick

From Wikipedia, the free encyclopedia

WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics.
Mathematics rating: Stub class, Low priority. Field: Applied mathematics



[edit] PDF linked in external resources

I am going to delete the document linked as part of the external resources. My reasons are: 1) the document lacks any scholarly format (it even contains question marks signalling the author's own uncertainty about certain details; for example, in the second paragraph, the author is unsure of who is responsible for a contribution); 2) the language used in the document is too informal; 3) the author introduces many concepts without the background or explanation necessary to understand them.

On the whole, I believe that someone who follows the link to this document and gives it a read will be left with no clear idea of how the kernel trick works.

If you think otherwise, please undo my edit and write a short note on my talk page as to why this document should continue to be linked.

Grokmenow 11:22, 18 October 2007 (UTC)


[edit] Mercer's condition

I find this statement of Mercer's theorem very vague. Specifically, what is the space that the kernel operates in, and what is the higher-dimensional space? I would think it would be easy for anyone who knows this theorem to state the answers. Michael Hardy 22:29, 24 Aug 2003 (UTC)


After the last round of edits, the article is now factually incorrect. It is not that the kernel is decomposed into eigenfunctions; it is that the kernel is expressed as a single dot product, K(x, y) = φ(x) · φ(y). The new feature space F is defined by the map φ : X → F. φ may be a non-linear function of x, and F may have a much higher (or infinite) dimension than the input space.


The whole point is that linear algorithms can operate in φ space without actually computing any values in F, since F may have very high (or even infinite) dimension.


Please see Chapter 4 of Chris Burges' tutorial [1] for a more complete explanation, especially Section 4.1 .


In any event, the article is now incorrect and must be changed. I can edit the article, but I'll give you a first chance to correct it. -- hike395 04:05, 26 Aug 2003 (UTC)
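A minimal sketch of the point made above, using a kernel of my own choosing (the homogeneous degree-2 polynomial kernel; the example is illustrative, not from the article or from Burges): the dot product φ(x) · φ(y) in the feature space F can be evaluated as K(x, y) = (x · y)^2 without ever constructing φ(x) or φ(y).

    import numpy as np

    def phi(x):
        """Explicit feature map for 2-D input under the homogeneous
        degree-2 polynomial kernel: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
        return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

    def kernel(x, y):
        """The same quantity computed entirely in the original space:
        K(x, y) = (x . y)^2 = phi(x) . phi(y)."""
        return np.dot(x, y) ** 2

    x = np.array([1.0, 2.0])
    y = np.array([3.0, 4.0])

    # Both routes agree, but only the kernel route avoids building
    # any vector in the feature space F.
    print(np.dot(phi(x), phi(y)))  # 121.0
    print(kernel(x, y))            # 121.0

For a degree-d polynomial kernel on inputs of dimension n, the explicit feature space has dimension on the order of n^d, which is exactly why evaluating K directly matters.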


P.S. Notice in Burges' tutorial that Support Vector Machine is capitalized. -- hike395


Went ahead and fixed article -- hike395 04:26, 27 Aug 2003 (UTC)




I edited the article on Mercer's theorem. As stated originally, it was not correct... not every positive definite function is a product of functions. What is true is that a positive definite function, regarded as an integral operator, is the product of two non-negative operators.


This should be cleared up. Good chance for collaboration here between mathematicians and AI people. CSTAR 19:55, 11 May 2004 (UTC)



Well, I think the issue of what the kernel trick actually is should be easy to resolve: positive semi-definite functions K(x, y) are exactly those of the form ⟨φ(x), φ(y)⟩ for some function φ with values in some Hilbert space H. Could we agree to say this? To reiterate the point made on the other page, this fact at this level of generality has nothing to do with Mercer's theorem. CSTAR 19:32, 13 May 2004 (UTC)

I went back and checked Burges' tutorial on SVMs ([2]). Burges is very well-respected and very exacting. His tutorial says that it is Mercer's condition that guarantees that a semi-definite kernel is a dot product in a Hilbert space, not Mercer's theorem. He cites Courant and Hilbert, 1953.
I think things were mixed up between the two terms: condition and theorem. So, let's fix this article to say Mercer's condition, not Mercer's theorem, and then everything will be OK. Right? -- hike395 05:16, 14 May 2004 (UTC)

One more comment: as stated in the main article on the [[Kernel trick]], Mercer's condition needs a measure μ on the space X in order for the inequality stated there to make sense. A simpler condition, which does not involve integrals (and in some cases is equivalent to the integral form of Mercer's condition), is

 \sum_{i,j} K(x_i,x_j) c_i c_j \geq 0

for any finite sequence x_1, ..., x_n of points of X and any sequence c_1, ..., c_n of real numbers.

Do you really need the integral form of Mercer's condition? Also I think it would be a lot clearer to mathematicians if you said:

  • φ is a function with values in H

instead of

  • φ is a function with image in H
I agree. Also changed image to range, because the word image linked to the article on the range of a function. -- hike395 12:56, 14 May 2004 (UTC)
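Returning to the finite form of the condition above, a small numerical sketch (my own; the Gaussian kernel below is just one standard example known to satisfy the condition). The finite condition asserts that every Gram matrix K_ij = K(x_i, x_j) built from the kernel is positive semi-definite, which can be checked through its eigenvalues:

    import numpy as np

    def gaussian_kernel(x, y, sigma=1.0):
        """Gaussian (RBF) kernel, a standard positive semi-definite example."""
        return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

    rng = np.random.default_rng(0)
    xs = rng.normal(size=(5, 3))  # five arbitrary points x_i in R^3

    # Gram matrix K_ij = K(x_i, x_j). The finite condition says
    # c^T K c >= 0 for every real vector c, i.e. K is positive semi-definite.
    K = np.array([[gaussian_kernel(a, b) for b in xs] for a in xs])

    print(np.linalg.eigvalsh(K))  # all eigenvalues nonnegative (up to rounding)
    c = rng.normal(size=5)
    print(c @ K @ c)              # the quadratic form is nonnegative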

In my opinion the whole statement of Mercer's condition here could be replaced with a simpler and more germane statement of what it means for a kernel function to be symmetric positive semidefinite:

K(x_i, x_i) \ge 0 for all nonzero x_i \in X
K(x_i, x_j) = K(x_j, x_i) for all x_i, x_j \in X


Kenahoo 18:53, 15 September 2006 (UTC)

Seconded. I'm confused by the current statement of Mercer's condition. As it stands, the condition requires the inequality to hold for "any finite sequence of x... and sequence c... of real numbers". Do the c_i depend on the x_i? Unless I'm missing the point (which is quite possible), can't I always choose some sequence of c's, given some odd-length sequence of x's, that will make the sum negative, simply by flipping the sign of the last c_i? -- MDReid 01:10, 5 January 2007 (UTC)
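In reply to the question above, a hedged sketch (both functions below are my own illustrative choices). The c's range over all real tuples, independently of the x's, and the inequality must hold for every such choice; flipping the sign of the last c changes the cross terms as well as the diagonal ones, so it does not automatically make the sum negative. For a genuinely positive semi-definite kernel no choice of c can do so, whereas for a merely symmetric function some choice often does. Since the worst case of c^T K c over unit vectors c is the smallest eigenvalue of the Gram matrix, one number settles the question:

    import numpy as np

    def good_kernel(x, y):
        """(1 + x*y)^2 on the real line: positive semi-definite, since it
        equals phi(x) . phi(y) with phi(x) = (1, sqrt(2)*x, x^2)."""
        return (1.0 + x * y) ** 2

    def bad_kernel(x, y):
        """Symmetric but NOT positive semi-definite (a made-up example)."""
        return -((x - y) ** 2)

    xs = np.array([0.0, 1.0, 2.0])
    for name, k in [("PSD kernel", good_kernel), ("non-PSD function", bad_kernel)]:
        K = np.array([[k(a, b) for b in xs] for a in xs])
        # min over unit vectors c of c^T K c equals the smallest eigenvalue,
        # so a negative value here means SOME c makes the sum negative.
        print(name, "worst-case quadratic form:", np.linalg.eigvalsh(K).min())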

[edit] External link

Please put the new link in an external link section. CSTAR 12:42, 19 Aug 2004 (UTC)

The link is broken.

[edit] Intro typo?

This sounds wrong:

by mapping the original observations into a higher-dimensional non-linear space so that linear classification in the new space is equivalent to non-linear classification in the original space.

Should it say this:

by mapping the original observations into a higher-dimensional linear space so that linear classification in the new space is equivalent to non-linear classification in the original space.

or is it correct as is?

— BenFrantzDale

Yes, I believe you are correct. Alternatively we could just remove the word non-linear altogether:

by mapping the original observations into a higher-dimensional space so that linear classification in the new space is equivalent to non-linear classification in the original space.

Kenahoo 18:41, 15 September 2006 (UTC)
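To illustrate the corrected wording with a toy example of my own (not from the article): points inside versus outside a circle cannot be split by a line in the plane, but a simple non-linear map into a higher-dimensional space makes a linear separation possible.

    import numpy as np

    def phi(x):
        """Toy lift of R^2 into R^3 appending the squared radius; circles
        about the origin in the plane map onto horizontal planes in R^3."""
        return np.array([x[0], x[1], x[0] ** 2 + x[1] ** 2])

    # Inside vs. outside the unit circle: not linearly separable in R^2,
    # but the plane z = 1 separates the lifted points in R^3.
    points = [(0.1, 0.2), (-0.3, 0.4),   # inside the unit circle
              (1.5, 0.0), (-1.0, 1.2)]   # outside the unit circle
    for p in points:
        z = phi(np.array(p))[2]          # third coordinate = squared radius
        print(p, "->", "outside" if z > 1 else "inside")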

[edit] MDS

I believe there should be a mention of multidimensional scaling in this article, because the kernel trick is basically a form of MDS. Perhaps the similarity could be mentioned in the introduction of the article, and a link to the MDS page added at the end of the article. —Preceding unsigned comment added by 156.56.92.123 (talk) 19:08, 24 February 2008 (UTC)

