|
SIGMOD 2008
2008 ACM SIGMOD International Conference on Management of Data
June 9-12, 2008
Vancouver, Canada
|
Review: 1 |
Reviewer:
| Bin Cui |
Email:
| bin.cui@pku.edu.cn |
Organization:
| Peking University |
Review:
| Question | Response |
1 | Overall Rating |
Weak Reject
|
2 | Reject due to technical incorrectness |
No
|
3 | Novelty |
Medium
|
4 | Technical Depth |
Medium
|
5 | Is the relevant and reasonably well known state of the art fairly treated (note this may come from outside the database literature and even outside computer science)? |
yes, allowing for space limitations
|
6 | Experimental results should meet the following expectations: deal with materially relevant cases (e.g., updates as well as queries, different scales) allowing for space limitations; statistically meaningful results; and use standard datasets or benchmarks (as opposed to a cherry-picked one) when possible. How did the experiments rate? |
not good
|
7 | If experiments were presented, would you want them checked for repeatability (as might be required in a future sigmod)? |
yes
|
8 | Presentation |
Adequate
|
9 | Reviewer Confidence |
High
|
10 | Name of External Reviewer (if applicable) |
|
11 | Summary of the paper's main contributions and impact (up to one paragraph) |
Finding nearest neighbors in uncertain databases has attracted increasing attention recently. In this paper, the authors formulate the problem of finding the k most probable nearest neighbors (Topk-PNN) and design a model that allows uncertainty in both query and data objects. Under this model, the authors define the semantics of Topk-PNN queries and provide several efficient evaluation algorithms. The experimental results show performance gains compared to current approaches.
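For reference, my reading of the Topk-PNN semantics (a paraphrase in my own notation, which may differ in details from the paper's exact definition): the query returns the k objects with the highest probability of being the nearest neighbor of the uncertain query object,
$$
p_i \;=\; \Pr\big[\, d(O_i, Q) \le d(O_j, Q) \;\text{ for all } j \ne i \,\big],
\qquad
\mathrm{TopkPNN}(Q) \;=\; \text{the $k$ objects with the largest } p_i,
$$
where the probability is taken over the joint distribution of the possible locations of Q and of all data objects.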
|
12 | Three strong points of the paper (please number them S1,S2,S3) |
S1, clear research motivation
S2, investigating an important research problem
S3, detailed experimental study
|
13 | Three weak points of the paper (please number them W1,W2,W3) |
W1. The review of the related literature is weak.
W2. The algorithms are not described in sufficient detail.
W3. The experimental study is rather weak.
|
14 | Detailed comments (please number each point) |
The paper is well written, and the authors propose a new approach to evaluate Topk-PNN queries. The idea is quite interesting, and the experimental results show the superiority of the proposed method. My concerns about the paper are as follows:
1) On page 2, "O_2 is inside the shaded area": that area is not shaded.
2) On page 2, I doubt the argument "Current proposals do not adopt a formal cost model and concentrate only on reducing the computation cost while ignoring the cost incurred by objects retrieval." I/O cost is a main performance issue in many papers.
3) On page 3, "a range of possible distances between Oi.A and the query object Q.": should this be Oi--Q or Oi.A--Q.A? The authors seem to misuse the notation in many places.
4) On page 5, "When our procedure is applied a sufficient number of times": what is the sufficient number? Readers cannot figure it out and do not know how it affects performance. An analysis of this number should be provided.
5) The authors mention an R*-tree for indexing. How is it implemented? Is each object represented by a rectangle bounding its possible region? How are the scores evaluated during tree traversal?
6) The experimental study is unsatisfactory.
a) The dataset is too small, only 10K objects.
b) The compared approach [6] is not state of the art; many new approaches have been proposed in the last few years.
c) It seems unreasonable that the CPU-centric and Baseline methods need to retrieve all the objects. Is this because no good pruning method is used? If so, why bother using the R-tree as an index?
d) The CPU-centric method is worse in terms of both time and I/O, so why propose it?
e) Did you consider any updates to the objects?
|
15 | Comments for the Program Committee |
|
16 | Is this paper a candidate for the Best Paper Award? |
No
|
17 | Would author feedback be useful for this Review? (if "Yes", please answer Q. 18) |
No
|
18 | List specific clarifications you seek from the Authors (if you have answered "Yes" to Q. 17) |
|
|
Review: 2 |
Reviewer:
| Anthony Tung |
Email:
| at@comp.nus.edu.sg |
Organization:
| National University of Singapore |
Review:
| Question | Response |
1 | Overall Rating |
Accept
|
2 | Reject due to technical incorrectness |
No
|
3 | Novelty |
Medium
|
4 | Technical Depth |
High
|
5 | Is the relevant and reasonably well known state of the art fairly treated (note this may come from outside the database literature and even outside computer science)? |
yes, allowing for space limitations
|
6 | Experimental results should meet the following expectations: deal with materially relevant cases (e.g., updates as well as queries, different scales) allowing for space limitations; statistically meaningful results; and use standard datasets or benchmarks (as opposed to a cherry-picked one) when possible. How did the experiments rate? |
adequate
|
7 | If experiments were presented, would you want them checked for repeatability (as might be required in a future sigmod)? |
yes
|
8 | Presentation |
Very Good
|
9 | Reviewer Confidence |
High
|
10 | Name of External Reviewer (if applicable) |
|
11 | Summary of the paper's main contributions and impact (up to one paragraph) |
This paper develops efficient techniques for finding the top-k probable nearest neighbors in uncertain databases.
|
12 | Three strong points of the paper (please number them S1,S2,S3) |
|
13 | Three weak points of the paper (please number them W1,W2,W3) |
|
14 | Detailed comments (please number each point) |
This paper provides nice motivating examples for the work. I also like the comprehensive study carried out in this paper. The definition of the problem is general enough to represent cases in which both queries and data are uncertain. The consideration of optimizing CPU and I/O cost individually before optimizing the total cost is nice. Solutions are also provided for different instances of the problem, including uncertain query with uncertain data and uncertain query with certain data. I particularly like Section 6, which discusses the handling of dependencies, something often missing in other similar papers that I have seen. It is unfortunate that no experimental studies are conducted on real-life datasets, which could further strengthen the paper.
|
15 | Comments for the Program Committee |
|
16 | Is this paper a candidate for the Best Paper Award? |
No
|
17 | Would author feedback be useful for this Review? (if "Yes", please answer Q. 18) |
No
|
18 | List specific clarifications you seek from the Authors (if you have answered "Yes" to Q. 17) |
|
|
Review: 3 |
Reviewer:
| Ronald Fagin |
Email:
| fagin@almaden.ibm.com |
Organization:
| IBM Almaden Research Center |
Review:
| Question | Response |
1 | Overall Rating |
Weak Reject
|
2 | Reject due to technical incorrectness |
No
|
3 | Novelty |
Medium
|
4 | Technical Depth |
Medium
|
5 | Is the relevant and reasonably well known state of the art fairly treated (note this may come from outside the database literature and even outside computer science)? |
to a limited extent
|
6 | Experimental results should meet the following expectations: deal with materially relevant cases (e.g., updates as well as queries, different scales) allowing for space limitations; statistically meaningful results; and use standard datasets or benchmarks (as opposed to a cherry-picked one) when possible. How did the experiments rate? |
not good
|
7 | If experiments were presented, would you want them checked for repeatability (as might be required in a future sigmod)? |
no or not applicable
|
8 | Presentation |
Adequate
|
9 | Reviewer Confidence |
Medium
|
10 | Name of External Reviewer (if applicable) |
|
11 | Summary of the paper's main contributions and impact (up to one paragraph) |
The paper gives an approach towards finding the k objects with the highest probability of being the nearest neighbor.
|
12 | Three strong points of the paper (please number them S1,S2,S3) |
S1. The problem of finding the k objects with the highest probability of being the nearest neighbor is an interesting one. Instead of trying to find the k nearest neighbors in some sense (which is the usual approach), the authors find the k objects with the highest probabilities of being the nearest neighbor.
S2. I like the basic approach of trying to find the top k without worrying about doing exact probabilistic computation, but simply guaranteeing the right answers.
S3. The idea of maintaining lower and upper bounds on the probability that a given object is the nearest neighbor is a good approach.
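To make S2 and S3 concrete, here is a minimal sketch (my own illustration, not the authors' actual algorithm) of how maintaining per-object probability bounds can certify a top-k answer without exact probability computation; refine() is a hypothetical stand-in for whatever bound-tightening step the paper uses.

# Sketch only: answer a top-k "most probable NN" query using lower/upper bounds
# on each object's probability of being the nearest neighbor, refining the bounds
# until the answer is provably correct. refine(o) tightens o's bounds -> (lb, ub).
def topk_pnn(objects, k, refine):
    bounds = {o: (0.0, 1.0) for o in objects}  # trivial initial bounds
    while True:
        order = sorted(objects, key=lambda o: bounds[o][1], reverse=True)
        topk, rest = order[:k], order[k:]
        kth_lb = min(bounds[o][0] for o in topk)
        best_rest_ub = max((bounds[o][1] for o in rest), default=0.0)
        if kth_lb >= best_rest_ub:
            # each reported object's probability is provably >= any other object's
            return topk
        widest = max(objects, key=lambda o: bounds[o][1] - bounds[o][0])
        bounds[widest] = refine(widest)  # tighten the most ambiguous object

Any refinement schedule works as long as refine eventually makes the intervals tight enough to separate (or exactly tie) the k-th and (k+1)-th objects; refining the widest interval first is just one plausible choice.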
|
13 | Three weak points of the paper (please number them W1,W2,W3) |
W1. The authors give no guarantee that the algorithm is efficient. Theorem 3.6 is very weak: it simply says that the authors' algorithm is best among an extremely limited class of algorithms that follow a strategy similar to theirs. For all we know, there may be other algorithms that solve the problem much faster. For example, the authors could possibly prove "instance optimality" in the sense of Fagin's 2001 PODS paper (I believe there is a journal version now; the notion is recalled after this list). It looks like Section 3.1 is a step in that direction.
W2. The experimental results also do not show the goodness of the algorithm, compared to other possible algorithms.
W3. It is unpleasant that the authors delay their description of how to compute the upper and lower bounds on the probabilities until Section 3.2, whereas they make use of these values in Section 3.1. It makes Section 3.1 hard to understand.
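For reference, the notion of instance optimality I have in mind (recalled from memory from the Fagin-Lotem-Naor line of work, so the exact constants and quantifiers should be checked against that paper): an algorithm $A$ is instance optimal over a class of algorithms $\mathbf{A}$ and a class of inputs $\mathbf{D}$ if there exist constants $c$ and $c'$ such that
$$
\mathrm{cost}(A, D) \;\le\; c \cdot \mathrm{cost}(A', D) + c'
\qquad \text{for every } A' \in \mathbf{A} \text{ and every } D \in \mathbf{D}.
$$
A result of this form would say the algorithm is within a constant factor of any algorithm in the class on every input, which would be a much stronger statement than optimality within the limited class considered in Theorem 3.6.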
|
14 | Detailed comments (please number each point) |
1. I cannot understand either Lemma 3.1 or its proof. What does it even mean to say that a certain retrieval order is necessary to compute nontrivial bounds?
2. In Example 1.1, it isn't right to say that the probability of being in the shaded area is higher than that of being in other areas, unless this probability is greater than 1/2. What if we take as an "other area" the entire rest of the figure outside the shaded area?
3. In the third paragraph of Section 1.1, it isn't right to say that an approach based on expected distances gives "inaccurate" query answers. It is a legitimate alternate approach to deal with expected distances, and to give answers with the smallest expected distances. There is nothing "inaccurate" about it - it simply represents a different choice as to what we are looking for.
4. Again in the third paragraph of Section 1.1, what does it mean to say that O_2 is closer to q than "the majority of O_1's possible locations"? Can't O_1 have infinitely many possible locations?
5. I don't think the sentence at the end of point 1 on the right-hand side of page 2 is true, that current proposals ignore the cost incurred by objects retrieval. The cost of retrieval is a standard measure of cost.
6. In the paragraph at the end of the first column of page 3, why are the authors switching from representing the query by q to representing it by Q?
7. In the third full paragraph on page 4, attaining the > sign in the last sentence of that paragraph may not be possible. What if there are ties, so that, for example, there are k+1 objects with exactly the same probability of being the nearest neighbor, and their probabilities are all larger than those of any other objects? Then ties must be broken in deciding the top k, so the > sign is not attainable.
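To illustrate point 7 with concrete (made-up) numbers: take $k = 2$ and suppose
$$
p_1 = p_2 = p_3 = 0.3, \qquad p_j \le 0.05 \;\text{ for all } j > 3 .
$$
Any top-2 answer must exclude one of the three tied objects, so the smallest reported probability equals the largest excluded one ($0.3 = 0.3$) and the strict $>$ cannot hold; the definition presumably needs $\ge$ together with an explicit tie-breaking rule.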
|
15 | Comments for the Program Committee |
|
16 | Is this paper a candidate for the Best Paper Award? |
No
|
17 | Would author feedback be useful for this Review? (if "Yes", please answer Q. 18) |
No
|
18 | List specific clarifications you seek from the Authors (if you have answered "Yes" to Q. 17) |
|
|
|