|
SIGMOD 2008
2008 ACM SIGMOD International Conference on Management of Data
June 9-12, 2008
Vancouver
Canada
|
Review: 1 |
Reviewer:
| Panos Chrysanthis |
Email:
| panos@cs.pitt.edu |
Organization:
| University of Pittsburgh |
Review:
| Question | Response |
1 | Overall Rating |
Weak Accept
|
2 | Reject due to technical incorrectness |
No
|
3 | Novelty |
High
|
4 | Technical Depth |
High
|
5 | Is the relevant and reasonably well known state of the art fairly treated (note this may come from outside the database literature and even outside computer science)? |
to a limited extent
|
6 | Experimental results should meet the following expectations: deal with materially relevant cases (e.g., updates as well as queries, different scales) allowing for space limitations; statistically meaningful results; and use standard datasets or benchmarks (as opposed to a cherry-picked one) when possible. How did the experiments rate? |
adequate
|
7 | If experiments were presented, would you want them checked for repeatability (as might be required in a future sigmod)? |
no or not applicable
|
8 | Presentation |
Very Good
|
9 | Reviewer Confidence |
Low
|
10 | Name of External Reviewer (if applicable) |
|
11 | Summary of the paper's main contributions and impact (up to one paragraph) |
The paper proposes a scheme called Active Feature Probing (AFP) that automatically generates a minimal number of questions to ask a customer in order to find an appropriate recommendation in a helpdesk database. AFP models the problem as a classification problem, but unlike a static decision tree, which forces a specific traversal to a recommendation, AFP can take into account the user's initial and additional information and allows previous answers to be corrected. AFP also handles the problem of missing values using a Monte Carlo data augmentation method. The paper is well written and organized, with a balance of theory and experimentation, both equally strong. Although I am not aware of any similar work in the context of databases, the experiments using synthetic datasets based on news reports made me wonder whether similar work exists in the context of NLP. On the other hand, the related work in the paper is quite comprehensive.
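To make the dynamic-questioning idea concrete, here is a minimal sketch of how such a scheme might work; this is my own reconstruction, not the authors' code. It completes unknown answers by Monte Carlo sampling from stored cases consistent with what the user has said so far, then greedily asks the feature whose answer is expected to reduce the entropy of the recommendation distribution the most. (The paper labels sampled completions with a logistic regression classifier; to keep the sketch short I reuse the stored case labels instead. All names here are hypothetical.)

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of labels (0*log 0 taken as 0)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def sample_completions(partial, cases, n=200):
    """Monte Carlo stand-in for the paper's data augmentation step:
    draw stored (features, label) cases consistent with the answers
    known so far, falling back to all cases if none match exactly."""
    consistent = [cl for cl in cases
                  if all(cl[0].get(f) == v for f, v in partial.items())]
    pool = consistent or cases
    return [random.choice(pool) for _ in range(n)]

def next_question(partial, cases, features):
    """Greedily pick the unasked feature whose answer is expected to
    reduce the entropy of the recommendation distribution the most."""
    samples = sample_completions(partial, cases)
    best_f, best_h = None, float("inf")
    for f in (g for g in features if g not in partial):
        h = 0.0
        for v in {feats[f] for feats, _ in samples}:
            cond = [lab for feats, lab in samples if feats[f] == v]
            h += len(cond) / len(samples) * entropy(cond)  # E[H | answer to f]
        if h < best_h:
            best_f, best_h = f, h
    return best_f

# Toy usage: with power_on already known, the scheme asks about disk_spins.
cases = [({"power_on": 1, "disk_spins": 0}, "replace_drive"),
         ({"power_on": 1, "disk_spins": 1}, "check_cable"),
         ({"power_on": 0, "disk_spins": 0}, "check_power_supply")]
print(next_question({"power_on": 1}, cases, ["power_on", "disk_spins"]))
```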
|
12 | Three strong points of the paper (please number them S1,S2,S3) |
S1) Well written paper addressing a real problem and proposing a nice solution.
S2) Formal treatment of the problem and the proposed solution.
S3) Good evaluation, which includes a small yet informative user study. It uses both synthetic data based on news reports and real data from a helpdesk, and the experimental setup is well described. The discussion of the results does not hide the scalability problem of AFP; on the contrary, it uses it as motivation for future investigation.
|
13 | Three weak points of the paper (please number them W1,W2,W3) |
W1) The proposed work is strongly related to case-based reasoning and NLP, but there is no mention of this body of work at all. In fact, case-based reasoning has been used in the context of automated help systems.
W2) The user study in the evaluation needs to be more comprehensive to be solid and convincing.
|
14 | Detailed comments (please number each point) |
None really
|
15 | Comments for the Program Committee |
My only concern is that this work might be a re-discovery of work done in the context of NLP.
If this is a common concern with the other reviewers, we might wish to ask an NLP colleague to take a look at this paper. Otherwise, it is a very nice paper.
|
16 | Is this paper a candidate for the Best Paper Award? |
No
|
17 | Would author feedback be useful for this Review? (if "Yes", please answer Q. 18) |
No
|
18 | List specific clarifications you seek from the Authors (if you have answered "Yes" to Q. 17) |
|
|
Review: 2 |
Reviewer:
| Vasant Dhar |
Email:
| vdhar@stern.nyu.edu |
Organization:
| New York University |
Review:
| Question | Response |
1 | Overall Rating |
Weak Reject
|
2 | Reject due to technical incorrectness |
No
|
3 | Novelty |
Low
|
4 | Technical Depth |
Medium
|
5 | Is the relevant and reasonably well known state of the art fairly treated (note this may come from outside the database literature and even outside computer science)? |
to a limited extent
|
6 | Experimental results should meet the following expectations: deal with materially relevant cases (e.g., updates as well as queries, different scales) allowing for space limitations; statistically meaningful results; and use standard datasets or benchmarks (as opposed to a cherry-picked one) when possible. How did the experiments rate? |
adequate
|
7 | If experiments were presented, would you want them checked for repeatability (as might be required in a future sigmod)? |
no or not applicable
|
8 | Presentation |
Adequate
|
9 | Reviewer Confidence |
High
|
10 | Name of External Reviewer (if applicable) |
|
11 | Summary of the paper's main contributions and impact (up to one paragraph) |
The paper addresses an important problem, although I'm unsure whether the proposed solution is adequate to address it. The authors do a decent (not stellar) job of articulating the difficulty of building a robust helpdesk advisory model.
|
12 | Three strong points of the paper (please number them S1,S2,S3) |
S1. Good description of the problems associated with building flexible advisory systems.
S2. A listing of related work.
S3. Real experiments to show how well their proposed solution works.
|
13 | Three weak points of the paper (please number them W1,W2,W3) |
W1. Decision trees are used as a bit of a strawman. It's possible to envision DTs being used in ways that don't suffer from the problems the authors claim.
W2. There's a huge body of literature on case-based reasoning that the authors don't consider, including one of the best-known commercial successes, Compaq's use of cases to support their helpdesk. These approaches look at the problem "holistically", not one feature at a time.
W3. What's novel about "data augmentation" isn't well articulated, nor are the drawbacks of traditional CBR approaches adequately discussed. There's a statement that says "suppose there are 10,000 cases and we know only 10 feature values (all are binary), there are only 10 matching cases on average." How?!! Assuming all features are independent? Is this reasonable when they are clearly not?
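Presumably the quoted figure comes from the following back-of-the-envelope calculation, which holds only if the 10 binary features are independent and uniform, exactly the assumption in question: each known answer halves the candidate set, so

\[
  \mathbb{E}[\text{matches}] \approx \frac{N}{2^{k}} = \frac{10{,}000}{2^{10}} \approx 9.8 \approx 10 .
\]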
|
14 | Detailed comments (please number each point) |
1. I think the authors should reconsider some of the "shortcomings" of previous approaches carefully and fairly. This is an old problem with a lot of prior research, so it's worth being "generous" about the abilities of the approaches considered instead of dismissing them prematurely. This will lead to a clearer and more believable description of the contribution of this paper.
2. What's novel about "data augmentation" needs to be better articulated, and the drawbacks of traditional CBR approaches to helpdesks better discussed (including database maintenance as the number of cases grows, which the authors don't mention). Some of the design choices also need to be better justified (why use logistic regression to generate labels for the samples? etc.).
|
15 | Comments for the Program Committee |
|
16 | Is this paper a candidate for the Best Paper Award? |
No
|
17 | Would author feedback be useful for this Review? (if "Yes", please answer Q. 18) |
No
|
18 | List specific clarifications you seek from the Authors (if you have answered "Yes" to Q. 17) |
|
|
Review: 3 |
Reviewer:
| Ken Ross |
Email:
| kar@cs.columbia.edu |
Organization:
| Columbia University |
Review:
| Question | Response |
1 | Overall Rating |
Weak Reject
|
2 | Reject due to technical incorrectness |
No
|
3 | Novelty |
Medium
|
4 | Technical Depth |
Medium
|
5 | Is the relevant and reasonably well known state of the art fairly treated (note this may come from outside the database literature and even outside computer science)? |
yes, allowing for space limitations
|
6 | Experimental results should meet the following expectations: deal with materially relevant cases (e.g., updates as well as queries, different scales) allowing for space limitations; statistically meaningful results; and use standard datasets or benchmarks (as opposed to a cherry-picked one) when possible. How did the experiments rate? |
not good
|
7 | If experiments were presented, would you want them checked for repeatability (as might be required in a future sigmod)? |
no or not applicable
|
8 | Presentation |
Adequate
|
9 | Reviewer Confidence |
High
|
10 | Name of External Reviewer (if applicable) |
|
11 | Summary of the paper's main contributions and impact (up to one paragraph) |
The authors propose to dynamically determine the next question to ask a user in a helpdesk application given some partial information about the problem.
|
12 | Three strong points of the paper (please number them S1,S2,S3) |
S1. This seems like an interesting problem, and manual creation of such helpdesk application databases seems expensive.
S2. The proposed methods have been implemented in a real system.
|
13 | Three weak points of the paper (please number them W1,W2,W3) |
W1. The main experimental evaluation should be the user study. In the paper, the user study is very preliminary, and no conclusions can be drawn from it. This will be a much better paper with a more substantial user study.
W2. I am not convinced about the idea of adding fictitious (not factitious!) answers. See detailed comments below.
|
14 | Detailed comments (please number each point) |
D1. The paper does not address the important issue of question sequentiality. For example, the question "have you plugged in your external disk drive" should only be asked once it is confirmed that the user has an external disk drive. Extending answers to such questions to cover situations where no external disk drive is present seems strange. It seems like, in the present work, the question mentioned above could be asked even if no external drive were present.
D2. I'm not sure how the system appears to the helpdesk employee. In particular, how is the prior information input into the system? In a broad help application, this input step could itself require navigation of a complex hierarchy not unlike the question-answering hierarchy itself. If so, having a bunch of initial information might still mean that a hierarchical sequence of steps must be followed.
D3. There are several places where the writing is ungrammatical.
D4. The source [26] for customer service statistics is unlikely to be objective.
D5. Another kind of related work is faceted search, such as the Flamenco system. You can choose to navigate a hierarchy according to any dimension you like, and the cardinalities of each remaining subcategory are shown.
D6. Be clear about what equation 1 means when the probability is zero (the log of 0 is undefined; see the note after this list).
D7. Be more explicit in defining "accuracy" in section 6.1.
D8. It is not clear in section 6.4 if these are real examples or made-up examples.
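A note on D6 above: if equation 1 is an entropy-style expression (my guess from context), the standard remedy is the continuity convention

\[
  0 \cdot \log 0 := \lim_{p \to 0^{+}} p \log p = 0,
\]

but the paper should state this explicitly.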
|
15 | Comments for the Program Committee |
|
16 | Is this paper a candidate for the Best Paper Award? |
No
|
17 | Would author feedback be useful for this Review? (if "Yes", please answer Q. 18) |
No
|
18 | List specific clarifications you seek from the Authors (if you have answered "Yes" to Q. 17) |
|
|
|