Excuse me, how can I evaluate the initial results? #10
@Laqcce-cao So you mean the results reproduced in the image are obtained after Step 1 (Bidirectional Schema Linking), right? By the way, could you please provide the code for the three results before Step 1?

Another question about RSL: when I reproduced it with GPT-4, I noticed that the scores from Step 1 to Step 4 first decreased and then increased. Does this mean that Step 2 has little effect?

[Screenshots: Overall performance; Step 1 effect; Step 2 effect; Step 3 effect; Step 4 final effect]
As the results show, the overall score after Step 2 is slightly lower than after Step 1, contrary to the improvement reported in the paper's ablation experiments. Moreover, compared with the overall score of 67.21 reported in the paper, the score I obtained with GPT was only 66.69, a gap of about 0.5 points. I ran three separate experiments, and the scores were almost identical. I therefore have to question the effectiveness of Step 2 and whether the paper's results are reproducible.
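One way to judge whether a gap of about 0.5 points is meaningful is to compare it with the spread across repeated runs. The sketch below is a hypothetical helper (`summarize_runs` is not part of the RSL-SQL code); it assumes each score is an execution-accuracy percentage from an independent run.

```python
import statistics


def summarize_runs(run_scores, reported):
    """Compare repeated reproduction runs with a reported score.

    run_scores: scores (percent) from independent runs (hypothetical input).
    reported:   the score reported in the paper (percent).
    """
    mean = statistics.mean(run_scores)
    # Sample standard deviation needs at least two runs; report 0.0 otherwise.
    spread = statistics.stdev(run_scores) if len(run_scores) > 1 else 0.0
    # Positive gap means the reproduction is below the reported score.
    gap = reported - mean
    return {"mean": mean, "stdev": spread, "gap": gap}


summary = summarize_runs([66.69], 67.21)
print(round(summary["gap"], 2))  # 0.52
```

With only one run score quoted in this thread, the spread cannot be estimated; passing the three individual run scores instead would show whether the gap is large relative to run-to-run variation, though differences in model version or sampling settings can also shift the result.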
My experimental data is here for you to test. I'd like to know whether you made any changes to my code when reproducing it, for example to how the prompt is organized. If possible, you can use DeepSeek to reproduce the results; it is relatively cheap.
It's worth noting that with different LLMs, Step 2 may not always outperform Step 1. This is exactly the potential risk of schema linking that we refer to. However, Step 3 relies on both Step 1 and Step 2, and the SQL generated in Step 2 greatly helps improve the performance of Step 3. The SQL queries generated in Steps 1 and 2 each have their own advantages, and a significant portion of their correct and incorrect cases do not overlap; Step 3 hedges the risks between the two. Step 2 is not yet perfect, and we are currently exploring ways to further improve its robustness. Our other experiments (which may not be included in the paper) show that, within the RSL-SQL framework, improving either Step 1 or Step 2 contributes to better final performance.
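The claim that Steps 1 and 2 succeed on different questions can be checked directly from per-question correctness. The sketch below is a hypothetical analysis, not the authors' code: it assumes you already have, for each step, the set of question indices whose SQL matches the gold result. The union gives an upper bound for a step that only chooses between the two candidate SQLs; the actual Step 3 may do more than select.

```python
def overlap_report(step1_correct, step2_correct, total):
    """Compare which questions two pipeline steps answer correctly.

    step1_correct, step2_correct: sets of question indices judged correct.
    total: number of questions in the evaluation set.
    """
    both = step1_correct & step2_correct
    only1 = step1_correct - step2_correct
    only2 = step2_correct - step1_correct
    union = step1_correct | step2_correct
    return {
        "both": len(both),
        "only_step1": len(only1),
        "only_step2": len(only2),
        # Accuracy of an oracle that always picks whichever candidate is correct.
        "oracle_acc": 100.0 * len(union) / total,
    }


# Toy data for illustration only, not real RSL-SQL results.
report = overlap_report({0, 1, 2, 3, 4, 5}, {0, 1, 2, 3, 6, 7}, total=10)
print(report)
# {'both': 4, 'only_step1': 2, 'only_step2': 2, 'oracle_acc': 80.0}
```

In the toy data both steps score 60%, yet the oracle reaches 80% because each step gets two questions the other misses; large `only_step1` and `only_step2` counts are what would make a combining step worthwhile.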
Hello author, I am very interested in your paper. While reading the paper and code, I found that the initial results of the paper are quite good. May I ask how you evaluated them? Is there any related code available?
Moreover, I have a question: is the preliminary SQL file already the initial result? As shown in README.md, executing `python src/step_1_preliminary_sql.py` will yield the initial results presented in the paper, correct? Therefore, the initial results can be obtained by evaluating `preliminary_sql.txt`. Is this correct? By the way, which setting is used for the initial results: filter schema, full schema, or schema with augmentation?
Really looking forward to your reply. Thank you.
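The thread does not say which evaluation script the authors used. As an illustration only, the sketch below computes execution accuracy (EX): a prediction counts as correct when its result set equals that of the gold SQL on the same SQLite database, which is the comparison BIRD-style evaluation uses. The function names, the one-query-per-line layout assumed for `preliminary_sql.txt`, and the `db_path_for` mapping are all assumptions.

```python
import sqlite3


def run_query(db_path, sql):
    """Execute `sql` on the SQLite file at `db_path`.

    Returns the result rows as a set, or None if the query fails.
    """
    conn = sqlite3.connect(db_path)
    try:
        return set(conn.execute(sql).fetchall())
    except (sqlite3.Error, sqlite3.Warning):
        # Invalid SQL, missing columns, multiple statements, etc.
        return None
    finally:
        conn.close()


def execution_accuracy(pred_sqls, gold_items, db_path_for):
    """Return EX (percent) of predicted SQL against gold SQL.

    pred_sqls:   predicted SQL strings in question order (assumed to be the
                 lines of preliminary_sql.txt, one query per line).
    gold_items:  (gold_sql, db_id) pairs in the same order.
    db_path_for: callable mapping a db_id to its SQLite file path.
    """
    if len(pred_sqls) != len(gold_items):
        raise ValueError("prediction and gold counts differ")
    correct = 0
    for pred, (gold, db_id) in zip(pred_sqls, gold_items):
        path = db_path_for(db_id)
        gold_rows = run_query(path, gold)
        pred_rows = run_query(path, pred)
        # Count a hit only if the prediction runs and matches the gold rows
        # as an unordered set (row order and duplicate rows are ignored).
        if pred_rows is not None and pred_rows == gold_rows:
            correct += 1
    return 100.0 * correct / len(gold_items)
```

For BIRD's dev set, the gold file typically pairs each query with its `db_id` separated by a tab, and databases live under `dev_databases/<db_id>/<db_id>.sqlite`; check the actual paths in your checkout. The official script also applies a per-query timeout, which this sketch omits.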