ReSRer Meta Prompt V2

Creator
Seonglae Cho
Created
2024 Jan 30 16:21
Edited
2024 Feb 2 2:5

Meta prompt v2

###Instruction### Your primary goal is to refine the summarizer's method to maximize the Exact Match (EM) retention rate. This is crucial for ensuring the summary closely aligns with the original text's key elements, directly impacting the reader's ability to achieve a high EM score in their final answer. The F1 score is a secondary metric, serving as an additional indicator of the summarizer's performance. Keep these points in mind while tuning the summarizer's prompt:
  1. Prioritize Exact Match (EM) Retention: The most critical aspect is maintaining as much of the original text's exact match span as possible in the summary. Focus on capturing and preserving key terms and phrases that directly relate to the question.
  2. Efficient Information Retrieval (psgs_tokens): Control the number of tokens retrieved from the original passages. The aim is to extract only the most relevant information, avoiding unnecessary details that don't contribute to the EM.
  3. Concise Summary Tokens (summary_tokens): Ensure the summary is succinct, ideally shorter than the total psgs_tokens. While conciseness is key, don't sacrifice crucial details necessary for retaining the exact match.
  4. Use of F1 Score as a Supportive Metric: While the primary focus is on EM, also consider the F1 score as an indicator of how well the summary captures the essential information in a coherent manner.
  5. Leverage Historical Data for Improvement: Regularly analyze the performance of previous prompts, particularly focusing on their EM rates. Utilize insights from this analysis to make informed adjustments to the summarization strategy.
  6. Iterative Refinement: Continuously refine the summarization approach based on the latest EM performance data. This process should aim to incrementally enhance the exact match rate, ensuring that the summary remains relevant and effectively serves as a bridge for the reader to extract the final answer.
Remember, the ultimate aim is to create a summary that maximizes EM retention, thereby directly assisting the reader in accurately identifying the final answer.
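The EM and F1 metrics named in the instruction can be sketched as follows. This is a minimal sketch that assumes SQuAD-style answer normalization (lowercasing, removing punctuation and English articles); the source does not state which normalization the pipeline uses, and the function names `normalize`, `exact_match`, and `f1_score` are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the actual pipeline may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """True only when the normalized strings are identical."""
    return normalize(prediction) == normalize(gold)


def f1_score(prediction, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, an answer such as "Paris, France" against the gold answer "Paris" earns partial F1 credit but no EM credit, which is one way a large gap between F1 and EM can arise.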
###Reader prompt###
###Baseline###
topk-4 gpt
topk-8 gpt
topk-16 gpt
###Prompt v3###
The v3 prompt's scores per metric are as follows:
topk 4 → summarized
topk 8→1
topk 16 → 1
Prompt v3 improved overall pipeline performance, but not enough. The gap between the F1 score and EM is large, and the information loss during summarization (ret_em to sum_em) is too large. However, the gap increases as top-k increases, which is a good trend.
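The information loss from ret_em to sum_em can be illustrated with a small sketch. It assumes that ret_em is the fraction of questions whose gold answer string appears in the retrieved passages and that sum_em is the same fraction measured on the summary; the source does not define either metric precisely. Whitespace splitting stands in for the real tokenizer, and `contains_answer`, `retention_report`, and the example dictionary keys are hypothetical.

```python
def contains_answer(text, answers):
    """Case-insensitive check that any gold answer string occurs in the text."""
    lowered = text.lower()
    return any(answer.lower() in lowered for answer in answers)


def retention_report(examples):
    """Aggregate answer retention and token counts over a list of examples.

    Each example is assumed to be a dict with the hypothetical keys
    "passages" (list of str), "summary" (str), and "answers" (list of str).
    """
    n = len(examples)
    ret_hits = 0
    sum_hits = 0
    psgs_tokens = 0
    summary_tokens = 0
    for ex in examples:
        joined = " ".join(ex["passages"])
        ret_hits += contains_answer(joined, ex["answers"])
        sum_hits += contains_answer(ex["summary"], ex["answers"])
        # Assumption: whitespace tokens approximate the model's token count.
        psgs_tokens += len(joined.split())
        summary_tokens += len(ex["summary"].split())
    ret_em = ret_hits / n
    sum_em = sum_hits / n
    return {
        "ret_em": ret_em,
        "sum_em": sum_em,
        "loss": ret_em - sum_em,  # answers dropped by the summarizer
        "psgs_tokens": psgs_tokens,
        "summary_tokens": summary_tokens,
        "compression": summary_tokens / psgs_tokens,
    }
```

A report like this makes the trade-off in the instruction visible in one place: a lower `compression` value means a shorter summary, while a higher `loss` means more answer spans disappeared during summarization.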
###Prompt v4###
topk 10 → summarized
As an insight, bullet points work better for question answering. However, this prompt is not optimized yet, so the score is lower than expected.
###Prompt v5###
topk 10 → summarized
Like version 4, structured text is expected to work well for question answering. However, this prompt is not optimized yet either.
###Prompt v6###
topk 10 → summarized
It improved considerably over the previous version but still scores lower than v3 (even lower than v3 at topk 8). The good news is that sum_em is higher.
###Prompt v7###
topk 10 → summarized
The score is lower than v5. I am curious whether a structured prompt, such as bullet points or numbered instructions, works better than general text.
###Prompt v8###
Highlighting key elements is a nice approach, but let us see how it works.
topk 10 → summarized
Highlighting was a novel idea but did not work better. sum_em is similar to v5, but the final exact match score was worse, which suggests that the v5 summary format was better.
 
 
 
 
