Description:
A user may have an LLM pipeline with a prompt template that gets very large when the placeholders are filled in. We want to support dynamic resizing of prompts based on a token budget, by prioritizing certain elements within the prompt template. The question is: what do we select in the prompt template?
Objective:
To ensure that the generated prompt does not exceed a pre-determined or runtime-specified token budget by selectively including elements based on their importance and constraints.
Functional Requirements:
State Management:
The Motion component state is a dictionary, where each entry (key-value pair) corresponds to a placeholder in a prompt template. For instance, in a recommender-system Motion component, state might include user information (e.g., name, age, location), recent activities, interests, and friends. Serve operations might consist of a call to an LLM with some prompt template, and update (background) operations may add information to the state (e.g., new activities).
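For concreteness, here is a minimal sketch of what such a state and template might look like. All keys, values, and the template itself are hypothetical examples, not part of Motion's actual API:

```python
# Hypothetical recommender-system state. Scalar entries fill single
# placeholders; list entries can be truncated item by item, which is
# what makes per-item chunking (below) possible.
state = {
    "name": "Alice",             # scalar entry
    "age": 34,                   # scalar entry
    "location": "Berkeley, CA",  # scalar entry
    "activities": [              # list-type entry: one chunk per item
        "watched 'Dune'",
        "rated 'Oppenheimer' 5 stars",
    ],
    "interests": ["sci-fi", "hiking"],
}

prompt_template = (
    "User {name} ({age}, {location}) recently: {activities}. "
    "Interests: {interests}. Recommend a movie."
)

# Naive full formatting -- the thing that can blow past the budget.
prompt = prompt_template.format(
    **{k: ", ".join(v) if isinstance(v, list) else v
       for k, v in state.items()}
)
```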
We should implement a special type of component, say an LLM component, that has utilities for the requirements below.
Token Budgeting and Optimization:
If the token count of a formatted prompt template exceeds the budget, apply a strategy to reduce tokens while maximizing content importance.
Importance is defined by user-defined constraints (see the "constraints management" section below) and log-probs of an LLM.
We break the prompt down into chunks: each chunk is either a whole entry or, for a list-type entry, a single item in the list. Each chunk has a cost (i.e., its token count) and an importance (derived from the chunk's log-probs and user-defined constraints).
Our optimizer should select chunks such that the sum of the importances of chunks selected is maximized, while the cost of the selected chunks falls under the token budget. More formally:
Let $S = \{s_1, s_2, \ldots, s_n\}$ represent the set of all chunks in the prompt template. Define $T(s_i)$ as the token count for the full inclusion of chunk $s_i$. Let $B$ denote the total token budget. Each chunk $s_i$ is assigned an importance $I(s_i)$, a combination of the log-probs and user-defined constraints.
The optimization problem is to determine a set of inclusion indicators (i.e., 0 or 1) $\{x_1, x_2, \ldots, x_n\}$, one for each chunk $s_i$, such that:
The total token count of the included chunks does not exceed budget $B$: $$\sum_{i=1}^{n} x_i \cdot T(s_i) \leq B$$
The sum of importances of the included chunks is maximized: $$\text{Maximize} \quad \sum_{i=1}^{n} x_i \cdot I(s_i)$$
This is NP-hard, via a simple reduction from maximum coverage; note the formulation is essentially 0/1 knapsack.
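Because the formulation above is a 0/1 knapsack (maximize total importance subject to a token budget), a pseudo-polynomial dynamic program is tractable for prompt-sized budgets. A minimal sketch, assuming chunks arrive as (token_count, importance) pairs — the function name and representation are illustrative, not Motion's API:

```python
def select_chunks(chunks, budget):
    """0/1 knapsack over chunks: maximize summed importance while the
    summed token cost stays within `budget`. `chunks` is a list of
    (tokens, importance) pairs; returns sorted indices of the chosen
    chunks. O(n * budget) time -- fine for budgets of a few thousand
    tokens.
    """
    # best[b] = (max importance achievable with budget b, chosen indices)
    best = [(0.0, frozenset()) for _ in range(budget + 1)]
    for i, (tokens, importance) in enumerate(chunks):
        # Iterate budgets downward so each chunk is used at most once.
        for b in range(budget, tokens - 1, -1):
            candidate = best[b - tokens][0] + importance
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - tokens][1] | {i})
    return sorted(best[budget][1])
```

"Must-Include" entries can be handled by subtracting their token costs from the budget up front and only running the knapsack over the optional chunks.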
Constraints Management:
Users should be able to set constraints on entries, including:
"Must-Include": Certain entries are mandatory in the prompt (e.g., user name, location).
"List Direction": For list-type entries, provide options to prioritize newer or older items. Users can set a "direction" constraint: right means the newer items are prioritized, left means the older items are prioritized.
"Priority Assignment": Enable users to assign (positive) priorities for each state entry, e.g., user's recent activities is more important than friends' activities. A topological sort order is derived from the priorities, and entries without priorities are assumed to be priority 1.
Importance Scores
Now we should determine how to compute $I(s_i)$, given both log-probs ($p_i$) and user-defined constraints. A low $p_i$ means $s_i$ is unexpected, so we may want to prioritize $s_i$. For list-direction constraints, we can apply an exponential weighting scheme to derive a weight for each list item (and normalize the weights within the list). More concretely, the $j$th most important element in a list gets weight $\exp(j \cdot \lambda)$, where $\lambda$ is a constant that determines the rate of growth.
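The weighting-and-normalization step can be sketched directly from the formula. The function name is illustrative; note the sign of $\lambda$ controls whether weights grow or decay with rank, so a negative $\lambda$ gives the most important item the largest weight:

```python
import math

def list_weights(n, lam):
    """Exponential weights for a list of n items. The j-th most
    important item (j = 1..n) gets raw weight exp(j * lam); weights
    are then normalized to sum to 1 within the list.
    """
    raw = [math.exp(j * lam) for j in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]
```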
Let $E$ be the set of all entries (i.e., variables in the state/prompt template). Let $\phi: S \rightarrow E$ be the function that maps a chunk to its entry. The importance score $I(s_i)$ for each chunk $s_i$ is defined as:
$$I(s_i) =
\begin{cases}
\lambda_j(\phi(s_i)) \cdot \exp(-\log p_i) \cdot P(\phi(s_i)), & \text{if } s_i \text{ is part of a list-type entry} \\
\exp(-\log p_i) \cdot P(\phi(s_i)), & \text{otherwise}
\end{cases}$$
Where:
$\phi(s_i)$ maps the chunk $s_i$ to its corresponding entry $e_j$.
$\lambda_j(\phi(s_i))$ is the exponential weight for the chunk $s_i$ when it is part of a list. For the $j$th most important element in a list, it is defined as $\exp(j \cdot \lambda)$, where $\lambda$ is a constant rate of growth.
$\exp(-\log p_i)$ inversely converts the log probability into a factor that increases the importance of unexpected chunks.
$P(\phi(s_i))$ is the user-assigned priority for the entry corresponding to chunk $s_i$, with a default value of 1 if not specified by the user.
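Putting the pieces together, $I(s_i)$ reduces to a small function. Note that $\exp(-\log p_i) = 1/p_i$, so unexpected (low-probability) chunks get a larger boost; the function below is a direct transcription of the piecewise definition, with illustrative names:

```python
import math

def importance(logprob, priority, list_weight=None):
    """I(s_i): exp(-log p_i) boosts unexpected chunks, multiplied by
    the user-assigned priority P (default 1), and by the normalized
    exponential list weight when the chunk is an item of a list-type
    entry (the first case of the piecewise definition).
    """
    score = math.exp(-logprob) * priority
    if list_weight is not None:
        score *= list_weight
    return score
```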
Acceptance Criteria:
Write a general-purpose LLM component, which is a subclass of Component with utilities to set constraints and budgets.
During ops that involve calls to LLMs, Motion dynamically adjusts the prompt content to respect the token budget, which can be set at runtime. Importances are computed for each chunk.
Summarizing long documents: we can get better summaries with a smaller budget.
Question-answering: use our method to shrink the retrieved context and hopefully get the same results.
If we really want to use this to implement a relational operator:
Sorting a massive list: in the state, maintain a list of items ordered by how unsure we are of their position in the sorted list. This list represents the items for the LLM to sort. Each item starts with a score of infinity. When we get a sort order from the LLM, we set each element's score to 0 or 1 (0 if it is consistent with the previous ordering, otherwise 1). Consistency is defined as: if x is greater than y in the new order, then x must be greater than y in the previous order. We finish the sort when the sum of uncertainty scores falls below some threshold.
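The per-element consistency check and score update can be sketched as follows. This assumes "x greater than y" means x appears earlier in the ordering; the helper names are illustrative:

```python
def update_scores(prev_order, new_order):
    """Set each item's uncertainty score to 0 if all of its pairwise
    relations in new_order agree with prev_order, else 1. An item x is
    consistent with an item y it precedes in the new order iff x also
    preceded y in the previous order.
    """
    prev_rank = {item: i for i, item in enumerate(prev_order)}
    scores = {}
    for i, item in enumerate(new_order):
        consistent = all(
            prev_rank[item] < prev_rank[later]
            for later in new_order[i + 1:]
        ) and all(
            prev_rank[earlier] < prev_rank[item]
            for earlier in new_order[:i]
        )
        scores[item] = 0 if consistent else 1
    return scores
```

The outer loop would then re-query the LLM until `sum(scores.values())` drops below the chosen threshold.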
shreyashankar changed the title from "Implement Dynamic Prompt Resizing" to "Implement Dynamic Prompt Resizing for LLM Calls" on Jan 13, 2024.