[8.13] [Obs AI Assistant] Update evaluation framework (elastic#176914) (elastic#177441)

# Backport

This will backport the following commits from `main` to `8.13`:
- [[Obs AI Assistant] Update evaluation framework
(elastic#176914)](elastic#176914)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Dario
Gieselaar","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-02-21T13:32:42Z","message":"[Obs
AI Assistant] Update evaluation framework (elastic#176914)\n\nThe following
changes were made to the evaluation framework:\r\n- Adds support for
screen context in the evaluation framework\r\n- Remove the checks
against the LLM using mathematical operations in\r\nSTATS aggregations.
This is now supported by ES|QL.\r\n- Allow specifying a different
connector for evaluation (e.g. run with\r\nClaude, evaluate with
GPT-4)\r\n\r\nThe `query` functions were improved as well:\r\n- For the
`visualize_query` function, store `userOverrides` in the\r\nfunction
response as data, rather than in the function
request.\r\n`userOverrides` is a big chunk of data and I think it makes
more sense\r\nto hide it from the LLM (ideally we'd just have the
changed property but\r\nthat's probably hard).\r\n- Use `execute_query`
instead of `visualize_query` if the user just\r\nwants to see results,
and not visualize the data.\r\n- Add the ES|QL instructions as a user
message, instead of a system\r\nmessage, to get the LLM to pay more
attention to it in relation to the\r\nuser message.\r\n- Make sure
`query` is also used for converting queries.\r\n- Fix a bug that
occurred when multiple visualizations were available in\r\nthe
conversation, editing any always resulted in the first
visualization\r\nbeing updated.\r\n- Store `columns` in `data` rather
than `content` to prevent it being\r\nsent over to the LLM.\r\n\r\nOne
bugfix in the Bedrock/Claude adapter:\r\n- Catch, parse and throw errors
in Bedrock stream (which come through as\r\nan object, not an
error)\r\n\r\nSome APM changes:\r\n- Remove the APM-specific addition to
the system message to have more\r\nconsistent performance. (I've not
seen evidence of a degradation in\r\nperformance when calling
APM-specific functions but would like a second\r\nopinion).\r\n- Add
ES|QL queries to screen context for APM. This allows the Assistant\r\nto
generate e.g. breakdowns of data that is on the page.\r\n- Add a
`variance` scenario that generates data with some
variations\r\naccording to a seasonal pattern. This is to get more
realistic charts.\r\n\r\n---------\r\n\r\nCo-authored-by: almudenasanz
<[email protected]>\r\nCo-authored-by: Kibana Machine
<[email protected]>","sha":"ebb2c9d083bf2fe80923ca4fb191d4bf61e9b1eb","branchLabelMapping":{"^v8.14.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:obs-ux-infra_services","v8.13.0","v8.14.0"],"title":"[Obs
AI Assistant] Update evaluation
framework","number":176914,"url":"https://github.com/elastic/kibana/pull/176914","mergeCommit":{"message":"[Obs
AI Assistant] Update evaluation framework (elastic#176914)\n\nThe following
changes were made to the evaluation framework:\r\n- Adds support for
screen context in the evaluation framework\r\n- Remove the checks
against the LLM using mathematical operations in\r\nSTATS aggregations.
This is now supported by ES|QL.\r\n- Allow specifying a different
connector for evaluation (e.g. run with\r\nClaude, evaluate with
GPT-4)\r\n\r\nThe `query` functions were improved as well:\r\n- For the
`visualize_query` function, store `userOverrides` in the\r\nfunction
response as data, rather than in the function
request.\r\n`userOverrides` is a big chunk of data and I think it makes
more sense\r\nto hide it from the LLM (ideally we'd just have the
changed property but\r\nthat's probably hard).\r\n- Use `execute_query`
instead of `visualize_query` if the user just\r\nwants to see results,
and not visualize the data.\r\n- Add the ES|QL instructions as a user
message, instead of a system\r\nmessage, to get the LLM to pay more
attention to it in relation to the\r\nuser message.\r\n- Make sure
`query` is also used for converting queries.\r\n- Fix a bug that
occurred when multiple visualizations were available in\r\nthe
conversation, editing any always resulted in the first
visualization\r\nbeing updated.\r\n- Store `columns` in `data` rather
than `content` to prevent it being\r\nsent over to the LLM.\r\n\r\nOne
bugfix in the Bedrock/Claude adapter:\r\n- Catch, parse and throw errors
in Bedrock stream (which come through as\r\nan object, not an
error)\r\n\r\nSome APM changes:\r\n- Remove the APM-specific addition to
the system message to have more\r\nconsistent performance. (I've not
seen evidence of a degradation in\r\nperformance when calling
APM-specific functions but would like a second\r\nopinion).\r\n- Add
ES|QL queries to screen context for APM. This allows the Assistant\r\nto
generate e.g. breakdowns of data that is on the page.\r\n- Add a
`variance` scenario that generates data with some
variations\r\naccording to a seasonal pattern. This is to get more
realistic charts.\r\n\r\n---------\r\n\r\nCo-authored-by: almudenasanz
<[email protected]>\r\nCo-authored-by: Kibana Machine
<[email protected]>","sha":"ebb2c9d083bf2fe80923ca4fb191d4bf61e9b1eb"}},"sourceBranch":"main","suggestedTargetBranches":["8.13"],"targetPullRequestStates":[{"branch":"8.13","label":"v8.13.0","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.14.0","branchLabelMappingKey":"^v8.14.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/176914","number":176914,"mergeCommit":{"message":"[Obs
AI Assistant] Update evaluation framework (elastic#176914)\n\nThe following
changes were made to the evaluation framework:\r\n- Adds support for
screen context in the evaluation framework\r\n- Remove the checks
against the LLM using mathematical operations in\r\nSTATS aggregations.
This is now supported by ES|QL.\r\n- Allow specifying a different
connector for evaluation (e.g. run with\r\nClaude, evaluate with
GPT-4)\r\n\r\nThe `query` functions were improved as well:\r\n- For the
`visualize_query` function, store `userOverrides` in the\r\nfunction
response as data, rather than in the function
request.\r\n`userOverrides` is a big chunk of data and I think it makes
more sense\r\nto hide it from the LLM (ideally we'd just have the
changed property but\r\nthat's probably hard).\r\n- Use `execute_query`
instead of `visualize_query` if the user just\r\nwants to see results,
and not visualize the data.\r\n- Add the ES|QL instructions as a user
message, instead of a system\r\nmessage, to get the LLM to pay more
attention to it in relation to the\r\nuser message.\r\n- Make sure
`query` is also used for converting queries.\r\n- Fix a bug that
occurred when multiple visualizations were available in\r\nthe
conversation, editing any always resulted in the first
visualization\r\nbeing updated.\r\n- Store `columns` in `data` rather
than `content` to prevent it being\r\nsent over to the LLM.\r\n\r\nOne
bugfix in the Bedrock/Claude adapter:\r\n- Catch, parse and throw errors
in Bedrock stream (which come through as\r\nan object, not an
error)\r\n\r\nSome APM changes:\r\n- Remove the APM-specific addition to
the system message to have more\r\nconsistent performance. (I've not
seen evidence of a degradation in\r\nperformance when calling
APM-specific functions but would like a second\r\nopinion).\r\n- Add
ES|QL queries to screen context for APM. This allows the Assistant\r\nto
generate e.g. breakdowns of data that is on the page.\r\n- Add a
`variance` scenario that generates data with some
variations\r\naccording to a seasonal pattern. This is to get more
realistic charts.\r\n\r\n---------\r\n\r\nCo-authored-by: almudenasanz
<[email protected]>\r\nCo-authored-by: Kibana Machine
<[email protected]>","sha":"ebb2c9d083bf2fe80923ca4fb191d4bf61e9b1eb"}}]}]
BACKPORT-->

Co-authored-by: Dario Gieselaar <[email protected]>
kibanamachine and dgieselaar authored Feb 21, 2024
1 parent 143f5cf commit e2fc5da
Showing 30 changed files with 770 additions and 230 deletions.
@@ -0,0 +1,40 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/

export function timeBasedPattern({
min,
max,
cycle,
peak,
}: {
min: number;
max: number;
cycle: number;
peak: number;
}) {
return function (timestamp: number) {
// Calculate the midpoint to determine the base level of the pattern
const baseLevel = (max + min) / 2;

// Adjust amplitude based on min and max
const adjustedAmplitude = (max - min) / 2;

// Calculate the current position in the cycle
const cyclePosition = (timestamp % cycle) / cycle;

// Derive the phase shift from the specified peak (a fraction of the cycle).
// Note: cosine is already at its maximum when its argument is 0, so the extra
// π/2 term places the maximum a quarter cycle before `peak` (at peak - 0.25).
const phaseShift = peak * 2 * Math.PI - Math.PI / 2;

// Calculate the value using a cosine function to create a smooth wave pattern
const value =
baseLevel + adjustedAmplitude * Math.cos(cyclePosition * 2 * Math.PI - phaseShift);

// Ensure the value is within the specified range
return Math.max(min, Math.min(value, max));
};
}
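As a quick check of how `peak` maps onto the wave, the sketch below restates the same cosine form and samples one full cycle to find where the maximum falls. The names `wave` and `findPeakPosition` and the sample count are illustrative assumptions, not part of this commit.

```typescript
// Illustrative restatement of the wave above (not imported from the commit).
function wave(min: number, max: number, cycle: number, peak: number) {
  const baseLevel = (max + min) / 2;
  const amplitude = (max - min) / 2;
  const phaseShift = peak * 2 * Math.PI - Math.PI / 2;
  return (timestamp: number): number => {
    const cyclePosition = (timestamp % cycle) / cycle;
    const value = baseLevel + amplitude * Math.cos(cyclePosition * 2 * Math.PI - phaseShift);
    return Math.max(min, Math.min(value, max));
  };
}

// Sample one full cycle and return the fraction of the cycle with the largest value.
function findPeakPosition(fn: (t: number) => number, cycle: number, samples = 1000): number {
  let bestIndex = 0;
  let bestValue = -Infinity;
  for (let i = 0; i < samples; i++) {
    const v = fn((i / samples) * cycle);
    if (v > bestValue) {
      bestValue = v;
      bestIndex = i;
    }
  }
  return bestIndex / samples;
}

const DAY = 24 * 60 * 60 * 1000;
const throughput = wave(1, 10, DAY, 0.7);
const peakPosition = findPeakPosition(throughput, DAY);
console.log(peakPosition);
```

With `peak: 0.7`, the sampled maximum should land near 0.45 of the cycle rather than 0.7, because of the extra π/2 term in `phaseShift`. If the intent is for the maximum to coincide with `peak`, dropping that term would be one option.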
66 changes: 66 additions & 0 deletions packages/kbn-apm-synthtrace/src/scenarios/variance.ts
@@ -0,0 +1,66 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/

import { apm, ApmFields, Instance } from '@kbn/apm-synthtrace-client';
import { Scenario } from '../cli/scenario';
import { RunOptions } from '../cli/utils/parse_run_cli_flags';
import { withClient } from '../lib/utils/with_client';
import { timeBasedPattern } from './helpers/time_based_pattern';

const scenario: Scenario<ApmFields> = async (runOptions: RunOptions) => {
return {
generate: ({ range, clients: { apmEsClient } }) => {
const throughputPattern = timeBasedPattern({
min: 1,
max: 10,
peak: 0.7,
cycle: 24 * 60 * 60 * 1000,
});

const durationPattern = timeBasedPattern({
min: 10,
max: 410,
peak: 0.7,
cycle: 24 * 60 * 60 * 1000,
});

const service = apm.service('myService', 'production', 'go');
const instanceA = service.instance('a');
const instanceB = service.instance('b');

function generateTrace(instance: Instance, duration: number, timestamp: number) {
return instance
.transaction('GET /api')
.duration(duration)
.timestamp(timestamp)
.outcome('success');
}

return withClient(
apmEsClient,
range
.interval('1m')
.rate(1)
.generator((timestamp) => {
const throughput = Math.floor(throughputPattern(timestamp));

const traces = new Array(throughput).fill(undefined).flatMap((_, index) => {
return [
generateTrace(instanceA, durationPattern(timestamp), timestamp),
generateTrace(instanceB, durationPattern(timestamp) * 1.25, timestamp),
];
});

return traces;
})
);
},
};
};

export default scenario;
18 changes: 18 additions & 0 deletions x-pack/plugins/apm/common/utils/esql/get_esql_date_range_filter.ts
@@ -0,0 +1,18 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

import moment from 'moment';
import { string } from '.';

export function getEsqlDateRangeFilter(
from: number | string,
to: number | string
) {
return `@timestamp >= ${string`${moment(
from
).toISOString()}`} AND @timestamp < ${string`${moment(to).toISOString()}`}`;
}
@@ -0,0 +1,24 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

import { identifier, string } from '.';
import {
ENVIRONMENT_ALL,
ENVIRONMENT_NOT_DEFINED,
} from '../../environment_filter_values';
import { SERVICE_ENVIRONMENT } from '../../es_fields/apm';

export function getEsqlEnvironmentFilter(environment: string) {
if (environment === ENVIRONMENT_ALL.value) {
return '';
}
if (environment === ENVIRONMENT_NOT_DEFINED.value) {
return `${identifier`${SERVICE_ENVIRONMENT}`} IS NULL`;
}

return `${identifier`${SERVICE_ENVIRONMENT}`} == ${string`${environment}`}`;
}
25 changes: 25 additions & 0 deletions x-pack/plugins/apm/common/utils/esql/index.ts
@@ -0,0 +1,25 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

export function string(
parts: TemplateStringsArray,
...variables: Array<string | number>
): string {
const joined = Array.from(parts.raw).concat(variables.map(String)).join('');
return `"${joined.replaceAll(/[^\\]"/g, '\\"')}"`;
}

export function identifier(
parts: TemplateStringsArray,
...variables: Array<string | number>
): string {
const joined = Array.from(parts.raw).concat(variables.map(String)).join('');

const escaped = `\`${joined.replaceAll(/[^\\]`/g, '\\`')}\``;

return escaped;
}
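The `string` and `identifier` helpers are tagged template functions. As a hedged illustration of the general technique, the sketch below interleaves the literal parts with the interpolated values in order and escapes every backslash and double quote. `esqlString` is a hypothetical name, and the escaping rule is an assumption made for illustration; it is not the implementation in this commit.

```typescript
// Hypothetical helper: builds a double-quoted literal from a tagged template.
function esqlString(
  parts: TemplateStringsArray,
  ...variables: Array<string | number>
): string {
  // Interleave literal parts with interpolated values, preserving their order.
  const joined = parts.reduce(
    (acc, part, i) => acc + part + (i < variables.length ? String(variables[i]) : ''),
    ''
  );
  // Escape backslashes first, then double quotes (assumed escaping rule).
  const escaped = joined.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return `"${escaped}"`;
}

const serviceName = 'my"Service';
const literal = esqlString`svc-${serviceName}-prod`;
console.log(literal);
```

Interleaving matters once a template has text on both sides of a placeholder, and escaping each quote individually avoids consuming the character in front of it.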
@@ -0,0 +1,66 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

import { Environment } from '../../../../common/environment_rt';
import {
PROCESSOR_EVENT,
SERVICE_NAME,
TRANSACTION_NAME,
TRANSACTION_TYPE,
} from '../../../../common/es_fields/apm';
import { string } from '../../../../common/utils/esql';
import { getEsqlDateRangeFilter } from '../../../../common/utils/esql/get_esql_date_range_filter';
import { getEsqlEnvironmentFilter } from '../../../../common/utils/esql/get_esql_environment_filter';

export function getThroughputScreenContext({
serviceName,
transactionName,
transactionType,
environment,
start,
end,
preferred,
}: {
serviceName?: string;
transactionName?: string;
transactionType?: string;
environment?: Environment;
start: string;
end: string;
preferred: {
bucketSizeInSeconds: number;
} | null;
}) {
const clauses = [
`${PROCESSOR_EVENT} == "transaction"`,
getEsqlDateRangeFilter(start, end),
serviceName ? `${SERVICE_NAME} == ${string`${serviceName}`}` : '',
transactionName
? `${TRANSACTION_NAME} == ${string`${transactionName}`}`
: '',
transactionType
? `${TRANSACTION_TYPE} == ${string`${transactionType}`}`
: '',
environment ? getEsqlEnvironmentFilter(environment) : '',
].filter(Boolean);

return {
screenDescription: `There is a throughput chart displayed. The ES|QL equivalent for this is:
\`\`\`esql
FROM traces-apm*
| WHERE ${clauses.join(' AND ')}
${
preferred
? `| EVAL date_bucket = DATE_TRUNC(${preferred?.bucketSizeInSeconds} seconds, @timestamp)`
: ''
}
| STATS count = COUNT(*)${preferred ? ` BY date_bucket` : ''}
\`\`\`
`,
};
}
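The clause list above relies on `filter(Boolean)` to drop the empty strings produced by unset filters before joining with `AND`. A minimal sketch of that pattern follows; `buildWhere` is a hypothetical, simplified stand-in, the field names are assumed values for the imported constants, and it skips the escaping that the `string` helper provides.

```typescript
// Hypothetical, simplified clause builder mirroring the filter(Boolean) pattern.
function buildWhere(filters: { serviceName?: string; transactionType?: string }): string {
  const clauses = [
    // Assumed field names; the commit reads them from es_fields constants.
    'processor.event == "transaction"',
    filters.serviceName ? `service.name == "${filters.serviceName}"` : '',
    filters.transactionType ? `transaction.type == "${filters.transactionType}"` : '',
  ].filter(Boolean); // empty strings are falsy, so unset filters disappear

  return `| WHERE ${clauses.join(' AND ')}`;
}

const whereClause = buildWhere({ serviceName: 'myService' });
console.log(whereClause);
```

Because the processor event clause is always present, the joined `WHERE` expression is never empty, even when every optional filter is omitted.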
@@ -13,7 +13,7 @@ import {
EuiFlexGroup,
} from '@elastic/eui';
import { i18n } from '@kbn/i18n';
import React from 'react';
import React, { useEffect } from 'react';
import { usePreviousPeriodLabel } from '../../../hooks/use_previous_period_text';
import { isTimeComparison } from '../../shared/time_comparison/get_comparison_options';
import { AnomalyDetectorType } from '../../../../common/anomaly_detection/apm_ml_detectors';
@@ -32,6 +32,8 @@ import {
} from '../../shared/charts/helper/get_timeseries_color';
import { usePreferredDataSourceAndBucketSize } from '../../../hooks/use_preferred_data_source_and_bucket_size';
import { ApmDocumentType } from '../../../../common/document_type';
import { useApmPluginContext } from '../../../context/apm_plugin/use_apm_plugin_context';
import { getThroughputScreenContext } from './get_throughput_screen_context';

const INITIAL_STATE = {
currentPeriod: [],
@@ -152,6 +154,32 @@ export function ServiceOverviewThroughputChart({
: []),
];

const { setScreenContext } =
useApmPluginContext().observabilityAIAssistant.service;

useEffect(() => {
return setScreenContext(
getThroughputScreenContext({
serviceName,
transactionName,
transactionType,
environment,
preferred,
start,
end,
})
);
}, [
serviceName,
transactionName,
transactionType,
environment,
setScreenContext,
preferred,
start,
end,
]);

return (
<EuiPanel hasBorder={true}>
<EuiFlexGroup alignItems="center" gutterSize="s" responsive={false}>
@@ -0,0 +1,66 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

import type { Environment } from '../../../../../common/environment_rt';
import {
PROCESSOR_EVENT,
SERVICE_NAME,
TRANSACTION_NAME,
TRANSACTION_TYPE,
} from '../../../../../common/es_fields/apm';
import { string } from '../../../../../common/utils/esql';
import { getEsqlDateRangeFilter } from '../../../../../common/utils/esql/get_esql_date_range_filter';
import { getEsqlEnvironmentFilter } from '../../../../../common/utils/esql/get_esql_environment_filter';

export function getLatencyChartScreenContext({
serviceName,
transactionName,
transactionType,
environment,
start,
end,
bucketSizeInSeconds,
}: {
serviceName?: string;
transactionName?: string;
transactionType?: string;
environment?: Environment;
start: string;
end: string;
bucketSizeInSeconds?: number;
}) {
const clauses = [
`${PROCESSOR_EVENT} == "transaction"`,
getEsqlDateRangeFilter(start, end),
serviceName ? `${SERVICE_NAME} == ${string`${serviceName}`}` : '',
transactionName
? `${TRANSACTION_NAME} == ${string`${transactionName}`}`
: '',
transactionType
? `${TRANSACTION_TYPE} == ${string`${transactionType}`}`
: '',
environment ? getEsqlEnvironmentFilter(environment) : '',
].filter(Boolean);

return {
screenDescription: `There is a latency chart displayed. The ES|QL equivalent for this is:
\`\`\`esql
FROM traces-apm*
| WHERE ${clauses.join(' AND ')}
${
bucketSizeInSeconds !== undefined
? `| EVAL date_bucket = DATE_TRUNC(${bucketSizeInSeconds} seconds, @timestamp)`
: ''
}
| STATS avg_duration = AVG(transaction.duration.us)${
bucketSizeInSeconds !== undefined ? ` BY date_bucket` : ''
}
\`\`\`
`,
};
}