This repository was archived by the owner on Mar 29, 2025. It is now read-only.

Commit 1759fe3

Merge branch 'release/v24.0' into main
2 parents 77de360 + 2302bc7 commit 1759fe3

16 files changed

+237
-15
lines changed

.github/styles/Vocab/Dgraph/accept.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -130,6 +130,7 @@ rebalancing
130130
unary
131131
loopback
132132
snake_case
133+
semver
133134

134135
Leia
135136
Skywalker

LICENSE.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
## Dgraph Licensing
22

3-
Copyright 2016-2021 Dgraph Labs, Inc.
3+
Copyright 2016-2024 Dgraph Labs, Inc.
44

55
Source code in this repository is variously licensed under the Apache Public
66
License 2.0 (APL) and the Dgraph Community License (DCL). A copy of each license

README.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,7 +38,9 @@ Making our documentation easy to understand includes optimizing it for client-si
3838
Use hugo shortcode for relref.
3939

4040
For example, to reference a term, use a relref to the glossary:
41+
```
4142
> [entity]({{< relref "dgraph-glossary.md#entity" >}})
43+
```
4244

4345
### Staging doc updates locally
4446

content/deploy/cli-command-reference.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -128,6 +128,7 @@ The `--badger` superflag allows you to set many advanced [Badger options](https:
128128
| `--query_edge_limit` | uint64 | `query-edge` | uint64 |`alpha`| Maximum number of edges that can be returned in a query |
129129
| `--normalize_node_limit` | int | `normalize-node` | int |`alpha`| Maximum number of nodes that can be returned in a query that uses the normalize directive |
130130
| `--mutations_nquad_limit` | int | `mutations-nquad` | int |`alpha`| Maximum number of nquads that can be inserted in a mutation request |
131+
| `--max-pending-queries` | int | `max-pending-queries` | int |`alpha`| Maximum number of concurrently processing requests allowed before requests are rejected with 429 Too Many Requests |
131132
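A client that talks to an alpha with `--max-pending-queries` set may want to retry when it receives 429 Too Many Requests. The following is a minimal sketch of exponential backoff, not part of Dgraph or its clients: `call_with_backoff`, `send`, `max_attempts`, and `base_delay` are hypothetical names, and `send` stands in for whatever function issues the request and returns the HTTP status code.

```python
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a status other than 429 or attempts run out.

    send is a hypothetical callable that performs one request and returns
    the HTTP status code. The delay doubles after every rejected attempt.
    """
    status = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Wait base_delay, 2 * base_delay, 4 * base_delay, ... before retrying.
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. The final status is returned as-is, so the caller still sees 429 if every attempt was rejected.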

132133
### Raft superflag
133134

content/design-concepts/replication-concept.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,4 +9,4 @@ weight = 85
99
Each Highly-Available (HA) group will be served by at least 3 instances (or two if one is temporarily unavailable). In the case of an alpha instance
1010
failure, other alpha instances in the same group still handle the load for data in that group. In case of a zero instance failure, the remaining two zeros in the zero group will continue to hand out timestamps and perform other zero functions.
1111

12-
In addition, Dgraph `Learner Nodes` are alpha instances that hold replicas of data, but this replication is to suupport read replicas, often in a different geography from the master cluster. This replication is implemented the same way as HA replication, but the learner nodes do not participate in quorum, and do not take over from failed nodes to provide high availability.
12+
In addition, Dgraph `Learner Nodes` are alpha instances that hold replicas of data, but this replication is to support read replicas, often in a different geography from the master cluster. This replication is implemented the same way as HA replication, but the learner nodes do not participate in quorum, and do not take over from failed nodes to provide high availability.

content/dql/dql-schema.md

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,9 @@ revenue: float .
1616
running_time: int .
1717
starring: [uid] .
1818
director: [uid] .
19+
description: string .
20+
21+
description_vector: float32vector @index(hnsw(metric:"cosine")) .
1922
2023
type Person {
2124
name
@@ -28,6 +31,8 @@ type Film {
2831
running_time
2932
starring
3033
director
34+
description
35+
description_vector
3136
}
3237
```
3338

@@ -112,6 +117,15 @@ For all triples with a predicate of scalar types the object is a literal.
112117
are RFC 3339 compatible which is different from ISO 8601(as defined in the RDF spec). You should
113118
convert your values to RFC 3339 format before sending them to Dgraph.{{% /notice %}}
114119

120+
### Vector Type
121+
122+
The `float32vector` type denotes a vector of floating point numbers, i.e., an ordered array of float32 values. A node type can contain more than one vector predicate.
123+
124+
Vectors are normally used to store embeddings obtained from other information through an ML model. When a `float32vector` is [indexed]({{<relref "dql/predicate-indexing.md">}}), the DQL [similar_to]({{<relref "query-language/functions#vector-similarity-search">}}) function can be used for similarity search.
125+
126+
127+
128+
115129
### UID Type
116130

117131
The `uid` type denotes a relationship; internally each node is identified by its UID which is a `uint64`.

content/dql/predicate-indexing.md

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,15 @@ weight = 4
99

1010
Filtering on a predicate by applying a [function]({{< relref "query-language/functions.md" >}}) requires an index.
1111

12+
Indices are defined in the [Dgraph types schema]({{<relref "dql/dql-schema.md" >}}) using the `@index` directive.
13+
14+
Here are some examples:
15+
```
16+
name: string @index(term) .
17+
release_date: datetime @index(year) .
18+
description_vector: float32vector @index(hnsw(metric:"cosine")) .
19+
```
20+
1221
When filtering by applying a function, Dgraph uses the index to make the search through a potentially large dataset efficient.
1322

1423
All scalar types can be indexed.
@@ -17,6 +26,8 @@ Types `int`, `float`, `bool` and `geo` have only a default index each: with toke
1726

1827
Types `string` and `dateTime` have a number of indices.
1928

29+
Type `float32vector` supports the `hnsw` index.
30+
2031
## String Indices
2132
The indices available for strings are as follows.
2233

@@ -34,6 +45,30 @@ transaction conflict rate. Use only the minimum number of and simplest indexes
3445
that your application needs.
3546
{{% /notice %}}
3647

48+
## Vector Indices
49+
50+
The indices available for `float32vector` are as follows.
51+
52+
| Dgraph function | Required index / tokenizer | Notes |
53+
| :----------------------- | :------------ | :--- |
54+
| `similar_to` | `hnsw` | HNSW index supports parameters `metric` and `exponent`. |
55+
56+
57+
#
58+
59+
The `hnsw` (**Hierarchical Navigable Small World**) index supports the following parameters:
60+
- metric: the metric used to compute vector similarity. One of `cosine`, `euclidean`, or `dotproduct`. Default is `euclidean`.
61+
62+
- exponent: an integer, represented as a string, that roughly indicates the number of vectors expected in the index as a power of 10. The exponent value is used to set "reasonable defaults" for HNSW internal tuning parameters. Default is "4" (10^4 vectors).
63+
64+
65+
Here are some examples:
66+
```
67+
simple_vector: float32vector @index(hnsw) .
68+
description_vector: float32vector @index(hnsw(metric:"cosine")) .
69+
large_vector: float32vector @index(hnsw(metric:"euclidean",exponent:"6")) .
70+
```
71+
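To make the `metric` and `exponent` parameters concrete, here is a small Python sketch of the textbook definitions behind the three metric names, plus the power-of-10 reading of `exponent`. The function names are hypothetical, and the exact values Dgraph computes and reports internally (for example, whether cosine is exposed as a similarity or as a distance) are an assumption not confirmed by this page.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; ignores vector length."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def expected_vectors(exponent: str) -> int:
    """Rough index size implied by an exponent such as "6" (10^6 vectors)."""
    return 10 ** int(exponent)
```

Cosine is usually the natural choice when only the direction of an embedding matters, while Euclidean distance and dot product are also sensitive to vector magnitude.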
3772
## DateTime Indices
3873

3974
The indices available for `dateTime` are as follows.

content/graphql/mutations/mutations-overview.md

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -221,6 +221,35 @@ mutation {
221221
}
222222
```
223223

224+
## Vector Embedding mutations
225+
226+
For types with vector embeddings, Dgraph automatically generates the add mutation. The following example shows a schema with an embedding field, followed by an add mutation for that type.
227+
228+
```graphql
229+
type User {
230+
userID: ID!
231+
name: String!
232+
name_v: [Float!] @embedding @search(by: ["hnsw(metric: euclidean, exponent: 4)"])
233+
}
234+
235+
mutation {
236+
addUser(input: [
237+
{ name: "iCreate with a Mini iPad", name_v: [0.12, 0.53, 0.9, 0.11, 0.32] },
238+
{ name: "Resistive Touchscreen", name_v: [0.72, 0.89, 0.54, 0.15, 0.26] },
239+
{ name: "Fitness Band", name_v: [0.56, 0.91, 0.93, 0.71, 0.24] },
240+
{ name: "Smart Ring", name_v: [0.38, 0.62, 0.99, 0.44, 0.25] }])
241+
{
242+
user {
243+
userID
244+
name
245+
name_v
246+
}
247+
}
248+
}
249+
```
250+
251+
Note: The embeddings are generated outside of Dgraph using any suitable machine learning model.
252+
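Because the embeddings come from outside Dgraph, client code typically computes them first and then sends them as mutation input. The sketch below builds the JSON request body for such a mutation. It rests on several assumptions: `toy_embedding` is a hypothetical stand-in for a real model (it only hashes the text so the example is deterministic), `build_add_user_request` is a hypothetical helper, and the `AddUserInput` type name follows the usual naming of Dgraph's generated input types but should be checked against the generated schema.

```python
import hashlib
import json
from typing import List

# Mutation text with the users passed in as a GraphQL variable.
ADD_USERS = """
mutation AddUsers($input: [AddUserInput!]!) {
  addUser(input: $input) {
    user { userID name name_v }
  }
}
"""


def toy_embedding(text: str, dim: int = 5) -> List[float]:
    """Hypothetical placeholder for a real embedding model.

    Maps the first `dim` bytes of a SHA-256 digest to floats in [0, 1].
    The result carries no semantic meaning.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [round(byte / 255, 2) for byte in digest[:dim]]


def build_add_user_request(names: List[str]) -> str:
    """Return the JSON body for a GraphQL request that adds one user per name."""
    users = [{"name": name, "name_v": toy_embedding(name)} for name in names]
    return json.dumps({"query": ADD_USERS, "variables": {"input": users}})
```

The returned string can be posted to the GraphQL endpoint with any HTTP client. In real use, `toy_embedding` would be replaced by a call to the chosen model.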
224253
## Examples
225254

226255
You can refer to the following [link](https://github.com/dgraph-io/dgraph/tree/main/graphql/schema/testdata/schemagen) for more examples.

content/graphql/queries/aggregate.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
+++
22
title = "Aggregate Queries"
33
description = "Dgraph automatically generates aggregate queries for GraphQL schemas. These are compatible with the @auth directive."
4-
weight = 3
4+
weight = 4
55
[menu.main]
66
parent = "graphql-queries"
77
name = "Aggregate Queries"
Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,64 @@
1+
+++
2+
title = "Similarity Search"
3+
description = "Dgraph automatically generates GraphQL queries for each vector index that you define in your schema. There are two types of queries generated for each index."
4+
weight = 3
5+
[menu.main]
6+
parent = "graphql-queries"
7+
identifier = "vector-queries"
8+
+++
9+
10+
Dgraph automatically generates two GraphQL similarity queries for each type that has at least one [vector predicate](/graphql/schema/types/#vectors) with the `@search` directive.
11+
12+
For example:
13+
14+
```graphql
15+
type User {
16+
id: ID!
17+
name: String!
18+
name_v: [Float!] @embedding @search(by: ["hnsw(metric: euclidean, exponent: 4)"])
19+
}
20+
```
21+
22+
With the above schema, the auto-generated `querySimilar<Object>ByEmbedding` query allows us to run similarity search using the vector index specified in our schema.
23+
24+
```graphql
25+
querySimilar<Object>ByEmbedding(
26+
by: vector_predicate,
27+
topK: n,
28+
vector: searchVector): [User]
29+
```
30+
31+
For example, to find the top 3 users whose names are similar to a given user name embedding, use the following query.
32+
33+
```graphql
34+
querySimilarUserByEmbedding(by: name_v, topK: 3, vector: [0.1, 0.2, 0.3, 0.4, 0.5]) {
35+
id
36+
name
37+
vector_distance
38+
}
39+
```
40+
The results of this query include the 3 closest users, ordered by `vector_distance`. The `vector_distance` is the Euclidean distance between the `name_v` embedding vector and the input vector used in the query.
41+
42+
Note: You can omit the `vector_distance` predicate from the query; the results are still ordered by `vector_distance`.
43+
44+
The distance metric used is the one specified when the index is created.
45+
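Conceptually, a top-K similarity query ranks the stored vectors by their distance to the search vector and keeps the closest `topK`. The following Python sketch does this by brute force with Euclidean distance, using the sample vectors from the add mutation example in the mutations overview. It is only an illustration: `top_k_similar` is a hypothetical helper, and an HNSW index performs an approximate search rather than this exhaustive scan, so its results are not guaranteed to match.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Sample name embeddings, copied from the add mutation example.
USERS: Dict[str, List[float]] = {
    "iCreate with a Mini iPad": [0.12, 0.53, 0.9, 0.11, 0.32],
    "Resistive Touchscreen": [0.72, 0.89, 0.54, 0.15, 0.26],
    "Fitness Band": [0.56, 0.91, 0.93, 0.71, 0.24],
    "Smart Ring": [0.38, 0.62, 0.99, 0.44, 0.25],
}


def top_k_similar(
    query: Sequence[float],
    vectors: Dict[str, List[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k (name, distance) pairs closest to query, nearest first."""
    scored = [(name, math.dist(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == "__main__":
    for name, distance in top_k_similar([0.1, 0.2, 0.3, 0.4, 0.5], USERS, 3):
        print(f"{name}: {distance:.4f}")
```

The second element of each pair plays the role of `vector_distance` in the GraphQL result.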
46+
Similarly, the auto-generated `querySimilar<Object>ById` query allows us to search for objects similar to an existing object, given its ID.
47+
48+
```graphql
49+
querySimilar<Object>ById(
50+
by: vector_predicate,
51+
topK: n,
52+
id: userID): [User]
53+
```
54+
55+
For example, the following query searches for the top 3 users whose names are most similar to the name of the user with ID "0xef7".
56+
57+
```graphql
58+
querySimilarUserById(by: name_v, topK: 3, id: "0xef7") {
59+
id
60+
name
61+
vector_distance
62+
}
63+
```
64+
