Commit 70b8737

feat: MVA and Float Vector UDF support
1 parent e002a4c commit 70b8737

9 files changed, +374 -7 lines changed

manual/chinese/Extensions/UDFs_and_Plugins/UDFs_and_Plugins.md

Lines changed: 11 additions & 1 deletion
@@ -9,11 +9,21 @@ SELECT id, attr1, myudf (attr2, attr3+attr4) ...
 您可以动态加载和卸载 `searchd` 中的 UDF,而无需重启服务器,并且可以在搜索、排序等操作时在表达式中使用它们。UDF 功能的简要总结如下:

 * UDF 可以接受整型(32 位和 64 位)、浮点型、字符串、多值属性(MVA)或 `PACKEDFACTORS()` 参数。
-* UDF 可以返回整型、浮点型或字符串值
+* UDF可以返回整数、浮点数、字符串或MVA值(MULTI、MULTI64、FLOAT_VECTOR)
 * UDF 可以在查询设置阶段检查参数的数量、类型和名称,并且可以抛出错误。

 我们还不支持聚合函数。换句话说,您的 UDF 将一次只为单个文档调用,且应返回该文档的某个值。目前还无法编写能够计算整个共享相同 GROUP BY 键的文档组上的聚合值(如 AVG())的函数。不过,您可以在内置聚合函数中使用 UDF:即使不支持 MYCUSTOMAVG(),使用 AVG(MYCUSTOMFUNC()) 应该可以正常工作!

+## MVA返回类型
+
+除了标量值外,UDF还可以返回多值属性(MVA)。支持的MVA返回类型包括:
+
+* **MULTI**:32位无符号整数数组
+* **MULTI64**:64位有符号整数数组
+* **FLOAT_VECTOR**:浮点数数组
+
+MVA UDF使用相同的`CREATE FUNCTION`语法创建,并指定适当的返回类型,可以像标量UDF一样在SELECT语句中使用。
+
 UDF 具有广泛的应用,包括:

 * 集成自定义的数学或字符串函数;

manual/english/Extensions/UDFs_and_Plugins/UDFs_and_Plugins.md

Lines changed: 11 additions & 1 deletion
@@ -9,11 +9,21 @@ SELECT id, attr1, myudf (attr2, attr3+attr4) ...
 You can dynamically load and unload UDFs into `searchd` without having to restart the server, and use them in expressions when searching, ranking, etc. A quick summary of the UDF features is as follows:

 * UDFs can take integer (both 32-bit and 64-bit), float, string, MVA, or `PACKEDFACTORS()` arguments.
-* UDFs can return integer, float, or string values.
+* UDFs can return integer, float, string, or MVA values (MULTI, MULTI64, FLOAT_VECTOR).
 * UDFs can check the argument number, types, and names during the query setup phase, and raise errors.

 We do not yet support aggregation functions. In other words, your UDFs will be called for just a single document at a time and are expected to return some value for that document. Writing a function that can compute an aggregate value like AVG() over the entire group of documents that share the same GROUP BY key is not yet possible. However, you can use UDFs within the built-in aggregate functions: that is, even though MYCUSTOMAVG() is not supported yet, AVG(MYCUSTOMFUNC()) should work just fine!

+## MVA Return Types
+
+UDFs can also return Multi-Value Attributes (MVA) in addition to scalar values. The supported MVA return types are:
+
+* **MULTI**: Arrays of 32-bit unsigned integers
+* **MULTI64**: Arrays of 64-bit signed integers
+* **FLOAT_VECTOR**: Arrays of floating-point numbers
+
+MVA UDFs are created using the same `CREATE FUNCTION` syntax with the appropriate return type, and can be used in SELECT statements just like scalar UDFs.
+
 UDFs offer a wide range of applications, such as:

 * incorporating custom mathematical or string functions;

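The documentation change above says a UDF can return a MULTI value, and the header change later in this commit describes that return type as a sorted set of unsigned 32-bit integers. A minimal sketch of how a UDF body might assemble such a result follows. `Blob`, `MvaState`, and `MakeMulti` are hypothetical names used only for illustration; the real return type is `ByteBlob_t`, and who owns the returned buffer is not shown in this diff, so the buffer is kept in caller-owned state here as an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the engine's ByteBlob_t: pointer plus length in bytes.
struct Blob
{
	const unsigned char * pData = nullptr;
	int iLen = 0;
};

// Hypothetical per-query state that owns the result buffer between calls.
struct MvaState
{
	std::vector<std::uint32_t> dValues;
};

// Build a MULTI-style result: sorted, de-duplicated 32-bit unsigned values.
Blob MakeMulti ( MvaState & tState, std::vector<std::uint32_t> dInput )
{
	std::sort ( dInput.begin(), dInput.end() );
	dInput.erase ( std::unique ( dInput.begin(), dInput.end() ), dInput.end() );
	tState.dValues = std::move ( dInput );

	// postcondition: the stored set is ordered
	assert ( std::is_sorted ( tState.dValues.begin(), tState.dValues.end() ) );

	// expose the stored values as raw bytes; the state keeps them alive
	Blob tRes;
	tRes.pData = reinterpret_cast<const unsigned char *> ( tState.dValues.data() );
	tRes.iLen = static_cast<int> ( tState.dValues.size() * sizeof ( std::uint32_t ) );
	return tRes;
}
```

The returned blob only points into `tState`, so it stays valid until the next call that reuses the same state.
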
manual/russian/Extensions/UDFs_and_Plugins/UDFs_and_Plugins.md

Lines changed: 11 additions & 1 deletion
@@ -9,11 +9,21 @@ SELECT id, attr1, myudf (attr2, attr3+attr4) ...
 Вы можете динамически загружать и выгружать UDF в `searchd` без необходимости перезапуска сервера, и использовать их в выражениях при поиске, ранжировании и т.д. Краткое описание возможностей UDF:

 * UDF могут принимать аргументы типа integer (как 32-битные, так и 64-битные), float, string, MVA или `PACKEDFACTORS()`.
-* UDF могут возвращать значения типа integer, float или string.
+* UDF могут возвращать значения целочисленного, вещественного, строкового или MVA типов (MULTI, MULTI64, FLOAT_VECTOR).
 * UDF могут проверять номер, типы и имена аргументов во время стадии настройки запроса и генерировать ошибки.

 В настоящее время агрегатные функции не поддерживаются. Другими словами, ваши UDF вызываются только для одного документа за раз и должны возвращать некоторое значение для этого документа. Написать функцию, которая может вычислить агрегатное значение, например AVG() для всей группы документов, имеющих одинаковый ключ GROUP BY, пока невозможно. Однако вы можете использовать UDF внутри встроенных агрегатных функций: другими словами, хотя MYCUSTOMAVG() пока не поддерживается, AVG(MYCUSTOMFUNC()) должна работать без проблем!

+## Типы возвращаемых MVA
+
+UDF также могут возвращать многозначные атрибуты (MVA) в дополнение к скалярным значениям. Поддерживаемые типы возвращаемых MVA:
+
+* **MULTI**: Массивы 32-битных беззнаковых целых чисел
+* **MULTI64**: Массивы 64-битных знаковых целых чисел
+* **FLOAT_VECTOR**: Массивы чисел с плавающей запятой
+
+MVA UDF создаются с использованием того же синтаксиса `CREATE FUNCTION` с соответствующим типом возвращаемого значения и могут использоваться в предложениях SELECT так же, как и скалярные UDF.
+
 UDF предоставляют широкий спектр применений, таких как:

 * внедрение собственных математических или строковых функций;

src/ddl.y

Lines changed: 3 additions & 0 deletions
@@ -506,6 +506,9 @@ udf_type:
 	| TOK_FLOAT { $$.SetValueInt ( SPH_ATTR_FLOAT ); }
 	| TOK_STRING { $$.SetValueInt ( SPH_ATTR_STRINGPTR ); }
 	| TOK_INTEGER { $$.SetValueInt ( SPH_ATTR_INTEGER ); }
+	| TOK_MULTI { $$.SetValueInt ( SPH_ATTR_UINT32SET_PTR ); }
+	| TOK_MULTI64 { $$.SetValueInt ( SPH_ATTR_INT64SET_PTR ); }
+	| TOK_FLOAT_VECTOR { $$.SetValueInt ( SPH_ATTR_FLOAT_VECTOR_PTR ); }
 	;

 drop_function:

src/querycontext.h

Lines changed: 15 additions & 1 deletion
@@ -51,7 +51,21 @@ FORCE_INLINE void CalcContextItem ( CSphMatch & tMatch, const ContextCalcItem_t
 		case SPH_ATTR_INT64SET_PTR:
 		case SPH_ATTR_UINT32SET_PTR:
 		case SPH_ATTR_FLOAT_VECTOR_PTR:
-			tMatch.SetAttr ( tCalc.m_tLoc, (SphAttr_t)tCalc.m_pExpr->Int64Eval ( tMatch ) );
+			{
+				// Check if this is a UDF expression that returns raw MVA data
+				// UDF expressions need MvaEval() + packing, while regular expressions use Int64Eval()
+				if ( tCalc.m_pExpr->IsDataPtrAttr() )
+				{
+					ByteBlob_t tMva = tCalc.m_pExpr->MvaEval ( tMatch );
+					BYTE * pPacked = sphPackPtrAttr ( tMva );
+					tMatch.SetAttr ( tCalc.m_tLoc, (SphAttr_t)pPacked );
+				}
+				else
+				{
+					// Regular MVA expressions (including columnar) return packed data directly
+					tMatch.SetAttr ( tCalc.m_tLoc, (SphAttr_t)tCalc.m_pExpr->Int64Eval ( tMatch ) );
+				}
+			}
 			break;

 		case SPH_ATTR_DOUBLE:

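The branch added to `CalcContextItem` can be read in isolation as: expressions that report `IsDataPtrAttr()` hand back a raw blob that must be packed before it is stored, while all other expressions already return a storable 64-bit value. The sketch below mirrors only that decision. `Blob`, `Expr`, `RawBlobExpr`, `PackedExpr`, `PackBlob`, and `CalcPtrAttr` are hypothetical stand-ins, and the 4-byte length prefix is an assumed, simplified layout; the real `sphPackPtrAttr` format is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for ByteBlob_t: pointer plus length in bytes (assumed layout).
struct Blob
{
	const unsigned char * pData;
	int iLen;
};

// Hypothetical expression interface with only the three calls the hunk uses.
struct Expr
{
	virtual ~Expr() = default;
	virtual bool IsDataPtrAttr() const { return false; }
	virtual Blob MvaEval() const { return { nullptr, 0 }; }
	virtual std::int64_t Int64Eval() const { return 0; }
};

// UDF-like node: hands back a raw, unpacked blob.
struct RawBlobExpr : Expr
{
	Blob m_tBlob;
	explicit RawBlobExpr ( Blob tBlob ) : m_tBlob ( tBlob ) {}
	bool IsDataPtrAttr() const override { return true; }
	Blob MvaEval() const override { return m_tBlob; }
};

// Regular node: its 64-bit result is already the value to store.
struct PackedExpr : Expr
{
	std::int64_t m_iValue;
	explicit PackedExpr ( std::int64_t iValue ) : m_iValue ( iValue ) {}
	std::int64_t Int64Eval() const override { return m_iValue; }
};

// Simplified packing (NOT the real sphPackPtrAttr format): 4-byte length, then payload.
// The caller owns the returned buffer and releases it with delete[].
unsigned char * PackBlob ( Blob tBlob )
{
	assert ( tBlob.iLen >= 0 );
	std::uint32_t uLen = static_cast<std::uint32_t> ( tBlob.iLen );
	auto * pPacked = new unsigned char [ sizeof ( uLen ) + uLen ];
	std::memcpy ( pPacked, &uLen, sizeof ( uLen ) );
	if ( uLen )
		std::memcpy ( pPacked + sizeof ( uLen ), tBlob.pData, uLen );
	return pPacked;
}

// Same decision as the hunk: raw blobs get packed, everything else passes through.
std::int64_t CalcPtrAttr ( const Expr & tExpr )
{
	if ( tExpr.IsDataPtrAttr() )
	{
		unsigned char * pPacked = PackBlob ( tExpr.MvaEval() );
		return static_cast<std::int64_t> ( reinterpret_cast<std::intptr_t> ( pPacked ) );
	}
	return tExpr.Int64Eval();
}
```

In this sketch the stored 64-bit value is either a pointer to a freshly allocated packed buffer or the expression's own result, which matches the two `SetAttr` calls in the hunk.
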
src/sphinxexpr.cpp

Lines changed: 196 additions & 0 deletions
@@ -5574,6 +5574,7 @@ void MoveToArgList ( ISphExpr * pLeft, VecRefPtrs_t<ISphExpr*> &dArgs )
 using UdfInt_fn = sphinx_int64_t ( * ) ( SPH_UDF_INIT *, SPH_UDF_ARGS *, char * );
 using UdfDouble_fn = double ( * ) ( SPH_UDF_INIT *, SPH_UDF_ARGS *, char * );
 using UdfCharptr_fn = char * ( * ) ( SPH_UDF_INIT *, SPH_UDF_ARGS *, char * );
+using UdfMva_fn = ByteBlob_t ( * ) ( SPH_UDF_INIT *, SPH_UDF_ARGS *, char * );

 class Expr_Udf_c : public ISphExpr
 {
@@ -5880,6 +5881,186 @@ class Expr_UdfStringptr_c : public Expr_Udf_c
 };


+class Expr_UdfMva32_c : public Expr_Udf_c
+{
+public:
+	explicit Expr_UdfMva32_c ( UdfCall_t * pCall, QueryProfile_c * pProfiler )
+		: Expr_Udf_c ( pCall, pProfiler )
+	{
+		assert ( pCall->m_pUdf->m_eRetType==SPH_ATTR_UINT32SET_PTR );
+		m_pFn = (UdfMva_fn) m_pCall->m_pUdf->m_fnFunc;
+	}
+
+	ByteBlob_t MvaEval ( const CSphMatch & tMatch ) const final
+	{
+		if ( m_bError )
+			return {nullptr, 0};
+
+		CSphScopedProfile tProf ( m_pProfiler, SPH_QSTATE_EVAL_UDF );
+		FillArgs ( tMatch );
+		auto tRes = m_pFn ( &m_pCall->m_tInit, &m_pCall->m_tArgs, &m_bError );
+		FreeArgs();
+		return tRes;
+	}
+
+	float Eval ( const CSphMatch & ) const final
+	{
+		assert ( 0 && "internal error: mva32 udf evaluated as float" );
+		return 0.0f;
+	}
+
+	int IntEval ( const CSphMatch & ) const final
+	{
+		assert ( 0 && "internal error: mva32 udf evaluated as int" );
+		return 0;
+	}
+
+	int64_t Int64Eval ( const CSphMatch & ) const final
+	{
+		assert ( 0 && "internal error: mva32 udf evaluated as bigint" );
+		return 0;
+	}
+
+	bool IsDataPtrAttr() const final
+	{
+		return true;
+	}
+
+	ISphExpr * Clone () const final
+	{
+		return new Expr_UdfMva32_c ( *this );
+	}
+
+private:
+	Expr_UdfMva32_c ( const Expr_UdfMva32_c& rhs )
+		: Expr_Udf_c ( rhs )
+		, m_pFn ( rhs.m_pFn )
+	{
+	}
+	UdfMva_fn m_pFn; // to avoid dereference on each MvaEval() call
+};
+
+
+class Expr_UdfMva64_c : public Expr_Udf_c
+{
+public:
+	explicit Expr_UdfMva64_c ( UdfCall_t * pCall, QueryProfile_c * pProfiler )
+		: Expr_Udf_c ( pCall, pProfiler )
+	{
+		assert ( pCall->m_pUdf->m_eRetType==SPH_ATTR_INT64SET_PTR );
+		m_pFn = (UdfMva_fn) m_pCall->m_pUdf->m_fnFunc;
+	}
+
+	ByteBlob_t MvaEval ( const CSphMatch & tMatch ) const final
+	{
+		if ( m_bError )
+			return {nullptr, 0};
+
+		CSphScopedProfile tProf ( m_pProfiler, SPH_QSTATE_EVAL_UDF );
+		FillArgs ( tMatch );
+		auto tRes = m_pFn ( &m_pCall->m_tInit, &m_pCall->m_tArgs, &m_bError );
+		FreeArgs();
+		return tRes;
+	}
+
+	float Eval ( const CSphMatch & ) const final
+	{
+		assert ( 0 && "internal error: mva64 udf evaluated as float" );
+		return 0.0f;
+	}
+
+	int IntEval ( const CSphMatch & ) const final
+	{
+		assert ( 0 && "internal error: mva64 udf evaluated as int" );
+		return 0;
+	}
+
+	int64_t Int64Eval ( const CSphMatch & ) const final
+	{
+		assert ( 0 && "internal error: mva64 udf evaluated as bigint" );
+		return 0;
+	}
+
+	bool IsDataPtrAttr() const final
+	{
+		return true;
+	}
+
+	ISphExpr * Clone () const final
+	{
+		return new Expr_UdfMva64_c ( *this );
+	}
+
+private:
+	Expr_UdfMva64_c ( const Expr_UdfMva64_c& rhs )
+		: Expr_Udf_c ( rhs )
+		, m_pFn ( rhs.m_pFn )
+	{
+	}
+	UdfMva_fn m_pFn; // to avoid dereference on each MvaEval() call
+};
+
+
+class Expr_UdfFloatVector_c : public Expr_Udf_c
+{
+public:
+	explicit Expr_UdfFloatVector_c ( UdfCall_t * pCall, QueryProfile_c * pProfiler )
+		: Expr_Udf_c ( pCall, pProfiler )
+	{
+		assert ( pCall->m_pUdf->m_eRetType==SPH_ATTR_FLOAT_VECTOR_PTR );
+		m_pFn = (UdfMva_fn) m_pCall->m_pUdf->m_fnFunc;
+	}
+
+	ByteBlob_t MvaEval ( const CSphMatch & tMatch ) const final
+	{
+		if ( m_bError )
+			return {nullptr, 0};
+
+		CSphScopedProfile tProf ( m_pProfiler, SPH_QSTATE_EVAL_UDF );
+		FillArgs ( tMatch );
+		auto tRes = m_pFn ( &m_pCall->m_tInit, &m_pCall->m_tArgs, &m_bError );
+		FreeArgs();
+		return tRes;
+	}
+
+	float Eval ( const CSphMatch & ) const final
+	{
+		assert ( 0 && "internal error: float_vector udf evaluated as float" );
+		return 0.0f;
+	}
+
+	int IntEval ( const CSphMatch & ) const final
+	{
+		assert ( 0 && "internal error: float_vector udf evaluated as int" );
+		return 0;
+	}
+
+	int64_t Int64Eval ( const CSphMatch & ) const final
+	{
+		assert ( 0 && "internal error: float_vector udf evaluated as bigint" );
+		return 0;
+	}
+
+	bool IsDataPtrAttr() const final
+	{
+		return true;
+	}
+
+	ISphExpr * Clone () const final
+	{
+		return new Expr_UdfFloatVector_c ( *this );
+	}
+
+private:
+	Expr_UdfFloatVector_c ( const Expr_UdfFloatVector_c& rhs )
+		: Expr_Udf_c ( rhs )
+		, m_pFn ( rhs.m_pFn )
+	{
+	}
+	UdfMva_fn m_pFn; // to avoid dereference on each MvaEval() call
+};
+
+
 ISphExpr * ExprParser_t::CreateUdfNode ( int iCall, ISphExpr * pLeft )
 {
 	if ( !CheckStoredArg(pLeft) )
@@ -5898,6 +6079,15 @@ ISphExpr * ExprParser_t::CreateUdfNode ( int iCall, ISphExpr * pLeft )
 		case SPH_ATTR_STRINGPTR:
 			pRes = new Expr_UdfStringptr_c ( m_dUdfCalls[iCall], m_pProfiler );
 			break;
+		case SPH_ATTR_UINT32SET_PTR:
+			pRes = new Expr_UdfMva32_c ( m_dUdfCalls[iCall], m_pProfiler );
+			break;
+		case SPH_ATTR_INT64SET_PTR:
+			pRes = new Expr_UdfMva64_c ( m_dUdfCalls[iCall], m_pProfiler );
+			break;
+		case SPH_ATTR_FLOAT_VECTOR_PTR:
+			pRes = new Expr_UdfFloatVector_c ( m_dUdfCalls[iCall], m_pProfiler );
+			break;
 		default:
 			m_sCreateError.SetSprintf ( "internal error: unhandled type %d in CreateUdfNode()", m_dUdfCalls[iCall]->m_pUdf->m_eRetType );
 			break;
@@ -10269,6 +10459,10 @@ int ExprParser_t::AddNodeUdf ( int iCall, int iArg )
 			case SPH_ATTR_JSON_FIELD:
 				eRes = SPH_UDF_TYPE_JSON;
 				break;
+			case SPH_ATTR_FLOAT_VECTOR:
+			case SPH_ATTR_FLOAT_VECTOR_PTR:
+				eRes = SPH_UDF_TYPE_FLOAT_VECTOR_RETURN;
+				break;
 			default:
 				m_sParserError.SetSprintf ( "internal error: unmapped UDF argument type (arg=%d, type=%u)", i, dArgTypes[i] );
 				return -1;
@@ -10300,6 +10494,8 @@ int ExprParser_t::AddNodeUdf ( int iCall, int iArg )
 	// deduce type
 	tNode.m_eArgType = ( iArg>=0 ) ? m_dNodes[iArg].m_eRetType : SPH_ATTR_INTEGER;
 	tNode.m_eRetType = pCall->m_pUdf->m_eRetType;
+
+
 	return m_dNodes.GetLength()-1;
 }

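All three new node classes share the same `MvaEval()` shape: once the UDF has set the error flag, later calls return an empty blob without calling the UDF again. The sketch below isolates that sticky-error pattern. `Blob`, `MvaFn`, `UdfMvaNode`, `AlwaysFails`, and `g_iCalls` are hypothetical names; the function signature is simplified to a single error-flag parameter, and the `mutable` flag is an assumption about how a `const` evaluation method can write to it.

```cpp
#include <cassert>

// Stand-in for ByteBlob_t: pointer plus length in bytes (assumed layout).
struct Blob
{
	const unsigned char * pData;
	int iLen;
};

// Simplified UDF signature: the real one also takes init and args pointers.
using MvaFn = Blob ( * ) ( char * pError );

class UdfMvaNode
{
public:
	explicit UdfMvaNode ( MvaFn fnFunc )
		: m_fnFunc ( fnFunc )
	{
		assert ( m_fnFunc );
	}

	// Returns an empty blob forever after the UDF has reported an error once.
	Blob MvaEval() const
	{
		if ( m_bError )
			return { nullptr, 0 };
		return m_fnFunc ( &m_bError );
	}

	bool HasError() const { return m_bError != 0; }

private:
	MvaFn m_fnFunc;
	mutable char m_bError = 0; // written by the UDF through the pointer above
};

// Counts how often the sample UDF body actually runs.
int g_iCalls = 0;

// Sample UDF that always reports failure.
Blob AlwaysFails ( char * pError )
{
	++g_iCalls;
	*pError = 1;
	return { nullptr, 0 };
}
```

Evaluating a node built around `AlwaysFails` twice runs the UDF body only once; the second call is short-circuited by the flag.
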
src/sphinxudf.h

Lines changed: 5 additions & 2 deletions
@@ -29,7 +29,7 @@ extern "C" {
 #endif

 /// current udf version
-#define SPH_UDF_VERSION 11
+#define SPH_UDF_VERSION 12

 /// error buffer size
 #define SPH_UDF_ERROR_LEN 256
@@ -48,7 +48,10 @@ enum sphinx_udf_argtype
 	SPH_UDF_TYPE_STRING = 5, ///< non-ASCIIZ string, with a separately stored length
 	SPH_UDF_TYPE_INT64SET = 6, ///< sorted set of signed 64-bit integers
 	SPH_UDF_TYPE_FACTORS = 7, ///< packed ranking factors
-	SPH_UDF_TYPE_JSON = 8 ///< whole json or particular field as a string
+	SPH_UDF_TYPE_JSON = 8, ///< whole json or particular field as a string
+	SPH_UDF_TYPE_UINT32SET_RETURN = 9, ///< return type: sorted set of unsigned 32-bit integers
+	SPH_UDF_TYPE_INT64SET_RETURN = 10, ///< return type: sorted set of signed 64-bit integers
+	SPH_UDF_TYPE_FLOAT_VECTOR_RETURN = 11 ///< return type: vector of floats
 };

 /// our malloc() replacement type

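The three new enum values each imply an element width: 32-bit unsigned integers, 64-bit signed integers, and floats. As a rough sketch, the element count of a returned blob could be derived from its byte length as below. `UdfReturnType`, `ElementSize`, and `CountElements` are hypothetical helpers, not part of `sphinxudf.h`; only the numeric codes 9, 10, and 11 are taken from the hunk, and a 32-bit `float` is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Numeric codes copied from the enum additions in the hunk above.
enum UdfReturnType
{
	UINT32SET_RETURN = 9,
	INT64SET_RETURN = 10,
	FLOAT_VECTOR_RETURN = 11
};

// Bytes per element; 0 for any code that is not one of the three return types.
std::size_t ElementSize ( int eType )
{
	switch ( eType )
	{
	case UINT32SET_RETURN:		return sizeof ( std::uint32_t );
	case INT64SET_RETURN:		return sizeof ( std::int64_t );
	case FLOAT_VECTOR_RETURN:	return sizeof ( float ); // assumed to be a 32-bit float
	default:			return 0;
	}
}

// Element count for a blob of uBytes, or -1 if the type or the length is not valid.
long CountElements ( int eType, std::size_t uBytes )
{
	std::size_t uElem = ElementSize ( eType );
	if ( !uElem || uBytes % uElem )
		return -1;
	assert ( uElem == 4 || uElem == 8 ); // widths implied by the type descriptions
	return static_cast<long> ( uBytes / uElem );
}
```

A length that is not a whole multiple of the element width is rejected, which is one cheap sanity check a caller might apply to a blob coming back from a UDF.
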