
Generating Citations with a Custom Model Deployment #830

Open
bcicc opened this issue Nov 5, 2024 · 6 comments
Labels: bug Something isn't working
bcicc commented Nov 5, 2024

What is the issue?

I am unable to generate citations with a custom model deployment, either by yielding a CITATION_GENERATION event or by including citations in the STREAM_END event in the model's invoke_chat_stream method.

Here is my code in invoke_chat_stream:

yield {
    "event_type": StreamEvent.CITATION_GENERATION,
    "citations": [{
        "start": 0,
        "end": 57,  # len(chunk_text)
        "text": "Carlos Alcaraz won the Wimbledon 2024 men's singles title",
        "document_ids": ["wikipedia.org"]
    }]
}

I can see in the frontend that the citation-generation event comes through the chat-stream endpoint with the appropriate data, but no UI component is rendered. I think I may be misunderstanding something about how citations are implemented. Help would be appreciated.

Additional information

No response

@bcicc bcicc added the bug Something isn't working label Nov 5, 2024
@tianjing-li
Collaborator

@bcicc I'll check this out in the coming week; I don't recall the citation format off the top of my head.

I'm probably going to step through the code with a debugger and check the actual expected format. If you're in a rush, my suggestion is to step through it on your side as well.

Just from a glance, though, your citation looks fine 🤔

@tianjing-li
Collaborator

@bcicc The general format is correct; the error is most likely the hardcoded document_ids field you're providing. If you look in handle_stream_citation_generation, you'll see the following lines:

document_ids = event_citation.get("document_ids")
for document_id in document_ids:
    document = document_ids_to_document.get(document_id, None)

    if document is not None:
        citation.documents.append(document)

The document id needs to exist in the document_ids_to_document dictionary provided to this function, which is generated by handle_stream_search_results. You can always hardcode that dictionary where the citations are handled to test the scenario.
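For reference, here is a minimal sketch of what that pairing implies inside a custom deployment's invoke_chat_stream: the SEARCH_RESULTS event must carry documents whose "id" values match the "document_ids" referenced by the later CITATION_GENERATION event. The field names follow the snippets already in this thread; treat the exact schema as an assumption and verify it against the toolkit source.

# Sketch (assumption: StreamEvent and the event dict shapes are as used elsewhere in this thread)
doc_id = "my-source-1"  # hypothetical id; any string works as long as it is reused below

yield {
    "event_type": StreamEvent.SEARCH_RESULTS,
    "documents": [{
        "id": doc_id,
        "text": "Carlos Alcaraz won the Wimbledon 2024 men's singles title ...",
        "title": "Wimbledon 2024",
        "url": "https://en.wikipedia.org/wiki/2024_Wimbledon_Championships",
        "tool_name": "google_web_search",
    }],
    "search_results": [],
}
yield {
    "event_type": StreamEvent.CITATION_GENERATION,
    "citations": [{
        "start": 0,
        "end": 57,
        "text": "Carlos Alcaraz won the Wimbledon 2024 men's singles title",
        "document_ids": [doc_id],  # must match an "id" yielded in the SEARCH_RESULTS event
    }],
}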

@tianjing-li tianjing-li self-assigned this Nov 18, 2024
@bcicc
Author

bcicc commented Nov 19, 2024

Thank you for the response. Here is the code I'm using; it's meant to implement Gemini grounding. Below the code is the Docker logs output showing the citations object. Let me know what you think.

stream = self.gemini.generate_content(
    messages, stream=True, safety_settings=safety_settings
)
full_response = ""
try:
    for chunk in stream:
        grounding_chunks = chunk.candidates[
            0
        ].grounding_metadata.grounding_chunks
        grounding_supports = chunk.candidates[
            0
        ].grounding_metadata.grounding_supports
        chunk_text = chunk.text
        if grounding_chunks:
            full_response = "".join(
                [support.segment.text for support in grounding_supports]
            )

            ctx.get_logger().info(
                event="Grounding Info", grounding=grounding_chunks
            )
            documents = [
                {
                    "id": str(hash(chunk.web.uri)),
                    "text": chunk_text,
                    "title": chunk.web.title,
                    "url": chunk.web.uri,
                    "tool_name": "google_web_search",
                }
                for chunk in grounding_chunks
            ]
            search_results = [
                {
                    str(hash(chunk.web.uri)): chunk_text,
                }
                for chunk in grounding_chunks
            ]
            citations = [
                {
                    "start": support.segment.start_index or 0,
                    "end": support.segment.end_index or 0,
                    "text": support.segment.text or "",
                    "document_ids": [
                        str(hash(chunk.web.uri))
                        for idx, chunk in enumerate(grounding_chunks)
                        if idx in support.grounding_chunk_indices
                    ],
                }
                for support in grounding_supports
            ]

            ctx.get_logger().info(
                event="Grounding citations", citations=citations
            )
            yield {
                "event_type": StreamEvent.SEARCH_RESULTS,
                "documents": documents,
                "search_results": search_results,
            }
            yield {
                "event_type": StreamEvent.CITATION_GENERATION,
                "citations": citations,
            }
        yield {
            "event_type": StreamEvent.TEXT_GENERATION,
            "text": chunk_text,
            "response": {
                "text": chunk_text,
                "generation_id": "",
            },
        }
    usage_metadata = stream.usage_metadata
    ctx.get_logger().info(
        event="Gemini Usage Metadata", usage_metadata=usage_metadata
    )

    yield {
        "event_type": StreamEvent.STREAM_END,
        "text": full_response,
        "finish_reason": "COMPLETE",
        "chat_history": chat_history
        + [ChatMessage(role=ChatRole.CHATBOT, message=full_response).to_dict()],
        "response": {
            "text": full_response,
            "generation_id": "",
        },
    }
except Exception as e:
    error = str(e)
    ctx.get_logger().error(event="Error in invoke_chat_stream", error=error)
    yield {
        "event_type": StreamEvent.TEXT_GENERATION,
        "text": str(e),
        "response": {
            "text": str(e),
            "generation_id": "",
        },
    }
    yield {
        "event_type": "stream-end",
        "text": str(e),
        "finish_reason": "COMPLETE",
        "chat_history": chat_request.chat_history
        or [] + [ChatMessage(role=ChatRole.CHATBOT, message=error).to_dict()],
        "response": {
            "text": error,
            "generation_id": "",
        },
    }

Docker logs:

2024-11-19T17:40:31.461542Z [info     ] Grounding citations            
citations=[{
    'start': 0, 
    'end': 136, 
    'text': "Carlos Alcaraz won the men's singles title at Wimbledon 2024, defeating Novak Djokovic in the final with a score of 6-2, 6-2, 7-6 (7-4).", 
    'document_ids': ['8710105391252906020', '-2538578039819764145', '-2264767345683759908']
}, 
{
    'start': 138, 
    'end': 232, 
    'text': "This was Alcaraz's second consecutive Wimbledon title and his fourth Grand Slam title overall.", 
    'document_ids': ['8710105391252906020', '-2538578039819764145', '-2264767345683759908', '7116582112555714649']
}] 
module=backend.model_deployments.gemini trace_id=default user_id=default
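(Side note, not specific to the citation issue: str(hash(...)) on strings is randomized per Python process via hash randomization, so these document ids won't be stable across backend restarts. The ids are consistent within a single request, but if they ever need to persist, a deterministic digest is safer. A minimal sketch:)

import hashlib

def stable_doc_id(uri: str) -> str:
    # Deterministic across processes, unlike the built-in hash()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()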

@tianjing-li
Collaborator

@bcicc Great, is that version working for you now?

@bcicc
Author

bcicc commented Nov 26, 2024

I may not have been clear: I am still having the issue, and I'm not sure what I'm doing wrong. I haven't been able to generate a citation properly yet, so I'm not even sure what it's supposed to look like, although I know from the frontend code there is supposed to be highlighting.

@tianjing-li
Collaborator

@bcicc What's the error that pops up, if any? I would step through each part of the chat stream handling code to see exactly what's happening with your citations. The document ids in the citations need to exist in the DB as well, I believe, in the user-facing document_id field.
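One quick, non-authoritative sanity check (assuming the event shapes from the Gemini code above) is to verify inside invoke_chat_stream that every document id referenced by a citation was previously yielded in a SEARCH_RESULTS event. This only checks id consistency within the stream, not whether the ids end up in the DB's user-facing document_id field.

# Sketch of a local sanity check using the `documents` and `citations` lists built above
yielded_doc_ids = set()

# ... inside the streaming loop, before yielding the events:
yielded_doc_ids.update(doc["id"] for doc in documents)

missing = {
    doc_id
    for citation in citations
    for doc_id in citation["document_ids"]
    if doc_id not in yielded_doc_ids
}
if missing:
    ctx.get_logger().error(
        event="Citation references unknown documents", missing=list(missing)
    )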
