Description
Bug Description
A full sync on the Confluence connector pulls only 50 documents if a CQL query is set.
To Reproduce
Set a CQL query as an "advanced rule" in the connector's "sync rules", for example:
[
  {
    "query": "created >= now('-5y')"
  }
]
Expected behavior
Pull all Confluence content from the last 5 years (obviously overkill, but that is a different story).
Environment
8.17.3
Solution
I have been experimenting with the "paginated_api_call" function in "confluence.py" and noticed that it relies on a next link to page through results.
However, according to the API documentation, the /api/search response does not actually seem to include one:
https://docs.atlassian.com/atlassian-confluence/REST/6.6.0/#content-search
It seems that pagination for a search has to be done by moving the start window.
Quick proof of concept that still follows the next link in case another function needs it:
    async def paginated_api_call(self, url_name, **url_kwargs):
        """Make a paginated API call for Confluence objects using the passed url_name.

        Args:
            url_name (str): URL Name to identify the API endpoint to hit
        Yields:
            response: JSON response.
        """
        base_url = os.path.join(self.host_url, URLS[url_name].format(**url_kwargs))
        # Assumes base_url already carries a query string, so "&start=" is valid.
        url = f"{base_url}&start=0"
        while True:
            try:
                self._logger.debug(f"Starting pagination for API endpoint {url}")
                response = await self.api_call(url=url)
                json_response = await response.json()
                yield json_response

                links = json_response.get("_links") or {}
                start = json_response.get("start", 0)
                size = json_response.get("size", 0)
                total_size = json_response.get("totalSize", 0)
                if links.get("next"):
                    # Keep following the next link where the endpoint provides one.
                    # The URL is no longer rebuilt at the top of the loop, so this
                    # value is not overwritten before the next request.
                    url = os.path.join(self.host_url, links["next"][1:])
                elif size > 0 and start + size < total_size:
                    # No next link: advance the start window manually.
                    url = f"{base_url}&start={start + size}"
                else:
                    self._logger.debug(f"No more data to fetch for {url_name}")
                    return
            except Exception as exception:
                self._logger.warning(
                    f"Skipping data for type {url_name} from {base_url}. Exception: {exception}."
                )
                break
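The start-window arithmetic can be checked in isolation. This is only a minimal sketch: `fetch_page`, `TOTAL_RESULTS` and `PAGE_LIMIT` are hypothetical stand-ins that mimic the `start`, `size` and `totalSize` fields of the search response, and none of them exist in the connector.

```python
import asyncio

# Hypothetical stand-in for the search endpoint: 120 results, served 50 at a time.
TOTAL_RESULTS = 120
PAGE_LIMIT = 50


async def fetch_page(start):
    """Return a fake response carrying only the fields the pagination logic reads."""
    results = list(range(start, min(start + PAGE_LIMIT, TOTAL_RESULTS)))
    return {
        "results": results,
        "start": start,
        "size": len(results),
        "totalSize": TOTAL_RESULTS,
        "_links": {},  # no "next" link, as observed for the search call
    }


async def paginate():
    """Yield pages by moving the start window until totalSize is reached."""
    start = 0
    while True:
        page = await fetch_page(start)
        yield page
        next_start = page["start"] + page["size"]
        # Stop on an empty page as well, so a bad totalSize cannot loop forever.
        if page["size"] == 0 or next_start >= page["totalSize"]:
            return
        start = next_start


async def collect():
    """Flatten all pages into a single list of results."""
    return [item async for page in paginate() for item in page["results"]]


documents = asyncio.run(collect())
print(len(documents))
```

With only the next-link check, this stub would stop after the first 50 results, which matches the behavior described in this report.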
While debugging this, I also found another issue in the function "search_by_query": it never checks whether "entity_details" exists, so if it is None, the function fails.
I fixed this with an additional condition:
    async def search_by_query(self, query):
        async for entity in self.confluence_client.search_by_query(query=query):
            # entity can be space or content
            entity_details = entity.get(SPACE) or entity.get(CONTENT)
            if not entity_details:
                continue
            if (
                entity_details.get("type", "") == "attachment"
                and entity_details.get("container", {}).get("title") is None
            ):
                continue
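To illustrate why the extra condition matters, here is a small standalone sketch. It assumes that `SPACE` and `CONTENT` hold the keys "space" and "content" (the real constants are defined in the connector and may differ), and both `usable_entities` and the sample entities are made up for the example.

```python
SPACE = "space"      # assumed value of the connector constant
CONTENT = "content"  # assumed value of the connector constant


def usable_entities(entities):
    """Return the details of entities that are safe to process further."""
    kept = []
    for entity in entities:
        entity_details = entity.get(SPACE) or entity.get(CONTENT)
        if not entity_details:
            # Without this check, entity_details.get(...) raises
            # AttributeError because entity_details is None.
            continue
        if (
            entity_details.get("type", "") == "attachment"
            and entity_details.get("container", {}).get("title") is None
        ):
            continue
        kept.append(entity_details)
    return kept


sample = [
    {"content": {"type": "page", "title": "Home"}},
    {"user": {"name": "someone"}},  # neither space nor content
    {"content": {"type": "attachment", "container": {}}},
]
kept = usable_entities(sample)
print(kept)
```

Only the first sample entity survives: the second has neither key and is skipped by the new guard, and the third is an attachment without a container title.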