Describe the bug
Simplest example of the Windows empty-folder problem.
url contains only a folder, sub1 (no other folders or documents).
Scan 1 adds the folder to the folder index.
Deleting the folder from url on Windows does not remove it from the folder index. No other changes are made to the contents of url.
Delete _status.json.
Scan 2: the folder is not removed from the folder index.
The same steps work correctly on a Linux machine: the folder is removed from the folder index.
In both cases the data is on a local drive (not a mapped drive, etc.).
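The filesystem side of these steps can be scripted so that both machines go through exactly the same sequence. This is only a minimal sketch: the helper names `prepare_scan1`, `prepare_scan2` and `demo` are hypothetical, and the job directory holding `_status.json` is passed in explicitly because its location (commonly `~/.fscrawler/<job name>/`) is an assumption about the local setup. Running FSCrawler itself between the two steps is left out.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_scan1(root: Path) -> Path:
    """Make the crawled directory contain only the empty folder sub1."""
    sub1 = root / "sub1"
    sub1.mkdir(parents=True, exist_ok=True)
    return sub1


def prepare_scan2(root: Path, job_dir: Path) -> None:
    """Delete sub1 and the job's _status.json before the second scan."""
    shutil.rmtree(root / "sub1", ignore_errors=True)
    (job_dir / "_status.json").unlink(missing_ok=True)


def demo() -> tuple:
    """Run both preparation steps in a throwaway directory (no crawler involved)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "folderurl14"
        job_dir = Path(tmp) / "job"  # stand-in for the real job directory
        job_dir.mkdir()
        (job_dir / "_status.json").write_text("{}")

        prepare_scan1(root)
        created = (root / "sub1").is_dir()
        # Scan 1 would run here.

        prepare_scan2(root, job_dir)
        removed = not (root / "sub1").exists() and not (job_dir / "_status.json").exists()
        # Scan 2 would run here.
        return created, removed
```

On a real machine, `root` would be the directory configured as url and `job_dir` the directory of the metadata_index_url14 job.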
Job Settings
The only difference between the Windows and Linux job settings is url.
Linux fscrawler.log has this section:
14:51:22,541 TRACE [f.p.e.c.f.c.ElasticsearchClient] Calling POST https://localhost:9200/metadata_index_url14_folder/_search with params [version=true]
14:51:22,572 TRACE [f.p.e.c.f.c.ElasticsearchClient] POST https://localhost:9200/metadata_index_url14_folder/_search gives {"took":6,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":0,"relation":"eq"},"max_score":null,"hits":[]}}
14:51:22,574 DEBUG [f.p.e.c.f.FsParserAbstract] Deleting metadata_index_url14_folder/694ca77db71937d803f96050584060
The Windows fscrawler.log has no such section:
21:58:59,006 DEBUG [f.p.e.c.p.FsCrawlerPluginsManager] Loading plugins
21:58:59,069 INFO [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [1.9gb/2gb=97.51%], RAM [1.9gb/11.9gb=16.55%], Swap [1.7gb/13.7gb=12.73%].
21:58:59,069 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
21:58:59,069 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
21:58:59,069 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [metadata_index_url14]...
21:58:59,475 WARN [f.p.e.c.f.s.Elasticsearch] username is deprecated. Use apiKey instead.
21:58:59,475 WARN [f.p.e.c.f.s.Elasticsearch] password is deprecated. Use apiKey instead.
21:58:59,475 INFO [f.p.e.c.f.c.FsCrawlerCli] attributes_support is set to true but getting group is not available on [windows server 2019].
21:58:59,475 DEBUG [f.p.e.c.p.FsCrawlerPluginsManager] Starting plugins
21:58:59,522 DEBUG [f.p.e.c.p.FsCrawlerPluginsManager] Found FsCrawlerExtensionFsProvider extension for type [http]
21:58:59,522 DEBUG [f.p.e.c.p.FsCrawlerPluginsManager] Found FsCrawlerExtensionFsProvider extension for type [local]
21:58:59,522 DEBUG [f.p.e.c.p.FsCrawlerPluginsManager] Found FsCrawlerExtensionFsProvider extension for type [s3]
21:58:59,537 INFO [f.p.e.c.f.FsCrawlerImpl] attributes_support is set to true but getting group is not available on [windows server 2019].
21:58:59,537 DEBUG [f.p.e.c.f.FsParserAbstract] creating fs crawler thread [metadata_index_url14] for [C://data3/folderurl14] every [10h]
21:58:59,537 DEBUG [f.p.e.c.f.FsParserAbstract] We are running on Windows without Server settings so we use the separator in accordance with fs.url
21:58:59,537 INFO [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
21:58:59,537 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler started in watch mode. It will run unless you stop it with CTRL+C.
21:58:59,803 WARN [f.p.e.c.f.c.ElasticsearchClient] We are not doing SSL verification. It's not recommended for production.
21:58:59,866 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version
21:59:01,444 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version returns 8.17.2 and 8 as the major version number
21:59:01,444 INFO [f.p.e.c.f.c.ElasticsearchClient] Elasticsearch Client connected to a node running version 8.17.2
21:59:01,444 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Semantic search is enabled and we are running on a version of Elasticsearch 8.17.2 which is 8.17 or higher. We will try to use the semantic search features.
21:59:01,444 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get license
21:59:01,459 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get license returns basic
21:59:01,459 WARN [f.p.e.c.f.c.ElasticsearchClient] Semantic search is enabled but we are running Elasticsearch with a basic license although we need either an enterprise or trial license.We will not be able to use the semantic search features ATM. We might switch later to a vector embeddings generation.
21:59:01,475 DEBUG [f.p.e.c.f.s.FsCrawlerManagementServiceElasticsearchImpl] Elasticsearch Management Service started
21:59:01,475 WARN [f.p.e.c.f.c.ElasticsearchClient] We are not doing SSL verification. It's not recommended for production.
21:59:01,475 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version
21:59:01,631 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version returns 8.17.2 and 8 as the major version number
21:59:01,631 INFO [f.p.e.c.f.c.ElasticsearchClient] Elasticsearch Client connected to a node running version 8.17.2
21:59:01,631 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Semantic search is enabled and we are running on a version of Elasticsearch 8.17.2 which is 8.17 or higher. We will try to use the semantic search features.
21:59:01,631 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get license
21:59:01,647 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get license returns basic
21:59:01,647 WARN [f.p.e.c.f.c.ElasticsearchClient] Semantic search is enabled but we are running Elasticsearch with a basic license although we need either an enterprise or trial license.We will not be able to use the semantic search features ATM. We might switch later to a vector embeddings generation.
21:59:01,647 DEBUG [f.p.e.c.f.s.FsCrawlerDocumentServiceElasticsearchImpl] Elasticsearch Document Service started
21:59:01,647 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Creating/updating component templates
21:59:01,647 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_alias]
21:59:01,678 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_settings_shards]
21:59:01,678 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_settings_total_fields]
21:59:01,692 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_mapping_attributes]
21:59:01,700 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_mapping_file]
21:59:01,709 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_mapping_path]
21:59:01,709 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_mapping_attachment]
21:59:01,725 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_mapping_content]
21:59:01,725 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push component template [fscrawler_mapping_meta]
21:59:01,741 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Creating/updating index templates
21:59:01,741 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push index template [fscrawler_docs_metadata_index_url14]
21:59:01,741 DEBUG [f.p.e.c.f.c.ElasticsearchClient] push index template [fscrawler_folders_metadata_index_url14_folder]
21:59:01,757 INFO [f.p.e.c.f.FsParserAbstract] FS crawler started for [metadata_index_url14] for [C://data3/folderurl14] every [10h]
21:59:01,757 DEBUG [f.p.e.c.f.FsParserAbstract] Fs crawler thread [metadata_index_url14] is now running. Run #1...
21:59:01,757 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] computeVirtualPathName(C://data3/folderurl14, C://data3/folderurl14) = /
21:59:01,819 DEBUG [f.p.e.c.f.FsParserAbstract] indexing [C://data3/folderurl14] content
21:59:01,819 DEBUG [f.p.e.c.f.c.f.FileAbstractorFile] Listing local files from C://data3/folderurl14
21:59:01,819 DEBUG [f.p.e.c.f.c.f.FileAbstractorFile] 0 local files found
21:59:01,819 DEBUG [f.p.e.c.f.FsParserAbstract] Looking for removed files in [C://data3/folderurl14]...
21:59:01,851 DEBUG [f.p.e.c.f.FsParserAbstract] Looking for removed directories in [C://data3/folderurl14]...
21:59:01,882 DEBUG [f.p.e.c.f.FsParserAbstract] Updating job metadata after run for [metadata_index_url14]: lastrun [2025-02-11T21:58:59.757186500], indexed [0], deleted [0]
21:59:01,882 INFO [f.p.e.c.f.FsParserAbstract] Closing FS crawler file abstractor [FileAbstractorFile].
21:59:01,882 DEBUG [f.p.e.c.f.FsParserAbstract] Fs crawler is going to sleep for 10h
21:59:06,632 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Going to execute new bulk composed of 1 actions
21:59:06,647 DEBUG [f.p.e.c.f.c.ElasticsearchEngine] Sending a bulk request of [1] documents to the Elasticsearch service
21:59:06,647 DEBUG [f.p.e.c.f.c.ElasticsearchClient] bulk a ndjson of 393 characters
21:59:06,694 DEBUG [f.p.e.c.f.f.b.FsCrawlerSimpleBulkProcessorListener] Executed bulk composed of 1 actions
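A quick way to compare the two logs is to search each one for the DEBUG line that FsParserAbstract writes when it deletes a document. The sketch below is a hedged illustration: the pattern is derived only from the single "Deleting ..." line seen in the Linux log, so other FSCrawler versions may format it differently, and `find_deletions` is a hypothetical helper name.

```python
import re

# Pattern based on the one delete line in the Linux log:
# "... [f.p.e.c.f.FsParserAbstract] Deleting <index>/<document id>"
DELETE_LINE = re.compile(r"\[f\.p\.e\.c\.f\.FsParserAbstract\] Deleting (\S+)/(\S+)")


def find_deletions(log_text: str) -> list:
    """Return (index, document id) pairs for every delete recorded in the log."""
    matches = (DELETE_LINE.search(line) for line in log_text.splitlines())
    return [match.groups() for match in matches if match]
```

Applied to the Linux log this should return one entry for metadata_index_url14_folder; applied to the Windows log above it returns an empty list, because the run goes from "Looking for removed directories" straight to "Updating job metadata" with deleted [0].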
Expected behavior
Indices: metadata_index_url14 (documents) and metadata_index_url14_folder (folders)
url contains a single folder (sub1), no documents
1. Scan 1: folder sub1 is added to the folder index
2. Delete folder sub1 (no other changes to contents of url), then delete _status.json
3. Scan 2: folder sub1 is removed from the folder index
sub1 is removed on Linux but not on Windows
Observed document counts after each scan:
Windows
Scan 1: metadata_index_url14_folder count 2, metadata_index count 0
Delete folder, delete _status.json
Scan 2: metadata_index_url14_folder count 2, metadata_index count 0
Linux
Scan 1: metadata_index_url14_folder count 2, metadata_index count 0
Delete folder, delete _status.json
Scan 2: metadata_index_url14_folder count 1, metadata_index count 0
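The counts above can be collected with Elasticsearch's `_count` API. The following is a minimal sketch, not part of FSCrawler: the helper names are hypothetical, basic authentication with a username and password is an assumption about the test cluster, and certificate verification is turned off only because the logs show a local node used without SSL verification.

```python
import base64
import json
import ssl
import urllib.request


def count_url(base_url: str, index: str) -> str:
    """Build the _count endpoint URL for an index."""
    return f"{base_url.rstrip('/')}/{index}/_count"


def parse_count(body: str) -> int:
    """Extract the document count from a _count response body."""
    return int(json.loads(body)["count"])


def fetch_count(base_url: str, index: str, username: str, password: str) -> int:
    """Query a local test node; verification is disabled, so not for production."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        count_url(base_url, index),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, context=context) as response:
        return parse_count(response.read().decode("utf-8"))


def folder_was_removed(count_before: int, count_after: int) -> bool:
    """Expected outcome: scan 2 leaves exactly one folder document fewer."""
    return count_after == count_before - 1
```

With the numbers reported here, `folder_was_removed(2, 1)` holds for Linux and `folder_was_removed(2, 2)` does not hold for Windows.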
lin_fscrawler.log
win_fscrawler.log
win_settings.txt
lin_settings.txt
Versions:
Elasticsearch 8.17.2
FSCrawler v462