diff --git a/cellsem_agent/graphs/cxg_annotate/README.md b/cellsem_agent/graphs/cxg_annotate/README.md
new file mode 100644
index 0000000..4b3958e
--- /dev/null
+++ b/cellsem_agent/graphs/cxg_annotate/README.md
@@ -0,0 +1,67 @@
+# CxG experiments
+
+We have two experiments related to CxG annotation:
+- **Old one**: `cellsem_agent/graphs/cxg_annotate/cxg_annotate_graph.py`
+  This experiment uses a set of manually downloaded papers related to gut, retina, etc.
+- **New one**: `cellsem_agent/graphs/cxg_annotate/cxg_annotate_graph_v2.py`
+  This is the new experiment. An example input file is here: [cellsem_agent/graphs/cxg_annotate/resources/ac8619d0-4fff-4296-913a-819d1e361ba0_cxg_dataset_unique.tsv](cellsem_agent/graphs/cxg_annotate/resources/ac8619d0-4fff-4296-913a-819d1e361ba0_cxg_dataset_unique.tsv)
+
+## New experiment workflow
+
+`cxg_annotate_graph_v2.py` has a `main` function that runs the whole experiment graph. The experiment graph is as follows:
+
+```mermaid
+---
+title: validation_graph
+---
+stateDiagram-v2
+    PrepareData --> GetFullNames
+    GetFullNames --> GetGroundings
+    GetGroundings --> [*]
+```
+
+- **PrepareData**: Prepares the data for the experiment. It reads the input TSV file, converts the rows into the desired format, and stores them in `ctx.state`, a state shared among the workflow tasks. This step also downloads the required publications.
+- **GetFullNames**: Uses ChatGPT to get the full names of the cell types based on the publication full text.
+- **GetGroundings**: Uses the Annotator Agent (`cellsem_agent/agents/annotator/annotator_agent.py`) to get the ontology groundings for the cell types.
+
+You can run the script directly; `main` executes the graph from `PrepareData` through `GetGroundings`.
+
+An environment file named `.env` is needed at the project root (`cellsem-agent/.env`) with the following variables:
+```
+OPENAI_API_KEY=
+```
+
+### Outputs:
+
+When the pipeline is run, two main outputs are generated:
+- `cellsem_agent/graphs/cxg_annotate/resources/groundings.tsv`: The main annotation results. `grounding_cl_id` and `grounding_cl_label` are found by the agent; `cl_id` and `cl_label` are the truth values from the input file.
+- `cellsem_agent/graphs/cxg_annotate/resources/cell_type_annotations_un_filtered.tsv`: A less important intermediate file that contains all the cell type annotations provided by the agent. The agent uses both the full name and the abbreviation to find groundings, and this file contains every grounding it found, in case you want to optimize the prioritization logic. Currently the full name has the higher priority and is returned as the first grounding to be used.
+
+### Statistics:
+
+A manual step is needed to calculate the metrics after the experiment is run. The metrics script is here: `cellsem_agent/graphs/nlm_annotate/grounding_statistics.py`. Update the script to point to the correct `groundings.tsv` file and run it.
+The script prints something like this:
+
+```
+Truth table: TP=19, FP=13, FN=0, TN=0
+Precision: 0.594
+Recall: 1.000
+F1 score: 0.745
+```
+
+### Running in test mode:
+
+If you set `IS_TEST_MODE=True`, the experiment runs in test mode. In this mode, only a small subset of the data (`TEST_ANNOTATIONS_COUNT=4`) is processed, which allows quick testing and debugging during development.
+
+Set `IS_TEST_MODE=False` to run the full experiment.
+
+### Beware of caching:
+
+The experiment caches intermediate results to avoid redundant computation and expensive ChatGPT calls. If you change the code or the input data, you may need to clear the cache to ensure that the experiment runs with the latest information.
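The caching behaviour described above amounts to a load-or-compute step per batch. The following is a minimal, standard-library-only sketch of that pattern, not the script's exact code: `cached_batch`, `compute`, and `fake_grounding` are hypothetical names, and `compute` stands in for an expensive agent call.

```python
import json
import os
import tempfile


def cached_batch(cache_path, compute):
    """Return the JSON result stored at cache_path, or compute and store it.

    `compute` is a hypothetical zero-argument callable standing in for an
    expensive step such as an LLM call.
    """
    if os.path.exists(cache_path):
        # Cache hit: the stored result is reused as-is, even if the code
        # or the input data changed since it was written.
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    result = compute()
    cache_dir = os.path.dirname(cache_path)
    if cache_dir:
        os.makedirs(cache_dir, exist_ok=True)
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(result, f, indent=2)
    return result


calls = []


def fake_grounding():
    # Records every invocation so cache hits and misses are visible.
    calls.append(1)
    return [{"input_name": "HB", "cl_id": "CL:3000001"}]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache", "groundings_batch_0.json")
    first = cached_batch(path, fake_grounding)   # computes and writes the file
    second = cached_batch(path, fake_grounding)  # served from disk
    os.remove(path)                              # clearing the cache
    third = cached_batch(path, fake_grounding)   # computes again

call_count = len(calls)
```

A cache file is reused for as long as it exists, so under this pattern deleting it is the only way to force a recompute.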
+
+Here are the cache directories used in the experiment:
+- `cellsem_agent/graphs/cxg_annotate/resources/publications`: Publications downloaded in the `PrepareData` step are stored here, with file names like `DOI_10_1038_s41586-018-0698-6.txt`
+- `cellsem_agent/graphs/cxg_annotate/resources/expansions`: Cache of the `GetFullNames` step. Example cache file name: `DOI_10_1038_s41586-018-0698-6_batch_0.json`
+- `cellsem_agent/graphs/cxg_annotate/resources/cache`: Cache of the `GetGroundings` step. Example cache file name: `groundings_batch_0.json`
+
+Delete these folders as needed to clear the cache and run a fresh (but costly) experiment. The folders are created automatically when the script is run.
\ No newline at end of file
diff --git a/cellsem_agent/graphs/cxg_annotate/cxg_annotate_graph_v2.py b/cellsem_agent/graphs/cxg_annotate/cxg_annotate_graph_v2.py
new file mode 100644
index 0000000..3921e5c
--- /dev/null
+++ b/cellsem_agent/graphs/cxg_annotate/cxg_annotate_graph_v2.py
@@ -0,0 +1,318 @@
+import asyncio
+import os.path
+import json
+import pandas as pd
+
+from dotenv import load_dotenv
+from pydantic_graph import BaseNode, End, Graph, GraphRunContext
+
+from cellsem_agent.agents.annotator.annotator_agent import annotator_agent
+from cellsem_agent.agents.paper_celltype.paper_celltype_agent import celltype_agent, CellTypeEntry
+from cellsem_agent.agents.annotator.annotator_agent import TextAnnotation
+from cellsem_agent.utils.pubmed_utils import get_doi_text
+
+from dataclasses import dataclass
+import logfire
+import logging
+
+cxg_annotate_logger = logging.getLogger(__name__)
+cxg_annotate_logger.setLevel(logging.INFO)
+console = logging.StreamHandler()
+console.setLevel(logging.INFO)
+formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
+console.setFormatter(formatter)
+cxg_annotate_logger.addHandler(console)
+
+cxg_annotate_logger.propagate = True
+logfire.configure()
+
+ANNOTATIONS_BATCH_SIZE = 5
+
+IS_TEST_MODE = False
+TEST_ANNOTATIONS_COUNT = 4  # Number of annotations to process in test mode
+
+CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))
+RESOURCES_DIR = os.path.join(CURRENT_DIR, "resources")
+PUBLICATIONS_DIR = os.path.join(RESOURCES_DIR, "publications")
+EXPANSIONS_DIR = os.path.join(RESOURCES_DIR, "expansions")
+
+
+@dataclass
+class Dataset:
+    name: str
+    publication_file_name: str
+    supplementary_file_name: str
+    data_file_name: str
+
+
+@dataclass
+class State:
+    articles: set[str]
+    annotations: list[dict]
+    article_to_annotations: dict[str, list[dict]]
+    paper_expansion: dict[str, CellTypeEntry]
+    is_test_mode: bool = IS_TEST_MODE
+
+
+@dataclass
+class GetGroundings(BaseNode[State, None, str]):
+
+    async def run(self, ctx: GraphRunContext[State]) -> End:
+        annotations = ctx.state.annotations
+        cxg_annotate_logger.info(f"Total annotations to process: {len(annotations)}")
+
+        for annotation in annotations:
+            if 'enrichment' not in annotation:
+                annotation['enrichment'] = CellTypeEntry(
+                    name=annotation['annotation_text'],
+                    full_name="",
+                    paper_synonyms="",
+                    tissue_context=""
+                )
+                print(f"Warning: No enrichment found for annotation '{annotation['annotation_text']}', using blank entry.")
+            # delete tissue_context of all enrichments
+            annotation['enrichment'].tissue_context = ""
+
+        # Sort annotations by article_id_doi, then annotation_text
+        annotations.sort(key=lambda annot: (annot.get('article_id_doi') or "", annot.get('annotation_text') or ""))
+
+        cache_dir = os.path.join(RESOURCES_DIR, "cache")
+        os.makedirs(cache_dir, exist_ok=True)
+
+        batch_size = 4
+        all_groundings = []
+        for i in range(0, len(annotations), batch_size):
+            batch_index = i // batch_size
+            batch = annotations[i:i + batch_size]
+            batch_cache_path = os.path.join(cache_dir, f"groundings_batch_{batch_index}.json")
+
+            if os.path.exists(batch_cache_path):
+                print(f"Loading cached results for batch {batch_index}")
+                with open(batch_cache_path, "r") as f:
+                    batch_groundings = [TextAnnotation(**entry) for entry in json.load(f)]
+            else:
+                print("Processing batch: ", i // batch_size + 1, " of ",
+                      (len(annotations) + batch_size - 1) // batch_size)
+                expansions_json = json.dumps([annotation['enrichment'].model_dump() for annotation in batch], indent=2)
+                agent_response = await annotator_agent.run(expansions_json)
+                batch_groundings = agent_response.output.annotations
+                with open(batch_cache_path, "w") as f:
+                    json.dump([entry.model_dump() for entry in batch_groundings], f, indent=2)
+
+            all_groundings.extend(batch_groundings)
+            # update batch annotations with grounding results
+            for annotation in batch:
+                # convert enrichment to a dict to make the dataframe more readable
+                annotation['enrichment'] = annotation['enrichment'].model_dump()
+                if "grounding_cl_id" not in annotation:
+                    related_groundings = [gr for gr in batch_groundings if
+                                          gr.input_name == annotation['annotation_text']]
+                    if related_groundings:
+                        valid_grounding = next(
+                            (g for g in related_groundings if "NO MATCH" not in g.cl_id), None)
+                        if valid_grounding:
+                            grounding_to_use = valid_grounding
+                        else:
+                            grounding_to_use = related_groundings[0]
+                        annotation['grounding_cl_id'] = grounding_to_use.cl_id
+                        annotation['grounding_cl_label'] = grounding_to_use.cl_label
+                    else:
+                        annotation['grounding_cl_id'] = ""
+                        annotation['grounding_cl_label'] = ""
+
+
+        data = [entry.model_dump() for entry in all_groundings]
+        df = pd.DataFrame(data)
+        df.to_csv(os.path.join(RESOURCES_DIR, "cell_type_annotations_un_filtered.tsv"), sep='\t', index=False)
+
+        # write annotations that have groundings as tsv (annotation_text, cl_id, grounding_cl_id, grounding_cl_label, article_id_doi)
+        df = pd.DataFrame(annotations)
+        df_filtered = df[df['grounding_cl_id'].notna()].copy()
+        df_filtered['result'] = df_filtered['cl_id'].eq(df_filtered['grounding_cl_id']).map(
+            {True: 'TRUE', False: 'FALSE'})
+        df_filtered.to_csv(os.path.join(RESOURCES_DIR, "groundings.tsv"), sep='\t', index=False)
+
+        return End("Report generated and saved to individual dataset folders.")
+
+
+@dataclass
+class GetFullNames(BaseNode[State, None, str]):
+
+    async def run(self, ctx: GraphRunContext[State]) -> GetGroundings:
+        print("Running GetFullNames node")
+        if not os.path.exists(EXPANSIONS_DIR):
+            os.makedirs(EXPANSIONS_DIR)
+        article_to_annotations = ctx.state.article_to_annotations
+        articles = sorted(str(a) if a is not None else "" for a in set(article_to_annotations.keys()))
+        index = 1
+        for article_pmc in articles:
+            print(f"Processing article: {article_pmc} - {index}/{len(articles)}")
+            index += 1
+            # get all annotations for this article
+            article_annotations = article_to_annotations[article_pmc]
+
+            for batch_index in range(0, len(article_annotations), ANNOTATIONS_BATCH_SIZE):
+                batch = article_annotations[batch_index:batch_index + ANNOTATIONS_BATCH_SIZE]
+                dataset_cache = os.path.join(EXPANSIONS_DIR,
+                                             f"{normalise_file_name(article_pmc)}_batch_{batch_index // ANNOTATIONS_BATCH_SIZE}.json")
+                cc_labels = [{"cc.label": ann['annotation_text']} for ann in batch]
+
+                if not os.path.exists(dataset_cache):
+                    full_text_path = os.path.join(PUBLICATIONS_DIR, f"{normalise_file_name(article_pmc)}.txt")
+                    if os.path.exists(full_text_path):
+                        with open(full_text_path, 'r', encoding='utf-8') as f:
+                            paper_full_text = f.read()
+
+                        prompt_instructions = f"""
+                    You are tasked with extracting cell type information from the provided academic paper content,
+                    and the provided JSON data.
+
+                    The JSON contains cell type annotations (cc.label column) from single-cell transcriptomic data.
+
+                    Based on the following JSON data and academic paper content, generate a list of structured
+                    cell type entries. Each entry must follow the `CellTypeEntry` schema.
+
+                    --- JSON List Input Data:
+                    {json.dumps(cc_labels, indent=2)}
+
+                    --- Academic Paper Content (extracted from PDF):
+                    {paper_full_text}
+
+                    --- COLUMN DEFINITIONS AND LOGIC:
+                    - `name`: The exact `cc.label` from the input JSON.
+                    - `full_name`: Use the following logic:
+                        1. If the full label (e.g., "SI_TA") is defined directly in the paper, use the exact definition.
+                        2. If not, check if individual parts (e.g., prefixes, suffixes) are defined and reconstruct/assemble the `full_name` from the parts found (e.g., for "SI_TA", assemble "small intestine transit amplifying cell" if paper defines "SI" as "small intestine" and "TA" as "transit amplifying cell").
+                        3. If the label begins with a defined prefix abbreviation (e.g., "RGC"), expand the prefix and append the remaining label (e.g., "RGC10" becomes "retinal ganglion cell 10").
+                        4. If only one part is defined, use just that part.
+                        5. If no parts are defined, leave this field blank.
+                    - `paper_synonyms`: Use only synonyms mentioned in the paper using:
+                        - Abbreviation lists
+                        - Abbreviation definitions (e.g., "follicle-associated epithelium (FAE)")
+                        - Patterns like “also known as”, “termed”, “referred to as”
+                        - Include all found; separate with semicolons (;)
+                    - `tissue_context`: Exact quoted tissue(s) or anatomical terms from the paper where the cell type was identified.
+
+                    Process all `cc.label` entries from the JSON data automatically.
+                    Do not ask for confirmation.
+                    Provide the output as a JSON array of `CellTypeEntry` objects.
+                    """
+                        agent_response = await celltype_agent.run(prompt_instructions)
+
+                        for entry in agent_response.output.cell_type_annotations:
+                            print(
+                                f"Name: {entry.name}, Full Name: {entry.full_name}, Synonyms: {entry.paper_synonyms}, Tissue Context: {entry.tissue_context}")
+                            # add entry to the related article_annotations
+                            for ann in article_annotations:
+                                if ann['annotation_text'] == entry.name:
+                                    ann['enrichment'] = entry
+                                    break
+
+                        # ctx.state.paper_expansion[article_pmc] = agent_response.output.cell_type_annotations
+                        expansions = agent_response.output.cell_type_annotations
+                        print(f"Saving results to cache for article: {article_pmc}")
+                        with open(dataset_cache, 'w') as cache_file:
+                            json.dump(
+                                [entry.model_dump() for entry in expansions],
+                                cache_file, indent=2)
+                    else:
+                        print(f"Error: Full text file not found for article for name expansion: {article_pmc}")
+                else:
+                    print(f"Using cached data for article: {article_pmc}")
+                    with open(dataset_cache, 'r') as cache_file:
+                        cached_data = json.load(cache_file)
+                        for cached_entry in cached_data:
+                            for ann in article_annotations:
+                                if ann['annotation_text'] == cached_entry["name"]:
+                                    ann['enrichment'] = CellTypeEntry(**cached_entry)
+                                    print("Using cached enrichment data for annotation:", ann['annotation_text'])
+                                    break
+                    # ctx.state.paper_expansion[article_pmc] = [CellTypeEntry(**entry) for entry in cached_data]
+        return GetGroundings()
+
+@dataclass
+class PrepareData(BaseNode[State, None, str]):
+
+    async def run(self, ctx: GraphRunContext[State]) -> GetFullNames:
+        print("Running PrepareData node")
+        annotations, article_to_annotations = load_cxg_annotations()
+
+        if ctx.state.is_test_mode:
+            # only process a few annotations in test mode
+            annotations = list(annotations)[:TEST_ANNOTATIONS_COUNT]
+            # filter article_to_annotations to only include those in annotations
+            article_to_annotations = {k: v for k, v in article_to_annotations.items() if k in
+                                      {ann['article_id_doi'] for ann in annotations}}
+
+        unique_dois = set(article_to_annotations.keys())
+        print(f"Unique DOIs to download: {len(unique_dois)}")
+        articles = download_publication_texts(unique_dois)
+        print(f"Downloaded articles: {len(articles)}")
+
+        ctx.state.articles = articles
+        ctx.state.annotations = annotations
+        ctx.state.article_to_annotations = article_to_annotations
+
+        return GetFullNames()
+
+def load_cxg_annotations():
+    tsv_path = os.path.join(RESOURCES_DIR, "ac8619d0-4fff-4296-913a-819d1e361ba0_cxg_dataset_unique.tsv")
+    df = pd.read_csv(tsv_path, sep='\t')
+
+    annotations = []
+    article_to_annotations = {}
+
+    for _, row in df.iterrows():
+        paper_doi = str(row['reference']).replace("https://doi.org/", "DOI:")
+        annotation = {
+            'annotation_text': row['author_cell_type'],
+            'cl_id': row['CL_ID'],
+            'cl_label': row['CL_label'],
+            'article_id_doi': paper_doi
+        }
+        annotations.append(annotation)
+        article_to_annotations.setdefault(paper_doi, []).append(annotation)
+
+    return annotations, article_to_annotations
+
+def download_publication_texts(dois, publications_dir=PUBLICATIONS_DIR):
+    """
+    Download full text for each DOI using get_doi_text and save to publications_dir/<normalised DOI>.txt.
+    Skips download if file already exists. Creates publications_dir if needed.
+    Args:
+        dois (Iterable[str]): Set or list of DOIs.
+        publications_dir (str): Directory to save text files.
+    """
+    if not os.path.exists(publications_dir):
+        os.makedirs(publications_dir)
+    articles = set()
+    for doi in dois:
+        if doi:
+            file_path = os.path.join(publications_dir, f"{normalise_file_name(doi)}.txt")
+            if os.path.exists(file_path):
+                articles.add(doi)
+                continue
+            text = get_doi_text(doi)
+            if text:
+                with open(file_path, "w", encoding="utf-8") as f:
+                    f.write(text)
+                articles.add(doi)
+            else:
+                print(f"Error: No full-text found for ID {doi}")
+    return articles
+
+def normalise_file_name(doi: str) -> str:
+    return doi.replace("/", "_").replace(":", "_").replace(".", "_")
+
+async def main():
+    state = State(set(), list(), dict(), dict(), is_test_mode=IS_TEST_MODE)
+    validation_graph = Graph(nodes=(PrepareData, GetFullNames, GetGroundings))
+    result = await validation_graph.run(PrepareData(), state=state)
+    print(result.output)
+    # print(validation_graph.mermaid_code())
+
+
+if __name__ == "__main__":
+    load_dotenv()
+    print("OPENAI_API_KEY set:", bool(os.environ.get("OPENAI_API_KEY")))  # report presence only, never the secret itself
+    asyncio.run(main())
\ No newline at end of file
diff --git a/cellsem_agent/graphs/cxg_annotate/resources/ac8619d0-4fff-4296-913a-819d1e361ba0_cxg_dataset_unique.tsv b/cellsem_agent/graphs/cxg_annotate/resources/ac8619d0-4fff-4296-913a-819d1e361ba0_cxg_dataset_unique.tsv
new file mode 100644
index 0000000..1fe93cd
--- /dev/null
+++ b/cellsem_agent/graphs/cxg_annotate/resources/ac8619d0-4fff-4296-913a-819d1e361ba0_cxg_dataset_unique.tsv
@@ -0,0 +1,33 @@
+author_cell_type CL_label CL_ID reference dataset_version
+dS1 decidual cell CL:2000002 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad
+VCT placental villous trophoblast CL:2000060 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad
+dNK2 decidual natural killer cell, human CL:0002343 https://doi.org/10.1038/s41586-018-0698-6 
https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +dS2 decidual cell CL:2000002 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +Tcells T cell CL:0000084 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +dNK1 decidual natural killer cell, human CL:0002343 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +dM2 macrophage CL:0000235 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +dM1 macrophage CL:0000235 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +dNK3 decidual natural killer cell, human CL:0002343 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +fFB1 fibroblast CL:0000057 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +EVT extravillous trophoblast CL:0008036 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +NK CD16+ CD16-positive, CD56-dim natural killer cell, human CL:0000939 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +HB Hofbauer cell CL:3000001 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +SCT syncytiotrophoblast cell CL:0000525 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +dS3 decidual cell CL:2000002 https://doi.org/10.1038/s41586-018-0698-6 
https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +dM3 macrophage CL:0000235 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +MO monocyte CL:0000576 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +dNK p decidual natural killer cell, human CL:0002343 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +Plasma plasma cell CL:0000786 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +DC1 conventional dendritic cell CL:0000990 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +DC2 conventional dendritic cell CL:0000990 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +Granulocytes granulocyte CL:0000094 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +NK CD16- CD16-negative, CD56-bright natural killer cell, human CL:0000938 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +ILC3 innate lymphoid cell CL:0001065 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +fFB2 fibroblast CL:0000057 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +Endo (m) endothelial cell CL:0000115 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +Endo L endothelial cell CL:0000115 https://doi.org/10.1038/s41586-018-0698-6 
https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +Endo (f) endothelial cell CL:0000115 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +dP1 pericyte CL:0000669 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +dP2 pericyte CL:0000669 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +Epi1 glandular secretory epithelial cell CL:0000150 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad +Epi2 glandular secretory epithelial cell CL:0000150 https://doi.org/10.1038/s41586-018-0698-6 https://datasets.cellxgene.cziscience.com/ac8619d0-4fff-4296-913a-819d1e361ba0.h5ad diff --git a/cellsem_agent/graphs/cxg_annotate/resources/ac8619d0_cell_type_annotations_un_filtered.tsv b/cellsem_agent/graphs/cxg_annotate/resources/ac8619d0_cell_type_annotations_un_filtered.tsv new file mode 100644 index 0000000..b73749e --- /dev/null +++ b/cellsem_agent/graphs/cxg_annotate/resources/ac8619d0_cell_type_annotations_un_filtered.tsv @@ -0,0 +1,63 @@ +input_name text cl_id cl_label +DC1 conventional dendritic cell 1 CL:0000990 conventional dendritic cell +DC1 DC1 CL:0000990 conventional dendritic cell +DC2 dendritic cell 2 CL:0000990 conventional dendritic cell +DC2 DC2 CL:0000990 conventional dendritic cell +EVT extravillous trophoblast CL:0008036 extravillous trophoblast +EVT EVT CL:0008036 extravillous trophoblast +Endo (f) Fetal endothelial cell CL:0000115 endothelial cell +Endo (f) Endo (f) CL:0000115 endothelial cell +Endo (m) Maternal endothelial cell CL:0000115 endothelial cell +Endo (m) Endo (m) CL:0000115 endothelial cell +Endo L Lymphatic endothelial cell CL:0002138 endothelial cell of lymphatic vessel +Endo L Endo L CL:0002138 endothelial cell of 
lymphatic vessel +Epi1 epithelial glandular cell 1 CL:0000150 glandular secretory epithelial cell +Epi1 Epi1 CL:0000150 glandular secretory epithelial cell +Epi2 epithelial glandular cell 2 CL:0000150 glandular secretory epithelial cell +Epi2 Epi2 CL:0000150 glandular secretory epithelial cell +Granulocytes granulocyte CL:0000094 granulocyte +Granulocytes granulocyte CL:0000094 granulocyte +HB Hofbauer cell CL:3000001 Hofbauer cell +HB HB CL:3000001 Hofbauer cell +ILC3 innate lymphocyte cell 3 CL:0001071 group 3 innate lymphoid cell +ILC3 ILC3 CL:0001071 group 3 innate lymphoid cell +MO monocyte CL:0000576 monocyte +MO MO CL:0000576 monocyte +NK CD16+ natural killer cell CD16+ CL:0000939 CD16-positive, CD56-dim natural killer cell, human +NK CD16+ NK CD16+ CL:0000939 CD16-positive, CD56-dim natural killer cell, human +NK CD16- natural killer cell CD16- CL:0000938 CD16-negative, CD56-bright natural killer cell, human +NK CD16- NK CD16- CL:0000938 CD16-negative, CD56-bright natural killer cell, human +Plasma plasma cell CL:0000786 plasma cell +Plasma Plasma CL:0000786 plasma cell +SCT syncytiotrophoblast CL:0000525 syncytiotrophoblast cell +SCT SCT CL:0000525 syncytiotrophoblast cell +Tcells T cell CL:0000084 T cell +Tcells T cell CL:0000084 T cell +VCT villous cytotrophoblast CL:0000523 mononuclear cytotrophoblast cell +VCT VCT CL:0000523 mononuclear cytotrophoblast cell +dM1 decidual macrophage 1 CL:4033088 decidual resident macrophage +dM1 dM1 CL:4033088 decidual resident macrophage +dM2 decidual macrophage 2 CL:4033088 decidual resident macrophage +dM2 dM2 CL:4033088 decidual resident macrophage +dM3 decidual macrophage 3 CL:4033088 decidual resident macrophage +dM3 dM3 CL:4033088 decidual resident macrophage +dNK p proliferating decidual natural killer cell CL:4052028 uterine natural killer cell +dNK p dNK p CL:4052028 uterine natural killer cell +dNK1 decidual natural killer cell 1 CL:4052051 uterine natural killer cell 1, human +dNK1 dNK1 CL:4052051 uterine 
natural killer cell 1, human +dNK2 decidual natural killer cell 2 CL:4052052 uterine natural killer cell 2, human +dNK2 dNK2 CL:4052052 uterine natural killer cell 2, human +dNK3 decidual natural killer cell 3 CL:4052053 uterine natural killer cell 3, human +dNK3 dNK3 CL:4052053 uterine natural killer cell 3, human +dP1 dP1 NO MATCH found +dP2 dP2 NO MATCH found +dS1 decidual stromal cell 1 CL:2000002 decidual cell +dS1 dS1 CL:2000002 decidual cell +dS2 decidual stromal cell 2 CL:2000002 decidual cell +dS2 dS2 NO MATCH found +dS3 decidual stromal cell 3 CL:2000002 decidual cell +dS3 dS3 NO MATCH found +fFB1 fetal fibroblast 1 CL:0000057 fibroblast +fFB1 fFB1 NO MATCH found +fFB2 fetal NO MATCH found +fFB2 fFB2 NO MATCH found diff --git a/cellsem_agent/graphs/cxg_annotate/resources/ac8619d0_groundings.tsv b/cellsem_agent/graphs/cxg_annotate/resources/ac8619d0_groundings.tsv new file mode 100644 index 0000000..24b0eaf --- /dev/null +++ b/cellsem_agent/graphs/cxg_annotate/resources/ac8619d0_groundings.tsv @@ -0,0 +1,33 @@ +annotation_text cl_id cl_label article_id_doi enrichment grounding_cl_id grounding_cl_label result +DC1 CL:0000990 conventional dendritic cell DOI:10.1038/s41586-018-0698-6 {'name': 'DC1', 'full_name': 'conventional dendritic cell 1', 'paper_synonyms': 'conventional DC1', 'tissue_context': ''} CL:0000990 conventional dendritic cell TRUE +DC2 CL:0000990 conventional dendritic cell DOI:10.1038/s41586-018-0698-6 {'name': 'DC2', 'full_name': 'dendritic cell 2', 'paper_synonyms': 'dendritic cells; DC', 'tissue_context': ''} CL:0000990 conventional dendritic cell TRUE +EVT CL:0008036 extravillous trophoblast DOI:10.1038/s41586-018-0698-6 {'name': 'EVT', 'full_name': 'extravillous trophoblast', 'paper_synonyms': 'extravillous trophoblast cells', 'tissue_context': ''} CL:0008036 extravillous trophoblast TRUE +Endo (f) CL:0000115 endothelial cell DOI:10.1038/s41586-018-0698-6 {'name': 'Endo (f)', 'full_name': 'Fetal endothelial cells', 'paper_synonyms': 
'Endo', 'tissue_context': ''} CL:0000115 endothelial cell TRUE +Endo (m) CL:0000115 endothelial cell DOI:10.1038/s41586-018-0698-6 {'name': 'Endo (m)', 'full_name': 'Maternal endothelial cells', 'paper_synonyms': 'Endo', 'tissue_context': ''} CL:0000115 endothelial cell TRUE +Endo L CL:0000115 endothelial cell DOI:10.1038/s41586-018-0698-6 {'name': 'Endo L', 'full_name': 'Lymphatic endothelial cells', 'paper_synonyms': 'Endo; Endo l', 'tissue_context': ''} CL:0002138 endothelial cell of lymphatic vessel FALSE +Epi1 CL:0000150 glandular secretory epithelial cell DOI:10.1038/s41586-018-0698-6 {'name': 'Epi1', 'full_name': 'epithelial glandular cell 1', 'paper_synonyms': 'Epi; epithelial glandular cells', 'tissue_context': ''} CL:0000150 glandular secretory epithelial cell TRUE +Epi2 CL:0000150 glandular secretory epithelial cell DOI:10.1038/s41586-018-0698-6 {'name': 'Epi2', 'full_name': 'epithelial glandular cell 2', 'paper_synonyms': 'Epi; epithelial glandular cells', 'tissue_context': ''} CL:0000150 glandular secretory epithelial cell TRUE +Granulocytes CL:0000094 granulocyte DOI:10.1038/s41586-018-0698-6 {'name': 'Granulocytes', 'full_name': 'granulocytes', 'paper_synonyms': 'Granulo', 'tissue_context': ''} CL:0000094 granulocyte TRUE +HB CL:3000001 Hofbauer cell DOI:10.1038/s41586-018-0698-6 {'name': 'HB', 'full_name': 'Hofbauer cells', 'paper_synonyms': None, 'tissue_context': ''} CL:3000001 Hofbauer cell TRUE +ILC3 CL:0001065 innate lymphoid cell DOI:10.1038/s41586-018-0698-6 {'name': 'ILC3', 'full_name': 'innate lymphocyte cell 3', 'paper_synonyms': 'innate lymphocyte cells; ILC', 'tissue_context': ''} CL:0001071 group 3 innate lymphoid cell FALSE +MO CL:0000576 monocyte DOI:10.1038/s41586-018-0698-6 {'name': 'MO', 'full_name': 'monocytes', 'paper_synonyms': None, 'tissue_context': ''} CL:0000576 monocyte TRUE +NK CD16+ CL:0000939 CD16-positive, CD56-dim natural killer cell, human DOI:10.1038/s41586-018-0698-6 {'name': 'NK CD16+', 'full_name': 'natural killer 
cell CD16+', 'paper_synonyms': 'natural killer cells', 'tissue_context': ''} CL:0000939 CD16-positive, CD56-dim natural killer cell, human TRUE +NK CD16- CL:0000938 CD16-negative, CD56-bright natural killer cell, human DOI:10.1038/s41586-018-0698-6 {'name': 'NK CD16-', 'full_name': 'natural killer cell CD16-', 'paper_synonyms': 'natural killer cells; NK', 'tissue_context': ''} CL:0000938 CD16-negative, CD56-bright natural killer cell, human TRUE +Plasma CL:0000786 plasma cell DOI:10.1038/s41586-018-0698-6 {'name': 'Plasma', 'full_name': 'plasma cells', 'paper_synonyms': None, 'tissue_context': ''} CL:0000786 plasma cell TRUE +SCT CL:0000525 syncytiotrophoblast cell DOI:10.1038/s41586-018-0698-6 {'name': 'SCT', 'full_name': 'syncytiotrophoblast', 'paper_synonyms': None, 'tissue_context': ''} CL:0000525 syncytiotrophoblast cell TRUE +Tcells CL:0000084 T cell DOI:10.1038/s41586-018-0698-6 {'name': 'Tcells', 'full_name': 'T cells', 'paper_synonyms': None, 'tissue_context': ''} CL:0000084 T cell TRUE +VCT CL:2000060 placental villous trophoblast DOI:10.1038/s41586-018-0698-6 {'name': 'VCT', 'full_name': 'villous cytotrophoblast', 'paper_synonyms': None, 'tissue_context': ''} CL:0000523 mononuclear cytotrophoblast cell FALSE +dM1 CL:0000235 macrophage DOI:10.1038/s41586-018-0698-6 {'name': 'dM1', 'full_name': 'decidual macrophage 1', 'paper_synonyms': 'dM; decidual macrophages', 'tissue_context': ''} CL:4033088 decidual resident macrophage FALSE +dM2 CL:0000235 macrophage DOI:10.1038/s41586-018-0698-6 {'name': 'dM2', 'full_name': 'decidual macrophage 2', 'paper_synonyms': 'dM; decidual macrophages', 'tissue_context': ''} CL:4033088 decidual resident macrophage FALSE +dM3 CL:0000235 macrophage DOI:10.1038/s41586-018-0698-6 {'name': 'dM3', 'full_name': 'decidual macrophage 3', 'paper_synonyms': 'dM; decidual macrophages', 'tissue_context': ''} CL:4033088 decidual resident macrophage FALSE +dNK p CL:0002343 decidual natural killer cell, human DOI:10.1038/s41586-018-0698-6 
{'name': 'dNK p', 'full_name': 'proliferating decidual natural killer cell', 'paper_synonyms': 'dNKp; proliferating dNK cells; NKp; proliferating natural killer cells', 'tissue_context': ''} CL:4052028 uterine natural killer cell FALSE +dNK1 CL:0002343 decidual natural killer cell, human DOI:10.1038/s41586-018-0698-6 {'name': 'dNK1', 'full_name': 'decidual natural killer cell 1', 'paper_synonyms': 'decidual natural killer (dNK) cells; dNK cells; dNK', 'tissue_context': ''} CL:4052051 uterine natural killer cell 1, human FALSE +dNK2 CL:0002343 decidual natural killer cell, human DOI:10.1038/s41586-018-0698-6 {'name': 'dNK2', 'full_name': 'decidual natural killer cell 2', 'paper_synonyms': None, 'tissue_context': ''} CL:4052052 uterine natural killer cell 2, human FALSE +dNK3 CL:0002343 decidual natural killer cell, human DOI:10.1038/s41586-018-0698-6 {'name': 'dNK3', 'full_name': 'decidual natural killer cell 3', 'paper_synonyms': 'decidual natural killer (dNK) cells; dNK cells; dNK', 'tissue_context': ''} CL:4052053 uterine natural killer cell 3, human FALSE +dP1 CL:0000669 pericyte DOI:10.1038/s41586-018-0698-6 {'name': 'dP1', 'full_name': None, 'paper_synonyms': None, 'tissue_context': ''} NO MATCH found FALSE +dP2 CL:0000669 pericyte DOI:10.1038/s41586-018-0698-6 {'name': 'dP2', 'full_name': None, 'paper_synonyms': None, 'tissue_context': ''} NO MATCH found FALSE +dS1 CL:2000002 decidual cell DOI:10.1038/s41586-018-0698-6 {'name': 'dS1', 'full_name': 'decidual stromal cell 1', 'paper_synonyms': None, 'tissue_context': ''} CL:2000002 decidual cell TRUE +dS2 CL:2000002 decidual cell DOI:10.1038/s41586-018-0698-6 {'name': 'dS2', 'full_name': 'decidual stromal cell 2', 'paper_synonyms': None, 'tissue_context': ''} CL:2000002 decidual cell TRUE +dS3 CL:2000002 decidual cell DOI:10.1038/s41586-018-0698-6 {'name': 'dS3', 'full_name': 'decidual stromal cell 3', 'paper_synonyms': 'decidual stromal cells', 'tissue_context': ''} CL:2000002 decidual cell TRUE +fFB1 
CL:0000057 fibroblast DOI:10.1038/s41586-018-0698-6 {'name': 'fFB1', 'full_name': 'fetal fibroblast 1', 'paper_synonyms': 'fibroblasts; F', 'tissue_context': ''} CL:0000057 fibroblast TRUE +fFB2 CL:0000057 fibroblast DOI:10.1038/s41586-018-0698-6 {'name': 'fFB2', 'full_name': 'fetal', 'paper_synonyms': None, 'tissue_context': ''} NO MATCH found FALSE diff --git a/cellsem_agent/graphs/cxg_annotate/resources/f9941c87-2741-4f2b-b158-12c19d2ed50e_cxg_dataset_unique.tsv b/cellsem_agent/graphs/cxg_annotate/resources/f9941c87-2741-4f2b-b158-12c19d2ed50e_cxg_dataset_unique.tsv new file mode 100644 index 0000000..582e616 --- /dev/null +++ b/cellsem_agent/graphs/cxg_annotate/resources/f9941c87-2741-4f2b-b158-12c19d2ed50e_cxg_dataset_unique.tsv @@ -0,0 +1,29 @@ +author_cell_type CL_label CL_ID reference dataset_version +Mid stalk epithelial cell of lung CL:0000082 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Mid tip epithelial cell of lung CL:0000082 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Late stalk epithelial cell of lung CL:0000082 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +AT2 pulmonary alveolar type 2 cell CL:0002063 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Late tip epithelial cell of lung CL:0000082 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Mid airway progenitor respiratory tract epithelial cell CL:0002368 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Pulmonary NE precursor neuroendocrine cell CL:0000165 https://doi.org/10.1016/j.cell.2022.11.005 
https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +GHRL+ NE precursor neuroendocrine cell CL:0000165 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Mid basal basal cell CL:0000646 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Late basal basal cell CL:0000646 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Pulmonary neuroendocrine neuroendocrine cell CL:0000165 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +MUC5AC+ ASCL1+ progenitor epithelial cell of lung CL:0000082 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Early tip epithelial cell of lung CL:0000082 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Late airway progenitor respiratory tract epithelial cell CL:0002368 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Early stalk epithelial cell of lung CL:0000082 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Early airway progenitor respiratory tract epithelial cell CL:0002368 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Club club cell CL:0000158 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Squamous epithelial cell of lung CL:0000082 https://doi.org/10.1016/j.cell.2022.11.005 
https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +AT1 pulmonary alveolar type 1 cell CL:0002062 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Secretory 2 lung secretory cell CL:1000272 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Secretory 3 lung secretory cell CL:1000272 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Secretory 1 lung secretory cell CL:1000272 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +GHRL+ neuroendocrine neuroendocrine cell CL:0000165 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +SMG epithelial cell of lung CL:0000082 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +NEUROD1+ pulmonary neuroendocrine pulmonary neuroendocrine cell CL:1000223 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Proximal basal basal cell CL:0000646 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +Secretory progenitors lung secretory cell CL:1000272 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad +SMG basal basal cell CL:0000646 https://doi.org/10.1016/j.cell.2022.11.005 https://datasets.cellxgene.cziscience.com/f9941c87-2741-4f2b-b158-12c19d2ed50e.h5ad diff --git a/cellsem_agent/graphs/cxg_annotate/resources/publications/DOI_10_1038_s41586-018-0698-6.txt 
b/cellsem_agent/graphs/cxg_annotate/resources/publications/DOI_10_1038_s41586-018-0698-6.txt new file mode 100644 index 0000000..e7fbedf --- /dev/null +++ b/cellsem_agent/graphs/cxg_annotate/resources/publications/DOI_10_1038_s41586-018-0698-6.txt @@ -0,0 +1,193 @@ +Single-cell reconstruction of the early maternal-fetal interface in humans +During early human pregnancy the uterine mucosa transforms into the decidua, into which the fetal placenta implants and where placental trophoblast cells intermingle and communicate with maternal cells. Trophoblast-decidual interactions underlie common diseases of pregnancy, including pre-eclampsia and stillbirth. Here we profile the transcriptomes of about 70,000 single cells from first-trimester placentas with matched maternal blood and decidual cells. The cellular composition of human decidua reveals subsets of perivascular and stromal cells that are located in distinct decidual layers. There are three major subsets of decidual natural killer cells that have distinctive immunomodulatory and chemokine profiles. We develop a repository of ligand-receptor complexes and a statistical tool to predict the cell-type specificity of cell-cell communication via these molecular interactions. Our data identify many regulatory interactions that prevent harmful innate or adaptive immune responses in this environment. Our single-cell atlas of the maternal-fetal interface reveals the cellular organization of the decidua and placenta, and the interactions that are critical for placentation and reproductive success. +During early pregnancy, the uterine mucosal lining (the endometrium) is transformed into the decidua under the influence of progesterone. Decidualization results from a complex and well-orchestrated differentiation program that involves all cellular elements of the mucosa: stromal, glandular and immune cells, the last of which include the distinctive decidual natural killer (dNK) cells. 
The blastocyst implants into the decidua, and initially, before arterial connections are established, uterine glands are the source of histotrophic nutrition in the placenta. After implantation, placental extravillous trophoblast cells (EVT) invade through the decidua and move towards the spiral arteries, where they destroy the smooth muscle media and transform the arteries into high conductance vessels. Balanced regulation of EVT invasion is critical to pregnancy success: to ensure correct allocation of resources to mother and baby, arteries must be sufficiently transformed but excessive invasion must be prevented. The pivotal regulatory role of the decidua is obvious from the life-threatening, uncontrolled trophoblast invasion that occurs when the decidua is absent, as when the placenta implants on a previous Caesarean section scar. +EVT have a unique human leukocyte antigen (HLA) profile: they do not express the dominant T cell ligands, class I HLA-A and HLA-B, or class II molecules but do express HLA-G and HLA-E and polymorphic HLA-C class I molecules. These trophoblast HLA ligands have receptors that are expressed by the dominant decidual immune cells (that is, dNKs), including maternal killer immunoglobulin-like receptors (KIRs) some of which bind to HLA-C molecules. Certain combinations of maternal KIRs and fetal HLA-C genetic variants are associated with pregnancy disorders such as pre-eclampsia, in which trophoblast invasion is deficient. However, detailed understanding of the cellular interactions in the decidua that support early pregnancy is lacking. +In this study, we used single-cell transcriptomics to comprehensively resolve the cell states that are involved in maternal-fetal communication in the decidua, during early pregnancy when the placenta is established. We then used a computational framework to predict cell-type-specific ligand-receptor complexes and present a new database of the curated complexes (www.CellPhoneDB.org/). 
By integrating these predictions with spatial in situ analysis, we construct a detailed molecular and cellular map of the human decidual-placental interface. +Maternal and fetal cells in early pregnancy +We combined droplet-based encapsulation (using the 10x Genomics Chromium system) and plate-based Smart-seq2 single-cell transcriptome profiles from the maternal-fetal interface (11 deciduas and 5 placentas from 6-14 gestational weeks) and six matched peripheral blood mononuclear cells (Fig. 1a, b, Supplementary Tables 1, 2, Extended Data Fig. 1). After computational quality control and integration of transcriptomes from both technologies, we performed graph-based clustering (see Methods) of the combined dataset and used cluster-specific marker genes to annotate the clusters (Fig. 1c, Extended Data Figs. 2, 3a-d, Supplementary Table 2). We studied T cell composition and clonal expansion using full-length transcriptomes from Smart-seq2 and reconstructed the T cell receptor sequences from this data, which showed expansion of CD8 T cells in the decidua (Fig. 1d). +We aligned single-cell RNA-sequencing (scRNA-seq) reads from each cell with overlapping single nucleotide polymorphisms called from maternal and fetal genomic DNA to assign cells as fetal or maternal (Fig. 1e, Extended Data Fig. 3e). As expected, decidual samples contained mostly maternal cells with a few fetal HLA-G EVT. Fetal cells dominate the placental samples, with the exception of maternal macrophages (M3 cluster) that express CD14, S100A9, CD163, CD68 and CSF1R (Extended Data Fig. 3f). These are probably derived from blood monocytes incorporated into the syncytium. 
+Cell communication predicted by CellPhoneDB +To systematically study the interactions between fetal and maternal cells in the decidual-placental interface, we developed a repository (www.CellPhoneDB.org) of ligand-receptor interacting pairs that accounts for their subunit architecture, representing heteromeric complexes accurately (Extended Data Fig. 4a). Both secreted and cell-surface molecules are considered; the repository therefore encompasses ligand-receptor interactions mediated by the diffusion of secreted molecules. Our repository forms the basis of a computational approach to identify biologically relevant ligand-receptor complexes. We consider the expression levels of ligands and receptors within each cell type, and use empirical shuffling to calculate which ligand-receptor pairs display significant cell-type specificity (Extended Data Fig. 4b, see Methods). This predicts molecular interactions between cell populations via specific protein complexes, and generates a potential cell-cell communication network in the decidua and placenta (Extended Data Fig. 4c-e, Supplementary Tables 3, 4). +Trophoblast differentiation by scRNA-seq +To investigate maternal-fetal interactions at the decidual-placental interface, we first analysed fetal trophoblast cells isolated from placental and decidual samples: the latter contain invasive EVT (Extended Data Fig. 5a, b). Consistent with previous results, we resolved two distinct trophoblast differentiation pathways (Fig. 2a). As expected, decidual EVT are at the end of the trajectory, have high levels of expression of HLA-G and no longer express cell-cycle genes (Extended Data Fig. 5c). For villous cytotrophoblast cells, CellPhoneDB predicts interactions of receptors involved in cellular proliferation and differentiation (EGFR, NRP2 and MET) with their corresponding ligands expressed by other cells in the placenta. 
HBEGF, potentially interacting with EGFR, is expressed by Hofbauer cells, and PGF and HGF (the respective ligands of NRP2 and MET) are expressed by different placental fibroblast subsets (Fig. 2b, Supplementary Table 5). +By contrast, during EVT differentiation there is upregulation of receptors involved in immunomodulation, cellular adhesion and invasion, the ligands of which are expressed by decidual cells (Fig. 2b). For example, ACKR2 is a decoy receptor for inflammatory cytokines that are produced by maternal immune cells and CXCR6 is a chemokine receptor that binds to CXCL16 expressed by the maternal macrophages. Expression of TGFB1 (the function of which is to suppress immune responses and activate epithelial-mesenchymal transitions) and its receptor increases as EVT differentiate. Components involved in the epithelial-mesenchymal-transition program are upregulated at the end of the trajectory (Extended Data Fig. 5d); these include PAPPA and PAPPA2, which encode metalloproteinases that are known to be involved in cellular invasion. In pregnancy, a decreased level of PAPPA is a biomarker for pre-eclampsia and fetal growth restriction, which are associated with defective EVT invasion. +Stromal cells in the two decidual layers +EVT initially invade through the surface epithelium into the decidua compacta. Beneath this is the decidua spongiosa that contains hyper-secretory glands, which provide histotrophic nutrition to the early conceptus. Markers that distinguish the different decidual fibroblast populations identify two clusters of perivascular cells (referred to as PV1 and PV2) that share expression of the smooth muscle marker (MGP) and are distinguished by different levels of MCAM, which is higher in PV1, and MMP11, which is higher in PV2 (Fig. 3a, Supplementary Table 6). There are three clusters of stromal cells (labelled dS1, dS2 and dS3), all of which express the WNT inhibitor DKK1. 
dS1 shares the expression of ACTA2 and TAGLN with PV1 and PV2, and lacks expression of the classical decidual markers prolactin (PRL) and IGFBP1. By contrast, dS2 and dS3 express IGFBP1, IGFBP2 and IGFBP6 and share markers with two subsets of decidualized stromal cells that have recently been described in vitro. The dS3 subset expresses PRL as well as genes involved in steroid biosynthesis (for example, CYP11A1) (Extended Data Fig. 6a). +To locate the different perivascular and stromal populations in situ, we used immunohistochemistry as well as multiplexed single-molecule fluorescent in situ hybridization (smFISH) for selected markers on serial sections of decidua parietalis. These experiments confirm that cells that express ACTA2 and MCAM are present in the smooth muscle media of the spiral arteries and show that MMP11 is also present, which demonstrates that both PV1 and PV2 are perivascular (Fig. 3b). ACTA2 dS1 cells are present between glands in the decidua spongiosa, whereas IGFBP1 and PRL dS2 and dS3 cells are located in decidua compacta (Fig. 3c, d, Extended Data Fig. 7). CYP11A1 is also expressed more abundantly in decidua compacta than in decidua spongiosa (Extended Data Fig. 6b). +Our CellPhoneDB tool predicts that the cognate receptors for angiogenic factors that are expressed by PV1 and PV2 (for example, ANGPT1 and VEGFA) are located in the endothelium (Fig. 3e). EVT first invade the decidua compacta, where dS2 and dS3 express high levels of LGALS9 and CLEC2D. These molecules could interact with their respective inhibitory receptors TIM3 (also known as HAVCR2) and KLRB1, which are expressed by subsets of dNKs, enabling the stroma to suppress inflammatory reactions in the decidua. +Three decidual NK cell states +We identified three main dNK subsets (dNK1, dNK2 and dNK3), which all co-express the tissue-resident markers CD49A (also known as ITGA1) and CD9 (Extended Data Fig. 8a). 
dNK1 cells express CD39 (also known as ENTPD1), CYP26A1 and B4GALNT1, whereas the defining markers of dNK2 cells are ANXA1 and ITGB2; the latter is shared with dNK3 cells (Fig. 4a, Supplementary Table 7). dNK3 cells express CD160, KLRB1 and CD103 (also known as ITGAE), but not the innate lymphocyte cell marker CD127 (also known as IL7R) (Extended Data Fig. 8a). +Genes of the KIR family are polymorphic and highly homologous, which makes the quantification of mRNA expression of individual KIR genes challenging. We therefore developed 'KIRid', a method that uses full-length transcript Smart-seq2 data to map the single-cell reads of each donor to the corresponding donor-specific reference of KIR alleles (Fig. 4b, see Methods). We find that dNK1 cells express higher levels of KIRs that can bind to HLA-C molecules: inhibitory KIR2DL1, KIR2DL2 and KIR2DL3 and activating KIR2DS1 and KIR2DS4 (Fig. 4c, Supplementary Table 8). LILRB1, the receptor with high affinity for the dimeric form of HLA-G molecules, is expressed only by the dNK1 subset. Both dNK1 and dNK2, but not dNK3, express activating NKG2C (also known as KLRC2) and NKG2E (also known as KLRC3) as well as inhibitory NKG2A (also known as KLRC1) receptors for HLA-E molecules (Fig. 4c). These results predict a likely function of dNK1 in the recognition and response to EVT. +To investigate these three dNK populations further, we analysed six decidual samples by flow cytometry using CD49a (expressed by resident dNKs), combined with markers for each dNK subset predicted from our transcriptomics data (CD39, ITGB2, CD103 and KIR2DL1) (Fig. 4d, Extended Data Fig. 8b). We confirmed the presence of the three dNK populations by flow cytometry and the preferential expression of KIR2DL1 in dNK1 (Fig. 4d, Supplementary Table 9). We analysed the morphology of dNK subsets by Giemsa staining of cells isolated by flow cytometric sorting (Extended Data Fig. 8c). 
dNK1 contains more cytoplasmic granules than dNK2 and dNK3, which is consistent with our scRNA-seq data that show higher levels of expression of PRF1, GNLY, GZMA and GZMB RNA in this subset (Fig. 4e). Higher levels of expression of the granule proteins (PRF1, GNLY, GZMA and GZMB) are found in KIR+ compared to KIR− dNK cells by flow cytometry (Fig. 4f). dNK1 cells also express high levels of enzymes involved in glycolysis (Fig. 4g). Thus, dNK1 cells are characterized by active glycolytic metabolism, and show higher expression of KIR genes (KIR2DS1, KIR2DS4, KIR2DL1, KIR2DL2 and KIR2DL3), LILRB1 and cytoplasmic granule proteins, suggesting that it is dNK1 cells that particularly interact with EVT. +First pregnancies are associated with lower proportions of dNK cells that express LILRB1, lower birth weights and increased occurrence of disorders such as pre-eclampsia. Metabolomic programming of mature 'memory' natural killer cells also occurs in chronic human cytomegalovirus infection. Together, these findings are consistent with the 'priming' of dNK1 cells during a first pregnancy so they can respond more effectively to the implanting placenta in subsequent pregnancies. +Immunomodulation during early pregnancy +We next used CellPhoneDB to identify the expression of cytokines and chemokines by dNKs, and to predict their interactions with other cells at the maternal-fetal interface (Fig. 5a, Extended Data Fig. 9a). However, contrary to previous studies, we find no evidence for substantial VEGFA or IFNG expression by dNKs in vivo, probably because these studies used dNK cells cultured with IL-2 or IL-15 in vitro. +dNK1 cells express higher levels of CSF1, the receptor of which (CSF1R) is expressed by EVT and macrophages (Fig. 5a, b). Secretion of CSF1 by dNK cells and interaction with the CSF1R on EVT have previously been described, and we now pinpoint this interaction specifically to the dNK1 subset. 
By contrast, dNK2 and dNK3 express high levels of XCL1, and CCL5 is highly expressed by dNK3 (Fig. 5a, b, Extended Data Fig. 9b). CCR1, the receptor for CCL5, is expressed by EVT, which suggests a role for dNK3 in regulating EVT invasion. The expression pattern of the XCL1-XCR1 ligand-receptor complex suggests functional interactions between dNK2 and dNK3 and both EVT and conventional DC1 (labelled as DC1). DC1 recruitment, which is mediated by natural killer cells, occurs in tumour microenvironments. We find an increased proportion of DC1 compared to DC2, which possibly leads to the expansion of decidual CD8 T cells (Fig. 1d), but co-expression of PD1 (also known as PDCD1) suggests that local T cell activation is limited. +Our results collectively suggest that in the decidua microenvironment all damaging maternal T or natural killer cell responses to fetal trophoblast cells are prevented. There is high expression of PDL1 (also known as CD274) in EVT, which we confirmed in situ by using immunohistochemistry on serial sections of decidua basalis (the site of trophoblast invasion) stained for PDL1 and HLA-G (Extended Data Fig. 9c). We also identified putative inhibitory interactions between dNKs and EVT, in addition to the previously discussed receptor-ligand complexes between KIR2DL1, KIR2DL2 or KIR2DL3 and HLA-C. These include KLRB1 and TIGIT, which are highly expressed by dNK3 cells, potentially binding CLEC2D and PVR, which are expressed by EVT (Fig. 5a). +We predict that the immune microenvironment of the decidua prevents inflammatory responses that could potentially be triggered by trophoblast invasion and destruction of the smooth muscle media of the spiral arteries by trophoblast (Fig. 5c). Subsets of decidual macrophages express immunomodulatory molecules such as IL10, the receptor of which is expressed by EVT and by maternal endothelial, stromal and myeloid cells. dNK1 cells express high levels of SPINK2, and dNK2 and dNK3 cells express high levels of ANXA1. 
Both of these genes encode proteins that have anti-inflammatory roles, such as inhibiting kallikreins. The dNK1 subset expresses CD39 (which is encoded by ENTPD1), which, together with CD73 (which is encoded by NT5E), converts ATP to adenosine to prevent immune activation (Fig. 5c, Extended Data Fig. 9b). Expression of CD73 is high in epithelial glands and EVT, and the adenosine receptor (ADORA3) is present in macrophages (Fig. 5c, Extended Data Fig. 9b). KIR2DL1 dNK1 cells are in close physical contact with HLA-G EVT (Extended Data Fig. 9d), which suggests that together they could convert extracellular ATP (an inflammatory signal released upon cell death) to adenosine. +Discussion +Reproductive success depends on events that occur during placentation in the first-trimester decidua. Other scRNA-seq studies of uterine cells in pregnancy have analysed cells at the end of gestation or are restricted to fetal placental populations. To our knowledge, our study is the first comprehensive single-cell transcriptomics atlas of the maternal-fetal interface between 6-14 weeks of gestation (Extended Data Fig. 10). Similar to previous scRNA-seq analyses, we predict possible ligand-receptor interactions; we have developed an open repository for this purpose (www.CellPhoneDB.org/). This database accounts for the multimeric nature of ligands and receptors and is integrated with a statistical framework that predicts enriched cellular interactions between two cell types. +We show the differentiation trajectory of trophoblast cells to either villous syncytiotrophoblast (which is involved in nutrient exchange) or EVT (which invade and remodel the spiral arteries), and predict the ligand-receptor interactions that are likely to control these processes. +Our findings also suggest an environment in which any adaptive or innate immune responses that are harmful to the placenta or to the uterus are minimized. 
This is critical for the compromise that is needed to define the territorial boundary between mother and fetus. This environment has notable parallels with that around tumours, where inflammatory and adaptive immune responses are also dampened. dNK cells comprise about 70% of immune cells in the first-trimester decidua: we define three major subsets of dNK cells and predict that their likely function is to mediate the extent of trophoblast invasion, in addition to coordinating multiple immunomodulatory pathways that involve myeloid cells, T cells and stromal cells. Maternal immune responses are restrained by diverse classes of signalling molecules: cell-surface expression of checkpoint inhibitors such as PD1, PDL1 or TIGIT, tethered ligand-receptor complexes, secreted proteins, and small molecules such as adenosine or steroid hormones. We also show that the dNK1 subset expresses receptors for trophoblast HLA-C, HLA-E and HLA-G molecules, and can be primed metabolically through increased expression of glycolytic enzymes. The increased expression of glycolytic enzymes in dNK1 cells (which represents metabolic priming) suggests that these cells could be responsible for the different reproductive outcomes found in first compared to subsequent pregnancies. +In summary, we identify many molecular and cellular mechanisms that operate to generate a physiologically peaceful decidual environment. This cell atlas of the early maternal-fetal interface provides an essential resource for understanding normal and pathological pregnancies. +Methods +No statistical methods were used to predetermine sample size. The experiments were not randomized and investigators were not blinded to allocation during experiments and outcome assessment. +Patient samples. All tissue samples used for this study were obtained with written informed consent from all participants in accordance with the guidelines in The Declaration of Helsinki 2000 from multiple centres. 
+Human embryo, fetal and decidual samples were obtained from the MRC and Wellcome-funded Human Developmental Biology Resource (HDBR, http://www.hdbr.org), with appropriate maternal written consent and approval from the Newcastle and North Tyneside NHS Health Authority Joint Ethics Committee (08/H0906/21 5). The HDBR is regulated by the UK Human Tissue Authority (HTA; www.hta.gov.uk) and operates in accordance with the relevant HTA Codes of Practice. Decidual tissue for smFISH (Extended Data Fig. 7c) was also covered by this ethics protocol. +Peripheral blood from women undergoing elective terminations was collected under appropriate maternal written consent and with approvals from the Newcastle Academic Health Partners (reference NAHPB-093) and HRA NHS Research Ethics committee North-East-Newcastle North Tyneside 1 (REC reference 12/NE/0395). +Decidual tissue for immunohistochemistry (Fig. 3b, c, Extended Data Figs. 7a, 9c, d) and flow cytometry staining for granule proteins was obtained from elective terminations of normal pregnancies at Addenbrooke's Hospital (Cambridge) between 6 and 12 weeks gestation, under ethical approval from the Cambridge Local Research Ethics Committee (04/Q0108/23). +Decidual tissue for smFISH (Fig. 3d, Extended Data Fig. 6b, 7b) was obtained from the Newcastle Uteroplacental Tissue Bank. Ethics numbers are: Newcastle and North Tyneside Research Ethics Committee 1 Ref:10/H0906/71 and 16/NE/0167. +Isolation of decidual, placental and blood cells. Decidual and placental tissue was washed in Ham's F12 medium, macroscopically separated and then washed for at least 10 min in RPMI or Ham's F12 medium, respectively, before processing. +Decidual tissues were chopped using scalpels into approximately 0.2-mm³ cubes and enzymatically digested in 15 ml 0.4 mg/ml collagenase V (Sigma, C-9263) solution in RPMI 1640 medium (Thermo Fisher Scientific, 21875-034)/10% FCS (Biosfera, FB-1001) at 37 °C for 45 min. 
The supernatant was diluted with medium and passed through a 100-µm cell sieve (Corning, 431752) and then a 40-µm cell sieve (Corning, 431750). The flow-through was centrifuged and resuspended in 5 ml of red blood cell lysis buffer (Invitrogen, 00-4300) for 10 min. +Each first-trimester placenta was placed in a Petri dish and the placental villi were scraped from the chorionic membrane using a scalpel. The stripped membrane was discarded and the resultant villous tissue was enzymatically digested in 70 ml 0.2% trypsin 250 (Pan Biotech P10-025100P)/0.02% EDTA (Sigma E9884) in PBS with stirring at 37 °C for 9 min. The disaggregated cell suspension was passed through sterile muslin gauze (Winware food grade) and washed through with Ham's F12 medium (Biosera SM-H0096) containing 20% FBS (Biosera FB-1001). Cells were pelleted from the filtrate by centrifugation and resuspended in Ham's F12. The undigested gelatinous tissue remnant was retrieved from the gauze and further digested with 10-15 ml collagenase V at 1.0 mg/ml (Sigma C9263) in Ham's F12 medium/10% FBS with gentle shaking at 37 °C for 10 min. The disaggregated cell suspension from collagenase digestion was passed through sterile muslin gauze and the cells pelleted from the filtrate as before. Cells obtained from both enzyme digests were pooled together and passed through a 100-µm cell sieve (Corning, 431752) and washed in Ham's F12. The flow-through was centrifuged and resuspended in 5 ml of red blood cell lysis buffer (Invitrogen, 00-4300) for 10 min. +Blood samples were carefully layered onto a Ficoll-Paque gradient (Amersham) and centrifuged at 2,000 r.p.m. for 30 min without breaks. Peripheral blood mononuclear cells from the interface between the plasma and the Ficoll-Paque gradient were collected and washed in ice-cold phosphate-buffered saline (PBS), followed by centrifugation at 2,000 r.p.m. for 5 min. The pellet was resuspended in 5 ml of red blood cell lysis buffer (Invitrogen, 00-4300) for 10 min. 
+Assignment of fetal developmental stage. Up to eight post-conception weeks, embryos are staged using the Carnegie staging method. At fetal stages beyond eight post-conception weeks, age was estimated from measurements of foot length and heel-to-knee length. These were compared with a standard growth chart. +Flow cytometry staining, cell sorting and single-cell RNA-seq. Decidual and blood cells were incubated at 4 °C with 2.5 μl of antibodies in 1% FBS in DPBS without calcium and magnesium (Thermo Fisher Scientific, 14190136). DAPI was used for live versus dead discrimination. We used an antibody panel designed to enrich for certain populations for single-cell sorting and scRNA-seq. Cells were sorted using a Becton Dickinson (BD) FACS Aria Fusion with 5 excitation lasers (355 nm, 405 nm, 488 nm, 561 nm and 635 nm red), and 18 fluorescent detectors, plus forward and side scatter. The sorter was controlled using BD FACS DIVA software (version 7). The antibodies used are listed in Supplementary Table 10. +For single-cell RNA-seq using the plate-based Smart-seq2 protocol, we created overlapping gates that comprehensively and evenly sampled all immune-cell populations in the decidua (Extended Data Fig. 1). B cells (CD19+ or CD20+) were excluded from our analysis, owing to their absence in decidua. Single cells were sorted into 96-well full-skirted Eppendorf plates chilled to 4 °C, prepared with lysis buffer consisting of 10 μl of TCL buffer (Qiagen) supplemented with 1% β-mercaptoethanol. Single-cell lysates were sealed, vortexed, spun down at 300g at 4 °C for 1 min, immediately placed on dry ice and transferred for storage at −80 °C. The Smart-seq2 protocol was performed on single cells as previously described, with some modifications. Libraries were sequenced, aiming at an average depth of 1 million reads per cell, on an Illumina HiSeq 2000 with version 4 chemistry (paired-end, 75-bp reads).
+For the droplet scRNA-seq methods, blood and decidual cells were sorted into immune (CD45+) and non-immune (CD45−) fractions. B cells (CD19+ or CD20+) were excluded from blood analysis, owing to their absence in decidua. Only viable cells were considered. Placental cells were stained for DAPI and only viable cells were sorted. To improve trophoblast trajectories, an additional enrichment of EPCAM+ and HLA-G+ cells was performed for selected samples (Fig. 2 only). Cells were sorted into an Eppendorf tube containing PBS with 0.04% BSA. Cells were immediately counted using a Neubauer haemocytometer and loaded in the 10x-Genomics Chromium. The 10x-Genomics v2 libraries were prepared as per the manufacturer's instructions. Libraries were sequenced, aiming at a minimum coverage of 50,000 raw reads per cell, on an Illumina HiSeq 4000 (paired-end; read 1: 26 cycles; i7 index: 8 cycles, i5 index: 0 cycles; read 2: 98 cycles). +Flow cytometry staining for granule proteins. For intracellular staining of granule proteins, dNKs were surface-stained for 30 min in FACS buffer with antibodies (listed in Supplementary Table 10). Cells were washed with FACS buffer followed by staining with dead cell marker (DCM Aqua) and streptavidin Qdot605. dNKs were then treated with FIX & PERM (Thermo Fisher Scientific) and stained for granule proteins. Samples were run on an LSRFortessa FACS analyser (BD Biosciences) and data analysed using FlowJo (Tree Star). dNKs were gated as CD3− CD14− CD19− live cells; CD56+ NKG2A+ and then KIR+ and KIR− subsets were generated using Boolean functions with the gates for all the different KIRs stained (KIR+), and their inverse gates (KIR−). Wilcoxon test was used to compare granule protein staining between paired dNK subsets from the same donor. A P value < 0.05 was considered to be statistically significant. +Immunohistochemistry.
Four-micrometre tissue sections from formalin-fixed, paraffin-wax-embedded human decidual and placental tissues were dewaxed with Histoclear, cleared in 100% ethanol and rehydrated through gradients of ethanol to PBS. Sections were blocked with 2% serum (of species in which the secondary antibody was made) in PBS, incubated with primary antibody overnight at 4 C and slides were washed in PBS. Biotinylated horse anti-mouse or goat anti-rabbit secondary antibodies were used, followed by Vectastain ABC-HRP reagent (Vector, PK-6100) and developed with di-aminobenzidine (DAB) substrate (Sigma, D4168). Sections were counterstained with Carazzi's haematoxylin and mounted in glycerol and gelatin mounting medium (Sigma, GG1-10). Primary antibody was replaced with equivalent concentrations of mouse or rabbit IgG for negative controls. See Supplementary Table 10 for antibody information. Tissue sections were imaged using a Zeiss Axiovert Z1 microscope and Axiovision imaging software SE64 version 4.8. +smFISH. Samples were fixed in 10% NBF, dehydrated through an ethanol series and embedded in paraffin wax. Five-millimetre samples were cut, baked at 60 C for 1 h and processed using standard pre-treatment conditions, as per the RNAScope multiplex fluorescent reagent kit version 2 assay protocol (manual) or the RNAScope 2.5 LS fluorescent multiplex assay (automated). TSA-plus fluorescein, Cy3 and Cy5 fluorophores were used at 1:1,500 dilution for the manual assay or 1:300 dilution for the automated assay. Slides were imaged on different microscopes: Hamamatsu Nanozoomer S60 (Extended Data Fig. 7c). Zeiss Cell Discoverer 7 (Fig. 4d, Extended Data Figs. 6, 7c). Filter details were as follows. DAPI: excitation 370-400, BS 394, emission 460-500; FITC: excitation 450-488, BS 490, emission 500-55; Cy3: excitation 540-570, BS 573, emission 540-570; Cy5: excitation 615-648, BS 691, emission 662-756. The camera used was a Hamamatsu ORCA-Flash4.0 V3 sCMOS camera. +Whole-genome sequencing. 
Tissue DNA and RNA were extracted from fresh-frozen samples using the AllPrep DNA/RNA/miRNA kit (Qiagen), following the manufacturer's instructions. Short insert (500-bp) genomic libraries were constructed, flowcells were prepared and 150-bp paired-end sequencing clusters generated on the Illumina HiSeq X platform, according to Illumina no-PCR library protocols, to an average of 30× coverage. Genotype information is provided in Supplementary Table 1. +Single-cell RNA-seq data analysis. Droplet-based sequencing data were aligned and quantified using the Cell Ranger Single-Cell Software Suite (version 2.0, 10x Genomics) against the GRCh38 human reference genome provided by Cell Ranger. Cells with fewer than 500 detected genes and for which the total mitochondrial gene expression exceeded 20% were removed. Mitochondrial genes and genes that were expressed in fewer than three cells were also removed. +SmartSeq2 sequencing data were aligned with HISAT2, using the same genome reference and annotation as the 10x Genomics data. Gene-specific read counts were calculated using HTSeq-count. Cells with fewer than 1,000 detected genes and more than 20% mitochondrial gene expression content were removed. Furthermore, mitochondrial genes and genes expressed in fewer than three cells were also removed. To remove batch effects due to background contamination of cell-free RNA, we also removed a set of genes that had a tendency to be expressed in ambient RNA (PAEP, HBG1, HBA1, HBA2, HBM, AHSP and HBG2). +Downstream analyses, such as normalization, shared nearest neighbour graph-based clustering, differential expression analysis and visualization, were performed using the R package Seurat (version 2.3.3). Droplet-based and SmartSeq2 data were integrated using canonical correlation analysis, implemented in the Seurat alignment workflow.
Cells, the expression profile of which could not be well-explained by low-dimensional canonical correlation analysis compared to low-dimensional principal component analysis, were discarded, as recommended by the Seurat alignment tutorial. Clusters were identified using the community identification algorithm as implemented in the Seurat 'FindClusters' function. The shared nearest neighbour graph was constructed using between 5 and 40 canonical correlation vectors as determined by the dataset variability; the resolution parameter to find the resulting number of clusters was tuned so that it produced a number of clusters large enough to capture most of the biological variability. UMAP analysis was performed using the RunUMAP function with default parameters. Differential expression analysis was performed based on the Wilcoxon rank-sum test. The P values were adjusted for multiple testing using the Bonferroni correction. Clusters were annotated using canonical cell-type markers. Two clusters of peripheral blood monocytes represented the same cell type and were therefore merged. +We further removed contaminating cells: (i) maternal stromal cells that were gathered in the placenta for one of the fetuses; (ii) a shared decidual-placental cluster with fetal cells mainly present in two fetuses (which we think is likely to be contaminating cells from other fetal tissues due to the surgical procedure). This can occur owing to the source of the tissue and the trauma of surgery. We also removed a cluster for which the top markers were genes associated with dissociation-induced effects. Each of the remaining clusters contained cells from multiple different fetuses, indicating that the cell types and states we observed are not affected by batch effects. 
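The droplet-data quality-control thresholds given under "Single-cell RNA-seq data analysis" (at least 500 detected genes per cell, at most 20% mitochondrial expression, genes kept only if detected in at least three cells) can be illustrated with a small NumPy sketch. This is not the authors' code: the published analysis used Cell Ranger and Seurat in R. The function name `qc_filter`, the `MT-` prefix used to recognise mitochondrial genes, and the reading of the cell filter as "drop a cell that fails either criterion" are assumptions made for illustration.

```python
import numpy as np

def qc_filter(counts, gene_names, min_genes=500, max_mito_frac=0.20, min_cells=3):
    """Hypothetical QC filter on a (cells x genes) raw count matrix."""
    counts = np.asarray(counts)
    gene_names = np.asarray(gene_names)

    # Assumption: mitochondrial genes are identified by an "MT-" name prefix.
    is_mito = np.char.startswith(gene_names, "MT-")

    # Per-cell statistics: number of detected genes and mitochondrial fraction.
    genes_per_cell = (counts > 0).sum(axis=1)
    total = counts.sum(axis=1)
    mito_frac = np.divide(
        counts[:, is_mito].sum(axis=1), total,
        out=np.zeros(len(total)), where=total > 0,
    )

    # Keep cells that pass both thresholds.
    keep_cells = (genes_per_cell >= min_genes) & (mito_frac <= max_mito_frac)
    kept = counts[keep_cells]

    # Drop mitochondrial genes and genes detected in too few of the kept cells.
    cells_per_gene = (kept > 0).sum(axis=0)
    keep_genes = (~is_mito) & (cells_per_gene >= min_cells)

    return kept[:, keep_genes], keep_cells, keep_genes
```

For the Smart-seq2 data the same sketch would be called with `min_genes=1000`, matching the threshold stated in the text.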
+We found further diversity within the T cell clusters, as well as the clusters of endothelial, epithelial and perivascular cells, which we then reanalysed and partitioned separately, using the same alignment and clustering procedure. +The trophoblast clusters (clusters 1, 9, 20, 13 and 16 from Fig. 1d) were taken from the initial analysis of all cells and merged with the enriched EPCAM and HLA-G cells. The droplet-based and Smart-seq2 datasets were integrated and clustered using the same workflow as described above. Only cells that were identified as trophoblast were considered for trajectory analysis. +Trajectory modelling and pseudotemporal ordering of cells was performed with the monocle 2 R package (version 2.8.0). The most highly variable genes were used for ordering the cells. To account for the cell-cycle heterogeneity in the trophoblast subpopulations, we performed hierarchical clustering of the highly variable genes and removed the set of genes that cluster with known cell-cycle genes such as CDK1. Genes which changed along the identified trajectory were identified by performing a likelihood ratio test using the function differentialGeneTest in the monocle 2 package. +Network visualization was done using Cytoscape (version 3.5.1). The decidual network was created considering only edges with more than 30 interactions. The networks layout was set to force-directed layout. +KIR typing. Polymerase chain reaction sequence-specific primer was performed to amplify the genomic DNA for presence or absence of 12 KIR genes (KIR2DL1, KIR2DL2, KIR2DL3, KIR2DL5 (both KIR2DL5A and KIR2DL5B),KIR3DL1, KIR2DS1, KIR2DS2, KIR2DS3, KIR2DS4, KIR2DS5 and KIR3DS1) and the pseudogene KIR2DP1. KIR2DS4 alleles were also typed as being either full-length or having the 22-bp deletion that prevents cell-surface expression. Two pairs of primers were used for each gene, selected to give relatively short amplicons of 100-800 bp, as previously described. 
Extra KIR primers were designed using sequence information from the IPD-KIR database (release 2.4.0) to detect rare alleles of KIR2DS5 and KIR2DL3 (KIR2DS5, 2DS5rev2: TCC AGA GGG TCA CTG GGA and KIR2DL3, 2DL3rev3: AGA CTC TTG GTC CAT TAC CG). KIR haplotypes were defined by matrix subtraction of gene copy numbers using previously characterized common and contracted KIR haplotypes using the KIR Haplotype Identifier software (www.bioinformatics.cimr.cam.ac.uk/haplotypes). +Inferring maternal or fetal origin of single cells from droplet-based scRNA-seq using whole-genome sequencing variant calls. To match the processing of the whole-genome sequencing datasets, droplet-based sequencing data from decidua and placenta samples were realigned and quantified against the GRCh37 human reference genome using the Cell Ranger Single-Cell Software Suite (version 2.0). The fetal or maternal origin of each barcoded cell was then determined using the tool demuxlet. In brief, demuxlet can be used to deconvolve droplet-based scRNA-seq experiments in which cells are pooled from multiple genetically distinct individuals. Given a set of genotypes corresponding to these individuals, demuxlet infers the most likely genetic identity of each droplet by estimating the likelihood of observing scRNA-seq reads from the droplet overlapping known single nucleotide polymorphisms. Demuxlet inferred the identities of cells in this study by analysing each Cell Ranger-aligned BAM file from decidua and placenta in conjunction with a VCF file, containing the high-quality whole-genome-sequence variant calls from the corresponding mother and fetus. Each droplet was assigned to be maternal, fetal or unknown in origin (ambiguous or a potential doublet), and these identities were then linked with the transcriptome-based cell clustering data to confirm the maternal and fetal identity of each annotated cell type. +T cell receptor analysis by TraCeR. 
The T cell receptor sequences for each single T cell were assembled using TraCeR, which allowed the reconstruction of the T cell receptors from scRNA-seq data and their expression abundance (transcripts per million), as well as identification of the size, diversity and lineage relation of clonal subpopulations. In total, we obtained the T cell receptor sequences for 1,482 T cells with at least one paired productive αβ or γδ chain. Cells for which more than two recombinants were identified for a particular locus were excluded from further analysis. +Whole-genome sequencing alignment and variant calling. Maternal and fetal whole-genome sequencing data were mapped to the GRCh37.p13 reference genome using BWA-MEM version 0.7.15. The SAMtools fixmate utility (version 1.5) was used to update read-pairing information and mate-related flags. Reads near known indels from the Mills and 1000G gold standard reference set for hg19/GRCh37 were locally realigned using GATK IndelRealigner version 3.7. Base-calling assessment and base-quality scores were adjusted with GATK BaseRecalibrator and PrintReads version 3.7. PCR duplicates were identified and removed using Picard MarkDuplicates version 2.14.1. Finally, bcftools mpileup and call version 1.6 were used to produce genotype likelihoods and output called variants at all known biallelic single nucleotide polymorphism sites that overlap protein-coding genes. For each sample, variants called with phred-scale quality score ≥ 200, at least 20 supporting reads and mapping quality ≥ 60 were retained as high-quality variants. +Quantification of KIR gene expression by KIRid. The KIR locus is highly polymorphic in terms of both numbers of genes and alleles. Including a single reference sequence for each gene can lead to reference bias for donors that happen to better match the reference sequence.
To address these issues, we used a tailored approach in which we first built a total cDNA reference by concatenating the Ensembl coding and non-coding transcript sequences, excluding transcripts belonging to the KIR genes (GRCh38, version 90), and the full set of known KIR cDNAs sequences from the IPD-KIR database (release 2.7.0). For each donor, we removed transcript sequences for KIR genes determined to be absent in that individual, which decreases the extent of multi-mapping and quantification. The single-cell reads of each donor were then mapped to the corresponding donor-specific reference using Kallisto (version 0.43.0 with default options). Expression levels were quantified using the multi-mapping deconvolution tool MMSEQ, and gene-level estimates were obtained by aggregating over different alleles for each KIR gene. +Cell-cell communication analysis. To enable a systematic analysis of cell-cell communication molecules, we developed CellPhoneDB, a public repository of ligands, receptors and their interactions. Our repository relies on the use of public resources to annotate receptors and ligands. We include subunit architecture for both ligands and receptors, to accurately represent heteromeric complexes. +Ligand-receptor pairs are defined based on physical protein-protein interactions (see sections of 'CellPhoneDB annotations'). We provide CellPhoneDB with a user-friendly web interface at www.CellPhoneDB.org, where the user can search for ligand-receptor complexes and interrogate their own single-cell transcriptomics data. +To assess cellular crosstalk between different cell types, we used our repository in a statistical framework for inferring cell-cell communication networks from single-cell transcriptome data. We derived enriched receptor-ligand interactions between two cell types based on expression of a receptor by one cell type and a ligand by another cell type, using the droplet-based data. 
To identify the most relevant interactions between cell types, we looked for the cell-type specific interactions between ligands and receptors. Only receptors and ligands expressed in more than 10% of the cells in the specific cluster were considered. +We performed pairwise comparisons between all cell types. First, we randomly permuted the cluster labels of all cells 1,000 times and determined the mean of the average receptor expression level of a cluster and the average ligand expression level of the interacting cluster. For each receptor-ligand pair in each pairwise comparison between two cell types, this generated a null distribution. By calculating the proportion of the means which are 'as or more extreme' than the actual mean, we obtained a P value for the likelihood of cell-type specificity of a given receptor-ligand complex. We then prioritized interactions that are highly enriched between cell types based on the number of significant pairs, and manually selected biologically relevant ones. For the multi-subunit heteromeric complexes, we required that all subunits of the complex are expressed (using a threshold of 10%), and therefore we used the member of the complex with the minimum average expression to perform the random shuffling. +CellPhoneDB annotations of membrane, secreted and peripheral proteins. Secreted proteins were downloaded from Uniprot using KW-0964 (secreted). Secreted proteins were annotated as cytokines (KW-0202), hormones (KW-0372), growth factors (KW-0339) and immune-related using Uniprot keywords and manual annotation. Cytokines, hormones, growth factors and other immune-related proteins were annotated as 'secreted highlight' proteins in our lists. +Plasma membrane proteins were downloaded from Uniprot using KW-1003 (cell membrane). Peripheral proteins from the plasma membrane were annotated using the Uniprot Keyword SL-9903, and the remaining proteins were annotated as transmembrane proteins. 
We completed our lists of plasma transmembrane proteins by doing an extensive manual curation using literature mining and Uniprot description of proteins with transmembrane and immunoglobulin-like domains. +Plasma membrane proteins were annotated as receptors and transporters. Transporters were defined by the Uniprot keyword KW-0813. Receptors were defined by the Uniprot keyword KW-0675. The list of receptors was extensively reviewed and new receptors were added based on Uniprot description and bibliography revision. Receptors involved in immune-cell communication were carefully annotated. +Protein lists are available at https://www.cellphonedb.org/downloads. Three columns indicate whether the protein has been manually curated: 'tags', 'tags_ description', 'tags_reason'. +The tags column is related to the manual curation of a protein, and contains three options: (i) 'N/A', which indicates that the protein has not been manually curated; (ii) 'To_add', which indicates that secreted and/or plasma membrane protein annotation has been added; and (iii) 'To_comment', which indicates that the protein is either secreted (KW-0964) or membrane-associated (KW-1003) but that we manually added a specific property of the protein (that is, the protein is annotated as a receptor). +tags_reason is related to the protein properties, and contains five options: (i) 'extracellular_add', which indicates that the protein is manually annotated as plasma membrane; (ii) 'peripheral_add', which indicates that the protein is manually annotated as a peripheral protein instead of plasma membrane; (iii) 'secreted_add', which indicates that the protein is manually annotated as secreted; (iv) 'secreted_high', which indicates that the protein is manually annotated as secreted highlight. For cytokines, hormones, growth factors and other immune-related proteins; option (v) 'receptor_add' indicates that the protein is manually annotated as a receptor. 
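The label-permutation test described under "Cell-cell communication analysis" can be sketched as follows. This is a minimal illustration under stated assumptions, not the CellPhoneDB implementation: the function name `permutation_p_value` and its arguments are hypothetical, "as or more extreme" is read one-sidedly as "greater than or equal to the observed mean", and the 10% expression filter and the minimum-subunit rule for heteromeric complexes are left out.

```python
import numpy as np

def permutation_p_value(receptor_expr, ligand_expr, labels,
                        cluster_a, cluster_b, n_perm=1000, seed=0):
    """Hypothetical sketch of the cluster-label permutation test.

    receptor_expr, ligand_expr: per-cell expression of one receptor and one ligand.
    labels: per-cell cluster labels.
    Returns the observed statistic and its permutation P value.
    """
    rng = np.random.default_rng(seed)
    receptor_expr = np.asarray(receptor_expr, dtype=float)
    ligand_expr = np.asarray(ligand_expr, dtype=float)
    labels = np.asarray(labels)

    def pair_mean(lab):
        # Mean of (average receptor level in cluster A, average ligand level in cluster B).
        return 0.5 * (receptor_expr[lab == cluster_a].mean()
                      + ligand_expr[lab == cluster_b].mean())

    observed = pair_mean(labels)

    # Null distribution: shuffle cluster labels across all cells (cluster sizes are preserved).
    null = np.array([pair_mean(rng.permutation(labels)) for _ in range(n_perm)])

    # Proportion of permuted means at least as large as the observed mean.
    p_value = float(np.mean(null >= observed))
    return observed, p_value
```

In a full analysis this function would be called once per receptor-ligand pair and per ordered pair of clusters, and the resulting P values would then be used to count significant pairs between cell types.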
+tags_description is a brief description of the protein, function or property related to the manually curated protein. +CellPhoneDB annotations of heteromeric receptors and ligands. Heteromeric receptors and ligands (that is, proteins that are complexes of multiple gene products) were annotated by reviewing the literature and Uniprot descriptions. Cytokine complexes, TGF family complexes and integrin complexes were carefully annotated. +If heteromers are defined in the RCSB Protein Data Bank (http://www.rcsb.org/), structural information is included in our CellPhoneDB annotation. Heteromeric complex lists are available at www.CellPhoneDB.org. +CellPhoneDB annotations of interactions. The majority of ligand-receptor interactions were manually curated by reviewing Uniprot descriptions and PubMed information on membrane receptors. Cytokine and chemokine interactions are annotated following the International Union of Pharmacology annotation. Other groups of cell-surface proteins the interactions of which were manually reviewed include the TGF family, integrins, lymphocyte receptors, semaphorins, ephrins, Notch and TNF receptors. +In addition, we considered interacting partners as: (i) binary interactions annotated by IUPHAR (http://www.guidetopharmacology.org/) and (ii) cytokines, hormones and growth factors interacting with receptors annotated by the iMEX consortium (https://www.imexconsortium.org/). +We excluded from our analysis transporters and a curated list of proteins including: (i) co-receptors; (ii) nerve-specific receptors such as those related to ear-binding, olfactory receptors, taste receptors and salivary receptors, (iii) small molecule receptors, (iv) immunoglobulin chains, (v) pseudogenes and (vi) viral and retro-viral proteins, pseudogenes, cancer antigens and photoreceptors. These proteins are annotated as 'others' in the protein list. We also excluded from our analysis a list of interacting partners not directly involved in cell-cell communication. 
The 'remove_interactions' list is available in https://www.cellphonedb.org/downloads. +Lists of interacting protein chains are available from https://www.cellphonedb.org/downloads. The column labelled 'source' indicates the curation source. Manually curated interactions are annotated as 'curated', and the bibliography used to annotate the interaction is stored in 'comments_interaction'. 'Uniprot' indicates that the interaction has been annotated using UniProt descriptions. +Linking Ensembl and Uniprot identification. We assigned to the custom-curated interaction list all the Ensembl gene identifications by matching information from Uniprot and Ensembl by the gene name. +Database structure. Information is stored in a PostgreSQL relational database (www.postgresql.org). SQLAlchemy (www.sqlalchemy.org) and Python 3 were used to build the database structure and the query logic. All the code is open source and uploaded to the webserver. +Extended Data +Gating strategy for Smart-seq2 data. +a, Gating strategy for a panel of 14 antibodies to analyse immune cells in decidual samples by Smart-seq2 (CD3, CD4, CD8, CD9, CD14, CD16, CD19, CD20, CD34, CD45, CD56, CD94, DAPI, HLA-DR and HLA-G). Cells isolated for Smart-seq2 data were gated on live; CD19- and CD20-negative, singlets and the following cell types were sorted: (i) CD45 CD14highHLA-DRhigh; (ii) CD45 HLA-DR ; (iii) CD45 HLA-DR CD56 CD3 CD4 CD8 ; (iv) CD45 HLA-DR CD56 CD3 CD8 ; (v) CD45 HLA-DR CD56 CD3 CD4 CD8 ; (vi) CD45 HLA-DR CD3 CD56 CD94 (labelled 'all -' on the figure); (vii) CD45 HLA-DR CD3 CD56 CD94 ; (viii) autofluorescence; (ix) CD45 HLA-DR CD3 CD56 CD94 CD9 ; (x) CD45 HLA-DR CD3 CD56 CD94 CD9 ; (xi) CD45 HLA-G ; (xii) CD45 HLA-G. Sample F9 is shown as an example. Cells from different gates were sorted in different plates: myeloid cells (gates (i) and (ii)); T cells (gates (iii), (iv) and (v)); natural killer cells (gates (vi), (vii), (viii), (ix) and (x)); CD45 (gates (xi) and (xii)). 
Antibody information is provided in Supplementary Table 10. +Quality control of droplet and Smart-seq2 datasets. +a, Histograms show the distribution of the cells from the Smart-seq2 dataset ordered by number of detected genes and mitochondrial gene expression content. b, Histograms show the distribution of the cells from the droplet-based dataset ordered by number of detected genes and mitochondrial gene expression content. c, Total numbers of cells that passed the quality control, processed by Smart-seq2 and droplet scRNA-seq. Each row is a separate donor. d, Canonical correlation vectors (CC1 and CC2) of integrated analysis of decidual and placental cells from the Smart-seq2 (n 5 deciduas, n 2 peripheral blood samples) and droplet-based datasets (n 5 placentas, n 6 deciduas and n 4 blood samples), coloured on the basis of their assignment to clusters and the technology that was used for scRNA-seq. +Overview of droplet and Smart-seq2 datasets. +a, UMAP plot showing the integration of the Smart-seq2 and droplet-based dataset and the log-transformed expression of MKI67 (which marks proliferating cells). b, UMAP plots showing the separate and more-detailed integration analysis of the cells from cluster 14 (perivascular cells), cluster 19 (endothelial cells) and cluster 25 (epithelial cells). Clusters are labelled as in Fig. 1c. c, UMAP visualization of T cell clusters obtained by integrating Smart-seq2 and droplet-based T cells subpopulations (clusters 4, 8, 10 and 15) from Fig. 1c. Cells are coloured by the tissue of origin (top) and the identified clusters (bottom). d, Heat map showing the z-score of the mean log-transformed, normalized counts for each cluster of selected marker genes used to annotate clusters. For a more extensive set of genes, see Supplementary Table 2. Adjusted P value 0.1; Wilcoxon rank-sum test with Bonferroni correction. 
NK, natural killer cells; NKp, proliferating natural killer cells; MO, monocytes; Granulo, granulocytes; Treg, regulatory T cells; GD, T cells; CD8c, cytotoxic CD8 T cells; Plasma, plasma cells. e, log-likelihood differences between assignment to fetal versus assignment to maternal origin of cells, on the basis of single nucleotide polymorphism calling from the droplet RNA-seq data. Cells are coloured by their assignment as determined by demuxlet. For this figure, we used n 5 placentas, n 6 deciduas and n 4 blood individuals. f, UMAP visualization of the log-transformed, normalized expression of selected marker genes of the M3 subpopulation. +Cell-cell communication networks in the maternal-fetal interface using CellPhoneDB. +a, Information aggregated within www.CellPhoneDB.org. b, Statistical framework used to infer ligand-receptor complex specific to two cell types from single-cell transcriptomics data. Predicted P values for a ligand-receptor complex across two cell clusters are calculated using permutations, in which cells are randomly re-assigned to clusters (see Methods) c, Networks visualizing potential specific interactions in the decidua, in which nodes are clusters (cell types) and edges represent the number of significant ligand-receptor pairs. The network was created for edges with more than 30 interactions and the network layout was set to force-directed layout. Only droplet data were considered for the CellPhoneDB analysis (n 6 deciduas). d, Networks visualizing potential specific interactions in the placenta, in which nodes are clusters and edges represent the number of significant ligand-receptor pairs. The network layout was set to force-directed layout. Only droplet data were considered for the analysis (n 5 placentas). e, An example of significant interactions identified by CellPhoneDB. Violin plots show log-transformed, normalized expression levels of the components of the IL6-IL6R complex in placental cells. 
IL6 expression is enriched in the fibroblast 2 cluster (F2; dark brown in d) and the two subunits of the IL6 receptors (IL6R and IL6ST) are co-expressed in Hofbauer cells. +Trophoblast analysis. +a, UMAP visualization of the integrated analysis of the trophoblast subpopulations that were used for pseudotime analysis, including the enriched EPCAM and HLA-G cells (see Methods). Cells that were excluded from the pseudotime analysis are coloured in grey (n 5 placentas, n 11 deciduas). b, UMAP visualization of the log-transformed, normalized expression of selected canonical trophoblast marker genes (n 5 placentas). c, Visualization of log-transformed, normalized expression of HLA-G, MKI67 and LGALS13 across trophoblast differentiation. d, Heat map showing genes that are involved in the epithelial-mesenchymal transition, identified as varying significantly as EVT differentiate (q value 0.1, likelihood ratio test, P values were adjusted for the false discovery rate). +Steroid synthesis. +a, Heat map showing relative expression of enzymes involved in cholesterol and steroid synthesis in the three stromal subsets (n 11 deciduas). b, Multiplexed smFISH in two decidua parietalis sections from two different individuals, showing an enrichment of CYP11A1 expression in the decidua compacta. Section stained by CYP11A1, LDLR and DAPI. Images are shown at 40 magnification. A high resolution is needed to detect differences between the sections (n 2 individuals). +In situ staining for the different stromal cells. +a, Immunohistochemistry of decidual serial sections stained for cytokeratin (uterine glands), CD34 (endothelial cells), ACTA2 (perivascular populations and dS1) and IGFBP1 (stromal cells and glandular secretions) (n 2 biological replicates). ACTA2 stromal cells are confined to the stromal cells of the deeper decidua spongiosa, whereas stromal cells in the decidua compacta are ACTA2. 
IGFBP1 stromal cells are enriched in the decidua compacta, whereas stromal cells around the glands in the decidua spongiosa are IGFBP1. Glandular secretions are IGFBP1. b, Multiplexed smFISH for a decidua parietalis section showing the two decidual layers. ACTA2, dS1 population confined to decidua spongiosa; IGBP1 and PRL, dS2 and dS3 populations confined to decidua compacta. Samples shown are from a different individual than samples shown in Fig. 4d (n 2 biological replicates). c, Multiplexed smFISH for a decidua parietalis section showing the two decidual layers. DKK1, decidual stromal marker; ACTA2, dS1 population confined to decidua spongiosa; PRL, dS3 population confined to decidua compacta (n 1 biological replicate). +Lymphocyte populations in the decidua. +Heat map showing z-scores of the mean log-transformed, normalized expression of selected genes in the lymphocyte populations. Proliferating dNK cells (dNKp) are excluded from the analysis (n 11 deciduas). b, FACS gating strategy in Fig. 5 applied in matched blood. Matched blood for the sample shown in Fig. 5 (n 2 biological replicates). c, Morphology of dNK1, dNK2 and dNK3 subsets by Giemsa-Wright stain after cytospin (representative data from 1 of n 2 biological replicates are shown). Scale bar, 10 m. +Expression of ligands and receptors at the maternal-fetal interface. +a, Heat map showing z-scores of the mean log-transformed, normalized expression of genes annotated as cytokines, growth factors, hormones and angiogenic factors with a log-mean 0.1 in the selected decidual immune populations (n 11 deciduas). 
b, Violin plots showing log-transformed, normalized expression levels of selected ligands expressed in the three dNK cells and their corresponding receptors expressed on other decidual cells and EVT (CD39, CD73, ADORA3, CSF1, CSF1R, CCL5, CCR1, XCL1 and XCR1; n = 11 deciduas, n = 5 placentas). c, Immunohistochemistry images of serial decidual sections stained for the EVT marker HLA-G and the inhibitory ligand PDL1. Bottom panels show the areas in white boxes in the top panels at higher power. HLA-G+ cells are only present at the site of placentation (decidua basalis) and are absent elsewhere (decidua parietalis). SpA, spiral arteries. The EVT is strongly PDL1+. We show representative data from one individual of n = 5 biological replicates. d, Immunohistochemistry images of decidual serial sections of the decidual implantation site (at 10 weeks of gestation), stained for the trophoblast cell marker, cytokeratin-7 (red arrow) and the inhibitory receptor KIR2DL1 on a natural killer cell (black arrow). The asterisk marks the lumen of a spiral artery that supplies the conceptus. We show representative data from one individual of n = 5 samples. +Encyclopaedia of cells at the maternal-fetal interface. +a, Summary of populations from our scRNA-seq data. Blue, fetal; red, maternal. +Supplementary Material +Reviewer information Nature thanks B. Treutlein and the other anonymous reviewer(s) for their contribution to the peer review of this work. +Author contributions R.V.-T. and S.A.T. conceived the study. Sample and library preparation was performed by R.V.-T. with contributions from M.Y.T., J.-E.P., E.S. and S.L.; FACS experiments were performed by R.V.-T., R.A.B., A.F., A.M.S., R.P.P. and M.A.I.; histology staining was performed by J.N.B., L.G., R.V.-T., M.Y.T., B.M., B.I., S.H., D.H.R. and A.W.-C.; M.E. and R.V.-T. analysed and interpreted the data with contributions from M.V.-T., M.J.T.S., L.W., G.J.W., A.G., A.Z., J.H., K.B.M., K.P., M.H., A.M. and S.A.T.; R.V.-T., A.M.
and S.A.T. wrote the manuscript with contributions from M.H., M.E., K.B.M. and M.Y.T.; M.H., A.M. and S.A.T. co-directed the study. All authors read and accepted the manuscript. +Competing interests The authors declare no competing interests. + Online content +Any methods, additional references, Nature Research reporting summaries, source data, statements of data availability and associated accession codes are available at https://doi.org/10.1038/s41586-018-0698-6. +Code availability +CellPhoneDB code is available in https://github.com/Teichlab/cellphonedb. The code can also be downloaded from https://cellphonedb.org/downloads. KIRid can be downloaded from https://github.com/Teichlab/KIRid. +Data availability +Our expression data for different tissues are also available for user-friendly interactive browsing online at http://data.teichlab.org (maternal-fetal interface). The raw sequencing data, expression-count data with cell classifications and the whole-genome sequencing data are deposited at ArrayExpress, with experiment codes E-MTAB-6701 (for droplet-based data), E-MTAB-6678 (for Smart-seq2 data) and E-MTAB-7304 (for the whole-genome sequencing data). Our CellPhoneDB repository is available at www.CellPhoneDB.org.
+Reporting summary +Further information on research design is available in the Nature Research Reporting Summary linked to this paper. +Identification of cell types at the maternal-fetal interface. +a, Diagram illustrating the decidual-placental interface in early pregnancy. DC, dendritic cells; dM, decidual macrophages; dS, decidual stromal cells; Endo, endothelial cells; Epi, epithelial glandular cells; F, fibroblasts; HB, Hofbauer cells; PV, perivascular cells; SCT, syncytiotrophoblast; VCT, villous cytotrophoblast; EVT, extravillous trophoblast. b, Workflow for single-cell transcriptome profiling of decidua, placenta and maternal peripheral blood mononuclear cells. Numbers in parentheses indicate number of individuals analysed. c, Placental and decidual cell clusters from 10x Genomics and Smart-seq2 (SS2) scRNA-seq analysis visualized by UMAP. Colours indicate cell type or state. n = 11 deciduas, n = 5 placentas and n = 6 blood samples. f, fetal; ILC, innate lymphocyte cells; l, lymphatic; m, maternal; p, proliferative; M3, maternal macrophages. d, UMAP visualization of T cell clonal expansion and clusters by integrating Smart-seq2 and 10x Genomics T cell data from clusters 4, 8, 10 and 15 from c. TCR, T cell receptor. MAIT, mucosal-associated invariant T cell. e, Origin of droplet cells in c by tissue (above) or genotype (below). Purple circle, maternal cells in placenta; green circle, fetal cells in decidua. +Ligand-receptor expression during EVT differentiation. +a, Pseudotime ordering of trophoblast cells reveals EVT and SCT pathways. Enriched EPCAM+ and HLA-G+ cells on placental and decidual isolates are included. n = 11 deciduas and n = 5 placentas.
b, Violin plots showing log-transformed, normalized expression levels for selected ligand-receptor pairs that change during pseudotime and are predicted to be significant by CellPhoneDB (EGFR, HBEGF, NRP2, PGF, MET, HGF, ACKR2, CCL5, CXCR6, CXCL16, TGFB1, TGFBR2 and TGFBR1). Cells from Fig. 1c are used for the violin plots. +Stromal distribution in the two distinct decidual layers. +a, Heat map showing relative expression (z-score) of selected genes for perivascular and decidual stromal cells (n = 11 deciduas; adjusted P value < 0.1; Wilcoxon rank-sum test with Bonferroni correction). b, Immunohistochemistry of a spiral artery in serial sections of the decidua, stained for CD34 (endothelial cells), ACTA2 (PV cells and dS1 cells), MCAM (PV1 cells) and MMP11 (PV2 cells) (n = 2 biological replicates). Scale bar, 100 μm. c, Immunohistochemistry of decidual sections stained for ACTA2, which distinguishes between ACTA2+ dS1 in decidua spongiosa and ACTA2− dS2 and dS3 in decidua compacta (n = 3 biological replicates). Right panels are a higher magnification of the respectively numbered inset. Scale bar, 50 μm. d, Multiplexed smFISH of decidua parietalis showing two decidual layers. ACTA2+ dS1 in decidua spongiosa (40× objective); IGBP1+ and PRL+ dS2 and dS3 confined to decidua compacta (20× objective) (n = 2 biological replicates). e, Heat map shows selected significant ligand-receptor interactions (n = 6 deciduas, P value < 0.05, permutation test, see Methods) between PV cells and dS cells (left) and decidual cells (right) (n = 11 deciduas). Assays were carried out at the mRNA level, but are extrapolated to protein interactions. +Three dNK populations. +a, Heat map showing relative expression (z-score) of markers defining the three dNK subsets (n = 11 deciduas; percentage 1 10%, percentage 2 60%; refers to the percentage of cells with expression above 0 in the corresponding cluster and all other clusters; P value < 0.1 after Bonferroni correction, Wilcoxon rank-sum test).
b, Workflow for KIRid method (see https://github.com/Teichlab/KIRid). IPD-KIR, database for human KIR (available at https://www.ebi.ac.uk/ipd/kir/). c, z-scores of KIR receptors (mean expression levels). Expression values were generated using Smart-seq2 data and the KIRid approach (n = 5 deciduas). d, FACS gating strategy to identify dNK subsets (representative sample from n = 6 individuals; Supplementary Table 9). e, z-scores of expression of granule molecules PRF1, GNLY, GZMA and GZMB in dNK subsets (n = 11 individuals). f, Flow cytometry to compare staining of granule components in NKG2A KIR versus NKG2A KIR dNK cells (PRF1 n = 9 individuals; GNLY n = 7 individuals; GZMA n = 8 individuals; GZMB n = 10 individuals; Supplementary Table 9). Non-parametric paired Wilcoxon test. *P < 0.05, **P < 0.01. g, Right, z-scores of glycolysis enzymes (mean mRNA expression). Left, only differentially expressed enzymes are shown in the glycolysis pathway (n = 11 deciduas; P value < 0.1 after Bonferroni correction, Wilcoxon rank-sum test). +Multiple regulatory immune responses at the site of placentation. +a, Overview of selected ligand-receptor interactions; P values indicated by circle size, scale on right (permutation test, see Methods). The means of the average expression level of interacting molecule 1 in cluster 1 and interacting molecule 2 in cluster 2 are indicated by colour. Only droplet data were used (n = 6 deciduas). Angio., angiogenesis. Assays were carried out at the mRNA level, but are extrapolated to protein interactions. b, Diagram of the main receptors and ligands expressed on the three dNK subsets that are involved in adhesion and cellular recruitment. c, Diagram of the main receptors and ligands expressed on the three dNK subsets that are involved in immunomodulation.
\ No newline at end of file diff --git a/cellsem_agent/graphs/nlm_annotate/grounding_statistics.py b/cellsem_agent/graphs/nlm_annotate/grounding_statistics.py index 7337fe6..ca1120c 100644 --- a/cellsem_agent/graphs/nlm_annotate/grounding_statistics.py +++ b/cellsem_agent/graphs/nlm_annotate/grounding_statistics.py @@ -1,6 +1,7 @@ import csv -tsv_path = './resources/groundings.tsv' +# tsv_path = './resources/groundings.tsv' +tsv_path = '../cxg_annotate/resources/groundings.tsv' tp = fp = fn = tn = 0