Mapping galactic morphology through high-dimensional vision, similarity search, and sky-projection.
ANTARIKSH is a research-oriented astrophysical system that explores galaxy morphology through Content-Based Image Retrieval (CBIR). Instead of classifying galaxies into rigid labels, ANTARIKSH learns a visual manifold of galaxies and allows users to navigate the universe by similarity, structure, and neighborhood behavior.
It blends:
- Vision Transformers
- FAISS vector search
- Galaxy Zoo science catalogs
- Aladin Lite sky mapping
- AWS S3 + EC2 deployment
into one unified observatory pipeline.
- Upload a galaxy image
- Extract deep visual embeddings using a Vision Transformer
- Retrieve the most morphologically similar galaxies
- Attach astrophysical metadata (RA, Dec, class probabilities)
- Analyze the local neighborhood (bars, spirals, smoothness)
- Project results onto the real sky using Aladin
- Explore galactic structure as a continuous visual space
This is not just a classifier. It is a galaxy similarity engine + morphology analyzer + sky mapper.
ANTARIKSH operates in three major scientific phases:
Offline Universe Construction → Online Retrieval Engine → Analysis & Mapping Layer
This phase constructs the galactic manifold.
- Galaxy images stored in AWS S3
- Images streamed and cached locally
- Vision Transformer extracts 768-D embeddings
- FAISS IVF index trained
- Vector index + metadata persisted
- Galaxy Zoo 2 catalogs loaded into memory
Outputs:
- `faiss.index`
- `metadata.json`
- Galaxy catalog DataFrame
- Cached S3 samples
This builds the latent universe.
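To make the "FAISS IVF index trained" step concrete, here is a minimal pure-Python sketch of the inverted-file idea: coarse centroids are trained with k-means, each embedding's row id is filed under its nearest centroid, and a query scans only the closest cells. This is a toy stand-in, not the project's code. The helper names (`train_centroids`, `build_inverted_lists`, `ivf_search`), the 2-D toy vectors, and the parameter values are all assumptions chosen for illustration. In practice FAISS's `IndexIVFFlat` provides the same idea far more efficiently, with `nprobe` controlling how many cells are scanned.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_centroids(vectors, nlist, iters=10, seed=0):
    """Plain k-means: returns nlist coarse centroids (the 'training' step)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            nearest = min(range(nlist), key=lambda c: sq_dist(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cell ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_inverted_lists(vectors, centroids):
    """File each vector's row id under its nearest centroid."""
    lists = [[] for _ in centroids]
    for row_id, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(row_id)
    return lists

def ivf_search(query, vectors, centroids, lists, k=3, nprobe=1):
    """Scan only the nprobe closest cells; return (row_id, squared distance) pairs."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [row_id for c in order[:nprobe] for row_id in lists[c]]
    scored = sorted((sq_dist(query, vectors[r]), r) for r in candidates)
    return [(r, d) for d, r in scored[:k]]

# Toy 2-D "embeddings" (the real ones are 768-D): two well-separated clumps.
rng = random.Random(42)
vectors = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(20)]
vectors += [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(20)]

centroids = train_centroids(vectors, nlist=2)
lists = build_inverted_lists(vectors, centroids)
hits = ivf_search([5.0, 5.0], vectors, centroids, lists, k=3, nprobe=1)
```

The trade-off is the usual one for IVF: a small `nprobe` is fast but can miss neighbors that fall just across a cell boundary, while a larger value approaches an exhaustive scan.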
Triggered when a user uploads a galaxy.
- Image received by FastAPI
- Preprocessing + normalization
- ViT extracts feature vector
- FAISS searches nearest neighbors
- Similarity scores computed
- Top-K neighbors retrieved
- Galaxy Zoo metadata attached
Outputs:
- Similar galaxy set
- Morphological profiles
- Coordinates
- Structured scientific response
This is the manifold navigation stage.
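The retrieval steps above can be sketched as a brute-force cosine-similarity search followed by a metadata lookup. This is an illustrative stand-in for the FAISS query, not the project's implementation. The function names, the 3-D toy vectors, and the metadata fields (`objid`, `ra`, `dec`) are assumptions, and the source does not state which similarity metric the index actually uses.

```python
import heapq
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def top_k_similar(query, gallery, metadata, k=2):
    """Return the k gallery rows most similar to the query, with metadata attached."""
    q = l2_normalize(query)
    scored = []
    for row_id, vec in enumerate(gallery):
        g = l2_normalize(vec)
        score = sum(a * b for a, b in zip(q, g))  # cosine similarity
        scored.append((score, row_id))
    best = heapq.nlargest(k, scored)  # highest similarity first
    return [{"row_id": r, "similarity": s, **metadata[r]} for s, r in best]

# Toy gallery: row i of `gallery` corresponds to entry i of `metadata`.
gallery = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
metadata = [
    {"objid": "toy-0", "ra": 150.10, "dec": 2.20},
    {"objid": "toy-1", "ra": 150.30, "dec": 2.25},
    {"objid": "toy-2", "ra": 210.00, "dec": -1.00},
    {"objid": "toy-3", "ra": 30.50, "dec": 12.00},
]

results = top_k_similar([1.0, 0.0, 0.0], gallery, metadata, k=2)
```

Keeping the vector rows and the metadata list in the same order is what lets a bare row index returned by the search be turned into coordinates and catalog fields.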
Transforms retrieval into astronomy.
- Neighborhood statistics computed
  - Spiral probability
  - Bar fraction
  - Smoothness
  - Sky dispersion
- Class distribution visualized
- Scientific interpretation generated
- RA/Dec projected into Aladin Lite
- User explores the real sky context
This is where vision meets astrophysics.
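As a rough illustration of the neighborhood statistics, the sketch below summarizes a set of retrieved neighbors. The source does not define how each statistic is computed, so every definition here is an assumption: spiral probability and smoothness are taken as plain means, bar fraction as the share of neighbors whose bar probability exceeds a hypothetical 0.5 threshold, and sky dispersion as the RMS angular separation from the mean sky position. The field names `p_spiral`, `p_bar`, and `p_smooth` are likewise invented for the example.

```python
import math

def unit_vector(ra_deg, dec_deg):
    """Convert RA/Dec in degrees to a 3-D unit vector on the celestial sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def angular_separation_deg(u, v):
    """Angle between two unit vectors, in degrees."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))  # clamp rounding error
    return math.degrees(math.acos(dot))

def neighborhood_stats(neighbors, bar_threshold=0.5):
    """Summarize morphology and sky spread for a list of neighbor records."""
    n = len(neighbors)
    vectors = [unit_vector(g["ra"], g["dec"]) for g in neighbors]
    # Mean direction on the sphere (assumes the neighbors are not spread over
    # the whole sky, otherwise the mean vector can shrink to zero length).
    mean_vec = [sum(c) / n for c in zip(*vectors)]
    norm = math.sqrt(sum(c * c for c in mean_vec))
    centroid = [c / norm for c in mean_vec]
    seps = [angular_separation_deg(v, centroid) for v in vectors]
    return {
        "spiral_probability": sum(g["p_spiral"] for g in neighbors) / n,
        "bar_fraction": sum(g["p_bar"] > bar_threshold for g in neighbors) / n,
        "smoothness": sum(g["p_smooth"] for g in neighbors) / n,
        "sky_dispersion_deg": math.sqrt(sum(s * s for s in seps) / n),
    }

# Illustrative neighbor records, not real catalog values.
neighbors = [
    {"ra": 150.0, "dec": 2.0, "p_spiral": 0.8, "p_bar": 0.7, "p_smooth": 0.10},
    {"ra": 150.5, "dec": 2.5, "p_spiral": 0.6, "p_bar": 0.2, "p_smooth": 0.30},
    {"ra": 151.0, "dec": 3.0, "p_spiral": 0.9, "p_bar": 0.6, "p_smooth": 0.05},
    {"ra": 150.2, "dec": 2.8, "p_spiral": 0.7, "p_bar": 0.1, "p_smooth": 0.15},
]
stats = neighborhood_stats(neighbors)
```

Averaging unit vectors rather than raw RA values avoids the wrap-around problem at RA = 0°/360°.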
ANTARIKSH integrates:
- Galaxy Zoo 2
  - `gz2_hart16.csv.gz` → morphological vote fractions
  - `gz2_filename_mapping.csv` → image-catalog linkage
- Sloan Digital Sky Survey (SDSS)
- Aladin Lite Sky Atlas
Used for:
- Morphological statistics
- Neighborhood coherence
- Sky projection
- Physical interpretation
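The image-catalog linkage can be sketched as a join between the two Galaxy Zoo 2 files. The column names used here (`dr7objid` in the vote catalog; `objid` and `asset_id` in the filename mapping) and the `<asset_id>.jpg` naming are assumptions based on the commonly distributed versions of these files and should be checked against the actual headers. The rows below are made-up toy values, and `link_catalog_to_images` is a hypothetical helper.

```python
import csv
import io

# Toy stand-ins for the two files; values are illustrative, not real catalog rows.
HART16_CSV = """dr7objid,ra,dec
1001,150.10,2.20
1002,150.30,2.25
"""
MAPPING_CSV = """objid,sample,asset_id
1001,original,17
1002,original,42
1003,original,99
"""

def link_catalog_to_images(hart16_file, mapping_file):
    """Join catalog rows to image filenames via the shared object id."""
    asset_by_objid = {
        row["objid"]: row["asset_id"] for row in csv.DictReader(mapping_file)
    }
    linked = []
    for row in csv.DictReader(hart16_file):
        asset_id = asset_by_objid.get(row["dr7objid"])
        if asset_id is None:
            continue  # catalog entry with no image on record
        linked.append({
            "objid": row["dr7objid"],
            "image": f"{asset_id}.jpg",  # assumed naming scheme
            "ra": float(row["ra"]),
            "dec": float(row["dec"]),
        })
    return linked

linked = link_catalog_to_images(io.StringIO(HART16_CSV), io.StringIO(MAPPING_CSV))
```

For the real files, the same function could be fed handles from `gzip.open(path, "rt")` and `open(path)` instead of the in-memory strings.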
The table below summarizes total-vote and classification-count statistics for smooth and spiral galaxies in the antariksh_catalog:
| Statistic | Smooth Galaxies | Spiral Galaxies |
|---|---|---|
| Count | 4,558 | 6,641 |
| Mean (Total Votes) | 144.06 | 211.48 |
| Std (Total Votes) | 16.75 | 73.49 |
| Min (Total Votes) | 64 | 91 |
| 25% (Total Votes) | 133 | 151 |
| 50% (Total Votes) | 142 | 187 |
| 75% (Total Votes) | 153 | 261 |
| Max (Total Votes) | 265 | 539 |
| Mean (Classifications) | 44.28 | 42.09 |
| Std (Classifications) | 4.31 | 6.24 |
| Min (Classifications) | 23 | 23 |
| 25% (Classifications) | 42 | 38 |
| 50% (Classifications) | 44 | 42 |
| 75% (Classifications) | 46 | 46 |
| Max (Classifications) | 66 | 72 |
- Total Votes:
  - Spiral galaxies have a higher mean (211.48) and a wider range (91–539) of total votes than smooth galaxies (mean: 144.06, range: 64–265).
  - Their standard deviation is also more than four times larger (73.49 vs. 16.75), so the number of votes varies much more among spirals.
- Total Classifications:
  - Smooth galaxies have a slightly higher mean number of classifications (44.28) than spiral galaxies (42.09).
  - The ranges are similar: both types share a minimum of 23, and spiral galaxies have a slightly higher maximum (72 vs. 66).
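One way to compare variability across groups with different means is the coefficient of variation (standard deviation divided by mean). The short sketch below applies it to the total-vote figures from the table. The choice of this measure is an illustration, not something the original analysis states it used.

```python
# Mean and standard deviation of total votes, copied from the table.
total_votes = {
    "smooth": {"mean": 144.06, "std": 16.75},
    "spiral": {"mean": 211.48, "std": 73.49},
}

def coefficient_of_variation(mean, std):
    """Relative spread: standard deviation as a fraction of the mean."""
    return std / mean

cv = {name: coefficient_of_variation(**s) for name, s in total_votes.items()}
for name, value in cv.items():
    print(f"{name}: CV = {value:.2f}")
```

The relative spread comes out at roughly 0.12 for smooth galaxies and 0.35 for spirals, so the larger variability of spirals is not just a side effect of their higher mean.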
- S3
  - Raw galaxy images
  - Query uploads
  - Cached retrievals
- EC2
  - FastAPI backend
  - FAISS engine
  - Vision Transformer inference
  - NGINX reverse proxy
- Local caching
  - Query images
  - S3 galaxy mirrors
  - Retrieval thumbnails
This design allows:
- Lightweight EC2 storage
- Scalable image hosting
- Portable indexing
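The S3-plus-local-cache arrangement follows a cache-aside pattern: look on local disk first, and fetch from remote storage only on a miss. The sketch below shows that pattern with an injectable `fetch` callable so it runs offline. `get_cached_image`, `fake_fetch`, and the key `galaxies/17.jpg` are hypothetical names. In a real deployment the fetcher would presumably wrap boto3's S3 client `download_file(Bucket, Key, Filename)`, with a bucket name the source does not specify.

```python
import tempfile
from pathlib import Path

def get_cached_image(key, cache_dir, fetch):
    """Return a local path for `key`, calling `fetch(key, dest)` only on a cache miss."""
    local_path = Path(cache_dir) / key
    if local_path.exists():
        return local_path  # cache hit: no remote request
    local_path.parent.mkdir(parents=True, exist_ok=True)
    fetch(key, local_path)  # cache miss: download into the cache
    return local_path

# Stand-in fetcher that records each call and writes placeholder bytes.
downloads = []

def fake_fetch(key, dest):
    downloads.append(key)
    Path(dest).write_bytes(b"placeholder-image-bytes")

with tempfile.TemporaryDirectory() as cache_dir:
    first = get_cached_image("galaxies/17.jpg", cache_dir, fake_fetch)
    second = get_cached_image("galaxies/17.jpg", cache_dir, fake_fetch)
    same_path = first == second
```

The second request is served from disk, so `fake_fetch` is called only once. This is what keeps EC2 storage small while S3 remains the source of truth for images.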
- Open ANTARIKSH Observatory
- Upload a galaxy image
- System extracts deep features
- FAISS retrieves similar galaxies
- Celestial neighborhood appears
- Morphology distributions update
- Cluster behavior is summarized
- Galaxies plotted on Aladin sky
- User explores structure, context, and evolution
From pixels → to vectors → to galaxies → to the sky.
- Built a working CBIR engine for galaxy morphology
- Integrated Vision Transformers with FAISS at scale
- Constructed a visual manifold of ~14k galaxies
- Linked deep vision outputs with real astrophysical catalogs
- Implemented scientific neighborhood analytics
- Designed a multi-panel observatory interface
- Achieved end-to-end pipeline execution
- Successfully deployed on AWS EC2 + S3
- Connected AI retrieval with sky-level projection
No hype. Just real systems, real data, real pipelines.
Galaxy Zoo classes are imbalanced.
But ANTARIKSH is not trained as a classifier.
It is:
- embedding-based
- similarity-driven
- structure-oriented
Therefore:
- class imbalance cannot skew a decision boundary, because no classifier is trained
- dominant morphologies simply shape the manifold density
- rare morphologies still form distinct regions
In fact, this enables:
- discovery of rare outliers
- transitional galaxy forms
- morphology continua
ANTARIKSH/
│
├── backend/ → FastAPI routes & orchestration
├── src/
│ ├── retrieval.py
│ ├── processor.py
│ ├── galaxy_catalog.py
│ └── cloud/s3_manager.py
│
├── vector_db/ → FAISS + embeddings
├── cache/ → uploads & S3 mirrors
├── frontend/ → React Observatory UI
├── data/ → Galaxy Zoo catalogs
└── logs/
ANTARIKSH follows an experimental research-build cycle.
- `v1.0` → Initial CBIR prototype
- `v2.0` → ViT + FAISS integration
- `v3.0` → Galaxy Zoo catalog coupling
- `v4.0` → Scientific analytics layer
- `v5.0` → Observatory UI + mapping
Each release reflects:
- architectural refinement
- scientific depth
- pipeline stability
Branches track:
- feature research
- interface evolution
- backend scaling
ANTARIKSH is a foundation.
It can evolve into:
- anomaly discovery engine
- morphological evolution mapper
- unsupervised galaxy taxonomy
- multi-survey fusion platform
- astrophysical knowledge system
Not a demo. A direction.
- Galaxy Zoo Project
- Sloan Digital Sky Survey
- Aladin Sky Atlas
- FAISS by Meta AI
- Vision Transformers
- Citizen scientists who classified the cosmos
ANTARIKSH does not try to label the universe.
It tries to listen to its structure.
And that’s a far more powerful thing.




