Deepmalya2506/ANTARIKSH


A N T A R I K S H

Deep Neural Manifold Mapping for Galactic Structures

Mapping galactic morphology through high-dimensional vision, similarity search, and sky-projection.

ANTARIKSH is a research-oriented astrophysical system that explores galaxy morphology through Content-Based Image Retrieval (CBIR). Instead of classifying galaxies into rigid labels, ANTARIKSH learns a visual manifold of galaxies and allows users to navigate the universe by similarity, structure, and neighborhood behavior.

It blends:

  • Vision Transformers
  • FAISS vector search
  • Galaxy Zoo science catalogs
  • Aladin Lite sky mapping
  • AWS S3 + EC2 deployment

into one unified observatory pipeline.


What ANTARIKSH Does

  • Upload a galaxy image
  • Extract deep visual embeddings using a Vision Transformer
  • Retrieve the most morphologically similar galaxies
  • Attach astrophysical metadata (RA, Dec, class probabilities)
  • Analyze the local neighborhood (bars, spirals, smoothness)
  • Project results onto the real sky using Aladin
  • Explore galactic structure as a continuous visual space

This is not just a classifier. It is a galaxy similarity engine + morphology analyzer + sky mapper.


System Overview

ANTARIKSH operates in three major scientific phases:

Offline Universe Construction  →  Online Retrieval Engine  →  Analysis & Mapping Layer

Workflow Architecture

🌐 System Pipeline


🔹 1. Offline Processing & Indexing

This phase constructs the galactic manifold.

  • Galaxy images stored in AWS S3
  • Images streamed and cached locally
  • Vision Transformer extracts 768-D embeddings
  • FAISS IVF index trained
  • Vector index + metadata persisted
  • Galaxy Zoo 2 catalogs loaded into memory

Outputs:

  • faiss.index
  • metadata.json
  • Galaxy catalog DataFrame
  • Cached S3 samples

This builds the latent universe.


🔹 2. Image Retrieval Phase (Online)


Triggered when a user uploads a galaxy.

  • Image received by FastAPI
  • Preprocessing + normalization
  • ViT extracts feature vector
  • FAISS searches nearest neighbors
  • Similarity scores computed
  • Top-K neighbors retrieved
  • Galaxy Zoo metadata attached

Outputs:

  • Similar galaxy set
  • Morphological profiles
  • Coordinates
  • Structured scientific response

This is the manifold navigation stage.
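
The search and metadata steps above amount to the following, shown here as an exact brute-force search in plain Python rather than a FAISS call. The galaxy IDs, the tiny 3-D vectors, and the metadata fields are made up for illustration; a real query would use the 768-D ViT embedding.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    metadata: Dict[str, dict],
    k: int = 3,
) -> List[dict]:
    """Score every gallery vector, keep the k best, attach catalog metadata."""
    scored = sorted(
        ((cosine_similarity(query, vec), gid) for gid, vec in gallery.items()),
        reverse=True,
    )
    return [
        {"galaxy_id": gid, "similarity": score, **metadata.get(gid, {})}
        for score, gid in scored[:k]
    ]


# Hypothetical toy data standing in for the index and the Galaxy Zoo catalog.
gallery = {
    "gal_a": [1.0, 0.0, 0.0],
    "gal_b": [0.8, 0.6, 0.0],
    "gal_c": [0.0, 1.0, 0.0],
}
metadata = {
    "gal_a": {"ra": 10.5, "dec": -1.2},
    "gal_b": {"ra": 10.7, "dec": -1.1},
    "gal_c": {"ra": 185.0, "dec": 12.4},
}

results = retrieve_top_k([1.0, 0.05, 0.0], gallery, metadata, k=2)
```

FAISS replaces the full scan with an approximate search over the index, but the shape of the response (ranked neighbors with scores and coordinates) stays the same.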


🔹 3. Analysis & Mapping Phase


Transforms retrieval into astronomy.

  • Neighborhood statistics computed

    • Spiral probability
    • Bar fraction
    • Smoothness
    • Sky dispersion
  • Class distribution visualized

  • Scientific interpretation generated

  • RA/Dec projected into Aladin Lite

  • User explores the real sky context

This is where vision meets astrophysics.
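
One way the neighborhood statistics listed above could be computed is sketched below. The field names (`p_spiral`, `p_bar`, `p_smooth`), the 0.5 bar threshold, and the definition of sky dispersion as the RMS angular separation from the mean direction are all assumptions for illustration; the project may define them differently.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence, Tuple

BAR_THRESHOLD = 0.5  # assumed cut for counting a neighbor as barred


def _unit_vector(ra_deg: float, dec_deg: float) -> Tuple[float, float, float]:
    """Convert sky coordinates in degrees to a 3-D unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def sky_dispersion_deg(coords: Sequence[Tuple[float, float]]) -> float:
    """RMS angular separation (degrees) of the points from their mean direction."""
    vectors = [_unit_vector(ra, dec) for ra, dec in coords]
    mean = [fmean(axis) for axis in zip(*vectors)]
    norm = math.sqrt(sum(c * c for c in mean))
    if norm == 0.0:
        return float("nan")  # directions cancel out; no meaningful center
    center = [c / norm for c in mean]
    separations = []
    for vec in vectors:
        cos_sep = sum(a * b for a, b in zip(vec, center))
        separations.append(math.acos(max(-1.0, min(1.0, cos_sep))))
    return math.degrees(math.sqrt(fmean(s * s for s in separations)))


def neighborhood_summary(neighbors: List[Dict[str, float]]) -> Dict[str, float]:
    """Aggregate morphology probabilities and sky spread over retrieved neighbors."""
    return {
        "mean_p_spiral": fmean(n["p_spiral"] for n in neighbors),
        "bar_fraction": sum(n["p_bar"] > BAR_THRESHOLD for n in neighbors) / len(neighbors),
        "mean_p_smooth": fmean(n["p_smooth"] for n in neighbors),
        "sky_dispersion_deg": sky_dispersion_deg([(n["ra"], n["dec"]) for n in neighbors]),
    }


# Two hypothetical neighbors, 2 degrees apart along the celestial equator.
neighbors = [
    {"p_spiral": 0.8, "p_bar": 0.6, "p_smooth": 0.1, "ra": 0.0, "dec": 0.0},
    {"p_spiral": 0.6, "p_bar": 0.2, "p_smooth": 0.3, "ra": 2.0, "dec": 0.0},
]
summary = neighborhood_summary(neighbors)
```

Averaging unit vectors instead of raw RA/Dec values avoids the wrap-around at RA = 0°/360° and the distortion near the poles.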



Scientific Data Sources

ANTARIKSH integrates:

  • Galaxy Zoo 2

    • gz2_hart16.csv.gz → morphological vote fractions
    • gz2_filename_mapping.csv → image-catalog linkage
  • Sloan Digital Sky Survey (SDSS)

  • Aladin Lite Sky Atlas

Used for:

  • Morphological statistics
  • Neighborhood coherence
  • Sky projection
  • Physical interpretation
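
The two Galaxy Zoo 2 files listed above can be linked with a simple join on the object ID. The sketch below uses only the standard library; the column names (`dr7objid`, `objid`, `asset_id`, `ra`, `dec`, `gz2_class`, `total_votes`) are assumptions about the file layout and should be checked against the actual headers. In-memory text stands in for the real files.

```python
import csv
import io
from typing import Dict, TextIO


def load_catalog(hart_file: TextIO, mapping_file: TextIO) -> Dict[str, dict]:
    """Return {asset_id: catalog entry} by joining the two tables on object ID."""
    by_objid = {row["dr7objid"]: row for row in csv.DictReader(hart_file)}
    catalog: Dict[str, dict] = {}
    for row in csv.DictReader(mapping_file):
        entry = by_objid.get(row["objid"])
        if entry is None:
            continue  # image has no catalog row; skip it
        catalog[row["asset_id"]] = {
            "ra": float(entry["ra"]),
            "dec": float(entry["dec"]),
            "gz2_class": entry["gz2_class"],
            "total_votes": int(entry["total_votes"]),
        }
    return catalog


# Tiny made-up samples in the assumed layout. With the real files, open them as
# gzip.open("gz2_hart16.csv.gz", "rt", newline="") and
# open("gz2_filename_mapping.csv", newline="").
hart_text = io.StringIO(
    "dr7objid,ra,dec,gz2_class,total_votes\n"
    "1001,160.99,11.70,Sc,220\n"
    "1002,192.41,15.16,Er,140\n"
)
mapping_text = io.StringIO(
    "objid,sample,asset_id\n"
    "1001,original,16\n"
    "1002,original,23\n"
    "9999,original,42\n"
)

catalog = load_catalog(hart_text, mapping_text)
```

Keying the result by asset ID lets the retrieval layer go from an image filename straight to coordinates and morphology.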

Here is a table summarizing the statistics for Smooth Galaxies and Spiral Galaxies based on the antariksh_catalog:

Statistic                 Smooth Galaxies   Spiral Galaxies
Count                     4,558             6,641
Mean (Total Votes)        144.06            211.48
Std (Total Votes)         16.75             73.49
Min (Total Votes)         64                91
25% (Total Votes)         133               151
50% (Total Votes)         142               187
75% (Total Votes)         153               261
Max (Total Votes)         265               539
Mean (Classifications)    44.28             42.09
Std (Classifications)     4.31              6.24
Min (Classifications)     23                23
25% (Classifications)     42                38
50% (Classifications)     44                42
75% (Classifications)     46                46
Max (Classifications)     66                72

Observations:

  1. Total Votes:

    • Spiral galaxies have a higher mean (211.48) and a wider range (91–539) of total votes than smooth galaxies (mean: 144.06, range: 64–265).
    • Their standard deviation is also more than four times larger (73.49 vs. 16.75), so the number of votes per galaxy varies far more for spirals.
  2. Total Classifications:

    • Smooth galaxies have a slightly higher mean number of classifications (44.28) than spiral galaxies (42.09).
    • The ranges are similar: both start at 23, and spiral galaxies have a slightly higher maximum (72 vs. 66).
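
To compare the spread of total votes on a common scale, the coefficient of variation (standard deviation divided by mean) can be computed directly from the table values. This is a small worked example, not part of the project's code.

```python
# Mean and standard deviation of total votes, copied from the table above.
vote_stats = {
    "smooth": {"mean": 144.06, "std": 16.75},
    "spiral": {"mean": 211.48, "std": 73.49},
}


def coefficient_of_variation(mean: float, std: float) -> float:
    """Relative spread: standard deviation as a fraction of the mean."""
    return std / mean


cv = {name: coefficient_of_variation(**s) for name, s in vote_stats.items()}
for name, value in cv.items():
    print(f"{name}: CV = {value:.3f}")
```

The ratio comes out near 0.12 for smooth galaxies and near 0.35 for spirals, so the larger spread for spirals is not just a side effect of their higher mean.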

Cloud & Deployment

AWS Infrastructure

  • S3

    • Raw galaxy images
    • Query uploads
    • Cached retrievals
  • EC2

    • FastAPI backend
    • FAISS engine
    • Vision Transformer inference
    • NGINX reverse proxy
  • Local caching

    • Query images
    • S3 galaxy mirrors
    • Retrieval thumbnails

This design allows:

  • Lightweight EC2 storage
  • Scalable image hosting
  • Portable indexing
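
The local caching layer described above follows a cache-aside pattern: look on disk first, and download only on a miss. The sketch below keeps the download behind a hypothetical `fetch` callable so it runs without AWS credentials; in a real deployment that callable would wrap an S3 download.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_path(key: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return a local path for `key`, downloading it only if not cached yet."""
    local = cache_dir / key
    if not local.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(fetch(key))  # cache miss: pull the object once
    return local


# Stand-in for the S3 download, recording which keys were actually fetched.
fetched_keys: List[str] = []


def fake_fetch(key: str) -> bytes:
    fetched_keys.append(key)
    return b"fake-jpeg-bytes"


cache_dir = Path(tempfile.mkdtemp())
first = cached_path("images/gal_a.jpg", cache_dir, fake_fetch)
second = cached_path("images/gal_a.jpg", cache_dir, fake_fetch)  # served from disk
```

Because images are pulled on demand, the EC2 instance only needs disk space for the galaxies that are actually queried or retrieved.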

User Experience Flow

  1. Open ANTARIKSH Observatory
  2. Upload a galaxy image
  3. System extracts deep features
  4. FAISS retrieves similar galaxies
  5. Celestial neighborhood appears
  6. Morphology distributions update
  7. Cluster behavior is summarized
  8. Galaxies plotted on Aladin sky
  9. User explores structure, context, and evolution

From pixels → to vectors → to galaxies → to the sky.


Achievements

  • Built a working CBIR engine for galaxy morphology
  • Integrated Vision Transformers with FAISS at scale
  • Constructed a visual manifold of ~14k galaxies
  • Linked deep vision outputs with real astrophysical catalogs
  • Implemented scientific neighborhood analytics
  • Designed a multi-panel observatory interface
  • Achieved end-to-end pipeline execution
  • Successfully deployed on AWS EC2 + S3
  • Connected AI retrieval with sky-level projection

No hype. Just real systems, real data, real pipelines.


About Class Imbalance

Galaxy Zoo classes are imbalanced.

But ANTARIKSH is not trained as a classifier.

It is:

  • embedding-based
  • similarity-driven
  • structure-oriented

Therefore:

  • class imbalance does not bias retrieval
  • dominant morphologies simply shape the manifold density
  • rare morphologies still form distinct regions

In fact, this enables:

  • discovery of rare outliers
  • transitional galaxy forms
  • morphology continua
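
One simple way to surface rare outliers in an embedding space is to score each galaxy by its mean distance to its k nearest neighbors: points in sparse regions score high. This is an illustrative technique, not something the project is confirmed to implement, and the IDs and 2-D vectors below are made up.

```python
import math
from typing import Dict, Sequence


def knn_outlier_scores(
    embeddings: Dict[str, Sequence[float]], k: int = 2
) -> Dict[str, float]:
    """Mean Euclidean distance from each vector to its k nearest neighbors."""
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings to compare")
    scores: Dict[str, float] = {}
    for gid, vec in embeddings.items():
        distances = sorted(
            math.dist(vec, other)
            for other_id, other in embeddings.items()
            if other_id != gid
        )
        nearest = distances[:k]
        scores[gid] = sum(nearest) / len(nearest)
    return scores


# Three tightly clustered toy embeddings and one isolated point.
toy_embeddings = {
    "gal_a": [0.0, 0.0],
    "gal_b": [0.1, 0.0],
    "gal_c": [0.0, 0.1],
    "odd_one": [5.0, 5.0],
}
scores = knn_outlier_scores(toy_embeddings, k=2)
most_isolated = max(scores, key=scores.get)
```

On a real index the same score can be read off the distances returned by a nearest-neighbor search, without the quadratic scan used here.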

Project Structure

ANTARIKSH/
│
├── backend/        → FastAPI routes & orchestration
├── src/
│   ├── retrieval.py
│   ├── processor.py
│   ├── galaxy_catalog.py
│   └── cloud/s3_manager.py
│
├── vector_db/      → FAISS + embeddings
├── cache/          → uploads & S3 mirrors
├── frontend/       → React Observatory UI
├── data/           → Galaxy Zoo catalogs
└── logs/

Releases & Evolution

ANTARIKSH follows an experimental research-build cycle.

  • v1.0 → Initial CBIR prototype
  • v2.0 → ViT + FAISS integration
  • v3.0 → Galaxy Zoo catalog coupling
  • v4.0 → Scientific analytics layer
  • v5.0 → Observatory UI + mapping

Each release reflects:

  • architectural refinement
  • scientific depth
  • pipeline stability

Branches track:

  • feature research
  • interface evolution
  • backend scaling

Vision

ANTARIKSH is a foundation.

It can evolve into:

  • anomaly discovery engine
  • morphological evolution mapper
  • unsupervised galaxy taxonomy
  • multi-survey fusion platform
  • astrophysical knowledge system

Not a demo. A direction.


Credits & Inspiration

  • Galaxy Zoo Project
  • Sloan Digital Sky Survey
  • Aladin Sky Atlas
  • FAISS by Meta AI
  • Vision Transformers
  • Citizen scientists who classified the cosmos

Closing Note

ANTARIKSH does not try to label the universe.

It tries to listen to its structure.

And that’s a far more powerful thing.