The Decentralized AI Training Data Marketplace
DataForge is a revolutionary platform that transforms how AI training datasets are created, licensed, and distributed. Built on blockchain technology using the Origin SDK, DataForge enables creators to monetize their valuable training data while providing AI developers with access to high-quality, ethically-sourced datasets.
Submission for Camp Network Hackathon
Deployed URL: https://data-forge-seven.vercel.app
DataForge bridges the gap between data creators and AI developers by providing:
- Decentralized Dataset Marketplace: Browse, purchase, and license AI training datasets with transparent pricing
- NFT-Based Licensing: Each dataset is minted as an NFT with embedded licensing terms using Origin Protocol
- Creator Monetization: Data creators earn revenue through direct sales and usage-based licensing
- Quality Assurance: Community-driven rating and verification system for dataset quality
- Flexible Licensing: Multiple licensing models from one-time purchases to subscription-based access
- Analytics Dashboard: Track dataset performance, earnings, and market trends
- Easy Upload: Drag-and-drop interface for dataset uploads with metadata management
- Smart Licensing: Automated licensing terms with customizable usage rights
- Revenue Tracking: Real-time analytics on sales, downloads, and earnings
- Quality Metrics: Community feedback and rating system
- Curated Marketplace: Discover high-quality datasets across various domains
- Instant Access: Purchase and download datasets with blockchain-verified licenses
- Bulk Licensing: Enterprise-grade licensing for large-scale AI projects
- Origin SDK Integration: Leverages Origin Protocol for decentralized marketplace functionality
- NFT Minting: Each dataset becomes a tradeable NFT with embedded licensing
- Smart Contracts: Automated royalty distribution and license enforcement
- Transparent Transactions: All purchases and licenses recorded on-chain
- Frontend: Next.js 14, React, TypeScript, Tailwind CSS
- Backend: Node.js, MongoDB, Next.js API Routes
- Blockchain: Origin SDK, Base Camp network
- Authentication: NextAuth.js with Web3 wallet integration
- UI Components: Radix UI, Lucide Icons
For AI Companies & Researchers:
- Licensing Uncertainty: Most datasets on platforms like Kaggle lack clear commercial licensing terms
- Legal Risk: Using "free" datasets without proper licensing can lead to costly legal disputes
- Quality Issues: No standardized quality metrics or creator accountability
- Attribution Problems: Difficulty tracking data provenance and giving proper credit
For Data Creators:
- No Monetization: Valuable training data shared freely with no compensation
- Lack of Control: Once uploaded, creators lose control over how their data is used
- No Recognition: Creators rarely receive attribution for their contributions to AI breakthroughs
DataForge addresses these critical issues through blockchain-verified licensing and decentralized ownership:
- Clear Licensing: Every dataset has transparent, legally-binding license terms stored on-chain
- Creator Compensation: Automated royalty distribution ensures creators are fairly compensated
- Usage Tracking: Blockchain records provide complete audit trails for compliance
- Quality Assurance: Community-driven ratings and verification systems
DataForge leverages the Origin SDK for comprehensive blockchain functionality:
import { CampProvider, useAuth, useAuthState } from "@campnetwork/origin/react";
// Wrap app with Origin provider
<CampProvider clientId="your-client-id">
<App />
</CampProvider>
// Use authentication hooks
const auth = useAuth();
const { authenticated } = useAuthState();// Direct file minting with licensing terms
const tokenId = await auth.origin.mintFile(
file, // File object
{
name: "Dataset Name",
description: "Dataset description",
category: "computer-vision",
},
{
price: BigInt("1000000000000000000"), // 1 ETH in wei
duration: 86400, // 1 day in seconds
royaltyBps: 500, // 5% royalty
paymentToken: "0x0000000000000000000000000000000000000000", // ETH
},
);// Check user access to datasets
const hasAccess = await auth.origin.hasAccess(tokenId, userAddress);
// Purchase dataset access with automatic payment handling
const result = await auth.origin.buyAccessSmart(tokenId, periods);
// Download dataset content (only if user has access)
const data = await auth.origin.getData(tokenId);// Get license terms for any dataset
const terms = await auth.origin.getTerms(tokenId);
// Check subscription expiry
const expiry = await auth.origin.subscriptionExpiry(tokenId, userAddress);
// Renew existing access
const renewal = await auth.origin.renewAccess(tokenId, periods);// Verify ownership and metadata
const owner = await auth.origin.ownerOf(tokenId);
const contentHash = await auth.origin.contentHash(tokenId);
const tokenURI = await auth.origin.tokenURI(tokenId);
// Update license terms (owner only)
await auth.origin.updateTerms(tokenId, newLicenseTerms);- Decentralized File Storage: Files uploaded through Origin's infrastructure
- Smart Contract Integration: Automated licensing and payment processing
- Access Control: Blockchain-verified permissions for dataset downloads
- Royalty Distribution: Automatic creator compensation on each sale
- Node.js 18+
- MongoDB instance
- Web3 wallet (MetaMask recommended)
- Origin SDK API keys
# Clone the repository
git clone https://github.com/a-singh09/dataforge.git
cd dataforge
# Install dependencies
npm install
# Copy environment variables
cp .env.example .env.local# Database
MONGODB_URI=your_mongodb_connection_string
# Origin SDK
ORIGIN_API_KEY=your_origin_api_key
ORIGIN_MARKETPLACE_ADDRESS=your_marketplace_contract_address
ORIGIN_NETWORK=mainnet# Seed the database with initial data
npm run seed# Start the development server
npm run devVisit http://localhost:3000 to see DataForge in action!
- Upload: Upload your training dataset with comprehensive metadata
- License: Define licensing terms (commercial use, attribution, etc.)
- Mint: Dataset is minted as an NFT on the blockchain via Origin SDK
- List: Dataset appears in the marketplace for discovery
- Earn: Receive payments automatically through smart contracts
- Browse: Explore datasets by category, quality rating, and price
- Preview: Access sample data and metadata before purchase
- Purchase: Buy dataset licenses using cryptocurrency
- Download: Instant access to full dataset with verified licensing
- Blockchain Verification: All transactions and licenses verified on-chain
- IPFS Storage: Decentralized storage prevents data loss or censorship
- Smart Contract Escrow: Secure transactions with automated dispute resolution
- Community Moderation: User-driven quality control and reporting system
DataForge addresses critical challenges in the AI industry:
- Data Scarcity: Provides access to diverse, high-quality training datasets
- Creator Rights: Ensures fair compensation for data creators
- Licensing Clarity: Transparent, blockchain-verified usage rights
- Quality Assurance: Community-driven quality control mechanisms
- Automatic Renew Mechanism: Licenses renew automatically without manually making transactions.
- AI Model Integration: Direct API access for seamless model training
- Collaborative Datasets: Tools for community-driven dataset creation, giving each member a royalty share for his work.
- Advanced Analytics: ML-powered dataset recommendation engine
- Live Demo: https://data-forge-seven.vercel.app
Built with β€οΈ for the Camp Network Hackathon
Empowering the future of AI through decentralized data markets