
Advanced Bot Detection Heuristics #209

Draft: wants to merge 12 commits into main
7 changes: 7 additions & 0 deletions .playground/nuxt.config.ts
@@ -14,6 +14,13 @@ export default defineNuxtConfig({
*/
defineNuxtModule({
  setup(_, nuxt) {
    nuxt.hooks.hook('robots:config', (config) => {
      const catchAll = config.groups.find(g => g.userAgent.includes('*'))
      if (catchAll) {
        catchAll.disallow.push('/__link-checker__/')
      }
      console.log({ catchAll, groups: config.groups })
    })
    if (!nuxt.options.dev)
      return

9 changes: 9 additions & 0 deletions .playground/pages/index.vue
@@ -1,9 +1,18 @@
<script lang="ts" setup>
import { useBotDetection } from '#robots/app/composables/useBotDetection'

const bot = useBotDetection()
</script>

<template>
  <div>
    <div>
      <NuxtLink to="/secret">
        Secret page - not crawlable
      </NuxtLink>
      <div>
        Is Bot: {{ bot }}
      </div>
    </div>
  </div>
</template>
61 changes: 61 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,61 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Commands

- **Build**: `pnpm build` - Builds the module using nuxt-module-build and generates the devtools client
- **Development**: `pnpm dev` - Runs playground at `.playground` directory
- **Development Preparation**: `pnpm dev:prepare` - Prepares development environment with stub build
- **Test**: `pnpm test` - Runs vitest test suite
- **Lint**: `pnpm lint` - Runs ESLint with auto-fix using @antfu/eslint-config
- **Type Check**: `pnpm typecheck` - Runs TypeScript compiler for type checking
- **Client Development**: `pnpm client:dev` - Runs devtools UI client on port 3300
- **Release**: `pnpm release` - Builds, bumps version, and publishes

## Architecture Overview

This is a Nuxt module (`@nuxtjs/robots`) that provides robots.txt generation and robot meta tag functionality for Nuxt applications.
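At its core, robots.txt generation turns grouped rules into text records. The sketch below illustrates that step under assumptions: `RobotsGroup` and `renderRobotsTxt` are hypothetical names, not the module's implementation in `src/runtime/server/routes/robots-txt.ts`, and the `allow` field is assumed alongside the `userAgent`/`disallow` arrays that the playground hook reads from `config.groups`.

```typescript
// Hypothetical group shape: `userAgent` and `disallow` mirror the arrays the
// playground hook reads from `config.groups`; `allow` is an assumption.
interface RobotsGroup {
  userAgent: string[]
  allow: string[]
  disallow: string[]
}

// Render each group as a robots.txt record: User-agent lines first, then
// Allow and Disallow rules. Records are separated by a blank line.
function renderRobotsTxt(groups: RobotsGroup[]): string {
  const records = groups.map((group) => {
    const lines = group.userAgent.map(ua => `User-agent: ${ua}`)
    for (const path of group.allow)
      lines.push(`Allow: ${path}`)
    for (const path of group.disallow)
      lines.push(`Disallow: ${path}`)
    return lines.join('\n')
  })
  return `${records.join('\n\n')}\n`
}

console.log(renderRobotsTxt([
  { userAgent: ['*'], allow: [], disallow: ['/secret'] },
]))
```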

### Core Module Structure

- **`src/module.ts`**: Main module entry point with module options and setup logic
- **`src/runtime/`**: Runtime code that gets injected into user applications
  - **`app/`**: Client-side runtime (composables, plugins)
  - **`server/`**: Server-side runtime (middleware, routes, composables)
- **`src/kit.ts`**: Utilities for build-time module functionality
- **`src/util.ts`**: Shared utilities exported to end users

### Key Runtime Components

- **Server Routes**:
  - `/robots.txt` route handler in `src/runtime/server/routes/robots-txt.ts`
  - Debug routes under `/__robots__/` for development
- **Server Composables**: `getSiteRobotConfig()` and `getPathRobotConfig()` for runtime robot configuration
- **Client Composables**: `useRobotsRule()` for accessing robot rules in Vue components
- **Meta Plugin**: Automatically injects robot meta tags and X-Robots-Tag headers
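The meta tag and the X-Robots-Tag header carry the same comma-separated directive list, so one serializer can feed both. A minimal sketch assuming a simplified boolean rule shape; `RobotsRule` and `toRobotsDirective` are illustrative names, not exports of the module.

```typescript
// Hypothetical, simplified rule shape; the module's real rule type may differ.
interface RobotsRule {
  index: boolean
  follow: boolean
}

// Serialize a rule into the comma-separated directive list that is valid both as
// <meta name="robots" content="..."> and as an X-Robots-Tag header value.
function toRobotsDirective(rule: RobotsRule): string {
  return [
    rule.index ? 'index' : 'noindex',
    rule.follow ? 'follow' : 'nofollow',
  ].join(', ')
}

// For a page that should not be indexed but whose links may be followed:
//   <meta name="robots" content="noindex, follow">
//   X-Robots-Tag: noindex, follow
console.log(toRobotsDirective({ index: false, follow: true }))
```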

### Build System

- Uses `@nuxt/module-builder` with unbuild configuration in `build.config.ts`
- Exports multiple entry points: main module, `/util`, and `/content`
- Supports both ESM and CommonJS via rollup configuration

### Test Structure

- **Integration Tests**: Test fixtures in `test/fixtures/` with full Nuxt apps
- **Unit Tests**: Focused tests in `test/unit/` for specific functionality
- Uses `@nuxt/test-utils` for testing Nuxt applications
- Test environment automatically set to production mode

### Development Workflow

The module supports a playground at `.playground` for local development and manual testing. The client UI (devtools integration) is developed separately in the `client/` directory.

### I18n Integration

The module has special handling for i18n scenarios, with logic in `src/i18n.ts` for splitting paths and handling localized routes.
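Handling localized routes typically means recognizing a locale prefix so that a rule written for `/secret` also applies to `/fr/secret`. A minimal sketch assuming prefix-based locale routing; `splitLocalePath` and `localizePath` are hypothetical helpers, not the functions in `src/i18n.ts`.

```typescript
// Hypothetical helper: split a leading locale segment off a path,
// assuming prefix-based routing such as /fr/about.
function splitLocalePath(path: string, locales: string[]): { locale: string | null, path: string } {
  const [, first = '', ...rest] = path.split('/')
  if (locales.includes(first))
    return { locale: first, path: `/${rest.join('/')}` }
  return { locale: null, path }
}

// Expand a rule path so it also covers each locale-prefixed variant.
function localizePath(path: string, locales: string[]): string[] {
  return [path, ...locales.map(locale => `/${locale}${path}`)]
}

console.log(splitLocalePath('/fr/secret', ['en', 'fr'])) // { locale: 'fr', path: '/secret' }
console.log(localizePath('/secret', ['en', 'fr'])) // [ '/secret', '/en/secret', '/fr/secret' ]
```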

### Content Integration

Provides integration with the Nuxt Content module via `src/content.ts` for content-based robot configurations.
File renamed without changes.
@@ -1,10 +1,8 @@
---
title: Nuxt Hooks
title: "Hook: robots:config"
description: Learn how to use Nuxt hooks to modify the robots config.
---

## `'robots:config'`{lang="ts"}

**Type:** `(config: ResolvedModuleOptions) => void | Promise<void>`{lang="ts"}

This hook allows you to modify the robots config before it is used to generate the robots.txt and meta tags.
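The playground uses this hook to add a `Disallow` rule to the catch-all group. The mutation itself can be sketched without Nuxt; this assumes the resolved config exposes `groups` with `userAgent` and `disallow` arrays as the playground code implies, and `RobotsConfigLike` and `disallowForAll` are hypothetical stand-ins rather than the module's types.

```typescript
// Minimal stand-in for the part of ResolvedModuleOptions this sketch touches
// (assumption: the real type has more fields and may name things differently).
interface RobotsGroup {
  userAgent: string[]
  disallow: string[]
}
interface RobotsConfigLike {
  groups: RobotsGroup[]
}

// Body of a 'robots:config' handler: disallow `path` for every crawler by adding
// it to the catch-all (User-agent: *) group, creating that group if it is missing.
function disallowForAll(config: RobotsConfigLike, path: string): void {
  let catchAll = config.groups.find(g => g.userAgent.includes('*'))
  if (!catchAll) {
    catchAll = { userAgent: ['*'], disallow: [] }
    config.groups.push(catchAll)
  }
  if (!catchAll.disallow.includes(path))
    catchAll.disallow.push(path)
}

const config: RobotsConfigLike = { groups: [{ userAgent: ['*'], disallow: ['/secret'] }] }
disallowForAll(config, '/__link-checker__/')
console.log(config.groups[0].disallow) // [ '/secret', '/__link-checker__/' ]
```

Inside a module it would be wired up as `nuxt.hooks.hook('robots:config', config => disallowForAll(config, '/__link-checker__/'))`. Unlike the playground code, the sketch creates the catch-all group when none exists and skips duplicate paths, so running the hook twice has no extra effect.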
8 changes: 8 additions & 0 deletions libs/is-bot/.gitignore
@@ -0,0 +1,8 @@
node_modules/
dist/
*.log
.DS_Store
coverage/
.nyc_output/
*.tgz
*.tar.gz
162 changes: 162 additions & 0 deletions libs/is-bot/README.md
@@ -0,0 +1,162 @@
# Bot Detection Library

A framework-agnostic bot detection library with advanced behavioral analysis capabilities.

## Features

- 🤖 **Advanced Bot Detection**: Multi-layered analysis including user agents, behavioral patterns, and timing analysis
- 🔧 **Framework Agnostic**: Works with any web framework through a driver pattern
- 🚀 **H3/Nuxt Ready**: Built-in support for H3 events and Nuxt applications
- 📊 **Behavioral Analysis**: Modular system with simple, intermediate, and advanced detection behaviors
- 💾 **Flexible Storage**: Supports multiple storage backends through an adapter pattern
- 🎯 **High Performance**: Optimized with batch operations and intelligent caching
- 🛡️ **Security Focused**: IP allowlists/blocklists, rate limiting, and threat detection

## Installation

```bash
npm install @nuxtjs/robots-bot-detection
```

## Quick Start

### Basic Usage

```typescript
import { BotDetectionEngine, MemoryAdapter, H3SessionIdentifier } from '@nuxtjs/robots-bot-detection'

// Create storage adapter
const storage = new MemoryAdapter()

// Create session identifier
const sessionIdentifier = new H3SessionIdentifier()

// Create engine
const engine = new BotDetectionEngine({
  storage,
  sessionIdentifier,
  config: {
    thresholds: {
      likelyBot: 70,
      definitelyBot: 90
    }
  }
})

// Analyze a request
const request = {
  path: '/api/data',
  method: 'GET',
  headers: {
    'user-agent': 'Mozilla/5.0 ...'
  },
  ip: '192.168.1.1',
  timestamp: Date.now()
}

const result = await engine.analyze(request)
console.log(`Bot score: ${result.score}`)
console.log(`Is bot: ${result.isBot}`)
```

### H3/Nuxt Integration

```typescript
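import { BotDetectionEngine, UnstorageBehaviorAdapter, H3SessionIdentifier } from '@nuxtjs/robots-bot-detection'
import { createError, defineEventHandler, getRequestHeaders, getRequestIP, getRequestURL } from 'h3'
import { createStorage } from 'unstorage'
import redisDriver from 'unstorage/drivers/redis'

// unstorage has no `useStorage` export; create a Redis-backed instance instead
// (inside Nitro, the auto-imported `useStorage()` returns a storage instance too)
const storage = createStorage({
  driver: redisDriver({ url: 'redis://localhost:6379' })
})
const adapter = new UnstorageBehaviorAdapter(storage)
const sessionIdentifier = new H3SessionIdentifier('your-session-secret')

const engine = new BotDetectionEngine({
  storage: adapter,
  sessionIdentifier
})

// In your H3 handler
export default defineEventHandler(async (event) => {
  // Build the request object from the incoming H3 event
  const request = {
    path: getRequestURL(event).pathname,
    method: event.method,
    headers: getRequestHeaders(event),
    ip: getRequestIP(event, { xForwardedFor: true }),
    timestamp: Date.now()
  }

  const result = await engine.analyze(request, event)

  if (result.isBot) {
    throw createError({
      statusCode: 429,
      statusMessage: 'Too Many Requests'
    })
  }

  // Continue with normal processing
})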
```

## API Reference

### BotDetectionEngine

The main engine class for bot detection.

#### Constructor Options

```typescript
interface BotDetectionEngineOptions {
  storage: BehaviorStorage
  sessionIdentifier: SessionIdentifier
  responseStatusProvider?: ResponseStatusProvider
  config?: BotDetectionConfig
}
```

#### Methods

- `analyze(request: BotDetectionRequest, event?: H3Event): Promise<BotDetectionResponse>`
- `updateConfig(config: Partial<BotDetectionConfig>): void`
- `cleanup(): Promise<void>`
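The `thresholds` in the Quick Start (`likelyBot: 70`, `definitelyBot: 90`) suggest that `analyze()` compares a numeric score against two cutoffs. The README does not say whether the comparisons are inclusive or how `isBot` is derived, so this is a sketch assuming inclusive (`>=`) checks on a 0-100 scale; `Thresholds`, `Verdict` and `classify` are hypothetical names.

```typescript
interface Thresholds {
  likelyBot: number
  definitelyBot: number
}

type Verdict = 'human' | 'likely-bot' | 'definitely-bot'

// Assumed semantics: a 0-100 score compared inclusively against each cutoff,
// checking the stricter threshold first.
function classify(score: number, thresholds: Thresholds = { likelyBot: 70, definitelyBot: 90 }): Verdict {
  if (score >= thresholds.definitelyBot)
    return 'definitely-bot'
  if (score >= thresholds.likelyBot)
    return 'likely-bot'
  return 'human'
}

for (const score of [42, 70, 95])
  console.log(score, classify(score))
```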

### Storage Adapters

#### MemoryAdapter
In-memory storage for development and testing.

#### UnstorageBehaviorAdapter
Production-ready storage adapter using unstorage.

### Behavior Configuration

Configure which detection behaviors to enable:

```typescript
const config = {
  behaviors: {
    simple: {
      pathAnalysis: { enabled: true, weight: 1.0 },
      basicTiming: { enabled: true, weight: 0.8 },
      basicRateLimit: { enabled: true, weight: 1.2 }
    },
    intermediate: {
      burstDetection: { enabled: true, weight: 1.0 },
      headerConsistency: { enabled: true, weight: 0.9 }
    },
    advanced: {
      advancedTiming: { enabled: false, weight: 1.5 },
      browserFingerprint: { enabled: false, weight: 1.3 }
    }
  }
}
```
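The `weight` values imply that each enabled behavior contributes a partial score that is then combined. The README does not specify the aggregation, so this is one plausible sketch, a weighted mean over enabled behaviors; `BehaviorSetting` and `combineScores` are hypothetical, and the engine may use a different rule (for example a weighted sum capped at 100).

```typescript
interface BehaviorSetting {
  enabled: boolean
  weight: number
}

// Weighted mean of per-behavior scores (each assumed to be 0-100). Disabled behaviors
// and behaviors without a score are skipped; returns 0 when nothing contributed.
function combineScores(settings: Record<string, BehaviorSetting>, scores: Record<string, number>): number {
  let weightedSum = 0
  let totalWeight = 0
  for (const [name, { enabled, weight }] of Object.entries(settings)) {
    const score = scores[name]
    if (!enabled || score === undefined)
      continue
    weightedSum += score * weight
    totalWeight += weight
  }
  return totalWeight === 0 ? 0 : weightedSum / totalWeight
}

// A few entries taken from the tiers in the config above, flattened into one map.
const settings: Record<string, BehaviorSetting> = {
  pathAnalysis: { enabled: true, weight: 1.0 },
  basicTiming: { enabled: true, weight: 0.8 },
  advancedTiming: { enabled: false, weight: 1.5 },
}
console.log(combineScores(settings, { pathAnalysis: 50, basicTiming: 100, advancedTiming: 100 }))
```

A weighted mean keeps the combined score on the same 0-100 scale as the thresholds, regardless of how many behaviors are enabled.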

## Testing

```bash
# Run tests in watch mode
npm test

# Run tests once
npm run test:run

# Run tests with coverage
npm run test:coverage
```

## License

MIT License - see LICENSE file for details.
70 changes: 70 additions & 0 deletions libs/is-bot/package.json
@@ -0,0 +1,70 @@
{
  "name": "@nuxtjs/robots-bot-detection",
  "version": "1.0.0",
  "description": "Framework-agnostic bot detection library",
  "type": "module",
  "main": "./dist/index.js",
  "module": "./dist/index.js",
  "types": "./dist/index.d.ts",
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.js",
      "require": "./dist/index.cjs"
    },
    "./h3": {
      "types": "./dist/drivers/h3.d.ts",
      "import": "./dist/drivers/h3.js",
      "require": "./dist/drivers/h3.cjs"
    },
    "./behaviors": {
      "types": "./dist/behaviors/index.d.ts",
      "import": "./dist/behaviors/index.js",
      "require": "./dist/behaviors/index.cjs"
    }
  },
  "files": [
    "dist",
    "src"
  ],
  "scripts": {
    "build": "tsup",
    "dev": "tsup --watch",
    "test": "vitest",
    "test:run": "vitest run",
    "test:coverage": "vitest run --coverage",
    "typecheck": "tsc --noEmit",
    "lint": "eslint src test --ext .ts,.js",
    "lint:fix": "eslint src test --ext .ts,.js --fix"
  },
  "keywords": [
    "bot-detection",
    "security",
    "web-scraping",
    "rate-limiting",
    "h3",
    "nuxt",
    "nitro"
  ],
  "author": "Nuxt Team",
  "license": "MIT",
  "dependencies": {
    "unstorage": "^1.16.0"
  },
  "peerDependencies": {
    "h3": "^1.0.0"
  },
  "devDependencies": {
    "@types/node": "^20.19.4",
    "eslint": "^9.30.1",
    "h3": "^1.15.3",
    "tsup": "^8.5.0",
    "typescript": "^5.8.3",
    "vitest": "^3.2.4"
  },
  "repository": {
    "type": "git",
    "url": "https://github.com/nuxt-modules/robots.git",
    "directory": "libs/is-bot"
  }
}