
# Architecture Documentation

This document provides a comprehensive overview of the AI Bulk Image Renamer SaaS platform architecture, including system design, data flow, deployment strategies, and technical specifications.

## 🏗️ System Overview

The AI Bulk Image Renamer is a SaaS platform built on a microservices architecture and guided by the following core principles:

- **Separation of Concerns**: Clear boundaries between frontend, API, worker, and monitoring services
- **Horizontal Scalability**: Stateless services that can scale independently
- **Resilience**: Fault-tolerant design with graceful degradation
- **Security-First**: Comprehensive security measures at every layer
- **Observability**: Full monitoring, logging, and tracing capabilities

## 📐 High-Level Architecture

```mermaid
graph TB
    subgraph "Client Layer"
        WEB[Web Browser]
        MOBILE[Mobile Browser]
    end

    subgraph "Load Balancer"
        LB[NGINX/Ingress]
    end

    subgraph "Application Layer"
        FRONTEND[Next.js Frontend]
        API[NestJS API Gateway]
        WORKER[Worker Service]
        MONITORING[Monitoring Service]
    end

    subgraph "Data Layer"
        POSTGRES[(PostgreSQL)]
        REDIS[(Redis)]
        MINIO[(MinIO/S3)]
    end

    subgraph "External Services"
        STRIPE[Stripe Payments]
        GOOGLE[Google OAuth/Vision]
        OPENAI[OpenAI GPT-4 Vision]
        SENTRY[Sentry Error Tracking]
    end

    WEB --> LB
    MOBILE --> LB
    LB --> FRONTEND
    LB --> API

    FRONTEND <--> API
    API <--> WORKER
    API <--> POSTGRES
    API <--> REDIS
    WORKER <--> POSTGRES
    WORKER <--> REDIS
    WORKER <--> MINIO

    API <--> STRIPE
    API <--> GOOGLE
    WORKER <--> OPENAI
    WORKER <--> GOOGLE

    MONITORING --> SENTRY
    MONITORING --> POSTGRES
    MONITORING --> REDIS
```

## 🔧 Technology Stack

### **Frontend Layer**
- **Framework**: Next.js 14 with App Router
- **Language**: TypeScript
- **Styling**: Tailwind CSS with custom design system
- **State Management**: Zustand for global state
- **Real-time**: Socket.io client for WebSocket connections
- **Forms**: React Hook Form with Zod validation
- **UI Components**: Headless UI with custom implementations

### **API Layer**
- **Framework**: NestJS with Express
- **Language**: TypeScript
- **Authentication**: Passport.js with Google OAuth 2.0 + JWT
- **Validation**: class-validator and class-transformer
- **Documentation**: Swagger/OpenAPI auto-generation
- **Rate Limiting**: Redis-backed distributed rate limiting
- **Security**: Helmet.js, CORS, input sanitization

### **Worker Layer**
- **Framework**: NestJS with background job processing
- **Queue System**: BullMQ with Redis backing
- **Image Processing**: Sharp for image manipulation
- **AI Integration**: OpenAI GPT-4 Vision + Google Cloud Vision
- **Security**: ClamAV virus scanning
- **File Storage**: MinIO/S3 with presigned URLs

### **Data Layer**
- **Primary Database**: PostgreSQL 15 with Prisma ORM
- **Cache/Queue**: Redis 7 for sessions, jobs, and caching
- **Object Storage**: MinIO (S3-compatible) for file storage
- **Search**: Full-text search capabilities within PostgreSQL

### **Infrastructure**
- **Containers**: Docker with multi-stage builds
- **Orchestration**: Kubernetes with Helm charts
- **CI/CD**: Forgejo Actions with automated testing
- **Monitoring**: Prometheus + Grafana + Sentry + OpenTelemetry
- **Service Mesh**: Ready for Istio integration

## 🏛️ Architectural Patterns

### **1. Microservices Architecture**

The platform is decomposed into independently deployable services:

```
┌─────────────────┐  ┌────────────────────┐  ┌─────────────────┐
│   Frontend      │  │   API Gateway      │  │   Worker        │
│   - Next.js     │  │   - Authentication │  │   - Image Proc. │
│   - UI/UX       │  │   - Rate Limiting  │  │   - AI Analysis │
│   - Real-time   │  │   - Validation     │  │   - Virus Scan  │
└─────────────────┘  └────────────────────┘  └─────────────────┘
                                │
                     ┌─────────────────┐
                     │   Monitoring    │
                     │   - Metrics     │
                     │   - Health      │
                     │   - Alerts      │
                     └─────────────────┘
```

**Benefits:**
- Independent scaling and deployment
- Technology diversity (different services can use different tech stacks)
- Fault isolation (a failure in one service is contained instead of taking down the others)
- Team autonomy (different teams can own different services)

### **2. Event-Driven Architecture**

Services communicate through events and message queues:

```
API Service --> Redis Queue --> Worker Service
     │                              │
     └── WebSocket ←─── Progress ←───┘
```

**Event Types:**
- `IMAGE_UPLOADED`: Triggered when files are uploaded
- `BATCH_PROCESSING_STARTED`: Batch processing begins
- `IMAGE_PROCESSED`: Individual image processing complete
- `BATCH_COMPLETED`: All images in batch processed
- `PROCESSING_ERROR`: Error during processing
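
The event types above can be modeled as a discriminated union so that every consumer handles each case explicitly. This is a minimal sketch: the event names come from the list, while the payload fields (`batchId`, `imageId`, `totalImages`, `proposedName`, `message`) and the `describeEvent` helper are hypothetical and chosen only for illustration.

```typescript
// Event names come from the list above; the payload fields are hypothetical.
type ProcessingEvent =
  | { type: "IMAGE_UPLOADED"; batchId: string; imageId: string }
  | { type: "BATCH_PROCESSING_STARTED"; batchId: string; totalImages: number }
  | { type: "IMAGE_PROCESSED"; batchId: string; imageId: string; proposedName: string }
  | { type: "BATCH_COMPLETED"; batchId: string }
  | { type: "PROCESSING_ERROR"; batchId: string; message: string };

// Turn an event into the short status line a progress UI might display.
function describeEvent(event: ProcessingEvent): string {
  switch (event.type) {
    case "IMAGE_UPLOADED":
      return `Image ${event.imageId} uploaded to batch ${event.batchId}`;
    case "BATCH_PROCESSING_STARTED":
      return `Batch ${event.batchId} started (${event.totalImages} images)`;
    case "IMAGE_PROCESSED":
      return `Image ${event.imageId} renamed to ${event.proposedName}`;
    case "BATCH_COMPLETED":
      return `Batch ${event.batchId} completed`;
    case "PROCESSING_ERROR":
      return `Error in batch ${event.batchId}: ${event.message}`;
    default: {
      // Exhaustiveness check: a new event type without a case fails to compile.
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```

Because the union is closed, adding a sixth event type forces every `switch` like this one to be updated before the code compiles.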

### **3. Repository Pattern**

Data access is abstracted through repository interfaces:

```typescript
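// User and Plan are the domain types for the users table (definitions omitted)
interface UserRepository {
  // Resolves to null when no user has the given id
  findById(id: string): Promise<User | null>;
  updateQuota(userId: string, used: number): Promise<void>;
  upgradeUserPlan(userId: string, plan: Plan): Promise<void>;
}

class PrismaUserRepository implements UserRepository {
  // Each method delegates to the Prisma client (implementation omitted)
}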
```

**Benefits:**
- Testability (easy to mock repositories)
- Database independence (can switch ORMs/databases)
- Clear separation of business logic and data access
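
To illustrate the testability benefit, the sketch below shows an in-memory stand-in for the repository that a unit test could use instead of a database. It is an assumption-laden example, not the project's test code: the `User` fields, the `"PRO"` plan value, and the `InMemoryUserRepository` class are hypothetical, and `findById` is assumed to resolve to `null` for a missing user.

```typescript
// "PRO" is a placeholder; only BASIC appears in this document.
type Plan = "BASIC" | "PRO";

// Hypothetical subset of the user model, enough for the example.
interface User {
  id: string;
  plan: Plan;
  quotaUsed: number;
}

interface UserRepository {
  findById(id: string): Promise<User | null>;
  updateQuota(userId: string, used: number): Promise<void>;
  upgradeUserPlan(userId: string, plan: Plan): Promise<void>;
}

// Test double that keeps users in a Map instead of PostgreSQL.
class InMemoryUserRepository implements UserRepository {
  private readonly users = new Map<string, User>();

  constructor(seed: User[] = []) {
    for (const user of seed) this.users.set(user.id, { ...user });
  }

  async findById(id: string): Promise<User | null> {
    const user = this.users.get(id);
    return user ? { ...user } : null; // copy so callers cannot mutate the store
  }

  async updateQuota(userId: string, used: number): Promise<void> {
    this.mustGet(userId).quotaUsed = used;
  }

  async upgradeUserPlan(userId: string, plan: Plan): Promise<void> {
    this.mustGet(userId).plan = plan;
  }

  private mustGet(userId: string): User {
    const user = this.users.get(userId);
    if (!user) throw new Error(`User ${userId} not found`);
    return user;
  }
}
```

Business logic that depends only on `UserRepository` can be exercised against this class without starting PostgreSQL.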

## 💾 Data Architecture

### **Database Schema (PostgreSQL)**

```sql
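-- uuid_generate_v4() is provided by the uuid-ossp extension
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- The enum types user_plan, batch_status, image_status, and payment_status
-- must be created before these tables; their definitions are omitted here.

-- Users table with OAuth integration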
CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  google_id VARCHAR(255) UNIQUE NOT NULL,
  email_hash VARCHAR(64) NOT NULL, -- SHA-256 hashed
  display_name VARCHAR(255),
  plan user_plan DEFAULT 'BASIC',
  quota_limit INTEGER NOT NULL,
  quota_used INTEGER DEFAULT 0,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

-- Batches for image processing sessions
CREATE TABLE batches (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  user_id UUID REFERENCES users(id) ON DELETE CASCADE,
  status batch_status DEFAULT 'PENDING',
  total_images INTEGER DEFAULT 0,
  processed_images INTEGER DEFAULT 0,
  keywords TEXT[], -- User-provided keywords
  created_at TIMESTAMP DEFAULT NOW(),
  completed_at TIMESTAMP
);

-- Individual images in processing batches
CREATE TABLE images (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  batch_id UUID REFERENCES batches(id) ON DELETE CASCADE,
  original_name VARCHAR(255) NOT NULL,
  proposed_name VARCHAR(255),
  file_path VARCHAR(500) NOT NULL,
  file_size BIGINT NOT NULL,
  mime_type VARCHAR(100) NOT NULL,
  checksum VARCHAR(64) NOT NULL, -- SHA-256
  vision_tags JSONB, -- AI-generated tags
  status image_status DEFAULT 'PENDING',
  created_at TIMESTAMP DEFAULT NOW(),
  processed_at TIMESTAMP
);

-- Payment transactions and subscriptions
CREATE TABLE payments (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  user_id UUID REFERENCES users(id) ON DELETE CASCADE,
  stripe_session_id VARCHAR(255) UNIQUE,
  stripe_subscription_id VARCHAR(255),
  plan user_plan NOT NULL,
  amount INTEGER NOT NULL, -- cents
  currency VARCHAR(3) DEFAULT 'USD',
  status payment_status DEFAULT 'PENDING',
  created_at TIMESTAMP DEFAULT NOW(),
  completed_at TIMESTAMP
);
```
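
The `quota_limit` and `quota_used` columns suggest a check before a batch is accepted. The following is a minimal sketch of such a check, assuming a batch is rejected as a whole when it would exceed the remaining quota; that policy, the `checkQuota` function, and the result shape are assumptions rather than documented behavior.

```typescript
interface QuotaState {
  quotaLimit: number; // users.quota_limit
  quotaUsed: number; // users.quota_used
}

type QuotaDecision =
  | { allowed: true; remainingAfter: number }
  | { allowed: false; remaining: number };

// Decide whether a batch of `requestedImages` fits into the user's remaining quota.
function checkQuota(state: QuotaState, requestedImages: number): QuotaDecision {
  if (!Number.isInteger(requestedImages) || requestedImages <= 0) {
    throw new RangeError("requestedImages must be a positive integer");
  }
  // Clamp at zero in case quota_used ever exceeds quota_limit.
  const remaining = Math.max(0, state.quotaLimit - state.quotaUsed);
  return requestedImages <= remaining
    ? { allowed: true, remainingAfter: remaining - requestedImages }
    : { allowed: false, remaining };
}
```

For example, a user with a limit of 50 and 45 images used can submit 5 more images but not 6.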

### **Indexing Strategy**

```sql
-- Performance optimization indexes
-- users.google_id and payments.stripe_session_id are declared UNIQUE, which
-- already creates an index on each, so no separate CREATE INDEX is needed.
CREATE INDEX idx_users_email_hash ON users(email_hash);
CREATE INDEX idx_batches_status ON batches(status);
CREATE INDEX idx_images_checksum ON images(checksum);
CREATE INDEX idx_payments_user_id ON payments(user_id);

-- Composite indexes for common queries; each also serves lookups on its
-- leading column alone (images.batch_id, batches.user_id)
CREATE INDEX idx_images_batch_status ON images(batch_id, status);
CREATE INDEX idx_batches_user_created ON batches(user_id, created_at DESC);

### **Data Flow Architecture**

```
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Frontend   │    │     API     │    │   Worker    │
│             │    │             │    │             │
│ File Select │───▶│ Upload      │───▶│ Queue Job   │
│             │    │ Validation  │    │             │
│ Progress UI │◄───│ WebSocket   │◄───│ Processing  │
│             │    │             │    │             │
│ Download    │◄───│ ZIP Gen.    │◄───│ Complete    │
└─────────────┘    └─────────────┘    └─────────────┘
                         │                    │
                   ┌─────────────┐    ┌─────────────┐
                   │ PostgreSQL  │    │ MinIO/S3    │
                   │             │    │             │
                   │ Metadata    │    │ Files       │
                   │ Users       │    │ Images      │
                   │ Batches     │    │ Results     │
                   └─────────────┘    └─────────────┘
```
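
The progress updates in the flow above can be derived from the `total_images` and `processed_images` counters on a batch. This sketch shows one way to do that; the `toProgress` function and the `ProgressUpdate` shape are hypothetical, not the actual WebSocket payload.

```typescript
interface BatchCounters {
  totalImages: number; // batches.total_images
  processedImages: number; // batches.processed_images
}

interface ProgressUpdate {
  percent: number; // whole percent, 0 to 100
  done: boolean;
}

// Convert batch counters into a value a progress bar can render.
function toProgress(batch: BatchCounters): ProgressUpdate {
  if (batch.totalImages <= 0) return { percent: 0, done: false };
  // Clamp so a stale or out-of-range counter never yields less than 0% or more than 100%.
  const processed = Math.min(Math.max(batch.processedImages, 0), batch.totalImages);
  return {
    percent: Math.floor((processed / batch.totalImages) * 100),
    done: processed === batch.totalImages,
  };
}
```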

## 🔐 Security Architecture

### **Authentication & Authorization Flow**

```
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Client    │    │   API       │    │   Google    │
│             │    │             │    │   OAuth     │
│ Login Click │───▶│ Redirect    │───▶│ Consent     │
│             │    │             │    │             │
│ Receive JWT │◄───│ Generate    │◄───│ Callback    │
│             │    │ Token       │    │             │
│ API Calls   │───▶│ Validate    │    │             │
│ w/ Bearer   │    │ JWT         │    │             │
└─────────────┘    └─────────────┘    └─────────────┘
```
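
The last step of the flow, validating the bearer token on each API call, involves two small checks that are easy to get wrong: parsing the `Authorization` header and comparing the `exp` claim against the current time. The sketch below covers only those two checks. Signature verification is deliberately left out and would be handled by the JWT library in a real service; the function names and the `JwtClaims` shape are illustrative assumptions.

```typescript
// Claims assumed to have been decoded from an already signature-verified token.
interface JwtClaims {
  sub: string; // user id
  exp: number; // expiry in seconds since the Unix epoch (RFC 7519)
}

// Extract the token from an "Authorization: Bearer <token>" header value.
function extractBearerToken(authorizationHeader: string | undefined): string | null {
  if (!authorizationHeader) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(authorizationHeader.trim());
  return match?.[1] ?? null;
}

// A token must be rejected at or after its exp time.
function isExpired(claims: JwtClaims, nowMs: number = Date.now()): boolean {
  return claims.exp * 1000 <= nowMs;
}
```

Passing the clock value as a parameter keeps the expiry check deterministic in tests.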

**Security Layers:**

1. **Network Security**
   - HTTPS everywhere with TLS 1.3
   - CORS policies restricting origins
   - Rate limiting per IP and per user

2. **Application Security**
   - Input validation and sanitization
   - SQL injection prevention via Prisma's parameterized queries
   - XSS protection with Content Security Policy
   - CSRF tokens for state-changing operations

3. **Data Security**
   - Email addresses hashed with SHA-256
   - JWTs that expire after 24 hours
   - File virus scanning with ClamAV
   - Secure file uploads with MIME validation

4. **Infrastructure Security**
   - Non-root container execution
   - Kubernetes security contexts
   - Secret management with encrypted storage
   - Network policies for service isolation
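
The per-IP and per-user rate limiting listed under Network Security can be illustrated with a fixed-window counter. This is a single-process, in-memory sketch meant only to show the idea; the platform itself uses Redis-backed distributed rate limiting, and the fixed-window algorithm, the class name, and the key format are assumptions.

```typescript
interface WindowState {
  windowStart: number; // ms timestamp when the current window opened
  count: number; // requests seen in the current window
}

class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, WindowState>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request identified by `key` is allowed at time `nowMs`.
  allow(key: string, nowMs: number = Date.now()): boolean {
    const current = this.windows.get(key);
    if (!current || nowMs - current.windowStart >= this.windowMs) {
      // First request for this key, or the previous window has elapsed.
      this.windows.set(key, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

Using keys such as `ip:203.0.113.7` and `user:<id>` with the same limiter applies independent budgets per IP address and per user.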

## 📊 Monitoring Architecture

### **Observability Stack**

```
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ Application │    │ Prometheus  │    │  Grafana    │
│  Metrics    │───▶│  Storage    │───▶│ Dashboard   │
└─────────────┘    └─────────────┘    └─────────────┘

┌─────────────┐    ┌───────────────┐    ┌─────────────┐
│   Traces    │    │ OpenTelemetry │    │   Jaeger    │
│   Spans     │───▶│   Collector   │───▶│   UI        │
└─────────────┘    └───────────────┘    └─────────────┘

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Errors    │    │   Sentry    │    │   Alerts    │
│   Logs      │───▶│   Hub       │───▶│   Slack     │
└─────────────┘    └─────────────┘    └─────────────┘
```

**Key Metrics Tracked:**

1. **Business Metrics**
   - User registrations and conversions
   - Image processing volume and success rates
   - Revenue and subscription changes
   - Feature usage analytics

2. **System Metrics**
   - API response times and error rates
   - Database query performance
   - Queue depth and processing times
   - Resource utilization (CPU, memory, disk)

3. **Custom Metrics**
   - AI processing accuracy and confidence scores
   - File upload success rates
   - Virus detection events
   - User session duration

## 🚀 Deployment Architecture

### **Kubernetes Deployment**

```yaml
# Example deployment configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: seo-image-renamer/api:v1.0.0
        ports:
        - containerPort: 3001
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: database-secret
              key: url
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 3001
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 3001
          initialDelaySeconds: 5
          periodSeconds: 5
```
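
The two probes above serve different purposes: a failing liveness probe causes Kubernetes to restart the container, while a failing readiness probe only removes the pod from Service endpoints until it recovers. The sketch below shows how the logic behind `/health` and `/health/ready` might differ. The function names, result shape, and dependency names are hypothetical, and the HTTP wiring is omitted.

```typescript
type DependencyCheck = () => Promise<boolean>;

interface HealthResult {
  status: 200 | 503;
  body: { ok: boolean; failed: string[] };
}

// Liveness: the process is up and able to answer; no dependencies are consulted.
function liveness(): HealthResult {
  return { status: 200, body: { ok: true, failed: [] } };
}

// Readiness: every dependency must respond before the pod receives traffic.
async function readiness(checks: Record<string, DependencyCheck>): Promise<HealthResult> {
  const outcomes = await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      try {
        return { name, ok: await check() };
      } catch {
        return { name, ok: false }; // a throwing check counts as a failure
      }
    }),
  );
  const failed = outcomes.filter((o) => !o.ok).map((o) => o.name);
  const ok = failed.length === 0;
  return { status: ok ? 200 : 503, body: { ok, failed } };
}
```

Keeping dependency checks out of the liveness path avoids restarting healthy pods when PostgreSQL or Redis is briefly unavailable.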

### **Service Dependencies**

```
┌─────────────┐    ┌─────────────┐
│  Frontend   │    │    API      │
│             │───▶│             │
│ Port: 3000  │    │ Port: 3001  │
└─────────────┘    └─────────────┘
                           │
                   ┌─────────────┐
                   │   Worker    │
                   │             │
                   │ Background  │
                   └─────────────┘
                           │
       ┌───────────────────┼───────────────────┐
       │                   │                   │
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ PostgreSQL  │    │    Redis    │    │   MinIO     │
│             │    │             │    │             │
│ Port: 5432  │    │ Port: 6379  │    │ Port: 9000  │
└─────────────┘    └─────────────┘    └─────────────┘
```

### **Scaling Strategy**

1. **Horizontal Pod Autoscaling (HPA)**
   ```yaml
   apiVersion: autoscaling/v2
   kind: HorizontalPodAutoscaler
   metadata:
     name: api-hpa
   spec:
     scaleTargetRef:
       apiVersion: apps/v1
       kind: Deployment
       name: api-deployment
     minReplicas: 2
     maxReplicas: 10
     metrics:
     - type: Resource
       resource:
         name: cpu
         target:
           type: Utilization
           averageUtilization: 70
   ```

2. **Vertical Pod Autoscaling (VPA)**
   - Automatic resource request/limit adjustments
   - Based on historical usage patterns
   - Prevents over/under-provisioning
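
For the HPA configuration above, Kubernetes computes the target replica count as `ceil(currentReplicas * currentMetric / targetMetric)` and then clamps it to the configured bounds. The sketch below reproduces that calculation for the 70% CPU target. It is a simplification: the real controller also applies stabilization windows and handles missing metrics, and the `desiredReplicas` function and `HpaSpec` type are illustrative names.

```typescript
interface HpaSpec {
  minReplicas: number;
  maxReplicas: number;
  targetUtilization: number; // percent, e.g. 70
}

// Approximate the replica count the HPA would choose for a given average utilization.
function desiredReplicas(
  spec: HpaSpec,
  currentReplicas: number,
  currentUtilization: number,
  tolerance: number = 0.1, // Kubernetes skips scaling when the ratio is within 10% of 1.0 by default
): number {
  const ratio = currentUtilization / spec.targetUtilization;
  const raw =
    Math.abs(ratio - 1) <= tolerance ? currentReplicas : Math.ceil(currentReplicas * ratio);
  return Math.min(spec.maxReplicas, Math.max(spec.minReplicas, raw));
}

const apiHpa: HpaSpec = { minReplicas: 2, maxReplicas: 10, targetUtilization: 70 };
```

With 3 replicas averaging 105% CPU utilization, `desiredReplicas(apiHpa, 3, 105)` gives `ceil(3 * 1.5) = 5` replicas.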

## 🔄 CI/CD Pipeline

### **Build Pipeline**

```yaml
# .forgejo/workflows/ci.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
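  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # pnpm must be installed before setup-node can cache its store.
      # Assumes package.json declares a packageManager field for the version.
      - uses: https://github.com/pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '18'
          cache: 'pnpm'

      - run: pnpm install --frozen-lockfile
      - run: pnpm run lint
      - run: pnpm run test:coverage
      - run: pnpm run build

      - name: Cypress E2E Tests
        run: pnpm run cypress:run

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: https://github.com/pnpm/action-setup@v4
      - name: Run security audit
        run: pnpm audit --audit-level moderate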

  build-images:
    needs: [test, security]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - name: Build and push Docker images
        run: |
          docker build -t api:${{ github.sha }} .
          docker push api:${{ github.sha }}
```

### **Deployment Pipeline**

```
┌─────────────┐    ┌───────────────┐    ┌─────────────┐
│   Build     │    │     Test      │    │   Deploy    │
│             │    │               │    │             │
│ • Compile   │───▶│ • Unit        │───▶│ • Staging   │
│ • Lint      │    │ • Integration │    │ • Production│
│ • Bundle    │    │ • E2E         │    │ • Rollback  │
└─────────────┘    └───────────────┘    └─────────────┘
```

## 📈 Performance Considerations

### **Caching Strategy**

1. **Application-Level Caching**
   - Redis for session storage
   - API response caching for static data
   - Database query result caching

2. **CDN Caching**
   - Static assets (images, CSS, JS)
   - Long-lived cache headers
   - Geographic distribution

3. **Database Optimizations**
   - Query optimization with EXPLAIN ANALYZE
   - Proper indexing strategy
   - Connection pooling
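
The application-level caching described above typically follows the cache-aside pattern: read from the cache, fall back to the source on a miss, then store the result with a time-to-live. The sketch below demonstrates the pattern with an in-memory `Map` standing in for Redis; the `TtlCache` class, its method names, and the injectable clock are assumptions made for illustration.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // ms timestamp after which the entry is stale
}

class TtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Cache-aside: return a fresh cached value, otherwise load, store, and return it.
  async getOrLoad(key: string, load: () => Promise<V>): Promise<V> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = await load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Call after a write so the next read reloads from the source.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

A short TTL suits data that changes occasionally, while explicit `invalidate` calls after writes keep cached query results from going stale.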

### **Load Testing Results**

```
Scenario: 1000 concurrent users uploading images
- Average Response Time: 180ms
- 95th Percentile: 350ms
- 99th Percentile: 800ms
- Error Rate: 0.02%
- Throughput: 5000 requests/minute
```
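
For readers unfamiliar with percentile figures like those above, the sketch below shows how a 95th or 99th percentile can be computed from raw latency samples using the nearest-rank method. It is illustrative only: the load-testing tool behind these results is not stated here and may use interpolation instead, and the sample values are made up.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samplesMs: readonly number[], p: number): number {
  if (samplesMs.length === 0) throw new RangeError("at least one sample is required");
  if (p <= 0 || p > 100) throw new RangeError("p must be in the range (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank exact for whole-number inputs.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}

// Made-up samples: 10 ms, 20 ms, ..., 200 ms.
const exampleSamples = Array.from({ length: 20 }, (_, i) => (i + 1) * 10);
```

With these 20 samples, the 95th percentile is the 19th value (190 ms) and the 99th percentile is the 20th value (200 ms).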

## 🔮 Future Architecture Considerations

### **Planned Enhancements**

1. **Service Mesh Integration**
   - Istio for advanced traffic management
   - mTLS between services
   - Advanced observability and security

2. **Event Sourcing**
   - Complete audit trail of all changes
   - Event replay capabilities
   - CQRS pattern implementation

3. **Multi-Region Deployment**
   - Geographic load balancing
   - Data replication strategies
   - Disaster recovery planning

4. **Machine Learning Pipeline**
   - Custom model training for image analysis
   - A/B testing framework for AI improvements
   - Real-time model performance monitoring

### **Scalability Roadmap**

```
Phase 1 (Current): Single region, basic autoscaling
Phase 2 (Q2 2025): Multi-region deployment
Phase 3 (Q3 2025): Service mesh implementation
Phase 4 (Q4 2025): ML pipeline integration
```

## 📚 Additional Resources

- **API Documentation**: [Swagger UI](http://localhost:3001/api/docs) (available when the API is running locally)
- **Database Migrations**: See `packages/api/prisma/migrations/`
- **Deployment Guides**: See `k8s/` directory
- **Monitoring Dashboards**: See `monitoring/grafana/dashboards/`
- **Security Policies**: See `docs/security/`

---

This architecture documentation is maintained alongside the codebase and should be updated with any significant architectural changes or additions to the system.