Cravatar Avatar NSFW Moderation Proposal: Model Selection and Deployment Recommendations

Background

The Cravatar avatar service requires NSFW content moderation for user-uploaded avatars. Key scenario characteristics:

  • Image dimensions: 80×80 to 512×512
  • Daily volume: 1,000–5,000 unique images
  • Requirements: High accuracy and low false-positive rate—especially critical for anime-style and artistic avatars

Recommended Model: Marqo/nsfw-image-detection-384

We compared Falconsai/nsfw_image_detection and Marqo/nsfw-image-detection-384, and recommend the latter:

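| Attribute             | Falconsai                                                    | Marqo (Recommended)                                                |
|-----------------------|--------------------------------------------------------------|--------------------------------------------------------------------|
| Input resolution      | 224×224                                                      | 384×384                                                            |
| Architecture          | ViT-base-patch16-224                                         | ViT-base-patch16-384                                               |
| Parameter count       | ~86M                                                         | ~86M                                                               |
| Edge-case performance | Moderate; frequent false positives on anime/artistic avatars | Superior; specifically optimized for ambiguous ("gray-zone") cases |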

Why choose Marqo:

  1. Its 384×384 input resolution better matches typical avatar sizes. Downsampling a 512×512 avatar to 384×384 discards less detail than downsampling it to 224×224. Upscaling an 80×80 avatar adds no new information, but at 384×384 each 16×16 patch covers a smaller region of the original image, so the model processes the avatar at a finer granularity.
  2. The Marqo team fine-tuned this model specifically for NSFW classification, which significantly reduces false positives in edge cases.
  3. Deployment complexity is identical—only the model name needs changing.
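To make point 1 concrete, the arithmetic below compares how many 16×16 patches each model input contains and how many original-avatar pixels each patch spans. It assumes square avatars and the patch16 architectures listed in the table. The helper names are illustrative only.

```python
# Back-of-the-envelope comparison of how much of an avatar each ViT patch covers.
# Assumes 16x16 patches, as listed for both patch16 architectures in the table.
PATCH = 16

def patch_grid(input_res: int, patch: int = PATCH) -> int:
    """Number of patches along one side of the model input."""
    return input_res // patch

def source_pixels_per_patch(source_res: int, input_res: int, patch: int = PATCH) -> float:
    """Width, in original-avatar pixels, that one patch spans after resizing."""
    return source_res / patch_grid(input_res, patch)

for source in (80, 512):
    for model_res in (224, 384):
        grid = patch_grid(model_res)
        print(f"{source}px avatar -> {model_res}px input: {grid}x{grid} patches, "
              f"~{source_pixels_per_patch(source, model_res):.1f} source px per patch")
```

For an 80×80 avatar this works out to about 5.7 source pixels per patch at 224×224 versus about 3.3 at 384×384.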

Recommended Architecture: Two-Tier Moderation

User uploads avatar
    │
    ▼
[In-house pre-screening layer] Marqo/nsfw-image-detection-384
    │
    ├── NSFW score > 0.95 → immediately flagged as NSFW and blocked
    ├── NSFW score < 0.05 → automatically approved
    └── gray zone (0.05–0.95) → forwarded to cloud API for secondary review
                              │
                              ├── Domestic traffic → Tencent Cloud / Alibaba Cloud Content Moderation API  
                              └── Overseas traffic → Cloudflare Workers AI or Hugging Face Inference Endpoints

If, as estimated, 70–80% of avatars score outside the gray zone, this architecture eliminates 70–80% of cloud API calls, substantially reducing operational costs.
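The routing rules in the diagram could be written as a small pure function, as in the sketch below. It assumes the local model's output has already been reduced to a single NSFW probability. The backend identifiers are hypothetical placeholders, and the cloud API calls themselves are not shown.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical backend identifiers; the actual Tencent Cloud / Alibaba Cloud /
# Cloudflare Workers AI / Hugging Face integrations are not implemented here.
DOMESTIC_BACKEND = "tencent_or_alibaba_moderation"
OVERSEAS_BACKEND = "cloudflare_or_hf_endpoint"

@dataclass(frozen=True)
class Decision:
    action: str                     # "block", "approve", or "review"
    backend: Optional[str] = None   # cloud backend, set only for gray-zone images

def route(nsfw_score: float, domestic: bool,
          block_above: float = 0.95, approve_below: float = 0.05) -> Decision:
    """Two-tier routing on the local model's NSFW probability."""
    if nsfw_score > block_above:
        return Decision("block")
    if nsfw_score < approve_below:
        return Decision("approve")
    # Gray zone: defer to a cloud moderation API chosen by traffic origin.
    return Decision("review", DOMESTIC_BACKEND if domestic else OVERSEAS_BACKEND)
```

Keeping the thresholds as parameters makes it straightforward to widen the gray zone for categories that need extra caution, such as anime-style avatars.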

Hardware Requirements Assessment

This model is lightweight enough that no GPU is required:

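| Configuration   | CPU     | Memory | GPU      | Latency per image | Daily throughput |
|-----------------|---------|--------|----------|-------------------|------------------|
| Minimal         | 2 cores | 4 GB   | None     | 100–200 ms        | 5,000+           |
| Recommended     | 4 cores | 8 GB   | None     | 50–100 ms         | 50,000+          |
| GPU-accelerated | 2 cores | 4 GB   | T4-class | 5–15 ms           | Millions         |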
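The daily-throughput figures are conservative. As a rough check, assuming one worker processes requests back to back, daily capacity is simply seconds per day divided by per-image latency. The `workers` and `utilization` parameters are illustrative knobs, not measured values.

```python
SECONDS_PER_DAY = 86_400

def daily_capacity(latency_ms: float, workers: int = 1, utilization: float = 1.0) -> int:
    """Images per day if each worker classifies one image after another.

    latency_ms:  per-image inference latency in milliseconds
    workers:     inference workers running in parallel (illustrative)
    utilization: fraction of the day the workers are actually busy (illustrative)
    """
    per_worker = SECONDS_PER_DAY * 1000 / latency_ms
    return int(per_worker * workers * utilization)

# Slowest case for the minimal configuration: 200 ms per image, one worker.
print(daily_capacity(200))          # 432000
print(daily_capacity(200) / 5_000)  # 86.4x headroom over 5,000 images/day
```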

Memory breakdown:

  • PyTorch runtime: ~1–1.5 GB
  • Model weights (FP16): ~172 MB
  • Peak inference memory (including activations): ~500–800 MB
  • Total: roughly 1.7–2.5 GB, well within a 4 GB RAM budget
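The weight figure follows from parameter count × bytes per parameter. The sketch below reproduces it and also shows the FP32 case, assuming the weights are kept in full precision. The runtime and activation ranges are taken from the bullets above rather than measured.

```python
def weight_megabytes(num_params: float, bytes_per_param: int) -> float:
    """Raw weight storage in MB (10**6 bytes), excluding framework overhead."""
    return num_params * bytes_per_param / 1e6

PARAMS = 86e6  # parameter count from the comparison table

print(weight_megabytes(PARAMS, 2))  # FP16: 172.0
print(weight_megabytes(PARAMS, 4))  # FP32: 344.0

# Total = PyTorch runtime + FP16 weights + peak activations (GB), using the ranges above.
fp16_weights_gb = weight_megabytes(PARAMS, 2) / 1000
low = 1.0 + fp16_weights_gb + 0.5
high = 1.5 + fp16_weights_gb + 0.8
print(f"{low:.2f} to {high:.2f} GB")
```

Even in FP32 the weights add only about 172 MB, so the 4 GB budget still holds.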

Further optimization: Serving the model with ONNX Runtime instead of PyTorch is expected to cut memory usage to ~800 MB and speed up inference by roughly 2–3×. That would make even a single-core, 2 GB system sufficient.

Deployment Reference

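FastAPI + Transformers pipeline, core code snippet (FastAPI needs the python-multipart package installed to accept file uploads):

```python
import io

from fastapi import FastAPI, HTTPException, UploadFile
from fastapi.concurrency import run_in_threadpool
from PIL import Image, UnidentifiedImageError
from transformers import pipeline

app = FastAPI()
# Load the model once at startup; every request reuses this pipeline.
classifier = pipeline("image-classification", model="Marqo/nsfw-image-detection-384")

@app.post("/classify")
async def classify(file: UploadFile):
    data = await file.read()
    try:
        image = Image.open(io.BytesIO(data)).convert("RGB")
    except UnidentifiedImageError:
        raise HTTPException(status_code=400, detail="Uploaded file is not a valid image")
    # Inference is CPU-bound: run it in a worker thread so it does not block the event loop.
    result = await run_in_threadpool(classifier, image)
    return {"predictions": result}
```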
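The pipeline returns a list of `{"label", "score"}` dictionaries. The thresholding logic needs a single NSFW probability, so a small helper could extract it. This is a hypothetical helper, and the label string `"NSFW"` is an assumption; check the model config's `id2label` for the actual label names.

```python
from typing import Dict, List

def nsfw_score(predictions: List[Dict], nsfw_label: str = "NSFW") -> float:
    """Return the score of the NSFW label from an image-classification pipeline result."""
    for pred in predictions:
        if pred["label"].lower() == nsfw_label.lower():
            return float(pred["score"])
    # Fail loudly rather than silently approving an image with unexpected labels.
    raise ValueError(f"label {nsfw_label!r} not found in predictions: {predictions}")
```

Raising on a missing label, instead of returning 0.0, ensures that a label mismatch never results in automatic approval.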

Avatar-Specific Considerations

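  1. Low-resolution handling: 80×80 avatars carry little detail. The image processor resizes every input to 384×384 (the ViT's native input size) anyway, but upscaling small avatars explicitly before inference lets you control the resampling filter (e.g., bicubic).
  2. Anime/digital art: These styles are prone to false positives. Widen the gray zone for them and defer the final judgment to the cloud API.
  3. Batch processing: The pipeline accepts a list of images plus a batch_size argument. On a GPU, batch_size=32 is estimated to improve throughput by 5–8×; on CPU the gain is usually much smaller.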

Cost Estimation (5,000 images/day)

  • In-house pre-screening: One 2-core, 4 GB cloud server ≈ ¥50–100/month
  • Assuming 20% enter the gray zone requiring cloud API review: 1,000 images/day
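  • Tencent Cloud image moderation: ¥1.5 per 1,000 images → 1,000 × 30 days × ¥1.5/1,000 ≈ ¥45/month
  • Total estimated cost: ~¥100–150/month
    (Full-cloud moderation of all 5,000 images/day would cost 150,000 × ¥1.5/1,000 ≈ ¥225/month, so the two-tier design saves roughly one-third to one-half.)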

Cost advantages of in-house pre-screening grow with volume: the server cost is largely fixed, while cloud API fees scale linearly with the number of images reviewed.
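The estimate above can be parameterized to show how it changes with volume. This sketch assumes the same 20% gray-zone rate, ¥1.5 per 1,000 images, a 30-day month, and a flat server cost that defaults to the upper end of the ¥50–100 range. It ignores any extra servers a larger volume might need.

```python
def monthly_cost(daily_images: int, gray_fraction: float = 0.2,
                 server_cost: float = 100.0, price_per_1000: float = 1.5,
                 days: int = 30) -> dict:
    """Rough monthly cost in CNY of cloud-only versus two-tier moderation."""
    monthly_images = daily_images * days
    cloud_only = monthly_images * price_per_1000 / 1000
    # Two-tier: fixed server cost plus cloud fees for gray-zone images only.
    two_tier = server_cost + monthly_images * gray_fraction * price_per_1000 / 1000
    return {"cloud_only": cloud_only, "two_tier": two_tier,
            "savings": 1 - two_tier / cloud_only}

for volume in (5_000, 50_000):
    c = monthly_cost(volume)
    print(f"{volume}/day: cloud-only ¥{c['cloud_only']:.0f}, "
          f"two-tier ¥{c['two_tier']:.0f}, savings {c['savings']:.0%}")
```

At 5,000 images/day this gives about ¥225 cloud-only versus ¥145 two-tier. At 50,000 images/day the gap widens to about ¥2,250 versus ¥550. At that volume, though, the hardware table points to the 4-core configuration, so `server_cost` would need to reflect that machine's price.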


@modiqi Here’s the complete proposal—covering model selection, system architecture, hardware requirements, deployment code, and cost analysis. Please evaluate and let us know how you’d like to proceed. Feel free to ask any questions. If needed, I can also set up a PoC on your cluster’s macOS host to run sample tests.