
itzlambda/Tutorial-Codebase-Knowledge


Turns Codebase into Easy Tutorial with AI

License: MIT

Ever stared at a codebase written by someone else and felt completely lost? This tutorial shows you how to build an AI agent that analyzes GitHub repositories and creates beginner-friendly tutorials explaining exactly how the code works.

This is a tutorial project built with Pocket Flow, a 100-line LLM framework. The agent crawls GitHub repositories and builds a knowledge base from the code: it analyzes the entire codebase to identify core abstractions and how they interact, then transforms that complex code into a beginner-friendly tutorial with clear visualizations.

  🔸 🎉 Reached Hacker News Front Page (April 2025) with >800 up‑votes: Discussion »

⭐ Example Results for Popular GitHub Repositories!

🤯 All these tutorials are generated entirely by AI by crawling the GitHub repo!

  • AutoGen Core - Build AI teams that talk, think, and solve problems together like coworkers!

  • Browser Use - Let AI surf the web for you, clicking buttons and filling forms like a digital assistant!

  • Celery - Supercharge your app with background tasks that run while you sleep!

  • Click - Turn Python functions into slick command-line tools with just a decorator!

  • Codex - Turn plain English into working code with this AI terminal wizard!

  • Crawl4AI - Train your AI to extract exactly what matters from any website!

  • CrewAI - Assemble a dream team of AI specialists to tackle impossible problems!

  • DSPy - Build LLM apps like Lego blocks that optimize themselves!

  • FastAPI - Create APIs at lightning speed with automatic docs that clients will love!

  • Flask - Craft web apps with minimal code that scales from prototype to production!

  • Google A2A - The universal language that lets AI agents collaborate across borders!

  • LangGraph - Design AI agents as flowcharts where each step remembers what happened before!

  • LevelDB - Store data at warp speed with Google's engine that powers blockchains!

  • MCP Python SDK - Build powerful apps that communicate through an elegant protocol without sweating the details!

  • NumPy Core - Master the engine behind data science that makes Python as fast as C!

  • OpenManus - Build AI agents with digital brains that think, learn, and use tools just like humans do!

  • Pydantic Core - Validate data at rocket speed with just Python type hints!

  • Requests - Talk to the internet in Python with code so simple it feels like cheating!

  • SmolaAgents - Build tiny AI agents that punch way above their weight class!

  • Showcase Your AI-Generated Tutorials in Discussions!

🚀 Getting Started

  1. Clone this repository

  2. Install dependencies:

    pip install -r requirements.txt
  3. Set up the LLM in utils/call_llm.py by providing credentials. By default, you can use a Google AI Studio key with this client for Gemini 2.5 Pro:

    # utils/call_llm.py - Using litellm
    import litellm
    import os
    import logging
    import json
    
    logger = logging.getLogger(__name__)
    
    # By default, we use Google Gemini 2.5 Pro, as it shows great performance for code understanding
    def call_llm(prompt: str, use_cache: bool = True) -> str:
        # ... (logging and caching logic) ...
    
        # Call the LLM using litellm
        model = os.getenv("GEMINI_MODEL", "gemini-2.5-pro-exp-03-25")  # Example model
        # For Vertex AI, ensure VERTEXAI_PROJECT and VERTEXAI_LOCATION env vars are set
        # litellm_model_name = f"vertex_ai/{model}"  # Use this if a provider prefix is needed
        litellm_model_name = model  # Assuming the prefix is not needed or handled by env vars
        
        try:
            response = litellm.completion(
                model=litellm_model_name,
                messages=[{"role": "user", "content": prompt}]
            )
            response_text = response.choices[0].message.content
        except Exception as e:
            logger.error(f"LiteLLM call failed: {e}")
            raise  # Or handle the error appropriately
        
        # ... (logging and caching logic) ...
        return response_text
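The caching logic is elided above. One minimal way to implement it is a JSON file keyed by prompt, as in this sketch (the file name `llm_cache.json` and the helper names here are illustrative assumptions, not part of the project):

```python
import json
import os

CACHE_FILE = "llm_cache.json"  # Assumed cache location, for illustration only


def load_cache() -> dict:
    # Return the prompt -> response cache, or an empty dict if none exists yet
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            return json.load(f)
    return {}


def save_cache(cache: dict) -> None:
    with open(CACHE_FILE, "w") as f:
        json.dump(cache, f)


def cached_call(prompt: str, llm_fn, use_cache: bool = True) -> str:
    # Serve a cached response when available; otherwise call the LLM and cache the result
    cache = load_cache() if use_cache else {}
    if use_cache and prompt in cache:
        return cache[prompt]
    response = llm_fn(prompt)
    if use_cache:
        cache[prompt] = response
        save_cache(cache)
    return response
```

A file-based cache like this makes repeated runs over the same repository cheap, since identical prompts never hit the API twice.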
  4. Generate a complete codebase tutorial by running the main script:

    # Analyze a GitHub repository
    python main.py --repo https://github.com/username/repo --include "*.py" "*.js" --exclude "tests/*" --max-size 50000
    
    # Or, analyze a local directory
    python main.py --dir /path/to/your/codebase --include "*.py" --exclude "*test*"
    
    # Or, generate a tutorial in Chinese
    python main.py --repo https://github.com/username/repo --language "Chinese"
    • --repo or --dir - Specify either a GitHub repo URL or a local directory path (required, mutually exclusive)
    • -n, --name - Project name (optional, derived from URL/directory if omitted)
    • -t, --token - GitHub token (or set GITHUB_TOKEN environment variable)
    • -o, --output - Output directory (default: ./output)
    • -i, --include - Files to include (e.g., "*.py" "*.js")
    • -e, --exclude - Files to exclude (e.g., "tests/*" "docs/*")
    • -s, --max-size - Maximum file size in bytes (default: 100KB)
    • --language - Language for the generated tutorial (default: "english")

The application will crawl the repository, analyze the codebase structure, generate tutorial content in the specified language, and save the output in the specified directory (default: ./output).
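The --include and --exclude options take shell-style glob patterns. File filtering along these lines can be sketched with Python's fnmatch module (the helper `should_include` is illustrative, not the project's actual function):

```python
from fnmatch import fnmatch


def should_include(path: str, include: list[str], exclude: list[str]) -> bool:
    # Keep a file only if it matches at least one include pattern
    # (an empty include list means "include everything")
    if include and not any(fnmatch(path, pat) for pat in include):
        return False
    # ...and matches no exclude pattern
    return not any(fnmatch(path, pat) for pat in exclude)
```

For example, with --include "*.py" and --exclude "tests/*", a file like main.py is kept while tests/test_app.py is skipped.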

💡 Development Tutorial

  • I built this using Agentic Coding, the fastest development paradigm, where humans simply design and agents code.

  • The secret weapon is Pocket Flow, a 100-line LLM framework that lets agents (e.g., Cursor AI) build for you.

  • Check out the step-by-step development tutorial on YouTube.



About

Fork with some opinionated changes

Languages

  • Python 96.6%
  • JavaScript 3.4%