GitHub Repository Hygiene Plan

Last updated 19 Dec 2025, 11:05

Status: Planned | Priority: Medium | Target: Q1 2026

Current State

  • Git repo size: 4.0 GB
  • Tracked files: 6,388
  • Largest blobs: 6.4 MB (Jupyter notebooks)
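These figures can be re-measured before and after each phase with a small helper. This is a minimal sketch that assumes it runs from the repository root, and `repo_stats` is a hypothetical name. `size-pack` from `git count-objects -vH` is the compressed size of all packed history, which is the number the history rewrite is meant to shrink. It does not include the working tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report object-store size and tracked-file count
# for the repository in the current directory.
repo_stats() {
  # count/size = loose objects; size-pack = packed objects (all history)
  git count-objects -vH | grep -E '^(count|size|size-pack):'
  # Files currently tracked in the index
  printf 'tracked-files: %s\n' "$(git ls-files | wc -l | tr -d ' ')"
}
```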

Size Contributors

| Directory | Size | Purpose | Action |
|---|---|---|---|
| playbooks/ | 139 MB | External reference material | Remove, link externally |
| third_party/ | 13 MB | External dependencies | Consider submodules |
| .tmp/ | - | Temporary build artifacts | Remove from history |

Large Files in History

Files over 1 MB in git history:

  • playbooks/openai-cookbook/examples/*.ipynb (multiple 3-6 MB notebooks)
  • playbooks/openai-cookbook/examples/dalle/images/*.png (3 MB each)
  • .tmp/venv-hook/ (compiled Python extensions)
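The list above can be regenerated, or re-checked after cleanup, by walking every object reachable from any ref. This is a minimal sketch: `list_large_blobs` is a hypothetical helper, and the threshold defaults to 1 MiB (1048576 bytes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list blobs larger than a threshold anywhere in history.
# Output: "<size-in-bytes> <path>", largest first.
list_large_blobs() {
  local min_bytes="${1:-1048576}"
  # rev-list --objects prints every reachable object with the path it was seen at;
  # cat-file --batch-check adds the object type and uncompressed size.
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v min="$min_bytes" '$1 == "blob" && $2 + 0 > min + 0 { sub(/^blob /, ""); print }' |
    sort -rn
}
```

For example, `list_large_blobs 5000000 | head -20` shows the twenty largest blobs over roughly 5 MB.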

Cleanup Plan

Phase 1: Remove External References (Safe)

These directories contain external reference material that should be linked, not vendored:

# Remove playbooks directory from the working tree and index, then commit
git rm -r playbooks/
git commit -m "Remove vendored playbooks; link to upstream sources"

# Update docs to link to external repos
# - openai-cookbook -> https://github.com/openai/openai-cookbook
# - trust-safety evals -> https://github.com/openai/evals
# - autoscaling -> link to AWS/GCP docs
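A search over tracked text files finds the documentation that still points at the removed directory. This is a minimal sketch: `find_playbook_refs` is a hypothetical helper, and `.gitignore` is excluded because it will legitimately keep a `playbooks/` entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list tracked text files that still mention playbooks/.
find_playbook_refs() {
  # -l: file names only, -I: skip binary files, -F: fixed string (no regex).
  # ':!.gitignore' is an exclude pathspec; git grep exits 1 when nothing matches.
  git grep -lIF 'playbooks/' -- . ':!.gitignore' || true
}
```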

Phase 2: Git History Cleanup (Requires Coordination)

Warning: This rewrites history and requires all collaborators to re-clone.

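# Using git-filter-repo (recommended over filter-branch)
pip install git-filter-repo

# In a fresh clone, remove both paths from all history in a single pass
git filter-repo --path playbooks/ --path .tmp/ --invert-paths

# filter-repo drops the origin remote as a safeguard; re-add it before force pushing
git remote add origin <repo-url>
git push --force --all origin
git push --force --tags origin

# Alternative: BFG Repo-Cleaner, run against a fresh `git clone --mirror`.
# BFG protects the latest commit by default, so commit the Phase 1 removal first.
bfg --delete-folders playbooks
bfg --delete-folders .tmp
# Strips large blobs at any path; 3-5 MB notebooks and 3 MB PNGs fall below this limit
bfg --strip-blobs-bigger-than 5M
git reflog expire --expire=now --all && git gc --prune=now --aggressive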
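After the rewrite, and before the force push, it is worth confirming that no ref still reaches the removed paths. This is a minimal sketch: `verify_paths_purged` is a hypothetical helper that reports commits from any ref that touch the given paths.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any commit reachable from any ref touches the paths.
verify_paths_purged() {
  local hits
  # rev-list with a pathspec keeps only commits that modify those paths
  hits=$(git rev-list --all -- "$@" | head -n 5)
  if [ -n "$hits" ]; then
    printf 'still in history (first matching commits):\n%s\n' "$hits"
    return 1
  fi
  echo "clean: $* not found in any ref"
}
```

Usage: `verify_paths_purged playbooks .tmp` in the rewritten clone.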

Phase 3: LFS for Necessary Large Files

For files that must remain (images, binaries):

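# Install Git LFS (sets up the lfs filter and this repo's LFS hooks)
git lfs install

# Track large file types (adds patterns to .gitattributes), then commit them
git lfs track "*.png"
git lfs track "*.jpeg"
git lfs track "*.ipynb"
git add .gitattributes && git commit -m "Track large file types with Git LFS"

# Migrate existing files on all branches and tags. This rewrites history,
# so run it in the same coordinated window as Phase 2.
git lfs migrate import --everything --include="*.png,*.jpeg,*.ipynb"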
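`git check-attr` shows which paths the `.gitattributes` rules route through the LFS filter (`git lfs track` writes `filter=lfs` entries), and `git lfs ls-files` lists the files actually stored in LFS. This is a minimal sketch of the attribute check, and `lfs_attr_report` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<path><TAB><filter attribute>" for each argument.
# "lfs" means the path matches a Git LFS pattern; "unspecified" means it does not.
lfs_attr_report() {
  local f
  for f in "$@"; do
    # check-attr output looks like "path: filter: lfs"; keep the value after the last ": "
    printf '%s\t%s\n' "$f" "$(git check-attr filter -- "$f" | sed 's/.*: //')"
  done
}
```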

Expected Results

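| Metric | Before | After Phase 1 | After Phase 2 |
|---|---|---|---|
| Repo size | 4.0 GB | ~3.8 GB | ~500 MB |
| Tracked files | 6,388 | ~5,500 | ~5,500 |
| Clone time | ~5 min | ~4 min | ~1 min |

Note: Phase 1 uses git rm, which leaves the removed blobs in history. The After Phase 1 figures therefore come from the smaller working tree only; the .git directory, and with it most of the clone time, shrinks only after the Phase 2 rewrite.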
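You can check the clone-time estimates by timing a fresh clone before and after each phase. This is a minimal sketch: `measure_clone` is a hypothetical helper and the timing has one-second resolution. Results depend on network and disk, so compare runs from the same machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: time a full clone of URL and report the .git size.
measure_clone() {
  local url="$1" dest start end
  dest=$(mktemp -d)
  start=$(date +%s)
  if ! git clone -q "$url" "$dest/repo"; then
    rm -rf "$dest"
    return 1
  fi
  end=$(date +%s)
  echo "clone_seconds=$((end - start))"
  # du -sk: disk usage of the cloned object store in KiB
  echo "git_dir_kb=$(du -sk "$dest/repo/.git" | cut -f1)"
  rm -rf "$dest"
}
```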

Files to Keep

  • Core application code (apps/, config/, infra/)
  • Documentation (docs/)
  • Tests (tests/)
  • Scripts (scripts/, tools/)
  • Odoo customizations (odoo/custom/)

Dependencies to Review

Consider converting to git submodules:

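# If third_party is needed, use submodules; a path that is still tracked must be removed first
git rm -r third_party/openai-python   # only if a vendored copy is tracked there
git submodule add https://github.com/openai/openai-python third_party/openai-python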
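A plain `git clone` does not fetch submodule contents, so collaborators need one extra step. This is a minimal sketch: `sync_submodules` is a hypothetical helper name, and the commands inside are standard Git.

```shell
#!/usr/bin/env bash
# Fresh clones can fetch submodules in one step:
#   git clone --recurse-submodules <repo-url>
# Hypothetical helper for existing clones, e.g. after pulling the commit
# that replaced vendored third_party/ contents with submodules.
sync_submodules() {
  # Copy submodule URLs from .gitmodules into the local config, then check out
  # the commit each submodule is pinned to (recursing into nested submodules).
  git submodule sync --recursive &&
    git submodule update --init --recursive
}
```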

Gitignore Improvements

Add to .gitignore:

# Build artifacts
.tmp/
*.so
*.dylib

# Generated and externally sourced content
.ipynb_checkpoints/
playbooks/

# IDE
.cursor/
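Adding a pattern to .gitignore does not untrack files that are already committed. Anything under `.tmp/` or matching `*.so` stays tracked until it is removed from the index. This is a minimal sketch that lists such files, and `tracked_but_ignored` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list tracked files that match the current ignore rules.
tracked_but_ignored() {
  # -c: tracked (cached) files, -i: only those that are ignored,
  # --exclude-standard: .gitignore, .git/info/exclude and core.excludesFile
  git ls-files -c -i --exclude-standard
}
```

To stop tracking each listed path while keeping the local copy, run `git rm --cached <path>`, then commit.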

Pre-Push Hook Enhancement

Add a file-size check to the pre-push hook (save as .git/hooks/pre-push and make it executable):

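#!/usr/bin/env bash
# Reject the push if any tracked file is larger than 1 MB
# (paths containing node_modules or docker-data are skipped)
status=0
while IFS= read -r -d '' f; do
  case "$f" in *node_modules*|*docker-data*) continue ;; esac
  if [ -n "$(find "./$f" -maxdepth 0 -type f -size +1M 2>/dev/null)" ]; then
    echo "ERROR: Large file tracked: $f"
    status=1
  fi
done < <(git ls-files -z)
exit "$status"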
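The hook above checks working-tree copies of every tracked file at push time. To block large files at commit time instead, a variant can check only what is staged. This is a minimal sketch: `check_staged_sizes` is a hypothetical function. The hook file would call it as its last line (`check_staged_sizes 1048576`) so that its return status becomes the hook's exit status.

```shell
#!/usr/bin/env bash
# Sketch of a pre-commit variant (save as .git/hooks/pre-commit, chmod +x).
# Checks only files added or modified in the commit being created, using
# the size of the staged blob (":path" = index version), not the disk copy.
check_staged_sizes() {
  local limit="${1:-1048576}" status=0 f size
  while IFS= read -r -d '' f; do
    size=$(git cat-file -s ":$f") || continue
    if [ "$size" -gt "$limit" ]; then
      echo "ERROR: staged file larger than $limit bytes: $f ($size bytes)"
      status=1
    fi
  done < <(git diff --cached --name-only -z --diff-filter=AM)
  return "$status"
}
```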

Execution Timeline

  1. Week 1: Remove playbooks/ directory, update documentation
  2. Week 2: Coordinate with team, schedule history rewrite window
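  3. Week 3: Execute git-filter-repo (plus the Phase 3 LFS migration, which also rewrites history), force push
  4. Week 4: All collaborators re-clone and verify integrity (pushing from a pre-rewrite clone would reintroduce the removed history)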
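Before the Week 3 rewrite, a single-file backup of every ref makes the old history recoverable if something goes wrong. This is a minimal sketch using `git bundle`. `backup_before_rewrite` and its default file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write all refs and their history to one bundle file,
# then check that the bundle is complete and readable.
backup_before_rewrite() {
  local out="${1:-repo-backup-$(date +%Y%m%d).bundle}"
  git bundle create "$out" --all &&
    git bundle verify "$out"
}
```

Restoring is an ordinary clone from the file: `git clone <bundle-file> restored-repo`.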

Author: Claude Code | Created: 2025-12-19 | Review: TBD