GitHub Repository Hygiene Plan
Status: Planned | Priority: Medium | Target: Q1 2026
Current State
- Git repo size: 4.0 GB
- Tracked files: 6,388
- Largest blobs: 6.4 MB (Jupyter notebooks)
Size Contributors
| Directory | Size | Purpose | Action |
|---|---|---|---|
| playbooks/ | 139 MB | External reference material | Remove, link externally |
| third_party/ | 13 MB | External dependencies | Consider submodules |
| .tmp/ | - | Temporary build artifacts | Remove from history |
Large Files in History
Files over 1 MB in git history:
- playbooks/openai-cookbook/examples/*.ipynb (multiple 3-6 MB notebooks)
- playbooks/openai-cookbook/examples/dalle/images/*.png (3 MB each)
- .tmp/venv-hook/ (compiled Python extensions)
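The list above can be regenerated whenever the repository changes. The following is a minimal sketch, assuming a standard Git installation and a 1 MB (1,048,576 byte) threshold; the function names are illustrative and do not refer to existing tooling in this repository.

```shell
#!/bin/sh
# Sketch: report blobs anywhere in history that exceed a size threshold.
# Function names are hypothetical; only standard git plumbing is used.

# Keep blob lines above the threshold and print "<bytes> <path>".
# Input format: "<type> <sha> <size> <path>", one object per line.
filter_large_blobs() {
  awk -v t="${1:-1048576}" '$1 == "blob" && $3 + 0 > t + 0 {
    path = $4
    for (i = 5; i <= NF; i++) path = path " " $i   # rejoin paths containing spaces
    print $3, path
  }'
}

# Walk every object reachable from any ref, largest blobs first.
list_large_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    filter_large_blobs "${1:-1048576}" |
    sort -rn
}
```

Running `list_large_blobs` from the repository root before and after the cleanup gives a direct check that the notebooks and images are gone from history.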
Cleanup Plan
Phase 1: Remove External References (Safe)
These directories contain external reference material that should be linked, not vendored:
# Remove playbooks directory
git rm -r playbooks/
# Update docs to link to external repos
# - openai-cookbook -> https://github.com/openai/openai-cookbook
# - trust-safety evals -> https://github.com/openai/evals
# - autoscaling -> link to AWS/GCP docs
Phase 2: Git History Cleanup (Requires Coordination)
Warning: This rewrites history and requires all collaborators to re-clone.
# Using git-filter-repo (recommended over filter-branch)
pip install git-filter-repo
# Remove both directories from history in a single pass
# (filter-repo expects a fresh clone by default)
git filter-repo --path playbooks --path .tmp --invert-paths
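# Alternative: Use BFG Repo-Cleaner
# (BFG leaves the contents of the latest commit untouched, so complete Phase 1 first)
bfg --delete-folders playbooks
bfg --strip-blobs-bigger-than 5M
# BFG only rewrites commits; expire reflogs and repack to reclaim the space
git reflog expire --expire=now --all && git gc --prune=now --aggressive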
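Because Phase 2 is destructive, it may help to script it with a rollback point and a dry-run mode. The sketch below is one possible arrangement, not a prescribed procedure: the repository URL and working directory are placeholders supplied by the caller, and `run` and `rewrite_history` are hypothetical helper names. It relies on git-filter-repo removing the `origin` remote after a rewrite, which is why the remote is re-added before pushing.

```shell
#!/bin/sh
# Sketch of a guarded history rewrite. URL and directories are placeholders.
set -eu

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

rewrite_history() {
  repo_url="$1"
  workdir="$2"
  clone="$workdir/rewrite"

  # 1. Keep an untouched mirror as a rollback point.
  run git clone --mirror "$repo_url" "$workdir/backup.git"

  # 2. Rewrite in a fresh clone, which filter-repo expects by default.
  run git clone "$repo_url" "$clone"
  run git -C "$clone" filter-repo --path playbooks --path .tmp --invert-paths

  # 3. filter-repo removes the origin remote; re-add it before publishing.
  run git -C "$clone" remote add origin "$repo_url"
  run git -C "$clone" push --force --all origin
  run git -C "$clone" push --force --tags origin
}
```

Calling `DRY_RUN=1 rewrite_history <url> <dir>` prints the six commands without touching anything, which is a convenient way to review the plan with the team before the rewrite window.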
Phase 3: LFS for Necessary Large Files
For files that must remain (images, binaries):
# Install Git LFS
git lfs install
# Track large file types
git lfs track "*.png"
git lfs track "*.jpeg"
git lfs track "*.ipynb"
# Migrate existing files (this rewrites history too, so run it in the Phase 2 window)
git lfs migrate import --include="*.png,*.jpeg,*.ipynb"
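After tracking is configured, it is worth confirming which patterns actually route through LFS. `git lfs track` records each pattern in `.gitattributes` with a `filter=lfs` attribute, so a small helper can read them back. This is a sketch; `lfs_patterns` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: list the patterns that a .gitattributes file sends through Git LFS.
lfs_patterns() {
  awk '{
    # A pattern is LFS-tracked when any attribute on its line is filter=lfs.
    for (i = 2; i <= NF; i++) {
      if ($i == "filter=lfs") { print $1; break }
    }
  }' "${1:-.gitattributes}"
}

# Inside the repository, `git lfs ls-files` additionally lists the files
# that are currently stored as LFS objects.
```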
Expected Results
| Metric | Before | After Phase 1 | After Phase 2 |
|---|---|---|---|
| Repo size | 4.0 GB | ~3.8 GB | ~500 MB |
| Tracked files | 6,388 | ~5,500 | ~5,500 |
| Clone time | ~5 min | ~4 min | ~1 min |
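The figures in the table are estimates. To record real numbers before and after each phase, the packed size reported by `git count-objects -v` can be converted to MiB (the `size-pack` field is in KiB). A minimal sketch, with `pack_size_mib` as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: read `git count-objects -v` output on stdin and print the
# packed repository size in MiB (size-pack is reported in KiB).
pack_size_mib() {
  awk -F': ' '$1 == "size-pack" { printf "%.1f\n", $2 / 1024 }'
}

# Usage inside the repository:
#   git count-objects -v | pack_size_mib
```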
Files to Keep
- Core application code (apps/, config/, infra/)
- Documentation (docs/)
- Tests (tests/)
- Scripts (scripts/, tools/)
- Odoo customizations (odoo/custom/)
Dependencies to Review
Consider converting to git submodules:
# If third_party is needed, use submodules
git submodule add https://github.com/openai/openai-python third_party/openai-python
Gitignore Improvements
Add to .gitignore:
# Build artifacts
.tmp/
*.so
*.dylib
# Large generated files
.ipynb_checkpoints/
playbooks/
# IDE
.cursor/
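Ignore rules only affect untracked files, so anything already committed under these paths stays tracked until it is removed from the index. The sketch below shows one way to untrack such files while leaving them on disk; `untrack_ignored` is a hypothetical helper, and it assumes paths without newlines or other characters that Git would quote in its output.

```shell
#!/bin/sh
# Sketch: stop tracking files that now match an ignore rule, keeping them on disk.
untrack_ignored() {
  # --cached --ignored lists tracked files that match the ignore rules.
  git ls-files --cached --ignored --exclude-standard |
    while IFS= read -r path; do
      git rm --cached --quiet -- "$path"
    done
}

# To see which rule matches a given path:
#   git check-ignore -v .tmp/some-file
```

The removals are staged but not committed, so they can be reviewed with `git status` first.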
Pre-Commit Hook Enhancement
Add file size check to pre-push hook:
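# Reject tracked files larger than 1 MB
large=$(find . -type f -size +1M ! -path './.git/*' | grep -v node_modules | grep -v docker-data |
  while IFS= read -r f; do
    # Report only files that Git actually tracks
    if git ls-files --error-unmatch "$f" >/dev/null 2>&1; then
      echo "ERROR: Large file tracked: $f"
    fi
  done)
if [ -n "$large" ]; then
  echo "$large" >&2
  exit 1   # a non-zero exit makes Git abort the push
fi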
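The check above inspects the working tree. A complementary sketch, suited to a pre-commit hook, inspects the staged blobs instead, so it catches exactly what the next commit would add. The `MAX_BYTES` variable and the `check_staged_sizes` name are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: fail when a staged file exceeds MAX_BYTES (default 1 MiB).
MAX_BYTES="${MAX_BYTES:-1048576}"

check_staged_sizes() {
  # Added, copied, or modified paths in the index; assumes no newlines in paths.
  offenders=$(
    git diff --cached --name-only --diff-filter=ACM |
      while IFS= read -r path; do
        size=$(git cat-file -s ":$path")   # size of the staged blob
        if [ "$size" -gt "$MAX_BYTES" ]; then
          echo "$path ($size bytes)"
        fi
      done
  )
  if [ -n "$offenders" ]; then
    echo "ERROR: staged files exceed $MAX_BYTES bytes:" >&2
    echo "$offenders" >&2
    return 1
  fi
}
```

Calling `check_staged_sizes || exit 1` from `.git/hooks/pre-commit` would block the commit when any staged file is over the limit.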
Execution Timeline
- Week 1: Remove playbooks/ directory, update documentation
- Week 2: Coordinate with team, schedule history rewrite window
- Week 3: Execute git-filter-repo, force push
- Week 4: All collaborators re-clone, verify integrity
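For the Week 4 verification, each collaborator could run a quick check on their new clone. The sketch below assumes that "integrity" means a clean `git fsck` plus no remaining commits that touch the removed paths; `verify_rewritten_clone` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: sanity-check a clone made after the history rewrite.
verify_rewritten_clone() {
  repo_dir="$1"

  # Check object integrity of the clone.
  git -C "$repo_dir" fsck --full >/dev/null 2>&1 || {
    echo "ERROR: git fsck reported problems in $repo_dir" >&2
    return 1
  }

  # Any commit touching the removed paths means pre-rewrite history is still present.
  leftover=$(git -C "$repo_dir" log --all -n 1 --format=%h -- playbooks .tmp)
  if [ -n "$leftover" ]; then
    echo "ERROR: commit $leftover still references playbooks/ or .tmp/" >&2
    return 1
  fi
}
```

A failure of the second check usually indicates an old clone that was pulled rather than re-cloned, which risks pushing the removed history back to the remote.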
Author: Claude Code | Created: 2025-12-19 | Review: TBD