llm-txts for use with your favorite long-context LLM or RAG system

Most files here aim for under 800K tokens, but some particularly large documentation sets still exceed that. Work is ongoing to pare them down for LLM digestibility.

GitHub

Each entry is formatted as "tokens txt_name". Token counts are estimates given in thousands, computed as the txt file's byte length divided by 4.
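As a rough illustration of the byte-length heuristic, here is a minimal Python sketch. The function names are hypothetical, and rounding to the nearest thousand is an assumption; the listing itself may truncate or round differently.

```python
from pathlib import Path


def estimate_kilotokens(num_bytes: int) -> int:
    """Estimate tokens in thousands as byte length / 4 / 1000.

    Rounding to the nearest integer is an assumption, not confirmed by the listing.
    """
    return round(num_bytes / 4 / 1000)


def format_entry(txt_name: str, num_bytes: int) -> str:
    """Build a 'tokens txt_name' line from a file name and its byte length."""
    return f"{estimate_kilotokens(num_bytes)} {txt_name}"


def entry_for_file(path: str) -> str:
    """Same as format_entry, but reads the byte length from a file on disk."""
    p = Path(path)
    return format_entry(p.name, p.stat().st_size)
```

For example, a 3,200,000-byte file named `example.txt` (a made-up name) would be listed as `800 example.txt`. Dividing bytes by 4 is only a heuristic; actual token counts depend on the tokenizer and on the text itself.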

License Acknowledgments