Remember when analyzing financial documents meant drowning in hundreds of pages of dense corporate jargon? Those days are gone. Today, you can process SEC filings faster than a day trader processes coffee, and with significantly better accuracy.
This guide shows you how to analyze SEC filings using Ollama, transforming overwhelming 10-K and 10-Q documents into actionable insights. You'll learn to extract key financial metrics, identify risks, and summarize complex information using local AI models.
What Are SEC Filings and Why Analyze Them?
SEC filings are mandatory financial reports that public companies submit to the Securities and Exchange Commission. These documents contain critical information about company performance, risks, and future outlook.
Key Filing Types
10-K Annual Reports provide comprehensive business overviews, including:
- Financial performance for the fiscal year
- Business operations and strategy
- Risk factors and management discussion
- Audited financial statements
10-Q Quarterly Reports offer interim updates covering:
- Quarterly financial results
- Significant events and changes
- Updated risk assessments
- Unaudited financial statements
The challenge? These documents often exceed 100 pages and contain complex financial terminology that takes hours to analyze manually.
Why Use Ollama for SEC Filings Analysis?
Ollama offers several advantages for financial document processing:
Privacy and Security: Your sensitive financial data stays on your local machine, never reaching external servers.
Cost Efficiency: No API costs or usage limits; analyze unlimited documents once Ollama is installed.
Customization: Fine-tune models for specific financial terminology and analysis requirements.
Speed: Process large documents in minutes rather than hours of manual review.
Setting Up Your Ollama Environment
Prerequisites
Before starting, ensure you have:
- 16GB+ RAM (32GB recommended for large documents)
- Python 3.8 or higher
- Basic command line familiarity
Installing Ollama
Download and install Ollama from the official website:
# For macOS: download the app from https://ollama.ai/download
# (or install via Homebrew: brew install ollama)
# For Linux
curl -fsSL https://ollama.ai/install.sh | sh
# For Windows
# Download the installer from https://ollama.ai/download
Choosing the Right Model
For SEC filings analysis, these models work best:
# Install recommended models
ollama pull llama3.1:8b    # Good balance of speed and accuracy
ollama pull llama3.1:70b   # Stronger reasoning for complex financial analysis (needs far more RAM)
ollama pull codellama:7b   # Well suited to structured data extraction
Model Selection Guidelines:
- llama3.1:8b: Best for quick summaries and basic analysis
- llama3.1:70b: Ideal for detailed financial interpretation if your hardware can run it (Llama 3.1 ships in 8b, 70b, and 405b sizes; there is no 13b tag)
- codellama:7b: Useful for extracting structured financial data
Essential Python Libraries for SEC Analysis
Install the required dependencies:
pip install requests beautifulsoup4 pandas numpy python-dotenv
pip install langchain langchain-community
pip install pypdf2 textract
Create your project structure:
sec-analysis/
+-- data/
|   +-- raw/
|   +-- processed/
+-- src/
|   +-- downloader.py
|   +-- processor.py
|   +-- analyzer.py
+-- outputs/
+-- config.py
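The layout above can be created with a short helper; `scaffold_project` is a hypothetical name used only for this sketch, not one of the guide's modules:

```python
# Hypothetical helper to scaffold the sec-analysis layout shown above.
from pathlib import Path

def scaffold_project(base: str = "sec-analysis") -> Path:
    """Create the directory layout used throughout this guide."""
    root = Path(base)
    for sub in ("data/raw", "data/processed", "src", "outputs"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    for stub in ("src/downloader.py", "src/processor.py",
                 "src/analyzer.py", "config.py"):
        (root / stub).touch(exist_ok=True)
    return root
```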
Step 1: Downloading SEC Filings
SEC EDGAR API Integration
Create a downloader module to fetch filings:
# src/downloader.py
import requests
import time
from pathlib import Path

class SECDownloader:
    def __init__(self, user_agent="YourCompany analysis@yourcompany.com"):
        self.base_url = "https://data.sec.gov/api/xbrl/companyfacts"
        self.headers = {"User-Agent": user_agent}
        self.session = requests.Session()
        self.session.headers.update(self.headers)

    def get_company_cik(self, ticker):
        """Get CIK number from ticker symbol"""
        url = "https://www.sec.gov/files/company_tickers.json"
        response = self.session.get(url)
        companies = response.json()
        for company in companies.values():
            if company['ticker'] == ticker.upper():
                return str(company['cik_str']).zfill(10)
        return None

    def download_filing(self, cik, filing_type="10-K", count=1):
        """Download recent filings for a company"""
        url = f"https://data.sec.gov/submissions/CIK{cik}.json"
        try:
            response = self.session.get(url)
            response.raise_for_status()
            data = response.json()
            # Filter for the requested filing type
            filings = data['filings']['recent']
            filing_urls = []
            for i, form in enumerate(filings['form']):
                if form == filing_type and len(filing_urls) < count:
                    accession = filings['accessionNumber'][i]
                    # Archive paths use the unpadded CIK
                    filing_url = f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{accession.replace('-', '')}/{accession}.txt"
                    filing_urls.append({
                        'url': filing_url,
                        'date': filings['filingDate'][i],
                        'accession': accession
                    })
            return filing_urls
        except requests.RequestException as e:
            print(f"Error downloading filing: {e}")
            return []

    def save_filing(self, filing_info, save_path):
        """Save filing content to file"""
        try:
            time.sleep(0.1)  # Stay under SEC's 10 requests/second fair-access limit
            response = self.session.get(filing_info['url'])
            response.raise_for_status()
            Path(save_path).parent.mkdir(parents=True, exist_ok=True)
            with open(save_path, 'w', encoding='utf-8') as f:
                f.write(response.text)
            print(f"Downloaded: {filing_info['accession']}")
            return True
        except Exception as e:
            print(f"Error saving filing: {e}")
            return False

# Usage example
downloader = SECDownloader()
cik = downloader.get_company_cik("AAPL")
filings = downloader.download_filing(cik, "10-K", 1)
for filing in filings:
    downloader.save_filing(filing, f"data/raw/{filing['accession']}.txt")
Step 2: Processing and Parsing SEC Documents
Document Preprocessing
Raw SEC filings contain HTML tags, headers, and formatting that need cleaning:
# src/processor.py
import re
from io import StringIO
from bs4 import BeautifulSoup
import pandas as pd

class SECProcessor:
    def __init__(self):
        self.financial_sections = [
            "CONSOLIDATED STATEMENTS OF OPERATIONS",
            "CONSOLIDATED BALANCE SHEETS",
            "CONSOLIDATED STATEMENTS OF CASH FLOWS",
            "ITEM 1A. RISK FACTORS",
            "ITEM 7. MANAGEMENT'S DISCUSSION AND ANALYSIS"  # Item 2 in a 10-Q
        ]

    def clean_html(self, content):
        """Remove HTML tags and clean text"""
        soup = BeautifulSoup(content, 'html.parser')
        # Remove script and style elements
        for script in soup(["script", "style"]):
            script.decompose()
        # Get text and clean whitespace
        text = soup.get_text()
        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
        text = ' '.join(chunk for chunk in chunks if chunk)
        return text

    def extract_sections(self, content):
        """Extract specific sections from SEC filing"""
        sections = {}
        text = self.clean_html(content)
        # Section patterns use 10-K item numbering;
        # in a 10-Q, MD&A is Part I, Item 2
        patterns = {
            "business_overview": r"ITEM 1\.?\s*BUSINESS(.*?)(?=ITEM 1A|ITEM 2)",
            "risk_factors": r"ITEM 1A\.?\s*RISK FACTORS(.*?)(?=ITEM 1B|ITEM 2)",
            "md_and_a": r"ITEM 7\.?\s*MANAGEMENT'S DISCUSSION AND ANALYSIS(.*?)(?=ITEM 7A|ITEM 8)",
            "financial_statements": r"CONSOLIDATED STATEMENTS OF OPERATIONS(.*?)(?=CONSOLIDATED BALANCE SHEETS|ITEM)"
        }
        for section_name, pattern in patterns.items():
            match = re.search(pattern, text, re.DOTALL | re.IGNORECASE)
            if match:
                sections[section_name] = match.group(1).strip()
        return sections

    def extract_financial_tables(self, content):
        """Extract financial data tables"""
        soup = BeautifulSoup(content, 'html.parser')
        tables = soup.find_all('table')
        financial_data = []
        for table in tables:
            # Look for tables with financial data indicators
            table_text = table.get_text().lower()
            if any(indicator in table_text for indicator in ['revenue', 'net income', 'total assets']):
                try:
                    df = pd.read_html(StringIO(str(table)))[0]
                    financial_data.append(df)
                except Exception:
                    continue
        return financial_data

    def chunk_text(self, text, chunk_size=4000, overlap=200):
        """Split text into overlapping chunks for processing"""
        words = text.split()
        chunks = []
        for i in range(0, len(words), chunk_size - overlap):
            chunk = ' '.join(words[i:i + chunk_size])
            chunks.append(chunk)
            if i + chunk_size >= len(words):
                break
        return chunks

# Usage example
processor = SECProcessor()
with open("data/raw/0000320193-24-000007.txt", 'r', encoding='utf-8') as f:
    content = f.read()
sections = processor.extract_sections(content)
chunks = processor.chunk_text(sections.get('business_overview', ''))
Step 3: Analyzing Documents with Ollama
Setting Up Ollama Client
Create an analysis module that interfaces with Ollama:
# src/analyzer.py
import requests
from typing import Dict, Any

class OllamaAnalyzer:
    def __init__(self, model_name="llama3.1:8b", base_url="http://localhost:11434"):
        self.model_name = model_name
        self.base_url = base_url
        self.session = requests.Session()

    def generate_response(self, prompt: str, context: str = "") -> str:
        """Generate response using Ollama model"""
        full_prompt = f"{context}\n\nPrompt: {prompt}" if context else prompt
        payload = {
            "model": self.model_name,
            "prompt": full_prompt,
            "stream": False,
            "options": {
                "temperature": 0.1,  # Low temperature for consistent financial analysis
                "top_p": 0.9,
                "num_predict": 2000
            }
        }
        try:
            response = self.session.post(
                f"{self.base_url}/api/generate",
                json=payload,
                timeout=300
            )
            response.raise_for_status()
            return response.json()["response"]
        except requests.exceptions.RequestException as e:
            print(f"Error communicating with Ollama: {e}")
            return ""

    def analyze_financial_performance(self, financial_text: str) -> Dict[str, Any]:
        """Analyze financial performance from text"""
        prompt = """
        Analyze the following financial information and provide:
        1. Key financial metrics (revenue, profit margins, growth rates)
        2. Year-over-year comparisons
        3. Performance trends
        4. Notable financial highlights or concerns
        Format your response as structured data with clear sections.
        """
        response = self.generate_response(prompt, financial_text)
        return {"analysis": response, "section": "financial_performance"}

    def extract_risk_factors(self, risk_text: str) -> Dict[str, Any]:
        """Extract and categorize risk factors"""
        prompt = """
        Extract and categorize the main risk factors from this text:
        1. Market risks
        2. Operational risks
        3. Financial risks
        4. Regulatory risks
        5. Technology risks
        For each category, list the top 3 most significant risks with brief explanations.
        """
        response = self.generate_response(prompt, risk_text)
        return {"analysis": response, "section": "risk_factors"}

    def summarize_business_overview(self, business_text: str) -> Dict[str, Any]:
        """Summarize business operations and strategy"""
        prompt = """
        Provide a comprehensive business summary including:
        1. Core business activities and revenue sources
        2. Market position and competitive advantages
        3. Recent strategic initiatives or changes
        4. Future outlook and growth plans
        Keep the summary concise but comprehensive.
        """
        response = self.generate_response(prompt, business_text)
        return {"analysis": response, "section": "business_overview"}

    def compare_quarterly_results(self, current_q: str, previous_q: str) -> Dict[str, Any]:
        """Compare quarterly results between periods"""
        prompt = """
        Compare these two quarterly reports and identify:
        1. Key financial changes (revenue, expenses, profit)
        2. Significant business developments
        3. Changes in risk profile
        4. Management outlook differences
        Highlight the most important changes and their implications.
        """
        context = f"Current Quarter:\n{current_q}\n\nPrevious Quarter:\n{previous_q}"
        response = self.generate_response(prompt, context)
        return {"analysis": response, "section": "quarterly_comparison"}

# Usage example
analyzer = OllamaAnalyzer()

# Analyze different sections
if 'financial_statements' in sections:
    financial_analysis = analyzer.analyze_financial_performance(
        sections['financial_statements']
    )
if 'risk_factors' in sections:
    risk_analysis = analyzer.extract_risk_factors(
        sections['risk_factors']
    )
Step 4: Advanced Analysis Techniques
Sentiment Analysis for Management Discussion
# Add to the OllamaAnalyzer class in src/analyzer.py
def analyze_management_sentiment(self, md_text: str) -> Dict[str, Any]:
    """Analyze sentiment in management discussion"""
    prompt = """
    Analyze the tone and sentiment of this management discussion:
    1. Overall sentiment (positive, negative, neutral)
    2. Confidence level in future performance
    3. Key concerns or optimistic statements
    4. Language indicators of financial stress or strength
    Provide specific examples from the text to support your analysis.
    """
    response = self.generate_response(prompt, md_text)
    return {"analysis": response, "section": "management_sentiment"}
Competitive Analysis Extraction
# Add to the OllamaAnalyzer class in src/analyzer.py
def extract_competitive_insights(self, business_text: str) -> Dict[str, Any]:
    """Extract competitive positioning and market analysis"""
    prompt = """
    Extract competitive intelligence from this business description:
    1. Main competitors mentioned
    2. Market share or positioning claims
    3. Competitive advantages highlighted
    4. Market trends and challenges discussed
    5. Strategic responses to competition
    Focus on actionable competitive insights.
    """
    response = self.generate_response(prompt, business_text)
    return {"analysis": response, "section": "competitive_analysis"}
Step 5: Generating Comprehensive Reports
Report Generation System
# src/report_generator.py
import json
from datetime import datetime
from pathlib import Path
from typing import List, Dict

class ReportGenerator:
    def __init__(self, output_dir="outputs"):
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(exist_ok=True)

    def generate_comprehensive_report(self, analyses: List[Dict], company_info: Dict) -> str:
        """Generate a comprehensive analysis report"""
        report_sections = []
        # Header
        report_sections.append(f"""
# SEC Filing Analysis Report
**Company:** {company_info.get('name', 'Unknown')}
**Ticker:** {company_info.get('ticker', 'Unknown')}
**Filing Date:** {company_info.get('filing_date', 'Unknown')}
**Analysis Date:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

---
""")
        # Executive Summary
        report_sections.append("""
## Executive Summary
""")
        # Process each analysis section
        for analysis in analyses:
            section_title = analysis['section'].replace('_', ' ').title()
            report_sections.append(f"""
## {section_title}

{analysis['analysis']}

---
""")
        # Combine all sections
        full_report = '\n'.join(report_sections)
        # Save report
        filename = f"{company_info.get('ticker', 'company')}_analysis_{datetime.now().strftime('%Y%m%d_%H%M%S')}.md"
        report_path = self.output_dir / filename
        with open(report_path, 'w', encoding='utf-8') as f:
            f.write(full_report)
        return str(report_path)

    def generate_json_summary(self, analyses: List[Dict], company_info: Dict) -> str:
        """Generate JSON summary for programmatic use"""
        summary = {
            "company_info": company_info,
            "analysis_timestamp": datetime.now().isoformat(),
            "sections": analyses
        }
        filename = f"{company_info.get('ticker', 'company')}_summary_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
        summary_path = self.output_dir / filename
        with open(summary_path, 'w', encoding='utf-8') as f:
            json.dump(summary, f, indent=2, ensure_ascii=False)
        return str(summary_path)
Step 6: Complete Analysis Pipeline
Main Analysis Script
# main_analysis.py
import argparse
from src.downloader import SECDownloader
from src.processor import SECProcessor
from src.analyzer import OllamaAnalyzer
from src.report_generator import ReportGenerator

def analyze_company(ticker: str, filing_type: str = "10-K", model: str = "llama3.1:8b"):
    """Complete analysis pipeline for a company"""
    # Initialize components
    downloader = SECDownloader()
    processor = SECProcessor()
    analyzer = OllamaAnalyzer(model_name=model)
    reporter = ReportGenerator()

    print(f"Starting analysis for {ticker}...")

    # Step 1: Download filing
    cik = downloader.get_company_cik(ticker)
    if not cik:
        print(f"Could not find CIK for ticker {ticker}")
        return
    filings = downloader.download_filing(cik, filing_type, 1)
    if not filings:
        print(f"No {filing_type} filings found for {ticker}")
        return

    # Download the most recent filing
    filing_info = filings[0]
    filing_path = f"data/raw/{filing_info['accession']}.txt"
    if not downloader.save_filing(filing_info, filing_path):
        print("Failed to download filing")
        return

    # Step 2: Process document
    print("Processing document...")
    with open(filing_path, 'r', encoding='utf-8') as f:
        content = f.read()
    sections = processor.extract_sections(content)

    # Step 3: Analyze sections
    print("Analyzing sections with Ollama...")
    analyses = []
    if 'business_overview' in sections:
        business_analysis = analyzer.summarize_business_overview(sections['business_overview'])
        analyses.append(business_analysis)
    if 'risk_factors' in sections:
        risk_analysis = analyzer.extract_risk_factors(sections['risk_factors'])
        analyses.append(risk_analysis)
    if 'financial_statements' in sections:
        financial_analysis = analyzer.analyze_financial_performance(sections['financial_statements'])
        analyses.append(financial_analysis)
    if 'md_and_a' in sections:
        sentiment_analysis = analyzer.analyze_management_sentiment(sections['md_and_a'])
        analyses.append(sentiment_analysis)

    # Step 4: Generate reports
    print("Generating reports...")
    company_info = {
        'name': ticker,  # Could be enhanced with a company name lookup
        'ticker': ticker,
        'filing_date': filing_info['date'],
        'filing_type': filing_type
    }
    report_path = reporter.generate_comprehensive_report(analyses, company_info)
    json_path = reporter.generate_json_summary(analyses, company_info)

    print("Analysis complete!")
    print(f"Report saved to: {report_path}")
    print(f"JSON summary saved to: {json_path}")
    return report_path, json_path

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Analyze SEC filings with Ollama')
    parser.add_argument('ticker', help='Company ticker symbol')
    parser.add_argument('--filing-type', default='10-K', choices=['10-K', '10-Q'], help='Filing type to analyze')
    parser.add_argument('--model', default='llama3.1:8b', help='Ollama model to use')
    args = parser.parse_args()
    analyze_company(args.ticker, args.filing_type, args.model)
Step 7: Running Your Analysis
Command Line Usage
# Analyze Apple's latest 10-K
python main_analysis.py AAPL
# Analyze Microsoft's latest 10-Q with a larger model
python main_analysis.py MSFT --filing-type 10-Q --model llama3.1:70b
# Analyze Tesla's annual report
python main_analysis.py TSLA --filing-type 10-K
Sample Output Structure
Your analysis will generate files like:
outputs/
+-- AAPL_analysis_20250709_143022.md
+-- AAPL_summary_20250709_143022.json
+-- MSFT_analysis_20250709_151045.md
+-- MSFT_summary_20250709_151045.json
Advanced Features and Customizations
Custom Analysis Prompts
Tailor prompts for specific analysis needs:
# Industry-specific analysis
def analyze_tech_company(self, business_text: str) -> Dict[str, Any]:
"""Specialized analysis for technology companies"""
prompt = """
Analyze this technology company with focus on:
1. R&D investments and innovation pipeline
2. Software vs hardware revenue mix
3. Platform and ecosystem strategies
4. AI-ML capabilities and implementations
5. Data privacy and security measures
Provide insights relevant to tech industry investors.
"""
response = self.generate_response(prompt, business_text)
return {"analysis": response, "section": "tech_analysis"}
Batch Processing Multiple Companies
def batch_analyze_companies(tickers: List[str], filing_type: str = "10-K"):
    """Analyze multiple companies in batch"""
    results = {}
    for ticker in tickers:
        try:
            print(f"Analyzing {ticker}...")
            report_path, json_path = analyze_company(ticker, filing_type)
            results[ticker] = {
                'status': 'success',
                'report_path': report_path,
                'json_path': json_path
            }
        except Exception as e:
            results[ticker] = {
                'status': 'error',
                'error': str(e)
            }
    return results

# Usage
tech_companies = ["AAPL", "MSFT", "GOOGL", "AMZN"]
results = batch_analyze_companies(tech_companies)
Performance Optimization Tips
Memory Management
For large documents, implement memory-efficient processing:
# Add to the SECProcessor class
def process_large_document(self, file_path: str, chunk_size: int = 2000):
    """Process large documents in chunks to manage memory"""
    analyses = []
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    sections = self.extract_sections(content)
    for section_name, section_content in sections.items():
        chunks = self.chunk_text(section_content, chunk_size)
        chunk_analyses = []
        for chunk in chunks:
            analysis = self.analyze_chunk(chunk, section_name)
            chunk_analyses.append(analysis)
        # Combine chunk analyses
        combined_analysis = self.combine_chunk_analyses(chunk_analyses, section_name)
        analyses.append(combined_analysis)
    return analyses
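The `analyze_chunk` and `combine_chunk_analyses` helpers referenced above are not defined elsewhere in this guide. One minimal way to implement the combining step, assuming each per-chunk analysis is a plain string, is to join the pieces under numbered headers (a final pass through the model to condense the joined text is another option, omitted here to keep the sketch small):

```python
from typing import Any, Dict, List

def combine_chunk_analyses(chunk_analyses: List[str],
                           section_name: str) -> Dict[str, Any]:
    """Merge per-chunk analyses into one section-level result dict,
    matching the {'analysis': ..., 'section': ...} shape used elsewhere."""
    parts = [f"### Chunk {i + 1}\n{text}"
             for i, text in enumerate(chunk_analyses)]
    return {"analysis": "\n\n".join(parts), "section": section_name}
```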
Caching Results
Implement caching to avoid reprocessing:
import hashlib
import pickle
from pathlib import Path
from typing import Any

class AnalysisCache:
    def __init__(self, cache_dir="cache"):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(exist_ok=True)

    def get_cache_key(self, content: str, prompt: str) -> str:
        """Generate cache key from content and prompt"""
        combined = f"{content}{prompt}"
        return hashlib.md5(combined.encode()).hexdigest()

    def get_cached_result(self, cache_key: str):
        """Retrieve cached result if available"""
        cache_file = self.cache_dir / f"{cache_key}.pkl"
        if cache_file.exists():
            with open(cache_file, 'rb') as f:
                return pickle.load(f)
        return None

    def cache_result(self, cache_key: str, result: Any):
        """Cache analysis result"""
        cache_file = self.cache_dir / f"{cache_key}.pkl"
        with open(cache_file, 'wb') as f:
            pickle.dump(result, f)
Troubleshooting Common Issues
Model Loading Problems
# Add to the OllamaAnalyzer class
def check_model_availability(self, model_name: str) -> bool:
    """Check if model is available in Ollama"""
    try:
        response = self.session.get(f"{self.base_url}/api/tags")
        if response.status_code == 200:
            models = response.json()
            available_models = [model['name'] for model in models.get('models', [])]
            return model_name in available_models
        return False
    except Exception:
        return False

# Usage
import os

if not analyzer.check_model_availability("llama3.1:8b"):
    print("Model not found. Installing...")
    os.system("ollama pull llama3.1:8b")
SEC API Rate Limiting
import time

class RateLimitedDownloader(SECDownloader):
    def __init__(self, requests_per_second: float = 10):
        super().__init__()
        self.min_delay = 1.0 / requests_per_second
        self.last_request_time = 0.0

    def _enforce_rate_limit(self):
        """Enforce rate limiting between requests"""
        current_time = time.time()
        time_since_last = current_time - self.last_request_time
        if time_since_last < self.min_delay:
            time.sleep(self.min_delay - time_since_last)
        self.last_request_time = time.time()
Best Practices and Tips
Document Processing Best Practices
- Text Cleaning: Always clean HTML and formatting before analysis
- Section Identification: Use regex patterns to identify key sections accurately
- Chunk Size: Optimize chunk sizes based on your model's context window
- Error Handling: Implement robust error handling for network and parsing issues
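As a rough aid to the chunk-size point, you can back out a word budget from a model's context window. The 0.75 words-per-token ratio below is a heuristic for English prose, not an exact tokenizer count:

```python
# Heuristic word budget for a chunk, given a model's context window.
# The words-per-token ratio is an approximation, not a tokenizer count.
def words_for_context(context_tokens: int, reserve_tokens: int = 1000,
                      words_per_token: float = 0.75) -> int:
    """Reserve room for the prompt and the reply, convert the rest to words."""
    usable = max(context_tokens - reserve_tokens, 0)
    return int(usable * words_per_token)
```

For an 8K-token window, this suggests chunking at roughly 5,000 words after reserving space for the prompt and response.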
Analysis Quality Improvements
- Consistent Prompts: Use consistent, well-tested prompts for reliable results
- Temperature Settings: Use low temperature (0.1-0.3) for factual analysis
- Validation: Cross-reference extracted data with original documents
- Context Preservation: Maintain context when processing document chunks
Security Considerations
- Data Privacy: Ensure sensitive financial data stays local
- Access Control: Implement proper file permissions for cached data
- Audit Trail: Log all analysis activities for compliance
- Input Validation: Validate all user inputs and file paths
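A sketch of the input-validation point, assuming tickers are one to five letters with an optional one-letter class suffix (e.g. BRK.B); both the pattern and the helper name are illustrative:

```python
import re
from pathlib import Path

# Assumed ticker shape: 1-5 letters, optional one-letter class suffix.
TICKER_RE = re.compile(r"[A-Z]{1,5}(\.[A-Z])?")

def safe_output_path(ticker: str, base_dir: str = "outputs") -> Path:
    """Reject malformed tickers and keep output paths inside base_dir."""
    ticker = ticker.strip().upper()
    if not TICKER_RE.fullmatch(ticker):
        raise ValueError(f"Invalid ticker: {ticker!r}")
    base = Path(base_dir).resolve()
    path = (base / f"{ticker}_analysis.md").resolve()
    if base not in path.parents:
        raise ValueError("Output path escapes the output directory")
    return path
```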
Conclusion
Analyzing SEC filings with Ollama transforms a time-consuming manual process into an automated, insightful workflow. You can now process complex financial documents in minutes rather than hours, extract key insights with high accuracy, and generate comprehensive reports that highlight critical business information.
The combination of local AI processing with Ollama ensures your sensitive financial data remains secure while providing powerful analysis capabilities. Whether you're analyzing quarterly reports for investment decisions or conducting comprehensive annual report reviews, this guide provides the foundation for efficient, automated SEC filings analysis.
Start with the basic pipeline and gradually add advanced features like sentiment analysis, competitive intelligence extraction, and batch processing to create a comprehensive financial analysis system tailored to your specific needs.
Ready to streamline your financial analysis workflow? Download Ollama today and begin processing SEC filings with the power of local AI models.