Ever tried organizing your sock drawer while blindfolded? That's how most crypto investors feel when analyzing the 20,000+ cryptocurrencies flooding the market. Good news: Ollama's machine learning models can sort this digital chaos into meaningful market sectors.
Cryptocurrency clustering analysis with Ollama transforms raw blockchain data into actionable market insights. This guide shows you how to classify cryptocurrencies by sector, identify market trends, and make data-driven investment decisions.
You'll learn to:
- Set up Ollama for cryptocurrency analysis
- Implement clustering algorithms for market classification
- Analyze real crypto data with Python
- Interpret results for trading strategies
Why Cryptocurrency Market Classification Matters
The crypto market lacks standardized sector classifications. Traditional finance uses GICS sectors, but crypto operates differently. Without proper classification, investors make decisions based on incomplete market understanding.
Problems with manual crypto classification:
- Subjective categorization bias
- Inconsistent sector definitions
- Time-intensive research process
- Missing emerging sector trends
Benefits of automated clustering analysis:
- Objective data-driven classifications
- Consistent sector definitions
- Real-time market insights
- Identifies new market segments
What is Ollama for Cryptocurrency Analysis?
Ollama runs open-weight language models locally, so cryptocurrency data never leaves your machine and there are no per-request API costs. Paired with classical clustering tools, it lets you combine numerical analysis with natural-language classification.
Key advantages:
- Local processing (no API costs)
- Multiple model options
- Privacy-focused analysis
- Customizable for crypto-specific needs
Prerequisites and Setup
System Requirements
# Minimum requirements
- RAM: 8GB (16GB recommended)
- Storage: 10GB free space
- OS: Linux, macOS, or Windows
- Python: 3.8+
Install Ollama
# Linux/macOS
curl -fsSL https://ollama.com/install.sh | sh
# Windows
# Download from https://ollama.com/download
Install Required Python Libraries
# requirements.txt
ollama==0.2.1
pandas==2.1.0
numpy==1.24.3
scikit-learn==1.3.0
matplotlib==3.7.2
seaborn==0.12.2
requests==2.31.0
plotly==5.15.0
pip install -r requirements.txt
Pull Ollama Models
# Download models for analysis
ollama pull llama2:7b
ollama pull codellama:13b
ollama pull mistral:7b
Data Collection and Preparation
Fetch Cryptocurrency Data
import pandas as pd
import requests
import numpy as np
from datetime import datetime, timedelta

def fetch_crypto_data(limit=100):
    """
    Fetch cryptocurrency data from the CoinGecko API.
    Returns a DataFrame with crypto metrics.
    """
    url = "https://api.coingecko.com/api/v3/coins/markets"
    params = {
        'vs_currency': 'usd',
        'order': 'market_cap_desc',
        'per_page': limit,
        'page': 1,
        'sparkline': False,
        'price_change_percentage': '7d,30d,1y'
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # Fail fast on HTTP errors (e.g. rate limiting)
    data = response.json()

    # Convert to DataFrame and keep only the relevant columns
    df = pd.DataFrame(data)
    columns = [
        'id', 'symbol', 'name', 'current_price', 'market_cap',
        'market_cap_rank', 'fully_diluted_valuation', 'total_volume',
        'high_24h', 'low_24h', 'price_change_24h', 'price_change_percentage_24h',
        'market_cap_change_24h', 'market_cap_change_percentage_24h',
        'circulating_supply', 'total_supply', 'max_supply', 'ath', 'atl'
    ]
    return df[columns].copy()

# Fetch data
crypto_df = fetch_crypto_data(200)
print(f"Loaded {len(crypto_df)} cryptocurrencies")
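CoinGecko's free tier rate-limits aggressively (HTTP 429), so a single unguarded request can fail in practice. Below is a minimal retry sketch with exponential backoff; `backoff_delays` and `call_with_retry` are illustrative helper names, not part of any library.

```python
import time

def backoff_delays(retries=4, base=1.5):
    """Exponential backoff schedule in seconds: base^1, base^2, ..."""
    return [base ** attempt for attempt in range(1, retries + 1)]

def call_with_retry(fn, retries=4, base=1.5, retryable=(429, 500, 502, 503)):
    """Call fn() until it returns a status not in `retryable`, sleeping
    between attempts; returns the last response either way."""
    response = fn()
    for delay in backoff_delays(retries, base):
        if response.status_code not in retryable:
            return response
        time.sleep(delay)
        response = fn()
    return response
```

Usage would look like `call_with_retry(lambda: requests.get(url, params=params, timeout=30))`, wrapping the request inside `fetch_crypto_data`.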
Feature Engineering
def create_features(df):
    """
    Create features for clustering analysis.
    """
    # Handle missing values
    df = df.fillna(0)

    # Calculate additional metrics (guard against division by zero)
    df['volatility_24h'] = (df['high_24h'] - df['low_24h']) / df['current_price']
    df['price_to_ath_ratio'] = df['current_price'] / df['ath'].replace(0, np.nan)
    df['price_to_atl_ratio'] = df['current_price'] / df['atl'].replace(0, np.nan)
    df['supply_ratio'] = df['circulating_supply'] / df['total_supply'].replace(0, np.nan)

    # Log transformation for skewed features
    log_features = ['market_cap', 'total_volume', 'current_price']
    for feature in log_features:
        df[f'log_{feature}'] = np.log1p(df[feature])

    # Rank-based features
    df['market_cap_rank_norm'] = df['market_cap_rank'] / df['market_cap_rank'].max()
    df['volume_rank'] = df['total_volume'].rank(ascending=False)
    df['volume_rank_norm'] = df['volume_rank'] / df['volume_rank'].max()
    return df

# Apply feature engineering
crypto_features = create_features(crypto_df)
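The `log1p` transform matters because market caps span many orders of magnitude: without it, the largest asset dominates every distance computation in clustering. A quick sketch of the effect on synthetic values:

```python
import numpy as np

# Market caps spanning six orders of magnitude
caps = np.array([1e6, 1e8, 1e10, 1e12])
logged = np.log1p(caps)

# The raw spread is a factor of 1,000,000; the logged spread is about 2x,
# so no single asset dominates the distance computations in clustering
print(logged.round(2))          # [13.82 18.42 23.03 27.63]
print(caps.max() / caps.min())  # 1000000.0
print(round(float(logged.max() / logged.min()), 1))  # 2.0
```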
Ollama Model Setup for Crypto Analysis
Initialize Ollama Client
import ollama

class CryptoAnalyzer:
    def __init__(self, model_name="llama2:7b"):
        self.model = model_name
        self.client = ollama.Client()

    def analyze_crypto_description(self, name, symbol):
        """
        Use Ollama to classify a cryptocurrency into a sector.
        """
        prompt = f"""
        Analyze the cryptocurrency {name} ({symbol}) and classify it into one of these sectors:
        - DeFi (Decentralized Finance)
        - Gaming & Metaverse
        - Infrastructure & Layer 1
        - Privacy
        - Payments
        - Stablecoins
        - Meme Coins
        - NFT & Collectibles
        - Exchange Tokens
        - Lending & Borrowing
        Provide only the sector name and a brief reason (max 50 words).
        """
        response = self.client.generate(
            model=self.model,
            prompt=prompt,
            stream=False
        )
        return response['response']

    def get_crypto_sectors(self, crypto_list):
        """
        Classify multiple cryptocurrencies, falling back to "Unknown" on errors.
        """
        results = []
        for name, symbol in crypto_list:
            try:
                classification = self.analyze_crypto_description(name, symbol)
            except Exception as e:
                print(f"Error analyzing {name}: {e}")
                classification = "Unknown"
            results.append({
                'name': name,
                'symbol': symbol,
                'classification': classification
            })
        return results

# Initialize analyzer
analyzer = CryptoAnalyzer()
Implementing Clustering Analysis
Numerical Clustering
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def numerical_clustering(df, n_clusters=8):
    """
    Perform K-means clustering on numerical features.
    """
    # Select numerical features
    feature_columns = [
        'log_market_cap', 'log_total_volume', 'log_current_price',
        'volatility_24h', 'price_change_percentage_24h',
        'market_cap_rank_norm', 'volume_rank_norm',
        'price_to_ath_ratio', 'supply_ratio'
    ]

    # Prepare and scale the data
    X = df[feature_columns].fillna(0)
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # Apply K-means
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    clusters = kmeans.fit_predict(X_scaled)
    df['numerical_cluster'] = clusters

    # PCA for visualization
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(X_scaled)
    df['pca_1'] = X_pca[:, 0]
    df['pca_2'] = X_pca[:, 1]
    return df, kmeans, scaler

# Apply numerical clustering
clustered_df, kmeans_model, scaler = numerical_clustering(crypto_features)
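The 2-D PCA coordinates are only a faithful summary if the first two components capture most of the variance; otherwise the scatter plot can hide cluster structure. A minimal check on synthetic data (the rank-2 latent structure below is an assumption chosen for illustration, standing in for the real scaled feature matrix):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic stand-in: 200 assets, 9 features, with most variance
# concentrated in two latent directions plus a little noise
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 9))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 9))
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2).fit(X_scaled)
coverage = pca.explained_variance_ratio_.sum()
print(f"Variance captured by 2 components: {coverage:.1%}")
```

On real crypto features the coverage is typically lower; if it falls much below ~70%, treat the PCA scatter as a rough sketch rather than a map.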
Hybrid Clustering with Ollama
def hybrid_clustering(df, analyzer):
    """
    Combine numerical clustering with Ollama semantic analysis.
    """
    # Get the top 50 cryptocurrencies for semantic analysis
    top_cryptos = df.head(50)
    crypto_list = [(row['name'], row['symbol']) for _, row in top_cryptos.iterrows()]

    # Analyze with Ollama
    semantic_results = analyzer.get_crypto_sectors(crypto_list)
    semantic_df = pd.DataFrame(semantic_results)

    # Extract the sector keyword from the free-text classification
    def extract_sector(text):
        sectors = [
            'DeFi', 'Gaming', 'Infrastructure', 'Privacy', 'Payments',
            'Stablecoins', 'Meme', 'NFT', 'Exchange', 'Lending'
        ]
        for sector in sectors:
            if sector.lower() in text.lower():
                return sector
        return 'Other'

    semantic_df['sector'] = semantic_df['classification'].apply(extract_sector)

    # Merge with the main dataframe
    df = df.merge(semantic_df[['name', 'sector']], on='name', how='left')
    df['sector'] = df['sector'].fillna('Unclassified')
    return df

# Apply hybrid clustering
final_df = hybrid_clustering(clustered_df, analyzer)
Visualization and Analysis
Cluster Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

def visualize_clusters(df):
    """
    Create comprehensive cluster visualizations.
    """
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))

    # 1. PCA scatter plot
    scatter = axes[0, 0].scatter(
        df['pca_1'], df['pca_2'],
        c=df['numerical_cluster'],
        cmap='viridis',
        alpha=0.6
    )
    axes[0, 0].set_xlabel('PCA Component 1')
    axes[0, 0].set_ylabel('PCA Component 2')
    axes[0, 0].set_title('Numerical Clusters (PCA)')
    plt.colorbar(scatter, ax=axes[0, 0])

    # 2. Market cap vs volume
    axes[0, 1].scatter(
        df['log_market_cap'], df['log_total_volume'],
        c=df['numerical_cluster'],
        cmap='viridis',
        alpha=0.6
    )
    axes[0, 1].set_xlabel('Log Market Cap')
    axes[0, 1].set_ylabel('Log Total Volume')
    axes[0, 1].set_title('Market Cap vs Volume')

    # 3. Sector distribution
    sector_counts = df['sector'].value_counts()
    axes[1, 0].pie(sector_counts.values, labels=sector_counts.index, autopct='%1.1f%%')
    axes[1, 0].set_title('Sector Distribution')

    # 4. Cluster characteristics
    cluster_stats = df.groupby('numerical_cluster').agg({
        'market_cap': 'mean',
        'total_volume': 'mean',
        'volatility_24h': 'mean'
    }).round(2)
    sns.heatmap(cluster_stats.T, annot=True, ax=axes[1, 1], cmap='Blues')
    axes[1, 1].set_title('Cluster Characteristics')

    plt.tight_layout()
    plt.show()

# Generate visualizations
visualize_clusters(final_df)
Interactive Dashboard
def create_interactive_dashboard(df):
    """
    Create interactive Plotly figures.
    """
    # Market cap vs volume, colored by sector
    fig1 = px.scatter(
        df,
        x='log_market_cap',
        y='log_total_volume',
        color='sector',
        size='volatility_24h',
        hover_data=['name', 'symbol', 'current_price'],
        title='Cryptocurrency Market Analysis'
    )

    # Sector performance
    sector_perf = df.groupby('sector').agg({
        'price_change_percentage_24h': 'mean',
        'market_cap': 'sum'
    }).reset_index()
    fig2 = px.bar(
        sector_perf,
        x='sector',
        y='price_change_percentage_24h',
        title='Average 24h Performance by Sector'
    )

    # Cluster analysis
    fig3 = px.scatter(
        df,
        x='pca_1',
        y='pca_2',
        color='numerical_cluster',
        hover_data=['name', 'sector'],
        title='Numerical Clustering Results'
    )
    return fig1, fig2, fig3

# Create dashboard
dash_figs = create_interactive_dashboard(final_df)
Interpreting Results
Cluster Analysis
def analyze_clusters(df):
    """
    Analyze cluster characteristics and patterns.
    """
    # Numerical cluster analysis
    cluster_summary = df.groupby('numerical_cluster').agg({
        'market_cap': ['mean', 'std', 'count'],
        'total_volume': ['mean', 'std'],
        'volatility_24h': ['mean', 'std'],
        'price_change_percentage_24h': ['mean', 'std']
    }).round(2)
    print("Numerical Cluster Summary:")
    print(cluster_summary)

    # Sector analysis
    sector_summary = df.groupby('sector').agg({
        'market_cap': ['mean', 'count'],
        'volatility_24h': 'mean',
        'price_change_percentage_24h': 'mean'
    }).round(2)
    print("\nSector Analysis:")
    print(sector_summary)

    # Cross-tabulate numerical clusters against semantic sectors
    cross_tab = pd.crosstab(df['numerical_cluster'], df['sector'])
    print("\nCluster vs Sector Cross-tabulation:")
    print(cross_tab)

    return cluster_summary, sector_summary, cross_tab

# Analyze results
cluster_analysis = analyze_clusters(final_df)
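A useful follow-up to the raw cross-tabulation is row-normalizing it to measure each cluster's "sector purity": how strongly the numerical clusters agree with Ollama's semantic labels. A toy sketch with made-up labels standing in for the real columns:

```python
import pandas as pd

# Toy labels standing in for the real cluster/sector columns
toy = pd.DataFrame({
    'numerical_cluster': [0, 0, 0, 1, 1, 2],
    'sector': ['DeFi', 'DeFi', 'Meme', 'Payments', 'Payments', 'DeFi'],
})
purity = pd.crosstab(toy['numerical_cluster'], toy['sector'], normalize='index')
# Each row sums to 1; the max entry per row is that cluster's dominant sector
print(purity.round(2))
print(purity.max(axis=1).round(2))  # cluster 0: 0.67, clusters 1 and 2: 1.0
```

Clusters with low purity are the interesting cases: assets that trade like one group numerically but belong to another sector semantically.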
Investment Insights
def generate_investment_insights(df):
    """
    Generate actionable investment insights.
    """
    insights = []

    # Top-performing sectors over the last 24h
    sector_growth = df.groupby('sector')['price_change_percentage_24h'].mean().sort_values(ascending=False)
    top_sectors = sector_growth.head(3)
    insights.append(f"Top performing sectors: {', '.join(top_sectors.index)}")

    # Potentially undervalued assets: well-ranked coins trading far below their ATH
    undervalued = df[
        (df['price_to_ath_ratio'] < 0.3) &
        (df['market_cap_rank'] <= 100)
    ].sort_values('market_cap_rank')
    if len(undervalued) > 0:
        insights.append(f"Potential undervalued assets: {', '.join(undervalued['name'].head(5))}")

    # High-volume, low-volatility options
    stable_high_volume = df[
        (df['volatility_24h'] < df['volatility_24h'].quantile(0.3)) &
        (df['volume_rank'] <= 50)
    ]
    if len(stable_high_volume) > 0:
        insights.append(f"Stable high-volume assets: {', '.join(stable_high_volume['name'].head(3))}")

    return insights

# Generate insights
investment_insights = generate_investment_insights(final_df)
for insight in investment_insights:
    print(f"💡 {insight}")
Advanced Applications
Real-time Monitoring
import time
from datetime import datetime

class CryptoMonitor:
    def __init__(self, analyzer, update_interval=300):
        self.analyzer = analyzer
        self.update_interval = update_interval  # 5 minutes
        self.historical_data = []

    def monitor_clusters(self):
        """
        Monitor cluster changes in real time.
        """
        while True:
            try:
                # Fetch and re-cluster fresh data
                current_data = fetch_crypto_data(100)
                current_features = create_features(current_data)
                clustered_data, _, _ = numerical_clustering(current_features)

                # Store with timestamp
                clustered_data['timestamp'] = datetime.now()
                self.historical_data.append(clustered_data)

                # Analyze changes
                if len(self.historical_data) > 1:
                    self.detect_cluster_changes()

                print(f"Updated at {datetime.now()}")
                time.sleep(self.update_interval)
            except Exception as e:
                print(f"Monitor error: {e}")
                time.sleep(60)  # Wait 1 minute on error

    def detect_cluster_changes(self):
        """
        Detect cluster reassignments between the last two snapshots.

        Caveat: K-means labels are arbitrary integers, so a label change can
        reflect relabeling between fits rather than a real market shift;
        matching centroids across runs would make this comparison robust.
        """
        current = self.historical_data[-1]
        previous = self.historical_data[-2]

        # Compare cluster assignments by asset name
        merged = current.merge(
            previous[['name', 'numerical_cluster']],
            on='name',
            suffixes=('_current', '_previous')
        )
        changed = merged[merged['numerical_cluster_current'] != merged['numerical_cluster_previous']]
        if len(changed) > 0:
            print(f"🔄 {len(changed)} assets changed clusters:")
            for _, row in changed.iterrows():
                print(f"  {row['name']}: {row['numerical_cluster_previous']} → {row['numerical_cluster_current']}")

# Initialize monitor (uncomment to run)
# monitor = CryptoMonitor(analyzer)
# monitor.monitor_clusters()
Portfolio Optimization
def optimize_portfolio(df, risk_tolerance='medium'):
    """
    Create an optimized portfolio based on cluster analysis.
    """
    # Define risk profiles
    risk_profiles = {
        'low': {'volatility_max': 0.1, 'min_market_cap': 1e9},
        'medium': {'volatility_max': 0.3, 'min_market_cap': 1e8},
        'high': {'volatility_max': 1.0, 'min_market_cap': 1e6}
    }
    profile = risk_profiles[risk_tolerance]

    # Filter based on the risk profile
    filtered_df = df[
        (df['volatility_24h'] <= profile['volatility_max']) &
        (df['market_cap'] >= profile['min_market_cap'])
    ]

    # Select the largest asset from each classified sector
    portfolio = []
    for sector in filtered_df['sector'].unique():
        if sector != 'Unclassified':
            sector_assets = filtered_df[filtered_df['sector'] == sector]
            top_asset = sector_assets.loc[sector_assets['market_cap'].idxmax()]
            portfolio.append(top_asset)
    portfolio_df = pd.DataFrame(portfolio)

    # Market-cap weights and simple portfolio metrics
    # (weighted-sum volatility is a simplification that ignores correlations)
    total_market_cap = portfolio_df['market_cap'].sum()
    portfolio_df['weight'] = portfolio_df['market_cap'] / total_market_cap
    expected_return = (portfolio_df['price_change_percentage_24h'] * portfolio_df['weight']).sum()
    portfolio_volatility = (portfolio_df['volatility_24h'] * portfolio_df['weight']).sum()

    print(f"Portfolio Return (24h): {expected_return:.2f}%")
    print(f"Portfolio Volatility: {portfolio_volatility:.2f}")
    print(f"Assets: {len(portfolio_df)}")
    return portfolio_df[['name', 'symbol', 'sector', 'weight', 'current_price']]

# Create optimized portfolio
portfolio = optimize_portfolio(final_df, 'medium')
print("\nOptimized Portfolio:")
print(portfolio)
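One weakness of pure market-cap weighting is that a single mega-cap asset can dominate the portfolio. A common tweak is to cap individual weights and redistribute the excess pro rata; `cap_weights` below is an illustrative helper (not a standard API), sketched under the assumption that weights are a pandas Series summing to roughly 1.

```python
import pandas as pd

def cap_weights(weights, cap=0.25):
    """Cap any weight above `cap` and redistribute the excess
    proportionally among the still-uncapped assets, repeating as needed."""
    w = weights / weights.sum()
    for _ in range(len(w)):
        over = w > cap
        if not over.any():
            break
        excess = (w[over] - cap).sum()
        w[over] = cap
        free = w < cap
        if not free.any():
            break  # everything is at the cap; nowhere to redistribute
        w[free] = w[free] + excess * w[free] / w[free].sum()
    return w

raw = pd.Series({'BTC': 0.70, 'ETH': 0.20, 'SOL': 0.07, 'LINK': 0.03})
print(cap_weights(raw, cap=0.40).round(3))
# BTC is capped at 0.40; ETH, SOL, LINK absorb the excess pro rata
```

In `optimize_portfolio`, this would replace the plain `market_cap / total_market_cap` weight line.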
Best Practices and Tips
Performance Optimization
# Optimize Ollama performance:
# - Prefer smaller models (e.g. llama2:7b, mistral:7b) for faster processing.
# - Classify in batches and cache results so repeated runs skip known assets.

def batch_analysis(crypto_list, batch_size=10):
    """Classify cryptocurrencies in batches to limit memory pressure."""
    results = []
    for i in range(0, len(crypto_list), batch_size):
        batch = crypto_list[i:i + batch_size]
        results.extend(analyzer.get_crypto_sectors(batch))
    return results

# Cache classification results between runs
import json

def cache_analysis(results, filename='crypto_analysis_cache.json'):
    with open(filename, 'w') as f:
        json.dump(results, f)

def load_cache(filename='crypto_analysis_cache.json'):
    try:
        with open(filename, 'r') as f:
            return json.load(f)
    except FileNotFoundError:
        return []
Error Handling
import time

def robust_analysis(df, analyzer, max_retries=3):
    """
    Run hybrid clustering with retries and a numerical-only fallback.
    """
    for attempt in range(max_retries):
        try:
            return hybrid_clustering(df, analyzer)
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                time.sleep(5)  # Wait before retry
            else:
                # Fall back to numerical clustering only
                print("Falling back to numerical clustering only")
                result, _, _ = numerical_clustering(df)
                result['sector'] = 'Unclassified'
                return result
Troubleshooting Common Issues
Ollama Connection Issues
def test_ollama_connection():
    """
    Test the Ollama connection and list available models.
    """
    try:
        client = ollama.Client()
        models = client.list()
        print(f"Available models: {[m['name'] for m in models['models']]}")

        # Test generation
        response = client.generate(
            model='llama2:7b',
            prompt='Test prompt',
            stream=False
        )
        print("Connection successful!")
        return True
    except Exception as e:
        print(f"Connection failed: {e}")
        return False

# Test connection
if not test_ollama_connection():
    print("Please check Ollama installation and model availability")
Memory Management
def optimize_memory_usage(df, max_rows=1000):
    """
    Optimize memory usage for large datasets.
    """
    total = len(df)
    if total > max_rows:
        # Sample down to a manageable size
        df = df.sample(n=max_rows, random_state=42)
        print(f"Sampled {max_rows} rows from {total} total")

    # Downcast numeric columns to roughly halve memory
    for col in df.select_dtypes(include=['float64']).columns:
        df[col] = df[col].astype('float32')
    for col in df.select_dtypes(include=['int64']).columns:
        df[col] = df[col].astype('int32')
    return df
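To see what the float64-to-float32 downcast actually buys, measure memory before and after on a synthetic frame:

```python
import numpy as np
import pandas as pd

# Measure the effect of downcasting float64 columns to float32
df = pd.DataFrame(np.random.rand(10_000, 5), columns=list('abcde'))
before = df.memory_usage(deep=True).sum()
for col in df.select_dtypes(include=['float64']).columns:
    df[col] = df[col].astype('float32')
after = df.memory_usage(deep=True).sum()
print(f"{before / 1024:.0f} KiB -> {after / 1024:.0f} KiB")  # roughly halved
```

The trade-off is precision: float32 carries about 7 significant digits, which is ample for the engineered ratio and log features here but can bite on raw prices of very-low-value tokens.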
Results and Performance Metrics
Model Validation
from sklearn.metrics import silhouette_score, calinski_harabasz_score

def validate_clustering(df, feature_columns):
    """
    Validate clustering quality across candidate cluster counts.
    """
    X = df[feature_columns].fillna(0)
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # Test different cluster numbers
    cluster_range = range(2, 11)
    silhouette_scores = []
    ch_scores = []
    for n_clusters in cluster_range:
        kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
        labels = kmeans.fit_predict(X_scaled)
        silhouette_scores.append(silhouette_score(X_scaled, labels))
        ch_scores.append(calinski_harabasz_score(X_scaled, labels))

    # Pick the cluster count with the best silhouette score
    optimal_k = cluster_range[np.argmax(silhouette_scores)]
    print(f"Optimal number of clusters: {optimal_k}")
    print(f"Best silhouette score: {max(silhouette_scores):.3f}")
    return optimal_k, silhouette_scores, ch_scores

# Validate clustering
feature_cols = ['log_market_cap', 'log_total_volume', 'volatility_24h', 'price_change_percentage_24h']
optimal_k, sil_scores, ch_scores = validate_clustering(final_df, feature_cols)
Conclusion
Cryptocurrency clustering analysis with Ollama provides a powerful framework for understanding crypto market dynamics. By combining numerical analysis with semantic classification, you can identify investment opportunities, track market trends, and build optimized portfolios.
Key takeaways:
- Ollama enables local, privacy-focused crypto analysis
- Hybrid clustering combines quantitative and qualitative insights
- Real-time monitoring helps capture market changes
- Proper validation ensures reliable results
Next steps:
- Implement automated rebalancing based on cluster changes
- Integrate with trading APIs for live portfolio management
- Expand analysis to include social sentiment data
- Build custom Ollama models for crypto-specific tasks
Start your cryptocurrency clustering analysis today and transform market chaos into actionable intelligence.