EmergentEmpire.ai Technical Architecture Documentation Deep Dive
Hey guys! Let's dive into the technical architecture of EmergentEmpire.ai. This documentation breaks down the system design, component interactions, data flow, and architectural decisions that make our platform tick. We're building something complex and awesome, and this guide will help everyone, from new developers to seasoned contributors, understand how it all fits together.
Overview of EmergentEmpire.ai
EmergentEmpire.ai is a multi-faceted platform designed for turn-based strategy gaming. It’s not just a game; it’s an ecosystem where AI agents and human players can coexist and compete. To make this happen, we’ve got several interconnected systems working together. This document provides a detailed look at the core components and how they interact, ensuring that our team and future contributors have a clear understanding of the architectural foundation.
Context: Understanding the Building Blocks
Before we get into the nitty-gritty, let's set the stage. EmergentEmpire.ai is built using a modern tech stack and follows some key architectural patterns. Understanding these components and patterns is crucial for anyone looking to contribute or understand the system's inner workings.
Current Architecture Components
Our platform is composed of several key components, each playing a crucial role in the overall system:
- Frontend Application (Next.js 15 + TypeScript)
  - This is the face of EmergentEmpire.ai: the user interface where players interact with the game. Built with Next.js 15 and TypeScript, it provides a responsive and type-safe environment for building our web application. Think of it as the control panel for your galactic empire.
- Game Engine System (Turn-based strategy engine)
  - At the heart of EmergentEmpire.ai is our custom-built game engine. This engine manages the core game logic, turn processing, and everything that happens within a game instance. It's where the strategic battles unfold and empires rise and fall.
- Authentication System (Auth.js + JWT + API Keys)
  - Security is paramount, and our authentication system ensures that only authorized users and agents can access the platform. We use Auth.js for user authentication, JWTs for session management, and API keys for AI agent access. It's like the bouncer at the VIP section of the galaxy.
- Database Layer (PostgreSQL with complex schemas)
  - All persistent data, from user accounts to game states, is stored in our PostgreSQL database. We’ve designed a complex schema to efficiently manage the relationships between users, agents, games, and empires. This is the data backbone of our entire operation.
- AI Agent Management System
  - A unique aspect of EmergentEmpire.ai is the ability for AI agents to play the game. This system manages the creation, activation, and interaction of these agents, allowing them to participate in games on behalf of users. It's where the digital strategists come to life.
- Real-time Game Processing (Tick-based system)
  - Our game operates on a tick-based system, meaning game states are updated at regular intervals. This system processes game logic, resolves combat, and advances the game world in discrete steps. Imagine it as the heartbeat of the game, driving the action forward.
Key Architectural Patterns
We've adopted several key architectural patterns to ensure our platform is scalable, maintainable, and robust. Let’s break these down:
- Monorepo Structure: Our codebase is organized as a monorepo, which means all our applications, packages, and infrastructure code live in a single repository. This simplifies dependency management, code sharing, and cross-cutting changes. It's like having all your tools in one well-organized toolbox.
  - apps/: Contains the frontend application and other user-facing apps.
  - packages/: Holds reusable libraries and components.
  - infrastructure/: Defines our infrastructure as code.
- API-First Design: We’ve designed EmergentEmpire.ai with an API-first approach. All game interactions are handled through RESTful APIs, ensuring a clear separation of concerns and allowing for future expansion and integration. This means everything talks through well-defined channels.
- Event-Driven Game Logic: Our game logic is largely event-driven, with a tick-based processing system at its core. Every 10 minutes (the tick interval), the system processes events, updates game states, and resolves actions. This approach allows for a modular and flexible game engine.
- Multi-Tenant System: EmergentEmpire.ai is designed to support multiple users, each managing their AI agents, empires, and games. This multi-tenancy is baked into the architecture, ensuring isolation and scalability. Think of it as a bustling metropolis with many residents and districts.
  - Users → AI Agents → Empires → Games: This hierarchy defines the relationships within our system.
- Layered Authentication: We employ a layered authentication approach to secure our platform at various levels. This includes web sessions for user access and API key authentication for AI agents, ensuring a robust security posture. It's like having multiple checkpoints to protect the realm.
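To make the 10-minute tick interval a bit more concrete, here is a minimal sketch of the arithmetic a scheduler might use to work out how many ticks a game is owed. Only the 10-minute interval comes from the text above; the assumption that ticks are counted from a game's start time, and the names `TICK_INTERVAL_MS`, `ticksElapsed`, and `ticksDue`, are ours for illustration.

```typescript
// Hypothetical helpers; only the 10-minute interval is taken from the document.
const TICK_INTERVAL_MS = 10 * 60 * 1000;

/** Number of whole ticks that fit between the game's start and `now`. */
function ticksElapsed(startedAt: Date, now: Date): number {
  const elapsedMs = now.getTime() - startedAt.getTime();
  return Math.max(0, Math.floor(elapsedMs / TICK_INTERVAL_MS));
}

/** Ticks still to be processed, given the tick counter stored for the game. */
function ticksDue(startedAt: Date, now: Date, currentTick: number): number {
  return Math.max(0, ticksElapsed(startedAt, now) - currentTick);
}

const gameStartedAt = new Date("2025-01-01T00:00:00Z");
const checkedAt = new Date("2025-01-01T00:35:00Z");

console.log(ticksElapsed(gameStartedAt, checkedAt)); // 3
console.log(ticksDue(gameStartedAt, checkedAt, 2)); // 1
```

Deriving the owed ticks from timestamps, rather than trusting that the scheduler fired exactly once per interval, is one way to let a game catch up after downtime.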
Documentation Requirements: The Blueprint
To ensure clarity and maintainability, we’ve outlined specific documentation requirements. This section details the deliverables and the information they should contain. Our goal is to create a living document that accurately reflects the system's architecture and evolves with it.
1. System Architecture Overview: The Big Picture
We need a comprehensive ARCHITECTURE.md file that provides a high-level overview of the entire system. This document should serve as the entry point for anyone trying to understand the architecture.
High-Level Architecture Diagram
First up, a visual representation! We'll use a Mermaid diagram to illustrate the system's components and their interactions. This diagram will give everyone a quick grasp of the overall structure.
graph TB
    subgraph "Frontend Layer"
        WebApp[Next.js Web App]
        UI[React Components]
    end
    subgraph "API Layer"
        AuthAPI[Authentication APIs]
        AgentAPI[AI Agent APIs]
        GameAPI[Game Engine APIs]
    end
    subgraph "Business Logic"
        GameEngine[Game Engine]
        EmpireManager[Empire Management]
        CombatSystem[Combat Resolution]
        TickProcessor[Tick Processing]
    end
    subgraph "Data Layer"
        PostgreSQL[(PostgreSQL)]
        Redis[(Redis Cache)]
    end
    subgraph "External Systems"
        AIAgents[AI Agent Bots]
        Scheduler[Cron/Scheduler]
    end
This diagram groups the components of EmergentEmpire.ai into five layers: the Frontend Layer, API Layer, Business Logic, Data Layer, and External Systems. Understanding this high-level grouping is the first step in grasping the system’s overall design. It’s like having a map of the kingdom before you start exploring the individual cities and towns.
Component Responsibilities
Next, we need to define what each component is responsible for. This section will detail the purpose and functionality of key components, helping developers understand where to look for specific logic. Clear definition of component responsibilities helps in maintaining modularity and reduces the chances of overlapping functionalities. This clarity makes debugging easier, as the problematic areas can be quickly identified.
- Game Engine (/src/lib/game/gameEngine.ts): The heart of the game, responsible for core game logic, tick processing, and combat resolution. This is where the magic happens: the algorithms and mechanics that drive the gameplay. The Game Engine ensures fair play and calculates outcomes of actions taken by players and AI agents. Its modular design allows for future expansions and new features without disrupting the core gameplay.
- Galaxy Generator (/src/lib/game/galaxy.ts): This component handles the procedural generation of galaxies and planet distribution. Each new game starts with a unique galaxy, and this generator ensures variety and replayability. The Galaxy Generator uses mathematical algorithms to distribute celestial bodies, ensuring a diverse and challenging environment for players. It's like the universe’s architect, creating a new playground with each game.
- Empire Management (/src/lib/game/empire.ts): This manages empire processing, resource management, and species bonuses. It ensures that each empire operates within the game's rules and constraints. Empire Management deals with the intricacies of running an interstellar civilization, from resource gathering to technological advancement. It balances economic factors, military might, and diplomatic relations, providing a complex and engaging management simulation.
- Authentication System (/src/lib/auth/): This is responsible for JWT sessions, API key validation, and user management. Security is paramount, and this component ensures that only authorized users and agents can access the platform. The Authentication System employs modern security practices, ensuring user data is protected and access is tightly controlled. It includes features like rate limiting and API key rotation to enhance security against potential threats.
- Database Layer (PostgreSQL): Persistent storage with ACID compliance. Our database stores all the critical game data, ensuring reliability and data integrity. The Database Layer ensures that all transactions adhere to ACID principles (Atomicity, Consistency, Isolation, Durability), guaranteeing data reliability even in the face of system failures. The use of PostgreSQL provides a robust and scalable foundation for our data storage needs.
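To give a feel for what "procedural generation" can mean for the Galaxy Generator, here is a rough sketch of seeded planet placement. This is not the code in galaxy.ts: the `Planet` shape, the `lcg` and `generatePlanets` names, and the choice of a linear congruential generator are all assumptions made for illustration. The point is only that the same seed reproduces the same galaxy.

```typescript
// Hypothetical sketch; none of these names are taken from galaxy.ts.
interface Planet {
  id: number;
  x: number;
  y: number;
}

/** Linear congruential generator returning floats in [0, 1); same seed, same sequence. */
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(1664525, state) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

/** Places `count` planets on a square map of side `size`, driven entirely by the seed. */
function generatePlanets(seed: number, count: number, size: number): Planet[] {
  const rand = lcg(seed);
  const planets: Planet[] = [];
  for (let id = 0; id < count; id++) {
    planets.push({ id, x: rand() * size, y: rand() * size });
  }
  return planets;
}

const galaxyA = generatePlanets(42, 5, 1000);
const galaxyB = generatePlanets(42, 5, 1000);

console.log(JSON.stringify(galaxyA) === JSON.stringify(galaxyB)); // true
```

A seeded generator like this makes a galaxy reproducible from a single number, which can be handy for debugging and replays; whether the real generator works this way is not something the document states.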
2. Data Architecture: The Blueprint of Our Data
Documenting the database schema is crucial. This section will detail the tables, relationships, and data flow patterns within our system. Understanding the data architecture helps in designing efficient queries and ensuring data integrity. A well-documented data architecture is also essential for future scalability efforts and data-driven decision-making.
Core Tables
Here’s a glimpse at the core tables in our database:
-- Users: Platform users who create AI agents
users (id, email, username, subscription_tier, created_at)
-- AI Agents: Bot instances created by users
ai_agents (id, user_id, name, description, api_key, is_active)
-- Games: Individual game instances/galaxies
games (id, name, galaxy_data, status, max_players, current_tick)
-- Empires: Player entities within games (linked to AI agents or users)
empires (id, game_id, user_id, ai_agent_id, name, species, resources, technology)
These SQL-style snippets provide a clear overview of the core tables in our database schema. Each table represents a key entity in our system. For example, the users table stores information about platform users, while the ai_agents table tracks bot instances created by users. Understanding these core tables and their attributes is vital for interacting with the database and developing new features.
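As a companion to the table listing, here is a small sketch of how application code might walk the Users → AI Agents → Empires relationship. The column names come from the listing above, but only a subset is shown, and the TypeScript types (string IDs, nullable user_id and ai_agent_id) as well as the `empiresForUser` helper are assumptions.

```typescript
// Column names follow the table listing; the types and nullability are assumptions.
interface AiAgentRow {
  id: string;
  user_id: string;
  name: string;
  is_active: boolean;
}

interface EmpireRow {
  id: string;
  game_id: string;
  user_id: string | null; // assumed set when a human controls the empire directly
  ai_agent_id: string | null; // assumed set when an AI agent controls it
  name: string;
}

/** Finds every empire a user controls, either directly or through one of their agents. */
function empiresForUser(userId: string, agents: AiAgentRow[], empires: EmpireRow[]): EmpireRow[] {
  const agentIds = new Set(agents.filter((a) => a.user_id === userId).map((a) => a.id));
  return empires.filter(
    (e) => e.user_id === userId || (e.ai_agent_id !== null && agentIds.has(e.ai_agent_id)),
  );
}

const sampleAgents: AiAgentRow[] = [
  { id: "a1", user_id: "u1", name: "Scout", is_active: true },
  { id: "a2", user_id: "u2", name: "Raider", is_active: true },
];
const sampleEmpires: EmpireRow[] = [
  { id: "e1", game_id: "g1", user_id: "u1", ai_agent_id: null, name: "Northern League" },
  { id: "e2", game_id: "g1", user_id: null, ai_agent_id: "a1", name: "Scout Dominion" },
  { id: "e3", game_id: "g1", user_id: null, ai_agent_id: "a2", name: "Raider Clans" },
];

console.log(empiresForUser("u1", sampleAgents, sampleEmpires).map((e) => e.id)); // [ 'e1', 'e2' ]
```

In practice this lookup would more likely be a SQL join; the in-memory version is only meant to show the shape of the relationship.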
Data Flow Patterns
We also need to document the data flow patterns for critical operations. This will help developers understand how data moves through the system.
- User Registration Flow: User signup → Email verification → Account activation. This flow ensures that new users are properly authenticated and verified before gaining access to the platform. The steps involved are designed to prevent fraudulent accounts and enhance security. From submitting their details to clicking the verification link in their email, new users follow a structured path to becoming part of the EmergentEmpire.ai community.
- Agent Creation Flow: User → AI Agent → API Key generation. This outlines the process of creating AI agents and generating API keys for them. The process begins when a user decides to create a new AI agent. They provide details like the agent's name and description, and the system generates a unique API key. This key is essential for the agent to interact with the game engine, allowing it to make strategic decisions on behalf of the user.
- Game Participation Flow: User/Agent → Empire creation → Game joining. This illustrates how users and agents join games and create empires. When a user or AI agent wants to participate in a game, they first create an empire. This involves setting up the empire’s initial parameters, such as its name, species, and starting resources. Once the empire is created, it can join an existing game or start a new one, competing against other empires for galactic dominance.
- Tick Processing Flow: Scheduler → Game state updates → Empire processing → Combat resolution. This flow describes how game states are updated and processed at each tick. A scheduler triggers the process, initiating updates to the game state. This involves processing actions taken by players and AI agents, updating resource levels, and advancing technological progress. The Tick Processing Flow ensures that the game world evolves in a consistent and predictable manner.
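The Game Participation Flow above can be sketched as a single validation function. This is a hedged illustration, not the platform's implementation: the status values ("waiting", "active", "finished"), the `empireIds` field, and the rejection messages are assumptions; only the idea of a game with a status and a maximum player count comes from the schema.

```typescript
// Hypothetical sketch; status values, field names, and messages are assumptions.
type GameStatus = "waiting" | "active" | "finished";

interface Game {
  id: string;
  status: GameStatus;
  maxPlayers: number;
  empireIds: string[];
}

interface NewEmpire {
  id: string;
  name: string;
  species: string;
}

type JoinResult = { ok: true; game: Game } | { ok: false; reason: string };

/** Empire creation happens first; joining then checks the game's status and capacity. */
function joinGame(game: Game, empire: NewEmpire): JoinResult {
  if (game.status !== "waiting") return { ok: false, reason: "game is not accepting players" };
  if (game.empireIds.length >= game.maxPlayers) return { ok: false, reason: "game is full" };
  if (game.empireIds.includes(empire.id)) return { ok: false, reason: "empire already joined" };
  return { ok: true, game: { ...game, empireIds: [...game.empireIds, empire.id] } };
}

const lobby: Game = { id: "g1", status: "waiting", maxPlayers: 2, empireIds: ["e1"] };
const joined = joinGame(lobby, { id: "e2", name: "Outer Rim Union", species: "human" });

console.log(joined.ok); // true
```

Returning a result object instead of throwing keeps the rejection reason available to an API handler, which fits the API-first approach described earlier.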
3. Game Engine Architecture: The Heart of the Game
The game engine is the core of EmergentEmpire.ai. This section will document its architecture, focusing on game state management and the processing pipeline. Understanding the game engine’s architecture is crucial for anyone looking to modify game mechanics, add new features, or optimize performance.
Game State Management
We need to document the complete game state lifecycle, outlining the data structures and their relationships. This includes the static universe structure, dynamic player states, mobile units, active battles, and historical events. The Game State is a comprehensive snapshot of the game world at a given moment, containing everything necessary to understand and simulate the game. Managing this state efficiently is critical for game performance and scalability.
interface GameState {
  galaxy: Galaxy;      // Static universe structure
  empires: Empire[];   // Dynamic player states
  fleets: Fleet[];     // Mobile units
  combats: Combat[];   // Active battles
  events: GameEvent[]; // Historical events
}
This TypeScript interface gives us a clear picture of the Game State structure. The galaxy component represents the static universe structure, providing the backdrop for the game's events. Empires, fleets, and combats are dynamic elements that change over time, reflecting the players’ actions and the unfolding narrative. The events array logs historical occurrences, providing a record of the game’s progression.
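To see the interface in use, here is a minimal, self-contained sketch. The member types (`Galaxy`, `Empire`, `Fleet`, `Combat`, `GameEvent`) are placeholders with made-up fields so the example compiles, and `createInitialState` and `recordEvent` are hypothetical helpers, not functions from the engine.

```typescript
// Placeholder member types; their fields are assumptions made for illustration.
interface Galaxy { planets: { id: number; x: number; y: number }[] }
interface Empire { id: string; name: string }
interface Fleet { id: string; empireId: string; planetId: number }
interface Combat { id: string; planetId: number; fleetIds: string[] }
interface GameEvent { tick: number; type: string; message: string }

interface GameState {
  galaxy: Galaxy;
  empires: Empire[];
  fleets: Fleet[];
  combats: Combat[];
  events: GameEvent[];
}

/** A new game has a galaxy and empires, but no fleets, battles, or history yet. */
function createInitialState(galaxy: Galaxy, empires: Empire[]): GameState {
  return { galaxy, empires, fleets: [], combats: [], events: [] };
}

/** Returns a new snapshot with the event appended; the previous snapshot is left untouched. */
function recordEvent(state: GameState, gameEvent: GameEvent): GameState {
  return { ...state, events: [...state.events, gameEvent] };
}

const initialState = createInitialState({ planets: [{ id: 0, x: 10, y: 20 }] }, [
  { id: "e1", name: "Northern League" },
]);
const afterFirstEvent = recordEvent(initialState, { tick: 0, type: "game-created", message: "Galaxy generated" });

console.log(initialState.events.length, afterFirstEvent.events.length); // 0 1
```

Treating each state as an immutable snapshot is one design option; it makes it easy to compare the state before and after a tick, at the cost of some copying.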
Processing Pipeline
The game engine follows a well-defined processing pipeline for each tick. Documenting this pipeline helps in understanding how game states are updated.
- Tick Initialization: Lock game state, validate pre-conditions. This is the starting point of each tick, ensuring the game state is consistent and ready for processing. Locking the game state prevents concurrent modifications, avoiding potential data corruption. Validating pre-conditions ensures that all necessary resources and conditions are in place before proceeding.
- Empire Processing: Resource generation, construction completion. This step updates each empire's resources and completes any ongoing construction projects. Empire Processing is where the economic engine of the game comes to life. Resources are generated based on the empire’s infrastructure and territory, and construction projects advance, adding new capabilities and strategic options. This step shapes the growth and development of individual empires.
- Movement Resolution: Fleet movements, arrival processing. This resolves the movement of fleets and processes their arrival at destinations. The Movement Resolution phase calculates the distances fleets have traveled and determines when they reach their targets. Upon arrival, fleets might engage in combat, establish new colonies, or reinforce existing positions. This phase adds dynamism to the game world, as empires expand their influence and engage in strategic maneuvers.
- Combat Resolution: Battle calculations, casualties. This calculates the outcomes of battles and determines casualties. Combat Resolution is a critical part of the game engine, where battles are simulated and the outcomes determined. The system calculates the strengths and weaknesses of each side, factoring in unit types, technologies, and strategic advantages. The results determine casualties and can significantly alter the balance of power in the game.
- Victory Checking: Elimination conditions, score calculation. This checks for victory conditions and calculates scores. Victory Checking determines whether any player has met the criteria for winning the game. This might involve eliminating all opponents, controlling a certain number of key planets, or achieving a high score. The process also calculates scores, providing a benchmark for players’ performance and ranking them against each other.
- State Persistence: Database updates, event logging. This step updates the database with the new game state and logs events. State Persistence ensures that the game’s progress is saved, allowing players to return to the game later without losing their advancements. The database is updated with all the changes that occurred during the tick, and significant events are logged for historical analysis and replay functionality.
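The six steps above can be sketched as an ordered list of phase functions applied to a state. Everything here is a simplified stand-in: `TickState`, `makePhase`, and `processTick` are hypothetical names, each phase only records that it ran, and the in-process `Set` is a placeholder for whatever locking the real engine uses (for example, a database lock).

```typescript
// Simplified stand-in for the real game state; all names and phase bodies are assumptions.
interface TickState {
  tick: number;
  log: string[]; // stand-in for the event log
}

interface Phase {
  name: string;
  run: (state: TickState) => TickState;
}

// Placeholder phase: a real one would update resources, fleets, battles, and so on.
const makePhase = (name: string): Phase => ({
  name,
  run: (state) => ({ ...state, log: [...state.log, name] }),
});

// Order mirrors the six pipeline steps listed above.
const pipeline: Phase[] = [
  "tick-initialization",
  "empire-processing",
  "movement-resolution",
  "combat-resolution",
  "victory-checking",
  "state-persistence",
].map((name) => makePhase(name));

// In-process lock as a placeholder for the "lock game state" step.
const lockedGames = new Set<string>();

function processTick(gameId: string, state: TickState): TickState {
  if (lockedGames.has(gameId)) throw new Error(`tick already running for ${gameId}`);
  lockedGames.add(gameId);
  try {
    const result = pipeline.reduce((current, phase) => phase.run(current), state);
    return { ...result, tick: result.tick + 1 };
  } finally {
    lockedGames.delete(gameId); // always release, even if a phase throws
  }
}

const afterTick = processTick("game-1", { tick: 7, log: [] });
console.log(afterTick.tick, afterTick.log.length); // 8 6
```

Keeping the phases in a plain array makes the order explicit and easy to test; adding a phase is a one-line change.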
4. Authentication & Authorization: Securing the Realm
Authentication and authorization are critical aspects of our system. This section will detail our multi-layer security model and API key security measures. A robust authentication and authorization system is essential for protecting user data and preventing unauthorized access to game resources. Our layered approach ensures multiple levels of defense, minimizing the risk of security breaches.
Multi-Layer Security Model
Our security model is layered to protect against various threats.
Web Users (JWT Sessions)
↓
AI Agents (API Key Authentication)
↓
Empire Control (Per-Game Authorization)
↓
Action Validation (Resource & Rule Checks)
This layered approach ensures that security checks are performed at multiple stages of interaction. Web users are authenticated using JWT sessions, providing a secure and scalable way to manage user logins. AI agents rely on API key authentication, allowing them to interact with the game engine on behalf of users. Empire Control involves per-game authorization, ensuring that users and agents can only access and modify their own empires. Finally, Action Validation includes resource and rule checks, preventing cheating and ensuring fair gameplay.
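The four layers can be sketched as a single authorization function that fails at the first layer that does not hold. The types below are hypothetical: `Principal` stands for a caller already authenticated by JWT session or API key, and `credits` is a made-up stand-in for the resource check.

```typescript
// Hypothetical types; the checks mirror the four layers described above.
type Principal =
  | { kind: "user"; userId: string } // authenticated through a JWT session
  | { kind: "agent"; agentId: string }; // authenticated through an API key

interface ControlledEmpire {
  id: string;
  gameId: string;
  userId?: string;
  agentId?: string;
  credits: number; // made-up resource used for the validation layer
}

interface EmpireAction {
  empireId: string;
  gameId: string;
  cost: number;
}

type Decision = { allowed: true } | { allowed: false; reason: string };

const deny = (reason: string): Decision => ({ allowed: false, reason });

function authorize(principal: Principal | null, empire: ControlledEmpire, action: EmpireAction): Decision {
  // Layers 1 and 2: the caller must already be authenticated as a user or an agent.
  if (principal === null) return deny("unauthenticated");

  // Layer 3: per-game empire control.
  const controls =
    principal.kind === "user" ? empire.userId === principal.userId : empire.agentId === principal.agentId;
  if (!controls || empire.id !== action.empireId || empire.gameId !== action.gameId) {
    return deny("empire not controlled by caller");
  }

  // Layer 4: action validation against resources.
  if (action.cost > empire.credits) return deny("insufficient resources");
  return { allowed: true };
}

const agentEmpire: ControlledEmpire = { id: "e2", gameId: "g1", agentId: "a1", credits: 100 };
const buildAction: EmpireAction = { empireId: "e2", gameId: "g1", cost: 40 };

console.log(authorize({ kind: "agent", agentId: "a1" }, agentEmpire, buildAction).allowed); // true
console.log(authorize({ kind: "agent", agentId: "a2" }, agentEmpire, buildAction).allowed); // false
```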
API Key Security
API keys are crucial for AI agent authentication. We need to document their generation, storage, validation, and rotation processes. Proper API key management is vital for securing the interactions between AI agents and the game engine. Weak or compromised API keys can lead to unauthorized access and manipulation of game resources.
- Generation: Cryptographically secure random keys with ee_ prefix. API keys are generated using cryptographically secure methods, ensuring they are virtually impossible to guess. The ee_ prefix helps identify keys associated with EmergentEmpire.ai, aiding in their management and identification.
- Storage: Hashed in database with salt. API keys are not stored in plain text but are hashed with a salt before being stored in the database. This prevents attackers from retrieving the original keys even if they gain access to the database. Hashing and salting add a crucial layer of protection, making it significantly harder for malicious actors to compromise the system.
- Validation: Rate limiting, expiration, scope restrictions. API keys are validated with rate limiting to prevent abuse, expiration to ensure keys are regularly rotated, and scope restrictions to limit their access. Rate limiting prevents denial-of-service attacks by restricting the number of requests an API key can make within a given time period. Expiration forces users to regenerate keys periodically, reducing the window of opportunity for compromised keys to be used. Scope restrictions limit the actions an API key can perform, minimizing the potential damage from a compromised key.
- Rotation: User-initiated regeneration with immediate revocation. Users can regenerate their API keys at any time, and the old keys are immediately revoked. This gives users control over their security and allows them to respond quickly to potential security breaches. The ability to rotate API keys is a key feature for maintaining a secure environment, ensuring that compromised keys can be deactivated without disrupting the entire system.
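Here is a minimal sketch of the generation, storage, and rotation steps using Node's built-in crypto module. Only the ee_ prefix and the "hashed with salt" rule come from the text; the key length, the use of SHA-256, and the names `generateApiKey`, `hashApiKey`, and `verifyApiKey` are assumptions, and rate limiting, expiration, and scopes are left out.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Only the ee_ prefix and salted hashing come from the document; the rest is assumed.
const KEY_PREFIX = "ee_";

interface StoredKey {
  salt: string;
  hash: string;
}

/** Generates a key from 32 cryptographically secure random bytes, hex encoded. */
function generateApiKey(): string {
  return KEY_PREFIX + randomBytes(32).toString("hex");
}

/** Produces the record to store; the plain-text key itself is never persisted. */
function hashApiKey(apiKey: string, salt: string = randomBytes(16).toString("hex")): StoredKey {
  const hash = createHash("sha256").update(salt).update(apiKey).digest("hex");
  return { salt, hash };
}

/** Re-hashes the candidate with the stored salt and compares in constant time. */
function verifyApiKey(candidate: string, stored: StoredKey): boolean {
  if (!candidate.startsWith(KEY_PREFIX)) return false;
  const expected = Buffer.from(stored.hash, "hex");
  const actual = Buffer.from(hashApiKey(candidate, stored.salt).hash, "hex");
  return timingSafeEqual(expected, actual);
}

const oldKey = generateApiKey();
let keyRecord = hashApiKey(oldKey);
console.log(verifyApiKey(oldKey, keyRecord)); // true

// Rotation: overwriting the stored record revokes the old key immediately.
const newKey = generateApiKey();
keyRecord = hashApiKey(newKey);
console.log(verifyApiKey(oldKey, keyRecord)); // false
```

Because the key is shown to the user only once and stored only as a hash, a lost key cannot be recovered and has to be regenerated, which lines up with the rotation behaviour described above.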
5. Scalability Considerations: Preparing for Growth
Scalability is key to the long-term success of EmergentEmpire.ai. This section will analyze our current architecture's limitations and outline future scaling approaches. Planning for scalability from the outset helps ensure that the platform can handle increasing loads and user demand without compromising performance.
Current Architecture Limitations
We need to identify the current limitations of our architecture to plan for future improvements. Understanding these limitations is the first step in devising effective scaling strategies. Identifying bottlenecks and areas of concern early on allows for proactive solutions, preventing performance issues as the platform grows.
- Single database instance (no read replicas). Our current architecture relies on a single database instance, which can become a bottleneck as the number of users and games increases. Without read replicas, all read and write operations are handled by the same server, potentially leading to performance degradation under heavy load. Adding read replicas can distribute the read load across multiple servers, improving query performance and overall system responsiveness.
- In-memory game state during processing. The game state is currently held in memory during processing, which limits the size and complexity of games we can support. Keeping the entire game state in memory provides fast access and processing speeds but limits the scale of individual games. As the game becomes more complex and involves more players, this in-memory approach may become unsustainable. Exploring alternatives like distributed caching can help mitigate this limitation.
- Synchronous tick processing. Our tick processing is currently synchronous, which means each tick must complete before the next one can start. This guarantees consistency, but it becomes a bottleneck when one step takes longer than expected, and it limits the overall throughput of the system. Ticks within a single game have to run in order, since each one builds on the state left by the previous one, but moving to asynchronous processing can allow ticks for independent games to run concurrently, improving overall system performance.
- No horizontal scaling strategy. We currently lack a clear horizontal scaling strategy, which means we may struggle to handle large spikes in user activity. Horizontal scaling involves adding more servers to distribute the load, allowing the system to handle increased demand. Without a clear strategy, scaling the platform can become complex and inefficient. Developing a horizontal scaling plan is crucial for ensuring the platform can handle growth and maintain performance.
Future Scaling Approaches
We need to outline potential solutions for these limitations, including database scaling, game engine scaling, caching strategies, and load balancing.
- Database Scaling: Read replicas, connection pooling, query optimization. Implementing read replicas, connection pooling, and query optimization can significantly improve database performance and scalability. Read replicas allow read operations to be distributed across multiple servers, reducing the load on the primary database. Connection pooling reduces the overhead of establishing new database connections, improving response times. Query optimization ensures that queries are executed efficiently, minimizing the load on the database server.
- Game Engine Scaling: Distributed tick processing, game sharding. Distributing tick processing and implementing game sharding can help scale the game engine to support more concurrent games and players. Distributed tick processing allows ticks for different games to be processed concurrently, improving throughput. Game sharding involves dividing the game world into separate shards, each running on its own set of servers. This reduces the load on individual servers and allows the platform to handle more games and players.
- Caching Strategy: Redis for game state, session storage. Using Redis for caching game state and session data can reduce the load on the database and improve response times. Redis is an in-memory data store that provides fast access to frequently used data. Caching game state in Redis can reduce the need to query the database for each tick, improving performance. Keeping sessions in Redis also helps scalability, because any application server can then read a user's session.
- Load Balancing: Multiple Next.js instances behind load balancer. Placing multiple Next.js instances behind a load balancer can distribute traffic and improve the availability and scalability of the frontend application. A load balancer distributes incoming requests across multiple servers, ensuring that no single server is overwhelmed. This improves performance and prevents single points of failure. Load balancing is a crucial component of any scalable web application architecture.
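As one possible reading of game sharding, here is a sketch that pins each game to a worker by hashing its ID, so ticks for different games can run on different shards. It assumes sharding is done per game; the function names are hypothetical, and the FNV-1a hash is applied to UTF-16 code units, which matches the byte-wise version only for ASCII IDs.

```typescript
// Hypothetical sketch, assuming each game is pinned to exactly one shard.

/** FNV-1a style 32-bit hash over the string's UTF-16 code units; stable across restarts. */
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

/** Maps a game ID to a shard index in [0, shardCount). */
function shardFor(gameId: string, shardCount: number): number {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  return fnv1a(gameId) % shardCount;
}

/** Groups game IDs by shard so each worker knows which ticks it is responsible for. */
function groupByShard(gameIds: string[], shardCount: number): Map<number, string[]> {
  const groups = new Map<number, string[]>();
  for (const gameId of gameIds) {
    const shard = shardFor(gameId, shardCount);
    groups.set(shard, [...(groups.get(shard) ?? []), gameId]);
  }
  return groups;
}

const assignment = groupByShard(["game-1", "game-2", "game-3", "game-4"], 3);
console.log(assignment);
```

Plain modulo hashing reassigns most games when the shard count changes; a scheme such as consistent hashing reduces that churn if shards are added or removed often.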
Deliverables: The Final Package
Here's a checklist of what we need to deliver to make this documentation a success:
- [x] ARCHITECTURE.md - Complete technical architecture document
- [x] System diagrams (Mermaid format for GitHub compatibility)
- [x] Data flow diagrams for critical paths
- [x] Component interaction diagrams
- [x] Database schema documentation with ERD
- [x] API authentication flow diagrams
- [x] Performance characteristics and bottlenecks analysis
- [x] Scaling strategy recommendations
- [x] Technology decision justifications (ADRs - Architecture Decision Records)
File Structure: Keeping Things Organized
To maintain a clean and organized documentation structure, we'll follow this file structure:
/docs/
/architecture/
ARCHITECTURE.md # Main architecture document
database-schema.md # Detailed schema documentation
game-engine-design.md # Game engine internals
authentication-flows.md # Auth system documentation
scalability-analysis.md # Performance and scaling
/diagrams/
system-overview.mmd # High-level system diagram
data-flow.mmd # Data flow diagrams
database-erd.mmd # Entity relationship diagram
/decisions/
001-tech-stack.md # Architecture Decision Records
002-database-choice.md
003-authentication-strategy.md
Success Criteria: How We Measure Up
We’ll know we’ve nailed it when:
- New developers understand the system architecture in 30 minutes.
- Complex component interactions are clearly documented.
- The database schema is fully explained with relationships.
- Scalability bottlenecks and solutions are identified.
- The architecture supports future feature development.
- Decision rationale is documented for future reference.
Priority: Why This Matters
This documentation is Medium-High priority because it’s critical for onboarding new developers and planning future enhancements. We need this to keep growing and improving EmergentEmpire.ai.
Dependencies: What We Need
This documentation should reference Issue #20 (README) and Issue #21 (API Documentation). It also benefits from Issue #16 completion for complete auth documentation. Let's make sure we're all on the same page and pulling in the necessary info!