Database Management

Ridgeback relies on a MySQL-compatible database to store network event data, configuration settings, policy triggers, user accounts, and more. Proper database management ensures efficient storage, quick access to data, and reliable long-term operation. This chapter covers how to select a compatible database, the pros and cons of containerized versus standalone deployments, how the database integrates into Ridgeback’s start order, and an overview of the databases and schemas that Ridgeback uses internally.

Selecting a Compatible Database

Compatibility and Requirements:
Ridgeback supports MySQL-compatible databases, which include:

MariaDB: A common default choice. MariaDB is a drop-in replacement for MySQL with robust community support.
MySQL Community Edition or Enterprise: The original MySQL distribution.
Cloud-Hosted MySQL Services:
- Amazon RDS for MySQL
- Azure Database for MySQL
- Google Cloud SQL for MySQL
MySQL-Compatible Engines: Any database that speaks the MySQL wire protocol and supports the schema features Ridgeback relies on.

Considerations:

Performance: For high volumes of network events, choose a database and hosting tier that can sustain the expected write rate and data growth.
Backup and Recovery: Ensure easy backup and restore procedures, especially if compliance requires data retention.
Cost and Licensing: Some enterprise MySQL editions or certain managed cloud services involve licensing fees.
Integration with Existing Infrastructure: Use a database type your IT team is familiar with, potentially aligning with existing backup scripts, monitoring tools, and expertise.
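As one illustration of the backup consideration above, the following is a minimal sketch of a logical backup driven from Python. It assumes the standard mysqldump client is installed and that credentials come from a client option file such as ~/.my.cnf. The host, user, and database names are placeholders, not values defined by Ridgeback.

```python
import subprocess
from typing import List


def build_dump_command(host: str, user: str, databases: List[str]) -> List[str]:
    """Build a mysqldump argument list for the given databases."""
    if not databases:
        raise ValueError("at least one database name is required")
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        "--databases",
        *databases,
    ]


def run_dump(command: List[str], out_path: str) -> None:
    """Run the dump and write the SQL to out_path; raises on a non-zero exit."""
    with open(out_path, "wb") as out:
        subprocess.run(command, stdout=out, check=True)
```

A scheduled job could call build_dump_command with the schema names present on your server and pass the result to run_dump. Managed cloud services usually offer their own snapshot mechanisms, which may be preferable to a logical dump for large event tables.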

Recommended Default: For most on-premises deployments, MariaDB or MySQL Community Edition works seamlessly and is straightforward to set up.

Database in a Container (Local) vs. a Standalone Database

Database in a Container:

Pros:
- Easy Setup: Ridgeback often provides a docker-compose configuration that can spin up a MariaDB instance quickly.
- Portability: Everything can run on a single machine for small deployments or demos.
- Simplified Maintenance: No separate provisioning of database servers; one command to bring it all up.
Cons:
- Performance Limitations: A containerized database on the same host shares CPU, memory, and disk I/O with the Ridgeback services, which can degrade performance under heavy load.
- Limited Long-Term Storage: Container-based databases usually keep their data in Docker volumes; if a volume is removed along with its container (for example, with docker compose down -v), the data is lost.
- Scaling Challenges: Harder to scale to large, multi-terabyte datasets.

Standalone Database:

Pros:
- Better Performance and Scalability: Dedicated database servers with optimized hardware or cloud-managed solutions.
- Robust Backups and DR: Easier to integrate with enterprise backup solutions, snapshotting, and replication tools.
- Clear Separation of Concerns: The database is managed independently, making upgrades, patches, and scaling more flexible.
Cons:
- Additional Complexity: Requires separate provisioning, configuration, and monitoring.
- Potential Additional Costs: A separate VM, cloud instance, or hardware might be needed.

Recommendation:

For testing or small pilots: A containerized local database may suffice.
For production or enterprise environments: A standalone MySQL/MariaDB instance or a managed cloud database is strongly recommended for reliability, scalability, and compliance.

Start Order and the Database

Several Ridgeback containers need the database to be available and reachable before they can start correctly. For example, the server, policy, or analytics containers may attempt database connections early in their startup process.

Key Points:

Bring Up the Database First: If using a local or containerized database, start it before the other services, for example with docker compose up -d db (where db is the name of the database service in your compose file).
Wait for DB Readiness: Some orchestration tools or health checks ensure that the database is ready (i.e., listening on the proper port and accepting connections) before the Ridgeback services attempt to connect.
Failed Connections: If the database is not ready, Ridgeback services may fail to start or log connection errors. Restarting those containers after the DB is confirmed ready usually resolves the issue.

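Tip: In docker-compose.yml, a plain depends_on entry only controls start order; it does not wait for the database to accept connections. Pair depends_on with a healthcheck and condition: service_healthy, or write a small script that checks the database’s readiness before starting Ridgeback services.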
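A readiness script of the kind mentioned in the tip could look like the sketch below. It uses only the Python standard library and polls until a TCP connection succeeds. The function name and default timings are illustrative assumptions, and an open port shows only that the server is listening, not that it is ready to authenticate clients.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect proves the port is open, nothing more.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: give up after the deadline,
            # otherwise pause briefly and try again.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, a wrapper could call wait_for_port("127.0.0.1", 3306) and start the remaining services only when it returns True. Port 3306 is the conventional MySQL port; adjust the host and port to match your deployment.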

The Databases Used by Ridgeback

Ridgeback logically separates data into multiple databases (or schemas) within the same MySQL-compatible server to organize data by function and security domain.

Common database names might include:

CustomerDb: Holds core data related to endpoints, users, organizations, and policy configuration.
AuthenticationDb: Dedicated for user authentication details, salted hashes, MFA tokens, and account recovery entries.
EventDb (e.g., NetEvent tables): Stores large volumes of network event metadata. This is the heart of Ridgeback’s forensic and analytical capabilities.
PolicyDb: Stores policy definitions, triggers, and related metadata.
AnalyticsDb: May store aggregated metrics, reports, or computed insights.

Note: The exact naming conventions and which schemas are used may depend on the Ridgeback version. Consult the Ridgeback release notes or documentation for the most accurate and current database naming conventions.
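Because the names vary by version, it can help to list what is actually present on your server. The sketch below assumes a Python DB-API 2.0 connection object (for example, one created by a MySQL driver of your choice) and filters out the system schemas that ship with MySQL and MariaDB. The function name is hypothetical.

```python
from typing import Any, List

# System schemas that ship with MySQL and MariaDB; not part of Ridgeback.
SYSTEM_SCHEMAS = {"information_schema", "mysql", "performance_schema", "sys"}


def list_application_schemas(conn: Any) -> List[str]:
    """Return the non-system schema names visible to this connection."""
    cur = conn.cursor()
    try:
        cur.execute("SHOW DATABASES")
        names = [row[0] for row in cur.fetchall()]
    finally:
        cur.close()
    return sorted(n for n in names if n.lower() not in SYSTEM_SCHEMAS)
```

The result depends on the privileges of the connecting account, since SHOW DATABASES lists only the schemas that account is allowed to see.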

The Database Schemas Used by Ridgeback

Ridgeback’s schemas (or databases) contain multiple tables that fulfill specific roles:

CustomerDb (Example):
- User Table: Basic user profiles (email, permissions).
- Permissions Table: Detailed ACLs and roles.
- Organization Table: Multi-tenant environments may store org-level data here.
AuthenticationDb:
- Auth Table: Stores user authentication credentials (hashed passwords, last login, failed attempts).
- Recovery Table: Password reset tokens, expiration times.
Data_XYZ (e.g., Data_00000000_0000_0000_0000_000000000000, which contains the NetEvent table):
- NetEvent Table: The main event log of observed network activity.
- Endpoint or Device Table: Endpoint metadata, MAC/IP associations.
- DnsEvent, DhcpEvent, or ArpEvent Tables: If split by event type, these contain specific subsets of network events.
PolicyDb:
- Policy Table: Policy definitions and triggers.
- Action Table: Actions associated with policies, like sending an email alert.
AnalyticsDb (Optional or Combined):
- Aggregations: Precomputed summaries of events for quicker reporting.
- Metrics: Key performance indicators, risk indices, and summarized counts of recon attempts, active threats, etc.

Relationships and Indexes:

Foreign keys may link user accounts in CustomerDb to events or actions in Data_XYZ schemas.
Proper indexing is crucial for performance; Ridgeback’s schemas are typically optimized to handle large volumes of NetEvents and quick lookups by time, IP, or MAC address.
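To illustrate why an index on the time column matters for range lookups, the sketch below uses an in-memory SQLite database as a stand-in so it runs without a server. The table layout and index name are assumptions made for the example, not the actual Ridgeback schema, and SQLite's planner output differs from the EXPLAIN output of MySQL or MariaDB.

```python
import sqlite3

# SQLite stands in for a MySQL-compatible server so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NetEvent (src_ip TEXT, dst_ip TEXT, time TEXT)")
conn.execute("CREATE INDEX idx_netevent_time ON NetEvent (time)")

# Ask the planner how it would execute a time-range lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT src_ip, dst_ip, time FROM NetEvent WHERE time >= ?",
    ("2024-01-01 00:00:00",),
).fetchall()

# The last column of each plan row is a human-readable description.
plan_text = " ".join(row[-1] for row in plan)
uses_index = "idx_netevent_time" in plan_text
print(plan_text)
```

On a real deployment, running EXPLAIN against the same kind of query on the database server shows whether an existing index is being used, without altering the schema.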

Customization or Direct Access:

Direct SQL queries can extract specific insights. For example:

SELECT src_ip, dst_ip, time
FROM Data_00000000_0000_0000_0000_000000000000.NetEvent
WHERE time >= NOW() - INTERVAL 1 HOUR
  AND dst_ip IS NULL;

This query might reveal endpoints probing unused IP addresses in the last hour.
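When a query like this is issued from a script, the schema name has to be placed into the SQL text because identifiers cannot be bound as parameters. The sketch below validates the name first and leaves the hour window as a driver placeholder. The name pattern is inferred from the example above and the function name is hypothetical; the %s placeholder assumes a driver that uses the format parameter style.

```python
import re

# Pattern inferred from the example schema name in the text (an assumption).
SCHEMA_RE = re.compile(r"Data_[0-9A-Fa-f]{8}(_[0-9A-Fa-f]{4}){3}_[0-9A-Fa-f]{12}")


def build_recent_null_dst_query(schema: str) -> str:
    """Return the example query for a validated schema, with an hours placeholder."""
    if not SCHEMA_RE.fullmatch(schema):
        raise ValueError(f"unexpected schema name: {schema!r}")
    # Identifiers cannot be bound as parameters, hence the validation above.
    return (
        "SELECT src_ip, dst_ip, time "
        f"FROM `{schema}`.NetEvent "
        "WHERE time >= NOW() - INTERVAL %s HOUR "
        "AND dst_ip IS NULL"
    )
```

With a DB-API cursor, the result would be executed as cursor.execute(query, (1,)) to reproduce the one-hour window used above.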

Warning: Avoid schema alterations without consulting Ridgeback support. Changing table structures, indexes, or datatypes may break application logic or future upgrades.