Database Glossary

A

Access control list (ACL)

An access control list, often shortened to ACL, is a security policy list that dictates which actions each user or process can perform on which resources. There are many different types of ACLs, but they each describe the permissions and access patterns that are allowed by a system.
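
As a minimal sketch of the idea, the list can be modeled as a mapping from a user and resource to the actions allowed. The names `ACL` and `is_allowed` and the sample entries are hypothetical, and real systems usually store and evaluate ACLs in more elaborate ways.

```python
# Hypothetical ACL: maps (principal, resource) to the set of allowed actions.
ACL = {
    ("alice", "orders"): {"read", "write"},
    ("bob", "orders"): {"read"},
}


def is_allowed(principal: str, resource: str, action: str) -> bool:
    """Return True only if the ACL explicitly grants the action (default deny)."""
    return action in ACL.get((principal, resource), set())
```

Anything not listed is denied, which is a common but not universal default.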

Anti-caching

Anti-caching is a strategy that can be used when data is not found in the faster in-memory cache and must be retrieved from slower, persistent storage. The technique involves aborting the transaction and kicking off an asynchronous operation to fetch the data from the slower medium to memory. The transaction can be retried later and the information will be ready to be served from memory.

Authentication

Authentication is an action that validates an identity. In computing and databases, authentication is mainly used as a way to prove that the person or process requesting access has the credentials to validate that they can operate with a specific identity. In practical terms, this might include providing an identity (like a username) and associated authentication material (such as a password, a certificate or key file, or a secret generated by a hardware device belonging to the person associated with the identity). Authentication is used in conjunction with authorization to determine if a user has permission to perform actions on a system.

B

Blue-green deployments

Blue-green deployments are a technique for deploying software updates with little to no downtime by managing active traffic between two identical sets of infrastructure. New releases can be deployed to the inactive infrastructure group and tested independently. To release the new version, a traffic routing mechanism is switched to direct traffic from the current infrastructure to the infrastructure with the new version. The previously-active infrastructure now functions as the target for the next updates. This strategy is helpful in that the routing mechanism can easily switch back and forth to roll backwards or forwards depending on the success of a deployment.

C

Cache invalidation

Cache invalidation is the process of targeting and removing specific items from a cache. Most often, this is performed as part of a routine when updating records so that the cache does not serve stale data to clients.
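
The update routine described above might look like the following sketch. Two plain dictionaries stand in for the backing database and the cache, and the function names `read` and `update` are assumptions made for illustration.

```python
database = {"user:1": "Ada"}  # stand-in for the backing data store
cache = {}                    # stand-in for the cache


def read(key):
    """Serve from the cache, filling it from the database on a miss."""
    if key not in cache:
        cache[key] = database[key]
    return cache[key]


def update(key, value):
    """Write the new value, then invalidate the cached copy of that item."""
    database[key] = value
    cache.pop(key, None)  # remove only the affected entry
```

The next `read` for the same key misses the cache and picks up the fresh value from the database.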

Collections

In document databases, collections are containers that are used to store groups of documents together. The collections may have semantic meaning assigned by the application and database designers, but otherwise are simply a way to partition different sets of documents from one another in the system. Different collections can be assigned different properties and actions can be performed targeting specific collections of documents.

Command query responsibility segregation

Command query responsibility segregation is an application design pattern that allows you to separate operations based on their impact on the underlying database. In general, this usually means providing different mechanisms for queries that read data versus queries that change data. Separating these two contexts allows you to make infrastructure and system changes to scale each use-case independently, increasing performance.

Consistency

Consistency is a property of data systems that means that the individual data entities do not conflict and continue to model the information they intend to even as changes are introduced. Each piece of data and change must be validated to ensure that it conforms to the rules imposed on the data structures and care must be taken to balance out any changes that should impact other data (like debiting and crediting different accounts at the same time).
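
The debit and credit example can be sketched with Python's built-in `sqlite3` module. The `accounts` table, its `CHECK` rule, and the `transfer` helper are assumptions chosen for illustration, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # rule: no negative balances
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit and debit together; return False if a rule is violated."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

If the debit would break the rule, the credit that already ran is rolled back with it, so the two accounts never drift out of balance.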

D

Database abstraction layer

A database abstraction layer is a programming interface that attempts to abstract differences between underlying database technologies to provide a unified experience or interface to the application layer. Database abstraction layers are often helpful for developers because they help to normalize the implementation differences between various offerings and can stay stable even as the underlying technology evolves. However, there are some challenges as well, such as leaky abstractions, masking implementation-specific features or optimizations from the user, and creating a dependency that can be difficult to dislodge.

Database engine

A database engine is the piece of a database management system responsible for defining how data is stored and retrieved, as well as the actions supported for interacting with the system and data. Some database management systems support multiple database engines that offer different features and designs, while other systems only support a single database engine that has been designed to align with the goals of the software.

Dataset

A dataset, sometimes spelled data set, is a single collection of data. Typically, this represents a chunk of related data applicable to a certain task, application, or area of concern. Datasets usually combine the data itself with the structure and context necessary to interpret it. They often consist of a combination of quantitative and qualitative values that can act as the raw data for further analysis and interpretation.

E

Eventual consistency

Eventual consistency is a description of a consistency / availability strategy implemented by certain distributed computing or database systems. The CAP theorem of distributed systems states that systems must choose whether to prioritize availability or data consistency in the face of a network partition. Eventually consistent systems make the choice to favor availability by continuing to serve requests even if the server's peers are not available to confirm operations. Eventually, when the partition is resolved, a consistency routine will run to decide on the most correct state of any inconsistent data, but there will be a time where the data on different servers are not in agreement.

Extract-transform-load (ETL)

Extract, transform, and load, often abbreviated as ETL, is a process of copying and processing data from a data source to a managed system. First the data is extracted from its current system to make it accessible to the destination system. Next, the data is manipulated and modified to match the requirements and format of the new system. Finally, the reconstructed data is loaded into the new system.
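
The three stages can be sketched as three small functions. The CSV sample, the `users` table, and the clean-up rule (trim and lowercase names) are hypothetical and exist only to show the shape of the process.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read from another system.
RAW_CSV = "name,signup_date\n Ada ,2024-01-05\nGrace,2024-02-10\n"


def extract(text):
    """Extract: read the source rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize values to match the destination's format."""
    return [(row["name"].strip().lower(), row["signup_date"]) for row in rows]


def load(conn, records):
    """Load: write the reshaped records into the destination system."""
    conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, signup_date TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
```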

F

Feature flags

A feature flag, or a feature toggle, is a programming strategy that involves gating functionality behind an external switch or control. The switch is typically first set to indicate that the feature should not be active. When the organization is ready, they can activate the switch and the program will start using its new functionality. This allows new features to be deployed without immediately activating them. It decouples the deployment of new software from the release of the software, offering greater control over how a change is introduced and allowing for greater testing in a production environment.
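
A minimal sketch of the pattern follows. The `FLAGS` dictionary stands in for whatever external switch an organization uses (a configuration service, environment variable, or database row), and the flag name `new_checkout` is hypothetical.

```python
# Hypothetical flag store; the feature starts switched off.
FLAGS = {"new_checkout": False}


def checkout(cart_total, flags=FLAGS):
    """Run the new code path only when the flag is active."""
    if flags.get("new_checkout", False):
        return f"new flow: {cart_total:.2f}"
    return f"legacy flow: {cart_total:.2f}"
```

Both code paths are deployed together. Setting `FLAGS["new_checkout"] = True` releases the new behavior without another deployment.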

Foreign key

A foreign key is a designated column or group of columns in a relational database that is used to maintain data integrity between two tables. A foreign key in one table refers to a candidate key, typically the primary key, in another table. Since a candidate key is referenced, each referenced row is guaranteed to be unique and the two tables can be linked together row for row. The values of these designated columns are expected to remain identical across the two tables. The foreign key constraint allows the database system to enforce this requirement by not allowing the values to be out of sync.
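
The enforcement can be observed with Python's built-in `sqlite3` module. The `authors` and `books` tables and the `try_insert_book` helper are hypothetical. Note that SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE books ("
    "id INTEGER PRIMARY KEY, "
    "title TEXT NOT NULL, "
    "author_id INTEGER NOT NULL REFERENCES authors(id))"  # the foreign key
)
conn.execute("INSERT INTO authors VALUES (1, 'Ada')")


def try_insert_book(conn, book_id, title, author_id):
    """Return True if the row was accepted, False if the constraint rejected it."""
    try:
        conn.execute("INSERT INTO books VALUES (?, ?, ?)", (book_id, title, author_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

A book that points at author 1 is accepted, while a book that points at a nonexistent author is rejected by the database itself.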

Full-text search

Full-text search describes a family of techniques and functionality that allow you to search the complete text of documents within a database system. This is in direct opposition to search functionality that relies only on metadata, partial text sources, and other incomplete assessments. Full-text search relies on asynchronous indexing using natural language-aware parsers to analyze and categorize text within documents.

I

Index

A database index is a structure that is created to allow for faster record lookups within a table. An index allows the database system to look up data efficiently by keeping a separate structure for the values of specific columns. Queries that target the indexed columns can identify applicable rows in the table quickly by using a more efficient lookup strategy than checking each row line by line. Indexed columns improve read operations but do add overhead to write operations since both the table and the index must be updated. It is important to balance these two considerations when designing table indexes.
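
One way to see the effect is to ask SQLite how it plans to run the same query before and after an index exists. This is a sketch using Python's built-in `sqlite3` module. The `users` table, the index name `idx_users_email`, and the `plan` helper are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def plan(conn, sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)  # the last column holds the detail text


QUERY = "SELECT name FROM users WHERE email = ?"
before = plan(conn, QUERY, ("user42@example.com",))  # no index: full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(conn, QUERY, ("user42@example.com",))   # index available: index search
```

Comparing `before` and `after` shows the planner switching from scanning the table to searching the new index.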

L

Lexeme

Lexemes are language-level units of meaning that are relevant in natural language processing and full-text search contexts. Typically, when text is indexed, it is broken down into individual tokens which are then analyzed as lexemes using language-level resources like dictionaries, thesauruses, and other word lists to understand how to process them further.

Locale

In databases and computing in general, a locale specifies the region, language, country, and other pieces of contextual data that should be used when performing operations and rendering results. In databases, locale settings can affect things like column orderings, comparisons between values, spelling, currency identifiers, date and time formatting, and more. Defining the correct locale at the database server level or requesting the locale you need during a database session is essential for ensuring that the operations performed will yield the expected results.

M

Microservice architecture

The microservices architecture is an application and service design that affects the development, deployment, and operation of the components. The microservices approach decomposes an application's functionality and implements each responsibility as a discrete service. Rather than relying on internal function calls, the services communicate over the network using clearly defined interfaces. Microservices are often used to help speed up development as each component can be coded and iterated on independently. The approach also helps with scalability as each service can be scaled as needed, often with the help of service orchestration software.

Migration (database, schema)

Database or schema migrations are processes used to transform a database structure to a new design. This involves operations to modify the existing schema of a database or table as well as transforming any existing data to fit the new structure. Database migrations are often built upon one another and stored as an ordered list in version control so that the current database structure can be built from any previous version by sequentially applying the migration files. Often, developers must make decisions about how best to modify existing data to fit the new structure which might include columns that did not previously exist or changes to data that are difficult to easily reverse.
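
A minimal sketch of sequentially applying an ordered list of migrations follows. It uses SQLite's `user_version` pragma to remember which migrations have already run. The `MIGRATIONS` list and the `migrate` helper are hypothetical, and dedicated migration tools track considerably more state than this.

```python
import sqlite3

# Ordered list of migrations, as it might be kept in version control.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]


def migrate(conn):
    """Apply every migration newer than the database's recorded version."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for number, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute(statement)
        conn.execute(f"PRAGMA user_version = {number}")  # record progress
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Running `migrate` on a fresh database applies both steps in order, and running it again is a no-op because the recorded version already matches the end of the list.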

N

Neo4j

Neo4j is a high performance graph-oriented database system. It offers ACID-compliant transactions with a graph data structure and uses the Cypher querying language to manage and query stored data. Neo4j allows developers to scale graph-oriented data workloads easily and offers clients in many different languages.

NewSQL

NewSQL is a descriptor for a category of more recent relational database offerings that attempt to bridge the gap between the structure and well-ordered guarantees of a relational database system and the high performance and scalability associated with NoSQL databases. While NewSQL is a fairly loose categorization, it is generally used to refer to databases that allow SQL or SQL-like querying, transaction guarantees, and flexible scaling and distributed processing.

NoSQL

NoSQL databases, also sometimes called non-relational or not only SQL databases, are a broad category that covers any type of database system that deviates from the common relational database model. While non-relational databases have long been available, the category generally is used to refer to newer generations of databases using alternative models like key-value, document-oriented, graph-oriented, and column family stores. They generally are used to manage data that is not suited for the relational model with a heavy focus on flexibility and scalability.

O

Optimistic concurrency control

Optimistic concurrency control, sometimes referred to as OCC, is a strategy used by database systems to handle conflicting concurrent operations. Optimistic concurrency control assumes that concurrent transactions will likely not interfere with each other and allows them to proceed. If a conflict occurs when a transaction attempts to commit, it will be rolled back at that time. OCC is an attractive policy if you think that most transactions within your workloads will not be in conflict with one another. Only transactions that do in fact have a conflict will suffer a performance penalty (they'll be rolled back and will have to be restarted) while all non-conflicting transactions can execute without waiting to see if a conflict will arise.
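
One common way to detect conflicts at commit time is to keep a version number next to each value. The following is an in-memory sketch under that assumption. The names `VersionedStore` and `ConflictError` are hypothetical, and real database systems use more sophisticated validation.

```python
class ConflictError(Exception):
    """Raised when another writer committed first."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); unknown keys start at version 0."""
        return self._data.get(key, (0, None))

    def commit(self, key, expected_version, new_value):
        """Apply the write only if nobody changed the key since it was read."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise ConflictError(f"{key!r} changed since version {expected_version}")
        self._data[key] = (current_version + 1, new_value)
```

Two writers that read the same version can both proceed without waiting. The first commit succeeds and the second raises `ConflictError`, at which point it would re-read the value and retry.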

Outer join

An outer join is a type of relational database operation that joins two tables by returning all rows from each component table, even where there is not a matching record in the companion table. Join operations construct virtual rows by matching records that have identical values in specified comparison columns from each table.

The results for an outer join will contain the rows from both tables where the column values matched and will additionally contain all of the unmatched rows from each table. For these rows, the columns without a match in the other table will be padded with 'NULL' values to indicate that no matching row was found.
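
The result shape can be sketched with Python's built-in `sqlite3` module. Because older SQLite releases lack a `FULL OUTER JOIN` keyword, this example emulates one by combining a left join with the unmatched rows from the second table. The `employees` and `departments` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    CREATE TABLE departments (id INTEGER, dept TEXT);
    INSERT INTO employees VALUES ('Ada', 1), ('Grace', NULL);
    INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales');
    """
)

# Emulated full outer join: matched rows plus unmatched rows from both sides.
FULL_OUTER = """
SELECT e.name, d.dept
FROM employees AS e LEFT JOIN departments AS d ON e.dept_id = d.id
UNION ALL
SELECT NULL, d.dept
FROM departments AS d
WHERE NOT EXISTS (SELECT 1 FROM employees AS e WHERE e.dept_id = d.id)
"""

rows = conn.execute(FULL_OUTER).fetchall()
```

Here `rows` holds one matched pair, one employee without a department, and one department without an employee. The missing side of each unmatched row comes back as `None`, which is how Python's driver represents 'NULL'.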

P

Primary key

A primary key is a type of database key that is designated as the main way to uniquely address a database row. While other keys may be able to pull individual rows, the primary key is specifically marked for this purpose with the system enforcing uniqueness and 'NOT NULL' consistency checks. A primary key can be a natural key (a key that is naturally unique across records) or a surrogate key (a key added specifically to serve as a primary key) and can be formed from a single or multiple columns.

Q

Query

In databases, a query is a formatted command used to make a request to a database management system using a query language. The database system processes the query to understand what actions to take and what data to return to the client. Often, queries are used to request data matching specific patterns, insert new data into an existing structure, or modify and save changes to existing records. In addition to targeting data items, queries can often manipulate items like table structures and the server settings, making them the general administrative interface for the system. SQL, or Structured Query Language, is the most common database querying language used with relational databases.

Query builder

A query builder is a database abstraction used in application development to make programming against databases easier. Similar to an ORM, a query builder provides an interface for working with a database system from within the application. However, instead of attempting to map application objects to database records directly, query builders focus on providing native functions and methods that translate closely to the database operations. This allows you to build queries programmatically in a safer and more flexible way than working with SQL (or other database language) strings directly.
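
To illustrate the idea, here is a deliberately tiny, hypothetical builder. The `Select` class is not the API of any real library. It only supports equality filters, and it assumes table and column names come from trusted code rather than user input.

```python
class Select:
    """Builds a parameterized SELECT statement step by step."""

    def __init__(self, table):
        self._table = table
        self._conditions = []
        self._params = []

    def where(self, column, value):
        """Add an equality filter; the value is passed as a bound parameter."""
        self._conditions.append(f"{column} = ?")
        self._params.append(value)
        return self  # allow method chaining

    def build(self):
        """Return the SQL text and the parameters to bind to it."""
        sql = f"SELECT * FROM {self._table}"
        if self._conditions:
            sql += " WHERE " + " AND ".join(self._conditions)
        return sql, tuple(self._params)
```

Calling `Select("users").where("name", "Ada").build()` produces the SQL text and a separate tuple of values, so values are never spliced into the query string.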

R

Read operation

A read operation is generally defined as any operation that retrieves data without modification. Read operations should generally behave as if the underlying data were immutable. They may modify the retrieved data to change its format, filter it, or make other modifications, but the underlying data stored in the database system is not changed.

Read-through caching

Read-through caching is a caching strategy where the cache is deployed in the path to the backing data source. The application sends all read queries directly to the cache. If the cache contains the requested item, it is returned immediately. If the cache request misses, the cache fetches the data from the backing database in order to return the item to the client and adds it to the cache for future queries. In this architecture, the application continues to send all write queries directly to the backing database.
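
The read path can be sketched as a small class that sits in front of a loader function. The name `ReadThroughCache`, the `misses` counter, and the loader argument are assumptions made for illustration.

```python
class ReadThroughCache:
    """Sits between the application and the backing data source for reads."""

    def __init__(self, load_from_database):
        self._load = load_from_database  # called only on a cache miss
        self._items = {}
        self.misses = 0

    def get(self, key):
        if key not in self._items:
            self.misses += 1
            self._items[key] = self._load(key)  # fetch and remember the item
        return self._items[key]


# A dictionary stands in for the backing database.
database = {"user:1": "Ada"}
cache = ReadThroughCache(database.__getitem__)
```

The application only ever calls `cache.get`. The first request for a key goes through to the database and later requests are answered from memory.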

Redis

Redis is a popular high performance key-value store that is frequently deployed as a cache, message queue, or configuration store. Redis is primarily an in-memory database but can optionally persist data to nonvolatile storage. It features a wide variety of types, flexible deployment options, and high scalability.

Relational database

A relational database is a database model that organizes data items according to predefined data structures known as tables. A table defines various columns with specific constraints and types and each record is added as a row in the table. The use of highly regular data structures provides relational database systems with many ways to combine the data held within various tables to answer individual queries. Relational databases take their name from algebraic relations, which describe different operations that can be used to manipulate regular data. In most cases, relational databases use SQL (Structured Query Language) to interact with the database system as it allows users to express complex queries in an ad-hoc manner.

Right join

A right join is a join operation for relational databases where all of the rows of the second table specified are returned, regardless of whether a matching row in the first table is found. Join operations construct virtual rows by matching records that have identical values in specified comparison columns from each table. The results for a right join will contain the rows from both tables where the column values matched and will additionally contain all of the unmatched rows from the second, or right, table. For these rows, the columns associated with the first, or left, table will be padded with 'NULL' values to indicate that no matching row was found.

Role-based access control (RBAC)

Role-based access control, also known as RBAC, is a security strategy that restricts the operations permitted to a user based on their assigned roles. Permissions on objects and privileges to execute actions are assigned to roles, which are labels that make managing access easier. To grant the capabilities associated with a role to a user, the user can be made a member of the role. Users can be made a member of multiple roles to gain a union of the permissions each role provides. Roles are helpful as a way of standardizing the privileges required for various job functions and making it simple to add or remove access for users.
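
A minimal sketch of the membership and union behavior follows. The role names, permission strings, and the helpers `permissions_for` and `can` are hypothetical.

```python
# Hypothetical roles and the permissions attached to each.
ROLE_PERMISSIONS = {
    "reader": {"orders:read"},
    "editor": {"orders:read", "orders:write"},
    "auditor": {"audit_log:read"},
}

# Hypothetical role memberships.
USER_ROLES = {
    "alice": {"editor", "auditor"},
    "bob": {"reader"},
}


def permissions_for(user):
    """Return the union of the permissions granted by the user's roles."""
    granted = set()
    for role in USER_ROLES.get(user, ()):
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def can(user, permission):
    """Check a single permission against the user's combined roles."""
    return permission in permissions_for(user)
```

Granting or revoking access becomes a matter of editing role membership rather than touching individual permissions.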

S

Scaling

Scaling is the process of expanding the resources allocated to your application or workload to allow for better performance or to handle more concurrent activity. Scaling strategies generally fall into two categories: scaling out (also called horizontal scaling) and scaling up (also known as vertical scaling). Horizontal scaling involves adding additional workers to a pool that can handle the incoming work. This often means adding additional servers that can all perform the same operations, thus distributing the load. Scaling up involves adding additional resources like processors, RAM, or storage to the server already handling requests. Scaling allows you to handle more concurrent operations but it can potentially increase the complexity of your application architecture.

Shard

A database shard is a segment of records stored by a database object that is separated out and managed by a different database node for performance reasons. For example, a database table with 9 million records could be divided into three separate shards, each managing 3 million records. The data is typically divided according to a "shard key" which is a key that determines which shard a record should be managed by. Each shard manages its subset of records and a coordinating component is required to direct client queries to the appropriate shard by referring to the shard key. Sharding can help some types of performance in very large datasets but it often requires making trade-offs that might degrade other types of performance (for instance, on operations that need to coordinate between multiple shards).
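
The routing step can be sketched by hashing the shard key. Hash-based placement is only one of several strategies (range-based placement is another), and the names `shard_for`, `put`, and `get` are hypothetical. Dictionaries stand in for the separate database nodes.

```python
import hashlib

SHARD_COUNT = 3


def shard_for(shard_key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a shard key to a shard number using a stable hash."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Each dictionary stands in for a separate database node.
shards = [dict() for _ in range(SHARD_COUNT)]


def put(key, record):
    """Coordinator: send the write to the shard that owns the key."""
    shards[shard_for(key)][key] = record


def get(key):
    """Coordinator: read from the shard that owns the key."""
    return shards[shard_for(key)].get(key)
```

A stable hash is used instead of Python's built-in `hash`, which is randomized per process for strings and would route the same key differently between runs.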

SQL

SQL, or Structured Query Language, is the most common database querying language in use today. It is primarily used to work with relational data and allows users to create queries to select, filter, define, and manipulate the data within relational databases. While SQL is a common standard, implementation details differ widely, making it less software agnostic than hoped.

SQLite

SQLite is a relational database management system written as a C language library. Since it is implemented as a library, it does not conform to the traditional client / server separation model and instead relies on the library or client program to perform both roles to write to local files. It is extremely functional for its size and is especially suitable for embedded use. SQLite has bindings in many different languages and it is deployed widely in applications as an internal storage system.

Stale data

When working with data storage, stale data refers to any data that does not accurately reflect the most recent state of the data. This is often a concern primarily when caching, as pieces of data might potentially be preserved and used long after they have been invalidated by changes.

Stop words

In full-text search contexts, stop words are a list of words that are considered inapplicable to search queries. These are typically the most common words in a language that lack much meaning on their own or are ambiguous to the point of irrelevancy. Some examples in English are words like "the", "it", and "a".
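
As a minimal sketch, filtering can be as simple as dropping listed words while tokenizing. The three-word list below reuses the examples above and the function name is hypothetical. Real search systems ship much longer, language-specific lists.

```python
# Tiny illustrative list; real stop word lists are far larger.
STOP_WORDS = {"the", "it", "a"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]
```

For example, `remove_stop_words("The cat chased a mouse")` keeps only `cat`, `chased`, and `mouse`.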

Storage engine

A storage engine is the underlying component in database management systems that is responsible for inserting, removing, querying, and updating data within the database. Many database features, like the ability to execute transactions, are actually properties of the underlying storage engine. Some database systems, like MySQL, have many different storage engines that can be used according to the requirements of your use case. Other systems, like PostgreSQL, focus on providing a single storage engine that is useful in all typical scenarios.

U

Upsert

An upsert is a database operation that either updates an existing entry or inserts a new entry when no current entry is found. Upsert operations consist of a querying component that is used to search for matching records to update and a mutation component that specifies which values should be updated. Often, additional values need to be provided for other fields to handle the case where a new record must be created.
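
In SQLite, an upsert can be written with the `ON CONFLICT ... DO UPDATE` clause, which assumes SQLite 3.24 or newer. The sketch below uses Python's built-in `sqlite3` module. The `page_views` table and the `record_view` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (path TEXT PRIMARY KEY, views INTEGER NOT NULL)")


def record_view(conn, path):
    """Insert a first view for the path, or increment the existing counter."""
    conn.execute(
        "INSERT INTO page_views (path, views) VALUES (?, 1) "  # values for a new row
        "ON CONFLICT (path) DO UPDATE SET views = views + 1",  # mutation for a match
        (path,),
    )
```

The conflict target (`path`) plays the role of the querying component, and the `DO UPDATE` clause is the mutation component.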

W

Wide-column store

A wide-column store is a type of NoSQL database that organizes its data into rows and columns using standard and super column families. A row key is used to retrieve all of the associated columns and super columns. Each row can contain entirely different columns as the column definitions and values are stored within the row structure itself.

Write-through caching

Write-through caching is a caching pattern where the application writes changes directly to the cache instead of the backing database. The cache then immediately forwards the new data to the backing database for persistence. This strategy minimizes the risk of data loss in the event of a cache crash while ensuring that read operations have access to all new data. In high write scenarios, it may make sense to transition to write-back caching to prevent straining the backing database.
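
The write path can be sketched as a small class. The name `WriteThroughCache` is hypothetical and a dictionary stands in for the backing database. This sketch persists the value before updating its in-memory copy, which is one possible ordering.

```python
class WriteThroughCache:
    """Accepts writes from the application and forwards them synchronously."""

    def __init__(self, database):
        self._database = database  # any dict-like backing store
        self._items = {}

    def set(self, key, value):
        self._database[key] = value  # persist before acknowledging the write
        self._items[key] = value     # keep the cache current for later reads

    def get(self, key):
        return self._items.get(key)
```

Because every write reaches the database before `set` returns, losing the cache loses no data. The cost is that each write waits on the database, which is the strain write-back caching tries to avoid.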

This glossary is provided as a reference guide.
Terms may vary by industry and context.
