ACID

Set of properties of database transactions
According to Gray and Reuter, the IBM Information Management System supported ACID transactions as early as 1973 (although the acronym was created later).[3]
Characteristics
The characteristics of these four properties as defined by Reuter and Härder are as follows:
Atomicity
An example of an atomic transaction is a monetary transfer from bank account A to account B.
Consistency (Correctness)
Isolation
Durability
Examples
The following examples further illustrate the ACID properties.
CREATE TABLE acidtest (A INTEGER, B INTEGER, CHECK (A + B = 100));
Atomicity
Atomicity is the guarantee that a series of database operations in an atomic transaction will either all occur (a successful operation), or none will occur (an unsuccessful operation).
Consistency failure
Consistency is a very general term, which demands that the data must meet all validation rules.
Isolation failure
Combined, there are four actions:
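
The acidtest table can be exercised in a short script. What follows is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in database; the starting values (60, 40) and the function name run_transfer are illustrative and do not come from the excerpt. The first UPDATE keeps A + B = 100, the second violates the CHECK constraint, and the rollback undoes both, so the transaction takes effect either completely or not at all.

```python
import sqlite3


def run_transfer():
    """Attempt a two-step transaction whose second step breaks the CHECK rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE acidtest (A INTEGER, B INTEGER, CHECK (A + B = 100))"
    )
    # Illustrative starting values (assumption, not from the source).
    conn.execute("INSERT INTO acidtest (A, B) VALUES (60, 40)")
    conn.commit()

    try:
        # "with conn" commits if the block succeeds and rolls back on an exception.
        with conn:
            # Valid on its own: the sum stays at 100.
            conn.execute("UPDATE acidtest SET A = A - 10, B = B + 10")
            # Invalid: the sum would drop to 90, so the CHECK constraint fails.
            conn.execute("UPDATE acidtest SET A = A - 10")
    except sqlite3.IntegrityError:
        pass  # the whole transaction, including the valid UPDATE, was rolled back

    row = conn.execute("SELECT A, B FROM acidtest").fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(run_transfer())
```

Because the constraint is checked per statement, the consistency rule rejects the second UPDATE, and atomicity ensures the first one does not survive on its own.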

SQL Indexing and Tuning e-Book for developers: Use The Index, Luke covers Oracle, MySQL, PostgreSQL, SQL Server, ...

Join strategies and performance in PostgreSQL - CYBERTEC © Laurenz Albe 2020 There are three join strategies in PostgreSQL that work quite differently. If PostgreSQL chooses the wrong strategy, query performance can suffer a lot. Terminology Relation A join combines data from two relations. the base relation a will be joined to the result of the join of b and c. Inner and outer relation The execution plan for any join looks like this: We call the upper of the joined relations (in this case the sequential scan on a) the outer relation of the join, and we call the lower relation (the hash computed from b) the inner relation. Join condition and join key A Cartesian product or cross join of two relations is what you get if you combine each row from one relation with each row of the other. If the join condition is of the form I call “a.col1” and “b.col2” join keys. Note that for inner joins there is no distinction between the join condition and the WHERE condition, but that doesn’t hold for outer joins. Nested loop join strategy Hash join strategy Conclusion
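
The terminology in this excerpt (outer relation, inner relation, join condition, cross join) can be made concrete with a small sketch. It assumes relations are plain Python lists of dicts and reuses the excerpt's key names col1 and col2; the function names are hypothetical, and this is only an illustration of the nested loop idea, not PostgreSQL's implementation.

```python
def nested_loop_join(outer, inner, condition):
    """Return every (outer_row, inner_row) pair that satisfies the join condition."""
    result = []
    for a in outer:        # the outer relation is scanned once
        for b in inner:    # the inner relation is scanned once per outer row
            if condition(a, b):
                result.append((a, b))
    return result


def cross_join(outer, inner):
    """Cartesian product: a join whose condition is always true."""
    return nested_loop_join(outer, inner, lambda a, b: True)


if __name__ == "__main__":
    a = [{"col1": 1}, {"col1": 2}, {"col1": 3}]
    b = [{"col2": 2}, {"col2": 3}, {"col2": 3}]
    # a.col1 and b.col2 act as the join keys here.
    print(nested_loop_join(a, b, lambda x, y: x["col1"] == y["col2"]))
    print(len(cross_join(a, b)))
```

The work grows with the product of the two relation sizes, which is why the choice of strategy matters for larger inputs.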

Demystifying JOIN Algorithms Joins are an important class of operations in databases and data pipelines. There are many kinds of joins in SQL databases, but this post will focus only on the INNER JOIN. It is possible to make the JOIN between two large tables very efficient if the result is going to be small. Because that is a common scenario in practice, JOINs are an interesting topic of research. A naive JOIN algorithm can be very slow because JOINs are essentially nested loops. A better alternative to the naive algorithm is the HashJoin. This kind of code is very common in hand-written backend and UI code that stitches data together to show something useful to users. A Practical Example Let’s look at an e-commerce database containing two tables: product and cart_item. The product table contains all the products the store sells. The cart_item table contains the products added to the carts of all users. PostgreSQL might use a HashJoin to execute this query. Indexing Let’s disable HashJoin and see how our JOIN query is planned.
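
A HashJoin between the two tables can be sketched in a few lines. This is a minimal illustration, assuming in-memory lists of dicts and hypothetical column names (id and name for product, user_id and product_id for cart_item), none of which are given in the excerpt. The build phase hashes the product rows by key once; the probe phase then looks up each cart_item row in the hash table instead of rescanning every product.

```python
from collections import defaultdict


def hash_join(products, cart_items):
    """INNER JOIN cart_item to product on cart_item.product_id = product.id."""
    # Build phase: one pass over product, grouping rows by join key.
    by_id = defaultdict(list)
    for product in products:
        by_id[product["id"]].append(product)

    # Probe phase: one pass over cart_item, with a hash lookup per row.
    result = []
    for item in cart_items:
        for product in by_id.get(item["product_id"], []):
            result.append({**item, "product_name": product["name"]})
    return result


if __name__ == "__main__":
    products = [{"id": 1, "name": "pen"}, {"id": 2, "name": "ink"}]
    cart_items = [
        {"user_id": 10, "product_id": 2},
        {"user_id": 11, "product_id": 1},
        {"user_id": 10, "product_id": 9},  # no matching product: dropped
    ]
    print(hash_join(products, cart_items))
```

Each table is read once, so the cost is roughly proportional to the sum of the table sizes rather than their product.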

14: CREATE INDEX
CREATE INDEX — define a new index
Synopsis
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name [ USING method ]
    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass [ ( opclass_parameter = value [, ... ] ) ] ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ INCLUDE ( column_name [, ...] ) ]
    [ WITH ( storage_parameter [= value] [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    [ WHERE predicate ]
Description
CREATE INDEX constructs an index on the specified column(s) of the specified relation, which can be a table or a materialized view. Indexes are primarily used to enhance database performance (though inappropriate use can result in slower performance). The key field(s) for the index are specified as column names, or alternatively as expressions written in parentheses. An index field can be an expression computed from the values of one or more columns of the table row. When the WHERE clause is present, a partial index is created.
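
Three of the forms described above (a unique index, an index on an expression, and a partial index with a WHERE clause) can be tried out in a short script. This is a sketch under a clear assumption: it runs against SQLite through Python's built-in sqlite3 module, not against PostgreSQL, and uses only the subset of the syntax that both accept. The orders table, its columns, and the index names are hypothetical.

```python
import sqlite3


def build_indexes():
    """Create three kinds of index and return (index names, duplicate_rejected)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table used only for this illustration.
    conn.execute(
        "CREATE TABLE orders (order_nr INTEGER, customer TEXT, billed INTEGER)"
    )
    # Unique index on a plain column.
    conn.execute("CREATE UNIQUE INDEX orders_nr_idx ON orders (order_nr)")
    # Index on an expression computed from a column.
    conn.execute(
        "CREATE INDEX orders_customer_lower_idx ON orders (lower(customer))"
    )
    # Partial index: only rows matching the WHERE predicate are indexed.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS orders_unbilled_idx "
        "ON orders (order_nr) WHERE billed = 0"
    )
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
        )
    ]
    conn.execute("INSERT INTO orders VALUES (1, 'Ann', 0)")
    try:
        # Same order_nr again: the unique index should reject it.
        conn.execute("INSERT INTO orders VALUES (1, 'Bob', 1)")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    conn.close()
    return names, duplicate_rejected


if __name__ == "__main__":
    print(build_indexes())
```

Options such as CONCURRENTLY, USING method, INCLUDE, and TABLESPACE from the synopsis are PostgreSQL-specific and are deliberately left out of this sketch.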

14: 51.1. The Path of a Query Here we give a short overview of the stages a query has to pass to obtain a result. A connection from an application program to the PostgreSQL server has to be established. The application program transmits a query to the server and waits to receive the results sent back by the server. The parser stage checks the query transmitted by the application program for correct syntax and creates a query tree. The rewrite system takes the query tree created by the parser stage and looks for any rules (stored in the system catalogs) to apply to the query tree. In the following sections we will cover each of the above listed items in more detail to give a better understanding of PostgreSQL's internal control and data structures.

Indexes in PostgreSQL — 4 (Btree) : Postgres Professional We've already discussed the PostgreSQL indexing engine and the interface of access methods, as well as the hash index, one of the access methods. We will now consider B-tree, the most traditional and widely used index. This article is large, so be patient. Structure The B-tree index type, implemented as the "btree" access method, is suitable for data that can be sorted. In other words, the "greater", "greater or equal", "less", "less or equal", and "equal" operators must be defined for the data type. As always, index rows of the B-tree are packed into pages. B-trees have a few important traits: B-trees are balanced, that is, each leaf page is separated from the root by the same number of internal pages. Below is a simplified example of the index on one field with integer keys. The very first page of the index is a metapage, which references the index root. Search by equality Let's consider searching for a value in the tree by the condition "indexed-field = expression". Search by inequality Search by range Example And by range:
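
The equality and range searches rely only on the keys being sorted. The sketch below is a deliberate simplification: it assumes the leaf level of the index is flattened into one sorted Python list of keys with a parallel list of row identifiers, and it uses binary search from the standard bisect module in place of descending from the root through internal pages. The function names and sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def search_equal(keys, tids, value):
    """Row identifiers for 'indexed-field = value' (keys must be sorted)."""
    start = bisect_left(keys, value)   # first position with key >= value
    stop = bisect_right(keys, value)   # first position with key > value
    return tids[start:stop]


def search_range(keys, tids, low, high):
    """Row identifiers for 'low <= indexed-field <= high'."""
    start = bisect_left(keys, low)
    stop = bisect_right(keys, high)
    # Once the start is located, matching entries are adjacent in key order.
    return tids[start:stop]


if __name__ == "__main__":
    keys = [1, 4, 9, 9, 16, 25, 36, 49]
    tids = ["t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7"]
    print(search_equal(keys, tids, 9))
    print(search_range(keys, tids, 9, 30))
```

A range search locates its lower bound like an equality search and then reads neighbouring entries in order, which is the property that makes sorted structures suitable for range and inequality conditions.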

Indexes in PostgreSQL — 1 : Postgres Professional This series of articles is largely concerned with indexes in PostgreSQL. Any subject can be considered from different perspectives. We will discuss matters that should interest an application developer who uses a DBMS: what indexes are available, why there are so many different types of them, and how to use them to speed up queries. Development of new types of indexes is outside the scope. In this article we will discuss the distribution of responsibilities between the general indexing engine related to the DBMS core and individual index access methods, which PostgreSQL enables us to add as extensions. Before we start, I would like to thank Elena Indrupskaya for translating the articles into English. Indexes In PostgreSQL, indexes are special database objects mainly designed to speed up data access. At present, six different kinds of indexes are built into PostgreSQL 9.6, and one more index is available as an extension — thanks to significant changes in version 9.6. Indexing engine Index scan

Bruce Momjian: Postgres Technical Performance Presentations PostgreSQL Performance Tuning This talk is designed for advanced PostgreSQL users who want to know how to maximize PostgreSQL performance. It covers every aspect of performance: server settings, caching, sizing operating system resources, optimizer processing, problem queries, storage efficiency, and some hardware selection details. It includes how to size shared memory, how to understand the output of the optimizer, when to restructure queries, and how to configure storage for optimal performance. Duration: 3 hours, 4 hours with questions Video Tver.io Meetup, February 10, 2020 (video) PGCon, May 23, 2017 PG Day France, June 5, 2014 PostgreSQL Conference Europe, October 29, 2013 Southeast LinuxFest June 10, 2012 ConFoo, February 29, 2012 Gulev, December 8, 2006 NordU Usenix, January 29, 2004 International PHP Conference, November 2-5, 2003 Fosdem, February 8-9, 2003 O'Reilly Open Source Convention, July 22, 2002 SRA, December 10, 2001 Explaining the Postgres Query Optimizer

The Internals of PostgreSQL : Chapter 3 Query Processing
The process of obtaining the cheapest path for a query involving three tables is shown below:

testdb=# \d tbl_a
     Table "public.tbl_a"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 data   | integer |

testdb=# \d tbl_b
     Table "public.tbl_b"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 data   | integer |

testdb=# \d tbl_c
     Table "public.tbl_c"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 data   | integer |
Indexes:
    "tbl_c_pkey" PRIMARY KEY, btree (id)

testdb=# SELECT * FROM tbl_a AS a, tbl_b AS b, tbl_c AS c
testdb-#   WHERE a.id = b.id AND b.id = c.id AND a.data < 40;

Level 1: The planner estimates the cheapest paths of all tables and stores this information in the corresponding RelOptInfos: {tbl_a}, {tbl_b} and {tbl_c}.
Level 2:
Level 3: The planner finally gets the cheapest path using the already obtained RelOptInfos.
{tbl_a,tbl_b,tbl_c}=min({tbl_a,{tbl_b,tbl_c}},{tbl_b,{tbl_a,tbl_c}},{tbl_c,{tbl_a,tbl_b}}).
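
The level-by-level search can be sketched as a small dynamic program over sets of tables. Everything numeric here is an assumption made for illustration: the row counts, the single flat selectivity, and the cost formula (cost of both inputs plus the product of their row counts) are invented and are not PostgreSQL's cost model. Only the structure mirrors the excerpt: the best result for each smaller set is stored and reused when evaluating the min over the ways to split a larger set.

```python
from itertools import combinations


def cheapest_join_order(base_rows, selectivity):
    """Return (cost, plan) for joining all tables under a toy cost model."""
    tables = sorted(base_rows)
    # Level 1: each base table alone; scan cost is ignored in this toy model.
    # best maps a set of tables to (cost, estimated_rows, plan).
    best = {frozenset([t]): (0.0, float(base_rows[t]), t) for t in tables}

    # Level 2 and up: combine the already stored results for smaller sets.
    for level in range(2, len(tables) + 1):
        for subset in combinations(tables, level):
            rel = frozenset(subset)
            for k in range(1, level):
                for left in combinations(subset, k):
                    left_set = frozenset(left)
                    right_set = rel - left_set
                    l_cost, l_rows, l_plan = best[left_set]
                    r_cost, r_rows, r_plan = best[right_set]
                    # Hypothetical cost: inputs plus a row-by-row comparison.
                    cost = l_cost + r_cost + l_rows * r_rows
                    rows = l_rows * r_rows * selectivity
                    if rel not in best or cost < best[rel][0]:
                        best[rel] = (cost, rows, (l_plan, r_plan))

    cost, _, plan = best[frozenset(tables)]
    return cost, plan


if __name__ == "__main__":
    # Illustrative row counts only (assumption).
    print(cheapest_join_order({"tbl_a": 10, "tbl_b": 100, "tbl_c": 1000}, 0.01))
```

With these made-up numbers, joining the two smallest tables first and adding tbl_c last is cheapest, which corresponds to the {tbl_c,{tbl_a,tbl_b}} term of the min expression.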

ACID: An acronym for atomicity, consistency, isolation, and durability, which are the main requirements for guaranteed transaction processing.

Found in: Hurwitz, J., Nugent, A., Halper, F. & Kaufman, M. (2013) Big Data For Dummies. Hoboken, New Jersey, United States of America: For Dummies. ISBN: 9781118504222. by raviii Dec 31
