Introduction
Data can be viewed as a collection of associated items.
These associations are interpreted as meaningful relationships.
For accessing and processing these relationships
they need to be represented on a computer.
The data relationships are machine-independent.
Their computer representations are not.
In 1965
ARPA initiated a research project
to explore the existence of a mathematical foundation
for defining and manipulating
data representations on computers.
ARPA considered current data structure development
too
application and platform dependent,
thus be unable to
support optimal function and performance
capabilities
of future systems.
It was ARPA's contention that
machine representations of data
should be transparent to
all applications no matter the platform,
i.e.
machine-independent.
Data relationships and mathematical sets
have something in common,
they are both devoid of any physical dependencies.
Set operations already exist for processing relationships
mathematically defined as relations.
Set operations are defined in terms of a set's
mathematical identity,
not in terms of a set's physical representation.
Set theory also supports a wealth of operations
for processing sets by their mathematical identities.
If data relationships had mathematical identities,
then set operations could be used to
access and process data.
This observation was used in 1965 to initiate
research on the feasibility of developing a
mathematically well-defined
set-theoretic data structure.
STDS was later used as the storage management system to support
an Information Management System developed at the University of
Michigan and commercially available from 1971
through 1998.(see
STDS-IMS)
Set processing in a network environment was implemented in 1975.
(see
Hardgrave)
Data as a Mathematical Object
Data is essentially a collection of
meaningful relationships of interest.
Exploiting this interest
with the aid of a computer
requires representing data relationships
in a machine-friendly format.
Thus, by definition,
data representations are machine-dependent.
A mathematical foundation
for defining and manipulating data relationships
requires that data relationships have mathematical identities.
Since any actual manipulation is done by a computer,
the machine representations of data relationships
must also have mathematical identities.
Binary relationships already have an
identity as binary relations defined by set theory.
The wealth of set operations already available
for well-defined relations
made set theory an attractive candidate as a
mathematical foundation for developing
future data processing systems.
Unfortunately, providing data relationships
with a unique mathematical identity was not
feasible with any then available set theory.
Records as n-Tuples
No matter what data model is chosen for a specific
data processing application, the data has to be
physically structured in order to be accessed.
To support transparency
for all user data models, machine-independence
was required at the I/O level.
At the I/O level all data structures are defined in terms of
records, fields, and files.
The structural similarity between
computer records
and
set-theoretic n-tuples
is difficult to ignore.
To model data structures at the I/O level
STDS was developed in terms of
tuples, domains, and sets.
Since records are a basic representation of data
and n-tuples are mathematical objects for representing
item relationships,
formally identifying the two
seemed to meet ARPA's research objective
for a mathematical foundation for
supporting data structures.
And in principle it did, but in practice it did not.
The fundamental problem was that manipulating records
relied on programming item access to the contents of records.
Even though n-tuples have a set-theoretic identity
it is declared instead of being defined by set-membership.
Specifically, the tuple < a, b, c > does not contain a, b, or c
as an item. While the record [ c, b, a ] contains all three.
Though the set { a, b, c } contains exactly
the three items, so does { c, b, a }.
Not surprising
since it is exactly the same set.
But < a, b, c > and < c, b, a >
are not the same set.
Though they belong to the same equivalence class,
they are inverses of each other.
This is not easily verified
using classical set notation.
Determining that two records are equal
is simple.
If for all i, the i-th items
of record r and record s
are the same,
the records are equal.
Determining if two n-tuples are equal
is not so simple.
The i-th item of an n-tuple
is not well-defined in set theory.
Without a formal mechanism to determine
if two n-tuples are equal,
representing records by n-tuples is not of much value .
Record Reordering
In set theory,
operations on sets are defined in
terms of set membership.
To define operations on n-tuples that
mimic record operations,
n-tuples need a well-defined membership condition.
Assume that n-tuples have a
suitably defined membership condition,
then define
ρi
( < x1, x2, .. , xn > ) = x i.
Item Value:
ρi
( < x1, x2, .. , xn > ) = xi
Given the i-th item value operation
defied by
ρi
( r ) = ri.
determining if two records, possibly on separate platforms,
are equal can now be defined set theoretically.
Theorem: Record r equals record s iff
(∀i)
( ρi
( r )
=
ρi
( s ) ).
Being able to determine whether two records on separate platforms
are equal is necessary, but nowhere near sufficient.
Field ordering does not affect the represented relationship,
but it does impact system data accessing and performance
dictated by resulting data structures.
Since a record of cardinality n, |n|,
provides n! different representations for the
same relationship.
For relationships to be shared across platforms
some mechanism is needed to detect relationship equivalence.
A re-mapping operation can be defined that allows
any tuple in an equivalence class
to be re-mapped to any other tuple in the same
equivalence class.
Re-map:
Given
σ with
|σ| = k
and record r,
let
ρσ(r)
=
<
ρσ1(r),
ρσ2(r),..,
ρσk(r)
>.
Example:
ρ< 3, 2, 4, 1 > (< a, b, c, d >)
= < c, b, d, a >
Since an equivalence class of n degree records
has n! different representations,
and since mapping directive σ
has exactly n!, all possible mappings are covered.
This provides platform independence.
Actually a little too much independence since
platforms have no way to recognize relationship equivalence.
Some commonality needs to be shared between platforms.
By sharing appropriate mapping directive,
σs,
relevant relationships can be easily shared across platforms.
Relevant Reordering
Fortunately, records come equipped with item Fields.
Field designations are required to be unique,
even if they represent the same set of item values.
Field orderings
determine item order and
provide a basis
for equating relationship equivalence between files
of the same equivalence class.
By extension
physically defined data Fields equate to
set-theoretically well-defined Domains.
Just as Field ordering determined record ordering,
Domain ordering and selection
dictate n-tuple orderings.
Since not all applications require
access to all relationship Domains,
I/O and processing performance can be dramatically improved by
transferring only those Domains actually
required by an application.
Re-map provides a real-time ability
to provide a record reduction and re-ordering
during data access.
This Domain selection capability
was included
in the 1968 version of STDS.(see
AFIP)
In 2005 Information Builders Inc,
compared the performance of an
STDS supported Information Access Accelerator
with two of the current leading
DBMSs.(see IAA)
The real-time application of set-theoretically
defined Domain selection gave typical
performances improvements between 40 to 90 times
that of commercial DBMSs.
The XST architecture preserves membership across environments while making it possible for
the data representation to vary from one environment to the next. Thus, performance at the
process level can remain independent of how relationships are expressed at the conceptual
level and how data is represented at the storage level.
This is the case for the XST approach in a nutshell:
(1) Set membership determines functionality.
(2) Data representation at the storage level determines performance.
(3) Strong data independence separates functionality concerns from performance
concerns(IAA-13)
Since extended set operations are
application independent,
program language independent,
platform independent, and
storage organization independent
the only performance dependencies are the program blueprint and
the programmer's
imagination.(
IAA-12,
IAA-14)
Though the industry has long believed that
index structures
are essential for
best performance, the exact opposite is
true.
Research has shown that over 90% of an application process can be indexed-access
overhead.(SIGMOD)
By eliminating index structures
in favor of direct data access using extended set operations
performance can be improved by orders of
magnitude.(IAA-30)
For more extensive performance results of two industry
benchmark studies
see IAA
report pages 23-39.
RELATIONS (set theoretic)
In set theorey, relations are
well-defined as subsets
of the Cartesian product.
The Cartesian product is
well-defined
in terms of ordered pairs.
The ordered pair is not well-defined.
It is declaired into existence
by an individual.
Not just one individual, but
by several. Each with a different agenda.
None of which allow the
Cartesian product
to be associative.
Under the axioms of extended set theory,
a definition of ordered pair
was well-defined that allowed the
Cartesian product to be associative.
This in turn allowed n-ary relations to
be well-defined subsets of a
well-defined
extended Cartesian product.
XSP: Smoke & Mirrors?
To some, extended set processing, XSP,
seems to be more "smoke & mirrors"
than
"sets & mathematics".
There is no such thing as an "extended set".
This is not a totally
unjustified view.
Without a deep understanding of
set membership condition
and the difficulties imposed by the
Kuratowski definition of an ordered pair,
it would not be obvious what role an
extended set membership condition
could have on
relations defined as subsets of the
Cartesian Product.(see Berztiss)
XML, Categories, λ-Calculus
The 1968 implementation of STDS only supported access to
homogeneous data arrays.
This was far short of ARPA's vision of a formal
foundation for accessing any and all possible
computer representations of data.
By 1998 an implementation of STDS
was equipped to recognize and process
XML structures as mathematical objects.
It
demonstrated XSP ability to establish
content equivalence between
RDM table-structures and
XML computer
representations.
(see
[T&P])
A mathematical foundation for developing
executable systems is of little value
without capable tools for exploiting its potential.
Category theory and the Lambda calculus
provide just such tools.
Both have known difficulty
associating with Classical set theory.
Extended set theory proved more
accommodating.
(see
Blass,
Kats)
Summary
Mathematically defined data structures
access data by its content.
Physically defined data structures
access data by its location.
Mathematically defined data structures
are intrinsically stable.
Physically defined data structures
are intrinsically fragile.
Mathematically defined data structures
are global.
Physically defined data structures
are local.
Mathematically defined data structures
are machine-independent.
Physically defined data structures
are machine-dependent.
Without a unifying mathematical foundation
data accessing is a disjoint universe
of physically intractable data structures.
With a unifying mathematical foundation
data accessing is a unified universe
of mathematically synergistic data structures.
References
-
[Skolem]
Two Remarks on Set Theory,
Skolem, T.: 2. The ordered n-tuples as sets, MATH. SCAND, 5 1957,
"It is still a problem how the ordered n-tuple can be defined in the most suitable way."
-
[Suppes]
Axiomatic Set Theory,
Suppes, P.:Van Nostrand, 1960,
"All relations are binary."[p. 57]
-
[AFIP]
Description of a Set-theoretic Data Structure: - 1968
-
[ARPA]
CONCOMP: Research in Conversational use of Computers: final report, December 1970
-
[Berztiss]
DATA STRUCTURES: Theory and Practice, Berztiss, A.T., Academic Press, New York, 1971,
"We have to develop a completely new theory of ordered objects."[p. 27]
-
[STDS-IMS]
UoM Information Management System,
STDS supported, 1971-1998
-
[Hardgrave75]
Set Processing in a Network Environment
Extended set theory is an effective method for describing
the complex data structures needed to support large-scale data base
applications.
[Hardgrave, W.T. 1975]
-
[Hardgrave76]
A technique for implementing a set processor,
Hardgrave, W. T. 1976
-
[VLDB77]
1977 VLDB Extended Set Theory:
A General Model For Very Large, Distributed, Backend Information Systems.
-
[LLL]
Set Theoretic Data Structures (STDS), Lawrence Livermore Lab.
Technical Report, 1977.
-
[NATO]
A Mathematical Foundation for Systems Development - NATO-ASI Series, Vol. F24, 1985
-
[VLDB84]
1984 VLDB Panel: Inexpensive Large Capacity Storage Will Revolutionize
Future Systems.
-
[Champion]
XSP: An Integration Technology for Systems Development, Champion, M. - 2001
-
[SIGMOD]
OLTP Through the Looking Glass, and What We Found There
Harizopoulos,~S.; Madden,~S.; Abadi,~D.; Stonebraker,~M.:
SIGMOD'08, 2008
♦
Substantial time is spent in logging, latching, locking, B-tree, and buffer management operations.
Over 90% of an application process is indexed-access overhead.
-
[IAA]
Information Access Accelerator
iXSP Performance evaluation, IBI, Stout, R. - 2005
-
[MSLab]
Managing Data Mathematically:
Data As A Mathematical Object:
♦
"Using Extended Set Theory for High Performance Database Management"
Presentation given at Microsoft Research Labs. with an introduction by Phil Bernstein.
(video: duration 1:10:52) - 2006
-
[North]
Sets, Data Models and Data Independence
Ken North a Dr. Dobbs''s Blogger, March 10, 2010
-
[SETS?]
Why Not Sets? All computer processing is set-theoretic
in nature. - 2013
-
[C&C]
Content-Container Data Access Strategies Content for Functionality - Containers for Performance - 2016
♦
Since data represented for processing is not always ideal
for preservation, and since data represented for
preservation is not always ideal for processing,
accessing application-friendly data
from storage-friendly data
poses a genuine challenge.
-
[DDRvsPDS]
Set Processing vs. Record Processing Performance:
Dynamic Data Restructuring vs. Prestructured Data Storage - 2016
♦
Record processing systems and set processing systems are very different.
This paper attempts to clarify the major differences and demonstrate
the performance advantages of set processing.
-
[XSP]
XSP: Extended Set Processing:
Mathematically Managing Data Representations - 2016
♦
This paper presents a high level overview of why a mathematical model of
data representations is necessary, and how an extended set processing
model accomplishes the task.
«
Copyright © 2023
INTEGRATED INFORMATION SYSTEMS
« Version: 01/15/2023 v03»
EXTENDED SET THEORY: Historical Lineage
Functions, Relations, Processes, & Categories
Extended set theory, XST,
complements classical set theory, CST,
in the following ways:
arbitrarily large sets,
fixed rank for n-tuples of constant rank items,
associative Cartesian product,
self-application,
and
support of category theory.
References
-
[2T]
Two Theoremsf:
CST vs XST Functions - 2019
-
[PFAC]
Processes, Functions, Applications & Composition
♦ Lambda application & category composition of processes. - 2018
-
[CSTasXST]
XST Definitions, Operations, & Properties -Tuples, Graphs, Functions -
♦ CST operations as XST operations. - 2018
-
[Kats]
Kategories & Morfisims: - 2017
-
[FUN]
Functions As Set Behavior Essential Concepts: Conceptual &
Formal Modeling Foundations. - 2016
-
[Ellis]
Extended Set Theory: A Summary A clarification of extended set notation. - 2015
-
[Blass]
Axioms and Models for an Extended Set Theory
Blass, A., Childs, D L - 2011
♦
This paper presents the formal foundation for "extending" the axioms
of ZFC, which only addresses the property of membership, to include
a "scope" property for addressing the property of "structure".
-
[T&P]
XSP TECHNOLOGY: Theory & Practice Formal Modeling &
Practical Implementation of XML & RDM Systems - 2007
♦
The RDM works in spite of set theory, not because of it.
XML-Structures and operations on these structures are set-theoretically
sound under XST.
In this paper the formal foundations of RDM and XML systems are examined
in the light of XST to provide practical relevance of XSP Technology to
the field of software systems development.
-
[X-AXIOMS]
Summary of Axioms for Extended Set Theory: - 2005
-
[XSP-XML]
XSP for XML Systems Design & Development: - 2000
-
[XST]
Extended Axiomatic Set Theory:
A Focus on Functions (early draft) - 1999
-
[IFIP]
Feasibility of a Set-theoretic Data Structure:
A Reconstituted Definition of Relation, - 1968
|