Overview

I took a schema dump from postgres and used some reverse engineering tools to create the diagram.

Areas of to take note of:

  • comments/notes of 4 tables
  • 3 “orphan” tables
  • 1 polymorphic table
  • default schema dump creates 6 unconnected tables, 3 have missing FK constraints:
alter table tracking_summary
   add constraint tracking_summary_package_id_fk
      foreign key (package_id) references package;

-- Note: this one is to help demonstrate is relation but read the comment!
alter table task_status
   add constraint task_status_entity_id_fk
      foreign key (entity_id) references resource;

alter table activity
   add constraint activity_user_id_fk
      foreign key (user_id) references user;
      

Right click, open image in new tab to zoom in.

ckan 2.8.3 DDL physical diagram

Physical Data Model

activity

User activities in CKAN related to an object. Used to create an activity stream for users, packages, groups and organizations.

Field Description
id Record ID. Application generated e.g. uuid.uuid4().
timestamp Timestamp of when record was created. Database generated.
user_id The name or id of the user who carried out the activity, e.g. 'seanh'.
object_id The name or id of the object of the activity, e.g. 'my_dataset'
revision_id The id of the revision the activity is tied to.
activity_type The type of the activity, this must be an activity type that CKAN knows how to render, e.g. 'new package', 'changed user', 'deleted group' etc.
data Any additional data about the activity (this is a python dictionary)

activity_detail

Additional details for an activity.

Field Description
id Record ID. Application generated e.g. uuid.uuid4().
activity_id The id of the parent activity.
object_id The name or id of the object of the activity_detail.
object_type The object type of the object of the acitivity_detail
activity_type The type of the activity, this must be an activity type that CKAN knows how to render, e.g. 'new package', 'changed user', 'deleted group' etc.
data any additional data about the activity_detail (This is a python dictionary)

authorization_group

Deprecated Table. Authorization groups / roles in system that are available to users. https://github.com/ckan/ckan/commit/55e9c01479295ff413e10456b128c4c631261755

Field Description
id The ID.
name Name of authorization group / role.
created Timestamp group / role created.

authorization_group_user

Deprecated Table. Join table connecting user to authorization group. https://github.com/ckan/ckan/commit/55e9c01479295ff413e10456b128c4c631261755

Field Description
authorization_group_id Id of authorization group assigned to user.
user_id The name or id of the user
id The ID

dashboard

Dates a user was last notified of activity information. Used by user dashboard and email notifications. Used by dashboard_activity_list action and is similar to *_activity_list actions but includes an is_new flag for activities.

Field Description
user_id The name or id of the user
activity_stream_last_viewed Timestamp of last time activity stream was viewed by user.
email_last_sent Timestamp of last time an activity notification email was sent to a user.

group

Represents simple way to manage collections of packages by topic.

Field Description
id The ID
name Name of group
title Title of group
description Description of the group
created Date time group was created
state State of the group (e.g. active, inactive)
revision_id ID for associated revision
type Type of group (e.g. organization)
approval_status Status of approval (e.g. approved, draft)
image_url URL for related image for group
is_organization Boolean value to indicate if group is organization type

group_extra

Represents additional group details/metadata

Field Description
id The ID
group_id Associated group ID
key Key of additional values (acts as column field)
value Value associated to key
state State of key
revision_id Associated revision ID

group_extra_revision

Changes to group_extra records.

Field Description
id The ID.
group_id The id of the parent group.
key Key of additional values (acts as column field)
value Value associated to key
state State of key
revision_id Associated revision ID
continuity_id FK to base table. Based table columns are exactly duplicated. This ties base table to revision table.
expired_id Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71
revision_timestamp Timestamp of when revision record created.
expired_timestamp Timestamp of when revision record expired (no longer current).
current Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71

group_revision

Changes to group records.

Field Description
id The ID
name Name of group
title Title of group
description Description of the group
created Date time group was created
state State of the group (e.g. active, inactive)
revision_id ID for associated revision
continuity_id FK to base table. Based table columns are exactly duplicated. This ties base table to revision table.
expired_id Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71
revision_timestamp Timestamp of when revision record created.
expired_timestamp Timestamp of when revision record expired (no longer current).
current Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71
type Type of group (e.g. organization)
approval_status Status of approval (e.g. approved, draft)
image_url URL for related image for group
is_organization Boolean value to indicate if group is organization type

member

Record membership of object (e.g. a user, dataset or group) to a group.

Field Description
id The ID
table_id The id of the object to be added to a group.
group_id The id or name of the group to add the object to
state Current membership of object to group state (e.g. ‘active’, ‘inactive’)
revision_id ID for associated revision
table_name The type of the object being added, e.g. 'package' or 'user'
capacity The capacity of the membership

member_revision

Changes to member records

Field Description
id The ID
table_id The id of the object to be added to a group.
group_id The id or name of the group to add the object to
state Current membership of object to group state (e.g. ‘active’, ‘inactive’)
revision_id ID for associated revision
continuity_id FK to base table. Based table columns are exactly duplicated. This ties base table to revision table.
expired_id Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71
revision_timestamp Timestamp of when revision record created.
expired_timestamp Timestamp of when revision record expired (no longer current).
current Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71
table_name The type of the object being added, e.g. 'package' or 'user'
capacity The capacity of the membership

migrate_version

Used by sqlalchemy to track new database migrations from codebase. Tracks the schema versions.

Field Description
repository_id Used to identify which repository this database is versioned under. You can use the name of your project. (e.g. ‘CKAN’)
repository_path Path to migration files. (e.g. ‘ckan/migration’)
version Schema version ID (e.g. 087, 088). Tracks version of current database against migrations to determine pending migrations, rollbacks, etc.

package

Represents the dataset and related metadata.

Field Description
id The ID
name The unique name that is used in urls and for identification
title The title of the dataset (optional, default: same as name)
version Version of package
url Home page for this dataset
notes Description and other information about the dataset
license_id passed from WP). The license under which the dataset is made available
revision_id ID of associated revision
author Author of dataset
author_email Author email
maintainer Maintainer of dataset
maintainer_email Maintainer email
state The current state of the dataset, e.g. 'active' or 'deleted', only active datasets show up in search results and other lists of datasets, this parameter will be ignored if you are not authorized to change the state of the dataset (optional, default: 'active')
type The type of the dataset (optional), IDatasetForm plugins associate themselves with different dataset types and provide custom dataset handling behaviour for these types
owner_org Associated organization to group dataset under
private If True creates a private dataset
metadata_modified Date time last updated
creator_user_id Creating user ID
metadata_created Date time dataset was created

package_extra

Represents additional group details/metadata. Represents additional metadata for datasets (arbitrary, unlimited additional key/value fields).

Field Description
id The ID
package_id Associated package ID
key Key of additional values (acts as column field)
value Value associated to key
revision_id Associated revision ID
state State of key

package_extra_revision

Changes to package_extra records

Field Description
id The ID
package_id Associated package ID
key Key of additional values (acts as column field)
value Value associated to key
revision_id Associated revision ID
continuity_id FK to base table. Based table columns are exactly duplicated. This ties base table to revision table.
state State of key
expired_id Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71
revision_timestamp Timestamp of when revision record created.
expired_timestamp Timestamp of when revision record expired (no longer current).
current Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71

package_relationship

Represents how a dataset is related to other objects (i.e. package A depends on package B)

Field Description
id The ID
subject_package_id The first package (i.e. parent)
object_package_id The second package (i.e. child0)
type Type of relationship (e.g. depends_on, dependency_of, child_of, etc)
comment Additional comments about relationship
revision_id Associated revision ID
state State of relationship

package_relationship_revision

Changes to package_relationship records

Field Description
id The ID
subject_package_id The first package (i.e. parent)
object_package_id The second package (i.e. child0
type Type of relationship (e.g. depends_on, dependency_of, child_of, etc)
comment Additional comments about relationship
revision_id Associated revision ID
continuity_id FK to base table. Based table columns are exactly duplicated. This ties base table to revision table.
state State of relationship
expired_id Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71
revision_timestamp Timestamp of when revision record created.
expired_timestamp Timestamp of when revision record expired (no longer current).
current Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71

package_revision

Changes to package records

Field Description
id The ID
name The unique name that is used in urls and for identification
title The title of the dataset (optional, default: same as name)
version Version of package
url Home page for this dataset
notes Description and other information about the dataset
license_id (passed from WP). The license under which the dataset is made available
revision_id ID of associated revision
continuity_id FK to base table. Based table columns are exactly duplicated. This ties base table to revision table.
author Author of dataset
author_email Author email
maintainer Maintainer of dataset
maintainer_email Maintainer email
state The current state of the dataset, e.g. 'active' or 'deleted', only active datasets show up in search results and other lists of datasets, this parameter will be ignored if you are not authorized to change the state of the dataset (optional, default: 'active')
expired_id Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71
revision_timestamp Timestamp of when revision record created.
expired_timestamp Timestamp of when revision record expired (no longer current).
current Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71
type The type of the dataset (optional), IDatasetForm plugins associate themselves with different dataset types and provide custom dataset handling behaviour for these types
owner_org Associated organization to group dataset under
private If True creates a private dataset
metadata_modified Date time last updated
creator_user_id Creating user ID
metadata_created Date time dataset was created

package_tag

Represents the tags for packages which are used to group/classify datasets, associates tags to packages.

Field Description
id The ID
package_id Associated package ID
tag_id Associated tag ID
revision_id Associated revision ID
state Sate of association

package_tag_revision

Changes to package_tag records

Field Description
id The ID
package_id Associated package ID
tag_id Associated tag ID
revision_id Associated revision ID
continuity_id FK to base table. Based table columns are exactly duplicated. This ties base table to revision table.
state Sate of association
expired_id Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71
revision_timestamp Timestamp of when revision record created.
expired_timestamp Timestamp of when revision record expired (no longer current).
current Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71

rating

User rating of a dataset (package)

Field Description
id The ID
user_id The name or id of the user that created the rating
user_ip_address Creating user IP address
package_id The name or id of the dataset to rate
rating The rating to give to the dataset, an integer between 1 and 5
created Timestamp that rating record was created

resource

Represents the structure data files and technical documents related to a package (dataset). A Dataset in CKAN has many Resources. They consist of links to the data itself or sometimes link to information about the Dataset.

Field Description
id The ID
url The key attribute of a resource (and the only required attribute). The url points to the location online where the content of that resource can be found. For a file this would be the location online of that file (or more generally a url which yields the bitstream representing the contents of that file – for example some “files” are only generated on demand from a database). For an API this would be the endpoint for the api.
format Human created format string with possible nesting e.g. zip:csv. See below for details of the format field.
description A brief description (one sentence) of the Resource. Longer descriptions can go in notes field of the associated Data Package.
position Position of resource used for ordering on the page
revision_id Associated revision ID
hash Hash (md5 or sha-1 hash) of resource inserted by datastore cron job
state State of resource
extras Additional resource information
name A name for this resource (could be used in a ckan url)
resource_type The type of the resource. One of: file | file.upload | api | visualization | code | documentation
mimetype Standard mimetype (e.g. for zipped csv would be application/zip)
mimetype_inner Mimetype of innermost object (so for example would be text/csv)
size Size of the resource (content length). Usually only relevant for resources of type file.
last_modified Date time resource was last updated
cache_url URL for cached resource
cache_last_updated Date time of cache last updated
webstore_url URL for webstore for resource
webstore_last_updated Date time webstore url was last updated
created Date time resource was created
url_type Type of URL for resource
package_id Associated package (dataset), all resources are assigned to a dataset

resource_revision

Changes to resource records

Field Description
id The ID
url The key attribute of a resource (and the only required attribute). The url points to the location online where the content of that resource can be found. For a file this would be the location online of that file (or more generally a url which yields the bitstream representing the contents of that file – for example some “files” are only generated on demand from a database). For an API this would be the endpoint for the api.
format Human created format string with possible nesting e.g. zip:csv. See below for details of the format field.
description A brief description (one sentence) of the Resource. Longer descriptions can go in notes field of the associated Data Package.
position Position of resource used for ordering on the page
revision_id Associated revision ID
continuity_id FK to base table. Based table columns are exactly duplicated. This ties base table to revision table.
hash Hash (md5 or sha-1 hash) of resource inserted by datastorer cron job
state State of resource
extras Additional resource information
expired_id Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71
revision_timestamp Timestamp of when revision record created.
expired_timestamp Timestamp of when revision record expired (no longer current).
current Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71
name A name for this resource (could be used in a ckan url)
resource_type The type of the resource. One of: file, file.upload, api, visualization, code, documentation
mimetype Standard mimetype (e.g. for zipped csv would be application/zip)
mimetype_inner Mimetype of innermost object (so for example would be text/csv)
size Size of the resource (content length). Usually only relevant for resources of type file.
last_modified Date time resource was last updated
cache_url URL for cached resource
cache_last_updated Date time of cache last updated
webstore_url URL for webstore for resource
webstore_last_updated Date time webstore url was last updated
created Date time resource was created
url_type Type of URL for resource
package_id Associated package (dataset), all resources are assigned to a dataset

resource_view

Resource view record for resources that can be displayed/previewed in the system (e.g. a CSV can have a data explorer resource view).

Field Description
id The ID
resource_id Id of the associated (parent) resource
title The title of the view
description A description of the view (optional)
view_type Type of view
order Display order of resource view (used when multiple resource views are created for a resource)
config Options necessary to recreate a view state (optional)

revision

General revision information used by all *_revision tables to track additional metadata about the revision.

Field Description
id The Id
timestamp Date and time the revision is created
author Id or name of user that triggered revision (e.g. edited dataset)
message Message about the revision (e.g. “REST API: Create member object”)
state State of the revision
approved_timestamp Date and time the revision is approved, if necessary

system_info

General system information settings. This can be configured in the *.ini configuration file or in the database as key / value pairs.

Field Description
id The ID
key Key of additional values (acts as column field)
value Value associated to key
revision_id Associated revision ID
state State of configuration key/value pair

system_info_revision

Changes to system_info records

Field Description
id The ID
key Key of additional values (acts as column field)
value Value associated to key
revision_id Associated revision ID
continuity_id FK to base table. Based table columns are exactly duplicated. This ties base table to revision table.
state State of configuration key/value pair
expired_id Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71
revision_timestamp Timestamp of when revision record created.
expired_timestamp Timestamp of when revision record expired (no longer current).
current Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71

tag

Represents additional way to classify and group data by terms. Each tag either belongs to a vocabulary, or can be a free tag that doesn’t belong to any vocabulary (i.e. a normal CKAN tag).

Field Description
id The ID
name Name of tag
vocabulary_id Associated vocabulary the tag is part of (optional)

task_status

Polymorphic. Typically used by datapusher to track key/value pairs about resource task statuses. Could be used by any object that needs to track task status.

Field Description
id The id of the task status
entity_id The id of the object the task status is being performed on (e.g. resource.id, package.id)
entity_type The type of the object the task is being performed on (e.g. resource, package)
task_type The type of task being performed of the task status
key The key descriptor of the task status
value The value or actual data being stored as part of the task
state State of task (e.g. ‘active’, ‘complete’)
error Information about any error that occurred during processing.
last_updated The time at which this entry was last updated. Defaults to the current time.

term_translation

Used by the multilingual plugin to track term translations. e.g. term: 'Welcome', term_translation: 'Bienvenue', lang_code: 'fr'

Field Description
term The English term (key)
term_translation The translated version of the term
lang_code The language code of the term_translation (e.g. ‘fr’)

tracking_raw

Used to do basic page view tracking for basic stats. user_key is not the user.id but generated based on request information.

Field Description
user_key The generated key to identify the user. unique anonymized key for each user to prevent counting multiple clicks from the same user. Page view count limited to 1 page view per user per day.
url URL being tracked / interreacted with (page url)
tracking_type Type of page being clicked (e.g. ‘dataset’, ‘page’, etc
access_timestamp Timestamp of when record was created / page accessed.

tracking_summary

Basic page view tracking summary for package and it’s resources.

Field Description
url URL being tracked / interreacted with (page url)
package_id The ID of the package being tracked (if resource is being tracked this tracks it’s parent package).
tracking_type Type of page being clicked (e.g. ‘dataset’, ‘page’, etc
count Overall count of views used to calculate running_total and recent_views
running_total Running total of views based on tracking_date
recent_views Number of views in the last 14 days
tracking_date Start date of tracking. Defaults to 2011-01-01.

user

Represents a list of users in the system.

Field Description
id The ID
name The name of the user, a string between 2 and 100 characters in length, containing only lowercase alphanumeric characters, - and _
apikey Users/system generated API key to interact with the system via the API
created Date and time user was created
about A description of the new user (optional)
password The password of the new user, a string of at least 4 characters
fullname The full name of the new user (optional)
email The email address for the new user
reset_key Resent key used when user is changing password
sysadmin Indicates if user is a system administrator
activity_streams_email_notifications Indicates if user should get notifications about activity in system
state State of user

user_following_dataset

Represents the datasets a user if following. Modifies their notifications and activity streams.

Field Description
follower_id User id
object_id Object id (e.g. dataset id)
datetime Timestamp of when use started following dataset

user_following_group

Represents the groups a user if following. Modifies their notifications and activity streams.

Field Description
follower_id User id
object_id Object id (e.g. group id)
datetime Timestamp of when use started following dataset

user_following_user

Represents the users a user if following. Modifies their notifications and activity streams.

Field Description
follower_id User id
object_id Object id (e.g. user id)
datetime Timestamp of when use started following dataset

vocabulary

Tag vocabulary are a way of grouping related tags together into custom fields. Enables restrictions such as that each dataset must have a genre and a composer, and that no dataset can have two genres or two composers, etc.

Field Description
id The id of the vocabulary
name The name of the new vocabulary, e.g. 'Genre'