CKAN 2.8.3 Database Diagram
Overview
I took a schema dump from postgres and used some reverse engineering tools to create the diagram.
Areas of to take note of:
- comments/notes of 4 tables
- 3 “orphan” tables
- 1 polymorphic table
- default schema dump creates 6 unconnected tables, 3 have missing FK constraints:
Right click, open image in new tab to zoom in.
Physical Data Model
activity
User activities in CKAN related to an object. Used to create an activity stream for users, packages, groups and organizations.
Field | Description |
---|---|
id | Record ID. Application generated e.g. uuid.uuid4(). |
timestamp | Timestamp of when record was created. Database generated. |
user_id | The name or id of the user who carried out the activity, e.g. 'seanh'. |
object_id | The name or id of the object of the activity, e.g. 'my_dataset' |
revision_id | The id of the revision the activity is tied to. |
activity_type | The type of the activity, this must be an activity type that CKAN knows how to render, e.g. 'new package', 'changed user', 'deleted group' etc. |
data | Any additional data about the activity (this is a python dictionary) |
activity_detail
Additional details for an activity.
Field | Description |
---|---|
id | Record ID. Application generated e.g. uuid.uuid4(). |
activity_id | The id of the parent activity. |
object_id | The name or id of the object of the activity_detail. |
object_type | The object type of the object of the acitivity_detail |
activity_type | The type of the activity, this must be an activity type that CKAN knows how to render, e.g. 'new package', 'changed user', 'deleted group' etc. |
data | any additional data about the activity_detail (This is a python dictionary) |
authorization_group
Deprecated Table. Authorization groups / roles in system that are available to users. https://github.com/ckan/ckan/commit/55e9c01479295ff413e10456b128c4c631261755
Field | Description |
---|---|
id | The ID. |
name | Name of authorization group / role. |
created | Timestamp group / role created. |
authorization_group_user
Deprecated Table. Join table connecting user to authorization group. https://github.com/ckan/ckan/commit/55e9c01479295ff413e10456b128c4c631261755
Field | Description |
---|---|
authorization_group_id | Id of authorization group assigned to user. |
user_id | The name or id of the user |
id | The ID |
dashboard
Dates a user was last notified of activity information. Used by user dashboard and email notifications. Used by dashboard_activity_list action and is similar to *_activity_list actions but includes an is_new flag for activities.
Field | Description |
---|---|
user_id | The name or id of the user |
activity_stream_last_viewed | Timestamp of last time activity stream was viewed by user. |
email_last_sent | Timestamp of last time an activity notification email was sent to a user. |
group
Represents simple way to manage collections of packages by topic.
Field | Description |
---|---|
id | The ID |
name | Name of group |
title | Title of group |
description | Description of the group |
created | Date time group was created |
state | State of the group (e.g. active, inactive) |
revision_id | ID for associated revision |
type | Type of group (e.g. organization) |
approval_status | Status of approval (e.g. approved, draft) |
image_url | URL for related image for group |
is_organization | Boolean value to indicate if group is organization type |
group_extra
Represents additional group details/metadata
Field | Description |
---|---|
id | The ID |
group_id | Associated group ID |
key | Key of additional values (acts as column field) |
value | Value associated to key |
state | State of key |
revision_id | Associated revision ID |
group_extra_revision
Changes to group_extra records.
Field | Description |
---|---|
id | The ID. |
group_id | The id of the parent group. |
key | Key of additional values (acts as column field) |
value | Value associated to key |
state | State of key |
revision_id | Associated revision ID |
continuity_id | FK to base table. Based table columns are exactly duplicated. This ties base table to revision table. |
expired_id | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
revision_timestamp | Timestamp of when revision record created. |
expired_timestamp | Timestamp of when revision record expired (no longer current). |
current | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
group_revision
Changes to group records.
Field | Description |
---|---|
id | The ID |
name | Name of group |
title | Title of group |
description | Description of the group |
created | Date time group was created |
state | State of the group (e.g. active, inactive) |
revision_id | ID for associated revision |
continuity_id | FK to base table. Based table columns are exactly duplicated. This ties base table to revision table. |
expired_id | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
revision_timestamp | Timestamp of when revision record created. |
expired_timestamp | Timestamp of when revision record expired (no longer current). |
current | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
type | Type of group (e.g. organization) |
approval_status | Status of approval (e.g. approved, draft) |
image_url | URL for related image for group |
is_organization | Boolean value to indicate if group is organization type |
member
Record membership of object (e.g. a user, dataset or group) to a group.
Field | Description |
---|---|
id | The ID |
table_id | The id of the object to be added to a group. |
group_id | The id or name of the group to add the object to |
state | Current membership of object to group state (e.g. ‘active’, ‘inactive’) |
revision_id | ID for associated revision |
table_name | The type of the object being added, e.g. 'package' or 'user' |
capacity | The capacity of the membership |
member_revision
Changes to member records
Field | Description |
---|---|
id | The ID |
table_id | The id of the object to be added to a group. |
group_id | The id or name of the group to add the object to |
state | Current membership of object to group state (e.g. ‘active’, ‘inactive’) |
revision_id | ID for associated revision |
continuity_id | FK to base table. Based table columns are exactly duplicated. This ties base table to revision table. |
expired_id | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
revision_timestamp | Timestamp of when revision record created. |
expired_timestamp | Timestamp of when revision record expired (no longer current). |
current | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
table_name | The type of the object being added, e.g. 'package' or 'user' |
capacity | The capacity of the membership |
migrate_version
Used by sqlalchemy to track new database migrations from codebase. Tracks the schema versions.
Field | Description |
---|---|
repository_id | Used to identify which repository this database is versioned under. You can use the name of your project. (e.g. ‘CKAN’) |
repository_path | Path to migration files. (e.g. ‘ckan/migration’) |
version | Schema version ID (e.g. 087, 088). Tracks version of current database against migrations to determine pending migrations, rollbacks, etc. |
package
Represents the dataset and related metadata.
Field | Description |
---|---|
id | The ID |
name | The unique name that is used in urls and for identification |
title | The title of the dataset (optional, default: same as name) |
version | Version of package |
url | Home page for this dataset |
notes | Description and other information about the dataset |
license_id | passed from WP). The license under which the dataset is made available |
revision_id | ID of associated revision |
author | Author of dataset |
author_email | Author email |
maintainer | Maintainer of dataset |
maintainer_email | Maintainer email |
state | The current state of the dataset, e.g. 'active' or 'deleted', only active datasets show up in search results and other lists of datasets, this parameter will be ignored if you are not authorized to change the state of the dataset (optional, default: 'active') |
type | The type of the dataset (optional), IDatasetForm plugins associate themselves with different dataset types and provide custom dataset handling behaviour for these types |
owner_org | Associated organization to group dataset under |
private | If True creates a private dataset |
metadata_modified | Date time last updated |
creator_user_id | Creating user ID |
metadata_created | Date time dataset was created |
package_extra
Represents additional group details/metadata. Represents additional metadata for datasets (arbitrary, unlimited additional key/value fields).
Field | Description |
---|---|
id | The ID |
package_id | Associated package ID |
key | Key of additional values (acts as column field) |
value | Value associated to key |
revision_id | Associated revision ID |
state | State of key |
package_extra_revision
Changes to package_extra records
Field | Description |
---|---|
id | The ID |
package_id | Associated package ID |
key | Key of additional values (acts as column field) |
value | Value associated to key |
revision_id | Associated revision ID |
continuity_id | FK to base table. Based table columns are exactly duplicated. This ties base table to revision table. |
state | State of key |
expired_id | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
revision_timestamp | Timestamp of when revision record created. |
expired_timestamp | Timestamp of when revision record expired (no longer current). |
current | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
package_relationship
Represents how a dataset is related to other objects (i.e. package A depends on package B)
Field | Description |
---|---|
id | The ID |
subject_package_id | The first package (i.e. parent) |
object_package_id | The second package (i.e. child0) |
type | Type of relationship (e.g. depends_on, dependency_of, child_of, etc) |
comment | Additional comments about relationship |
revision_id | Associated revision ID |
state | State of relationship |
package_relationship_revision
Changes to package_relationship records
Field | Description |
---|---|
id | The ID |
subject_package_id | The first package (i.e. parent) |
object_package_id | The second package (i.e. child0 |
type | Type of relationship (e.g. depends_on, dependency_of, child_of, etc) |
comment | Additional comments about relationship |
revision_id | Associated revision ID |
continuity_id | FK to base table. Based table columns are exactly duplicated. This ties base table to revision table. |
state | State of relationship |
expired_id | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
revision_timestamp | Timestamp of when revision record created. |
expired_timestamp | Timestamp of when revision record expired (no longer current). |
current | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
package_revision
Changes to package records
Field | Description |
---|---|
id | The ID |
name | The unique name that is used in urls and for identification |
title | The title of the dataset (optional, default: same as name) |
version | Version of package |
url | Home page for this dataset |
notes | Description and other information about the dataset |
license_id | (passed from WP). The license under which the dataset is made available |
revision_id | ID of associated revision |
continuity_id | FK to base table. Based table columns are exactly duplicated. This ties base table to revision table. |
author | Author of dataset |
author_email | Author email |
maintainer | Maintainer of dataset |
maintainer_email | Maintainer email |
state | The current state of the dataset, e.g. 'active' or 'deleted', only active datasets show up in search results and other lists of datasets, this parameter will be ignored if you are not authorized to change the state of the dataset (optional, default: 'active') |
expired_id | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
revision_timestamp | Timestamp of when revision record created. |
expired_timestamp | Timestamp of when revision record expired (no longer current). |
current | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
type | The type of the dataset (optional), IDatasetForm plugins associate themselves with different dataset types and provide custom dataset handling behaviour for these types |
owner_org | Associated organization to group dataset under |
private | If True creates a private dataset |
metadata_modified | Date time last updated |
creator_user_id | Creating user ID |
metadata_created | Date time dataset was created |
package_tag
Represents the tags for packages which are used to group/classify datasets, associates tags to packages.
Field | Description |
---|---|
id | The ID |
package_id | Associated package ID |
tag_id | Associated tag ID |
revision_id | Associated revision ID |
state | Sate of association |
package_tag_revision
Changes to package_tag records
Field | Description |
---|---|
id | The ID |
package_id | Associated package ID |
tag_id | Associated tag ID |
revision_id | Associated revision ID |
continuity_id | FK to base table. Based table columns are exactly duplicated. This ties base table to revision table. |
state | Sate of association |
expired_id | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
revision_timestamp | Timestamp of when revision record created. |
expired_timestamp | Timestamp of when revision record expired (no longer current). |
current | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
rating
User rating of a dataset (package)
Field | Description |
---|---|
id | The ID |
user_id | The name or id of the user that created the rating |
user_ip_address | Creating user IP address |
package_id | The name or id of the dataset to rate |
rating | The rating to give to the dataset, an integer between 1 and 5 |
created | Timestamp that rating record was created |
resource
Represents the structure data files and technical documents related to a package (dataset). A Dataset in CKAN has many Resources. They consist of links to the data itself or sometimes link to information about the Dataset.
Field | Description |
---|---|
id | The ID |
url | The key attribute of a resource (and the only required attribute). The url points to the location online where the content of that resource can be found. For a file this would be the location online of that file (or more generally a url which yields the bitstream representing the contents of that file – for example some “files” are only generated on demand from a database). For an API this would be the endpoint for the api. |
format | Human created format string with possible nesting e.g. zip:csv. See below for details of the format field. |
description | A brief description (one sentence) of the Resource. Longer descriptions can go in notes field of the associated Data Package. |
position | Position of resource used for ordering on the page |
revision_id | Associated revision ID |
hash | Hash (md5 or sha-1 hash) of resource inserted by datastore cron job |
state | State of resource |
extras | Additional resource information |
name | A name for this resource (could be used in a ckan url) |
resource_type | The type of the resource. One of: file | file.upload | api | visualization | code | documentation |
mimetype | Standard mimetype (e.g. for zipped csv would be application/zip) |
mimetype_inner | Mimetype of innermost object (so for example would be text/csv) |
size | Size of the resource (content length). Usually only relevant for resources of type file. |
last_modified | Date time resource was last updated |
cache_url | URL for cached resource |
cache_last_updated | Date time of cache last updated |
webstore_url | URL for webstore for resource |
webstore_last_updated | Date time webstore url was last updated |
created | Date time resource was created |
url_type | Type of URL for resource |
package_id | Associated package (dataset), all resources are assigned to a dataset |
resource_revision
Changes to resource records
Field | Description |
---|---|
id | The ID |
url | The key attribute of a resource (and the only required attribute). The url points to the location online where the content of that resource can be found. For a file this would be the location online of that file (or more generally a url which yields the bitstream representing the contents of that file – for example some “files” are only generated on demand from a database). For an API this would be the endpoint for the api. |
format | Human created format string with possible nesting e.g. zip:csv. See below for details of the format field. |
description | A brief description (one sentence) of the Resource. Longer descriptions can go in notes field of the associated Data Package. |
position | Position of resource used for ordering on the page |
revision_id | Associated revision ID |
continuity_id | FK to base table. Based table columns are exactly duplicated. This ties base table to revision table. |
hash | Hash (md5 or sha-1 hash) of resource inserted by datastorer cron job |
state | State of resource |
extras | Additional resource information |
expired_id | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
revision_timestamp | Timestamp of when revision record created. |
expired_timestamp | Timestamp of when revision record expired (no longer current). |
current | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
name | A name for this resource (could be used in a ckan url) |
resource_type | The type of the resource. One of: file, file.upload, api, visualization, code, documentation |
mimetype | Standard mimetype (e.g. for zipped csv would be application/zip) |
mimetype_inner | Mimetype of innermost object (so for example would be text/csv) |
size | Size of the resource (content length). Usually only relevant for resources of type file. |
last_modified | Date time resource was last updated |
cache_url | URL for cached resource |
cache_last_updated | Date time of cache last updated |
webstore_url | URL for webstore for resource |
webstore_last_updated | Date time webstore url was last updated |
created | Date time resource was created |
url_type | Type of URL for resource |
package_id | Associated package (dataset), all resources are assigned to a dataset |
resource_view
Resource view record for resources that can be displayed/previewed in the system (e.g. a CSV can have a data explorer resource view).
Field | Description |
---|---|
id | The ID |
resource_id | Id of the associated (parent) resource |
title | The title of the view |
description | A description of the view (optional) |
view_type | Type of view |
order | Display order of resource view (used when multiple resource views are created for a resource) |
config | Options necessary to recreate a view state (optional) |
revision
General revision information used by all *_revision tables to track additional metadata about the revision.
Field | Description |
---|---|
id | The Id |
timestamp | Date and time the revision is created |
author | Id or name of user that triggered revision (e.g. edited dataset) |
message | Message about the revision (e.g. “REST API: Create member object”) |
state | State of the revision |
approved_timestamp | Date and time the revision is approved, if necessary |
system_info
General system information settings. This can be configured in the *.ini configuration file or in the database as key / value pairs.
Field | Description |
---|---|
id | The ID |
key | Key of additional values (acts as column field) |
value | Value associated to key |
revision_id | Associated revision ID |
state | State of configuration key/value pair |
system_info_revision
Changes to system_info records
Field | Description |
---|---|
id | The ID |
key | Key of additional values (acts as column field) |
value | Value associated to key |
revision_id | Associated revision ID |
continuity_id | FK to base table. Based table columns are exactly duplicated. This ties base table to revision table. |
state | State of configuration key/value pair |
expired_id | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
revision_timestamp | Timestamp of when revision record created. |
expired_timestamp | Timestamp of when revision record expired (no longer current). |
current | Deprecated. https://github.com/ckan/ckan/blob/ebe9d8390e52458bc388fbaf15a9ab7bee2ae2d7/ckan/model/core.py#L71 |
tag
Represents additional way to classify and group data by terms. Each tag either belongs to a vocabulary, or can be a free tag that doesn’t belong to any vocabulary (i.e. a normal CKAN tag).
Field | Description |
---|---|
id | The ID |
name | Name of tag |
vocabulary_id | Associated vocabulary the tag is part of (optional) |
task_status
Polymorphic. Typically used by datapusher to track key/value pairs about resource task statuses. Could be used by any object that needs to track task status.
Field | Description |
---|---|
id | The id of the task status |
entity_id | The id of the object the task status is being performed on (e.g. resource.id, package.id) |
entity_type | The type of the object the task is being performed on (e.g. resource, package) |
task_type | The type of task being performed of the task status |
key | The key descriptor of the task status |
value | The value or actual data being stored as part of the task |
state | State of task (e.g. ‘active’, ‘complete’) |
error | Information about any error that occurred during processing. |
last_updated | The time at which this entry was last updated. Defaults to the current time. |
term_translation
Used by the multilingual plugin to track term translations. e.g. term: 'Welcome', term_translation: 'Bienvenue', lang_code: 'fr'
Field | Description |
---|---|
term | The English term (key) |
term_translation | The translated version of the term |
lang_code | The language code of the term_translation (e.g. ‘fr’) |
tracking_raw
Used to do basic page view tracking for basic stats. user_key is not the user.id but generated based on request information.
Field | Description |
---|---|
user_key | The generated key to identify the user. unique anonymized key for each user to prevent counting multiple clicks from the same user. Page view count limited to 1 page view per user per day. |
url | URL being tracked / interreacted with (page url) |
tracking_type | Type of page being clicked (e.g. ‘dataset’, ‘page’, etc |
access_timestamp | Timestamp of when record was created / page accessed. |
tracking_summary
Basic page view tracking summary for package and it’s resources.
Field | Description |
---|---|
url | URL being tracked / interreacted with (page url) |
package_id | The ID of the package being tracked (if resource is being tracked this tracks it’s parent package). |
tracking_type | Type of page being clicked (e.g. ‘dataset’, ‘page’, etc |
count | Overall count of views used to calculate running_total and recent_views |
running_total | Running total of views based on tracking_date |
recent_views | Number of views in the last 14 days |
tracking_date | Start date of tracking. Defaults to 2011-01-01. |
user
Represents a list of users in the system.
Field | Description |
---|---|
id | The ID |
name | The name of the user, a string between 2 and 100 characters in length, containing only lowercase alphanumeric characters, - and _ |
apikey | Users/system generated API key to interact with the system via the API |
created | Date and time user was created |
about | A description of the new user (optional) |
password | The password of the new user, a string of at least 4 characters |
fullname | The full name of the new user (optional) |
The email address for the new user | |
reset_key | Resent key used when user is changing password |
sysadmin | Indicates if user is a system administrator |
activity_streams_email_notifications | Indicates if user should get notifications about activity in system |
state | State of user |
user_following_dataset
Represents the datasets a user if following. Modifies their notifications and activity streams.
Field | Description |
---|---|
follower_id | User id |
object_id | Object id (e.g. dataset id) |
datetime | Timestamp of when use started following dataset |
user_following_group
Represents the groups a user if following. Modifies their notifications and activity streams.
Field | Description |
---|---|
follower_id | User id |
object_id | Object id (e.g. group id) |
datetime | Timestamp of when use started following dataset |
user_following_user
Represents the users a user if following. Modifies their notifications and activity streams.
Field | Description |
---|---|
follower_id | User id |
object_id | Object id (e.g. user id) |
datetime | Timestamp of when use started following dataset |
vocabulary
Tag vocabulary are a way of grouping related tags together into custom fields. Enables restrictions such as that each dataset must have a genre and a composer, and that no dataset can have two genres or two composers, etc.
Field | Description |
---|---|
id | The id of the vocabulary |
name | The name of the new vocabulary, e.g. 'Genre' |