Difference between revisions of "PD:Internal CCSDS 652.0-M-1 audit"
Revision as of 15:10, 26 July 2016
Date of audit report: June 2016
Version of audit report: 1.0
This is the first audit of the public domain project. This audit is the starting point for a long-term program to develop and install the organizational and technical methods required of a long-term digital archive.
This audit was carried out according to the Recommended Practice CCSDS 652.0-M-1, AUDIT AND CERTIFICATION OF TRUSTWORTHY DIGITAL REPOSITORIES, published in 2011 by the Consultative Committee for Space Data Systems (CCSDS), the same committee that published the Reference Model for an Open Archival Information System (OAIS).
It was clear from the beginning that this first audit would reveal many weak points, unaddressed problems and essential requirements that are not fulfilled.
Therefore the large number of requirements marked as not fulfilled (red) should not lead to the verdict that the public domain project is entirely untrustworthy. The mere existence of this audit is more than many other archives make publicly available for evaluating their trustworthiness.
The result of this first audit is the foundation for developing requirements for future processes, technical methods and investments. It also helps to manage these development projects because it provides a metric for measuring the impact of a proposal.
This audit will be replaced by a more recent audit once serious improvements have been implemented. This makes it possible to track the effort the project invests in long-term preservation and in its trustworthiness.
This audit documentation is structured in a similar way as the CCSDS 652.0-M-1 Recommended Practice document:
- introduction
- overview of audit and certification criteria
- conclusion
- catalog of requirements
Contents
- 1 OVERVIEW OF AUDIT AND CERTIFICATION CRITERIA
- 2 CONCLUSION AND FIELDS OF NON CONFORMANCE
- 3 ORGANIZATIONAL INFRASTRUCTURE
- 3.1 GOVERNANCE AND ORGANIZATIONAL VIABILITY
- 3.1.1 The repository shall have a mission statement that reflects a commitment to the preservation of, long term retention of, management of, and access to digital information.
- 3.1.2 The repository shall have a Preservation Strategic Plan that defines the approach the repository will take in the long-term support of its mission.
- 3.1.2.1 The repository shall have an appropriate succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate or the governing or funding institution substantially changes its scope.
- 3.1.2.2 The repository shall monitor its organizational environment to determine when to execute its succession plan, contingency plans, and/or escrow arrangements.
- 3.1.3 The repository shall have a Collection Policy or other document that specifies the type of information it will preserve, retain, manage, and provide access to.
- 3.2 ORGANIZATIONAL STRUCTURE AND STAFFING
- 3.2.1 The repository shall have identified and established the duties that it needs to perform and shall have appointed staff with adequate skills and experience to fulfill these duties.
- 3.2.1.1 The repository shall have identified and established the duties that it needs to perform.
- 3.2.1.2 The repository shall have the appropriate number of staff to support all functions and services.
- 3.2.1.3 The repository shall have in place an active professional development program that provides staff with skills and expertise development opportunities.
- 3.3 PROCEDURAL ACCOUNTABILITY AND PRESERVATION POLICY FRAMEWORK
- 3.3.1 The repository shall have defined its Designated Community and associated knowledge base(s) and shall have these definitions appropriately accessible.
- 3.3.2 The repository shall have Preservation Policies in place to ensure its Preservation Strategic Plan will be met.
- 3.3.3 The repository shall have a documented history of the changes to its operations, procedures, software, and hardware.
- 3.3.4 The repository shall commit to transparency and accountability in all actions supporting the operation and management of the repository that affect the preservation of digital content over time.
- 3.3.5 The repository shall define, collect, track, and appropriately provide its information integrity measurements.
- 3.3.6 The repository shall commit to a regular schedule of self-assessment and external certification.
- 3.4 FINANCIAL SUSTAINABILITY
- 3.4.1 The repository shall have short- and long-term business planning processes in place to sustain the repository over time.
- 3.4.2 The repository shall have financial practices and procedures which are transparent, compliant with relevant accounting standards and practices, and audited by third parties in accordance with territorial legal requirements.
- 3.4.3 The repository shall have an ongoing commitment to analyze and report on financial risk, benefit, investment, and expenditure (including assets, licenses, and liabilities).
- 3.5 CONTRACTS, LICENSES, AND LIABILITIES
- 3.5.1 The repository shall have and maintain appropriate contracts or deposit agreements for digital materials that it manages, preserves, and/or to which it provides access.
- 3.5.1.1 The repository shall have contracts or deposit agreements which specify and transfer all necessary preservation rights, and those rights transferred shall be documented.
- 3.5.1.2 The repository shall have specified all appropriate aspects of acquisition, maintenance, access, and withdrawal in written agreements with depositors and other relevant parties.
- 3.5.1.3 The repository shall have written policies that indicate when it accepts preservation responsibility for contents of each set of submitted data objects.
- 3.5.1.4 The repository shall have policies in place to address liability and challenges to ownership/rights.
- 3.5.2 The repository shall track and manage intellectual property rights and restrictions on use of repository content as required by deposit agreement, contract, or license.
- 4 DIGITAL OBJECT MANAGEMENT
- 4.1 INGEST: ACQUISITION OF CONTENT
- 4.1.1 The repository shall identify the Content Information and the Information Properties that the repository will preserve.
- 4.1.2 The repository shall clearly specify the information that needs to be associated with specific Content Information at the time of its deposit.
- 4.1.3 The repository shall have adequate specifications enabling recognition and parsing of the SIPs.
- 4.1.4 The repository shall have mechanisms to appropriately verify the identity of the Producer of all materials.
- 4.1.5 The repository shall have an ingest process which verifies each SIP for completeness and correctness.
- 4.1.6 The repository shall obtain sufficient control over the Digital Objects to preserve them.
- 4.1.7 The repository shall provide the producer/depositor with appropriate responses at agreed points during the ingest processes.
- 4.1.8 The repository shall have contemporaneous records of actions and administration processes that are relevant to content acquisition.
- 4.2 INGEST: CREATION OF THE AIP
- 4.2.1 The repository shall have for each AIP or class of AIPs preserved by the repository an associated definition that is adequate for parsing the AIP and fit for long-term preservation needs.
- 4.2.2 The repository shall have a description of how AIPs are constructed from SIPs.
- 4.2.3 The repository shall document the final disposition of all SIPs. In particular the following aspect must be checked.
- 4.2.4 The repository shall have and use a convention that generates persistent, unique identifiers for all AIPs.
- 4.2.4.1 The repository shall uniquely identify each AIP within the repository.
- 4.2.4.1.1 The repository shall have unique identifiers.
- 4.2.4.1.2 The repository shall assign and maintain persistent identifiers of the AIP and its components so as to be unique within the context of the repository.
- 4.2.4.1.3 Documentation shall describe any processes used for changes to such identifiers.
- 4.2.4.1.4 The repository shall be able to provide a complete list of all such identifiers and do spot checks for duplications.
- 4.2.4.1.5 The system of identifiers shall be adequate to fit the repository's current and foreseeable future requirements such as numbers of objects.
- 4.2.4.2 The repository shall have a system of reliable linking/resolution services in order to find the uniquely identified object, regardless of its physical location.
- 4.2.5 The repository shall have access to necessary tools and resources to provide authoritative Representation Information for all of the digital objects it contains. In particular the following aspects must be checked.
- 4.2.5.1 The repository shall have tools or methods to identify the file type of all submitted Data Objects.
- 4.2.5.2 The repository shall have tools or methods to determine what Representation Information is necessary to make each Data Object understandable to the Designated Community.
- 4.2.5.3 The repository shall have access to the requisite Representation Information.
- 4.2.5.4 The repository shall have tools or methods to ensure that the requisite Representation Information is persistently associated with the relevant Data Objects.
- 4.2.6 The repository shall have documented processes for acquiring Preservation Description Information (PDI) for its associated Content Information and acquire PDI in accordance with the documented processes. In particular the following aspects must be checked.
- 4.2.7 The repository shall ensure that the Content Information of the AIPs is understandable for their Designated Community at the time of creation of the AIP. In particular the following aspects must be checked.
- 4.2.7.1 Repository shall have a documented process for testing understandability for their Designated Communities of the Content Information of the AIPs at their creation.
- 4.2.7.2 The repository shall execute the testing process for each class of Content Information of the AIPs.
- 4.2.7.3 The repository shall bring the Content Information of the AIP up to the required level of understandability if it fails the understandability testing.
- 4.2.8 The repository shall verify each AIP for completeness and correctness at the point it is created.
- 4.2.9 The repository shall provide an independent mechanism for verifying the integrity of the repository collection/content.
- 4.2.10 The repository shall have contemporaneous records of actions and administration processes that are relevant to AIP creation.
- 4.3 PRESERVATION PLANNING
- 4.3.1 The repository shall have documented preservation strategies relevant to its holdings.
- 4.3.2 The repository shall have mechanisms in place for monitoring its preservation environment.
- 4.3.3 The repository shall have mechanisms to change its preservation plans as a result of its monitoring activities.
- 4.3.4 The repository shall provide evidence of the effectiveness of its preservation activities.
- 4.4 AIP PRESERVATION
- 4.5 INFORMATION MANAGEMENT
- 4.5.1 The repository shall specify minimum information requirements to enable the Designated Community to discover and identify material of interest.
- 4.5.2 The repository shall capture or create minimum descriptive information and ensure that it is associated with the AIP.
- 4.5.3 The repository shall maintain bi-directional linkage between each AIP and its descriptive information.
- 4.6 ACCESS MANAGEMENT
- 5 INFRASTRUCTURE AND SECURITY RISK MANAGEMENT
- 5.1 TECHNICAL INFRASTRUCTURE RISK MANAGEMENT
- 5.1.1 The repository shall identify and manage the risks to its preservation operations and goals associated with system infrastructure.
- 5.1.1.1 The repository shall employ technology watches or other technology monitoring notification systems.
- 5.1.1.1.1 The repository shall have hardware technologies appropriate to the services it provides to its designated communities.
- 5.1.1.1.2 The repository shall have procedures in place to monitor and receive notifications when hardware technology changes are needed.
- 5.1.1.1.3 The repository shall have procedures in place to evaluate when changes are needed to current hardware.
- 5.1.1.1.4 The repository shall have procedures, commitment and funding to replace hardware when evaluation indicates the need to do so.
- 5.1.1.1.5 The repository shall have software technologies appropriate to the services it provides to its designated communities.
- 5.1.1.1.6 The repository shall have procedures in place to monitor and receive notifications when software changes are needed.
- 5.1.1.1.7 The repository shall have procedures in place to evaluate when changes are needed to current software.
- 5.1.1.1.8 The repository shall have procedures, commitment, and funding to replace software when evaluation indicates the need to do so.
- 5.1.1.2 The repository shall have adequate hardware and software support for backup functionality sufficient for preserving the repository content and tracking repository functions.
- 5.1.1.3 The repository shall have effective mechanisms to detect bit corruption or loss.
- 5.1.1.4 The repository shall have a process to record and react to the availability of new security updates based on a risk-benefit assessment.
- 5.1.1.5 The repository shall have defined processes for storage media and/or hardware change (e.g., refreshing, migration).
- 5.1.1.6 The repository shall have identified and documented critical processes that affect its ability to comply with its mandatory responsibilities.
- 5.1.1.6.1 The repository shall have a documented change management process that identifies changes to critical processes that potentially affect the repository's ability to comply with its mandatory responsibilities.
- 5.1.1.6.2 The repository shall have a process for testing and evaluating the effect of changes to the repository's critical processes.
- 5.1.2 The repository shall manage the number and location of copies of all digital objects.
- 5.2 SECURITY RISK MANAGEMENT
- 5.2.1 The repository shall maintain a systematic analysis of security risk factors associated with data, systems, personnel, and physical plant.
- 5.2.2 The repository shall have implemented controls to adequately address each of the defined security risks.
- 5.2.3 The repository staff shall have delineated roles, responsibilities, and authorizations related to implementing changes within the system.
- 5.2.4 The repository shall have suitable written disaster preparedness and recovery plan(s), including at least one off-site backup of all preserved information together with an offsite copy of the recovery plan(s).
OVERVIEW OF AUDIT AND CERTIFICATION CRITERIA
A TRUSTWORTHY DIGITAL REPOSITORY
Definition of a trustworthy digital repository as given in the CCSDS 652.0-M-1 Recommended Practice document:
A trustworthy digital repository will understand threats to and risks within its systems. Constant monitoring, planning, and maintenance, as well as conscious actions and strategy implementation will be required of repositories to carry out their mission of digital preservation. All of these present an expensive, complex undertaking that depositors, stakeholders, funders, the Designated Community, and other digital repositories will need to rely on in the greater collaborative digital preservation environment that is required to preserve the vast amounts of digital information generated now and into the future.
DEFINITIONS
Each requirement is marked with a color to show its status of fulfillment:
- Requirements fulfilled (green)
- Minor requirements are not fulfilled (orange)
- Essential requirements not fulfilled (red)
These definitions from the original audit document all apply to this internal audit:
For a better understanding some paragraphs of the CCSDS 652.0-M-1 Recommended Practice are reproduced here.
CONFORMANCE
Original text: An archive that conforms to this Recommended Practice shall have satisfied the auditor on each of the requirements.
EVIDENCE
Each metric in the Recommended Practice has associated with it informative text under the heading Examples of Ways the Repository Can Demonstrate It Is Meeting This Requirement providing examples of the evidence which might be examined to test whether the repository satisfies the metric. These examples are illustrative rather than prescriptive, and the lists of possible evidence are not exhaustive.
NOMENCLATURE
The following conventions apply for the normative specifications in this Recommended Practice:
- a) the words ‘shall’ and ‘must’ imply a binding and verifiable specification;
- b) the word ‘should’ implies an optional, but desirable, specification;
- c) the word ‘may’ implies an optional specification;
- d) the words ‘is’, ‘are’, and ‘will’ imply statements of fact.
ACRONYMS AND ABBREVIATIONS
AIP | Archival Information Package (defined in reference [1])
CCSDS | Consultative Committee for Space Data Systems
DEDSL | Data Entity Specification Language
DIP | Dissemination Information Package (defined in reference [1])
FITS | Flexible Image Transport System
GIS | Geographic Information System
ISO | International Organization for Standardization
OAIS | Open Archival Information System (see reference [1])
PDI | Preservation Description Information (defined in reference [1])
SIP | Submission Information Package (defined in reference [1])
TEI | Text Encoding Initiative
UML | Unified Modeling Language
XML | Extensible Markup Language
REFERENCES
[1] Reference Model for an Open Archival Information System (OAIS).
For convenience the full text of the recommended practice CCSDS 652.0-M-1 AUDIT AND CERTIFICATION OF TRUSTWORTHY DIGITAL REPOSITORIES is readable on this wiki page: PD:CCSDS_652.0-M-1. Every requirement is directly linked to the corresponding explanation in the CCSDS 652.0-M-1 Recommended Practice.
The original document is published on the CCSDS Website: CCSDS Recommended Practices (Magenta Books)
CONCLUSION AND FIELDS OF NON CONFORMANCE
OVERVIEW
Of the 108 normative metrics the final status is the following:
Metrics with all requirements fulfilled (green): 16
Metrics where minor requirements are not fulfilled (orange): 15
Metrics with essential requirements not fulfilled (red): 77
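The three counts above can be cross-checked against the stated total of 108 and expressed as shares. The following is only a small arithmetic sketch; the dictionary keys are labels chosen for this illustration and are not part of the audit.

```python
# Metric counts as reported in the overview above.
counts = {
    "green (fulfilled)": 16,
    "orange (minor requirements not fulfilled)": 15,
    "red (essential requirements not fulfilled)": 77,
}

# The sum should match the 108 normative metrics named in the overview.
total = sum(counts.values())


def shares(counts):
    """Return each status as a percentage of all metrics, rounded to one decimal."""
    overall = sum(counts.values())
    return {status: round(100 * n / overall, 1) for status, n in counts.items()}


for status, pct in shares(counts).items():
    print(f"{status}: {counts[status]} of {total} ({pct} %)")
```

Recomputing the same shares after a later audit would give a simple way to compare the two results.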
FIELDS OF NON CONFORMANCE
ESSENTIAL DEFINITIONS
Within the project and among its members there is a shared understanding of the designated communities, but neither the communities nor their knowledge bases are precisely defined.
The same is true for the definition of the content information that has to be preserved.
REPRESENTATION INFORMATION
Representation information is an example of an underdeveloped area: at the beginning of this audit, the knowledge of the underlying problems and of the concepts to handle them was missing entirely. The consistent use of open standards and open source software makes this somewhat less critical. But because no representation information is recorded at all, it remains a large risk for the repository.
This whole topic has to be addressed in the near future.
MANAGEMENT AND PRESERVATION PLANNING
The area of management tasks, strategic planning, policy development and tracking whether policies are applied is also underdeveloped. There is no risk assessment in place, and therefore there are no processes to observe the technical and legal environment of the repository.
A system to plan, manage and track work packages, milestones, issues etc. is missing; it is needed to support the further development of management and preservation planning.
Also missing is a system that lets end users and producers submit feedback and follow the reactions and actions taken in response to their feedback.
DIGITAL OBJECT MANAGEMENT
It was already known that the handling of the digital objects in the repository carries risks that are not yet addressed. The audit clearly confirmed this.
The digital objects are at risk because there is no system to prevent unintended deletion of objects, there is no off-site backup and there is no system and associated monitoring to guarantee the bit-level correctness of the digital objects now and in the future.
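One common way to approach the missing bit-level monitoring named above is a checksum manifest that is stored once and re-verified on a schedule. The following is a minimal sketch under assumptions, not a description of anything the project has implemented: the function names, the choice of SHA-256 and the restriction to files ending in .flac are all chosen for this illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each .flac file below root (as a relative path) to its digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*.flac"))
    }


def verify_manifest(root, manifest):
    """Compare the files below root with a stored manifest.

    Returns two lists of relative paths: files that are missing and
    files whose content no longer matches the recorded digest.
    """
    root = Path(root)
    missing, changed = [], []
    for relative, expected in manifest.items():
        path = root / relative
        if not path.is_file():
            missing.append(relative)
        elif sha256_of(path) != expected:
            changed.append(relative)
    return missing, changed
```

A manifest like this only detects deletion and corruption. Repairing an affected object still depends on an independent copy, such as the off-site backup that is currently missing.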
Also the system to create identifiers for AIPs is not documented and is not ideal for the scalability of the repository.
CONCLUSION
As expected, many requirements are not fulfilled. But the value of this audit is high because it detected severely underdeveloped areas inside the project and thereby raises awareness of these problems. Twelve of the unfulfilled requirements can be fixed easily by just documenting what is currently implemented.
Fixing the lack of the essential definitions of content information and designated community is of high importance.
A large field of work is the regular maintenance and observation tasks of management and preservation planning. They have to be defined, documented, executed and reviewed.
On the technical side the completely missing representation information is a no-go for a long-term repository. If this information is collected in the near future and maintained thereafter, the real risk of losing understandability remains low. But there is no excuse to wait.
ORGANIZATIONAL INFRASTRUCTURE
The catalog of requirements starts with this chapter. Every requirement is explained in the CCSDS 652.0-M-1 document; this explanation can be reached directly by clicking on the heading of the requirement.
GOVERNANCE AND ORGANIZATIONAL VIABILITY
The repository shall have a mission statement that reflects a commitment to the preservation of, long term retention of, management of, and access to digital information.
Requirements fulfilled
Bylaws §2 of the Swiss Foundation Public Domain
The repository shall have a Preservation Strategic Plan that defines the approach the repository will take in the long-term support of its mission.
Essential requirements not fulfilled
The repository shall have an appropriate succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate or the governing or funding institution substantially changes its scope.
Essential requirements not fulfilled
There is no succession plan for the case that the repository ceases to operate or the governing or funding institution substantially changes its scope.
Escrow arrangements are not needed because of the consistent use of free and open source software.
The repository shall monitor its organizational environment to determine when to execute its succession plan, contingency plans, and/or escrow arrangements.
Minor requirements are not fulfilled
Financial monitoring is done every year in retrospect to fulfill the accounting requirements for a charitable foundation in Switzerland. This includes an external audit by certified lawyers and is supervised by the Eidgenössische Stiftungsaufsicht.
Monitoring of the organizational environment is not in place and fiscal planning is underdeveloped.
The repository shall have a Collection Policy or other document that specifies the type of information it will preserve, retain, manage, and provide access to.
Requirements fulfilled
Bylaws §2 of the Swiss Foundation Public Domain
ORGANIZATIONAL STRUCTURE AND STAFFING
The repository shall have identified and established the duties that it needs to perform and shall have appointed staff with adequate skills and experience to fulfill these duties.
Essential requirements not fulfilled
The repository shall have identified and established the duties that it needs to perform.
Essential requirements not fulfilled
The repository shall have the appropriate number of staff to support all functions and services.
Essential requirements not fulfilled
The repository shall have in place an active professional development program that provides staff with skills and expertise development opportunities.
Essential requirements not fulfilled
PROCEDURAL ACCOUNTABILITY AND PRESERVATION POLICY FRAMEWORK
The repository shall have defined its Designated Community and associated knowledge base(s) and shall have these definitions appropriately accessible.
Essential requirements not fulfilled
The repository shall have Preservation Policies in place to ensure its Preservation Strategic Plan will be met.
Essential requirements not fulfilled
The repository shall have mechanisms for review, update, and ongoing development of its Preservation Policies as the repository grows and as technology and community practice evolve.
Essential requirements not fulfilled
The repository shall have a documented history of the changes to its operations, procedures, software, and hardware.
Essential requirements not fulfilled
The repository shall commit to transparency and accountability in all actions supporting the operation and management of the repository that affect the preservation of digital content over time.
Minor requirements are not fulfilled
Reports of financial and technical audits:
The publications of the Foundation do not yet include financial reports because the foundation is quite new and the first report, covering 2014, has only just been finished. It is planned to publish them.
The first technical audit is the one that is presented in this document. It is published publicly on this page: Internal Audit (CCSDS_652.0-M-1)
Disclosure of governance documents:
As can be seen from other requirements there are no governance documents yet, so there is nothing to make publicly available.
Contracts and agreements with providers of funding and critical services:
They are not publicly available yet.
The repository shall define, collect, track, and appropriately provide its information integrity measurements.
Essential requirements not fulfilled
The repository shall commit to a regular schedule of self-assessment and external certification.
Essential requirements not fulfilled
FINANCIAL SUSTAINABILITY
The repository shall have short- and long-term business planning processes in place to sustain the repository over time.
Essential requirements not fulfilled
The repository shall have financial practices and procedures which are transparent, compliant with relevant accounting standards and practices, and audited by third parties in accordance with territorial legal requirements.
Requirements fulfilled
The auditor has seen the audited annual financial statements for the years 2014 and 2015. As noted above, these statements are not publicly available, which should be changed.
The repository shall have an ongoing commitment to analyze and report on financial risk, benefit, investment, and expenditure (including assets, licenses, and liabilities).
Essential requirements not fulfilled
CONTRACTS, LICENSES, AND LIABILITIES
The repository shall have and maintain appropriate contracts or deposit agreements for digital materials that it manages, preserves, and/or to which it provides access.
Essential requirements not fulfilled
The repository shall have contracts or deposit agreements which specify and transfer all necessary preservation rights, and those rights transferred shall be documented.
Essential requirements not fulfilled
The repository shall have specified all appropriate aspects of acquisition, maintenance, access, and withdrawal in written agreements with depositors and other relevant parties.
Essential requirements not fulfilled
The repository shall have written policies that indicate when it accepts preservation responsibility for contents of each set of submitted data objects.
Essential requirements not fulfilled
The repository shall have policies in place to address liability and challenges to ownership/rights.
Requirements fulfilled
The repository shall track and manage intellectual property rights and restrictions on use of repository content as required by deposit agreement, contract, or license.
Requirements fulfilled
The goal of the Public Domain Project is to make digitized audio content accessible. This requires thoroughly checking the intellectual property rights of each work.
The result of this effort can be seen on every detail information page, such as this example (Gramophone-14678-b45142). Every work includes information about the copyright status in Switzerland, the European Union and the United States, including the year in which the work enters the public domain. With this information it is possible to check once a year which works have entered the public domain and can be made accessible (only the year is relevant, so once a year is enough).
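The yearly check described above can be sketched as a simple filter over per-jurisdiction years. This is an illustration under assumptions: the Work class, its field names, the jurisdiction codes and all sample values are invented for this sketch and do not reflect how the wiki stores the data.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    identifier: str
    # Hypothetical field: jurisdiction code -> year the work enters the public domain.
    public_domain_year: dict = field(default_factory=dict)


def newly_public_domain(works, jurisdiction, year):
    """Identifiers of works that enter the public domain exactly in the given year."""
    return [
        w.identifier
        for w in works
        if w.public_domain_year.get(jurisdiction) == year
    ]


def accessible(works, jurisdiction, year):
    """Identifiers of works that are in the public domain by the given year.

    A work without an entry for the jurisdiction is treated as not accessible.
    """
    result = []
    for w in works:
        entry_year = w.public_domain_year.get(jurisdiction)
        if entry_year is not None and entry_year <= year:
            result.append(w.identifier)
    return result


# Invented sample data, for illustration only.
works = [
    Work("example-a1-m1", {"CH": 2010, "EU": 2010, "US": 2030}),
    Work("example-a2-m2", {"CH": 2016, "EU": 2016, "US": 2040}),
    Work("example-a3-m3", {"CH": 2025, "EU": 2025}),
]

print(newly_public_domain(works, "CH", 2016))
print(accessible(works, "CH", 2016))
```

Because only the year matters, running such a check once per year per jurisdiction is sufficient.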
DIGITAL OBJECT MANAGEMENT
INGEST: ACQUISITION OF CONTENT
The repository shall identify the Content Information and the Information Properties that the repository will preserve.
Essential requirements not fulfilled
The repository shall have a procedure(s) for identifying those Information Properties that it will preserve.
Essential requirements not fulfilled
The repository shall have a record of the Content Information and the Information Properties that it will preserve.
Essential requirements not fulfilled
The repository shall clearly specify the information that needs to be associated with specific Content Information at the time of its deposit.
Minor requirements are not fulfilled
The wiki template Audio_file shows all the needed information, but there is limited documentation about it and about how to handle it. The section on references is appropriate, but the problems start with the date fields because no date format is specified. It is also problematic that there is no information about the vocabulary to use: should it be a controlled vocabulary (and if so, which one) or free text, and are there different requirements for each field?
The repository shall have adequate specifications enabling recognition and parsing of the SIPs.
Essential requirements not fulfilled
The repository shall have mechanisms to appropriately verify the identity of the Producer of all materials.
Minor requirements are not fulfilled
To upload AIPs (Flac files) to the archival storage, the FTP protocol is used with user authentication. The user name of the person who uploaded a certain file is visible on the file system of the archival storage as the UNIX owner of the file. This information is not visible to the designated community and therefore cannot be verified by it.
The history of the associated PDI and the identity of its creator are publicly visible and can be verified by everyone. The PDI is stored and edited in wiki pages and MediaWiki requires a password-protected user login to edit this information.
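The uploader information that is currently visible only on the file system could in principle be exported, for example as a report that maps each file to its UNIX owner. The sketch below is a hypothetical illustration that works on UNIX-like systems only; the function names and the restriction to .flac files are assumptions, and whether such a report should be published is a policy question outside its scope.

```python
import os
import pwd
from pathlib import Path


def uploader_of(path):
    """Return the login name of the UNIX owner of a file.

    Falls back to the numeric user id if no account exists for it.
    """
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def uploader_report(root):
    """Map each .flac file below root (as a relative path) to its owner."""
    root = Path(root)
    return {
        str(path.relative_to(root)): uploader_of(path)
        for path in sorted(root.rglob("*.flac"))
    }
```

Note that the UNIX owner can be changed by an administrator, so a report like this documents the current state of the file system rather than proving who originally uploaded a file.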
The repository shall have an ingest process which verifies each SIP for completeness and correctness.
Essential requirements not fulfilled
The repository shall obtain sufficient control over the Digital Objects to preserve them.
The repository shall provide the producer/depositor with appropriate responses at agreed points during the ingest processes.
Requirements fulfilled
There are no agreed points with the depositors at which responses are necessary. But a lot of information about ingested objects is available through the category listing (each depositor has its own category listing all of that depositor's objects), the recent changes page of the wiki and other means (e.g. watchlists). Reports about the ingestion process are usually produced annually for the financial supporters.
The repository shall have contemporaneous records of actions and administration processes that are relevant to content acquisition.
Essential requirements not fulfilled
INGEST: CREATION OF THE AIP
The repository shall have for each AIP or class of AIPs preserved by the repository an associated definition that is adequate for parsing the AIP and fit for long-term preservation needs.
Essential requirements not fulfilled
The repository shall be able to identify which definition applies to which AIP.
Essential requirements not fulfilled
The repository shall have a definition of each AIP that is adequate for long-term preservation, enabling the identification and parsing of all the required components within that AIP.
Essential requirements not fulfilled
The repository shall have a description of how AIPs are constructed from SIPs.
Essential requirements not fulfilled
The process is not documented, but essentially the SIP is the Flac file that is uploaded to the storage server. Together with the detailed description in the wiki it forms the AIP, so the AIP consists of the wiki page and the linked Flac file.
The repository shall document the final disposition of all SIPs. In particular the following aspect must be checked.
The repository shall follow documented procedures if a SIP is not incorporated into an AIP or discarded and shall indicate why the SIP was not incorporated or discarded.
Minor requirements are not fulfilled
The repository shall have and use a convention that generates persistent, unique identifiers for all AIPs.
Essential requirements not fulfilled
Each AIP is identified by a string composed in the following way: <Label>-<Catalog number>-<Order number>
Example:
- Label: Homocord
- Catalog number: B 367
- Order number: M 17234
This results in the URL for the detailed information page: http://pool.publicdomainproject.org/index.php/Homocord-b367-m17234
The corresponding Flac file name is: homocord-b367-m17234.flac
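The composition rule can be sketched in a few lines of Python. This is only an illustration derived from the single example above, not a documented procedure of the project: the function names are hypothetical, and the normalization rule (lowercase, whitespace removed, first letter capitalized for the wiki page name) is an assumption read off the Homocord example.

```python
import re

BASE_URL = "http://pool.publicdomainproject.org/index.php/"


def normalize_part(part):
    """Lowercase one component and drop whitespace (assumed rule)."""
    return re.sub(r"\s+", "", part).lower()


def build_identifier(label, catalog_number, order_number):
    """Join label, catalog number and order number with hyphens."""
    parts = (label, catalog_number, order_number)
    return "-".join(normalize_part(p) for p in parts)


def flac_file_name(identifier):
    """File name of the Flac file on the archival storage."""
    return identifier + ".flac"


def wiki_page_url(identifier):
    """URL of the wiki page; the first letter is capitalized as in the example URL."""
    return BASE_URL + identifier[:1].upper() + identifier[1:]


identifier = build_identifier("Homocord", "B 367", "M 17234")
print(identifier)
print(flac_file_name(identifier))
print(wiki_page_url(identifier))
```

For the Homocord example this reproduces the file name and URL given above. Records without a catalog or order number are not covered by this sketch.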
The repository shall uniquely identify each AIP within the repository.
Requirements fulfilled
The repository shall have unique identifiers.
Requirements fulfilled
This holds provided that a label never uses conflicting catalog/order numbers. Such a conflict is unlikely, but it could happen.
The repository shall assign and maintain persistent identifiers of the AIP and its components so as to be unique within the context of the repository.
Requirements fulfilled
The described naming scheme is unique in the context of 78rpm records, which are the only items currently preserved.
Documentation shall describe any processes used for changes to such identifiers.
Essential requirements not fulfilled
There is no documentation.
The repository shall be able to provide a complete list of all such identifiers and do spot checks for duplications.
Minor requirements are not fulfilled
A complete list of all used identifiers is accessible via the category listing Audio file licenses.
There is no automated way to check for duplicates. In the wiki it should not be possible to create duplicates because the page names would conflict: there can be only one page with a given name because no hierarchy is in use. On the storage server, however, duplicates could be created accidentally because of the manual upload process and the hierarchical organization (folder structure by genre/artist).
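An automated spot check for the storage server could look roughly like the following sketch. It is not an existing tool of the project. It assumes that all Flac files live below one root folder and that the file name without extension is the identifier; the function name and the root path in the usage note are hypothetical.

```python
import os
from collections import defaultdict


def find_duplicate_identifiers(storage_root):
    """Return identifiers that occur in more than one folder below storage_root."""
    locations = defaultdict(list)
    for folder, _subfolders, files in os.walk(storage_root):
        for name in files:
            stem, extension = os.path.splitext(name)
            if extension.lower() == ".flac":
                # Compare case-insensitively so Hmv-... and hmv-... collide.
                locations[stem.lower()].append(os.path.join(folder, name))
    return {
        identifier: sorted(paths)
        for identifier, paths in locations.items()
        if len(paths) > 1
    }
```

Calling find_duplicate_identifiers("/srv/archive") (a placeholder path) returns a dictionary that maps each duplicated identifier to all of its locations. An empty dictionary means no duplicates were found.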
The system of identifiers shall be adequate to fit the repository's current and foreseeable future requirements such as numbers of objects.
Essential requirements not fulfilled
This naming scheme clearly depends on the naming of the collected items and is tailored to released records. This results in several problems:
- It is unclear how to handle unreleased records (no order/catalog number)
- The archive is open to other recording formats like cylinders, open reel tape and even motion pictures, for which this naming scheme is not usable
- The naming scheme does not describe how to handle retouched versions (clean master) of the raw digitization (master), where both have to be searchable, distinguishable and accessible
The repository shall have a system of reliable linking/resolution services in order to find the uniquely identified object, regardless of its physical location.
Minor requirements are not fulfilled
The naming convention in use is suitable to meet this requirement as long as only shellac records are archived. The missing documentation is a problem.
The repository shall have access to necessary tools and resources to provide authoritative Representation Information for all of the digital objects it contains. In particular the following aspects must be checked.
Essential requirements not fulfilled
The repository shall have tools or methods to identify the file type of all submitted Data Objects.
Requirements fulfilled
The Unix tool file and other, more format-specific tools are available on the servers.
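As an illustration of what such an identification step checks, the sketch below tests whether a file starts with the four-byte marker "fLaC" that opens a native Flac stream. It is a simplified stand-in for the file tool, not the method used by the project, and the function name is hypothetical. A Flac file with a prepended ID3 tag would not be recognized by this check.

```python
FLAC_MARKER = b"fLaC"


def looks_like_flac(path):
    """Return True if the file begins with the native Flac stream marker."""
    with open(path, "rb") as handle:
        return handle.read(len(FLAC_MARKER)) == FLAC_MARKER
```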
The repository shall have tools or methods to determine what Representation Information is necessary to make each Data Object understandable to the Designated Community.
Essential requirements not fulfilled
No tools or methods in use.
The repository shall have access to the requisite Representation Information.
Requirements fulfilled
Because the Public Domain Project uses only Free and Open Source Software (FOSS), access to all requisite Representation Information is guaranteed.
The repository shall have tools or methods to ensure that the requisite Representation Information is persistently associated with the relevant Data Objects.
Essential requirements not fulfilled
This strong requirement is not fulfilled.
The repository shall have documented processes for acquiring Preservation Description Information (PDI) for its associated Content Information and acquire PDI in accordance with the documented processes. In particular the following aspects must be checked.
Essential requirements not fulfilled
The repository shall have documented processes for acquiring PDI.
Essential requirements not fulfilled
The repository shall execute its documented processes for acquiring PDI.
Essential requirements not fulfilled
The repository shall ensure that the PDI is persistently associated with the relevant Content Information.
Requirements fulfilled
At the moment the PDI is stored inside the MediaWiki and is permanently linked to the audio file (which itself does not contain PDI). Both the file name of the content information and the wiki page use the same naming scheme, so the association is obvious.
The repository shall ensure that the Content Information of the AIPs is understandable for their Designated Community at the time of creation of the AIP. In particular the following aspects must be checked.
Essential requirements not fulfilled
Repository shall have a documented process for testing understandability for their Designated Communities of the Content Information of the AIPs at their creation.
Essential requirements not fulfilled
The repository shall execute the testing process for each class of Content Information of the AIPs.
Essential requirements not fulfilled
The repository shall bring the Content Information of the AIP up to the required level of understandability if it fails the understandability testing.
Essential requirements not fulfilled
The repository shall verify each AIP for completeness and correctness at the point it is created.
Essential requirements not fulfilled
There is no verification process in use. Essentially, the SIP is created by the same person who ingests it and creates the AIP; for example, no four-eyes principle is applied.
The repository shall provide an independent mechanism for verifying the integrity of the repository collection/content.
Essential requirements not fulfilled
The repository shall have contemporaneous records of actions and administration processes that are relevant to AIP creation.
Minor requirements are not fulfilled
For the PDI, records of actions are captured automatically. Every change to a wiki page is logged, the difference to the previous version can be inspected and the old version can be restored if needed. Here is an example of what the version history looks like: Version history of Columbia-a3996-81215
For the content information (the Flac files), however, there is no comparable mechanism or other process to capture records of actions.
PRESERVATION PLANNING
The repository shall have documented preservation strategies relevant to its holdings.
Essential requirements not fulfilled
There are no documented preservation strategies.
The repository shall have mechanisms in place for monitoring its preservation environment.
Essential requirements not fulfilled
There are no formal mechanisms for monitoring the preservation environment. However, the active people in the project are in regular contact with groups of the designated communities. This is done by attending conferences, assemblies and regular meetings of user groups. In addition, the recommendations on formats and media published by associations of archives and libraries are followed informally.
The repository shall have mechanisms in place for monitoring and notification when Representation Information is inadequate for the Designated Community to understand the data holdings.
Essential requirements not fulfilled
The repository shall have mechanisms to change its preservation plans as a result of its monitoring activities.
Essential requirements not fulfilled
This and the next requirement fail because there are no monitoring activities in place on which a reaction could be defined.
The repository shall have mechanisms for creating, identifying or gathering any extra Representation Information required.
Essential requirements not fulfilled
The repository shall provide evidence of the effectiveness of its preservation activities.
Essential requirements not fulfilled
AIP PRESERVATION
The repository shall have specifications for how the AIPs are stored down to the bit level.
Minor requirements are not fulfilled
All file formats used for AIPs or other relevant information are well-documented open standards down to the bit level. However, the representation information is not available locally and is not linked to the AIPs.
The repository shall preserve the Content Information of AIPs.
Essential requirements not fulfilled
No documented work flows.
The repository shall actively monitor the integrity of AIPs.
Essential requirements not fulfilled
The repository shall have contemporaneous records of actions and administration processes that are relevant to storage and preservation of the AIPs.
Essential requirements not fulfilled
The repository shall have procedures for all actions taken on AIPs.
Essential requirements not fulfilled
The repository shall be able to demonstrate that any actions taken on AIPs were compliant with the specification of those actions.
Essential requirements not fulfilled
INFORMATION MANAGEMENT
The repository shall specify minimum information requirements to enable the Designated Community to discover and identify material of interest.
Requirements fulfilled
All descriptive information can be searched with the free-text search function of the MediaWiki software. For example, someone interested in instrumental music can find it by searching for the metadata attribute Vocal range with the value instrumental.
Search results for Vocal range instrumental
Another option to discover material of interest is the category system. Every work is added to several categories such as genre, country of origin, creation year, recording format, digitization device etc. The example recording above is in several categories, one of which is the recording label Decca Records. This information can be used to show all recordings of Decca Records in the public domain archive.
The repository shall capture or create minimum descriptive information and ensure that it is associated with the AIP.
Minor requirements are not fulfilled
The minimum information requirements are specified by the Audio file template in the wiki. This template acts as the input mask when the SIP is built. The template page also includes the documentation about the usage of this template.
Audio file template:
http://pool.publicdomainproject.org/index.php/Template:Audio_file
There is no documentation about the vocabulary that should be used to fill in the metadata fields.
Responsibility for the procurement of metadata lies in the ingestion process where the SIP is assembled.
Provenance and context information is captured manually with the help of this template, and the work continues until all needed information is found. The minimum level is determined by the requirement that the public domain project is only allowed to publish works that are in the public domain (copyright free). So at least the information needed to decide on the legal status of the work must be present. This includes the title, all authors and their dates of birth and death, the first release date and the publishing label. Technical metadata is also captured with this template, such as the format of the analog recording, the devices used for digitization, catalog and stamper numbers and track length.
A finished SIP could look like this example: Decca-wa782-kwa5215
In addition to the descriptive information captured with the Audio file template, the SIP also gets context information by categorization. The public domain project uses a polyhierarchical category tree that differentiates between genres, country of origin, creation year, recording formats, digitization devices etc.
Documentation of the categorization process for an AIP is missing.
As shown under the requirement "The repository shall have and use a convention that generates persistent, unique identifiers for all AIPs", there is a naming scheme in use that provides persistent, unique identifiers for all AIPs and descriptive information as long as only shellac records are archived.
There is no detailed process work flow documentation.
There is no system and technical architecture documentation.
The repository shall maintain bi-directional linkage between each AIP and its descriptive information.
Minor requirements are not fulfilled
One direction is from the descriptive information to the AIP. This is achieved by a link to the Flac file on the wiki page with the descriptive information. It is also achieved by the use of a unique, persistent identifier for each work, which is used both as the name of the wiki page containing the descriptive metadata and as the file name of the Flac file.
Example:
- Descriptive Metadata for Hmv-d1388-08247
- The associated AIP with the file name hmv-d1388-08247.flac
For the other direction, the unique, persistent identifier is used to locate the descriptive metadata in the wiki. This can be done by entering the URL http://pool.publicdomainproject.org/index.php/ followed by the identifier, or by using the search function of the wiki.
It would be beneficial to store a URL (web link) to the descriptive information inside the Flac metadata tags.
This bidirectional linking also holds for the physical records, as the context information (category) in the wiki contains the physical location of the record. In the opposite direction, the catalog or stamper number can be used to find the descriptive information, and the number of the container can be used to find the context information about the grouped items.
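The two directions of this linkage can be expressed as a pair of small functions. This is only a sketch based on the Hmv example above; the function names are hypothetical, and it assumes that the wiki page name differs from the file name only by its capitalized first letter and the missing .flac extension. A URL produced by the first function is the value that could be stored in a Flac metadata tag.

```python
import os
from urllib.parse import urlsplit

BASE_URL = "http://pool.publicdomainproject.org/index.php/"


def wiki_url_for_flac(flac_path):
    """From the AIP file to its descriptive information in the wiki."""
    stem, _extension = os.path.splitext(os.path.basename(flac_path))
    return BASE_URL + stem[:1].upper() + stem[1:]


def flac_name_for_wiki_url(url):
    """From the wiki page back to the file name of the AIP."""
    page_name = urlsplit(url).path.rsplit("/", 1)[-1]
    return page_name.lower() + ".flac"


print(wiki_url_for_flac("hmv-d1388-08247.flac"))
print(flac_name_for_wiki_url(BASE_URL + "Hmv-d1388-08247"))
```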
As with other requirements there is a lack of documentation about the process work flow and technical architecture.
The repository shall maintain the associations between its AIPs and their descriptive information over time.
Essential requirements not fulfilled
ACCESS MANAGEMENT
The repository shall comply with Access Policies.
Requirements fulfilled
The public domain project allows free unlimited access to its collection. So there is no need for user accounts or access management to use the available items.
This is documented for the designated communities on the landing page of the media pool and also on the multilingual frequently asked questions (FAQ) page:
From Media pool main page:
Creative works of literature, science and art are subject to copyright law. Works in the public domain are those whose intellectual property rights have expired. With the help of volunteers, our team cleans, cataloged and digitized hundreds of gramophone records. After the clearing of copyrights, free works are available inside our media pool and Wikimedia Commons, compressed in Flac without any loss in quality (24-bit/192kHz).
And further down on the same page:
Permission: distributing, reproducing, streaming, sampling, remixing
From the FAQ:
Question: What I'm allowed to do with the music files?
Answer: There is no restriction. You can for ex. redistribute it, copy it, modify it, use it in your own productions, use it as background music and so on.
Unlike in most other archives, everybody can contribute to the project by supplying additional metadata or context information or by submitting SIPs to the archive. The project is, and should be, driven by volunteers, as this is the basic principle of this and related archives.
To do so, a user needs to create a user account and a wiki administrator needs to grant write access to this user. This is more complicated than usual for a wiki, but it had to be made this strict because of severe spam problems. Without more wiki administrators the project is not able to offer easy write access while keeping the wiki free from spam.
The repository shall log and review all access management failures and anomalies.
Minor requirements are not fulfilled
Two logging systems are related to access management failures. The first is the logging feature of the MediaWiki software. These logs can be accessed via the Special pages link in the wiki: Logs from the data pool wiki
The second consists of the logs of the web server application (apache2) and a user front end (piwix) to create statistics and analyze these logs:
- Webserver logs for the media data
- Webserver logs for the error pages. Status code 404 are Page not found errors
The requirement that written notes should exist of reviews undertaken, or of action taken as a result of reviews, is not fulfilled.
According to the discussion in the CCSDS 652.0-M-1 document, one important concern is valid users being denied access.
Due to the nature of the public domain project, this requirement is fulfilled once the requirement "The repository shall maintain the associations between its AIPs and their descriptive information over time" is met: if the AIP can be reached from the descriptive metadata, the designated communities can obtain the AIP too. That requirement, however, is not yet fulfilled.
The repository shall follow policies and procedures that enable the dissemination of digital objects that are traceable to the originals, with evidence supporting their authenticity.
Requirements fulfilled
From the discussion section of this requirement: This requirement is concerned only with the relation between DIPs and the AIPs from which they are derived; elsewhere the link between the originals SIPs and the AIPs is considered.
The public domain project delivers the Flac file from the archival storage directly as the DIP, without modification. The designated community is able to check the authenticity of this Flac file because CRC checksums and hashes are included in the Flac file to detect transmission errors.
Additionally, it would be helpful for the designated community if the hash values of each AIP were included on the metadata details web page.
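A published hash value would let a user verify a downloaded file independently of the checks built into the Flac format. The following sketch shows one possible way to compute and compare such a value. SHA-256 is chosen here as an assumption, since the audit does not name a hash algorithm, and the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare a downloaded file against a hash copied from the wiki page."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

Reading in chunks keeps memory use low, which matters for large 24-bit/192kHz files.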
The repository shall record and act upon problem reports about errors in data or responses from users.
Minor requirements are not fulfilled
The repository acts quickly on problem reports from members of the designated community or from internal staff. Most of the time problems are reported by e-mail and then forwarded to the responsible person.
There are no formal processes for problem reports, and the reports from recent years are not archived in a central place where they could be reviewed or where their resolution could be tracked.
INFRASTRUCTURE AND SECURITY RISK MANAGEMENT
TECHNICAL INFRASTRUCTURE RISK MANAGEMENT
Essential requirements not fulfilled
The repository shall identify and manage the risks to its preservation operations and goals associated with system infrastructure.
Essential requirements not fulfilled
The repository shall employ technology watches or other technology monitoring notification systems.
Essential requirements not fulfilled
The repository shall have hardware technologies appropriate to the services it provides to its designated communities.
Minor requirements are not fulfilled
Maintenance of up-to-date Designated Community technology, expectations, and use profiles; provision of bandwidth adequate to support ingest and use demands; systematic elicitation of feedback regarding hardware and service adequacy; maintenance of a current hardware inventory.
The server hardware hosting the ingest, search and delivery services was upgraded in spring 2015. Its performance is very good compared to the current workload, and it is ready to handle many more users.
The archival storage system is still fast enough for current demands and still has enough storage capacity for the near future. In the archival storage system, 50% of the hard drive slots are still free for additional drives.
The Internet connectivity is a symmetrical 1 Gbit/s connection without traffic limitation. For the current user numbers this is enough to achieve short download times.
This requirement is not fulfilled because there exists no current hardware inventory and there is no procedure or system to ask for and receive feedback from the designated communities.
The repository shall have procedures in place to monitor and receive notifications when hardware technology changes are needed.
Essential requirements not fulfilled
No written procedures exist, but monitoring systems are in place to observe server workloads, memory usage, network traffic and free archival storage space.
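A written procedure could pin down such observations as explicit rules. The sketch below shows what a rule for free archival storage space might look like. It does not describe the monitoring systems actually in use; the function names and the 20% threshold are assumptions chosen for illustration.

```python
import shutil


def free_space_fraction(path):
    """Fraction of the file system holding path that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def needs_attention(path, minimum_free_fraction=0.2):
    """True if free space has dropped below the agreed minimum (assumed 20%)."""
    return free_space_fraction(path) < minimum_free_fraction
```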
The repository shall have procedures in place to evaluate when changes are needed to current hardware.
Essential requirements not fulfilled
The repository shall have procedures, commitment and funding to replace hardware when evaluation indicates the need to do so.
Essential requirements not fulfilled
The repository shall have software technologies appropriate to the services it provides to its designated communities.
Minor requirements are not fulfilled
The examples of ways the repository can demonstrate it is meeting this requirement clearly show that several requirements have to be met:
Maintenance of up-to-date Designated Community technology, expectations, and use profiles
At the time of writing, the expectations of the designated community are moving towards mobile use on smartphones and tablets. The MediaWiki software is not yet well suited for these devices. This is a known problem and the MediaWiki community is working on it. On desktop and laptop computers the project provides access in a useful way. The finding aids could be improved for the global community.
The on-line radio streams and the page to access these streams seem to fulfill the needs.
Provision of software systems adequate to support ingest and use demands
Software support for ingest is weak.
Systematic elicitation of feedback regarding software and service adequacy
There is no systematic elicitation of feedback about software topics.
Maintenance of a current software inventory
Software inventory is managed via the package manager apt used in Debian GNU/Linux. This covers most of the software in use, such as the operating system, common services, web server, on-line radio software etc. MediaWiki and the statistics tool piwix are maintained separately. To help the system administrator, MediaWiki provides an inventory of installed extensions with their version numbers, together with the version numbers of the dependencies: Version number of MediaWiki and its extensions
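The package database can also be exported into a form that is easy to archive or compare over time. The sketch below is one possible approach on a Debian system, not something the project is known to do. It calls dpkg-query, the query tool of the package database that apt builds on, and the function names are hypothetical.

```python
import subprocess


def parse_inventory(text):
    """Turn lines of the form name<TAB>version into a dictionary."""
    inventory = {}
    for line in text.splitlines():
        if "\t" in line:
            name, version = line.split("\t", 1)
            # Packages installed for several architectures share one name here.
            inventory[name] = version
    return inventory


def installed_packages():
    """Query the Debian package database for all installed packages."""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Package}\t${Version}\n"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_inventory(result.stdout)
```

MediaWiki and piwix would still have to be recorded separately, since they are not installed through the package manager.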
The repository shall have procedures in place to monitor and receive notifications when software changes are needed.
Essential requirements not fulfilled
The repository shall have procedures in place to evaluate when changes are needed to current software.
Essential requirements not fulfilled
The repository shall have procedures, commitment, and funding to replace software when evaluation indicates the need to do so.
Essential requirements not fulfilled
The repository shall have adequate hardware and software support for backup functionality sufficient for preserving the repository content and tracking repository functions.
Essential requirements not fulfilled
The repository shall have effective mechanisms to detect bit corruption or loss.
Essential requirements not fulfilled
The repository shall record and report to its administration all incidents of data corruption or loss, and steps shall be taken to repair/replace corrupt or lost data.
Essential requirements not fulfilled
The repository shall have a process to record and react to the availability of new security updates based on a risk-benefit assessment.
Requirements fulfilled
To be informed about available security updates, the public domain project is subscribed to these two mailing lists:
Log files from the package manager are available on the server to check which software and patches were installed.
The risk-benefit analysis is done by the Debian security team. These volunteers monitor newly discovered security problems in the Debian stable systems (GNU/Linux operating system and important application software). They prepare, test and provide patches against these problems and try to make sure that the expected behavior does not change.
For the MediaWiki software the risk-benefit analysis is done by the system administrator. But normally MediaWiki security patches are well tested and if there is any side effect it is documented by the developers: Archive of MediaWiki-announce mails
The repository shall have defined processes for storage media and/or hardware change (e.g., refreshing, migration).
Essential requirements not fulfilled
The repository shall have identified and documented critical processes that affect its ability to comply with its mandatory responsibilities.
Essential requirements not fulfilled
The repository shall have a documented change management process that identifies changes to critical processes that potentially affect the repository's ability to comply with its mandatory responsibilities.
Essential requirements not fulfilled
The repository shall have a process for testing and evaluating the effect of changes to the repository's critical processes.
Essential requirements not fulfilled
The repository shall manage the number and location of copies of all digital objects.
Essential requirements not fulfilled
The repository shall have mechanisms in place to ensure any/multiple copies of digital objects are synchronized.
Essential requirements not fulfilled
SECURITY RISK MANAGEMENT
The repository shall maintain a systematic analysis of security risk factors associated with data, systems, personnel, and physical plant.
Essential requirements not fulfilled
The repository shall have implemented controls to adequately address each of the defined security risks.
Essential requirements not fulfilled
Essential requirements not fulfilled
The repository shall have suitable written disaster preparedness and recovery plan(s), including at least one off-site backup of all preserved information together with an offsite copy of the recovery plan(s).
Essential requirements not fulfilled