CltrAltDelicius
Regular Visitor
10 years ago
Status:
Accepted for Discussion

Removal of uploaded material of completed reviews

In a regulated environment, compliance with development processes needs to be proven. An organization needs to be able to access all artifacts for about 10-15 years, depending on the regulatory requirements.

Therefore, version control systems are used to guarantee access to all versions of saved elements.

If there is a requirement to perform reviews prior to check-in of content, the review data (comments, status, and review material) must remain available for the whole retention period. But the review material should already be stored in the version control system, so its storage in the Collaborator cache is only needed as long as the review is not completed.

Office documents in particular, which are converted to images, use a lot of disk space. It would help a lot to have a button like "Remove uploaded material of reviews completed since xxx date".

12 Comments

  • MrDubya
    Occasional Contributor

    This used to be a feature in CodeCollaborator. Can anyone clarify why it was removed, and whether there is any other way to archive or purge uploaded files from old reviews?

  • rmcfatter
    New Contributor

    That was removed in 7.0.7027 according to the release notes for that version:

    •   fixed --- Archive Content Cache should go away (Case 61901)

    (A contender for "most pointless release note" if ever there was one.)

    I can guess at a few reasons for this decision: the function was achingly slow and kept the DB locked for a long time; disk space isn't as expensive as it used to be; the cache layout has changed and the archive function was no longer compatible; and, despite the UI's claims to the contrary, there was no good way to restore "archived" content. (Good luck chasing those MD5 filenames through several layers of database pointers.) But these are only guesses.

  • MrDubya
    Occasional Contributor

    Thanks for the info!  Now that Collaborator supports review of both code and documents, I think SmartBear needs to reconsider their decision to remove the archiving feature.  Supporting document reviews means an increase in uploaded file size, multiplied by the increase in the number of reviews conducted, eventually resulting in a very large file cache that will be a burden to maintain.

  • kcorbin
    Occasional Visitor

    This is a feature we sorely miss as well.  We have years of old reviews that we don't care about but can't easily clean out.  Disk space may be cheap, but it's still unreasonable to never be able to control the growth.

  • francois_roux
    Occasional Contributor

    Sometime last year I created a SQL script that did just this: it went through the reviews meeting certain criteria (date, age, status, etc.) and moved any physical files to a networked backup location.

    I removed it when I rebuilt the server with more storage and migrated to 64-bit.

    I'll try to dig out the scripts I used. They were nothing too complicated, but they did the trick pretty well.

    Chances are, however, that this feature will be reinstated, especially since, as you said, review material is becoming both more numerous and larger. I do think, though, that for certain types of review you may not want to keep the files hanging around, as it would not make sense, so perhaps just archive certain types of files.
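
For readers who want to try the approach described in this comment, here is a minimal Python sketch of the same idea: pick the review files that match an age and status criterion, then move them to a backup location. It is not the original SQL script. The `ReviewFile` record, its field names, and the backup directory layout are all assumptions made for illustration; in practice the records would have to come from your own query against the Collaborator database, whose schema is not described in this thread.

```python
import shutil
from dataclasses import dataclass
from datetime import date, timedelta
from pathlib import Path


@dataclass
class ReviewFile:
    """Hypothetical record: one uploaded file belonging to a review."""
    review_id: int
    status: str        # e.g. "completed" (assumed status label)
    finished_on: date  # date the review was finished
    path: Path         # location of the physical file on disk


def select_for_archive(files, min_age_days, today, status="completed"):
    """Return the records whose review has the given status and is old enough."""
    cutoff = today - timedelta(days=min_age_days)
    return [f for f in files if f.status == status and f.finished_on <= cutoff]


def move_to_backup(files, backup_root):
    """Move each selected file to backup_root/<review_id>/ and return the new paths."""
    moved = []
    for f in files:
        target_dir = Path(backup_root) / str(f.review_id)
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / f.path.name
        shutil.move(str(f.path), str(target))
        moved.append(target)
    return moved
```

Keeping selection and moving as separate steps makes it possible to inspect the selected list before any file is touched.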

     

  • ssmorgan
    Senior Member

    The idea of a 21 CFR Part 11 validated system is that review content does not go away. So from my point of view this is a bad idea. However, there are opportunities for database segregation within a massive Collaborator system that is used by many different groups and many different projects.

    For example: Project A has been canceled. Project C has entered end-of-life status.

    In this case it is appropriate to "Archive and Delete" all reviews associated with Projects A and C.

    I would like to be able to have different groups on different DBs using the same Collaborator instance. That way I can allocate 100 terabytes for Group X and 500 GB for Group S.

    Storage is becoming cheaper and cheaper each day. When an FDA auditor comes to call, they time how long it takes to retrieve what they request. If my review in Collaborator isn't available immediately, I have serious problems.

  • rmcfatter
    New Contributor

    In the purest sense, there are only two ways to do a review: "pre-merge", which allows for parallel reviews but makes no guarantee that the code ultimately committed is the same as what was reviewed, or "post-merge", where all reviews and commits are necessarily serialized (only one review and commit at a time). In a post-merge world, where every commit is absolutely traceable to a review, it's the version control system, not Collaborator, that's the sensible authority for the source code itself. In that case, there's no real need for Collaborator to hold onto its own copy forever. Even Collaborator calls it a "content cache", not a "repository", and it's not very efficient at that. It should be able to reach out to the VCS and retrieve whatever version of any file it needs. (If the file's no longer in your version control system, you've got bigger problems.)

    As you point out, projects get cancelled, reach end-of-life, and so on. What would be very nice is an "archive" function that actually works: you set up search terms matching the reviews you'd like to archive (including searches on custom fields), Collaborator previews the result set, and then it generates an archive file (.zip, .tbz, whatever) containing all of the necessary database information, label (table of contents) information, plus all relevant files from the content cache. When you certify that the archive file has been safely copied off to long-term storage, you notify Collaborator of that fact and it then cleans up the database and cache appropriately.

    To actually be an "archive", it would also need to be able to restore an archive back into the active database as if it had never been gone. From a programming standpoint that's not a trivial effort, but it's certainly possible.
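
To make the shape of such an archive file concrete, here is a rough Python sketch that bundles review metadata, a table of contents, and cache files into a single .zip, and then checks the result before anything is cleaned up. This is not how Collaborator works internally. The file names inside the archive (`reviews.json`, `toc.json`, `cache/`), the function names, and the use of SHA-256 checksums are assumptions chosen for illustration, and the database export and restore steps are left out entirely.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def build_archive(review_rows, cache_files, archive_path):
    """Write review metadata, cache files, and a table of contents into one zip."""
    toc = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Database information for the selected reviews (already exported as dicts).
        zf.writestr("reviews.json", json.dumps(review_rows, indent=2))
        for f in map(Path, cache_files):
            data = f.read_bytes()
            name = f"cache/{f.name}"
            zf.writestr(name, data)
            toc.append({"name": name, "sha256": hashlib.sha256(data).hexdigest()})
        # Table of contents, with a checksum per archived file.
        zf.writestr("toc.json", json.dumps(toc, indent=2))
    return toc


def verify_archive(archive_path):
    """Return True if every file listed in toc.json matches its recorded SHA-256."""
    with zipfile.ZipFile(archive_path) as zf:
        toc = json.loads(zf.read("toc.json"))
        return all(
            hashlib.sha256(zf.read(entry["name"])).hexdigest() == entry["sha256"]
            for entry in toc
        )
```

A verification pass like `verify_archive` is one way to support the "certify" step: the cache and database would only be cleaned up after the copy in long-term storage has been checked.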
  • ssmorgan
    Senior Member

    While you are technically correct that it should be able to reach out to the VCS and retrieve the reviewed versions, not all review documents are version controlled; some are necessarily temporary files, such as static analysis and unit/verification test reports. In ClearCase, streams can be deleted, preventing (or making extremely difficult) the retrieval of the reviewed versions. So the approach SmartBear has implemented seems correct.

    I was trying to point out that the delete-content-store request as submitted here is a somewhat myopic view of the larger content store issues. I was suggesting a different perspective on the problem that I think has wider appeal and is inherently a better solution.

  • I would like to see the ability to delete a single review as well (especially from the server content-cache).  Sometimes a review is created on sensitive source code and then we want to wipe all trace of the source code from the server.  I had to do this recently by going into the database to get the md5 value for the files to be deleted and then find them in the content-cache to manually delete them.  There should be an easier way for an administrator to do this.  Thanks.
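
The manual procedure described in this comment (look up the MD5 values in the database, then hunt for the matching files in the content cache) can be partly scripted. The Python sketch below covers only the second half, and it rests on an assumption: that cached files are named after their MD5 value, possibly with an extension, somewhere under the cache directory. The function names are made up for illustration, and the MD5 values still have to be obtained from the database by hand.

```python
from pathlib import Path


def find_cache_files(cache_root, md5_values):
    """Return files under cache_root whose name (minus its last extension) is a wanted MD5."""
    wanted = {m.lower() for m in md5_values}
    return sorted(
        p for p in Path(cache_root).rglob("*")
        if p.is_file() and p.stem.lower() in wanted
    )


def delete_files(paths, dry_run=True):
    """Delete the given files; with dry_run=True nothing is removed, only listed."""
    for p in paths:
        if not dry_run:
            p.unlink()
    return [str(p) for p in paths]
```

Running with the default `dry_run=True` first gives a list to review before anything is actually removed from the server.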

  • John_D
    Champion Level 0

    In 14.x you can archive an individual review, which can then serve as the record for long-term audit support. I agree that Collaborator is not the repository of record, but it does hold the review data, so at a minimum a report of the review needs to be captured for posterity.