Remository File Storage

Database Storage

The files in the repository managed by Remository can be stored in the database. When that happens, a file is split up into approximately 64 KB chunks for manageability, and stored as a series of “blobs” in the xxx_downloads_blob table.
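
The chunking described above can be pictured with a short sketch. This is only an illustration in Python, not Remository's own code: the exact chunk size and the function names are assumptions, since the text says only "approximately 64 KB".

```python
CHUNK_SIZE = 64 * 1024  # assumed value; the documentation says "approximately 64 KB"


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split file contents into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_chunks(chunks: list[bytes]) -> bytes:
    """Rebuild the original contents from chunks kept in their original order."""
    return b"".join(chunks)


if __name__ == "__main__":
    contents = bytes(150_000)  # a 150,000-byte file of zeros
    chunks = split_into_chunks(contents)
    print([len(c) for c in chunks])  # two full chunks and one shorter final chunk
    assert join_chunks(chunks) == contents
```

Each chunk would correspond to one row in the blob table, and serving a download amounts to reading the rows back in order.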

Database storage is the recommended option, and is the default when Remository is first installed. It provides excellent security. Even if a file contains malicious content, it cannot do anything while stored in a database table. (Obviously you still have to guard against serving malicious content to users of your web site.)

Storing files in the database is secure and flexible, as it does not depend on quirks of hosting. Moving a site can be achieved entirely by moving the database tables and installing Remository.

When a user uploads a new file and automatic approval is not allowed, the new file is always stored in the database, irrespective of its final destination. That way, it cannot be used for an attack on your site. Only after the file is approved is it placed in the destination indicated by the file's container.

Access Controls

Each container in Remository lets you set the groups that are permitted to perform each of four operations: upload, download, edit and auto-approve. You can enter as many groups as you wish. Remository can manage the groups itself, but since Joomla introduced more flexible groups, it is normally more sensible to use the CMS groups. This can be selected in the “Options” for Remository. There is a pseudo-group called “Nobody” which never has any members and so disables the relevant facility. It is important to use “Nobody” to block an operation, because having no groups selected makes the operation available to everyone.

The first three settings are self-explanatory. Auto-approve is a little more complex. When a user uploads a file and is a member of one of the groups set for auto-approve, the file is immediately published and available for download. If the user is not a member of any of the auto-approve groups, the uploaded file is held for approval by an administrator through the Remository admin interface.
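
The group rules can be summarised in a small Python sketch. It is a reading of the rules as stated in this section, not Remository's implementation; the function names and the group names used in the example are hypothetical.

```python
NOBODY = "Nobody"  # pseudo-group that never has any members


def is_allowed(user_groups: set[str], permitted_groups: set[str]) -> bool:
    """Return True if a user belonging to user_groups may perform the operation."""
    if not permitted_groups:
        # No groups selected: the operation is available to everyone.
        return True
    # "Nobody" has no members, so it can never match a real user.
    return bool(user_groups & (permitted_groups - {NOBODY}))


def upload_outcome(user_groups: set[str], upload_groups: set[str],
                   auto_approve_groups: set[str]) -> str:
    """Describe what happens when a user tries to upload to a container."""
    if not is_allowed(user_groups, upload_groups):
        return "rejected"
    if is_allowed(user_groups, auto_approve_groups):
        return "published"
    return "held for approval"


if __name__ == "__main__":
    # Hypothetical group names, for illustration only.
    print(upload_outcome({"Registered"}, {"Registered"}, {NOBODY}))
    print(upload_outcome({"Editor"}, {"Registered", "Editor"}, {"Editor"}))
```

Note the consequence of the "no groups selected" rule in this sketch: leaving the auto-approve setting empty would publish every upload immediately, so “Nobody” is the safe choice when all uploads should be reviewed.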

Storing files in the disk system

As an alternative to the database, the file store can be kept in the disk system. You can set this as a default in the “Options”. There is a default path to the file store which is shown in the control panel (front page) of the Remository administrator interface. When a new container is created, if you do not give it a specific absolute path, it will be set with the default path. You can choose a different path. It is important to do this carefully, both to avoid conflicts with other activities on the system and to ensure security. The file repository should not be directly reachable by URL through the web server. This can be achieved either by ensuring that it is outside the web root, or by web server configuration directives. Remository attempts to write a suitable .htaccess file, but it cannot be guaranteed that this will succeed.
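
When choosing a path, a quick check that it lies outside the web root can be useful. The following Python sketch is an illustration under stated assumptions: the function name and the example paths are hypothetical, and the comparison is purely lexical, so it does not resolve symbolic links or ".." components.

```python
from pathlib import PurePosixPath


def is_outside_web_root(store_path: str, web_root: str) -> bool:
    """Return True if store_path is neither the web root nor anywhere beneath it.

    Both arguments are expected to be absolute Unix-style paths.
    """
    store = PurePosixPath(store_path)
    root = PurePosixPath(web_root)
    return store != root and root not in store.parents


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(is_outside_web_root("/home/site/remository", "/home/site/public_html"))
    print(is_outside_web_root("/home/site/public_html/files", "/home/site/public_html"))
```

A path that fails this check is not necessarily unsafe, since server configuration directives can still block access, but it does mean that protection depends on that configuration being in place.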

Where files are stored in the file system, Remository will move them if the container is updated with a different absolute path. If the container's absolute path is deleted altogether, Remository will move the files into the database. If an absolute path is stored for a container that previously used the database, Remository will move the files out of the database and into the file system.
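
The three cases above can be laid out as a small decision function. This Python sketch only describes the moves; the function name is hypothetical, and treating an empty or missing path as "database" is an assumption drawn from the paragraph above.

```python
from typing import Optional


def plan_file_move(old_path: Optional[str], new_path: Optional[str]) -> str:
    """Describe where a container's files go when its absolute path is updated.

    None or an empty string stands for "no absolute path", meaning database storage.
    """
    old = old_path or None
    new = new_path or None
    if old == new:
        return "no move"
    if old is None:
        return f"database -> {new}"
    if new is None:
        return f"{old} -> database"
    return f"{old} -> {new}"


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(plan_file_move("/data/files", "/data/archive"))
    print(plan_file_move("/data/files", None))
    print(plan_file_move(None, "/data/files"))
```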

A further choice can be made for files stored in the file system with the “Real with ID” selection in the “Options”. Each file in the repository is allocated a unique ID number by Remository. If “Real with ID” is set to “yes”, Remository will insert the file ID into the file name before the final extension. So a file with ID 65 and called my.example.file.pdf would then be stored as my.example.file.65.pdf. Users will not see this: when the file is downloaded or displayed, the number is removed. If the option is changed to “no”, Remository will remove the ID numbers from all the files.
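
The renaming rule can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes every file name has an extension; this section does not say how names without one are handled.

```python
import os


def add_file_id(filename: str, file_id: int) -> str:
    """Insert the file ID before the final extension of the file name."""
    stem, ext = os.path.splitext(filename)
    return f"{stem}.{file_id}{ext}"


def remove_file_id(stored_name: str, file_id: int) -> str:
    """Strip the file ID again to recover the name that users see."""
    stem, ext = os.path.splitext(stored_name)
    suffix = f".{file_id}"
    if stem.endswith(suffix):
        stem = stem[: -len(suffix)]
    return stem + ext


if __name__ == "__main__":
    stored = add_file_id("my.example.file.pdf", 65)
    print(stored)                      # my.example.file.65.pdf
    print(remove_file_id(stored, 65))  # my.example.file.pdf
```

Because the ID is unique, two files that share a name within one directory end up with different stored names.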

Amazon S3 file storage

A new option is the ability to add more choices for storing the repository. The choice that is currently available is to use the Amazon S3 data storage system. Other similar storage systems could be implemented, including those by Google and Microsoft.

To use Amazon S3 you must create an account with Amazon for AWS services. For Remository, you need to set four fields in the “Options” under the tab “Cloud”. The first is a region identifier. For example, the region identifier for London is “eu-west-2”. The next two fields are the key and the secret which are supplied by Amazon when you set up AWS services. Finally, there is a default bucket.

To store data in Amazon S3, buckets are used. A bucket is simply a named container. You can go further and add a name that looks like a path on a Unix-style disk system. So you could call your bucket “example-bucket” but specify that you want your Remository files to go into “example-bucket/myrepository”. In S3 terms, the part after the first slash is a prefix on the stored object names rather than a separate bucket.

The Amazon naming system, like the normal disk system, does not prevent name clashes. Just as with the file system, you can guarantee to avoid them by specifying “Real with ID” as described above. If that option is turned on, then each file will be placed in its own “directory” that has a name based on the file ID.
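
To make the naming concrete, here is a Python sketch of how a bucket setting, a file name and a file ID might combine into an S3 object name. It is an assumption-laden illustration: the function names are hypothetical, and using the bare ID as the “directory” name is a guess, since the text says only that the name is based on the file ID.

```python
def split_bucket_setting(setting: str) -> tuple[str, str]:
    """Split a setting such as 'example-bucket/myrepository' into bucket and prefix."""
    bucket, _, prefix = setting.strip("/").partition("/")
    return bucket, prefix


def object_location(setting: str, filename: str, file_id: int,
                    real_with_id: bool) -> tuple[str, str]:
    """Return (bucket, object key) for a file stored under the given setting."""
    bucket, prefix = split_bucket_setting(setting)
    parts = [prefix] if prefix else []
    if real_with_id:
        # Assumed form of the per-file "directory": the ID on its own.
        parts.append(str(file_id))
    parts.append(filename)
    return bucket, "/".join(parts)


if __name__ == "__main__":
    print(object_location("example-bucket/myrepository", "my.example.file.pdf", 65, True))
    print(object_location("example-bucket", "my.example.file.pdf", 65, False))
```

With the option turned on, two files called my.example.file.pdf get different keys because their IDs differ; with it off, the second upload would use the same key as the first.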

Using Amazon S3 gives you a lot of flexibility over storage (although you have to pay for it!). It also means that when a user downloads a file from your repository, it is sent to them directly from Amazon. That will normally be faster than sending from your web server. Again, there is a cost, but it is small unless you have a very large quantity of downloads. The use of Amazon clearly reduces the load on your web server, which may be a helpful move. Amazon claims that files stored in S3 have a very low probability of being lost.

Documentation by Black Sheep Research
