Extra

Production Configuration

When it comes to setting up your app in a production environment, there are a lot of configuration modifications that can be made to optimize the way it is hosted and served. Here’s some information on some things we’ve tested out...

Storing Uploaded Files on AWS Simple Storage Service (S3)

About S3

Amazon Simple Storage Service, or S3, is one of the many services available through Amazon Web Services. S3 can very basically be described as a limitless (for all practical purposes) external hard-drive in the cloud. It’s a simple system that is secure and also cheap. To read more about S3, check out the official documentation.

To use S3, you will need an AWS account, which is just an extension of a normal Amazon account. Here’s some information on how to get started with AWS.

Why Use S3 with Arches

By the time you are in a production environment, you will have configured Arches with a web server, such as Apache or nginx. While you need a web server to serve the app itself, there are two pieces of the app that can be separated from the web server and served independently. These are the ‘static’ files (the css, javascript, and logos that are used throughout the app) and the ‘media’ files (any user uploaded files, such as images or documents).

Many of the existing tutorials are concerned with serving both static and media files, because the more load you can take off of your web server the better. However, there are some advantages to S3 that do not have to do with giving your web server some relief:

  • S3 is cheap: As per the price chart, it costs just $.03 per gb/month. So a database with 10gb of photos will have a media storage cost of $3/month, plus a small amount per transaction ($0.004 per 10,000 GET requests, e.g.).
  • S3 is scalable: You only pay for the amount of data you have stored, and you have no real limit on how much you can store. This allows for an Arches deployment on a small server, either in-house or a small AWS EC2 or DigitalOcean instance to store hundreds of gigabytes of media–photos, audio, video, documents–without having to restructure to accommodate more data.

The main potential drawback is transfer speed to and from the S3 bucket, but that depends most on your what your default storage option is. If you look around, you’ll find some comparison articles like this one that may be helpful.

Note

You should be able to use S3 regardless of where your app is hosted, whether on an internal server, an AWS EC2 instance, a DigitalOcean droplet, etc.

Warning

We’ve found that by following the steps below, deleting an Information Resource will not automatically remove the file from your S3 bucket. You can manually delete files from the bucket for now, or the intrepid developer may check out the answer to this question on the Arches forum.

Steps to Follow

Having worked through a number of existing tutorials (mostly here, here, here, and here), we’ve distilled these steps to show how you can use S3 in conjunction with your Arches app. Before beginning, you will need to have set up and logged into your AWS account.

#1. Create some new user credentials for your Arches app

The first step is to create some new AWS credentials that your Arches app will use to access the S3 bucket

  1. Access the AWS Identity and Access Management (IAM) Console
  2. Create a new user (named something like “arches_media”), and download the new credentials. This will be a small .csv file that includes an Access Key ID and a Secret Key.
  3. Also, go to the new user’s properties, and record the User ARN

#2. Create a new bucket on S3

Next, you’ll need to create a new bucket and give it the appropriate settings.

  1. Create a bucket, named something like “your-app-media”

  2. In the new bucket properties, under Permissions, create a new bucket policy with the following text, inserting your own BUCKET-NAME, and the User ARN for your new user:

    {
        "Statement": [
            {
              "Sid":"PublicReadForGetBucketObjects",
              "Effect":"Allow",
              "Principal": {
                    "AWS": "*"
                 },
              "Action":["s3:GetObject"],
              "Resource":["arn:aws:s3:::BUCKET-NAME/*"
              ]
            },
            {
                "Action": "s3:*",
                "Effect": "Allow",
                "Resource": [
                    "arn:aws:s3:::BUCKET-NAME",
                    "arn:aws:s3:::BUCKET-NAME/*"
                ],
                "Principal": {
                    "AWS": [
                        "USER-ARN"
                    ]
                }
            }
        ]
    }
    
  3. Also, make sure that the CORS configuration (click “Add CORS Configuration”) looks like this:

    <CORSConfiguration>
        <CORSRule>
            <AllowedOrigin>*</AllowedOrigin>
            <AllowedMethod>GET</AllowedMethod>
            <MaxAgeSeconds>3000</MaxAgeSeconds>
            <AllowedHeader>Authorization</AllowedHeader>
        </CORSRule>
    </CORSConfiguration>
    

#3. Update the Virtual Environment

In order to configure Arches to use your new bucket, you need to install a couple of extra Django modules in your virtual environment. These will augment Django’s flexibility in how it stores uploaded media.

  1. Once you have activated your virtual environment, use this command:

    (ENV) $: pip install boto django-storages
    

#4. Update settings.py

Finally, you need to tell your app to use these new modules, give it the necessary credentials, and tell it where to store (and find) the uploaded media. Open the your settings.py file...

  1. Find the line that defines the settings “INSTALLED_APPS” and add ‘storages’ to it. It should look like this:

    INSTALLED_APPS = INSTALLED_APPS + (PACKAGE_NAME,'storages',)
    
  2. Next, add the following lines, replacing the AWS settings values with information from earlier steps (remember the “credentials.csv” file you downloaded?):

    DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
    AWS_STORAGE_BUCKET_NAME = 'aws_bucket_name'
    AWS_ACCESS_KEY_ID = 'aws_access_key_id'
    AWS_SECRET_ACCESS_KEY = 'aws_secret_access_key'
    S3_URL = 'http://%s.s3.amazonaws.com/' % AWS_STORAGE_BUCKET_NAME
    MEDIA_URL = S3_URL
    
  3. Restart your web server.

You should be good to go! To test, create a new Information Resource in your installation and upload a file. Now go back to check out your S3 bucket through the AWS console. Your file should show up in a new folder called files within the bucket. If you are encountering issues, be sure to let us know on the forum.

Converting to .arches / .relations

From MS Excel Workbook

It’s fairly common for legacy data to be stored in an Excel workbook. Or, perhaps it will be easier for you to aggregate multiple existing data sources into a single workbook. Although MS Excel is not the friendliest program to work with, we’d like to provide the following steps to assist with this specific conversion process. (If you are using an MS Access or MySQL database, the query in the Using SQL section should be useful.)

Using SQL

One way to convert your data is to use tools already included in MS Excel. To show you the basic steps of this process, we’re going to take a simple spreadsheet–with columns for resource ID, resource name, and spatial coordinates–and turn it into a .arches file. When you are ready to add attributes and make more complex resources, you should be able to work from these building blocks. Be sure to read up on the .arches format in the Arches Documentation before continuing.

Note

Most of the action here takes place in MS Excel, which you already have, but the final step will be achieved with Notepad++–a lightweight but super powerful text editor. Download and install it now or whenever you get to the later steps. In fact, we highly recommend it for all of your other Arches work as well!

#1. Standardize Your Data

For this example, we’re using a very basic spreadsheet that looks like this:

id name nametype lat long
1 119 Sidney NAME_TYPE:1 31.75357 -93.08813
2 135 Sidney NAME_TYPE:1 31.75359 -93.08825

Later steps depend on these specific column names, so if you are following along, make sure that yours match (case-sensitive). Also, make sure that your original lat/long coordinates use the WGS84 datum. If you need a refresher on what NAME_TYPE:1 means, be sure to go check out all of the available documentation on Controlled Vocabularies.

#2. Run a SQL Query in MS Excel

Once you have your data in columns and headers that match the example above, save your workbook. Now you’re ready to to use a SQL query to convert it to the .arches format.

  1. Open MS Excel (2010), click on the Data tab, click From Other Sources, and choose From Microsoft Query.

  2. For Data Source, choose Excel Files and click OK.

  3. Navigate to the workbook with your data (even if you already have it open), select it, and click OK.

  4. Select the sheet that your table is in, and click > to add all of the columns from that sheet.

  5. Click Next three times, until you are given the option to “View data or edit query in Microsoft Query”. Select this option, and click OK. You should see, in the Microsoft Query window, a reproduction of your original table. Now let’s modify it.

  6. Click the SQL button, and you’ll be presented with the simple SQL query that is being used to create the currently visible table. Don’t remove it just yet. Paste the following text in below it:

    SELECT
        id AS RESOURCEID,
        'HERITAGE_RESOURCE.E18' AS RESOURCETYPE,
        'NAME.E41' AS ATTRIBUTENAME,
        name AS ATTRIBUTEVALUE,
        'group1' AS GROUPID
    FROM `your\path\here\workbook.xlsx`.`Sheet1$` `Sheet1$`
    UNION ALL
    SELECT
        id,
        'HERITAGE_RESOURCE.E18',
        'NAME_TYPE.E55',
        nametype,
        'group1'
    FROM `your\path\here\workbook.xlsx`.`Sheet1$` `Sheet1$`
    UNION ALL
    SELECT
        id,
        'HERITAGE_RESOURCE.E18',
        'SPATIAL_GEOMETRY_COORDINATES.E47',
        'POINT (' & long & ' ' & lat &')',
        'group2'
    FROM `your\path\here\workbook.xlsx`.`Sheet1$` `Sheet1$`
    ORDER BY RESOURCEID, GROUPID;
    
  7. As you can see, you need to enter the path to your own workbook and reference the correct sheet. To do this, copy the original FROM statement, and use it to replace the three new FROM statements. Now delete the entire original query, so that all you have is the modified code from above.

  8. Click OK. If you get errors, troubleshoot by double-checking the following:
    • Are your field names exactly the same as in the example table above?
    • Did you update the path to your own workbook and reference the correct sheet?
    • Did you delete the entire original SQL query?

If you have followed all the steps correctly, you should be greeted by a table that looks like this:

RESOURCEID RESOURCETYPE ATTRIBUTENAME ATTRIBUTEVALUE GROUPID
1 HERITAGE_RESOURCE.E18 NAME.E41 119 Sidney group1
1 HERITAGE_RESOURCE.E18 NAME_TYPE.E55 NAME_TYPE:1 group1
1 HERITAGE_RESOURCE.E18 SPATIAL_COORDINATES_GEOMETRY.E47 POINT (-93.08813 31.75357) group2
2 HERITAGE_RESOURCE.E18 NAME.E41 135 Sidney group1
2 HERITAGE_RESOURCE.E18 NAME_TYPE.E55 NAME_TYPE:1 group1
2 HERITAGE_RESOURCE.E18 SPATIAL_COORDINATES_GEOMETRY.E47 POINT (-93.08825 31.75359) group2

Look familiar? That is your original data, reformatted to match the .arches schema. As you can see, three rows have been made from each row in our original table. In the SQL query, each new row is defined by a separate SELECT statement. You may notice that a little bit of processing was done in the third SELECT statement to rewrite your lat/long coordinates–they are now formatted as WKT, or Well-known text.

#3. Export and Touch-up

Now that we’ve reformatted the table, we’re ready to export it and make the final changes.

  1. Click the RD button (return data), and choose a blank sheet in your workbook for the output. (Your data may now be styled as a table, which is fine but not necessary.)
  2. Go to File > Save As... and for “Save as type:” choose Text (Tab delimited) (*.txt).
  3. Click Save, and when you encounter warnings, choose Yes.
  4. For the next steps, we’re using Notepad++, as mentioned above. We need to convert the tabs to pipe characters (|). Open your newly created .txt file in Notepad++ and use ctrl + F to open the Find/Replace window.
  5. Go the the Replace tab, and enter “Find what:” = \t, and “Replace with:” = |. Make sure that you have checked the box for Extended Search Mode, and click Replace All. If everything worked correctly, click OK.
  6. For one final check, open the Encoding menu, and make sure that “Encode in UTF-8 without BOM” is selected (select it if it isn’t).
  7. Now go to File > Save As..., choose this type (at the top of the list): All types (*.*), and name your file <yourname>.arches. Click Save.

#4. Make a .relations File

You must have a .relations file to accompany your .arches file when you load it into Arches. However, the file need not contain any more than this single line:

RESOURCEID_FROM|RESOURCEID_TO|START_DATE|END_DATE|RELATION_TYPE|NOTES

If you’d like to define some resource-to-resource relationships between the resources in your .arches file, you would do so in the .relations file. You can learn more about this in the Arches Documentation.

To make a .relations file with Notepad++, just open a new blank document, paste in the line shown above, and save it as <yourname>.relations. Your .arches and .relations files must have the same name (e.g., buildings.arches and buildings.relations), and should always be kept together.

Congratulations! You’ve successfully changed a sheet from a standard MS Excel workbook into a .arches/.relations file, and you’re ready to upload it to Arches. You can find guidance on how to do so in the Loading Data section above.