Status: draft
2016-11-23

What Kind of Website?

The first decision is “What kind of website will this be?”, which can be subdivided into “What kind of website will this start out to be?” and “What kind of website will this become after it evolves?”

Description

From the point of view of security, these are not abstract or idealistic questions. They don’t ask anything about the quality or aesthetics of the website. Instead, these questions focus on several characteristics of the 1) information received from server to client; 2) information sent from client (user) to the server; and 3) the functionality available to the client (user).

Goals

My goal is to have an “authoring workbench” where I can create and update documents while they are in the middle of the authoring process. I also imagine (vaguely) a “reader’s/student’s workbench” where readers and students can interact with the content in a way that allows them to create new content that represents their insights, understanding, and significations.

As one role model and template, I am drawing on Agentmodels.org. They use Jekyll static webpage generation plus WebPPL interactive editor/interpreter. I imagine expanding on this template to add other editors/interpreters, other code

Alternatives

There are five alternatives (at least), order in increasing functionality and decreasing security:

  1. Simple HTML
  2. Static HTML
  3. Active Server Pages
  4. Interpreted Client Pages
  5. Executable Plug-in Client Pages

Spoiler: I chose Static HTML+Javascript as my model for content + authoring.

1. Simple HTML (most secure, least functional)

The simplest and perhaps most secure website has primative Hypertext Markup Langauge (HTML). The very first Worldwide Web pages were pure HTML. The only things you see in your browser are text, hyperlinks (blue, with underlines), and images downloaded from the web server. In this sort of page, the only information that goes from client (browser) to web server are the requests for new web pages (GET requests, with HTTP or HTTPS Universal Resource Locators (URLs)).

Simple web pages are authored in a text editor or something slightly fancier that highlights the HTML syntax. Everything that appears on the client (browser) screen has origins in this authoring process, with no other “fancy stuff” going on. (The original WWW “Mosaic” browser simply interpreted HTML code and rendered within scrolling windows on the user’s screen.)

Why is this most secure? Because there are few if any opportunities for things to “go wrong”, from a security point of view. By “go wrong”, I mean opportunities for threat agents to find vulnerabilities that are exploitable.

Simple HTML is not perfectly secure. Nothing is. A threat agent could compromise the web server by some back channel (e.g. an administrator login left with a default username/password) A threat agent could also get “in between” the web server and the browser client. This attack is called, generically, “Man in the Middle”. There is a variation of this where the threat agent compromises the browser client, and this is called “Man in the Browser”. These attack patterns might alter the HTML content, and might add malicious content (i.e. executable code that looks like simple hypertext links)

2. HTML Pages with Javascript

The next step up is to author HTML pages that are augmented with Javascript files. Every modern web browser has a Javascript interpeter, which is a sort of “virtual computing machine” that operates inside your browser. It can do a lot but not quite everything that any computer program that might run on your computer.

Whenever a program is sent from the web server to the browser to for the browser to interpret/execute, the security risks rise significantly. Why? Because any interpreter/execution machine (computer) is potentially susceptable to malicious exploitation. The more capable the interpreter/computer, the more intrinsically vulnerable you are. There are simple languages, e.g. compound boolean expressions “A and NOT B OR C” and also there are fairly simple languages, e.g. the sort of state machine that operates a soft drink dispensing machine: “if more than $1.00 deposited, AND choice selected THEN dispense one unit of the chosen beverage”. These simple languages and their interpreters/computers are not very exploitable.

But, if instead the language is a full computer language* then the possibilities for explotiation grow dramatically, and the difficulty of proving that there are no vulnerabilities also grow dramatically. As a consequence, you end up with an interpreter/computer that is beyond anyone’s power to secure with sufficient confidence.

*The fancy terms for this are "context-free grammar" and "Turing-complete Language", which means that the interpreter/computer is capable of Universal Computation.

The advantage of Javascript, compared to Java other full-privilege languages that are interpreted and executed on the client machine, is that Javascript is designed to be limited to execute within the Browser, rather than to have full access to the operating system (e.g. Windows, OS X, Linux) and the file system. It is not perfectly secure (see this for summary of Javascript vulnerabilities), but it is less insecure than Java and other full-privilege languages.

3. Static Web Pages with Javascript

Static web pages are almost like simple HTML web pages (with or without Javascript). The main difference is in the authoring process. Static web pages are authored using tools that produce a file (e.g. a “markdown” file) that is then interpreted to produce a simple HTML page (maybe with Javascript).

Static web pages provide easier authoring than Simple HTML, and almost as easy as Active Web Pages (see next section). The key difference is that with Active Web Pages, any user is a potential editor of a web page, while with a static web page only people who have authorization to update the “archive” are potential editors of web pages. This capability is not available through the rendered web page.

4. Active Web Pages

Active web pages are generated by the web server by interpreting programs. There are several varieties, including “Active Server Pages (ASP)” (a Microsoft language), PHP (an open source language, part of the LAMP stack, and others. Every web page that is displayed on in the client browser is generated programmatically by the server. This is both the strength and weakness of Active web pages. In the Wikipedia model of content and interaction, it allows all users to be potential editors and authors of web pages.

My previous experience plus the experience of colleages has influenced my decision.

I have previous experience with installing a MediaWiki website, the same engine as Wikipedia (it is free and open source). It took some doing to install, but once it was installed it worked nicely. The problem is on-going maintenance, and unknown-unknowns. Because of the nature of active web pages, and the requirement to take user input and interpret it (e.g. for database queries, at least), there is constant need to update all the software components – the language (PHP), its libraries, the database (MySQL), and many other components of MediaWiki. Though Wikipedia has done an admirable job staying ahead of hacker and other threat agents, it’s not a job for a “gentleman farmer”.

I have colleagues at Securitymetrics.org and others who have switched from a Wiki-type website to a static web page site, for security reasons and ease of maintenance.

Also, I have previous experience hosting a Wordpress blog website, which is another active web page platform. Wordpress, Drupal, and other active web page content management services have been frequent targets of attack (see here and here). The advantage of Wordpress, from an author’s point of view, is that I and other authors could log into a web site, edit web pages using a web-based editor, and publish the results.

This is the same authoring environment that Google provides (xxx.blogspot.com), but all of the hosting/serving is mangaged by Google. My previous blog was hosted by Google: Exploring Possibility Space. Presumably, Google have extra security controls that the typical Wordpress site (hosted or self-hosted) does not have to block or eliminate malicous scripts or codes.

5. Interpreted Client Pages (most functional, least secure – tie)

Moving up the capability/vulnerability ladder, we get to web pages that are completely generated on the client side by interpreting a program sent from the server. Examples include Java and Adobe Flash™. Eight or ten years ago, the best way to get “flashy” web page effects to write many components using Adobe Flash™, or even the whole web site. “Flashy” includes animations, sound effects, multimedia, etc.

The downside from a security point of view is that Adobe Flash™ and ActiveX™ have too much capability. It is just too hard to design a Universal Computing platform that do not also have exploitable vulnerabilities. It has nothing to do with vendor intention or effort. Some jobs are just intractable.

As it happens, many of the features that were once only possible with interpreted client pages are now available through HTML5 and CSS3, which are implemented through primative functions in the browser – much more secure.

There are some applications that still need to be run as full interpreted programs in the client (browser), but these are few and far between, and should only be permitted on an exception basis.

6. Executable Plug-in Pages (most functional, least secure – tie)

Similar to interpreted client pages, executable plug-ins like ActiveX™ (Microsoft), pose a security risk because the interpreter/computer provides functionality equivalent to any program run on your local computer. Here is a hint why ActiveX was such a significant nexus of vulnerabilities:

“[Activex was introduced in 1996, and…] if the browser encountered a page specifying an ActiveX control via an OBJECT tag, it would automatically download and install the control with little or no user intervention.” (emphasis added) Wikipedia

For more see: “What ActiveX Controls Are and Why They’re Dangerous

Conclusion

As hinted above, I decided on a content + interaction model that is 3. Static Web Pages + Javascript. It remains to be seen what capabilities I will want or need for “reader’s/student’s workbench”, but I am hoping (and hopeful) that they will be possible with this content+interaction ecosystem: Markdown + Liquid ⇒ HTML5 + CSS3 + Javascript.