Technology Readiness Tracker

The goal of this project is to create a tool for researching the maturity of a developing technology. The tool will provide ways to search for and analyze trends in technology, painting a clearer picture of the current state of a given technology and allowing a company to make better decisions about whether to invest in it, or to pursue it as a product line.

Background
The idea for this project originally came from a method of classifying technologies created by NASA called Technology Readiness Levels. These levels track the progression of a technology all the way from its initial conception to its active, successful use. This is useful because it provides an indication of how mature a technology is, and whether it has developed to the point where it is reliable enough to be used in a project.

The original goal of this project was to automate the mapping of a given technology to its readiness level, and to predict when it would advance to the next level. Predicting this would have a multitude of uses; for example, it could give an indication of whether a newly developed technology would be worthwhile to license. Since then, the goals of our project have shifted to what is described below.

Data Sets
To meet our specifications, we plan to focus on US Patent Data, as it contains a large amount of information useful for understanding the state of a technology, and is easily accessible through various data sources.

As this project is largely data driven, finding adequate sources is extremely important. When evaluating potential data sources, we looked at the following criteria:
 * 1) Accessibility: How easy is this data source to access? Do we need specialized tools? Can it be queried without downloading the entire data set? Is it quick to query?
 * 2) Completeness: How complete is this data source? Is it up to date? Does it contain enough information to make it a worthwhile data source?
 * 3) Compatibility: How easily can data from this source be combined with other types of data?

Patent data
Based on the above analysis, we have decided to use the USPTO PAIR API for the majority of our data. It is the only data source we found that can feasibly return data on a large number of patents, and the speed at which it returns data is a huge plus. In addition, we will use the PatFT database to selectively supplement the data returned by the PAIR API for individual patents.

Additionally, we are using the Google Geocoding API and the OpenStreetMap API as mapping data sources for one of our visualizations.
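
As a minimal sketch of how the geocoding step might work (the request parameters follow the standard Google Geocoding REST endpoint, but the function name and error handling here are our own illustration, not a prescribed client):

```python
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode(address, api_key):
    """Look up (latitude, longitude) for a free-form address string."""
    resp = requests.get(GEOCODE_URL, params={"address": address, "key": api_key})
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        return None  # the service found no match for this address
    loc = results[0]["geometry"]["location"]
    return loc["lat"], loc["lng"]
```

For example, geocode("Armonk, NY", key) returns a (lat, lng) pair that can then be handed to the map visualization.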

Technology Choices

 * Python -- a general-purpose programming language that we picked as the main programming language for our project. We decided this because:
   * All of the team members on our project have experience with Python.
   * Python has many great data processing libraries that will be useful for achieving our first requirement (Find trends in dataset); see the sketch after this list.
   * Python code is quick to write, so we can focus on the meat of our project.
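
As a small illustration of the kind of trend analysis these libraries enable (the records below are made up; real ones would come from the patent data sources described above):

```python
import pandas as pd

# Made-up records standing in for real patent data.
patents = pd.DataFrame([
    {"patent_id": "9000001", "grant_date": "2014-03-11"},
    {"patent_id": "9100002", "grant_date": "2015-08-04"},
    {"patent_id": "9200003", "grant_date": "2015-11-17"},
])

patents["grant_date"] = pd.to_datetime(patents["grant_date"])

# Patents granted per year -- a simple proxy for activity in a technology.
per_year = patents["grant_date"].dt.year.value_counts().sort_index()
print(per_year)
```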


 * Web Application (HTML/CSS/Javascript) -- two of our major requirements are Graphing Data and a User Interface. We decided to develop a web application in order to fulfill these:
   * Interactive visualizations are far easier to create as a web application than as a desktop application.
   * Web applications are much easier for an end user to use -- all they need to do is navigate to a specific page in a web browser.
   * There are many useful Javascript libraries for visualization and user interfaces.

Backend
Our backend has three major layers: the communication layer, the data pipeline layer, and the data analysis and retrieval layer.

Communication layer

The communication layer’s main job is managing the flow of information between the frontend and the backend. The process starts when the frontend opens a new websocket (a type of two-way communication channel) to the backend and sends it a search term. The communication layer receives this request and splits it into several jobs, which it then passes into the data pipeline layer. Some of these jobs depend on the completion of previous jobs, so the communication layer is responsible for running each job in the correct order, and for parallelizing jobs whenever possible. For example, one of the first jobs retrieves the full list of patents for each query. After that job completes, two other jobs -- one to retrieve the location of each patent, and one to retrieve its full text -- are started in parallel. While each job is running, the communication layer sends the frontend any data, status updates, or errors that are generated.
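
A minimal sketch of this orchestration using asyncio; the job functions and message format below are illustrative stand-ins for our real jobs, not the actual implementation:

```python
import asyncio

# Illustrative stand-ins for the real jobs.
async def fetch_patent_list(query):
    return [{"patent_id": "9000001"}, {"patent_id": "9100002"}]

async def fetch_locations(patents, send):
    await send({"job": "locations", "count": len(patents)})

async def fetch_full_text(patents, send):
    await send({"job": "full_text", "count": len(patents)})

async def handle_search(query, send):
    """Run the jobs for one search term, respecting their dependencies.

    `send` pushes a JSON-serializable update down the websocket.
    """
    await send({"status": "retrieving patent list"})
    patents = await fetch_patent_list(query)
    await send({"status": "patent list ready", "count": len(patents)})

    # Both of these jobs depend only on the patent list, so run them in parallel.
    await asyncio.gather(
        fetch_locations(patents, send),
        fetch_full_text(patents, send),
    )
    await send({"status": "done"})

async def demo():
    async def send(msg):  # stand-in for a real websocket send
        print(msg)
    await handle_search("solar cell", send)

asyncio.run(demo())
```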

Data pipeline layer

The data pipeline layer is responsible for managing and storing the data in the backend. Each job started by the communication layer, along with many parts of the data analysis/retrieval layer, requires information from the data pipeline. For each of these requests, the data pipeline first checks whether it already has the data. If it does, it returns the stored data; if not, it passes the request down to the data analysis/retrieval layer, stores the response in the database, and then returns it to the requester. Having this central layer for data allows us to cache data between different queries and data analysis steps.
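
A rough sketch of this check-then-fetch pattern; sqlite3 here is purely an assumption for illustration, standing in for whatever store the pipeline actually uses:

```python
import json
import sqlite3

class DataPipeline:
    """Return cached data when available; otherwise fetch, store, and return it."""

    def __init__(self, fetch, path="cache.db"):
        self.fetch = fetch  # callable that asks the data analysis/retrieval layer
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )

    def get(self, key):
        row = self.db.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:          # cache hit: reuse the stored data
            return json.loads(row[0])
        value = self.fetch(key)      # cache miss: go down to the retrieval layer
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?)", (key, json.dumps(value))
        )
        self.db.commit()
        return value
```

A job would then call something like pipeline.get("patent_list:solar cell") and only trigger a real retrieval on the first request; later queries for the same key are served from the store.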

Data analysis and retrieval layer

Lastly, the data analysis/retrieval layer contains all of the code necessary to communicate with our data sources and to perform any data analysis or manipulation functions.
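
A skeletal example of what a retrieval function in this layer might look like; the URL is a placeholder rather than the real PAIR endpoint, and the parameter names are assumptions:

```python
import requests

# Placeholder URL -- not the real PAIR API endpoint.
SEARCH_URL = "https://patents.example.com/api/search"

def search_patents(query, rows=100):
    """Fetch raw patent records matching `query` (hypothetical endpoint)."""
    resp = requests.get(SEARCH_URL, params={"q": query, "rows": rows})
    resp.raise_for_status()
    return resp.json()
```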

Frontend
Our frontend has two major layers: the communication/data layer and the visualization layer.

Communication/data layer

The communication/data layer is responsible for opening a new websocket for each search query and receiving data from the backend. For each query, it keeps track of the current status as reported by the backend, along with the data for each visualization type. Every time an update is received from the server, it is processed, and any affected visualizations are notified.

Visualization layer

The visualization layer consists of several self-contained components -- one for each visualization we have. These receive their data from the communication/data layer and are responsible for drawing the visualization, or for interacting with any data visualization libraries. While this architecture could be simpler (and it was much simpler when we first started our project), this separation allows us to create a much more interactive and efficient end product. Much of the complexity arises from the fact that retrieving data from the data sources can take a very long time, so we need to be able to cache whatever data we can, and to send multiple progress updates to the client throughout the data collection process. The end result is a better user experience: visualizations are quick to display and continually improve as more data is retrieved and processed, and similar or identical queries can share data whenever possible.