- Recommender Service Architecture
- Recommendation Algorithms
- Inference Engine
- Recommendations Combiner
- JIRA Issues
Recommender Service Architecture
Collaborative Filtering Recommendation Algorithm
Collaborative filtering techniques (e.g. (Goldberg et al., 1992), (Resnick et al., 1994), (Shardanand and Maes, 1995), (Hill et al., 1995), etc.) predict a user's affinity for items on the basis of the ratings that other users have given to those items in the past. Making a recommendation in such systems therefore consists of using past ratings to find people with tastes similar to the user's (or items with rating patterns similar to those of the items the user has rated), and then extrapolating the user's future ratings from theirs. User information in a collaborative system consists of a vector of items and their associated ratings; finding similar users translates into finding similar vectors. The main advantages of collaborative techniques are:
- Identification of cross-genre niches. Collaborative filtering has proven to be very effective at producing out-of-the-box recommendations.
- Domain independence. Domain knowledge is not needed (e.g. the same algorithm that rates movies can be used to recommend any other kind of item).
- The quality of its results improves over time, and implicit user feedback is sufficient.
The disadvantages of collaborative filtering:
Its quality depends on a large historical data set, which causes:
- Cold-start problems.
- New User. When a new user arrives at the system, there is not enough rating information to sketch the user's preferences, and there might also be a lack of information about the user itself. Both situations must be tackled by the recommender system.
- New Item. Every time a new Research Object is created, the recommender system must be able to recommend this new item to any of the users of the system that might be interested in it. Unlike in the new user problem, a lack of information about the Research Object is less likely, since we assume that the information about the Research Object is accessible following the Linked Data principles. Nevertheless, estimating the reputation that the user community gives to the Research Object remains a problem.
- Gray sheep problem. This problem is related to the new user problem. Some users are mere observers in a social scenario; they neither rate items nor provide any other means of extracting their taste from their social interactions. Therefore, the system does not have enough information about them, as in the case of new users.
- The sparsity problem. The sparsity problem typically occurs in systems with a large number of items, in which plenty of items are rated by only a few users and many users rate only a few items. Items rated by just a few users are unlikely to be recommended, no matter how high their reputation might be. The recommender system should minimize this situation as much as possible.
With the inclusion of the collaborative filtering recommendation algorithm we address the (Discoverable-Req) in general and, more importantly, the (Reputation-Req).
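The "similar vectors" idea described in this section can be illustrated with a minimal sketch. It assumes cosine similarity between sparse rating vectors and a similarity-weighted average for the prediction; the source does not say which similarity measure or prediction rule the service uses, and the user names, item identifiers and ratings below are invented for illustration.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}); illustrative values only.
RATINGS = {
    "alice": {"wf1": 5.0, "wf2": 3.0, "wf3": 4.0},
    "bob": {"wf1": 4.0, "wf2": 2.0, "wf3": 5.0, "wf4": 4.0},
    "carol": {"wf1": 1.0, "wf2": 5.0, "wf4": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[item] * b[item] for item in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(user, item, ratings):
    """Similarity-weighted average of the ratings other users gave to item.

    Returns None when no other user has rated the item (a cold-start case).
    """
    numerator = 0.0
    denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        similarity = cosine_similarity(ratings[user], their_ratings)
        numerator += similarity * their_ratings[item]
        denominator += similarity
    return numerator / denominator if denominator > 0 else None


predicted = predict("alice", "wf4", RATINGS)
```

In this toy data set alice's ratings resemble bob's more than carol's, so the predicted rating for wf4 falls closer to bob's rating than to carol's. An item nobody has rated yields None, which is the new item problem in miniature.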
Keyword Content-Based Recommendation Algorithm
Content-based recommender systems (e.g. (Belkin and Croft, 1992), (Lang, 1995), (Schafer et al., 1999), etc.) make use of information retrieval and filtering techniques. A content-based recommender tries to infer the items that will interest users in the future on the basis of the features of the objects that those users rated in the past. These object features are items of interest such as the keywords that define the object, a summary of its content, etc. Content-based techniques have advantages similar to those of collaborative filtering approaches (without the ability to detect cross-genre niches), and they do not exhibit the new item problem. Nonetheless, they still rely on a large historical data set.
Content-based recommenders recommend items based upon:
- A description of the content of the item (i.e. ROs or resources)
- A profile of the user's interest
In the concrete case of the presented recommender system:
- The user's profile is embodied by a set of keywords that have been previously proposed by the user (and the tags she has assigned)
- The RO (or resource) description, in which the following are of great importance:
- The title and description (see the Content-Req)
- The tags that have been applied to the item by the user community (see the Reputation-Req)
For the matching we plan to use one of the best-known Information Retrieval measures, TF-IDF (Term Frequency-Inverse Document Frequency) (Spärck, 1972), a statistical measure that evaluates how important a word is to a document in the context of a concrete corpus.
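As an illustration of the measure, the sketch below computes one common TF-IDF variant (term frequency normalised by document length, inverse document frequency as log(N/df)) and scores items against a keyword profile. The variant, the scoring rule and the sample item descriptions are assumptions; the source only states that TF-IDF is planned for the matching.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map {doc_id: [term, ...]} to {doc_id: {term: tf-idf weight}}."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for doc_id, terms in docs.items():
        counts = Counter(terms)
        vectors[doc_id] = {
            term: (count / len(terms)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return vectors


def score(profile_keywords, vector):
    """Sum of the TF-IDF weights of the profile keywords found in the item."""
    return sum(vector.get(keyword, 0.0) for keyword in profile_keywords)


# Hypothetical item descriptions (title, description and tags, tokenised).
ITEMS = {
    "wf1": ["blast", "protein", "sequence", "alignment"],
    "wf2": ["weather", "sequence", "forecast"],
    "file1": ["protein", "structure", "protein"],
}
PROFILE = ["protein", "alignment"]  # hypothetical user keywords

VECTORS = tf_idf_vectors(ITEMS)
RANKED = sorted(ITEMS, key=lambda d: score(PROFILE, VECTORS[d]), reverse=True)
```

A term that occurs in every item gets weight log(1) = 0, so it cannot influence the ranking; this is how the measure discounts words that do not discriminate within the corpus.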
The advantages of content-based recommendation algorithms are:
- No new item problem.
- Only the ratings provided by the active user are needed to build her own profile; no data on other users is required.
The main disadvantage of content-based recommendation algorithms is:
- The new user problem, as the system still does not have a well-formed profile of the user. Nevertheless, this technique does not rely on statistical information; it only needs the user to provide a small set of keywords that represent her interests.
The inclusion of the keyword content-based recommendation algorithm addresses the (Discoverable-Req) and, especially, the (Content-Req), and partially the (Cold-Req).
Inference Engine
The inference of new recommendations is made by means of the constrained spreading activation mechanism. Spreading activation techniques, initially defined by (Quillian, 1968) and (Collins and Loftus, 1975), have been well studied in the Information Retrieval field. Upon activation of a number of specific nodes, their activation is spread iteratively to adjacent nodes until some termination criterion is met. In the concrete case of the recommendation inference engine, the activation of a node equals the recommendation of the corresponding item with a given strength.
Following the approach presented in (Crestani, 1997), we adopt a constrained approach that introduces:
- Distance constraints
- Path constraints
- Fan-out constraints
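The three constraints listed above can be sketched in a few lines. This is an illustrative reading of constrained spreading activation, not the engine's actual implementation: the graph encoding, the relation names (aggregatedBy, evolvedInto, uploadedBy), the weights and the threshold are all assumptions made for the example.

```python
from collections import deque


def spread(graph, seeds, max_distance=2, max_fan_out=3,
           path_weights=None, threshold=0.05):
    """Spread activation from the seed nodes under three constraints.

    graph: {node: [(neighbour, relation), ...]}
    seeds: {node: initial activation (recommendation strength)}
    """
    path_weights = path_weights or {}
    activation = dict(seeds)
    queue = deque((node, 0) for node in seeds)
    while queue:
        node, distance = queue.popleft()
        edges = graph.get(node, [])
        # Distance constraint: stop max_distance hops away from a seed.
        # Fan-out constraint: do not spread from highly connected nodes.
        if distance >= max_distance or len(edges) > max_fan_out:
            continue
        for neighbour, relation in edges:
            # Path constraint: each relation type has its own weight, and
            # relations without a weight do not propagate anything.
            pulse = activation[node] * path_weights.get(relation, 0.0)
            if pulse < threshold:
                continue
            if pulse > activation.get(neighbour, 0.0):
                activation[neighbour] = pulse
                queue.append((neighbour, distance + 1))
    return activation


# Hypothetical graph: two resources aggregated by a pack that later evolved.
GRAPH = {
    "wf1": [("pack1", "aggregatedBy"), ("user1", "uploadedBy")],
    "file1": [("pack1", "aggregatedBy")],
    "pack1": [("pack2", "evolvedInto")],
}
WEIGHTS = {"aggregatedBy": 0.5, "evolvedInto": 0.5}

RESULT = spread(GRAPH, {"wf1": 0.8}, path_weights=WEIGHTS)
```

Starting from a recommendation of wf1 with strength 0.8, the sketch reaches pack1 and then pack2 with decreasing strength, while user1 is never activated because uploadedBy has no weight.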
With the inclusion of the constrained spreading activation technique we effortlessly introduce:
- Resource aggregation handling
- Research Object evolution handling
This knowledge-based technique also brings the benefits of knowledge-based recommenders:
- No cold start problems
- Can include features that are not present in the items (e.g. Research Object, resource, etc.)
The inclusion of the inference engine (or more precisely, the use of the constrained activation technique) is associated with the following drawbacks:
- Static behaviour. The propagation of new recommendations is constrained by the relations defined among concepts in the ontologies used. This kind of model is neither easily nor frequently changed.
- Knowledge engineering required. The use of the constrained spreading activation mechanism assumes the pre-existence of a formally and explicitly defined model of the domain.
For the Year 1 Demonstrator, the first version of the Inference Engine will be implemented and used to show in practice how the Inference Engine handles the recommendation of aggregated resources. The Recommender Service will be able to recommend Research Objects (myExperiment packs) taking into account the recommendations given to their possible constituent resources (i.e. workflows and files).
The recommendations inference engine addresses the requirements (Evolution-Req), (Repurposeable-Req), (Model-Req) and (Cold-Req).
Recommendations Combiner
Recommender systems are inherently vertical and configured to provide recommendations in a single, specific domain. We need a means of tailoring recommendations to each research community that wishes to make use of the recommender system in the future.
We address this tailoring activity when we combine the recommendations obtained with different recommendation algorithms. In state-of-the-art hybrid recommendation systems (see (Burke, 2002) for a survey of such techniques), the combination decision is usually:
Therefore we initially propose:
- An explicit declarative way of expressing such policies.
- A combiner that detects when these policies are applicable and enacts them.
The recommendations combiner module tackles the (Policy-Req) requirement.
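One possible shape for the two proposed pieces is sketched below: policies written as plain data, and a combiner that picks the first applicable policy and uses its weights to merge the strengths produced by each algorithm. The policy format, the context keys and the weight values are hypothetical; the source does not define how policies are expressed.

```python
# Weights used when no policy applies (assumption: all algorithms count equally).
DEFAULT_WEIGHTS = {"collaborative": 1.0, "content": 1.0, "inference": 1.0}

# Hypothetical declarative policies: a condition on the context plus weights.
POLICIES = [
    {
        "name": "new-user",
        "when": {"has_ratings": False},
        "weights": {"collaborative": 0.0, "content": 1.0, "inference": 0.5},
    },
]


def applicable_policy(context, policies):
    """Return the first policy whose conditions all hold in the context."""
    for policy in policies:
        if all(context.get(k) == v for k, v in policy["when"].items()):
            return policy
    return None


def combine(recommendations, context, policies=POLICIES):
    """Merge {algorithm: {item: strength}} into a ranked [(item, strength)]."""
    policy = applicable_policy(context, policies)
    weights = policy["weights"] if policy else DEFAULT_WEIGHTS
    combined = {}
    for algorithm, items in recommendations.items():
        weight = weights.get(algorithm, 0.0)
        for item, strength in items.items():
            combined[item] = combined.get(item, 0.0) + weight * strength
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)


RECS = {
    "collaborative": {"wf1": 0.9},
    "content": {"wf1": 0.1, "wf2": 0.6},
}
FOR_NEW_USER = combine(RECS, {"has_ratings": False})
FOR_ACTIVE_USER = combine(RECS, {"has_ratings": True})
```

Keeping the policies as data rather than code means that a research community could, in principle, change how recommendations are weighted without touching the combiner itself.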
API function overview
The Recommender Service provides a set of recommendations of scientific resources, such as myExperiment files and workflows, and research papers. The recipients of such recommendations must be myExperiment users, since the data used to create them is based on the user's myExperiment profile and uploaded data. The interface is a REST API that can basically be used as follows:
- <PATH> the path where the Recommender Service is deployed
- <userID> the myExperiment id of the user.
- <itemType> the class of objects for which we are interested in receiving recommendations.
- <max> the maximum number of recommendations that we want to retrieve. Recommendations are ordered by their strength, so when the max parameter is specified, the most relevant recommendations, up to max, are retrieved.
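The effect of the itemType and max parameters can be pictured with the following sketch, which filters a list of recommendations by type and keeps the strongest ones. The tuple layout and the function name are assumptions made for illustration; they do not describe the service's internals.

```python
import heapq


def select(recommendations, item_type=None, max_items=None):
    """Filter by item type and return the strongest recommendations first.

    recommendations: iterable of (item_id, item_type, strength) tuples.
    """
    candidates = [
        rec for rec in recommendations
        if item_type is None or rec[1] == item_type
    ]
    if max_items is None:
        return sorted(candidates, key=lambda rec: rec[2], reverse=True)
    # nlargest returns its result ordered from strongest to weakest.
    return heapq.nlargest(max_items, candidates, key=lambda rec: rec[2])


# Hypothetical recommendations for one user.
RECS = [
    ("wf1", "workflow", 0.4),
    ("f1", "file", 0.9),
    ("wf2", "workflow", 0.7),
    ("wf3", "workflow", 0.1),
]
TOP_WORKFLOWS = select(RECS, item_type="workflow", max_items=2)
```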
First the client retrieves the service document:
The client parses the service document, extracts the URI template for the recommender service and assembles the URI for the desired recommendations set:
The client can also assemble the URI for creating the desired recommendation context, for later use with the contextualized recommender:
After creating the recommendation context, the client can request the recommendations obtained using the provided context.
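The URI assembly step described above can be sketched as a simple template expansion. The template below is hypothetical: the real one must be read from the service document, and only the base address http://localhost:8015/recommender and the parameter names userID, itemType and max appear on this page.

```python
import re
from urllib.parse import quote

# Hypothetical template; the real one comes from the service document.
TEMPLATE = "http://localhost:8015/recommender/{userID}/{itemType}?max={max}"


def expand(template, **values):
    """Replace each {name} placeholder with its percent-encoded value."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for template variable {name!r}")
        return quote(str(values[name]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)


URI = expand(TEMPLATE, userID=18, itemType="workflows", max=5)
```

With these made-up values the expansion gives http://localhost:8015/recommender/18/workflows?max=5. A missing variable raises KeyError instead of silently producing a malformed URI.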
The link relations that are central to this API are:
- <recommendationsSet> The set of recommendations for the user identified as user (the integer that represents the user in myExperiment). Its cardinality may be restricted to a maximum number (max).
- <filteredRecommendationsSet> The set of recommendations for the user identified as userID of the item type itemType (i.e. workflows, files, users, packs). Its cardinality may be restricted to a maximum number (max).
- <recommendationContext> The recommendation context must be set up when the user is interested in receiving recommendations based on a group of myExperiment resources or keywords. The recommendation context is composed of the set of resources (0..N resources defined by the resource query parameter), the set of keywords (0..N keywords defined by the keyword query parameter), and the URI of the user that is associated with the context (user query param).
- <contextualizedRecommendationsSet> The set of contextualized recommendations for the user identified as user (user query param) of items of a type (type query param) (i.e. workflows, files, users, packs). Its cardinality may be restricted to a maximum number (max query param).
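Since a recommendation context takes repeated resource and keyword query parameters plus a single user parameter, its query string can be built as sketched below. The parameter names come from the descriptions above; the example URIs and the keyword are made up, and the path the query is appended to is deliberately left out because it is not given here.

```python
from urllib.parse import parse_qs, urlencode


def context_query(user, resources=(), keywords=()):
    """Build the query string for a recommendation context.

    Repeats the resource and keyword parameters once per value (0..N each).
    """
    params = [("user", user)]
    params += [("resource", resource) for resource in resources]
    params += [("keyword", keyword) for keyword in keywords]
    return urlencode(params)


# Hypothetical values, for illustration only.
QUERY = context_query(
    "http://www.myexperiment.org/users/18",
    resources=[
        "http://www.myexperiment.org/workflows/16",
        "http://www.myexperiment.org/files/5",
    ],
    keywords=["gene expression"],
)
DECODED = parse_qs(QUERY)  # round trip back to {name: [values]}
```

Passing a list of pairs to urlencode keeps the parameter order and percent-encodes the URIs, so they can safely travel inside another URI.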
The service description is obtained in response to an HTTP GET to a Recommender Service URI.
The Recommender Service responds to an HTTP GET with the results of a recommendationsSet, using the URI defined by expanding the template provided by the service description.
Resources and formats
A RecommendationsSet represents a set of recommendations for a given user.
The recommendationContext resource contains the group of resources and keywords that are considered when providing contextualized recommendations for a given user.
The recommendationsSets that do not depend on the user context are precalculated and cached.
The recommendations provided by the Recommender Service are based on publicly available data, and the service is read-only. There are no privacy or unintended data modification risks.
- GET This operation initializes the Recommender Service and provides textual information about its state.
We will get a message similar to this:
The Recommender Service has been initialized!
It is available at: http://localhost:8015/recommender
It took 58.487 to initialize the recommender system
- Number of recommendations: 6345
- Number of inferred recommendations: 225
- Number of users: 6914
- Number of workflows: 1759
- Number of files: 856
- Number of users with at least one rating: 88
- Number of users that have uploaded a workflow: 278
- Number of users with at least one favourite workflow: 74
- Number of favourited workflows (they may be repeated): 168
- Number of users with at least one favourite and rating: 20
- Number of users that have received a recommendation: 608
- Number of items that have been recommended: 1046
- Number of recommendations by collaborative filtering algorithm: 177
- Number of recommendations by content based algorithm: 4369
- Number of recommendations by social network algorithm: 1799
- Number of users that have uploaded a file: 120
- Number of users with at least one favourite file: 6
- Number of favourited files (they may be repeated): 7
- Number of file ratings: 17
- Number of workflow ratings: 132
- Number of tags: 3325
- Number of users with at least one tag: 285
- Average number of tags per user: 0.4809083
- Average number of tags per user that has tags: 11.666667
- Number of packs: 370
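The two tag averages in this message appear to be derived from the counts reported alongside them (3325 tags, 6914 users, 285 users with at least one tag). The short check below reproduces them under that assumption; it is a reading of the figures, not a description of how the service computes them.

```python
TAGS = 3325
USERS = 6914
USERS_WITH_TAGS = 285

# Average over every registered user, including those without any tag.
AVG_PER_USER = TAGS / USERS

# Average restricted to the users that have applied at least one tag.
AVG_PER_TAGGING_USER = TAGS / USERS_WITH_TAGS

print(round(AVG_PER_USER, 7), round(AVG_PER_TAGGING_USER, 6))
```

Both values agree with the reported 0.4809083 and 11.666667 to within rounding. The large gap between them shows how few users contribute tags at all.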
The technical requirements regarding the recommender system are classified into different categories depending on their nature. We have classified them into the following groups of technical requirements:
Research Object Dimensions Requirements
These technical requirements are those that are closely related to the different dimensions and properties of Research Objects. These properties and dimensions, which are based on some of the R's described in (Bechhofer et al., 2010a) and later extended in (Bechhofer et al., 2010b), focus primarily on reuse, describing the ways in which information within a Research Object is, or might be, reused (and how that reuse might occur). Nonetheless, other concerns of great importance such as provenance, evolution, consistency, etc. are also covered. We refer the reader to deliverable D2.1 Workflow Lifecycle Management Initial Requirements, where the whole description of the dimensions is included. The distilled Research Object Dimensions Requirements are:
Discoverable Research Object requirement. The recommender system must provide the necessary mechanisms to proactively discover ROs (or resources) that might be of interest to the user. This activity should be performed considering the following properties:
- Safety. The recommender system should avoid recommendation flooding at all costs.
- Completeness. The recommender system should provide the widest possible set of relevant Research Objects. This property is particularly important since the recommender system will in some cases be used to provide an overview of the state of the art to new scientists.
Repurposeable ROs requirement. The recommender system must take into consideration the resources that compose an RO, and not only ROs as a whole. Therefore, when making recommendations to a given user, the system might suggest new ROs, or just resources that might be a useful addition or alternative to the ones already aggregated by the ROs that the user is currently using or creating.
Projected User Requirements
These requirements are projected almost unaltered from the user requirements identified in user stories. The difference from the Research Object dimensions requirements is that they have no relation to the properties identified for Research Objects; nonetheless, they must be taken into account (and might result in future properties of, or functionalities around, Research Objects). The distilled Projected User Requirements are:
Reputation requirement. The trust assigned to the RO recommendations generated by the recommender system must take reputation into account. The measure of reputation must be multidimensional: the RO must not only be rated as a whole, but its constituent resources must also be considered.
Content-based requirement. The recommender system should provide content-based recommendations based on the way that researchers already search for and retrieve scientific content, allowing search in fields such as authors, abstract, keywords, publication dates, etc.
User feedback requirement. The recommender system must consider user feedback in order to improve its future recommendations.
Policy-based recommendation requirement. The criterion for determining the relevance and suitability of an item (either RO or resource) is shared among a group of individuals (researchers of the same scientific field, researchers that belong to a concrete lab, etc.), but is not necessarily valid outside this community. The recommender system must be able to provide recommendations tailored to the concrete policies of different research communities.
Activity Specific Requirements
Though not explicitly extracted from user stories, these requirements further restrict the characteristics of the target system in order to provide the expected functionality to the user. They cover specifics related to research concerns or issues that are well identified in the state of the art of the activities that the recommender system will perform. These requirements have also been heavily influenced by our previous experience in the myExperiment project (reference to be added), where we got a clear picture of how researchers interact in a social e-science environment. The distilled Activity Specific Requirements are the following:
ROs model aware requirement. The recommender system must use techniques that exploit the formal RO models that will be developed in the context of Wf4Ever.
Cold Start Requirement. Many recommendation techniques rely on historical information; the addition of new elements that are neither reflected nor referenced in this background knowledge causes:
- The new user problem
- The new item problem (RO or resource)
The recommender system should minimize these situations as much as possible.
Sparse ratings problem handling requirement. The sparsity problem typically occurs in systems with a large number of items, in which plenty of items are rated by only a few users and many users rate only a few items. Items rated by just a few users are unlikely to be recommended, no matter how high their reputation might be. The recommender system should minimize this undesirable situation as much as possible.
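To give a sense of scale, sparsity can be quantified as the fraction of the user-item matrix that actually holds a rating. The sketch below applies this to the workflow figures quoted in the service status message on this page (132 workflow ratings, 6914 users, 1759 workflows), assuming each rating corresponds to a distinct (user, workflow) pair.

```python
def density(n_ratings, n_users, n_items):
    """Fraction of the user-item matrix that holds a rating."""
    return n_ratings / (n_users * n_items)


# Figures taken from the status message; the pairing assumption is ours.
WORKFLOW_DENSITY = density(132, 6914, 1759)
WORKFLOW_SPARSITY = 1.0 - WORKFLOW_DENSITY

print(f"{WORKFLOW_DENSITY:.2e}")
```

Under that assumption only about one cell in a hundred thousand is filled, which is why a purely rating-based algorithm has very little to work with in this data set.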
Low-level requirements depict specific details about the final implementation of the system. As such, they should be transparent to the user, and they should neither restrict nor interfere with any of the user-related requirements. The main sources of low-level requirements are standards and technological compliance issues, and the description of work of Wf4Ever.
Linked data principles compliant requirement. Recommendations provided by the recommender system must be accessible using linked data principles (Bizer et al., 2009). Basically they are:
- Use URIs as names for entities
- Use HTTP URIs so that external agents (either software agents or people) can look up those names
- When an external agent looks up a URI, provide useful information, using the standards RDF and SPARQL
- Include links to other URIs, so that agents can discover more things
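From the client side, the second and third principles amount to dereferencing an HTTP URI while asking for an RDF representation. The sketch below only prepares such a request with the standard library and does not send it; the recommendation URI is hypothetical, and whether the service answers with RDF/XML, Turtle or another serialization is not stated here.

```python
from urllib.request import Request


def rdf_request(uri, media_type="application/rdf+xml"):
    """Prepare (but do not send) a GET asking for an RDF representation."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


# Hypothetical URI naming a recommendation; not taken from the service.
REQUEST = rdf_request(
    "http://localhost:8015/recommender/recommendations/42",
    media_type="text/turtle",
)
```

Sending the prepared request with urllib.request.urlopen would perform the actual lookup. The Accept header is what lets the same URI serve RDF to software agents and a readable page to people.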
(Bechhofer et al., 2010a) Bechhofer, S., De Roure, D., Gamble, M., Goble, C. and Buchan, I. (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge. In: The Future of the Web for Collaborative Science (FWCS 2010), April 2010, Raleigh, NC, USA.
(Bechhofer et al., 2010b) Bechhofer, S., Ainsworth, J., Bhagat, J., Buchan, I., Couch, P., Cruickshank, D., Delderfield, M., Dunlop, I., Gamble, M., Goble, C., Michaelides, D., Missier, P., Owen, S., Newman, D., De Roure, D. and Sufi, S. (2010) Why Linked Data is Not Enough for Scientists. In: Sixth IEEE e-Science conference (e-Science 2010), December 2010, Brisbane, Australia.
(Belkin and Croft, 1992) Belkin, N. J. and Croft, W. B.: 1992, 'Information Filtering and Information Retrieval: Two Sides of the Same Coin?' Communications of the ACM 35(12), 29-38.
(Bizer et al., 2009) Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems, 5(3), 1-22. Igi Publ. Retrieved from http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/jswis.2009081901
(Collins and Loftus, 1975) Collins, A. and Loftus, E. (1975) A spreading-activation theory of semantic processing. Psychological Review, 82(6), 407-428.
(Crestani, 1997) Crestani, F. (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453-482. Available at: http://www.springerlink.com/index/G11T185158667418.pdf.
(Goldberg et al., 1992) Goldberg, D., Nichols, D., Oki, B. M. and Terry, D. (1992) Using collaborative filtering to weave an information tapestry. Communications of the ACM 35(12), 61-70.
(Hill et al., 1995) Hill, W., Stead, L., Rosenstein, M. and Furnas, G.: 1995, 'Recommending and evaluating choices in a virtual community of use'. In: CHI '95: Conference Proceedings on Human Factors in Computing Systems, Denver, CO, pp. 194-201.
(Lang, 1995) Lang, K. (1995) Newsweeder: Learning to filter news. In: Proceedings of the 12th International Conference on Machine Learning, Lake Tahoe, CA, pp. 331-339.
(Quillian, 1968) Quillian, M. (1968) Semantic memory. In: M. Minsky (ed.), Semantic Information Processing. MIT Press.
(Resnick et al., 1994) Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P. and Riedl, J.: 1994, 'GroupLens: An Open Architecture for Collaborative Filtering of Netnews'. In: Proceedings of the Conference on Computer Supported Cooperative Work, Chapel Hill, NC, pp. 175-186.
(Schafer et al., 1999) Schafer, J. B., Konstan, J. and Riedl, J.: 1999, 'Recommender Systems in E-Commerce'. In: EC '99: Proceedings of the First ACM Conference on Electronic Commerce, Denver, CO, pp. 158-166
(Shardanand and Maes, 1995) Shardanand, U. and Maes, P.: 1995, 'Social Information Filtering: Algorithms for Automating "Word of Mouth"'. In: CHI '95: Conference Proceedings on Human Factors in Computing Systems, Denver, CO, pp. 210-217.
(Spärck, 1972) Spärck Jones, K. (1972) A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28(1), 11-21. doi:10.1108/eb026526