PhD topic

Well, I guess it’s time for another English post (finally), with the intention to reach a broader audience and the hope of getting feedback on my ideas. In this post I want to give a first insight into my PhD topic, since it’s getting really „serious“ now (again, finally!), and I would welcome any comments at this early stage of my work.

As you might know, I work on the project „eduComponents“ at the department of computer science of the OvGU. The eduComponents are a collection of components for the open source CMS Plone that extend Plone so that it can be used as a learning management system. This approach has many advantages: everything is document-based, and you don’t have to take care of things like user and rights management yourself. I don’t want to elaborate on these points here, but if you have any questions or comments, feel free to contact me or browse our project’s publications.

The eduComponents include components for class management (ECLecture), multiple-choice tests (ECQuiz), assignment submission (ECAssignmentBox) with automatic testing of program code (ECAutoAssessmentBox and ECSpooler), and a component for peer review of programming assignments (ECReviewBox). These components have been under development since 2005 and are used for courses in our workgroup and at other institutions around the world (the components are released under the GPL). Credit for development and implementation mostly goes to Mario and Michael (with the help of numerous student assistants). My part here (and also my entry into the Plone world) was porting the components to Plone 3 (again with enormous help from Mario and students) and, since winter term 2008/2009, maintenance.

Well, this has nothing to do with my PhD work in the first place, but I thought this would be a good „place“ to say a public thank you.

OK, so the situation is as follows: students who attend our classes usually have to hand in a number of (programming) assignments in order to be admitted to the exam at the end of term. They are given tasks like „Write a Haskell function that computes the Fibonacci number for a given input.“ (yes, that is not that difficult, but for the sake of this post this example will do). Maybe a student has not been to the lecture where the professor talked about how to compute Fibonacci numbers, or he has no clue how to write a function in Haskell. So what does he do? Right, he searches for „Haskell“ and „Fibonacci“ on Google. Google will return approx. 7.5 million hits for „Haskell“ and approx. 2.3 million hits for „Fibonacci“. Of course he could refine his query to „haskell tutorial“ (which still yields 285,000 hits), but all this takes time and doesn’t take into account that the student may already know how to program in SML, which is a functional programming language, just like Haskell. How could Google know?

So here my idea comes into play. I want to develop and implement a recommender engine that offers a number of URLs to pages that might help the student solve the current assignment. This URL list is tailor-made for an individual user, since the recommendation engine takes into account what the learner already knows and which assignments he has already solved. The user in turn can rate the links („This was helpful / not helpful“), which triggers a re-computation of the link list (and the score is stored for future reference). Let me show you an image that helps to illustrate my idea:

[Image: overview_eduSuggest_en]

The core is „eduSuggest“, the recommendation engine. As input it takes a query Qo („I search for HASKELL and FIBONACCI“). This query is sent by the frontend; in my case a Plone content type generates the query from the assignment. The second input is a FOAF file describing the user profile. Why FOAF? Well, it looks very promising for what I need. It’s an RDF vocabulary which I can mix with other RDF vocabularies and it’s XML, so good for further processing. I guess there will be other posts here about using FOAF to model learners in the future; I have to spend some more thought on that.
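To make the two inputs a bit more concrete, here is a minimal Python sketch. It is an illustration only, not the actual eduSuggest interface: the `Assignment` class, the `build_query` and `known_languages` helpers, the `edu:` properties and the example learner are hypothetical names chosen for this example; only `foaf:name` is a real FOAF term. The profile is written as plain (subject, predicate, object) triples so that no RDF library is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """Hypothetical stand-in for the Plone content type that holds an assignment."""
    title: str
    tags: tuple


def build_query(assignment):
    """Turn the assignment's tags into the original query Qo (upper-case terms)."""
    return [tag.upper() for tag in assignment.tags]


# Learner profile as (subject, predicate, object) triples.
# The "edu:" properties are invented for this sketch; FOAF itself has no such terms.
LEARNER = "http://example.org/learners/alice"
profile = [
    (LEARNER, "foaf:name", "Alice"),
    (LEARNER, "edu:knowsLanguage", "SML"),
    (LEARNER, "edu:solvedAssignment", "sml-lists"),
]


def known_languages(triples, learner):
    """Collect every language the profile claims the learner already knows."""
    return {o for s, p, o in triples if s == learner and p == "edu:knowsLanguage"}


assignment = Assignment("Fibonacci in Haskell", ("haskell", "fibonacci"))
q_o = build_query(assignment)
languages = known_languages(profile, LEARNER)
print(q_o)
print(languages)
```

In a real setup the triples would come out of an RDF library that parses the FOAF file, and the tags would come from the assignment object in Plone.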

OK, now that we have the input and the learner’s profile, eduSuggest does some preprocessing on the query (e.g. a lookup in an ontology to find related terms: „SML“ is also a functional programming language, like „Haskell“) and sends the resulting query Qp to associated sources like del.icio.us, digg and other social bookmarking services, as well as Wikipedia or even Google. Which resources eduSuggest queries is freely configurable, but the intention is to use resources that contain a collection of collectively tagged and rated links. The results from these resources are then processed again by eduSuggest. This time the processing includes „shaping“ the query results according to the learner’s profile, a lookup in eduSuggest’s database to retrieve the URLs’ scores (if any user rated that link before), and an additional query for URLs that users themselves might have stored in the service. Then a ranking algorithm is applied. I am thinking of utilizing Bayesian networks here („Which URL is most likely to satisfy the user?“). An individual list of links, shaped to the learner’s profile and the current assignment, is then returned. The learner can then rate the links („Hey, that link you gave me was not helpful!“), which again triggers a re-computation of the link list (i.e. the probability scores in the Bayesian network are changed and the network itself is recomputed, returning an altered link list).
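The preprocessing and the rating loop could be sketched roughly like this in Python. Note the assumptions: the tiny `IS_A` ontology, the example URLs and all function and class names are hypothetical, and instead of a full Bayesian network the sketch uses a much simpler stand-in, a per-link Beta-Bernoulli estimate of „how likely is this link to be helpful?“. It only shows the shape of the idea: expand Qo into Qp, rank, and re-rank after a rating.

```python
from dataclasses import dataclass

# Toy ontology: concept -> broader category (hypothetical data).
IS_A = {
    "haskell": "functional programming language",
    "sml": "functional programming language",
}


def expand_query(terms, known):
    """Build Qp: add concepts the learner knows that share a category with a term."""
    expanded = list(terms)
    for term in terms:
        category = IS_A.get(term)
        if category is None:
            continue
        for concept in sorted(known):
            if concept != term and IS_A.get(concept) == category and concept not in expanded:
                expanded.append(concept)
    return expanded


@dataclass
class LinkScore:
    helpful: int = 0
    not_helpful: int = 0

    def probability(self):
        # Posterior mean under a uniform Beta(1, 1) prior on "link is helpful".
        return (self.helpful + 1) / (self.helpful + self.not_helpful + 2)


class Ranker:
    def __init__(self):
        self.scores = {}  # url -> LinkScore, kept for future reference

    def rate(self, url, helpful):
        """Store one „helpful / not helpful“ rating for a link."""
        score = self.scores.setdefault(url, LinkScore())
        if helpful:
            score.helpful += 1
        else:
            score.not_helpful += 1

    def rank(self, urls):
        """Order links by estimated helpfulness; unrated links get probability 0.5."""
        return sorted(
            urls,
            key=lambda u: self.scores.get(u, LinkScore()).probability(),
            reverse=True,
        )


q_p = expand_query(["haskell", "fibonacci"], known={"sml"})

urls = ["http://example.org/a", "http://example.org/b", "http://example.org/c"]
ranker = Ranker()
ranker.rate("http://example.org/b", helpful=True)
ranker.rate("http://example.org/a", helpful=False)
first_ranking = ranker.rank(urls)

# A new rating simply changes the stored counts; calling rank() again re-computes the list.
ranker.rate("http://example.org/c", helpful=True)
ranker.rate("http://example.org/c", helpful=True)
second_ranking = ranker.rank(urls)
```

A Bayesian network would additionally let the score depend on the learner’s profile and the assignment; the counting scheme above ignores both and treats every link independently.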

Well that’s roughly the main idea. Phew, I think that was the longest blog post I have ever written.

Comments, questions, anything? Thinking I am crazy, stupid, smart, <insert appropriate term here>? Feel free to drop a comment!

image credits: Piled Higher & Deeper

4 comments

  1. FOAF is great for modelling learners. But don’t say „it’s XML“, because it’s not! In my experience, you don’t even need to care about the specific RDF serialization (N3, Turtle, RDF/XML); just let your RDF library handle it. Will you use dbpedia.org for learning concepts? And just in case you start implementing RDF stuff in Python, try to find an object triple mapper (OTM), or implement one on your own. An OTM saves so much trouble. Cheers!

  2. Hey, thanks for your comment. OK, noted, FOAF is not XML (that could make a nice Unix-like acronym: FINX). Also thanks for pointing me to dbpedia, this looks very interesting! I hope you are making progress with your PhD thesis!

  3. Well, to be honest, I have the feeling that this project is a little bit overdesigned for the targeted audience: a student has to learn how to do research themselves, to grasp concepts and organize. A recommendation engine which just gives back a plain list might narrow your view and prevent exploratory learning. It is a different use case than, for example, Amazon recommendations, which are based on similarity; learning is about recalling, exploring and applying.

    Some raw ideas:
    1. Add tags to an assignment which reflect: 1) assumed knowledge, 2) focus of this assignment, 3) further reading. This could also help the teacher to design assignments that are relevant to the course.
    2. Build a dependency tree of concepts, e.g. Haskell -> Functional Programming -> Algorithms, and offer multiple paths of learning. Some might want to start with a concrete Haskell program; others want to start with a conceptual background article. Recommend lecture slides or other assignments which are relevant to the topics. I am thinking about extending faceted search with a temporal component.
    3. Add an option for learners to publish their own ideas for extending an assignment, like adding questions or tasks: „what would Fibonacci look like in an imperative language?“, „what would you have to do to catch invalid input?“ ….
    4. Model the „user knowledge“ in a realistic way. A student usually does not simply „know“ or „not know“ a certain unit (tag). Some things need repetition or review.
    5. Open question: how to model search queries that expand on the given information (tags) and do not narrow it down to (too) specific documents?
       E.g.: SML, algebra, fibonacci, C -> what should be the desired search result?

    JM2C

  4. Hi Rene,
    thanks a lot for your long comment, I really appreciate that you took some time to respond to my ideas. I completely agree with you when you state that „a student has to learn how to do research themselves, to grasp concepts and organize“ them. Nevertheless, it can be cumbersome to learn these abilities, and my approach is only meant to give some assistance. Visually, I think of some kind of „panic button“ you can push when you are lost; the system then offers you that individually generated and ranked link list. But I see I have to work on my argumentation 🙂
    What I also have to work on is the „workflow“ with the input and output data of the different components (see figure). The assignments themselves are tagged, though I am not sure whether these will be just keywords (e.g. „haskell“, „fibonacci“, „recursive“ etc.) or something like „assumed_knowledge: recursive_programming“. Intuitively I’d rather go for the first option. The assumed knowledge can be obtained by looking up chronologically earlier slides and assignments. In our system, the courses are designed in a rather linear way, since they accompany a university lecture. Every class more or less builds on the ones that have been held before. Thus I can retrieve the „assumed prior knowledge“ as well as the „individual prior knowledge“ by looking up previous solved assignments (i.e. those assignments that have been accepted by the system). Another part of the „individual prior knowledge“ is information about courses that have already been passed. You are right, it’s not only about „knowing“ and „not knowing“, but at least I can hint at some weighting factor here.
    I have to think about your idea of the dependency tree of concepts and its application. But another thing I want to do is include domain ontologies so that I can retrieve similarities and the context of a concept, e.g. something like „haskell“ is a „functional programming language“ –> „SML“ is also a „functional programming language“ and so on.
    Your question #5 is another great question which I have to think about. I don’t have an answer yet.

    Thanks again for reading this long article and writing a long comment. It’s always nice to see who is actually reading this stuff, especially when it’s someone you know 😉
    Katrin
