Semantics vs Models

December 24th, 2008

The idea of a semantic web, or more generally of providing semantics, is essentially to augment data with additional data that describes what the original data is about (and to standardise this to enable communication and interfacing). This is supposed to enable algorithms to process the original data better and, in some sense, to understand it. Calling this metadata seems justified insofar as the additional data is not directly used by or useful to humans, but is intended for the algorithms. Humans can go quite far in interpreting data without this additional information, as far as their knowledge, experience and intelligence allow.

However, this seems to mean that the data provider also has to provide all useful semantic information, and that simply seems to be impossible. The data provider can provide this information for the data in its original context, but the data itself may have a far wider use. Any algorithm would miss this wider use, as it would not reinterpret the data in a new, unknown context, but only use the original interpretation(s). More generally, it seems to suggest a purely extrinsic notion of understanding, independent of the observer. While semantic information may be useful in a very narrowly defined, specific context, such as a single project or a precisely defined subject area, in a wider context it seems hardly achievable or useful.

Instead, a system which builds models based on the data available (models of any sort, not just statistical ones, even if statistical models seem to be important, especially for human interpretation) seems to enable reinterpretation of the data and its use in different contexts. The resulting models derived from the data may then still be augmented with semantic information to make them accessible. This would base semantics on actual data instead of trying to make the data fit a particular view. But such augmentation is not always needed; that depends simply on the use of the resulting models. Of course a certain bias can still be present, depending on the type of models used. It also makes semantics more intrinsic, depending on the observer and how it/she/he builds the model. Many different models may even be built to explore concepts, and these models may not be easily translatable into each other, if at all. Of course the question now becomes how to build such models (reasonably quickly).

This is not just related to the semantic web, but also, say, to interpreting geometric models in the sense of describing them in terms that are meaningful to someone wishing to process (create, edit, analyse, etc.) the model. A fixed description of a model’s design intent in this context via history, geometric constraints, regularities, etc. seems to be restricting in a similar way to providing rather universal semantics for data on the web: it does not allow for reinterpretation and, with that, reuse.

Categories: General

Identifying Non-Smooth Boundary Curve Segments

December 24th, 2008

Let P be a closed, non-self-intersecting, piecewise linear curve given by the points P_l, which lie on a continuous surface S. We seek, in a simplified sense here, a closed, non-self-intersecting, piecewise linear curve Q given by the points Q_l, which still lie on the surface S (so that Q is within a small distance from the surface), such that Q’s distance to P on the outside of the region bounded by P is at most \epsilon^+ and on the inside at most \epsilon^-. Q should be in some sense smoother or fairer than P. One approach to computing Q is to first identify non-smooth segments of P, which can then be improved by some algorithm. Also see http://www.langbein.org/research/curves/smoothing/boundary-smoothing/.

In general a smooth or fair curve can be characterised by having a minimal number of segments with monotonic curvature (in our case geodesic curvature), i.e. minimise inflection points. Furthermore, the curve should overall have rather low curvature, i.e. minimise the curvature energy \int \kappa^2\;ds, as far as this can be achieved under the constraints (see Euler’s elastica where the curve is only constrained by its end-points).

For a piecewise-linear curve a discrete curvature can be defined via the turning angles \phi_l at its vertices. The turning angle values, however, depend on the sampling density, as they are effectively the integrals over the curvature (d\phi = \kappa ds). Using a Taylor expansion of the curve, the curvature at a vertex can be approximated by \hat{\kappa}_l = 2\phi_l / (\|P_l-P_{l-1}\|+\|P_{l+1}-P_l\|), which is optimal for three-point approximations in the linear terms and converges with fourth order for elastica if all edges have equal length (Langer et al., 2005, http://tinyurl.com/a25thy).
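
The estimate above can be sketched in a few lines of Python. This is only a minimal sketch under a simplifying assumption: the curve is treated as a closed polyline in the plane, so the signed planar turning angle stands in for the geodesic turning angle on the surface. The function names turning_angle and discrete_curvature are made up for this illustration.

```python
import math

def turning_angle(a, b, c):
    """Signed turning angle at b when walking a -> b -> c (planar points)."""
    ux, uy = b[0] - a[0], b[1] - a[1]
    vx, vy = c[0] - b[0], c[1] - b[1]
    # atan2(cross, dot) gives the signed angle between the two edge vectors.
    return math.atan2(ux * vy - uy * vx, ux * vx + uy * vy)

def discrete_curvature(points):
    """Estimate 2*phi_l / (|P_l - P_{l-1}| + |P_{l+1} - P_l|) at every
    vertex of a closed polyline (indices wrap around)."""
    n = len(points)
    kappa = []
    for l in range(n):
        prev, cur, nxt = points[l - 1], points[l], points[(l + 1) % n]
        phi = turning_angle(prev, cur, nxt)
        length = math.dist(cur, prev) + math.dist(nxt, cur)
        kappa.append(2.0 * phi / length)
    return kappa
```

For a regular polygon inscribed in a circle of radius R, traversed counter-clockwise, every vertex should give a value close to 1/R, which is a convenient sanity check.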

Using the above approximation we can determine inflection points as sign changes and using a maximum (absolute) curvature limit can furthermore label curve vertices as non-smooth if they exceed the limit. However, this alone does not identify non-smoothness of a curve on a larger scale than the local sampling distance. Moreover, the sampling distance should be reasonably uniform, which may or may not be the case for P. If P is noisy it may also result in rather noisy local curvature estimates, which can over- and under-estimate the curvature behaviour on a larger scale.
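
As a rough sketch of this labelling step, assuming the per-vertex curvature estimates of a closed curve are already available as a list, the following hypothetical helper marks sign changes and vertices above a curvature limit. The name label_vertices and the parameter kappa_max are assumptions for illustration only.

```python
def label_vertices(kappa, kappa_max):
    """Label vertices of a closed curve from per-vertex curvature estimates.

    inflection[l] is True if the curvature changes sign between vertex
    l-1 and vertex l (indices wrap around); non_smooth[l] is True if the
    absolute curvature at vertex l exceeds kappa_max.
    """
    n = len(kappa)
    inflection = [kappa[l - 1] * kappa[l] < 0.0 for l in range(n)]
    non_smooth = [abs(k) > kappa_max for k in kappa]
    return inflection, non_smooth
```

Note that this only looks at neighbouring vertices, so it inherits all the limitations mentioned above: it is purely local and sensitive to noise and non-uniform sampling.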

One way of identifying non-smooth curve segments on a larger scale is to ask how far the curve can be smoothed within the tolerance zone set by \epsilon^+ and \epsilon^-, in terms of how much \int \kappa^2\;ds can be reduced within the tolerance zone. An efficient way to do this without actually solving the optimisation problem may be to estimate how far \hat{\kappa}_l can be reduced by moving the three points P_{l-1}, P_l, P_{l+1} within the tolerance zone. Obviously one can then always reduce this to 0 by arranging the points along a (very short) line segment, so further restrictions are needed. One can restrict the movement to P_{l-1} and P_{l+1} only, moving them along the curve up to a maximum distance, and find the minimal curvature; the difference between this minimum and the actual local estimate then estimates the possible improvement. Or one can move the points “orthogonal” to the curve in the tolerance zone, again to find the best improvement. Or possibly find the (approximately) longest line segment through P_l lying inside the tolerance zone to set the scale along which P_{l-1} and P_{l+1} can be moved along the curve to estimate the curvature improvement.
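
The first variant (sliding only the two neighbours along the curve) could be sketched as follows. This is an illustrative sketch under several assumptions, not the implementation behind the figures: the curve is a closed planar polyline, the neighbour positions are sampled at a fixed number of arc-length distances up to d_max, and the tolerance zone is not checked because the points stay on the curve itself. All function and parameter names (walk, improvement, d_max, samples) are hypothetical.

```python
import math

def curvature(a, b, c):
    """Three-point estimate 2*phi / (|b - a| + |c - b|) for planar points."""
    ux, uy = b[0] - a[0], b[1] - a[1]
    vx, vy = c[0] - b[0], c[1] - b[1]
    phi = math.atan2(ux * vy - uy * vx, ux * vx + uy * vy)
    return 2.0 * phi / (math.dist(a, b) + math.dist(b, c))

def walk(points, start, step, distance):
    """Point reached by walking `distance` along the closed polyline from
    vertex `start`, forwards (step=+1) or backwards (step=-1)."""
    n = len(points)
    i = start
    for _ in range(n):  # at most one full loop around the curve
        j = (i + step) % n
        edge = math.dist(points[i], points[j])
        if edge > 0.0 and distance <= edge:
            t = distance / edge
            return tuple(p + t * (q - p) for p, q in zip(points[i], points[j]))
        distance -= edge
        i = j
    return points[start]

def improvement(points, l, d_max, samples=8):
    """Estimated reduction of |curvature| at vertex l when its two
    neighbours are replaced by points up to d_max away along the curve."""
    n = len(points)
    actual = curvature(points[l - 1], points[l], points[(l + 1) % n])
    dists = [d_max * (k + 1) / samples for k in range(samples)]
    best = min(
        abs(curvature(walk(points, l, -1, s), points[l], walk(points, l, +1, t)))
        for s in dists
        for t in dists
    )
    # Clamp at zero: the sampled positions need not include the actual
    # neighbours, so the sampled minimum can exceed the actual estimate.
    return max(0.0, abs(actual) - best)
```

On a coarsely sampled circle with a single vertex pushed outwards, this gives a clearly larger value at the displaced vertex than at vertices far away from it, which is the behaviour one would want from such an indicator.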

In general estimating the variation of curvature possible over a certain length range seems to indicate quite well how much the curve may be smoothed locally under the constraints and hence either label vertices with high potential for smoothing as non-smooth or even derive a smoothing priority for the vertices. See the figures below where the curve is plotted in black, the maximal improvement of the absolute curvature in magenta and curve vertices which can be improved by more than a certain amount are marked blue. Underneath the estimated vertex curvature is plotted in red, the maximal and minimal curvature achieved by moving the two adjacent points along the curve a certain maximal distance are plotted in green and the mean curvature is plotted in blue. The range the points could be moved is about four times as big in the first figure compared to the second figure.

Discrete Curvature Plot and Non-Smooth Points at Large Range.

Discrete Curvature Plot and Non-Smooth Points at Shorter Range.
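
Turning per-vertex improvement estimates into labels or a smoothing priority could look roughly like this. It assumes the improvements are already available as a list with one value per vertex; the function name smoothing_priority and the threshold parameter are made up for this sketch.

```python
def smoothing_priority(improvements, threshold):
    """Indices of vertices whose estimated curvature improvement exceeds
    the threshold, ordered from largest to smallest improvement."""
    marked = [l for l, gain in enumerate(improvements) if gain > threshold]
    return sorted(marked, key=lambda l: improvements[l], reverse=True)
```

The returned indices correspond to the vertices that would be marked as non-smooth, and their order is one possible smoothing priority.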

Categories: Curves

A Beginning…

November 22nd, 2008

This notebook or blog is intended to become a collection of notes and ideas based on my work. I still have to see how this actually develops before I can say more. At the moment it is an experiment to see how far publishing raw thoughts, ideas and results is at all useful and practicable. There are also other sites associated with my work relating to more complete results and ongoing projects. Some of these are similarly experimental and undefined while others aim more at making the final results freely accessible.

Ex Tenebris Scientia is my personal home page, which provides an overview of my work and some other activities. Most of its contents relate to final results and published work, rather than work in progress or initial ideas. It’s also focused on me, rather than on projects/work.

Astarte is a development site based on trac with various version control systems, continuous integration, etc. It is mainly aimed at developing, testing and releasing software. As most of the work I’m doing relates to algorithms and software, most of the ongoing projects and related information are likely to go on this site. The name is taken from a rather ambitious project of revising the way I (or maybe we) use computers by devising a new programming-centric framework for using a computer. I will, however, not discuss more details about this in the near future.

X=10Z is a wiki site and also hosts this blog. The wiki site has been created for documentation purposes, but what precisely this means I still have to define. One idea at the moment is to put material relating to the courses I’m teaching there to develop something like a textbook. Research results and related information may be documented there in a similar way. Associated with this site are also blogs, like this one (currently the only one), to keep notes, etc. related to a person, project or topic.

Beyond that I’ll have to see how things are developing. Being busy with loads of things I do not expect to quickly provide a lot of content for any of this.

One final note, in particular for this blog: the content here may be messy, unorganised, incomplete or simply wrong. So do not hold me responsible for any of these comments… I’ll check the content elsewhere more carefully than here. But feel free to comment on anything…

Categories: General