Category Archives: Code & Development

Efficient Fuzzy Matching at Word Level

I’ve just solved a tricky problem with what I think is quite an elegant solution, and thought it would be interesting to share it.

I’m building a system in which I have to process fault data. Sometimes this comes with a standard fault code (hallelujah!), but quite often it comes with the manufacturer’s own fault code and a description which may (or may not) be quite close to the description against one of the standard faults. If I can match the description up, I can treat the fault as standard.

The problem is that the description matching is not exact. Variations in punctuation are common, but the wording can also change so that, for example, “Evaporative emission system incorrect purge flow” in one system is “Evaporative emission control system incorrect purge flow” in another. To a human reader this is fine, but it rules out simplistic exact matching.

I spent some time Googling fuzzy matching, but most of the available literature focuses on character or even bit-level matching and looks both complex and compute-intensive. However, I finally found the Jaccard similarity coefficient. This is designed for establishing the “similarity” between two objects with similar lists of attributes, and I had a “lights on” moment and realised I could apply a similar algorithm to the set of words used in the pair of descriptions.

The algorithm to calculate the coefficient for a given pair is actually very simple:

  1. Convert Text1 to a list of words/tokens, excluding spaces and punctuation. In VB.NET the String.Split() function does this very neatly and you can specify exactly what counts as punctuation or white space. For simplicity it’s a good idea to convert both strings to uppercase to eliminate capitalisation variations.
  2. Convert Text2 to a list of tokens on the same basis.
  3. For each token from Text1, see if it appears in the list of tokens from Text2. If so, increment a counter M
  4. For each token from Text2, see if it appears in the list of tokens from Text1. If so, increment M
  5. Calculate the coefficient as M / (total number of tokens from both lists)
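The steps above can be sketched as follows. My actual implementation is in VB.NET; this is a minimal Python equivalent, with `string.punctuation` standing in for the punctuation set you would pass to String.Split():

```python
import string

def word_similarity(text1, text2):
    """Word-level similarity coefficient as described above: matches
    counted in both directions, divided by the total token count."""
    # Steps 1-2: uppercase, replace punctuation with spaces, split into tokens
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens1 = text1.upper().translate(table).split()
    tokens2 = text2.upper().translate(table).split()
    if not tokens1 and not tokens2:
        return 1.0  # two empty strings are trivially identical
    set1, set2 = set(tokens1), set(tokens2)
    # Steps 3-4: count tokens from each list that appear in the other list
    m = sum(1 for t in tokens1 if t in set2) + sum(1 for t in tokens2 if t in set1)
    # Step 5: divide by the total number of tokens from both lists
    return m / (len(tokens1) + len(tokens2))
```

Because steps 3 and 4 count every token occurrence, repeated words are counted per occurrence, exactly as in the list-based description above.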

This produces a very intuitive result: 1 if the token sets are an exact match, 0 if they are completely disjoint, and a linearly varying value between. The process does, however, ignore transpositions, so that “Fuel rail pressure low” equates to “Fuel rail low pressure”. In my context this matches what a human assessor would do.

Now I simply have to repeat steps 2-5 above for each standard error description, and pick the one which produces the highest coefficient. If the best value is above about 80% I treat the string as “matched”, and I can quote the coefficient to give a feel for “how good” the match is.
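In outline, that matching loop looks like this. Again a Python sketch rather than my VB.NET code, self-contained with a compact version of the coefficient; the 80% threshold and the fault list are illustrative:

```python
import string

def coefficient(a, b):
    # Word-level coefficient as described earlier in the post
    tr = str.maketrans(string.punctuation, " " * len(string.punctuation))
    ta = a.upper().translate(tr).split()
    tb = b.upper().translate(tr).split()
    sa, sb = set(ta), set(tb)
    m = sum(t in sb for t in ta) + sum(t in sa for t in tb)
    return m / (len(ta) + len(tb)) if (ta or tb) else 1.0

def best_match(description, standard_faults, threshold=0.8):
    """Pick the standard fault with the highest coefficient; treat the
    description as matched only if the best score clears the threshold."""
    score, fault = max((coefficient(description, f), f) for f in standard_faults)
    return (fault, score) if score >= threshold else (None, score)
```

Quoting the returned score alongside the match gives the “how good” indication mentioned above.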

Hopefully that’s useful.

Posted in Agile & Architecture, Code & Development | 1 Comment

Caught by The Law!

Don’t get too excited. Those of you hoping to see me carted off in manacles and an orange jumpsuit will be sadly disappointed…

No, the law to which I refer is Moore’s Law, which states effectively, if you need reminding, that computing power doubles roughly every eighteen months.

Recently I’ve been doing some work to model a system in which two sub-systems collaborate by exchanging a very large number of relatively fine-grained web service calls. (I know, I wouldn’t have designed it that way…) The two partners disagree about how the system will scale, so it fell to me to do some modelling of the behaviour. I decided to back my analysis up with a practical simulation.

Working in my preferred environment (VB.Net) it didn’t take long to knock up a web service simulating the server, and a client which could load it up with either synchronous or asynchronous calls on various threading and bundling models. To make the simulation more realistic I decided that the service should wait, with the processing thread under load, for a given period before returning, to simulate the back-end processing which will occur in reality. The implementation should be simple: note the time when the service starts processing, set up the return structures and data required by my simulation, check the time, and then if necessary sit in a continuous loop until the desired total time has elapsed.

It didn’t work! I couldn’t get the system to recognise the time taken by the internal processing I had done, which threw out the logic for the loop. Effectively the system was telling me this was taking zero time. The problem turned out to be that I had assumed all processing times should be measured in ms. 5ms is our estimate of the average internal processing time. 6ms is our estimate of the round trip time for the web services. It seemed reasonable to allow a few ms for the processing in my simulation. Wrong!

It turns out that VB.Net now measures time in Ticks, which are units of 100ns, or one tenth of a microsecond. So I rewrote the timing logic to use this timing granularity, but still couldn’t quite believe the results. My internal processing was completing in approximately 1 Tick, or roughly 10,000 times faster than I expected.
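The rewritten timing logic boiled down to something like this. Shown here as a Python sketch using time.perf_counter_ns, since Python has no Ticks; in the VB.NET original I used tick-granularity timestamps, where one tick is 100ns:

```python
import time

TICKS_PER_MS = 10_000  # one tick is 100 ns, so 10,000 ticks per millisecond

def simulate_processing(total_ms):
    """Busy-wait until total_ms have elapsed since entry, measured in
    100 ns ticks, keeping the thread under load as the simulation needs.
    Returns the number of ticks actually consumed."""
    start_ns = time.perf_counter_ns()
    target_ticks = total_ms * TICKS_PER_MS
    # ... the simulation's real set-up work would happen here ...
    while (time.perf_counter_ns() - start_ns) // 100 < target_ticks:
        pass  # spin: the whole point is to hold the thread busy
    return (time.perf_counter_ns() - start_ns) // 100
```

Measuring in the clock’s own units is the moral: at millisecond granularity the set-up work genuinely registers as zero.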

Part of this is down to the fact that my simulation doesn’t require access to external resources, such as a database, which the real system does. But much of the difference is down to Moore’s Law. The last time I did something similar was around 10 years ago, and my current laptop must be at least 100 times faster.

The moral of the story: beware your assumptions – they may need a refresh!

Posted in Agile & Architecture, Code & Development, PCs/Laptops, Thoughts on the World | Leave a comment

Webkit, KitKat and Deadlocks!

I don’t know what provision Dante Alighieri made, but I’m hoping there’s a special corner of Hell reserved for paedophiles, mass murderers and so-called engineers from big software companies who think there might ever be a justification for breaking backwards compatibility. I suspect that over the past 10-15 years I have wasted more computing effort trying to keep things working which a big company has broken without providing an adequate replacement, than is due to any other single cause.

The latest centre of incompetence seems to be Google. Hot on the heels of my last moan on the same topic, I’ve just wasted some more effort because of a major Google c**k-up in Android 4.4.X, AKA KitKat. My new app, Stash-It!, includes a web browser based on the “Webkit” component widely used for that purpose across the Android, OSX and Linux worlds. On versions of Android up to 4.3, it works. However when I released it out into the wild I started getting complaints from users running KitKat that the browser had either frozen altogether, or was running unusably slowly.

It took a bit of effort to get a test platform running. In the end I went for a VM on my PC running the very useful Android-x86 distribution (as the Google SDK emulator is almost unusable even when it’s working), and after a bit of fiddling reproduced the problem. Sometimes web pages would load, sometimes they would just stop, with no code-level indication why.

After various fruitless attempts to fix the problem, I discovered (Google.com still has some uses) that this is a common problem. In their “wisdom” Google have replaced the browser component in KitKat with one which is a close relative of the Chrome browser, but seem to have done so without adequate testing or attention to compatibility. There are wide reports of deadlocks when applications attempt any logic during the process of loading a web page, with the application just sticking somewhere inside the web view code. That’s what was happening to me.

The fix eventually turned out to be relatively simple: Stash-It feeds back progress on the loading of a web page to the user. I have simply disabled this feedback when the app is running under KitKat, which is a slight reduction in functionality but a reasonable swap for getting the app working… However it’s cost a lot of time and aggro I could well have done without.
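The workaround amounts to a version gate along these lines. This is Python pseudocode of the logic rather than the app’s actual Basic4android source; KitKat corresponds to Android API levels 19 and 20:

```python
KITKAT_API_LEVELS = {19, 20}  # Android 4.4.x

def progress_feedback_enabled(sdk_level):
    """Disable page-load progress feedback on KitKat, where running app
    logic during page loading can deadlock the new Chromium-based web
    view; leave the feedback on for every other Android version."""
    return sdk_level not in KITKAT_API_LEVELS
```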

Can anyone arrange a plague of frogs and boils for Google, please?

Posted in Agile & Architecture, Android, Code & Development, Thoughts on the World | Leave a comment

My First Android App: Stash-It!

After a couple of months of busy early morning and late night programming, my first Android app has finally been released. Please meet Stash-It!

Stash-It! responds to an odd side-effect of the difference between the iOS and Android security models. On the iPad, there are a large number of applications which offer an “all in one” approach to managing a group of related content. These are a bit frustrating if you want to share files transparently and seamlessly between applications, but there are times when you want to manage a group of files securely, and then the iOS approach is great.

Android is the other way around. The more open file system and component model encourages the use of specialist applications which do one job well, but it can be a challenge to keep related files of different types together, and hide them if you don’t want private client files or the like turning up un-announced in your gallery of family photos!

Stash-It! tries to plug this gap, by providing an “all in one” private file manager, tabbed browser and downloader for Android. You can get all these functions independently in other apps, but Stash-It! is the only one which brings them together in one place. It’s the ideal place to keep content you want safe from prying eyes: financial and banking records, health research, client documents. I suspect a few will even use it for a porn stash, but that’s not its only use! 🙂

There are built in viewers for most common image and movie formats, plus PDF and web files, so you don’t have to move these outside the application to view them. However when you do need to use an external application Stash-It! has a full suite of import and export functions to move your files or open them with other applications.

It took a while to design the security model. Stash-It! encrypts the names of files so that they can’t be read, and won’t be visible to the tablet’s gallery and similar applications, but the content of your files is untouched, so there’s little risk of losing data. Hopefully this strikes a sensible balance between privacy and risk.

Even if you’re not too worried about privacy Stash-It! is a great place to collect material related to a particular project, with all your different file types and web research in one place. You can bookmark web links, but also positions in video files or PDF documents. Web pages can be saved intact for reference or offline reading. Again you can do a lot of these things in separate apps, but I believe Stash-It! is the first one to bring all these functions together where you might want them.

I’ve got a lot of ideas in the pipeline to improve it further, but it’s now time to test the market and see whether I’ve spotted a gap which needed plugging or not.

Take a look and let me know what you think!

 

Here’s the Google Play Page. You can also read the helpfile.

Posted in Agile & Architecture, Android, Apps, Code & Development, My Publications, Thoughts on the World | Leave a comment

Developing for Android

Regular readers will realise that I’ve been rather quiet recently. The reason is that over the last couple of weeks I’ve bitten the bullet and started seriously developing an “app” for Android. As always when I have a programming project in progress other uses of my “project” time tend to take very much a back seat, so apologies if you’ve been watching for photos or words of wisdom… 🙂

I don’t want to say too much about the application itself until I have something ready to put on the market place. Suffice to say I think I’ve spotted an odd gap in the market where the weaknesses of iOS force a number of good solutions to one problem of information management, whereas Android’s more flexible architecture ironically means the problem goes unsolved. Watch this space.

I was initially a bit worried that the learning curve for Android development might be very steep, especially when I started working through the standard Java-based examples in the official Google development toolkit. Like all Java development that approach seems to require a vast amount of “scaffolding” code, which must be constructed with very little environmental help, to achieve very simple results. This didn’t look good.

Then, thankfully, I discovered Basic4android. This is a remarkable toolkit developed by a small group in Israel which allows the development of Android software using a powerful but very accessible language and IDE based on Visual Basic. Behind the scenes, this is compiled into standard Android Java code, so ongoing delivery of applications is standard, but the coding and design process is close to “pure VB”.

The development environment has all the features you could reasonably ask for, including code completion, syntax highlighting, background compilation and the like. Remote debugging extends to devices connected over the Internet as well as via cable or local networks, and has a cunning feature where you can “hot swap” the code behind a running application allowing a range of changes to a running test application without restarting it. These are very impressive abilities for a product from a relatively small company.

Just as with the original VB, Basic4android has a model which allows developers to supplement the platform capabilities with shareable components, libraries and code snippets, and a very active community has rapidly built a library of “donationware” which provides easy access to the majority of Android features. I’ve had to be a bit ingenious in a few cases, but even as a newbie on my first project I haven’t yet found a requirement which can’t be met with a few lines of code.

On a slightly more negative note, Basic4android doesn’t seem to provide a good solution to the problem of supporting multiple screen sizes and orientations, except by writing multiple hard-coded scripts for the various options. This problem has been solved for websites with the concept of the “responsive grid”, and it ought to be possible to arrange the UI of an Android app with similar logic (e.g. “arrange these two controls side by side with the label taking 75% of the width, unless the screen is narrower than X, in which case arrange them vertically”). If this can be done in Basic4android I haven’t yet worked out how.
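The kind of rule I have in mind could be expressed like this. A hypothetical Python sketch of the layout decision, not Basic4android code; the breakpoint stands in for the “X” from my example and the values are purely illustrative:

```python
def layout(screen_width, breakpoint=400, label_share=0.75):
    """Arrange a label/control pair side by side, with the label taking
    75% of the width, unless the screen is narrower than the breakpoint,
    in which case stack them vertically at full width."""
    if screen_width < breakpoint:
        return {"arrangement": "vertical",
                "label_width": screen_width, "control_width": screen_width}
    label_w = int(screen_width * label_share)
    return {"arrangement": "horizontal",
            "label_width": label_w, "control_width": screen_width - label_w}
```

A responsive-grid library would generalise this to rows of weighted cells with per-breakpoint overrides, which is exactly what I’d like to see in an Android toolkit.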

Debugging on a physical device connected directly to the PC is very straightforward, but of course limited to the devices you own, and a bit clumsy if you fancy doing a spot of work when travelling. While the Android development kit includes an emulator for the PC, it runs so slowly as to be completely unusable, even on a high-spec machine like my AlienWare M17x. I may have discovered a better compromise, in Android-x86, a port of Android which runs happily in a VMWare virtual machine. Installation was easy, but there are a few foibles I haven’t yet conquered. Again, watch this space.

Overall my adventure into Android development is shaping up well. More news later.

Posted in Android, Code & Development, Galaxy Note, VMWare | Leave a comment

The Micro Four Thirds Lens Correction Project

Although most Micro Four Thirds (MFT) lenses are tiny, the cameras produce great JPG files with apparently little or no geometric distortion. They do this by applying corrections in camera, and the correction parameter data is also stored with the RAW file. Unfortunately this data is only useful if you can read it, and most RAW processors can’t.

Although there’s no obvious reason why not, Panasonic and Olympus have not published the specification for this data. That leaves those of us who want to use a RAW processor other than Lightroom or SilkyPix struggling to get decent results with our MFT images.

Building on some excellent work done by “Matze” (thinkfat.blogspot.co.uk/2009/02/dissecting-panasonic-rw2-files.html) and Raphael Rigo (syscall.eu/#pana), I decided to have a go at implementing a parser in my CAQuest plug-in for Bibble/AfterShotPro. However, although getting the raw data is fairly straightforward, I have discovered that the algorithm is more complex than we thought, and seems to vary from lens to lens.

I have therefore decided to open up the exercise to a “crowd-sourcing” model to try and get several eyes on the problem. As we uncover algorithms which work well for one lens or another I’ll publish them here, and also build them into CAQuest. Over time we may come to understand the complete MFT algorithm, and our work will then be done. Of course, if one of the MFT partners wants to help by publishing the algorithm, that would also be perfectly acceptable :).

The project pages are here: www.andrewj.com/mft/mftproject.asp, with a discussion hosted at the Corel AfterShotPro forum.

Read the full article
Posted in Code & Development, Micro Four Thirds, Photography | Leave a comment

Macs Are Really Easy? Ha!

There is a myth. The myth goes “Windows is complicated. Macs are really easy – they just work.”

Like most myths this may have started from an original truth, but is now a lie. I am its latest, but I suspect far from its only, victim.

Let me explain. For over a year now I have been developing a plugin for the RAW developer Bibble and its recent successor, Corel AfterShot. These plugins are developed using C++ and the Nokia Qt framework, which theoretically allows the same code and user interface design to compile and run on Windows, Linux and Mac.

As a dyed-in-the-wool Windows developer, that’s where I started. There’s a Qt add-in to Visual Studio, so with a bit of juggling I managed to get one of the examples to load into VS, build, and run using Bibble as the target executable, and I was off. I was on a fairly steep learning curve in respect of the programming model, but I had very few problems compiling and running things.

When it got to the stage that I had something to share with the Bibble community I published the Windows version, and another member of the community, Jonathan, kindly cross-compiled for the other platforms. There was another learning curve to make sure my code compiled cleanly on the other platforms, but nothing too drastic. For over a year I sent code updates to Jonathan, and got compiled Linux and Mac libraries back.

Although Jonathan still provides a very helpful service, it became apparent that if I wanted to have full control over the application versions I support, and be able to verify my plugin’s portability, I needed the ability to compile and run each version myself. I wasn’t prepared to buy and carry extra hardware around, but maybe VM technology would work.

I started with Linux. I had a couple of false starts but quickly found a site which has pre-built VMs for most Linux distributions (http://www.trendsigma.net/vmware/), and homed in on Lubuntu – based on Ubuntu but with a quite Windows-like shell. I downloaded and installed AfterShot and QT Creator, loaded up a copy of my code, and clicked “build”. And it worked first time! Getting a completely slick solution took a bit more effort, but it works so well I don’t now even copy the Windows code, I just open the same directory from my Linux VM and run the Linux builds in place.

So far so good. Now for the Mac. What could go wrong?

Continue reading

Posted in Code & Development, Thoughts on the World, VMWare | Leave a comment

Mac OSX–A Third-Class OS?

A recent post on The Online Photographer (More Planned Obsolescence: Evil Lion) really chimed with me. Apple’s implacable opposition to virtualisation is a significant opportunity lost.

I’m a Windows user, spending much of the working week away from home. I get a vast amount of value from virtualisation. It allows me to carry just one PC with multiple “client specific” images, and enables me to keep running legacy software almost indefinitely. My main client uses the same technology to provide legacy support for essential software, which in long-cycle engineering businesses can easily be 20-30 years old, as physical assets in such businesses age many times more slowly than the computing equipment around them.

I also develop plugins for the Bibble RAW processor. The same code should work on Windows, Mac and Linux, but you have to compile and test on each platform to confirm this. I’ve recently added a Linux Virtual Machine to my kit. This was remarkably painless, just a few hours work, and I can now rapidly cross-compile and test my Windows-based developments under Linux. If there’s an issue which means having to support more than one flavour or version of Linux adding it would be trivial.

I just can’t do this for the Mac. I don’t want to buy and carry another laptop (which would be useless for any other purpose), and you can’t get virtualised OSX, either as a VM or as a service, through any legal and “safe” route. The result: as far as I am concerned OSX is a “third-class” OS, almost a “technical ghetto”, and I have to rely on the good offices of other developers to deliver my plugins for it.

People will put up with a lot in the name of love. Maybe Mac users “love” their computers enough to tolerate this behaviour. But looking in from outside I find Apple’s attitude perplexing and very annoying.

See http://theonlinephotographer.typepad.com/the_online_photographer/2012/02/more-planned-obsolescence-evil-lion.html
Posted in Code & Development, PCs/Laptops, Thoughts on the World, VMWare | 1 Comment

First Bibble Plugin Published

I’ve just published my first plugin for the popular image processing suite, Bibble. CAQuest manages chromatic aberration correction, so if you find yourself always having to apply correction for “purple fringes”, this is the tool you need.

To find out more, visit www.andrewj.com/plugins.

Read the full article
Posted in Code & Development, My Publications, Photography | 2 Comments

Integrating External Content with WordPress

I’ve been developing andrewj.com for about 15 years, and although I’m not that prolific I’ve built up quite a lot of content.

I recently converted my blog from an old bespoke (= “custom”, for my American friends) solution to one based on WordPress. However, this created a problem, in that the WordPress model is to hold all content in the database, and that wasn’t the right model for me.

Firstly, I have a number of articles which are very long for a blog post, and I had no interest in restructuring them. I also didn’t want to break external links to the existing articles.

Next, I decided that I wanted the freedom to continue to write in that style. Some of my writing takes several weeks, and it works for me to draft it as separate HTML pages. I also sometimes want to include active content or multiple images, and I don’t want to create a large and unwieldy WordPress database full of such stuff.

Finally, my online photo galleries are managed and generated using Jalbum, and I wanted to find a way of neatly integrating single images into my blog, complete with the watermarks and metadata extraction which Jalbum manages so well, without duplicating that functionality in WordPress.

This is probably typical of many older web sites, but WordPress doesn’t really embrace the integration of external content. This article describes how I solved this problem, and a WordPress plugin I have developed to make my solution reusable.

Read the full article
Posted in Code & Development, My Publications, Website & Blog | Leave a comment

In Damnation of PHP

<rant>Apologies if the title is a bit strong, but I think it’s the nearest I can get to the opposite of “In Praise of PHP”.

I’ve just spent a week-end migrating my website to a new hosting server. As part of that process, I had to rewrite all my old ASP code using PHP. Here’s what I learned:

  1. The Apache/Linux community have misleadingly changed the meaning of “ASP”. If you bought a Linux-based hosting service 5+ years ago with “ASP”, it meant a *nix port of Active Server Pages. That worked for me, as I could develop it on Windows. Now, if you buy a Linux hosting service with “ASP” it means “Apache Server Pages”, and the embedded language is Perl. Useless!
  2. PHP has positively the worst combination of features for a language:
    • A C-based language’s sensitivity to case, ending semicolons and curly-bracket counts,
    • None of the protections against errors in the latter that a C++/Java (or VB) language gives you, like strong typing and forced variable declaration,
    • No single-step debugging. Now I accept that this may not be 100% true, so don’t all write in with the names of all the debuggers I didn’t find in a quick search for tools on Sunday morning, but certainly I don’t have one at the moment,
    • It runs differently on Windows and Linux, and in a way I haven’t yet understood 100%, so I can only test by uploading to my live website.

That said, I’ve still got it! I’ve managed to convert my blog and my book reviews, and I’ve actually improved on my old code for the latter. Just please let me have VB.NET back for my next major project.

OK. </rant>

Posted in Code & Development, Thoughts on the World | Leave a comment

Using Volume Shadowing with Ntbackup Under Vista

The brain-dead backup function of Windows Vista is enormously annoying. There are known ways to get good old ntbackup working, but they have their limitations. Read this article about my attempts to get round some of those limitations.

Read the full article
Posted in Code & Development, Thoughts on the World | Leave a comment