And different people have written their own version in different languages, see at the end of the Norvig's page, including 3 different versions written in C# by Lorenzo Stoakes.
I thought it would interresting to see what kind of result we could get using C# and Dynamic Sugar.net.
My C# version is 69 lines of code including white line, the Norvig's version is 30 lines. A ratio of 2.3 which is actually pretty good, it tells me that using DynamicSugar.net allowed me to reduce the number of line.
Counting line is definitely not a very relevant indicator except that it give you a ball park.
So here it is
First the python code
import re, collections def words(text): return re.findall('[a-z]+', text.lower()) def train(features): model = collections.defaultdict(lambda: 1) for f in features: model[f] += 1 return model NWORDS = train(words(file('big.txt').read())) alphabet = 'abcdefghijklmnopqrstuvwxyz' def edits1(word): splits = [(word[:i], word[i:]) for i in range(len(word) + 1)] deletes = [a + b[1:] for a, b in splits if b] transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1] replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b] inserts = [a + c + b for a, b in splits for c in alphabet] return set(deletes + transposes + replaces + inserts) def known_edits2(word): return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS) def known(words): return set(w for w in words if w in NWORDS) def correct(word): candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word] return max(candidates, key=NWORDS.get)
My C# code
class SpellCorrector { Dictionary<string, int> DWORDS; const string ALPHABET = "abcdefghijklmnopqrstuvwxyz"; public SpellCorrector() { DWORDS = this.Train(this.Words(System.IO.File.ReadAllText("big.txt"))); } List<string> Words(string text) { return Regex.Matches(text.ToLower(), "[a-z]+").ToList(); } Dictionary<string, int> Train(List<string> features) { var model = new Dictionary<string, int>(); foreach (var f in features) model[f] = model.Get(f, 0) + 1; return model; } class StringPair { public string a, b; } List<string> Edits1(string word) { var splits = DS.Range(word.Length+1).Map(i => new StringPair { a = word.Slice(0,i), b = word.Slice(i) } ); var deletes = splits.Map(s => s.a + s.b.Slice(1)); var transposes = splits.Map(s => s.b.Length > 1 ? s.a + s.b[1] + s.b[0] + s.b.Slice(2) : "**"); var replaces = new List<string>(); foreach (var s in splits) foreach (var c in ALPHABET) if(s.b.Length>0) replaces.Add(s.a + c + s.b.Slice(1, -1)); var inserts = new List<string>(); foreach (var s in splits) foreach (var c in ALPHABET) inserts.Add(s.a + c + s.b); return deletes.Add(transposes).Add(replaces).Add(inserts); } List<string> KnownEdits2(string word) { var l = new List<string>(); foreach(var e1 in this.Edits1(word)) foreach(var e2 in Edits1(e1)) if(this.DWORDS.ContainsKey(e2)) l.Add(e2); return l; } List<string> Known(List<string> words){ return words.Filter(w => this.DWORDS.ContainsKey(w)); } public string Correct(string word){ var candidateWords = this.Known(DS.List(word)); if(candidateWords.Count==0) candidateWords = this.Known(this.Edits1(word)); if(candidateWords.Count==0) candidateWords = this.KnownEdits2(word); if(candidateWords.Count==0) candidateWords = DS.List(word); return this.DWORDS.Max(candidateWords); } }
wow.. Great code and very readable. Thank You!
ReplyDeleteEmad, Yes the all point of http://www.DynamicSugar.net is to write shorter and very readable C# code.
ReplyDeleteHi
ReplyDeleteNot able to compile above C# code in .NET 4.5
Is there anything I'm missing here?
Install DynamicSugar via nuget
ReplyDelete(1) Menu Tools -> Library Package Manager -> Library Package Console
(2) Install-package DynamicSugar
(3) Paste the code in the file program.cs above the program class
(4) Add this 2 using
using System.Threading.Tasks;
using DynamicSugar;
it will compile.
To run it you need the file big.txt, from http://norvig.com/big.txt.
Hello,
ReplyDeleteFirst, thanks a lot for sharing your work, It's helping me a lot.
I'm trying to correct the word ab-1234, while the right word would be ab-12345.
I tried ALPHABET = "1234567890abcdefghijklmnopqrstuvwxyz-" and o f course, I put the word ab-12345 in the big.txt file, but it's not working.
Coul you give me a help? I'm a newbie.
Hello! Do you have any licensing restrictions I should know about if I want to include a derivative of this code in my academic project?
ReplyDeleteSpell Checker control can list the correct spelling
ReplyDeleteThis is a smart blog. I mean it. You have so much knowledge about this, and so much passion. You also know how to make people rally behind it, obviously from the responses.Well, I recommend a blog is my sentence correct for all students who wants to get best sentence corrector.
ReplyDeleteThank you a Lot!
ReplyDelete