A View Inside My Head

Jason's Random Thoughts of Interest

Inside Bitcoin: What is a Bitcoin?

Bitcoin is best described as an open source digital currency and decentralized/peer-to-peer payment system. Okay, but what does that mean? 

Let’s break it down:

  • Open Source: Every aspect of how Bitcoin works is public knowledge. The primary client software, called the Bitcoin-QT wallet, is an open-source project hosted on GitHub, so any developer can review the source to look for bugs or code that introduces malicious behavior.
  • Digital Currency: The core purpose of Bitcoin is to serve as a currency, or form of money. Currency, in general terms, refers to a medium of exchange. That is, I give you something of value in exchange for something else of value. Both parties must recognize the value of each item being exchanged to be equal for the system to be valid.

    Bitcoin fulfills one half of that exchange, being a medium that holds value. It is digital currency because it exists only as numbers in cyberspace, protected by cryptographic information that only the owners of each Bitcoin are privy to (because of this, it is also commonly referred to as a cryptocurrency).  Ownership of Bitcoin can be transferred from one person to another.
  • Decentralized/Peer-to-Peer Payment System: Bitcoin is designed so that no one person or agency is in control of the system. It’s truly the people’s currency, where users that give Bitcoin its value also maintain the underlying system that allows value to be transferred between owners.

Value in Bitcoin is assigned by address. A person can own any number of addresses – they are very cheap to create at random, and the probability of your computer creating an address that collides with one owned by another person is practically 0%.

To spend a Bitcoin that you own, you transfer it to the recipient’s address. This is known as a transaction. A transaction has two parts: IN and OUT.

While it’s convenient to think of a Bitcoin address as being like a bank account, where all of the value is accumulated and stored, the reality is that the value is stored in transactions. You can only spend value that has been sent to you as the OUT part of a prior transaction, and you spend it by referencing that OUT as the IN of your new transaction. So, the balance of any Bitcoin address is the sum of all of the unspent transaction OUTs destined for that address (referred to as unspent transaction outputs, or UTXOs).
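The balance calculation can be sketched in a few lines of JavaScript. This is only an illustration: the record shape (`id`, `ins`, `outs`), the addresses, and the function names are made up for this example and are not the real Bitcoin transaction format. It also assumes the transactions are listed in the order they were recorded.

```javascript
// Hypothetical, simplified transaction records (not the real Bitcoin format).
// Each OUT assigns a value to an address; each IN references a prior OUT
// by transaction id and output index.
const transactions = [
  { id: "tx1", ins: [], outs: [{ address: "alice", value: 50 }] },
  {
    id: "tx2",
    ins: [{ txId: "tx1", index: 0 }],
    outs: [
      { address: "bob", value: 30 },
      { address: "alice", value: 20 }, // remainder sent back to the spender
    ],
  },
];

// Build the set of unspent OUTs: remove every OUT that is referenced as an IN,
// then add the OUTs of the current transaction.
function unspentOutputs(txs) {
  const unspent = new Map();
  for (const tx of txs) {
    for (const input of tx.ins) {
      unspent.delete(`${input.txId}:${input.index}`);
    }
    tx.outs.forEach((out, index) => {
      unspent.set(`${tx.id}:${index}`, out);
    });
  }
  return unspent;
}

// The balance of an address is the sum of the unspent OUTs destined for it.
function balanceOf(address, txs) {
  let total = 0;
  for (const out of unspentOutputs(txs).values()) {
    if (out.address === address) total += out.value;
  }
  return total;
}

console.log(balanceOf("alice", transactions)); // 20
console.log(balanceOf("bob", transactions));   // 30
```

Notice that the 50 originally sent to "alice" no longer counts toward her balance, because that OUT was consumed as the IN of the second transaction.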

When you create a new transaction, using your Bitcoin-QT wallet or some other piece of client software, the transaction is broadcast to the entire network of nodes that make up the Bitcoin network. Eventually (usually within 10 minutes), that transaction is recorded into the permanent public ledger, and then everyone in the world knows that the Bitcoin that you spent has been transferred from your address to the recipient’s address (and they are then free to spend it themselves).

One important thing to note: Bitcoin is built upon the concept of distrust.

“What’s that!?” you say?

Since Bitcoin exists only on the Internet, and there is no one centralized authority to trust, then you must question and validate each piece of data that the system provides to ensure that it is real and not generated by a clever thief. This is why all of the details about how to verify data are public knowledge – as long as the majority of the users participating follow the same rules, then it becomes impossible for somebody to game the system by introducing data that does not conform to those rules.

As a result, every node that participates in the Bitcoin network must download and verify every transaction since the beginning of time (which was in 2009 for Bitcoin itself) to ensure that they are valid and have not been tampered with. For each transaction downloaded, the node must ensure that the IN part had not been previously spent within another transaction (known as a double spend). As more and more nodes confirm the validity of transactions, the confidence in the transaction increases and the risk of a double spend occurring decreases.
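As a rough sketch of the double-spend check, assume each transaction simply lists the prior OUTs it spends as `"txId:index"` strings. That record shape and the `findDoubleSpends` name are hypothetical, and a real node verifies far more than this (signatures, amounts, and so on).

```javascript
// Walk the history in order and reject any transaction whose IN references
// an OUT that an earlier accepted transaction already spent.
function findDoubleSpends(txs) {
  const spentBy = new Map(); // OUT reference -> id of the transaction that spent it
  const rejected = [];
  for (const tx of txs) {
    const conflict = tx.ins.find((ref) => spentBy.has(ref));
    if (conflict !== undefined) {
      rejected.push({ id: tx.id, conflict, firstSpentBy: spentBy.get(conflict) });
      continue; // a rejected transaction does not spend anything
    }
    for (const ref of tx.ins) spentBy.set(ref, tx.id);
  }
  return rejected;
}

const history = [
  { id: "tx1", ins: [] },
  { id: "tx2", ins: ["tx1:0"] },
  { id: "tx3", ins: ["tx1:0"] }, // tries to spend the same OUT again
];

console.log(findDoubleSpends(history));
// [ { id: 'tx3', conflict: 'tx1:0', firstSpentBy: 'tx2' } ]
```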

Note: Some things above have been simplified for the purpose of this overview. The goal of this blog series is to dive deeper into how Bitcoin works, which will clarify/expand upon some of the vague or technically incomplete parts listed here.

1KdGkhNSBPBp64HrqrNUWPYHoEjm3CXcfd

Inside Bitcoin: SHA256

SHA256 is a cryptographic hash function. Its purpose is to take in an arbitrary length of bytes (the “message”), perform number crunching, and then produce 32 bytes that uniquely represent the original message (the “hash value”, or “digest”).  A small change to the message, even toggling a single bit somewhere, results in a vastly different digest due to a property of cryptographic hash functions known as the “Avalanche Effect”. 
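To see the Avalanche Effect for yourself, here is a small sketch that uses Node's built-in `crypto` module. The two sample messages and the helper names (`sha256`, `differingBits`) are just choices made for this example. The letters "o" (0x6f) and "n" (0x6e) differ by a single bit, so the two messages differ by exactly one bit.

```javascript
const crypto = require("crypto");

// Returns the 32-byte SHA256 digest of a string as a Buffer.
function sha256(message) {
  return crypto.createHash("sha256").update(message, "utf8").digest();
}

// Counts how many bit positions differ between two equal-length buffers.
function differingBits(a, b) {
  let count = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) {
      count += x & 1;
      x >>= 1;
    }
  }
  return count;
}

const first = sha256("hello");
const second = sha256("helln"); // one input bit toggled

console.log(first.toString("hex"));
console.log(second.toString("hex"));
console.log(`${differingBits(first, second)} of ${first.length * 8} digest bits differ`);
```

For a well-behaved hash function you would expect roughly half of the 256 output bits to change, even though only one input bit did.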

While it is easy to calculate the hash code for any given message, it is considered to be infeasible for somebody to generate a message that results in a particular hash code. This makes SHA256 a one-way function. Likewise, it is considered infeasible at the moment to find two different messages that have the same hash code (known as a “collision”). 

Note: This does not mean that collisions are impossible, though, because after all, the output is only 32 bytes for any length of input.  So, hashing enough different messages must eventually produce a repeated hash code.  Being infeasible in this case simply means that nobody knows a practical way to find a collision on purpose, and stumbling across one at random does not help in creating collisions for other messages.

Because of these properties, hash functions like SHA256 are often used to verify that a message hasn’t been tampered with, or hasn’t become corrupted during transmission.  It is also convenient to store the hashes of passwords in a database instead of the passwords themselves, so that in the event of a security breach, the secrets are not revealed.

Calculating a SHA256 hash code consists of multiple rounds of simple operations on 32-bit words: and, or, xor, not, shifts, rotates, and addition modulo 2^32. Because there are no conditional branching operations in the algorithm, SHA256 can be implemented entirely in hardware.  This is the basis behind FPGA and ASIC mining devices (a topic for a later blog post).

Try it yourself! Type a message to view the resulting SHA256 hash, and see how small changes to the input greatly impact the result.  Also, try to find a message that results in one or more leading zeros.
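If you would rather experiment from the command line, the following sketch does the same thing with Node's built-in `crypto` module. The `findLeadingZeros` helper is hypothetical: it appends an increasing counter to the message until the hex digest starts with the requested number of zeros. That is one simple way to search, not the only one.

```javascript
const crypto = require("crypto");

// Returns the SHA256 digest of a string as 64 hex characters.
function sha256Hex(message) {
  return crypto.createHash("sha256").update(message, "utf8").digest("hex");
}

// Tries message + "0", message + "1", ... until the digest starts with the
// requested number of hex zeros, or gives up after maxAttempts tries.
function findLeadingZeros(message, zeros, maxAttempts = 1000000) {
  const prefix = "0".repeat(zeros);
  for (let counter = 0; counter < maxAttempts; counter++) {
    const candidate = `${message}${counter}`;
    const digest = sha256Hex(candidate);
    if (digest.startsWith(prefix)) {
      return { candidate, digest, attempts: counter + 1 };
    }
  }
  return null; // nothing found within the attempt limit
}

console.log(sha256Hex("hello"));
console.log(findLeadingZeros("hello", 2));
```

Each additional hex zero makes the search about 16 times longer on average, because there is no shortcut other than trying more messages.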

Namespacing in JavaScript

Consider this code:

var Configuration = function (model) { 
   this.name = ko.observable(model.Name); 
   this.description = ko.observable(model.Description); 
   this.enabled = ko.observable(model.Enabled || false); 
}; 

Because this was defined outside of the scope of a function, that “Configuration” variable gets promoted to the global object.  In web browsers, that means that it gets appended to the window object as a property (window.Configuration = function(model)…). 

That’s just the nature of global objects in JavaScript.  But, instead of creating a bunch of new custom properties on window, it’s common to wrap all of your app’s global objects in a namespace so that window only gets polluted by one new property (especially when creating a reusable library of objects).

var MyNS = MyNS || {};         

///        
/// Configuration        
///        
MyNS.Configuration = function (model) {            
   this.name = ko.observable(model.Name);            
   this.description = ko.observable(model.Description);            
   this.enabled = ko.observable(model.Enabled || false);        
};

Now, in reality, there’s no such thing as a namespace in JavaScript.  What this is actually doing is creating a new global object called “MyNS” if it doesn’t already exist, and then adding the new objects as properties of MyNS.

If you need a sub-namespace, just create another empty object in MyNS:

var MyNS = MyNS || {};        
MyNS.Entities = MyNS.Entities || {};

Now the namespace and its members can be extracted into their own .js file(s) and included when needed.

But, thinking about a shared usage, someone may want to instantiate a Configuration object without having a model to pass in.  In that case, the code above will blow up because there are no properties on undefined (i.e., undefined.Name would result in an error).  This can be fixed by initializing model itself to an empty object if it comes in undefined:

MyNS.Configuration = function (model) {            
   model = model || {};            
   this.name = ko.observable(model.Name);            
   this.description = ko.observable(model.Description);            
   this.enabled = ko.observable(model.Enabled || false);        
};

But, wait – there’s no .Name property of an empty object.  How does this work if model = {} ?

All objects in JavaScript are actually associative arrays (think dictionary, property bag, etc).  So, model.Name is exactly the same as model["Name"].  If that named element doesn’t exist in the collection, then the value will be returned as undefined, which is fine for initializing a ko.observable.
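Here is a quick way to confirm that behavior outside of a Knockout page. The `ko` object below is only a stand-in so the sketch runs on its own; it assumes nothing more about `ko.observable` than that it returns a function which gives back the stored value when called with no arguments.

```javascript
// Stand-in for Knockout (an assumption for this sketch; the real
// ko.observable also supports writes, subscriptions, and so on).
const ko = { observable: (value) => () => value };

var MyNS = MyNS || {};

MyNS.Configuration = function (model) {
  model = model || {};
  this.name = ko.observable(model.Name);
  this.description = ko.observable(model.Description);
  this.enabled = ko.observable(model.Enabled || false);
};

const model = {};
console.log(model.Name === model["Name"]); // true
console.log(model.Name);                   // undefined

// No model passed in: the constructor falls back to an empty object.
const empty = new MyNS.Configuration();
console.log(empty.name());    // undefined
console.log(empty.enabled()); // false

// A partial model works the same way.
const named = new MyNS.Configuration({ Name: "Primary", Enabled: true });
console.log(named.name());    // Primary
console.log(named.enabled()); // true
```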

Additional Suggestion

Jason Karns (@jasonkarns) writes:

I like to use a micro-library, extend.js, to make the namespace initialization a bit cleaner. https://github.com/searls/extend.js It's a tiny library that does, admittedly, very little.

When breaking components across multiple files, ensuring that the namespace (and sub-namespace) is already created becomes annoying. You either end up declaring a very specific file load order, or copying boilerplate namespace code at the top of every file.

extend.js lets you define the namespace you want as a string (to any depth) and it will create it if it doesn't exist, or merge if it does exist. Small utility, but much cleaner than boilerplate code.
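To illustrate the general idea (this is not the extend.js API, just a hypothetical `namespace` helper written for this post), a string-based namespace initializer could be sketched like this. The `globals` object stands in for `window` so that the example runs anywhere.

```javascript
// Hypothetical helper: walks a dotted path, creating each level only if it
// does not already exist, and returns the innermost object.
function namespace(root, path) {
  return path.split(".").reduce((parent, name) => {
    parent[name] = parent[name] || {};
    return parent[name];
  }, root);
}

const globals = {}; // stand-in for window in this sketch

// First file: creates MyNS and MyNS.Entities, then adds a member.
const entities = namespace(globals, "MyNS.Entities");
entities.Configuration = function () {};

// Second file, loaded in any order: reuses the existing objects.
const again = namespace(globals, "MyNS.Entities");
again.Profile = function () {};

console.log(again === entities);                   // true
console.log(Object.keys(globals.MyNS.Entities));   // [ 'Configuration', 'Profile' ]
```

Because each level is only created when missing, the same line can sit at the top of every file without caring about load order.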

Codestock Recap

Last week, I had the privilege of delivering two talks at Codestock in Knoxville, TN.  I was part of the very first Codestock years ago, and try to attend whenever I can (scheduling conflicts with other events made it so that I could only attend every other year).

One of the best parts of this conference is that it draws in speakers and attendees from such a wide geography.  The Great Lakes region was well represented, with the usual cast of characters from Michigan, Ohio, Kentucky, and Tennessee.  But, there were also a lot of presenters from New England, East Coast, and the Southern states as well!  I often run into these folks individually at various shows, but there's something about Codestock that brings everyone together in one venue.

The trip down started for me as many conference roadtrips do: picking up Mike Eaton, and then spending 8 hours in a car with him.  Along the way, we met another speaker from our area, Tim Wingfield, and provided him with transportation from Florence, KY in exchange for much needed harassment of Mike who was driving when we were nearly out of fuel and nowhere near a gas station.

I gave my first talk (Custom Graphics for your Web Application: The HTML5 Canvas and Kinetic.js) to a packed room at 8:30AM on Friday.  Despite having 70 minutes, I didn't have the delivery pace tuned well and really could have used about 5 more minutes this time to avoid rushing the last few demos.  The audience was great and asked many questions along the way.  One person shared details of a project that he was working on that involved the drag/drop of shapes into a process flow type of diagram within the browser.  The demos that I presented during this talk are hosted on GitHub: https://github.com/jfollas/CanvasKineticDemo

My second talk (Knave Blackjack: The Story of Writing a Windows Store App for Sale) was another 8:30AM session on Saturday.  Despite the mantra "Pros Play Hurt", I did not partake in the normal conference tradition of drinking whisky until all hours of the evening the night before.  However, the lack of people in the venue that morning seemed to indicate that a lot of attendees did stay out late.  The handful of people that attended this talk, however, were very engaged and simply wanted to learn about my experience of submitting apps to the Windows Store because they were interested in writing apps themselves.

Since we needed to get home at a decent hour on Saturday, Mike, Tim, and I had to start back around lunchtime.  I managed to be in my bed by 10PM on Saturday, which is a really decent time for me at the end of a roadtrip like this.

This year's Codestock was organized by a new team of people (in the past, the effort was largely performed by Mike Neel himself, from what I understand).  There were little snafus leading up to the event, but the conference itself seemed to run very smoothly while I was there.  Kudos to the team, and I know that they'll nail all of the little details next year!

Product Evaluation: Aspose.Words

Introduction

In 2004, I was approached by a client to help write a web application.  A multi-page wizard collected information from the user, and at the end, generated a Word document containing pages of best practices and engineering specifications for installing the company’s products.  Our solution worked, and hundreds (if not thousands) of documents have been generated for their customers.  But, we used Office Automation in ASP.NET in a way that was not supported by Microsoft, at least at that time (since it was executing on the server).

Fast forward to the present day, and I was invited to evaluate Aspose.Words for .NET.  While I personally don’t get many requests for Office Automation-type projects these days, as a consultant, it is good to have a go-to library to use should the need ever arise.  So, I agreed to take a look at the product.

The Unboxing Experience

These days, software typically does not come in a box, so there is no unboxing experience involved to establish that first impression.  But, there is a lot to be said about how easy the installation process is for software, as well as how discoverable the documentation is.  Aspose.Words can be installed by means of a traditional MSI install wizard, or as a NuGet package.  

The MSI option provides a full experience, including:

  • Installs the Aspose.Words assemblies (multiple assemblies to support different versions of the .NET Framework)
  • Installs the demo projects with source code
  • Installs the documentation locally on the developer’s machine
  • Adds the assembly to the Add Reference dialog box in Visual Studio

After the installation, the developer must manually add a reference to the assembly.  Locally-installed documentation is available by means of Windows HTML Help, which is completely available and searchable while disconnected from the Internet.

The NuGet option is intended to be a bit more task-oriented.  Instead of modifying the developer’s machine to support general-purpose development with the Aspose.Words library, it merely copies the .NET 2.0 and .NET 3.5 Client Profile assembly versions to the project’s directory, and then adds the necessary reference to the project.  The package will need to be added to each project that uses the library.

There is no help file installed locally with the NuGet option, so the developer is left with the online documentation located at: http://www.aspose.com/docs/display/wordsnet/Home.

Hello World

One challenge that we had with writing the document generator all those years ago was finding an efficient way to insert formatted text into the document.  Specifically, the resulting document contained sections of text, each with paragraphs and special formatting.  Because of time constraints, the decision was made to not build a document on-the-fly by inserting text into it, but instead start with a document that had every possible section already in it (nicely formatted by a human), and then just delete sections based on the user’s input.

So, in evaluating Aspose.Words, I wanted to see how easy it would be to just insert whole sections of pre-formatted text, written as HTML, into a document.  As it turns out, besides the rich Document object (which is very similar to Microsoft Word’s object model), there’s also a DocumentBuilder object that abstracts away the layers of nodes that make up a document, and lets you focus on the task at hand.

Conveniently, DocumentBuilder has an InsertHtml method that looks like it will do exactly what I want.  But, will it handle all aspects of HTML, like images and hyperlinks?  Let’s find out!

var doc = new Aspose.Words.Document(); 
var builder = new Aspose.Words.DocumentBuilder(doc); 
var html = @"<div> 
             <img src='http://thetabletshow.com/dnr_photos/JasonFollas.png'> 
             <a href='http://jasonfollas.com/'>Testing</a> 
             </div>"; 
			 
builder.InsertHtml(html); 
doc.Save(filename_or_Stream, Aspose.Words.SaveFormat.Docx);

 

The result?  I was expecting the image not to be included (since that’s another fetch that Aspose.Words would have to perform), but it worked flawlessly!

 


Note: I received a license, but had not yet applied it when this demo executed.  This screenshot shows the default behavior of the evaluation mode where text is inserted into the document.

Licensing

When not associated with an active license, Aspose.Words will run in an Evaluation Mode that injects red text into the documents that are produced (see the screenshot of my “Hello World” experiment).  Developers evaluating the product can obtain a 30-day license in order to work with the fully unlocked behavior.

Licenses are distributed as XML files, and applications must set the license before working with the API in order to disable the Evaluation Mode.  Though this may sound like an inconvenience, it’s actually not that bad.  For example, to unlock the product for use by an ASP.NET application, simply place the .lic file into the /bin directory, and add the following to global.asax:

protected void Application_Start(object sender, EventArgs e) { 
   Aspose.Words.License license = new Aspose.Words.License(); 
   license.SetLicense("Aspose.Words.Product.Family.lic"); 
}

File Types

One very impressive aspect of Aspose.Words is the vast number of document formats that are supported for loading and saving documents.  With two lines of code, the library could be used as a format converter to open a Word Document and save it as a PDF:

var doc = new Aspose.Words.Document("Document.doc"); 
doc.Save("Document.pdf", Aspose.Words.SaveFormat.Pdf);

 

Building upon this concept, you could start with a Word Document that was authored by a business person, open it on the web server, insert/modify/delete content within the document, and then send a PDF version to the user.

Load Formats

  • Microsoft Word 97-2003 document
  • Microsoft Word 97-2003 template
  • Office Open XML WordprocessingML Macro-Free Document
  • Office Open XML WordprocessingML Macro-Enabled Document
  • Office Open XML WordprocessingML Macro-Free Template
  • Office Open XML WordprocessingML Macro-Enabled Template
  • Flat OPC document
  • RTF format
  • Microsoft Word 2003 WordprocessingML format
  • HTML format
  • MHTML (Web archive) format
  • OpenDocument Text
  • OpenDocument Text Template
  • MS Word 6 or Word 95 format

Save Formats

  • Doc: Microsoft Word 97 - 2007 Document
  • Dot: Microsoft Word 97 - 2007 Template
  • Docx: Office Open XML WordprocessingML Document (macro-free)
  • Docm: Office Open XML WordprocessingML Macro-Enabled Document
  • Dotx: Office Open XML WordprocessingML Template (macro-free)
  • Dotm: Office Open XML WordprocessingML Macro-Enabled Template
  • FlatOpc: Office Open XML WordprocessingML stored in a flat XML file instead of a ZIP package
  • FlatOpcMacroEnabled: Office Open XML WordprocessingML Macro-Enabled Document stored in a flat XML file instead of a ZIP package
  • FlatOpcTemplate: Office Open XML WordprocessingML Template (macro-free) stored in a flat XML file instead of a ZIP package
  • FlatOpcTemplateMacroEnabled: Office Open XML WordprocessingML Macro-Enabled Template stored in a flat XML file instead of a ZIP package
  • RTF: Rich Text Format
  • WordML: Microsoft Word 2003 WordprocessingML format
  • Pdf: Adobe Portable Document
  • Xps: XML Paper Specification
  • XamlFixed: Extensible Application Markup Language (XAML) format as a fixed document
  • Swf: Adobe Flash Player
  • Svg: Scalable Vector Graphics
  • Html
  • Mhtml: Web archive
  • Epub: IDPF EPUB format
  • Odt: ODF Text Document
  • Ott: ODF Text Document Template
  • Text: Plain text format
  • XamlFlow: Beta. Saves the document in the Extensible Application Markup Language (XAML) format as a flow document
  • XamlFlowPack: Beta. Saves the document in the Extensible Application Markup Language (XAML) package format as a flow document
  • Tiff: Renders a page or pages of the document and saves them into a single or multipage TIFF file
  • Png: Renders a page of the document and saves it as a PNG file
  • Bmp: Renders a page of the document and saves it as a BMP file
  • Emf: Renders a page of the document and saves it as a vector EMF (Enhanced Meta File) file
  • Jpeg: Renders a page of the document and saves it as a JPEG file

A More Complex Example

Since I was being asked to provide an honest evaluation of the product, I wanted to find a way to stress the document generation ability a little bit more.  So, after a bit of brainstorming, I came up with the idea of creating a PDF containing a Reddit post (http://reddit.com) along with its nested comments.  Note: This exercise is mostly academic in nature.

Reddit provides a RESTful API to permit third-party software access to its content.  For .NET languages, there is an open source library called RedditSharp (https://github.com/SirCmpwn/RedditSharp) that abstracts away the details of the networking and transport, and allows the developer to focus on the data.

var reddit = new RedditSharp.Reddit();
var iama = reddit.GetSubreddit("/r/IAmA");
var firstPost = iama.GetPosts()[0];
var comments = firstPost.GetComments();

 

Comments on Reddit can nest deeply at times, so in order to make the document readable, I made two design decisions:  Set the page orientation to Landscape, and use the Left Indent instead of the bullet list so that I could have better control over how much space each indent uses.

var doc = new Aspose.Words.Document();
var builder = new Aspose.Words.DocumentBuilder(doc);
builder.PageSetup.Orientation = Aspose.Words.Orientation.Landscape;

 

The post’s title and author are written at the top of the document.  The Heading 1 style is applied to the title, while a small italic font is used to display the author:

builder.ParagraphFormat.StyleIdentifier = 
	Aspose.Words.StyleIdentifier.Heading1;

builder.Writeln(firstPost.Title);

builder.ParagraphFormat.StyleIdentifier = 
	Aspose.Words.StyleIdentifier.BodyText;

builder.Font.Name = "Arial";
builder.Font.Size = 8;
builder.Font.Italic = true;
builder.Writeln(" - " + firstPost.Author.Name);

 

Next, the comments collection must be crawled.  I chose to use a recursive function to output each level of comments, and call itself if a given comment has comments of its own (i.e., nested comments).  Each level of comments is indented using the ParagraphFormat.LeftIndent property, setting a value in points.

iterate(comments, builder);

...

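// Current nesting depth, kept as a field of the containing class
// (declaration assumed here; it was not shown in the original listing).
private int indent = 0;

private void iterate(List<RedditSharp.Comment> comments, 
			   Aspose.Words.DocumentBuilder builder)
{
    indent++;

    foreach (var c in comments)
    {
        // Reapply this level's indent (12 points per level) for every comment,
        // because a nested call for the previous comment leaves a deeper
        // LeftIndent behind on the builder.
        builder.ParagraphFormat.LeftIndent = indent * 12;
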
        if (c.ContentHtml != null)
        {
            builder.ParagraphFormat.Borders.Top.LineStyle = 
			Aspose.Words.LineStyle.Dot;

            builder.ParagraphFormat.Borders.Top.DistanceFromText = 6;

            var html = Server.HtmlDecode(c.ContentHtml)
                             .Replace("<div class=\"md\">", "")
                             .Replace("</div>", "")
                             .Replace("<p>", "")
                             .Replace("</p>", "<br/><br/>");

            builder.Font.Name = "Times New Roman";
            builder.Font.Size = 10;
            builder.Font.Italic = false;

            builder.InsertHtml(html);

            builder.Font.Name = "Arial";
            builder.Font.Size = 8;
            builder.Font.Italic = true;

            builder.Writeln(" - " + c.Author);

            if (c.Comments.Count > 0)
                iterate(c.Comments, builder);
        }
    }
    indent--;
}
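The recursion can also be written without a shared counter by passing the depth along as a parameter. The sketch below shows that shape in JavaScript, using a made-up comment structure (`author`, `text`, `replies`) rather than RedditSharp's types, and it simply returns the indent in points that each comment would receive instead of writing to a document.

```javascript
// Hypothetical comment shape: { author, text, replies }.
// Returns a flat list in display order, with the left indent for each entry.
function flattenComments(comments, depth = 1, pointsPerLevel = 12, out = []) {
  for (const c of comments) {
    out.push({
      leftIndent: depth * pointsPerLevel,
      text: c.text,
      author: c.author,
    });
    // Nested replies are one level deeper; siblings keep the current depth.
    flattenComments(c.replies || [], depth + 1, pointsPerLevel, out);
  }
  return out;
}

const thread = [
  {
    author: "a",
    text: "top-level comment",
    replies: [{ author: "b", text: "nested reply", replies: [] }],
  },
  { author: "c", text: "second top-level comment", replies: [] },
];

console.log(flattenComments(thread).map((line) => line.leftIndent)); // [ 12, 24, 12 ]
```

Because the depth travels with each call, the second top-level comment returns to a 12 point indent after the nested reply has been handled.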

 

When I first wrote this function, I tried using RedditSharp’s Comment.Content property (instead of .ContentHtml).  I had to replace hard returns (\n) in the text with vertical tabs (\v) to keep new paragraphs from being started in the document (otherwise, the indent was lost).  Even so, the result looked strange to me, since comments in Reddit are entered in plain text using Markdown (http://en.wikipedia.org/wiki/Markdown), but they are intended to be rendered as rich text.  

The rendered HTML version of the comments is wrapped in a <div> tag, and uses <p> elements for each paragraph.  However, similar to the hard return issue with Markdown, the indenting was lost when the DocumentBuilder.InsertHtml() function encountered the <div> and <p> elements, because these were handled as new paragraphs by Aspose.Words.  So, the easy solution seemed to be to clean the HTML before inserting it into the document (by removing the <div> and <p> tags, and inserting two <br> elements in place of the closing </p> tag, etc.).
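To make the effect of that cleanup concrete, here is the same chain of replacements sketched in JavaScript with a sample input. The `cleanCommentHtml` name and the sample markup are made up for illustration. Like the C# version, it is plain string replacement and assumes the markup looks exactly as described (a `<div class="md">` wrapper and attribute-free `<p>` tags).

```javascript
// Strips the wrapping <div> and the <p> tags, and turns each closing </p>
// into two line breaks so paragraphs stay visually separated.
function cleanCommentHtml(html) {
  return html
    .replace(/<div class="md">/g, "")
    .replace(/<\/div>/g, "")
    .replace(/<p>/g, "")
    .replace(/<\/p>/g, "<br/><br/>");
}

const rendered = '<div class="md"><p>First paragraph.</p><p>Second <em>paragraph</em>.</p></div>';

console.log(cleanCommentHtml(rendered));
// First paragraph.<br/><br/>Second <em>paragraph</em>.<br/><br/>
```

Inline formatting such as `<em>` passes through untouched, which is the point of using the HTML version of the comment in the first place.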

Finally, after all of the comments are rendered, it is time to save the results.  I wrote my program as part of an ASP.NET web application, so saving the document is really just streaming it to the user’s browser.  The Document object’s Save() method has an overload that accepts an HttpResponse object as the first parameter, and makes the task of saving/streaming to the user very straightforward:

doc.Save(Response, "iama.pdf", Aspose.Words.ContentDisposition.Inline, null);


Note: This method overload is not available in the 3.5 Client Profile assembly, but if you need to use the Client Profile, then chances are that you will just be saving the results to an actual file anyways.

Closing Thoughts

Overall, I was impressed by the power and ease provided by Aspose.Words.  While it didn’t always do everything in the way that I thought it should, it is probably due more to my lack of understanding of how the Word document model works rather than a flaw in this library.

The DocumentBuilder object is the real powerhouse of the library, and the InsertHtml function makes it a breeze for web developers to add entire chunks of content to Word Documents.  Though, while this technique may get someone like me to the 80% mark with very little effort, it still remains to be seen how much more effort would be required in the form of code tweaking in order to produce a 100% pixel-perfect document of content.

Disclosure of Material Connection: I received one or more of the products or services mentioned above for free in the hope that I would mention it on my blog. Regardless, I only recommend products or services I use personally and believe my readers will enjoy.  I am disclosing this in accordance with the Federal Trade Commission’s 16 CFR, Part 255:  “Guides Concerning the Use of Endorsements and Testimonials in Advertising.”