Elsewhere

Jim Birch: Drupal 7: Importing Tweets into Drupal using the Twitter Module

Planet Drupal - Tue, 21/04/2015 - 19:35

Why would you want to import tweets into a Drupal site?  For one, I want to own the content I create.  Unlike other social media sites, Twitter allows great access to the content I create on their platform.  Through their API, I can access all of my Tweets and Mentions for archiving and displaying on my own site.

I have had a couple of instances with clients where the archiving of Tweets came in handy.  One when a Twitter account was hacked, and one when someone said something that wasn't supposed to be said.  At the very least, it is an offsite backup of your content at Twitter, and that is never a bad thing.

I have used this module for building aggregated content.  If you have a site that is centered around certain topics, you can build lists of Twitter accounts or #hashtags.  Imagine if you were running a Drupal Camp: you could build a feed of all of the speakers and sponsors, or a feed of the camp's #hashtag, or both!

You could also build a Twitter feed of only your community.  This module allows each and every Drupal user account to be associated with one or more Twitter accounts.  The users just need to authorize themselves.  The possibilities seem endless.

OK, so on with the good stuff.  Importing Tweets into your Drupal 7 site is very quick and easy using the Drupal Twitter module.

Read more

Categories: Elsewhere

ThinkShout: Drupal and Salesforce Integrations Get Some (Data) Integrity

Planet Drupal - Tue, 21/04/2015 - 19:00

Hot on the heels of our all-hands-on-deck sprint to release RedHen Raiser, we decided to change gears to focus on some of our marquee open source contributions, namely the Salesforce Suite.

The Salesforce Suite has been around since Drupal 5 and it’s evolved quite a bit in order to keep up with the ever-changing Salesforce and Drupal landscapes. Several years ago, we found ourselves relying heavily upon the Salesforce Suite for our Salesforce-Drupal integrations. But there came a point where we realized the module could no longer keep up with our needs. So we, in collaboration with the maintainers of the module at the time, set out to rewrite the suite for Drupal 7.

We completely rewrote the module, leveraging Drupal's entity architecture, Salesforce's REST API, and OAUTH for authentication. We also added much-needed features such as a completely new user experience, the ability to synchronize any Drupal and Salesforce objects, and a number of performance enhancements. This was a heck of an undertaking, and there were dozens of other improvements we made to the suite that you can read about in this blog post. We’ve maintained this module ever since and have endeavored to add new features and enhancements as they become necessary. We realized this winter that it was time for yet another batch of improvements as the complexity and scale of our integrations has grown.

In addition to over 150 performance enhancements and bug fixes, this release features an all new Drupal entity mapping system which shows a log of all synchronization activity, including any errors. You can now see a log entry for every attempted data synchronization. If there’s a problem, the log will tell you where it is and why it’s an issue. There’s now a whole interface designed to help you pinpoint where these issues are so you can solve them quickly.

Administrators can even manually create or edit a connection between Drupal and Salesforce objects. Before this update, the only way to connect two objects was to create the mapping and then wait for an object to be updated or created in either Drupal or Salesforce. Now you can just enter the Salesforce ID and you’re all set.

Take the following example to understand why these improvements are so critical. Say that your constituents are volunteering through your Drupal site using the Registration module. The contacts are created or updated in RedHen and then synced to Salesforce. For some reason, you can see the new volunteers in Drupal, but they are not showing in Salesforce. It used to be that the only clue to a problem was buried in the error log. Now, all you have to do is go to the RedHen contact record, and then click “Salesforce activity,” and you’ll see a record of the attempted sync and an explanation of why it failed. Furthermore, you can manually connect the contact to Salesforce by entering the Salesforce ID.

Finally, you can now delete existing mappings, or map to an entirely different content type. The bottom line is that module users have more control of, and insights into, how their data syncs to Salesforce. You can download version 7.x-3.1 from Drupal.org and experience these improvements for yourself.

We’ve been hard at work polishing several of our other modules and tools, like the RedHen suite and Entity Registration, which also saw new releases. We’ll tell you more about what you can expect from those new versions in our upcoming blogs.

Want to chat about our module work at DrupalCon in LA? You can find us hanging out with our friends from MailChimp at their booth. We’d love to talk to you more about what we’re working on.

Categories: Elsewhere

Drupal Watchdog: Drupal People

Planet Drupal - Tue, 21/04/2015 - 18:39
Column

Drupal people are good people. They are the recipe’s secret ingredient, and conferences are the oven. Mix and bake.

March 2007, Sunnyvale, California, the Yahoo campus and a Sheraton.

OSCMS, my second Drupal event and my first conference.

Dries gave the State of Drupal keynote, with a survey of developers and a vision for future work. His hair was still a bit punk and he was a bit younger. Dries has the best slides. Where does he find those amazing slides?

I like Dries a lot.

I wish I had created Drupal.

In 1999, I created my own CMS named Frameworks. I remember showing my friend Norm an "edit" link for changing text and how cool that was. Back then, I didn't even know about Open Source – despite being a fanboy of Richard Stallman and the FSF – and I was still using a mix of C/C++, Perl, and IIS. (If you wanted to eat in the 1990's, Windows was an occupational hazard.)

But I didn't create Drupal. I didn't have the hair, I've never had those amazing slides, and I will never be able to present that well.

But mainly, I didn't have the vision.

Rasmus Lerdorf gave a talk on the history of PHP. I was good with computer languages. I had written a compiler in college, developed my first interpretive language in the late 1980's and another one in the early 1990's. I wondered why I hadn't created PHP. At the time, most web apps were written in Perl. I loved Perl. It was so concise. It was much better than AWK, which in itself was also pretty awesome.

(Note: AWK does not stand for awkward. It’s named after Aho, Weinberger, and Kernighan – of K&R fame).

So I didn't see the need for PHP, we had Perl!

Again, no vision.

Meanwhile: 2007, Sunnyvale, California, OSCMS.

Categories: Elsewhere

Julien Danjou: Gnocchi 1.0: storing metrics and resources at scale

Planet Debian - Tue, 21/04/2015 - 17:00

A few months ago, I wrote a long post about what I called back then the "Gnocchi experiment". Time passed and we – me and the rest of the Gnocchi team – continued to work on that project, finalizing it.

It's with great pleasure that we are going to release our first 1.0 version this month, roughly at the same time that the integrated OpenStack projects release their Kilo milestone. The first release candidate, numbered 1.0.0rc1, was released this morning!

The problem to solve

Before I dive into Gnocchi details, it's important to have a good view of what problems Gnocchi is trying to solve.

Most of the IT infrastructures out there consist of a set of resources. These resources have properties: some of them are simple attributes, whereas others might be measurable quantities (also known as metrics).

Cloud infrastructures are no exception in this context. We talk about instances, volumes, networks… which are all different kinds of resources. The problem that arises with the cloud trend is the scalability of storing all this data and being able to query it later, for whatever usage.

What Gnocchi provides is a REST API that allows the user to manipulate resources (CRUD) and their attributes, while preserving the history of those resources and their attributes.

Gnocchi is fully documented and the documentation is available online. We are the first OpenStack project to require that patches include documentation. We want to raise the bar, so we took a stand on that. That's part of our policy, the same way it's part of the OpenStack policy to require unit tests.

I'm not going to paraphrase the whole Gnocchi documentation, which covers things like installation (super easy), but I'll guide you through some basics of the features provided by the REST API. I will show you some examples so you can have a better understanding of how you could leverage Gnocchi!

Handling metrics

Gnocchi provides a full REST API to manipulate time-series that are called metrics. You can easily create a metric using a simple HTTP request:

POST /v1/metric HTTP/1.1
Content-Type: application/json
 
{
    "archive_policy_name": "low"
}

HTTP/1.1 201 Created
Location: http://localhost/v1/metric/387101dc-e4b1-4602-8f40-e7be9f0ed46a
Content-Type: application/json; charset=UTF-8

{
    "archive_policy": {
        "aggregation_methods": [
            "std",
            "sum",
            "mean",
            "count",
            "max",
            "median",
            "min",
            "95pct"
        ],
        "back_window": 0,
        "definition": [
            {
                "granularity": "0:00:01",
                "points": 3600,
                "timespan": "1:00:00"
            },
            {
                "granularity": "0:30:00",
                "points": 48,
                "timespan": "1 day, 0:00:00"
            }
        ],
        "name": "low"
    },
    "created_by_project_id": "e8afeeb3-4ae6-4888-96f8-2fae69d24c01",
    "created_by_user_id": "c10829c6-48e2-4d14-ac2b-bfba3b17216a",
    "id": "387101dc-e4b1-4602-8f40-e7be9f0ed46a",
    "name": null,
    "resource_id": null
}


The archive_policy_name parameter defines how the measures that are being sent are going to be aggregated. You can also define archive policies using the API and specify what kind of aggregation period and granularity you want. In this case, the low archive policy keeps 1 hour of data aggregated over 1 second and 1 day of data aggregated to 30 minutes. The functions used for aggregation are the standard mathematical functions – standard deviation, minimum, maximum, … and even 95th percentile. All of that is obviously customizable and you can create your own archive policies.
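
As a rough sketch of what defining such a policy through the API could look like, a request might resemble the following. The field names mirror the archive_policy object shown in the response above, but the /v1/archive_policy endpoint path, the "short" policy name and the abridged response are assumptions of this example – check the Gnocchi documentation for the authoritative request format:

POST /v1/archive_policy HTTP/1.1
Content-Type: application/json

{
    "name": "short",
    "back_window": 0,
    "aggregation_methods": ["mean", "max"],
    "definition": [
        {
            "granularity": "0:00:01",
            "timespan": "1:00:00"
        }
    ]
}

HTTP/1.1 201 Created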

If you don't want to specify the archive policy manually for each metric, you can also create archive policy rules that will apply a specific archive policy based on the metric name, e.g. metrics matching disk.* will be high-resolution metrics so they will use the high archive policy.
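
A sketch of what such a rule could look like (the endpoint path, rule name and exact attribute names are assumptions here, to be checked against the documentation):

POST /v1/archive_policy_rule HTTP/1.1
Content-Type: application/json

{
    "name": "disk-rule",
    "metric_pattern": "disk.*",
    "archive_policy_name": "high"
}

HTTP/1.1 201 Created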

It's also worth noting Gnocchi is precise up to the nanosecond and is not tied to the current time. You can manipulate and inject measures that are years old and precise to the nanosecond. You can also inject points with old timestamps (i.e. old compared to the most recent one in the timeseries) with an archive policy allowing it (see back_window parameter).

It's then possible to send measures to this metric:

POST /v1/metric/387101dc-e4b1-4602-8f40-e7be9f0ed46a/measures HTTP/1.1
Content-Type: application/json
 
[
    {
        "timestamp": "2014-10-06T14:33:57",
        "value": 43.1
    },
    {
        "timestamp": "2014-10-06T14:34:12",
        "value": 12
    },
    {
        "timestamp": "2014-10-06T14:34:20",
        "value": 2
    }
]

HTTP/1.1 204 No Content


These measures are synchronously aggregated and stored in the configured storage backend. Our most scalable storage drivers for now are based on either Swift or Ceph, which are both scalable object storage systems.

It's then possible to retrieve these values:

GET /v1/metric/387101dc-e4b1-4602-8f40-e7be9f0ed46a/measures HTTP/1.1
 
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
 
[
    [
        "2014-10-06T14:30:00.000000Z",
        1800.0,
        19.033333333333335
    ],
    [
        "2014-10-06T14:33:57.000000Z",
        1.0,
        43.1
    ],
    [
        "2014-10-06T14:34:12.000000Z",
        1.0,
        12.0
    ],
    [
        "2014-10-06T14:34:20.000000Z",
        1.0,
        2.0
    ]
]


As older Ceilometer users might notice here, metrics only store points (timestamps) and values – nothing fancy such as metadata anymore.

By default, values eagerly aggregated using mean are returned for all supported granularities. You can obviously specify a time range or a different aggregation function using the aggregation, start and stop query parameters.
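
For instance, asking only for the maximum values since 14:34 might look roughly like the sketch below; the response shown here is simply the subset of the 1-second samples above that such a query should return:

GET /v1/metric/387101dc-e4b1-4602-8f40-e7be9f0ed46a/measures?aggregation=max&start=2014-10-06T14:34 HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8

[
    [
        "2014-10-06T14:34:12.000000Z",
        1.0,
        12.0
    ],
    [
        "2014-10-06T14:34:20.000000Z",
        1.0,
        2.0
    ]
]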

Gnocchi also supports doing aggregation across aggregated metrics:

GET /v1/aggregation/metric?metric=65071775-52a8-4d2e-abb3-1377c2fe5c55&metric=9ccdd0d6-f56a-4bba-93dc-154980b6e69a&start=2014-10-06T14:34&aggregation=mean HTTP/1.1
 
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
 
[
    [
        "2014-10-06T14:34:12.000000Z",
        1.0,
        12.25
    ],
    [
        "2014-10-06T14:34:20.000000Z",
        1.0,
        11.6
    ]
]


This computes the mean of means for the metrics 65071775-52a8-4d2e-abb3-1377c2fe5c55 and 9ccdd0d6-f56a-4bba-93dc-154980b6e69a, starting on 6 October 2014 at 14:34 UTC.

Indexing your resources

Another object and concept that Gnocchi provides is the ability to manipulate resources. There is a basic type of resource, called generic, which has very few attributes. You can extend this type to specialize it, and that's what Gnocchi does by default by providing resource types known for OpenStack such as instance, volume, network or even image.

POST /v1/resource/generic HTTP/1.1
 
Content-Type: application/json
 
{
    "id": "75C44741-CC60-4033-804E-2D3098C7D2E9",
    "project_id": "BD3A1E52-1C62-44CB-BF04-660BD88CD74D",
    "user_id": "BD3A1E52-1C62-44CB-BF04-660BD88CD74D"
}

HTTP/1.1 201 Created
Location: http://localhost/v1/resource/generic/75c44741-cc60-4033-804e-2d3098c7d2e9
ETag: "e3acd0681d73d85bfb8d180a7ecac75fce45a0dd"
Last-Modified: Fri, 17 Apr 2015 11:18:48 GMT
Content-Type: application/json; charset=UTF-8

{
    "created_by_project_id": "ec181da1-25dd-4a55-aa18-109b19e7df3a",
    "created_by_user_id": "4543aa2a-6ebf-4edd-9ee0-f81abe6bb742",
    "ended_at": null,
    "id": "75c44741-cc60-4033-804e-2d3098c7d2e9",
    "metrics": {},
    "project_id": "bd3a1e52-1c62-44cb-bf04-660bd88cd74d",
    "revision_end": null,
    "revision_start": "2015-04-17T11:18:48.696288Z",
    "started_at": "2015-04-17T11:18:48.696275Z",
    "type": "generic",
    "user_id": "bd3a1e52-1c62-44cb-bf04-660bd88cd74d"
}


The resource is created with the UUID provided by the user. Gnocchi handles the history of the resource, and that's what the revision_start and revision_end fields are for. They indicate the lifetime of this revision of the resource. The ETag and Last-Modified headers are also unique to this resource revision and can be used in a subsequent request using the If-Match or If-None-Match headers, for example:

GET /v1/resource/generic/75c44741-cc60-4033-804e-2d3098c7d2e9 HTTP/1.1
If-None-Match: "e3acd0681d73d85bfb8d180a7ecac75fce45a0dd"
 
HTTP/1.1 304 Not Modified


This is useful for synchronizing and updating any view of the resources you might have in your application.

You can use the PATCH HTTP method to modify properties of the resource, which will create a new revision of the resource. The history of the resources is, of course, available via the REST API.
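
A minimal sketch of such an update is shown below, here marking the resource as ended; the timestamp is invented for the example, and the 200 response (which carries the full updated resource with a fresh revision_start) is abridged:

PATCH /v1/resource/generic/75c44741-cc60-4033-804e-2d3098c7d2e9 HTTP/1.1
Content-Type: application/json

{
    "ended_at": "2015-04-18T00:00:00"
}

HTTP/1.1 200 OK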

The metrics property of a resource allows you to link metrics to that resource. You can link existing metrics or create new ones dynamically:

POST /v1/resource/generic HTTP/1.1
Content-Type: application/json
 
{
    "id": "AB68DA77-FA82-4E67-ABA9-270C5A98CBCB",
    "metrics": {
        "temperature": {
            "archive_policy_name": "low"
        }
    },
    "project_id": "BD3A1E52-1C62-44CB-BF04-660BD88CD74D",
    "user_id": "BD3A1E52-1C62-44CB-BF04-660BD88CD74D"
}

HTTP/1.1 201 Created
Location: http://localhost/v1/resource/generic/ab68da77-fa82-4e67-aba9-270c5a98cbcb
ETag: "9f64c8890989565514eb50c5517ff01816d12ff6"
Last-Modified: Fri, 17 Apr 2015 14:39:22 GMT
Content-Type: application/json; charset=UTF-8

{
    "created_by_project_id": "cfa2ebb5-bbf9-448f-8b65-2087fbecf6ad",
    "created_by_user_id": "6aadfc0a-da22-4e69-b614-4e1699d9e8eb",
    "ended_at": null,
    "id": "ab68da77-fa82-4e67-aba9-270c5a98cbcb",
    "metrics": {
        "temperature": "ad53cf29-6d23-48c5-87c1-f3bf5e8bb4a0"
    },
    "project_id": "bd3a1e52-1c62-44cb-bf04-660bd88cd74d",
    "revision_end": null,
    "revision_start": "2015-04-17T14:39:22.181615Z",
    "started_at": "2015-04-17T14:39:22.181601Z",
    "type": "generic",
    "user_id": "bd3a1e52-1c62-44cb-bf04-660bd88cd74d"
}


Haystack, needle? Find!

With such a system, it becomes very easy to index all your resources, meter them and retrieve this data. What's even more interesting is to query the system to find and list the resources you are interested in!

You can search for a resource based on any field, for example:

POST /v1/search/resource/instance HTTP/1.1
Content-Type: application/json
 
{
    "=": {
        "user_id": "bd3a1e52-1c62-44cb-bf04-660bd88cd74d"
    }
}


That query will return a list of all resources owned by the user_id bd3a1e52-1c62-44cb-bf04-660bd88cd74d.

You can do fancier queries such as retrieving all the instances started by a user this month:

POST /v1/search/resource/instance HTTP/1.1
Content-Type: application/json
Content-Length: 113
 
{
    "and": [
        {
            "=": {
                "user_id": "bd3a1e52-1c62-44cb-bf04-660bd88cd74d"
            }
        },
        {
            ">=": {
                "started_at": "2015-04-01"
            }
        }
    ]
}


And you can even do fancier queries than the fancier ones (still following?). What if we wanted to retrieve all the instances that were on host foobar on the 15th of April and that already had at least an hour of uptime? Let's ask Gnocchi to look in the history!

POST /v1/search/resource/instance?history=true HTTP/1.1
Content-Type: application/json
Content-Length: 113
 
{
    "and": [
        {
            "=": {
                "host": "foobar"
            }
        },
        {
            ">=": {
                "lifespan": "1 hour"
            }
        },
        {
            "<=": {
                "revision_start": "2015-04-15"
            }
        }
    ]
}


I could also mention the fact that you can search for values in metrics. One feature that I will very likely include in Gnocchi 1.1 is the ability to search for resources whose specific metrics match some value – for example, the ability to search for instances whose CPU consumption was over 80% during a month.

Cherries on the cake

While Gnocchi is well integrated and based on common OpenStack technology, please do note that it is completely able to function without any other OpenStack component and is pretty straightforward to deploy.

Gnocchi also implements a full RBAC system based on the OpenStack standard oslo.policy, which allows pretty fine-grained control of permissions.

There is also some work ongoing to have HTML rendering when browsing the API using a Web browser. While still simple, we'd like to have a minimal Web interface served on top of the API for the same price!

The Ceilometer alarm subsystem supports Gnocchi as of the Kilo release, meaning you can use it to trigger actions when a metric value crosses some threshold. And OpenStack Heat also supports auto-scaling your instances based on Ceilometer+Gnocchi alarms.

And there are a few more API calls that I didn't talk about here, so don't hesitate to take a peek at the full documentation!

Towards Gnocchi 1.1!

Gnocchi is a different beast in the OpenStack community. It is under the umbrella of the Ceilometer program, but it's one of the first projects that is not part of the (old) integrated release. Therefore we decided on a release schedule not directly linked to OpenStack's, and we'll release more often than the rest of the old OpenStack components – probably once every 2 months or so.

What's coming next is a close integration with Ceilometer (e.g. moving the dispatcher code from Gnocchi to Ceilometer) and probably more features as we have more requests from our users. We are also exploring different backends such as InfluxDB (storage) or MongoDB (indexer).

Stay tuned, and happy hacking!

Categories: Elsewhere

Dirk Eddelbuettel: Introducing ghrr: GitHub Hosted R Repository

Planet Debian - Tue, 21/04/2015 - 16:34

Background

R relies on package repositories for initial installation of a package via install.packages(). A crucial second step is update.packages(): for all currently installed packages, a list of available updates is constructed and offered for either one-by-one or bulk updates. This keeps the local packages in sync with upstream, and provides a very convenient way to obtain new features, bug fixes and other improvements. So by installing from a repository, we automatically gain the ability to track that repository for updates.

Enter drat

Fairly recently, the drat package was added to the R ecosystem. It makes both aspects of package distribution easy: providing a package (if you are an author) as well as installing it (if you are a user). Now, because drat is at the same time source code (as it is also a package providing the functionality) and a repository (using the features drat provides), the "namespace" becomes a little cluttered.

But because a key feature of drat is the "one variable" unique identification via GitHub, I opted to create a drat repository in the name of a new organisation: ghrr. This is a simple acronym for GitHub Hosted R Repository.
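
To make the user side concrete, here is a minimal R sketch of how packages from ghrr could be installed, assuming the repository is published at the usual drat location for a GitHub organisation (https://ghrr.github.io/drat); adjust the account or URL if the actual setup differs:

# install drat itself from a regular repository, once
install.packages("drat")

# register the ghrr repository for this session
# (assumes the default https://ghrr.github.io/drat location used by drat)
drat::addRepo("ghrr")

# install and later update packages from ghrr just like from any other repository
install.packages("fasttime")
update.packages()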

Use cases

We can outline several use cases for packages in ghrr:

  • packages not published in a repo by their authors: I already use two like that:
    • fasttime, an impeccably fast parser for ISO datetimes by Simon Urbanek which was however never released into a repo by Simon, and
    • RcppR6, a very nice extension to both R6 (by Winston) and Rcpp, by Rich FitzJohn; similarly never released beyond GitHub;
  • packages possibly unsuitable for mainline repos:
    • Rblpapi is a great package by Whit Armstrong and John Laing to which I have been contributing quite a bit of late. As it requires a free-to-use but not open source library and headers from Bloomberg, it will never make it to the mainline repository for R, but hosting it in ghrr is perfect as I can easily update several machines at work once I cut a new development release;
    • winsorize is a small package I needed a few weeks ago; it is spun out of robustHD but does not yet contain new code so Andreas and I are content to keep it in this drat for now;
  • packages in pre-relase mode:
    • RcppArmadillo, where I announced both a release candidate before Armadillo 5.000 came out and the actual RcppArmadillo 0.500.0.0, which is not (yet) on the mainline repository as two affected packages need a small update first. Users, however, can get RcppArmadillo already from the sibling Rcpp drat repo.
    • RcppToml is a new package I am currently working on, implementing a TOML parser based on cpptoml. It works, but it is not quite ready for public announcements yet, and hence perfect for ghrr.

Going forward

ghrr is meant to be open. While anybody can open a drat repository, particularly on GitHub, it may be beneficial to somehow group packages. This is however not something that can be planned ex-ante: it may just happen if others who see similar benefits in this can in fact contribute. In that spirit, I strongly encourage pull requests.

Early on, I made my commit messages conform to a pattern of package version sha1 repourl to make code provenance of every commit very clear. Ideally, subsequent commits would conform to such a scheme, or replace it with a better one.

Some Resources

A few links to learn more about drat and ghrr:

Comments and questions via email or issue tickets are more than welcome. We hope that others find ghrr to be a useful tool for easy repository management and use via GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Categories: Elsewhere

ERPAL: 6 rules to follow to promote communication in projects

Planet Drupal - Tue, 21/04/2015 - 15:26

In the last part of our blog series, we dealt with the specifications of a project. Today we discuss the issue of responsibilities and ongoing communication. Ensure that the infrastructure exists to support project communication and that everyone has access to it – and uses it! Keep up the communication in the project and make sure that there’s a central communication tool, especially if you’re working in distributed teams. Define which communications are task-related and should be persisted on the task. This is mostly about teamwork: there’s nothing more damaging to a project than communication stagnating partway through, because then everyone in the project team makes different assumptions and heads in different directions. Recurring meetings, such as dailies and weeklies, help to build a culture of ongoing communication.

In addition to the above, also note the following as important points to take into account:

1) Provide a contact person

Project work is teamwork. Both suppliers and customers need to meet their obligations in the project. One of the key things is the obligation to cooperate in making decisions. In particular, this includes the acceptance of partial results or the overall project. To make such decisions, a person needs the skills to take the decision and the authority to do so. If there is no clear contact person who’s responsible for all decision areas – financially, professionally and organizationally – the process may stagnate. This can delay the whole project and applies to both sides, vendors and customers alike.

2) What might happen if this contact person doesn’t exist (from the perspective of the customer)?

Everyone has something to say. The change requests will be considered by all project participants, but remember that only one of them gets the bill at the end of the day. Naturally, the customer may wonder about the amount, but no one ever wants to be responsible. Maybe the wishes were contradictory and mutually canceled each other out in the progression of things, but now, of course, one individual is surprised that – wouldn’t you know – exactly HIS wishes were swallowed up by the ominous “project monster”! This role is usually awarded to the responsible project manager, who in turn must prove that he’s only done what was required.

3) And what can happen from the perspective of the provider?

During the project, customer requirements are discussed with various people. If these requirement changes don’t end up in one pot and if reviews are always subjective, it can lead to unwanted side effects. Example: Contact A builds a new forum feature. Person B says: "We’ll delete the forum", while Person C is under the belief that everything is ready and, hence, plans to start the acceptance process.

4) Emphasize all duties right from the beginning

As a provider, you and your customers should be clear on what obligations each respective party has. Ensure that the project is clear through transparent project management. Don’t drop tasks and always record decisions so that, later, you can see how they were made and how they’ve influenced the overall project.

5) Communicate problems early

Of course, it can also come to additional expenses or delays in the project. Make sure that you identify these problems early and communicate them throughout the responsible team. If problems are addressed and resolved constructively in your project culture, as opposed to having long debates about who’s to blame for the problem, even large hurdles can be overcome together.

6) It’s no shame to question things, so ask often!

If there are any questions in the project – and there will be – ask them! Anyone who’s too afraid or lazy to ask when there are ambiguities does harm to the project. Assumptions and certainties contradict every form of transparent communication. So provide the team with a centralized and transparent communication channel (a chat isn’t sufficient as it doesn’t persist the communication). Also, ensure that information and agreements actually make it to the people who need to have access to them.

In the next part of our series, we’ll focus on unrealistic budgets and deadlines.
Categories: Elsewhere

Drupalize.Me: Learning To Debug: Stop Making Assumptions

Planet Drupal - Tue, 21/04/2015 - 15:10

Here's an example of an assumption: the sun will rise tomorrow. An assumption is something that is accepted as true or as certain to happen, without proof. This kind of thinking, while convenient, is prone to concealing facts, and troublesome when debugging code. This article defines what an assumption is, and provides some techniques to help eliminate assumptions while debugging.

Categories: Elsewhere

Olivier Berger: How to publish an HTML5+RDFa Web site from org-mode

Planet Debian - Tue, 21/04/2015 - 13:49

I’m a big fan of org-mode (see previous posts), and I’ve started maintaining (sic) my professional webpage(s) with it.

But I’ve also recently tried and publish some more Semantic/Linked Data aware documents too (again, previous posts).

Ideally, I think my preferred workflow for publishing articles or documents of some importance would be to author them in org-mode, and then publish them as HTML5 including RDFa meta-data and annotations. Instead, I've more frequently been converting org-mode to LaTeX in order to submit a printable version, and only later deciding to convert the LaTeX to HTML5+RDFa…

But one of the issues is how to properly embed the RDF meta-data inside the org-mode documents, so that the syntax is both compact and expressive enough.

I doubt there’s a universal solution, given that RDF tends to be complex, and graphs may not project easilly along a mainly linear structure of an org-mode document, but anyway, there seems to be possible middle grounds that are practically good enough.

I’ve tried and implement a solution, which reuses the principles set by John Kitchin in Extending the org-mode link syntax with attributes, i.e. implementing an HTML exporter for a particular custom link type, which will convert the plist-like syntax to some RDFa constructs.

Here’s a description of the whole solution : http://www-public.telecom-sudparis.eu/~berger_o/test-org-publishing-rdfa.html

The nice thing about org-mode, and its literate programming babel environment, is that it allows embedding the code of the link exporter inside the org document itself, avoiding dissociating the converter from the document's source and making the document self-contained.

The next step will probably be to author a paper (or convert back a “preprint” of mine) with org-mode, in order to provide Linked Research meta-data.

Stay tuned for more details, and in the meantime, I welcome any improvement to the org/babel/elisp setup.

Categories: Elsewhere

J-P Stacey: Creating a rapidly redeployable booking system for UNOLS

Planet Drupal - Tue, 21/04/2015 - 13:09

I recently worked with Blue Dot Lab to build a rapidly redeployable, interactive booking system for the University-National Oceanographic Laboratory System (UNOLS) based in the US. UNOLS is a consortium of over 60 academic institutions involved in oceanographic research, and individual institutions can require their own system for organizing research expeditions and booking the necessary equipment and boats. Such new systems need to be ready to go with the minimum of fuss and at reasonably short notice.

Read more of "Creating a rapidly redeployable booking system for UNOLS"

Categories: Elsewhere

Jonathan Dowland: Useful script

Planet Debian - Tue, 21/04/2015 - 10:41

Here's a useful shell procedure:

vigg () {
    # edit every file matching the pattern, jumping to the first match in each
    git grep --color=auto -lz "$1" \
        | xargs -r0 sh -c "vi +/\"$1\" \"\$@\" < /dev/tty" vi
}

git grep is a very effective way to run a recursive grep over a git repository, or part of a git repository (by default, it limits its search to the sub-tree you are currently sitting in). I quite often find myself wanting to edit every file that matched a search, and so wrote this snippet.

The /dev/tty ugliness is to work around vi complaining that its standard input was set to /dev/null by xargs. BSD xargs has an option, -o, which sorts this out; GNU xargs doesn't, but the manual suggests the above portable workaround. This marks the first time a BSD tool has had a feature I've wanted and the GNU equivalent doesn't.

Categories: Elsewhere

Triquanta Web Solutions: Going Dutch: stemming in Apache Solr

Planet Drupal - Tue, 21/04/2015 - 08:14
Stemming

So, stemming, what is stemming? Generally speaking, stemming is finding the basic form of a word. For example, in the sentence "he walks" the verb is inflected by adding an "s" to it. In this case the stem is "walk", which, in English, also happens to be the infinitive of the verb.

We will first present a few examples of stemming in natural language, and since Dutch is my native language I will concentrate on Dutch examples.

After that we will show the results of a number of stemmers present in Solr and give a few pointers about what to do if the results of these stemmers are not good enough for your application.

Plurals

One of the things you absolutely want your users to be able to do is to find results which contain the singular form of a word while searching for the plural, and vice versa, e.g. finding "cat" when looking for "cats" and finding "cats" when searching for "cat".

Although in English there are well-defined rules for creating the plural form (suffix with "s" or "es", or change "y" to "ie" and suffix "s"), there are also a number of irregular nouns ("woman" -> "women") and nouns for which the singular and plural forms are the same ("sheep", "fish").

In Dutch more or less the same situation exists, be it with different suffixes ("s", "'s", "en") and, of course, other exceptions.

Furthermore, in Dutch, if the stem ends in a consonant directly preceded by a vowel, this consonant is doubled (otherwise, in the plural form, the vowel would sound like a long vowel instead of a short vowel), e.g.:

kat (cat) -> katten (cats)

But, to this rule there also are exceptions, like

monnik (monk) -> monniken (monks)

in contrasts with:

krik (car jack) -> krikken (car jacks)

Verb conjugation

Conjugation of verbs in Dutch is, to be blunt, a bit of a mess.

In Dutch, for forming the past tense of a verb, two types of conjugation co-exist: the (pre-)medieval system, now called strong, and the more recent system, called weak. When I say the systems co-exist, one should note that most (native) Dutch speakers are not aware that the strong system is a system at all: they consider the strong verbs to be exceptions, best learned by heart.

An example of a strong verb is "lopen" (to walk):

hij loopt (he walks) -> hij liep (he walked)

While an example of a weak verb is "rennen" (to run):

hij rent (he runs) -> hij rende (he ran)

These examples make clear that determening which verb is strong and which verb weak is indeed a case of learning by heart.

Furthermore, the change from strong to weak verbs is an ongoing process. One example of a verb which is currently in transition from strong to weak is "graven" (to dig), of which both the form "hij groef" (he dug) and "hij graafde" can be found, although most language purists would consider the latter form "wrong".

NB: if you are interested in this kind of thing, a classic book about language change is Jean Aitchison's Language Change: Progress or Decay? (1981 – yes, it is a bit pre-internet...)

Diminutives

In a number of languages, like Dutch, German, Polish and many more, diminutives are created by inflecting the word. In English you form the diminutive by adding an adjective like 'little', but in Dutch the general rule for forming a diminutive is to add the suffix "je" to the word, e.g.:

huis (house) -> huisje (little house)

This is the general rule, because the suffix itself can also be inflected, like in

bloem (flower) -> bloempje (little flower)

And in some words the final consonant is changed to keep the word pronounceable:

hemd (shirt) -> hempje (little shirt)

It is however also possible in Dutch to use an adjective like 'klein' (little) and even to combine both:

kleine bloem (little flower) -> klein bloempje (small little flower)

A last peculiarity I should mention is that in Dutch (but also in many other languages) there are words which only have a diminutive form, like 'meisje' (girl).

Homographs and homonyms

For some words it is not possible to find the correct stem without knowing the semantics or context, e.g. kantelen, which when pronounced kantélen means "battlements" but when pronounced kántelen means "tip over". Or zij kust ("she kisses") versus kust as in de Noordzeekust ("the North Sea coast").

Why bother?

So maybe by now you are asking yourself: why bother? Well, you should be bothered because stemming will make it easier for the visitors of your site to find what they are looking for.

For example, you can almost be sure that when a visitor is interested in articles about houses (in Dutch 'huizen'), he will also be interested in articles which mention a house ('huis').

So when using the search term 'huizen' it would be nice if results which contain 'huis' would automatically be shown.

Of course searching for a verb is much less common, and the chance that a visitor will use the diminutive is also not very great, but it still happens, and if it takes only a minimal effort to make sure the visitor finds what he is searching for, then why not?

Solr

Starting from Solr version 3.1, for English (and a number of other languages) there is a standard filter, "EnglishMinimalStemFilterFactory", which has the ability to stem English words. For Dutch, however, such a simple filter factory is not available.

There are however a number of default languages that can be used with the SnowballPorterFilterFactory and in the default schema included in Solr 5 a number of such fields are predefined.

Solr 5 default schema

In Solr 5 the default schema defines a list of language specific fieldtypes. For Dutch the fieldtype 'text_nl' is defined as follows:

<dynamicField name="*_txt_nl" type="text_nl" indexed="true" stored="true"/>

<fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_nl.txt" format="snowball"/>
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_nl.txt" ignoreCase="false"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
</fieldType>

So in short, in the SnowballPorterFilterFactory the language is set to Dutch.

There is however an alternative stemming algorithm available, the Kraaij-Pohlmann algorithm (see Porter’s stemming algorithm for Dutch), known in Solr as Kp.

To compare both algorithms, we define a new Dutch fieldtype as follows:

<dynamicField name="*_txt_nlkp" type="text_nlkp" indexed="true" stored="true"/>

<fieldType name="text_nlkp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_nl.txt" format="snowball"/>
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_nl.txt" ignoreCase="false"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Kp"/>
  </analyzer>
</fieldType>

To complete our analysis we will also use the default English language field, defined as:

<dynamicField name="*_txt_en" type="text_en" indexed="true" stored="true"/>

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

Comparison

In the data shown next we compare the three above defined fields with the correct values.

Fieldtype text_en
  Input:  katten, monniken, meisje, hempje, krikken, huizen, huisje, bloempje, loopt
  Output: katten, monniken, meisj, hempj, krikken, huizen, huisj, bloempj, loopt
  Input:  liep, lopen, rent, rende, rennen, kust, kussen, kantelen
  Output: liep, lopen, rent, rend, rennen, kust, kussen, kantel

Fieldtype text_nl (language = Dutch)
  Input:  katten, monniken, meisje, hempje, krikken, huizen, huisje, bloempje, loopt
  Output: kat, monnik, meisj, hempj, krik, huiz, huisj, bloempj, loopt
  Input:  liep, lopen, rent, rende, rennen, kust, kussen, kantelen
  Output: liep, lop, rent, rend, renn, kust, kuss, kantel

Fieldtype text_nlkp (language = Kp)
  Input:  katten, monniken, meisje, hempje, krikken, huizen, huisje, bloempje, loopt
  Output: kat, monnik, meis, hem, krik, huis, huis, bloem, loop
  Input:  liep, lopen, rent, rende, rennen, kust, kussen, kantelen
  Output: liep, loop, rent, rend, ren, kust, kus, kantel

Correct values
  Input:  katten, monniken, meisje, hempje, krikken, huizen, huisje, bloempje, loopt
  Output: kat, monnik, meisje, hemd, krik, huis, huis, bloem, loop
  Input:  liep, lopen, rent, rende, rennen, kust, kussen, kantelen
  Output: loop, loop, ren, ren, ren, kus (verb) / kust (noun), kus, kantel (verb) / kanteel (noun)

The most noticeable conclusion from the above comparison is that the output of the text_nl field does not differ much from the text_en field.

It seems that the 'Dutch' language implementation of the SnowballPorterFilter has no way of stemming diminutives; results like "huisj" and "bloempj" are just plain wrong, while Kraaij-Pohlmann correctly returns "huis" and "bloem".

The same holds for the plural "huizen", which is correctly stemmed by Kraaij-Pohlmann to "huis".

The diminutive "meisje" is stemmed by Kraaij-Pohlmann to "meis", which in some dialects of Dutch, like the dialect spoken in De Zaanstreek, is actually correct. There is however a way to correct this, see the section about KeywordMarkerFilterFactory under "A bit disappointed?".

And "hempje" is wrongly stemmed to "hem", which seems a too general application of the rule that correctly stems "bloempje" to "bloem".

None of the algorithms knows how to handle homographs like "kust" and "kantelen", but this was to be expected.

A bit disappointed?

Well, maybe your expectations were a bit high then... Natural language processing is notoriously hard and, for the part that requires background knowledge, as good as impossible when working on single words or short phrases.

But in general the Kraaij-Pohlmann algorithm does a rather good job stemming Dutch words. Sometimes however, like with the word "meisje" it is a bit over-enthusiastic.

But there are a number of ways to improve stemming if, for some reason, the results of the Kraaij-Pohlmann algorithm are not good enough.

KeywordMarkerFilterFactory

The KeywordMarkerFilter makes it possible to exclude the words listed in a (UTF-8) text file from stemming. A word like "meisje" would be a good candidate for this. To use it, add a filter to your fieldtype like this:

<filter class="solr.KeywordMarkerFilterFactory" protected="notStemmed.txt" />

The file "notStemmed.txt" should be in the same directory as the schema.xml.
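
For example, to protect "meisje" from being stemmed to "meis", notStemmed.txt could simply list one word per line:

meisje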

StemmerOverrideFilterFactory

The StemmerOverrideFilterFactory is a variation on the KeywordMarkerFilterFactory filter, but instead of only saying "do not stem these words" you must provide a file which defines the stemming for given words. To use it, add a filter to the field type like this:

<filter class="solr.StemmerOverrideFilterFactory" dictionary="dictionary.txt" ignoreCase="true"/>

and make sure the file "dictionary.txt" is present in the conf directory. In this file (which, like all the others, has to be UTF-8 encoded) you add one line per word, each line consisting of the word and the stemmed word separated by a tab, like this:

hempje hemd

Both KeywordMarkerFilterFactory and StemmerOverrideFilterFactory should be used as an addition to the default stemming.

HunspellStemFilterFactory

Hunspell is the open source spellchecker used in a number of open source projects like LibreOffice, Mozilla Thunderbird etc.

It is possible to use Hunspell if it supports your language. To do so, add a filter to the fieldtype like this:

<filter class="solr.HunspellStemFilterFactory" dictionary="nl_NL.dic" affix="nl_NL.aff" ignoreCase="true" />

And make sure the files "nl_NL.dic" and "nl_NL.aff" are present in the conf-directory.

Creating your own stemming algorithm

Of course, if you are really ambitious, you can start from scratch and write your own Snowball implementation. From the Snowball website:

Snowball is a language in which stemming algorithms can be easily represented. The Snowball compiler translates a Snowball script (a .sbl file) into either a thread-safe ANSI C program or a Java program. For ANSI C, each Snowball script produces a program file and corresponding header file (with .c and .h extensions). The language has a full manual, and the various stemming scripts act as example programs.

But be warned: no natural language or natural language phenomenon is easy to fit into an algorithm, and you have to be sure to have all quirks and exceptions absolutely clear before you start.

Links and literature
Categories: Elsewhere

Manuel A. Fernandez Montecelo: About the Debian GNU/Linux port for OpenRISC or1k

Planet Debian - Tue, 21/04/2015 - 02:16

In my previous post I mentioned my involvement with the OpenRISC or1k port. It was the technical activity in which I spent most time during 2014 (Debian and otherwise, day job aside).

I thought that it would be nice to talk a bit about the port for people who don't know about it, and give an update for those who do know and care. So this post explains a bit how it came to be, details about its development, and finally the current status. It is going to be written as a rather personal account, for that matter, since I did not get involved enough in the OpenRISC community at large to learn much about its internal workings and aspects that I was not directly involved with.

There is not much information about all of this elsewhere, only bits and pieces scattered here and there, but specially not much public information at all about the development of the Debian port. There is an OpenRISC entry in the Debian wiki, but it does not contain much information yet. Hopefully, this piece will help a bit to preserve history and give an insight for future porters.

First Things First

I imagine that most people reading this post will be familiar with the terminology, but just in case, to create a new Debian port means to get a Debian system (GNU/Linux variant, in this case) to run in the OpenRISC or1k computer architecture.

Setting to one side all differences between hardware and software, and as described in their site:

“The aim of the OpenRISC project is to create free and open source computing platforms”

It is therefore a good match for the purposes of Debian and Free Software world in general.

The processor has not been produced in silicon, or at least is not available to the masses. People with the necessary know-how can download the hardware description (Verilog) and synthesise it in an FPGA, or otherwise use simulators. It is not a piece of hardware that people can purchase yet, and there are no plans to mass-produce it in the near future either.

The two people (including me) involved in this Debian port did not have the hardware, so we created the port entirely through cross-compiling from other architectures, and then compiling inside Qemu. In a sense, we were creating a Debian port for hardware that "does not [physically] exist". The software that we built was tested by people who did have the hardware available in an FPGA, though, so it was at least usable. I understand that people working on the arm64 port had to work similarly in the initial phases, working in the dark without access to real hardware to compile or test on.

The Spark

The first time that I heard about the initiative to create the port was in late February of 2014, in a post which appeared in Linux Weekly News (sent by Paul Wise) and Slashdot. The original post announcing it was actually from late January, from Christian Svensson (blueCmd):

“Some people know that I've been working on porting Glibc and doing some toolchain work. My evil master plan was to make a Debian port, and today I'm a happy hacker indeed!

Below is a link to a screencast of me installing Debian for OpenRISC, installing python2.7 via apt-get (which you shouldn't do in or1ksim, it takes ages! (but it works!)) and running a small Python script. http://asciinema.org/a/7362

So, now, what can a Debian Hacker do when reading this? (Even if one's Hackery Level is not that high, as it is my case). And well, How Hard Can It Be? I mean, Really?

Well, in my own defence, I knew that the answer to the last two questions would be a resounding Very. But for some reason the idea grabbed me and I couldn't help but think that it would be a Really Exciting Project, and that somehow I would like to get involved. So I wrote to Christian offering my help after considering it for a few days, around mid March, and he welcomed me aboard.

The Ball Was Already Rolling

Christian had already been in contact with the people behind DebianBootstrap, and he had already created the repository http://openrisc.debian.net/ with many packages of the base system and beyond (read: packages name_version_or1k.deb available to download and install). Still nowadays the packages are not signed with proper keys, though, so use your judgement if you want to try them.

After a few weeks, I got up to speed with the status of the project and got my system working with the necessary tools. This meant basically sbuild/schroot to compile new packages, with the base system that Christian already got working, installed in a chroot, probably with the help of debootstrap, and qemu-system-or1k to simulate the system.

Only a few of the packages were different from the version in Debian, like gcc, binutils or glibc -- they had not been upstreamed yet. sbuild ran through qemu-system-or1k, so the compilation of new packages could happen "natively" (running inside Qemu) rather than cross-compiling the packages, pulling _or1k.deb packages for dependencies from the repository that he had prepared, and _all.deb packages from snapshots.debian.org.

I started by trying to get the packages that I [co-]maintain in Debian compiled for this architecture, creating the corresponding _or1k.deb. For most of them, though, I needed many dependencies compiled before I could even compile my packages.

The GNU autotools / autoreconf Problem

From very early on, many of the packages failed to build with messages such as:

Invalid configuration 'or1k-linux-gnu': machine 'or1k' not recognized
configure: error: /bin/bash ../config.sub or1k-linux-gnu failed

This means that software packages based on GNU autotools and using configure scripts need recent versions of the files config.sub and config.guess that they ship in their root directory, to be able to detect the architecture and generate the code accordingly.

This is counter-intuitive, taking into account that GNU autotools were designed to help with portability. Yet, in the case of creating new Debian ports, it meant that unless upstream shipped very recent versions of config.{guess,sub}, the package would not compile straight away on the new architectures -- even though invoking gcc without further ado would have worked without problems in most cases for native compilation.

Of course this did not only affect or1k, and there was already the autoreconf effort underway as a way to update these files automatically when building Debian packages, pushed by people porting Debian to the new architectures added in 2013/2014 (mips64el, arm64, ppc64el), which encountered the same roadblock. This affected around a thousand source packages in unstable. A Royal Pain. Also, all of their reverse dependencies (packages that depended on those to be built) could not be compiled straight away.

The bugs were not Release Critical, though (none of these architectures were officially accepted at the time), so for people not concerned with the new ports there was no big incentive to get them fixed. This problem, which conceptually is easily solvable, prevented new ports from even attempting to compile vast portions of the archive straight away (cleanly, without modifications to the package or to the host system), for weeks or months.

The GNU autotools / autoreconf Solution

We tackled this problem mainly in two ways.

First, and more useful for Debian in general, was to do as other porters were doing and submit bug reports and patches to Debian packages requesting them to use autoreconf, and to NMU packages (upload changes to the archive without the official maintainers' intervention). A few NMUs were made for packages which had had bug reports with patches available for a while, that were in the critical path to get many other packages compiled, and that were orphaned or had almost no maintainer activity.

The people working on the other new ports, and mainly Ubuntu people who helped with some of those ports and wanted to support them, had submitted a large number of requests since late 2013, so there was no shortage of NMUs to be made. Some porters, not being Debian Developers, could not easily get the changes applied; so I also helped the porters of other architectures a bit, especially later on before the freeze of Jessie, to get as many packages compiled on those architectures as possible.

The second way was to create dpkg-buildpackage hooks that unconditionally updated config.{guess,sub} before attempting to build the package in the local build system. This local and temporary solution allowed us in the or1k port to get many _or1k.deb packages into the experimental repository, which in turn allowed many more packages to compile. After a few weeks, I set up several sbuild instances on a multi-core machine, continuously attempting to build packages that had not been built before and whose dependencies were available. Every now and then (typically several times per day at peak times) I pushed the resulting _or1k.deb files to the repository, so more packages would have the necessary dependencies ready to attempt a build.
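
The hook itself is not shown in this post, but conceptually it boils down to something like the following shell sketch, run from the root of the unpacked source before the build starts; /usr/share/misc/config.{sub,guess} are the up-to-date copies shipped by Debian's autotools-dev package, and the exact hook mechanism we used is an assumption left out here:

# replace every config.sub/config.guess in the source tree with the
# current versions shipped by autotools-dev
for f in config.sub config.guess; do
    find . -name "$f" -type f -exec cp -v "/usr/share/misc/$f" {} \;
done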

Christian was doing something similar, and by April, at peak times, between the two of us we were compiling more than a hundred packages on some days -- a huge number of packages did not need any change other than up-to-date config.{guess,sub} files. At some point, late April, Christian set up wanna-build on a few hosts to do this more elegantly and smartly than my method, and more effectively as well.

Ugly Hacks, Bugs and Shortcomings in the Toolchain and Qemu

Some packages are extremely important because many other packages need them to compile (like cmake, Qt or GTK+), and they are themselves very complex and have dependency loops. They had deeper problems than the autoreconf issue and needed some seriously dirty hacking to get them built.

To try to get as many packages compiled as possible, we sometimes compiled these important packages with some functionality disabled, disabling some binary packages (e.g. Java bindings) or especially disabling documentation (using DEB_BUILD_OPTIONS=nodoc when possible, and more aggressively when needed by removing chunks of debian/rules). I tried to use the more aggressive methods in as few packages as possible, though, about a dozen in total. We also used DEB_BUILD_OPTIONS=nocheck to speed up compilation and avoid build failures -- many packages' tests failed due to qemu-system-or1k not supporting multi-threading, which we could do nothing about at the time, but otherwise the packages mostly passed their tests fine.

Due to bugs and shortcomings in Qemu and the toolchain -- like the compiler lacking atomics, missing functionality in glibc, Qemu entering endless loops, or programs segfaulting (especially gettext, which is used by many packages and caused them to fail to build) -- we had to resort to some very creative ways or time-consuming dull work to edit debian/rules, or to create wrappers around the real programs avoiding or forcing certain options (like gcc -O0, since -O2 produced buggy binaries too often).

To avoid having a mix of cleanly compiled and hacked packages in the same repository, Christian set up a two-tiered repository system -- the clean one and the dirty one. In the dirty one we dumped all of the packages that we got built, no matter how. The packages in the clean one could use packages from the dirty one to build, but they themselves were compiled without any hackery. Of course this was not a completely airtight solution, since they could contain code injected at build time from the "dirty repository" (e.g. by static linking), and perhaps other quirks. We hoped to get rid of these problems later by rebuilding all packages against clean builds of all their dependencies.

In addition, Christian spent significant amounts of time working inside the OpenRISC community, debugging problems, testing, and recompiling special versions of the toolchain that we could use to make progress in compiling the whole archive. There were other people in the OpenRISC community implementing the necessary bits in the toolchain, but I don't know the details.

Good Progress

Olof Kindgren wrote the OpenRISC health report April 2014 (actually posted in May), explaining the status at the time of projects across the broad OpenRISC community and covering the software side, Debian port included. Sadly, I think there have been no more "health reports" since then. There was also a new post on Slashdot entitled OpenRISC Gains Atomic Operations and Multicore Support shortly thereafter.

On the Debian side of the port, from time to time new versions of packages entered unstable and we started to use them. Some of them had nice fixes, like the autoreconf updates, so they no longer required local modifications. In other cases, the new versions failed to build where old ones had worked (e.g. because the newer versions added support for, and dependencies on, new versions of gnutls, systemd or other packages not yet available for or1k), and we had to repeat or create more nasty hacks to get the packages built again.

But in general, progress was very good. There were about 10k arch-dependent packages in Debian at the time, and we got about half of them compiled by the beginning of May, counting clean and dirty. If I recall correctly, there were around the same number of arch=all packages (which can be installed on any architecture once the package has been built on one of them). Counting both, systems using or1k had about 15k packages available, or 75% of the whole Debian archive (at least "main"; we excluded "contrib" and "non-free"). Not bad.

By the middle to end of May, we had about 6k arch-dependent packages compiled and 4k to go. The count eventually peaked at around 6.6k packages (in June/July, I think). Many had been built with hacks and not yet rebuilt cleanly, but everything was going fine until the number of packages built plateaued.

Plateauing

There were multiple reasons for that. One of them is that, after we had fixed the autoreconf issue locally in some packages, new versions were uploaded to Debian without that problem being fixed (in many cases there was no bug report or patch yet, so that was understandable; in other cases the requests were ignored). The wanna-build for the clean repository set up by Christian rightly considered the package out of date and prepared to build the more recent version, which failed. Then other packages entering the unstable archive and build-depending on newer versions of those could not be built ("BD-Uninstallable") until we had built the newer versions of the dependencies in the dirty repository with local hacks. Consequently, the count of cleanly built packages went back and forth, when not backwards.

More challenging was the fact that, when creating a new port, language compilers which are themselves written in that same language need to be built for the architecture first. Sometimes it is not the compiler itself, but the compile-time or run-time support for a language's modules, that has not been ported yet. Obviously, as with other dependencies, large numbers of packages written in those languages are bound to remain uncompiled for a long time. As Colin Watson explained when porting Haskell's GHC to arm64 and ppc64el, untangling some of the chicken-and-egg problems of language compilers for new ports is extremely challenging.

Perl and Python are pretty much prerequisites of the base Debian system, and Christian got them working early on. But, for example, in May 247 packages depended on r-base-dev (GNU R) for building, and 736 on ghc, and we did not have these dependencies compiled. Counting just those two, about 1k of the remaining 4k to 5k source packages to be compiled for the new architecture would have to wait for a long time. Then there was Java, Mono, etc...
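Counts of that kind can be approximated with standard Debian tools; for instance, build-rdeps from the devscripts package lists the source packages that build-depend on a given package. A rough sketch (the output includes a few header and summary lines, so the numbers are only approximate, and deb-src entries must be configured):

    # Roughly how many source packages build-depend on ghc and r-base-dev?
    build-rdeps ghc | wc -l
    build-rdeps r-base-dev | wc -l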

Even more worrying were the pending issues with the toolchain, like atomics in glibc, or make check failing for some packages in the clean repository built with wanna-build. Christian continued to work on the toolchain and to liaise with the rest of the OpenRISC community; I continued to request more changes to the Debian archive through a few requests to use autoreconf, and by pushing a few more NMUs. Though many requests were attended to, I soon got negative replies/reactions and backed off a bit. In the meantime, the porters of the other new architectures of the time were mostly submitting requests to support them and not NMUing much either.

Upstreaming

Things continued more or less in the same state until the end of the summer.

The new ports needed as many packages built as possible before the evaluation of which official ports to accept (in early September, I think, with the final decision around the time of the freeze). Porters of the other new architectures (and maintainers, and other helpful Debian Developers) were by then more active in pushing for changes, especially the remaining autoreconf issues, many of which also benefited or1k. As I said before, I kept pushing NMUs now and then as well, especially during the summer, for packages which were not of immediate benefit to our port but helped the others (e.g. ppc64el needed updates to libtool's ltmain.sh which were not necessary for or1k, in addition to config.{guess,sub}).

In parallel, in the or1k camp there were patches that needed to be sent upstream, such as the one for Python's NumPy, which I submitted to the Debian package and upstream in May and which was uploaded to Debian in September with a new upstream release. Similar paths were followed between May and September for packages such as jemalloc, ocaml, gstreamer0.10, libgc, mesa, X.org's cf module and cmake (patch created by Christian).

In April, Christian had reached the amazing milestone of tracking down all of the contributors to the GNU binutils port and getting them to assign copyright to the Free Software Foundation (FSF); all of that work was refreshed and upstreamed. In July or August, he started to gather information about the contributors to the GCC port, which had been started more than a decade ago.

After that, nothing much happened (from the outside) until the end of the year, when Christian sent a message to the OpenRISC community about the status of upstreaming GCC. There was only one person left to assign copyright to the FSF, but it was a blocker. In addition, there was the need to find one or more maintainers to liaise with upstream, review the patches, fix the remaining failures in the test suite and keep the port in good shape. A few months after that, from what I could gather, the status remains the same.

Current Status, and The Future?

In terms of the Debian port, there have not been huge visible changes since the end of the summer, and not only because of the Jessie freeze.

It seems that for this effort to keep going forward and be sustainable, sorting out the issues with GCC and glibc is essential. That means having the toolchain completely pushed upstream and in good shape, and in particular completing the copyright assignment. Debian will not accept private forks of those essential packages without a very good reason, even in unofficially supported ports; and from the point of view of porters, working on the remaining not-yet-built packages while problems persist in the toolchain is very frustrating and time-consuming.

Other than that, there is already a significant amount of software available that can run on an or1k system, so I think that overall the project has achieved a significant degree of success. Granted, KDE and LibreOffice are not available yet, and neither are the tools depending on Haskell or Java. But a lot of software is available (including things high up in the stack, like Xfce), and in many respects it should provide a much more functional system than the Linux (or other free software) systems available in the late 1990s. If the blocking issues are sorted out in the near future, the effort needed to get a very functional port, on par with the unofficial Debian ports, should not be that big.

In my opinion, and looking at the big picture, that is not bad at all for an architecture whose hardware implementations are not easy to come by, and whose port was created almost solely with simulators. That it has been possible to get this far with such meagre resources is an amazing feat of Free Software, and of Debian in particular.

As for the future, time will tell, as usual. I will try to keep you posted if there are any significant changes.

Categories: Elsewhere

Jim Birch: Drupal 7: Integrating Disqus Comments

Planet Drupal - Mon, 20/04/2015 - 22:04

When choosing whether or not to have commenting on your site, one of the factors that I always discuss with clients is whether or not the site has a need for logged in users.  If the site has users who log in, Drupal core's commenting system is great and can be tweaked to fit pretty much every need.

If users don't need to log in, there are a few solutions for "outsourcing" your comment system.  Using an external commenting system has benefits like:

  • better security
  • better/cheaper spam management
  • more full-page caching opportunities

I chose to go with Disqus for this site.  Disqus' social login includes all of the necessary platforms; if someone doesn't have one of those accounts, they can sign up in a minute.  The design is light and, these days, very common.  Their spam detection and moderation/flagging rules are simple and easy to maintain.

Read more

Categories: Elsewhere

Drupal Watchdog: VIDEO: DrupalCon Amsterdam Interview: Fabian Franz

Planet Drupal - Mon, 20/04/2015 - 20:18

Cruising down the Amstel River on a blissfully warm Amsterdam evening – drink in hand – the fabulous FABIAN FRANZ (Senior Performance Engineer & Technical Lead, Tag1 Consulting) opens up about DrupalCons, chance encounters, caching, salsa dancing, love, and what to listen to when you’re programming.

Tags: Video, DrupalCon, DrupalCon Amsterdam
Categories: Elsewhere

Drupal for Government: Path aliased restful services with restws & restws alias

Planet Drupal - Mon, 20/04/2015 - 19:23

Almost a year ago we started putting together a site that needed to integrate with our main library's search engine.  We used Drupal's RESTful services to expose our content, but ran into a problem getting aliased paths to link up correctly.  What this meant was that while http://www.bioconnector.virginia.edu/content/introduction-allen-mouse-brain-atlas-online-tutorial-suite worked fine, http://www.bioconnector.virginia.edu/content/introduction-allen-mouse-br... didn't... this became really problematic when we were trying to create linked data, and traversing was just obnoxious... https://www.bioconnector.virginia.edu/node/36.json just doesn't roll off the digital tongue... as a workaround we used views to do some wonkiness.  It worked, but it certainly was not "the Drupal way."

Categories: Elsewhere

SitePoint PHP Drupal: Drupal goes Social: Building a “Liking” Module in Drupal

Planet Drupal - Mon, 20/04/2015 - 18:00

In this article, we are going to look at how we can create a Drupal module which will allow your users to like your posts. The implementation will use jQuery to make AJAX calls and save this data asynchronously.

Creating your Drupal like module

Let’s start by creating the new Drupal module. To do that, we should first create a folder called likepost in the sites/all/modules/custom directory of your Drupal installation.

Inside this folder, you should create a file called likepost.info with the following contents:

    name = likepost
    description = This module allows the user to like posts in Drupal.
    core = 7.x

This file is responsible for providing metadata about your module. This allows Drupal to detect and load its contents.

Continue reading %Drupal goes Social: Building a “Liking” Module in Drupal%

Categories: Elsewhere

Drupal Association News: A great big thank you to our Members

Planet Drupal - Mon, 20/04/2015 - 17:50

I want to give a big thank you to all of our new and renewing members who gave funds to continue the work of the Drupal Association in the first quarter of this year. We couldn't do much without your support. Shout outs to all of you!

Membership Makes a Difference

We had several recap blog posts a few weeks ago, but just as a reminder, your membership is incredibly important not only to us, but to the project too! Dues from memberships go to fund initiatives like our community cultivation grants, which help people around the world build their local Drupal communities and improve the project. For more information on how membership makes a difference, check out this infographic or see what changes are coming in 2015.

Thank You, Members!!

There are 845 fantastic members on our list of first quarter donors. You can find the list here. Let's give them a big thank you all together!


Personal blog tags: Membership
Categories: Elsewhere

Jonathan Wiltshire: Jessie Countdown: 5

Planet Debian - Mon, 20/04/2015 - 16:49

Five contributors have become uploading Debian Developers so far in 2015 (source: https://nm.debian.org/public/people).

Jessie Countdown: 5 is a post from: jwiltshire.org.uk | Flattr

Categories: Elsewhere

Drupal Association News: Blink Reaction and Propeople Are Joining Forces

Planet Drupal - Mon, 20/04/2015 - 16:27

The following blog was written by Drupal Association Supporting Partner and DrupalCon Los Angeles Diamond Sponsor Blink Reaction and Propeople.

DrupalCons are a very important time of the year for the Drupal community. It is a time for us to come together, continue the collaboration we share throughout the year in a virtual space, and establish goals and plans to move forward in a way that is in the community’s best interest. It is also a time to take stock of our accomplishments and of who we are as a community. One of our favorite moments at DrupalCons is the group picture: it’s always amazing to see how the community stands together and continues to grow.

This year’s DrupalCon in Los Angeles is especially important to us because this is where we will unveil the name of the new Drupal agency that consists of the companies formerly known as Blink Reaction and Propeople. We have come together to create a new digital agency, the largest one in the world with a focus on Drupal, and we are very excited about what this means.

Our combined agency is part of the Intellecta Group (listed on the NASDAQ OMX), and consists of 400+ employees in 19 offices in 11 countries on 4 continents. This is an amazing reach for an organization that is so passionate about Drupal! We’re excited for this unique opportunity to support the Drupal project and the community in ways that would have been impossible prior to the merger.

For example, we’re eager to begin promoting Drupal as a solution for the biggest enterprises on a global scale. Locally, we can influence awareness and excitement in our 19 local communities, helping the next generation find opportunity and excitement in Drupal.

We now have the ability to affect change in a multitude of cultures, in the many diverse communities where each of our offices are located. Where there aren’t yet camps, we can lead their initiation. Where there are Cons, we can help to inspire the next generation of Drupal leaders. We are committed to building up the next generation of talent via our orchestrated public and private training efforts, and look forward to beginning that work at DrupalCon Los Angeles.

So please, stop by booth 300 to say hello and learn more about the new company, and our future within the community. We look forward to seeing all of our friends in the Drupal community, old and new, and are even more excited to discuss how we’ll work with the community for many years to come.

About us.

Blink Reaction and Propeople are joining forces to create a new digital agency built on technology, driven by data, and focused on user experience. The two companies have delivered state-of-the-art Drupal solutions for a variety of the open-source platform’s largest customers. The agencies’ collective portfolio includes brands such as Pfizer, NBC, Stanford University, the City of Copenhagen and General Electric.

Blink Reaction and Propeople are part of the Intellecta Group. The companies in the group are Blink Reaction LLC, Bysted AB, Bysted A/S, Hilanders AB, Intellecta Corporate AB, ISBIT GAMES AB, Propeople Group ApS, Rewir AB, River Cresco AB, Unreel AB and Wow Events AB. Intellecta AB is listed on NASDAQ OMX Stockholm and employs around 550 people in Sweden, Denmark, Austria, Germany, the Netherlands, the United Kingdom, Bulgaria, Moldova, Ukraine, Brazil, the USA, Vietnam and China.

Categories: Elsewhere

TimOnWeb.com: Happy birthday to me and Devel form debug module to you all

Planet Drupal - Mon, 20/04/2015 - 14:36
I’m turning 32 today. People love birthdays; to me it’s just another line number in a messed-up stack trace output (philosophy mode enabled).   Two years ago I released a Drupal module called Get form id (deprecated from now on) that does one small task - it tells you any form's id ...

Read now

Categories: Elsewhere
