As the potential release date for Drupal 8 slowly creeps up, we've launched our first Drupal 8 site and are planning to kick off several more in the next few months. Through this process we've learned a lot about the reality of launching a site on beta software and what that means for your next project.
When do you need it done by?
Drupal 8 will be here soon, but your project may not need to be. If you are just starting to think about the strategy for your project now, and aren't planning on going into heavy development until later this summer, you should definitely be considering Drupal 8 as an option.
Author and software consultant LORNA JANE MITCHELL, fresh from her DrupalCon book-signing (PHP Web Services) and talk, is a frequent visitor to Amsterdam – she lives in Leeds, an hour flight away, and loves the city.
Happy Earth Day! Since the last Drupal Core Update, the Drupal Developer Days event brought lots of exciting progress: we (briefly) reduced the number of critical issues to 35, and a week-long performance sprint made Drupal 8 2 to 20 times faster! Also, Gwendolyn Anello at DrupalEasy announced that DrupalEasy is partnering with Stetson University to offer Drupal courses!
Some other highlights of the month were:
- Pratomo Ardianto at X-Team developed an in-depth tutorial on theming in Drupal 8, Christopher Hall continued his series on theming in Drupal 8 with a breakdown of responsive breakpoints, and Wim Leers posted a tool to visualize Drupal 8's render tree.
- On the front-end, Phase 1 of the Consensus Banana initiative (moving CSS classes from preprocess functions to Twig templates) was completed, the active class was changed to is-active, and we fixed a bug where CSS in libraries would override CSS in themes.
- To improve developer experience, the user_name handler is now handled by the Field API, and the Entity display class no longer depends on the field module.
- Joe Shindelar shared some tips & resources to get started with Drupal 8, Jimmy Berry unveiled a Drupal testbot command-line tool so you can run the full Drupal test suite on your local machine, Mark Ferree at Chapter Three shared his presentation on Drupal 8 module development, and Ray Saltini at Blink Reaction wrote a brief introduction to the Drupal Console project.
- Internal page caching was moved into a module so it can be easily turned off for development, core now uses the APC Classloader by default if it's available, and showing comments no longer requires generating temporary users for anonymous users.
- Mike Potter at Phase2 announced the Features module for Drupal 8.
- To clean up the API, update_project_storage(), element_info(), element_child(), element_children(), and drupal_form_submit() were all removed (and element_info_property() was deprecated), and entity_load('image_style') and entity_load_multiple('image_style') were replaced with static method calls.
- Lauri Eskola at Druid wrote about his team's experiences launching a site on Drupal 8, John Locke at Freelock explained when Freelock recommends using Drupal 8 now, Kristof Van Tomme at Pronovix explained the Drupal 8 Accelerate funding program and how Pronovix is encouraging their customers to help, Chris Smith at OPIN Software Inc. interviewed Chris Luckhardt about Drupal 8, its release date, and its impact on the community, and Steve Burge at OSTraining announced a Kickstarter to create free Drupal 8 training videos.
- Also, PHP7 EngineExceptions can now be caught in the general error handler and we can now perform front-end testing with the Mink driver.
See Help get Drupal 8 released! for updated information on the current state of the release and more information on how you can help.
We're also looking for more contributors to help compile these posts. Contact mparker17 if you'd like to help!
Drupal 8 In Real Life
- Cornell Drupal Camp starts tomorrow, April 23–24 in Ithaca, NY, USA with sessions on Drupal 8 for site builders, unit testing with PHPUnit, an introduction to Drupal 8 for end-users, and migrating Drupal 6 sites to Drupal 8.
- On Friday, April 24, there is a code sprint to port Views GeoJSON to Drupal 8 in Durham, NC, USA.
- The Village of Oak Park Drupal User Group is hosting a Drupal 8 core sprint on May 3 in Oak Park, IL, USA.
- DrupalJam will be taking place on April 30 in Utrecht, Netherlands with sessions on headless Drupal 8, responsive images, and continuous integration.
- DrupalCon Los Angeles in Los Angeles, CA, USA is only 19 days away (May 11–15, although sprints start on the 9th)! Regular ticket pricing ended last week, but late tickets are still available.
- DrupalCamp Spain will be May 22–24 in Jerez de la Frontera, Cádiz, Spain! There are still tickets left, and members of the Asociación española de Drupal get a 10% discount! There are sessions on Drupal 8 forms, theming, site-building, and building multilingual sites in D8.
- The Drupal North Regional Summit in Toronto, Ontario, Canada will be June 25–28. Registration is free, but you should register now so the event organizers can plan for you to be there. The event’s theme is Drupal 8!
- DrupalCamp Ottawa announced their new date and location: Friday, July 24 at University of Ottawa, Ottawa, Ontario, Canada.
Do you follow Drupal Planet with devotion, or keep a close eye on the Drupal event calendar, or git pull origin 8.0.x every morning without fail before your coffee? We're looking for more contributors to help compile these posts. You could either take a few hours once every six weeks or so to put together a whole post, or help with one section more regularly. If you'd like to volunteer for helping to draft these posts, please follow the steps here!
This week we’ll talk about Block Class, a very cute module to insert custom classes for every block we create.
While working on a project recently, I needed different classes for every block available in the layout.
Searching on drupal.org, I...
When you save (precisely for an update) an entity, Drupal does a massive job:
More articles...
- Git shell on Windows reports “sh.exe has stopped working (APPCRASH)”
- Decent PDF generation in Drupal
- Benchmarking Drupal 7 on PHP 7-dev
- Installing Drupal on Windows and SQL Server
- Hiding the fact that your site runs Drupal
- Calling .Net Framework and .Net Assemblies from PHP
- When PHP crashes: how to collect meaningful information and what to do with it
- Setting up Code Syntax Higlighting with Drupal
- Distinct options in a views exposed filter: The Views Selective Filters Module
- How to use NetPhp
The next beta release for Drupal 8 will be beta 10! (Read more about beta releases.) The beta is scheduled for Wednesday, April 29, 2015.
To ensure a reliable release window for the beta, there will be a Drupal 8 commit freeze from 00:00 to 23:30 UTC on April 29.
Drupal 7 is by far my favorite CMS to date, and Zurb Foundation is currently my go-to theme. I wouldn't really call Foundation a theme, though; it is more of a responsive front-end framework that you can use to build your themes from.
Here is how to set up a fresh copy of Drupal 7 and configure a Foundation sub-theme quickly to get your project up and running:
Install Drupal using Drush
Although you can do this all the old-fashioned way, I prefer to use drush for this. Here are the drush commands to make this all happen:
drush dl drupal --drupal...
Why would you want to import tweets into a Drupal site? For one, I want to own the content I create. Unlike other social media sites, Twitter allows great access to the content I create on their platform. Through their API, I can access all of my Tweets and Mentions for archiving and displaying on my own site.
I have had a couple of instances with clients where the archiving of Tweets came in handy. One when a Twitter account was hacked, and one when someone said something that wasn't supposed to be said. At the very least, it is an offsite backup of your content at Twitter, and that is never a bad thing.
I have used this module for building aggregated content. If you have a site that is centered on particular topics, you can build lists of Twitter accounts or #hashtags. Imagine you were running a Drupal Camp: you could build a feed of all of the speakers and sponsors, or a feed of the camp's #hashtag, or both!
You could also build a Twitter feed of only your community. This module allows each and every Drupal user account to associate with one or many Twitter accounts. The users just need to authorize themselves. The possibilities seem endless.
OK, so on with the good stuff. Importing Tweets into your Drupal 7 site is very quick and easy using the Drupal Twitter Module.
The Salesforce Suite has been around since Drupal 5 and it’s evolved quite a bit in order to keep up with the ever-changing Salesforce and Drupal landscapes. Several years ago, we found ourselves relying heavily upon the Salesforce Suite for our Salesforce-Drupal integrations. But there came a point where we realized the module could no longer keep up with our needs. So we, in collaboration with the maintainers of the module at the time, set out to rewrite the suite for Drupal 7.
We completely rewrote the module, leveraging Drupal's entity architecture, Salesforce's REST API, and OAuth for authentication. We also added much-needed features such as a completely new user experience, the ability to synchronize any Drupal and Salesforce objects, and a number of performance enhancements. This was a heck of an undertaking, and there were dozens of other improvements we made to the suite that you can read about in this blog post. We’ve maintained this module ever since and have endeavored to add new features and enhancements as they become necessary. We realized this winter that it was time for yet another batch of improvements as the complexity and scale of our integrations have grown.
In addition to over 150 performance enhancements and bug fixes, this release features an all new Drupal entity mapping system which shows a log of all synchronization activity, including any errors. You can now see a log entry for every attempted data synchronization. If there’s a problem, the log will tell you where it is and why it’s an issue. There’s now a whole interface designed to help you pinpoint where these issues are so you can solve them quickly.
Administrators can even manually create or edit a connection between Drupal and Salesforce objects. Before this update, the only way to connect two objects was to create the mapping and then wait for an object to be updated or created in either Drupal or Salesforce. Now you can just enter the Salesforce ID and you’re all set.
Take the following example to understand why these improvements are so critical. Say that your constituents are volunteering through your Drupal site using the Registration module. The contacts are created or updated in RedHen and then synced to Salesforce. For some reason, you can see the new volunteers in Drupal, but they are not showing in Salesforce. It used to be that the only clue to a problem was buried in the error log. Now, all you have to do is go to the RedHen contact record, and then click “Salesforce activity,” and you’ll see a record of the attempted sync and an explanation of why it failed. Furthermore, you can manually connect the contact to Salesforce by entering the Salesforce ID.
Finally, you can now delete existing mappings, or map to an entirely different content type. The bottom line is that module users have more control of, and insights into, how their data syncs to Salesforce. You can download version 7.x-3.1 from Drupal.org and experience these improvements for yourself.
We’ve been hard at work polishing several of our other modules and tools, like the RedHen suite and Entity Registration, which also saw new releases. We’ll tell you more about what you can expect from those new versions in our upcoming blogs.
Drupal people are good people. They are the recipe’s secret ingredient, and conferences are the oven. Mix and bake.
March 2007, Sunnyvale, California, the Yahoo campus and a Sheraton.
OSCMS, my second Drupal event and my first conference.
Dries gave the State of Drupal keynote, with a survey of developers and a vision for future work. His hair was still a bit punk and he was a bit younger. Dries has the best slides. Where does he find those amazing slides?
I like Dries a lot.
I wish I had created Drupal.
In 1999, I created my own CMS named Frameworks. I remember showing my friend Norm an "edit" link for changing text and how cool that was. Back then, I didn't even know about Open Source – despite being a fanboy of Richard Stallman and the FSF – and I was still using a mix of C/C++, Perl, and IIS. (If you wanted to eat in the 1990's, Windows was an occupational hazard.)
But I didn't create Drupal. I didn't have the hair, I've never had those amazing slides, and I will never be able to present that well.
But mainly, I didn't have the vision.
Rasmus Lerdorf gave a talk on the history of PHP. I was good with computer languages. I had written a compiler in college, developed my first interpretive language in the late 1980's and another one in the early 1990's. I wondered why I hadn't created PHP. At the time, most web apps were written in Perl. I loved Perl. It was so concise. It was much better than AWK, which in itself was also pretty awesome.
(Note: AWK does not stand for awkward. It’s named after Aho, Weinberger, and Kernighan – of K&R fame).
So I didn't see the need for PHP, we had Perl!
Again, no vision.
Meanwhile: 2007, Sunnyvale, California, OSCMS.
In the last part of our blog series, we dealt with the specifications of a project. Today we discuss the issue of responsibilities and ongoing communication. Ensure that the infrastructure exists to support project communication and that everyone has access to it – and uses it! Keep up the communication in the project and make sure that there’s a central communication tool, especially if you’re working in distributed teams. Define which communications are task-related and should be persisted on the task. This is mostly about teamwork: there’s nothing more damaging to a project than the stagnation of communication after a certain length of the project time, because then everyone in the project team makes different assumptions, leading into different directions. Recurring meetings, such as dailies and weeklies, help to build a culture of ongoing communication.
In addition to the above, also note the following as important points to take into account:
1) Provide a contact person
Project work is teamwork. Both suppliers and customers need to meet their obligations in the project. One of the key things is the obligation to cooperate in making decisions. In particular, this includes the acceptance of partial results or the overall project. To make such decisions, a person needs the skills to take this decision and the authority to do so. If there is no clear contact person who’s responsible for all decision areas – financially, professionally and organizationally – the process may stagnate. This can delay the whole project and applies to both sides, vendors and customers alike.
2) What might happen if this contact person doesn’t exist (from the perspective of the customer)?
Everyone has something to say. The change requests will be considered by all project participants, but remember that only one of them gets the bill at the end of the day. Naturally, the customer may wonder about the amount, but no one ever wants to be responsible. Maybe the wishes were contradictory and mutually canceled each other out in the progression of things, but now, of course, one individual is surprised that – wouldn’t you know – exactly HIS wishes were swallowed up by the ominous “project monster”! This role is usually awarded to the responsible project manager, who in turn must prove that he’s only done what was required.
3) And what can happen from the perspective of the provider?
During the project, customer requirements are discussed with various people. If these requirement changes don’t end up in one pot and if reviews are always subjective, it can lead to unwanted side effects. Example: Contact A builds a new forum feature. Person B says: "We’ll delete the forum" while Person C is under the belief that everything is ready and, hence, plans to start the acceptance process.
4) Emphasize all duties right from the beginning
As a provider, you and your customers should be clear on what obligations each respective party has. Ensure that the project is clear through transparent project management. Don’t drop tasks and always record decisions so that, later, you can see how they were made and how they’ve influenced the overall project.
5) Communicate problems early
Of course, additional expenses or delays can also arise in the project. Make sure that you identify these problems early and communicate them throughout the responsible team. If problems are addressed and resolved constructively in your project culture, as opposed to having long debates about who’s to blame for the problem, even large hurdles can be overcome together.
6) It’s no shame to question things, so ask often!
If there are any questions in the project – and there will be – ask them! Anyone who’s too afraid or lazy to ask when there are ambiguities does harm to the project. Assumptions and certainties contradict every form of transparent communication. So provide the team with a centralized and transparent communication tool (a chat isn’t sufficient, as it doesn’t persist the communication). Also, ensure that information and agreements actually make it to the people who need to have access to them.
In the next part of our series, we’ll focus on unrealistic budgets and deadlines.
Here's an example of an assumption: the sun will rise tomorrow. An assumption is something that is accepted as true or as certain to happen, without proof. This kind of thinking, while convenient, is prone to concealing facts, and troublesome when debugging code. This article defines what an assumption is, and provides some techniques for helping to eliminate assumptions during debugging.
I recently worked with Blue Dot Lab to build a rapidly redeployable, interactive booking system for the University-National Oceanographic Laboratory System (UNOLS) based in the US. UNOLS is a consortium of over 60 academic institutions involved in oceanographic research, and individual institutions can require their own system for organizing research expeditions and booking the necessary equipment and boats. Such new systems need to be ready to go with the minimum of fuss and at reasonably short notice.
So, stemming, what is stemming? Generally speaking, stemming is finding the basic form of a word. For example, in the sentence "he walks" the verb is inflected by adding an "s" to it. In this case the stem is "walk" which, in English, also happens to be the infinitive of the verb.
We will first present a few examples of stemming in natural language, and since Dutch is my native language I will concentrate on Dutch examples.
After that we will show the results of a number of stemmers present in Solr and give a few pointers about what to do if the results of these stemmers are not good enough for your application.
Plurals
One of the things you absolutely want your users to be able to do is find results that contain the singular form of a word while searching for the plural and vice versa, e.g.: finding "cat" when looking for "cats" and finding "cats" when searching for "cat".
Although in English there are well-defined rules for creating the plural form (suffix with "s", "es" or change "y" to "ie" and suffix "s"), there are also a number of irregular nouns ("woman" -> "women") and nouns for which the singular and plural form are the same ("sheep", "fish").
In Dutch more or less the same situation exists, albeit with different suffixes ("s", "'s", "en") and, of course, other exceptions.
Furthermore, in Dutch, if the stem ends in a consonant directly preceded by a vowel, this consonant is doubled (otherwise, in the plural form, the vowel would sound like a long vowel instead of a short vowel), e.g.:
kat (cat) -> katten (cats)
But there are also exceptions to this rule, like
monnik (monk) -> monniken (monks)
in contrast with:
krik (car jack) -> krikken (car jacks)
Verb conjugation
Conjugation of verbs in Dutch is, to be blunt, a bit of a mess.
In Dutch, two types of conjugation co-exist for forming the past tense of a verb: the (pre-)medieval system, now called strong, and the more recent system, called weak. When I say the systems co-exist, one should note that most (native) Dutch speakers are not aware that the strong system is a system at all: they consider the strong verbs to be exceptions, best learned by heart.
An example of a strong verb is "lopen" (to walk):
hij loopt (he walks) -> hij liep (he walked)
While an example of a weak verb is "rennen" (to run):
hij rent (he runs) -> hij rende (he ran)
These examples make clear that determining which verb is strong and which is weak is indeed a case of learning by heart.
Furthermore, the change from strong to weak verbs is an ongoing process. One example of a verb currently in transition from strong to weak is "graven" (to dig), for which both the form "hij groef" (he dug) and "hij graafde" can be found, although most language purists would consider the latter form "wrong".
NB: if you are interested in this kind of thing, a classic book about language change is Jean Aitchison's Language change: progress or decay (1981, yes, it is a bit pre-internet...)
Diminutives
In a number of languages, like Dutch, German, Polish and many more, diminutives are created by inflecting the word. In English you form the diminutive by adding an adjective like 'little', but in Dutch the general rule to form a diminutive is to add the suffix "je" to the word, e.g.:
huis (house) -> huisje (little house)
This is only the general rule, because the suffix itself can also be inflected, as in
bloem (flower) -> bloempje (little flower)
And in some words the ending consonant is changed to keep the word pronounceable:
hemd (shirt) -> hempje (little shirt)
It is, however, also possible in Dutch to use an adjective like 'klein' (little) and even to combine both:
kleine bloem (little flower) -> klein bloempje (small little flower)
A last peculiarity I should mention is that in Dutch (but also in many other languages) there are words which only have a diminutive form, like 'meisje' (girl).
Homographs and homonyms
For some words it is not possible to find the correct stem without knowing the semantics or context, e.g. kantelen, which if pronounced like kantélen means "battlements" but when pronounced kántelen means "tip over". Or zij kust ("she kisses") versus kust as in de Noordzeekust ("the North Sea coast").
Why bother?
So maybe by now you are asking yourself: why bother? Well, you should be bothered because stemming will make it easier for the visitors of your site to find what they are looking for.
For example, you can almost be sure that when a visitor is interested in articles about houses (in Dutch 'huizen'), he will also be interested in articles which mention a house ('huis').
So when using the search term 'huizen' it would be nice if results which contain 'huis' would automatically be shown.
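To make the 'huizen'/'huis' case concrete, here is a minimal Python sketch of stem-based matching. It is not how Solr works internally: the `STEMS` table is a hypothetical hand-written lookup standing in for a real stemmer, and the sample documents are made up for illustration.

```python
# Hypothetical stem table for illustration only; a real setup would rely on
# a stemmer such as the ones shipped with Solr.
STEMS = {"huizen": "huis", "huis": "huis", "huisje": "huis"}


def stem(word):
    """Return the stem of a word, falling back to the lowercased word."""
    word = word.lower()
    return STEMS.get(word, word)


def build_index(documents):
    """Map each stem to the set of document ids whose text contains it."""
    index = {}
    for doc_id, text in documents.items():
        for token in text.split():
            index.setdefault(stem(token.strip(".,!?")), set()).add(doc_id)
    return index


def search(index, term):
    """Look up a search term by its stem and return matching document ids."""
    return sorted(index.get(stem(term), set()))


# Made-up sample documents.
documents = {
    1: "Een huis aan de gracht",
    2: "Nieuwe huizen in Utrecht",
    3: "De kat slaapt",
}
index = build_index(documents)
print(search(index, "huizen"))
```

Because both the indexed text and the search term are reduced to the same stem, a search for 'huizen' also returns the document that only mentions 'huis'.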
Of course, searching for a verb is much less common, and the chance that a visitor will use the diminutive is also not very great, but still it happens, and if it takes only a minimal effort to make sure the visitor finds what he is searching for, then why not?
Solr
Starting from Solr version 3.1, for English (and a number of other languages) there is a standard filter "EnglishMinimalStemFilterFactory" which has the ability to stem English words. For Dutch, however, such a simple filter factory is not available.
There are, however, a number of default languages that can be used with the SnowballPorterFilterFactory, and in the default schema included in Solr 5 a number of such fields are predefined.
Solr 5 default schema
In Solr 5 the default schema defines a list of language specific fieldtypes. For Dutch the fieldtype 'text_nl' is defined as follows:
<dynamicField name="*_txt_nl" type="text_nl" indexed="true" stored="true"/>
<fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_nl.txt" format="snowball" />
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_nl.txt" ignoreCase="false"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
</fieldType>
So in short, in the SnowballPorterFilterFactory the language is set to Dutch.
There is, however, an alternative stemming algorithm available, the Kraaij-Pohlmann algorithm (see Porter’s stemming algorithm for Dutch), known in Solr as Kp.
To compare both algorithms, we define a new Dutch fieldtype as follows:
<dynamicField name="*_txt_nlkp" type="text_nlkp" indexed="true" stored="true"/>
<fieldType name="text_nlkp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_nl.txt" format="snowball" />
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_nl.txt" ignoreCase="false"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Kp"/>
  </analyzer>
</fieldType>
To complete our analysis we will also use the default English language field, defined as:
<dynamicField name="*_txt_en" type="text_en" indexed="true" stored="true"/>
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
Comparison
In the data shown next we compare the three fields defined above with the correct values. The text_nl field uses language = Dutch, the text_nlkp field uses language = Kp.

Input     text_en   text_nl   text_nlkp  Correct values
katten    katten    kat       kat        kat
monniken  monniken  monnik    monnik     monnik
meisje    meisj     meisj     meis       meisje
hempje    hempj     hempj     hem        hemd
krikken   krikken   krik      krik       krik
huizen    huizen    huiz      huis       huis
huisje    huisj     huisj     huis       huis
bloempje  bloempj   bloempj   bloem      bloem
loopt     loopt     loopt     loop       loop
liep      liep      liep      liep       loop
lopen     lopen     lop       loop       loop
rent      rent      rent      rent       ren
rende     rend      rend      rend       ren
rennen    rennen    renn      ren        ren
kust      kust      kust      kust       kus (verb), kust (noun)
kussen    kussen    kuss      kus        kus
kantelen  kantel    kantel    kantel     kantel (verb)
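As a rough way to summarize the comparison data above, the following Python sketch counts how often each fieldtype produces the stem listed under the correct values. The word lists are copied from the comparison; the scoring rule (an exact match against any accepted stem, with "kus" and "kust" both accepted for the homograph "kust") is an assumption made for this illustration and is not something Solr provides.

```python
# Inputs and outputs copied from the comparison data in the text.
INPUTS = (
    "katten monniken meisje hempje krikken huizen huisje bloempje loopt "
    "liep lopen rent rende rennen kust kussen kantelen"
).split()

OUTPUTS = {
    "text_en": (
        "katten monniken meisj hempj krikken huizen huisj bloempj loopt "
        "liep lopen rent rend rennen kust kussen kantel"
    ).split(),
    "text_nl": (
        "kat monnik meisj hempj krik huiz huisj bloempj loopt "
        "liep lop rent rend renn kust kuss kantel"
    ).split(),
    "text_nlkp": (
        "kat monnik meis hem krik huis huis bloem loop "
        "liep loop rent rend ren kust kus kantel"
    ).split(),
}

# Correct stems per input; "|" separates alternatives for the homograph "kust".
CORRECT = (
    "kat monnik meisje hemd krik huis huis bloem loop "
    "loop loop ren ren ren kus|kust kus kantel"
).split()


def score(outputs):
    """Count outputs that exactly match one of the accepted stems."""
    return sum(out in ok.split("|") for out, ok in zip(outputs, CORRECT))


SCORES = {name: score(outs) for name, outs in OUTPUTS.items()}
for name, hits in SCORES.items():
    print(f"{name}: {hits}/{len(INPUTS)} match the correct stem")
```

Exact matching is a strict yardstick: for search it mainly matters that related word forms end up with the same stem, so a consistent but linguistically "wrong" stem can still give usable results.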
The most noticeable conclusion of the above comparison is that the output of the text_nl field does not differ much from that of the text_en field.
It seems that the 'Dutch' language implementation of the SnowballPorterFilter has no way of stemming diminutives: results like "huisj" and "bloempj" are just plain wrong, while Kraaij-Pohlmann correctly returns "huis" and "bloem".
The same holds for the plural "huizen", which is correctly stemmed by Kraaij-Pohlmann to "huis".
The diminutive "meisje" is stemmed by Kraaij-Pohlmann to "meis", which in some dialects of Dutch, like the dialect spoken in De Zaanstreek, is actually correct. There is, however, a way to correct this; see the section about KeywordMarkerFilterFactory under "A bit disappointed?".
And "hempje" is wrongly stemmed to "hem", which seems a too general application of the rule which correctly stems "bloempje" to "bloem".
None of the algorithms knows how to handle homographs like "kust" and "kantelen", but this was to be expected.
A bit disappointed?
Well, maybe your expectations were a bit high then... Natural language processing is notoriously hard and, for that part that requires background knowledge, as good as impossible when working on single words or short phrases.
But in general the Kraaij-Pohlmann algorithm does a rather good job stemming Dutch words. Sometimes however, like with the word "meisje" it is a bit over-enthusiastic.
But there are a number of ways to improve stemming if, for some reason, the results of the Kraaij-Pohlmann algorithm are not good enough.
KeywordMarkerFilterFactory
The KeywordMarkerFilter makes it possible to exclude words listed in a (UTF-8) text file from stemming. A word like "meisje" would be a good candidate for this. To use it, add a filter to your fieldtype like this:
<filter class="solr.KeywordMarkerFilterFactory" protected="notStemmed.txt" />
The file "notStemmed.txt" should be in the same directory as the schema.xml.
StemmerOverrideFilterFactory
The StemmerOverrideFilterFactory is a variation on the KeywordMarkerFilterFactory filter, but instead of only saying "do not stem these words" you must provide a file which defines the stemming for given words. To use it, add a filter to the field type like this:
<filter class="solr.StemmerOverrideFilterFactory" dictionary="dictionary.txt" ignoreCase="true"/>
and make sure the file "dictionary.txt" is present in the conf-directory. In this file (which, like all others, has to be encoded as UTF-8) you add one line per word, each line consisting of the word and the stemmed word separated by one tab, like this:
hempje	hemd
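The tab-separated format described above is easy to generate from a script. The following Python sketch writes such a file and reads it back; the `OVERRIDES` mapping and the helper names are hypothetical, only the "word, tab, stem" line format and the UTF-8 encoding come from the text, and a temporary directory stands in for the real conf-directory.

```python
import os
import tempfile

# Hypothetical overrides; "hempje" -> "hemd" is the example from the text.
OVERRIDES = {"hempje": "hemd"}


def dictionary_lines(overrides):
    """Return one 'word<TAB>stem' line per override, sorted by word."""
    lines = []
    for word, stem in sorted(overrides.items()):
        if "\t" in word or "\t" in stem:
            raise ValueError("words must not contain tab characters")
        lines.append(f"{word}\t{stem}")
    return lines


def write_dictionary(path, overrides):
    """Write the overrides as a UTF-8 file with one entry per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for line in dictionary_lines(overrides):
            handle.write(line + "\n")


def read_dictionary(path):
    """Read a tab-separated override file back into a dict."""
    overrides = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                word, stem = line.split("\t")
                overrides[word] = stem
    return overrides


# A temporary directory stands in for the real conf-directory here.
with tempfile.TemporaryDirectory() as conf_dir:
    path = os.path.join(conf_dir, "dictionary.txt")
    write_dictionary(path, OVERRIDES)
    ROUND_TRIP = read_dictionary(path)

print(ROUND_TRIP)
```

Generating the file from a script helps avoid the classic mistake of separating the two words with spaces instead of a single tab.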
Both KeywordMarkerFilterFactory and StemmerOverrideFilterFactory should be used in addition to the default stemming.
HunspellStemFilterFactory
Hunspell is the open source spellchecker used in a number of open source projects like LibreOffice, Mozilla Thunderbird etc.
It is possible to use Hunspell if it supports your language. To do so, add a filter to the fieldtype like this:
<filter class="solr.HunspellStemFilterFactory" dictionary="nl_NL.dic" affix="nl_NL.aff" ignoreCase="true" />
And make sure the files "nl_NL.dic" and "nl_NL.aff" are present in the conf-directory.
Creating your own stemming algorithm
Of course, if you are really ambitious you can start from scratch and write your own Snowball implementation. From the Snowball website:
Snowball is a language in which stemming algorithms can be easily represented. The Snowball compiler translates a Snowball script (a .sbl file) into either a thread-safe ANSI C program or a Java program. For ANSI C, each Snowball script produces a program file and corresponding header file (with .c and .h extensions). The language has a full manual, and the various stemming scripts act as example programs.
But be warned: no natural language or natural language phenomenon is easy to fit in an algorithm and you have to be sure to have all quirks and exceptions absolutely clear before you start.
When choosing whether or not to have commenting on your site, one of the factors that I always discuss with clients is whether or not the site has a need for logged in users. If the site has users who log in, Drupal core's commenting system is great and can be tweaked to fit pretty much every need.
If you don't need users to log in, there are a few solutions for "outsourcing" your comment system.
- Using an external commenting system has benefits like:
- better security
- better/cheaper spam management
- more full page caching opportunities.
- Some popular commenting system providers are:
- Disqus (Drupal Module)
- Facebook comments (Drupal Module)
- Livefyre (Drupal Module)
I chose to go with Disqus for this site. Disqus' social login includes all of the necessary platforms. If not, people can sign up with an account in a minute. The design is light, and these days very common. Their spam detection, and moderation/flagging rules are simple and easy to maintain.
Cruising down the Amstel River on a blissfully warm Amsterdam evening – drink in hand – the fabulous FABIAN FRANZ (Senior Performance Engineer & Technical Lead, Tag1 Consulting) opens up about DrupalCons, chance encounters, caching, salsa dancing, love, and what to listen to when you’re programming.
Almost a year ago we started putting together a site that needed to integrate with our main library's search engine. We used Drupal's RESTful services to expose our content, but ran into a problem with getting aliased paths to link up correctly. What this meant was that while http://www.bioconnector.virginia.edu/content/introduction-allen-mouse-brain-atlas-online-tutorial-suite worked fine, http://www.bioconnector.virginia.edu/content/introduction-allen-mouse-br... didn't... this became really problematic when we were trying to create linked data, and traversing was just obnoxious... https://www.bioconnector.virginia.edu/node/36.json just doesn't roll off the digital tongue... as a workaround we used Views to do some wonkiness... it worked, but certainly was not "the Drupal way."